On 9/21/2015 8:14 AM, Christian Heimes wrote:
On 2015-08-26 20:13, Endi Sukma Dewata wrote:
> As discussed on IRC, in b64encode() there's code that converts Unicode
> string data into ASCII:
>
>     if isinstance(data, six.text_type):
>         data = data.encode('ascii')
>
> This conversion will not work if the string contains non-ASCII
> characters, which limits the usage of this method.
>
> It's not that Python 3's base64.b64encode() doesn't support ASCII text,
> as noted in the method description. Rather, it cannot encode a Unicode
> string, because a Unicode string has no binary representation until
> it's encoded.
>
> I think in this case the proper encoding for Unicode is UTF-8. So the
> line should be changed to:
>
>     if isinstance(data, six.text_type):
>         data = data.encode('utf-8')
>
> In b64decode(), the incoming data is a Unicode string containing the
> base-64 encoding characters which are all ASCII, so data.encode('ascii')
> will work, but to be more consistent it can also use data.encode('utf-8').
We discussed the ticket a couple of weeks ago on IRC. The function is
deliberately limited to ASCII-only text in order to avoid encoding hell.
Python 3 tries to avoid encoding bugs by removing implicit encoding of
text and decoding of bytes.
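To illustrate the point about implicit encoding, here is a minimal sketch for Python 3 only, so plain str stands in for six.text_type. The helper name b64encode_ascii_only is hypothetical and is not the function from the patch; it only shows the behaviour being described.

```python
import base64


def b64encode_ascii_only(data):
    """Hypothetical helper: accept bytes, or text that is pure ASCII."""
    if isinstance(data, str):
        # Explicit and strict: non-ASCII text raises UnicodeEncodeError
        # instead of being silently encoded with a guessed charset.
        data = data.encode('ascii')
    return base64.b64encode(data)


# Python 3's base64.b64encode() refuses text outright; there is no
# implicit str -> bytes conversion.
try:
    base64.b64encode('abc')
except TypeError:
    implicit_rejected = True
else:
    implicit_rejected = False

# The ASCII-only wrapper rejects text it cannot represent.
try:
    b64encode_ascii_only('caf\u00e9')
except UnicodeEncodeError:
    non_ascii_rejected = True
else:
    non_ascii_rejected = False

# Pure ASCII text goes through unchanged.
encoded = b64encode_ascii_only('abc')
```

With this approach a caller holding non-ASCII text has to pick an encoding explicitly before calling the helper, which is the trade-off being argued for here.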
The special treatment is only required for encoding/decoding X.509 data
in JSON strings for Python 3. Since it's a special case I changed the
patch. The additional two functions are now called decode_cert() and
encode_cert(). The functions are only used for X.509 PEM <-> DER in JSON.
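For readers without the patch at hand, the following is a rough sketch of what such a pair of helpers could look like on Python 3. Only the names encode_cert() and decode_cert() come from this thread; the signatures, the bodies, and the sample bytes are assumptions made for illustration and may differ from the actual patch.

```python
import base64
import json


def encode_cert(der_bytes):
    """Sketch (assumed signature): DER bytes -> base64 text for JSON."""
    # Base64 output is always ASCII, so decoding it to text is safe.
    return base64.b64encode(der_bytes).decode('ascii')


def decode_cert(b64_text):
    """Sketch (assumed signature): base64 text from JSON -> DER bytes."""
    if isinstance(b64_text, str):
        # Base64 text contains only ASCII characters by definition.
        b64_text = b64_text.encode('ascii')
    return base64.b64decode(b64_text)


# Placeholder bytes standing in for DER data; not a real certificate.
fake_der = b'\x30\x82\x01\x0a\x00\xff'

# JSON strings are text, so the binary data travels as base64 text.
payload = json.dumps({'cert': encode_cert(fake_der)})
round_trip = decode_cert(json.loads(payload)['cert'])
```

Keeping the text/bytes conversion inside two narrowly named functions confines the special case to the JSON boundary, rather than loosening a general-purpose b64encode().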
Christian
ACK.
--
Endi S. Dewata