On 2015-08-26 20:13, Endi Sukma Dewata wrote:
As discussed on IRC, in b64encode() there's a code that converts
Unicode
string data into ASCII:
if isinstance(data, six.text_type):
data = data.encode('ascii')
This conversion will not work if the string contains non-ASCII
characters, which limits the usage of this method.
It's not that Python 3's base64.b64encode() doesn't support ASCII text
as noted in the method description, but it cannot encode Unicode string
because Unicode doesn't have a binary representation unless it's encoded
first.
I think in this case the proper encoding for Unicode is UTF-8. So the
line should be changed to:
if isinstance(data, six.text_type):
data = data.encode('utf-8')
In b64decode(), the incoming data is a Unicode string containing the
base-64 encoding characters which are all ASCII, so data.encode('ascii')
will work, but to be more consistent it can also use data.encode('utf-8').
We discussed the ticket a couple of weeks ago on IRC. The function is
deliberately limited to ASCII only text in order to avoid encoding hell.
Python 3 tries to avoid encoding bugs by removing implicit encoding of
text and decoding of bytes.
The special treatment is only required for encoding/decoding X.509 data
in JSON strings for Python 3. Since it's a special case I changed the
patch. The additional two functions are now called decode_cert() and
encode_cert(). The functions are only used for X.509 PEM <-> DER in JSON.
Christian