Python 实用宝典

Question 1

I have some data that is base64 encoded that I want to convert back to binary even if there is a padding error in it. If I use

base64.decodestring(b64_string)

it raises an ‘Incorrect padding’ error. Is there another way?

UPDATE: Thanks for all the feedback. To be honest, all the methods mentioned sounded a bit hit and miss so I decided to try openssl. The following command worked a treat:

openssl enc -d -base64 -in b64string -out binary_data

Question 2

As said in other responses, there are various ways in which base64 data could be corrupted.

However, as Wikipedia says, removing the padding (the ‘=’ characters at the end of base64 encoded data) is “lossless”:

From a theoretical point of view, the padding character is not needed, since the number of missing bytes can be calculated from the number of Base64 digits.

So if this is really the only thing “wrong” with your base64 data, the padding can just be added back. I came up with this to be able to parse “data” URLs in WeasyPrint, some of which were base64 without padding:

import base64
import re

def decode_base64(data, altchars=b'+/'):
    """Decode base64, padding being optional.

    :param data: Base64 data as an ASCII byte string
    :returns: The decoded byte string.

    """
    data = re.sub(rb'[^a-zA-Z0-9%s]+' % altchars, b'', data)  # normalize
    missing_padding = len(data) % 4
    if missing_padding:
        data += b'='* (4 - missing_padding)
    return base64.b64decode(data, altchars)

Tests for this function: weasyprint/tests/test_css.py#L68

Question 3

Just add padding as required. Heed Michael’s warning, however.

b64_string += "=" * ((4 - len(b64_string) % 4) % 4) #ugh

Question 4

It seems you just need to add padding to your bytes before decoding. There are many other answers on this question, but I want to point out that (at least in Python 3.x) base64.b64decode will truncate any extra padding, provided there is enough in the first place.

So, something like: b'abc=' works just as well as b'abc==' (as does b'abc=====').

What this means is that you can just add the maximum number of padding characters that you would ever need—which is three (b'===')—and base64 will truncate any unnecessary ones.

This lets you write:

base64.b64decode(s + b'===')

which is simpler than:

base64.b64decode(s + b'=' * (-len(s) % 4))

Question 5

“Incorrect padding” can mean not only “missing padding” but also (believe it or not) “incorrect padding”.

If suggested “adding padding” methods don’t work, try removing some trailing bytes:

lens = len(strg)
lenx = lens - (lens % 4 if lens % 4 else 4)
try:
    result = base64.decodestring(strg[:lenx])
except etc

Update: Any fiddling around adding padding or removing possibly bad bytes from the end should be done AFTER removing any whitespace, otherwise length calculations will be upset.

It would be a good idea if you showed us a (short) sample of the data that you need to recover. Edit your question and copy/paste the result of print repr(sample).

Update 2: It is possible that the encoding has been done in an url-safe manner. If this is the case, you will be able to see minus and underscore characters in your data, and you should be able to decode it by using base64.b64decode(strg, '-_')

If you can’t see minus and underscore characters in your data, but can see plus and slash characters, then you have some other problem, and may need the add-padding or remove-cruft tricks.

If you can see none of minus, underscore, plus and slash in your data, then you need to determine the two alternate characters; they’ll be the ones that aren’t in [A-Za-z0-9]. Then you’ll need to experiment to see which order they need to be used in the 2nd arg of base64.b64decode()

Update 3: If your data is “company confidential”:
(a) you should say so up front
(b) we can explore other avenues in understanding the problem, which is highly likely to be related to what characters are used instead of + and / in the encoding alphabet, or by other formatting or extraneous characters.

One such avenue would be to examine what non-“standard” characters are in your data, e.g.

from collections import defaultdict
d = defaultdict(int)
import string
s = set(string.ascii_letters + string.digits)
for c in your_data:
   if c not in s:
      d[c] += 1
print d

Question 6

Use

string += '=' * (-len(string) % 4)  # restore stripped '='s

Credit goes to a comment somewhere here.

>>> import base64

>>> enc = base64.b64encode('1')

>>> enc
>>> 'MQ=='

>>> base64.b64decode(enc)
>>> '1'

>>> enc = enc.rstrip('=')

>>> enc
>>> 'MQ'

>>> base64.b64decode(enc)
...
TypeError: Incorrect padding

>>> base64.b64decode(enc + '=' * (-len(enc) % 4))
>>> '1'

>>>

Question 7

If there’s a padding error it probably means your string is corrupted; base64-encoded strings should have a multiple of four length. You can try adding the padding character (=) yourself to make the string a multiple of four, but it should already have that unless something is wrong

Question 8

Check the documentation of the data source you’re trying to decode. Is it possible that you meant to use base64.urlsafe_b64decode(s) instead of base64.b64decode(s)? That’s one reason you might have seen this error message.

Decode string s using a URL-safe alphabet, which substitutes – instead of + and _ instead of / in the standard Base64 alphabet.

This is for example the case for various Google APIs, like Google’s Identity Toolkit and Gmail payloads.

Question 9

Adding the padding is rather… fiddly. Here’s the function I wrote with the help of the comments in this thread as well as the wiki page for base64 (it’s surprisingly helpful) https://en.wikipedia.org/wiki/Base64#Padding.

import logging
import base64
def base64_decode(s):
    """Add missing padding to string and return the decoded base64 string."""
    log = logging.getLogger()
    s = str(s).strip()
    try:
        return base64.b64decode(s)
    except TypeError:
        padding = len(s) % 4
        if padding == 1:
            log.error("Invalid base64 string: {}".format(s))
            return ''
        elif padding == 2:
            s += b'=='
        elif padding == 3:
            s += b'='
        return base64.b64decode(s)

Question 10

You can simply use base64.urlsafe_b64decode(data) if you are trying to decode a web image. It will automatically take care of the padding.

Question 11

There are two ways to correct the input data described here, or, more specifically and in line with the OP, to make Python module base64’s b64decode method able to process the input data to something without raising an un-caught exception:

Append == to the end of the input data and call base64.b64decode(…)
If that raises an exception, then

i. Catch it via try/except,

ii. (R?)Strip any = characters from the input data (N.B. this may not be necessary),

iii. Append A== to the input data (A== through P== will work),

iv. Call base64.b64decode(…) with those A==-appended input data

The result from Item 1. or Item 2. above will yield the desired result.

Caveats

This does not guarantee the decoded result will be what was originally encoded, but it will (sometimes?) give the OP enough to work with:

Even with corruption I want to get back to the binary because I can still get some useful info from the ASN.1 stream”).

See What we know and Assumptions below.

TL;DR

From some quick tests of base64.b64decode(…)

it appears that it ignores non-[A-Za-z0-9+/] characters; that includes ignoring =s unless they are the last character(s) in a parsed group of four, in which case the =s terminate the decoding (a=b=c=d= gives the same result as abc=, and a==b==c== gives the same result as ab==).
It also appears that all characters appended are ignored after the point where base64.b64decode(…) terminates decoding e.g. from an = as the fourth in a group.

As noted in several comments above, there are either zero, or one, or two, =s of padding required at the end of input data for when the [number of parsed characters to that point modulo 4] value is 0, or 3, or 2, respectively. So, from items 3. and 4. above, appending two or more =s to the input data will correct any [Incorrect padding] problems in those cases.

HOWEVER, decoding cannot handle the case where the [total number of parsed characters modulo 4] is 1, because it takes a least two encoded characters to represent the first decoded byte in a group of three decoded bytes. In uncorrupted encoded input data, this [N modulo 4]=1 case never happens, but as the OP stated that characters may be missing, it could happen here. That is why simply appending =s will not always work, and why appending A== will work when appending == does not. N.B. Using [A] is all but arbitrary: it adds only cleared (zero) bits to the decoded, which may or not be correct, but then the object here is not correctness but completion by base64.b64decode(…) sans exceptions.

What we know from the OP and especially subsequent comments is

It is suspected that there are missing data (characters) in the Base64-encoded input data
The Base64 encoding uses the standard 64 place-values plus padding: A-Z; a-z; 0-9; +; /; = is padding. This is confirmed, or at least suggested, by the fact that openssl enc ... works.

Assumptions

The input data contain only 7-bit ASCII data
The only kind of corruption is missing encoded input data
The OP does not care about decoded output data at any point after that corresponding to any missing encoded input data

Github

Here is a wrapper to implement this solution:

https://github.com/drbitboy/missing_b64

Question 12

Incorrect padding error is caused because sometimes, metadata is also present in the encoded string If your string looks something like: ‘data:image/png;base64,…base 64 stuff….’ then you need to remove the first part before decoding it.

Say if you have image base64 encoded string, then try below snippet..

from PIL import Image
from io import BytesIO
from base64 import b64decode
imagestr = 'data:image/png;base64,...base 64 stuff....'
im = Image.open(BytesIO(b64decode(imagestr.split(',')[1])))
im.save("image.png")

Question 13

Simply add additional characters like “=” or any other and make it a multiple of 4 before you try decoding the target string value. Something like;

if len(value) % 4 != 0: #check if multiple of 4
    while len(value) % 4 != 0:
        value = value + "="
    req_str = base64.b64decode(value)
else:
    req_str = base64.b64decode(value)

Question 14

In case this error came from a web server: Try url encoding your post value. I was POSTing via “curl” and discovered I wasn’t url-encoding my base64 value so characters like “+” were not escaped so the web server url-decode logic automatically ran url-decode and converted + to spaces.

“+” is a valid base64 character and perhaps the only character which gets mangled by an unexpected url-decode.

Question 15

In my case I faced that error while parsing an email. I got the attachment as base64 string and extract it via re.search. Eventually there was a strange additional substring at the end.

dHJhaWxlcgo8PCAvU2l6ZSAxNSAvUm9vdCAxIDAgUiAvSW5mbyAyIDAgUgovSUQgWyhcMDAyXDMz
MHtPcFwyNTZbezU/VzheXDM0MXFcMzExKShcMDAyXDMzMHtPcFwyNTZbezU/VzheXDM0MXFcMzEx
KV0KPj4Kc3RhcnR4cmVmCjY3MDEKJSVFT0YK

--_=ic0008m4wtZ4TqBFd+sXC8--

When I deleted --_=ic0008m4wtZ4TqBFd+sXC8-- and strip the string then parsing was fixed up.

So my advise is make sure that you are decoding a correct base64 string.

Question 16

You should use

base64.b64decode(b64_string, ' /')

By default, the altchars are '+/'.

Question 17

I ran into this problem as well and nothing worked. I finally managed to find the solution which works for me. I had zipped content in base64 and this happened to 1 out of a million records…

This is a version of the solution suggested by Simon Sapin.

In case the padding is missing 3 then I remove the last 3 characters.

Instead of “0gA1RD5L/9AUGtH9MzAwAAA==”

We get “0gA1RD5L/9AUGtH9MzAwAA”

        missing_padding = len(data) % 4
        if missing_padding == 3:
            data = data[0:-3]
        elif missing_padding != 0:
            print ("Missing padding : " + str(missing_padding))
            data += '=' * (4 - missing_padding)
        data_decoded = base64.b64decode(data)

According to this answer Trailing As in base64 the reason is nulls. But I still have no idea why the encoder messes this up…

Question 18

I want to encode an image into a string using the base64 module. I’ve ran into a problem though. How do I specify the image I want to be encoded? I tried using the directory to the image, but that simply leads to the directory being encoded. I want the actual image file to be encoded.

EDIT

I tried this snippet:

with open("C:\Python26\seriph1.BMP", "rb") as f:
    data12 = f.read()
    UU = data12.encode("base64")
    UUU = base64.b64decode(UU)

    print UUU

    self.image = ImageTk.PhotoImage(Image.open(UUU))

but I get the following error:

Traceback (most recent call last):
  File "<string>", line 245, in run_nodebug
  File "C:\Python26\GUI1.2.9.py", line 473, in <module>
    app = simpleapp_tk(None)
  File "C:\Python26\GUI1.2.9.py", line 14, in __init__
    self.initialize()
  File "C:\Python26\GUI1.2.9.py", line 431, in initialize
    self.image = ImageTk.PhotoImage(Image.open(UUU))
  File "C:\Python26\lib\site-packages\PIL\Image.py", line 1952, in open
    fp = __builtin__.open(fp, "rb")
TypeError: file() argument 1 must be encoded string without NULL bytes, not str

What am I doing wrong?

Question 19

I’m not sure I understand your question. I assume you are doing something along the lines of:

import base64

with open("yourfile.ext", "rb") as image_file:
    encoded_string = base64.b64encode(image_file.read())

You have to open the file first of course, and read its contents – you cannot simply pass the path to the encode function.

Edit: Ok, here is an update after you have edited your original question.

First of all, remember to use raw strings (prefix the string with ‘r’) when using path delimiters on Windows, to prevent accidentally hitting an escape character. Second, PIL’s Image.open either accepts a filename, or a file-like (that is, the object has to provide read, seek and tell methods).

That being said, you can use cStringIO to create such an object from a memory buffer:

import cStringIO
import PIL.Image

# assume data contains your decoded image
file_like = cStringIO.StringIO(data)

img = PIL.Image.open(file_like)
img.show()

Question 20

With python 2.x, you can trivially encode using .encode:

with open("path/to/file.png", "rb") as f:
    data = f.read()
    print data.encode("base64")

Question 21

The first answer will print a string with prefix b’. That means your string will be like this b’your_string’ To solve this issue please add the following line of code.

encoded_string= base64.b64encode(img_file.read())
print(encoded_string.decode('utf-8'))

Reference: Image to Base64 – Python – originally written by me

Question 22

As I said in your previous question, there is no need to base64 encode the string, it will only make the program slower. Just use the repr

>>> with open("images/image.gif", "rb") as fin:
...  image_data=fin.read()
...
>>> with open("image.py","wb") as fout:
...  fout.write("image_data="+repr(image_data))
...

Now the image is stored as a variable called image_data in a file called image.py Start a fresh interpreter and import the image_data

>>> from image import image_data
>>>

Question 23

Borrowing from what Ivo van der Wijk and gnibbler have developed earlier, this is a dynamic solution

import cStringIO
import PIL.Image

image_data = None

def imagetopy(image, output_file):
    with open(image, 'rb') as fin:
        image_data = fin.read()

    with open(output_file, 'w') as fout:
        fout.write('image_data = '+ repr(image_data))

def pytoimage(pyfile):
    pymodule = __import__(pyfile)
    img = PIL.Image.open(cStringIO.StringIO(pymodule.image_data))
    img.show()

if __name__ == '__main__':
    imagetopy('spot.png', 'wishes.py')
    pytoimage('wishes')

You can then decide to compile the output image file with Cython to make it cool. With this method, you can bundle all your graphics into one module.

Question 24

import base64
from PIL import Image
from io import BytesIO

with open("image.jpg", "rb") as image_file:
    data = base64.b64encode(image_file.read())

im = Image.open(BytesIO(base64.b64decode(data)))
im.save('image1.png', 'PNG')

Python 实用宝典

标签归档：base64

Python：base64解码时忽略“错误填充”错误

问题：Python：base64解码时忽略“错误填充”错误

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

回答 13

回答 14

回答 15

使用base64编码图像文件

问题：使用base64编码图像文件