Question: Pickle incompatibility of numpy arrays between Python 2 and 3

I am trying to load the MNIST dataset linked here in Python 3.2 using this program:

import pickle
import gzip
import numpy


with gzip.open('mnist.pkl.gz', 'rb') as f:
    l = list(pickle.load(f))
    print(l)

Unfortunately, it gives me the error:

Traceback (most recent call last):
   File "mnist.py", line 7, in <module>
     train_set, valid_set, test_set = pickle.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x90 in position 614: ordinal not in range(128)

I then tried to decode the pickled file in Python 2.7, and re-encode it. So, I ran this program in Python 2.7:

import pickle
import gzip
import numpy


with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f)

    # Printing out the three objects reveals that they are
    # all pairs containing numpy arrays.

    with gzip.open('mnistx.pkl.gz', 'wb') as g:
        pickle.dump(
            (train_set, valid_set, test_set),
            g,
            protocol=2)  # I also tried protocol 0.

It ran without error, so I reran this program in Python 3.2:

import pickle
import gzip
import numpy

# note the filename change
with gzip.open('mnistx.pkl.gz', 'rb') as f:
    l = list(pickle.load(f))
    print(l)

However, it gave me the same error as before. How do I get this to work?


This is a better approach for loading the MNIST dataset.


Answer 0

This seems like some sort of incompatibility. It's trying to load a "binstring" object, which is assumed to be ASCII, while in this case it is binary data. Whether this is a bug in the Python 3 unpickler or a "misuse" of the pickler by numpy, I don't know.
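The failure can be reproduced without numpy or Python 2. In this minimal sketch, `PY2_PICKLE` is a hand-built protocol 2 pickle (an illustrative assumption, not bytes taken from `mnist.pkl.gz`) of the kind Python 2 writes for a `str` holding the bytes `\x90\x91`, using the `SHORT_BINSTRING` opcode. Python 3 decodes such strings with the default `encoding='ASCII'`, which rejects any byte above 0x7F, just like the traceback in the question.

```python
import pickle

# Hand-built protocol 2 pickle (assumed example): PROTO 2, SHORT_BINSTRING
# of length 2 holding b'\x90\x91', BINPUT into memo slot 0, STOP.
PY2_PICKLE = b'\x80\x02U\x02\x90\x91q\x00.'

def try_default_load(data):
    """Return the unpickled object, or the error message if decoding fails."""
    try:
        return pickle.loads(data)  # defaults: encoding='ASCII', errors='strict'
    except UnicodeDecodeError as exc:
        return 'UnicodeDecodeError: %s' % exc

print(try_default_load(PY2_PICKLE))
```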

Here is something of a workaround, but I don’t know how meaningful the data is at this point:

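import pickle
import gzip
import numpy  # not used directly, but numpy must be installed to rebuild the arrays

with gzip.open('mnist.pkl.gz', 'rb') as f:
    u = pickle._Unpickler(f)  # private pure-Python unpickler
    u.encoding = 'latin1'     # decode Python 2 8-bit strings as latin-1
    p = u.load()
    print(p)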
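`pickle._Unpickler` is a private, pure-Python class. On Python 3 the public `pickle.Unpickler` class accepts the same setting as its `encoding` keyword argument, so the private attribute is not needed. A sketch, where an in-memory `io.BytesIO` stream holding a hand-built Python 2 style pickle (an assumption for illustration) stands in for the real file:

```python
import io
import pickle

# Assumed stand-in for a Python 2 pickle file: a protocol 2 SHORT_BINSTRING
# holding the non-ASCII bytes b'\x90\x91'.
py2_stream = io.BytesIO(b'\x80\x02U\x02\x90\x91q\x00.')

# Public equivalent of setting u.encoding on the private _Unpickler.
u = pickle.Unpickler(py2_stream, encoding='latin1')
p = u.load()
print(repr(p))
```

With a real file, `pickle.Unpickler(f, encoding='latin1')` takes the opened (for example gzip) file object in place of `py2_stream`.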

Unpickling it in Python 2 and then repickling it is only going to create the same problem again, so you need to save it in another format.
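Re-pickling in Python 2 does not help because a Python 2 `str` is still written as an 8-bit string whatever the protocol. With protocol 0 it becomes the `STRING` opcode, which Python 3 decodes with the same `encoding` setting. A minimal sketch with a hand-written protocol 0 stream of the form Python 2 produces for the string `'\x90'` (an illustrative assumption, not the MNIST data):

```python
import pickle

# Protocol 0: STRING opcode with a quoted, escaped literal, then PUT 0 and STOP.
PROTO0_PY2 = b"S'\\x90'\np0\n."

def load_or_error(data, **kwargs):
    """Unpickle data with the given keyword arguments; return an error name on failure."""
    try:
        return pickle.loads(data, **kwargs)
    except UnicodeDecodeError:
        return 'UnicodeDecodeError'

print(load_or_error(PROTO0_PY2))                            # default ASCII decoding fails
print(repr(load_or_error(PROTO0_PY2, encoding='latin1')))   # decodes to a 1-character str
```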


Answer 1

If you are getting this error in Python 3, it could be an incompatibility between Python 2 and Python 3. For me, the solution was to load with latin1 encoding:

pickle.load(file, encoding='latin1')
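Why latin1 is a safe choice here: latin-1 assigns a character to every byte value from 0 to 255, so decoding arbitrary binary data with it never fails, and the original bytes can be recovered by encoding the result back with latin-1. A short stdlib-only sketch:

```python
# latin-1 maps each of the 256 byte values to exactly one character,
# so the decode/encode round trip is lossless for any binary data.
raw = bytes(range(256))
text = raw.decode('latin1')
restored = text.encode('latin1')
print(len(text), restored == raw)
```

ASCII, the default, covers only bytes 0 to 127, which is why byte `0x90` triggers the error.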

Answer 2

It appears to be an incompatibility issue between Python 2 and Python 3. I tried loading the MNIST dataset with

    train_set, valid_set, test_set = pickle.load(file, encoding='iso-8859-1')

and it worked for Python 3.5.2
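`'iso-8859-1'` is another name for the same codec as `'latin1'` used in the other answers, so both calls behave identically. This can be checked against Python's codec registry:

```python
import codecs

# Every spelling below resolves to the same codec in the registry.
for name in ('latin1', 'latin-1', 'iso-8859-1'):
    print(name, '->', codecs.lookup(name).name)
```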


Answer 3

It looks like there are some compatibility issues in pickle between 2.x and 3.x due to the move to Unicode. Your file appears to have been pickled with Python 2.x, and decoding it in 3.x could be troublesome.

I'd suggest unpickling it with Python 2.x and saving to a format that plays more nicely across the two versions you're using.
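One way to follow this suggestion, assuming numpy is installed on both interpreters, is numpy's own `.npz` archive format, which stores raw numeric arrays instead of pickled Python strings. The function names `save_splits`/`load_splits` and the array keys (`train_x`, `train_y`, ...) are hypothetical, and the sketch assumes each split is an (images, labels) pair of numeric arrays, as the question describes.

```python
import numpy as np

def save_splits(path, train_set, valid_set, test_set):
    """Store three (images, labels) pairs in one compressed .npz archive."""
    np.savez_compressed(
        path,
        train_x=train_set[0], train_y=train_set[1],
        valid_x=valid_set[0], valid_y=valid_set[1],
        test_x=test_set[0], test_y=test_set[1],
    )

def load_splits(path):
    """Read the archive back as ((x, y), (x, y), (x, y)) numpy arrays."""
    with np.load(path) as data:
        # tuple() consumes the generator while the archive is still open.
        return tuple(
            (data[name + '_x'], data[name + '_y'])
            for name in ('train', 'valid', 'test')
        )
```

Usage: run `save_splits('mnist.npz', train_set, valid_set, test_set)` once in Python 2.7 after unpickling, then `train_set, valid_set, test_set = load_splits('mnist.npz')` in Python 3.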


Answer 4

I just stumbled upon this snippet. Hope this helps to clarify the compatibility issue.

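import gzip
import pickle
import sys

with gzip.open('mnist.pkl.gz', 'rb') as f:
    if sys.version_info.major > 2:
        # Python 3: decode Python 2 8-bit strings as latin-1
        train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
    else:
        # Python 2: pickle.load has no encoding argument
        train_set, valid_set, test_set = pickle.load(f)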
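The same version check can be wrapped in a reusable helper. `load_py2_pickle` is a hypothetical name, and the sketch assumes a gzip-compressed pickle written by Python 2, like `mnist.pkl.gz`:

```python
import gzip
import pickle
import sys

def load_py2_pickle(path):
    """Load a gzip-compressed, Python 2 generated pickle on Python 2 or 3."""
    # Python 3 needs an encoding for Python 2 8-bit strings; Python 2's
    # pickle.load does not accept the keyword at all, so pass nothing there.
    kwargs = {'encoding': 'latin1'} if sys.version_info[0] > 2 else {}
    with gzip.open(path, 'rb') as f:
        return pickle.load(f, **kwargs)
```

Usage: `train_set, valid_set, test_set = load_py2_pickle('mnist.pkl.gz')`.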

Answer 5

Try:

# use one of these, depending on the data:
l = list(pickle.load(f, encoding='bytes'))   # if you are loading image data
l = list(pickle.load(f, encoding='latin1'))  # if you are loading text data

From the documentation of pickle.load method:

Optional keyword arguments are fix_imports, encoding and errors, which are used to control compatibility support for pickle stream generated by Python 2.

If fix_imports is True, pickle will try to map the old Python 2 names to the new names used in Python 3.

The encoding and errors tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The encoding can be ‘bytes’ to read these 8-bit string instances as bytes objects.
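The practical difference between the two settings: with `encoding='bytes'`, every Python 2 8-bit string comes back as `bytes`, including dictionary keys, while `'latin1'` returns `str`. A sketch using a hand-built protocol 2 pickle of the dict `{'a': 1}` as Python 2 would store it (illustrative bytes, not taken from a real file):

```python
import pickle

# PROTO 2, EMPTY_DICT, SHORT_BINSTRING 'a', BININT1 1, SETITEM, STOP
PY2_DICT = b'\x80\x02}U\x01aK\x01s.'

as_bytes = pickle.loads(PY2_DICT, encoding='bytes')
as_text = pickle.loads(PY2_DICT, encoding='latin1')
print(as_bytes, as_text)
```

So with `'bytes'`, code that looks up `d['a']` on the loaded data would need `d[b'a']` instead.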


Answer 6

There is hickle, which is faster and easier than pickle. I tried to save my data with pickle dump and read it back, but reading it caused a lot of problems; I wasted an hour and still didn't find a solution (though I was working on my own data, to create a chatbot).

vec_x and vec_y are numpy arrays:

import hickle as hkl

data = [vec_x, vec_y]
hkl.dump(data, 'new_data_file.hkl')

Then you just read it and perform the operations:

data2 = hkl.load('new_data_file.hkl')