UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte

Question 2

I am using Python 2.6 CGI scripts, but I found this error in the server log while calling json.dumps():

Traceback (most recent call last):
  File "/etc/mongodb/server/cgi-bin/getstats.py", line 135, in <module>
    print json.dumps(__getdata())
  File "/usr/lib/python2.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte

Here, the __getdata() function returns a dictionary {}.

Before posting this question, I referred to this question on SO.

UPDATES

The following line is hurting the JSON encoder:

now = datetime.datetime.now()
now = datetime.datetime.strftime(now, '%Y-%m-%dT%H:%M:%S.%fZ')
print json.dumps({'current_time': now})  # this is the culprit

I got a temporary fix for it:

print json.dumps({'old_time': now.encode('ISO-8859-1').strip()})

But I am not sure whether this is the correct way to do it.

Question 4

The error occurs because the dictionary contains a non-ASCII character that can't be encoded/decoded. One simple way to avoid this error is to encode such strings with the encode() function as follows (where a is the string containing the non-ASCII character):

a.encode('utf-8').strip()
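
In Python 3, where bytes and str are separate types, a related approach is to decode any byte values before handing the dictionary to json.dumps(). The following is only a minimal sketch: the decode_bytes helper, the sample data, and the choice of latin-1 as the source encoding are assumptions made for illustration and are not part of the original answer.

```python
import json


def decode_bytes(value, encoding="latin-1"):
    """Recursively convert bytes to str so json.dumps can serialise the result.

    `encoding` is an assumption; use whatever encoding the data really has.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    if isinstance(value, dict):
        return {decode_bytes(k, encoding): decode_bytes(v, encoding)
                for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [decode_bytes(item, encoding) for item in value]
    return value


# Hypothetical payload: 0xa5 is the yen sign in Latin-1.
raw = {"price": b"\xa5 100"}
clean = decode_bytes(raw)
encoded = json.dumps(clean)
```

Decoding at the boundary keeps the rest of the program working with text only, so the JSON encoder never sees raw bytes.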

Question 6

I fixed this simply by specifying a different codec in the read_csv() call:

encoding='unicode_escape'

For example:

import pandas as pd
data = pd.read_csv(filename, encoding='unicode_escape')

Question 8

Try the code snippet below, which opens the file in binary mode so that no decoding takes place:

with open(path, 'rb') as f:
    text = f.read()

Question 10

Your string has a non-ASCII character encoded in it.

Decoding with utf-8 may fail if other encodings were used to produce the data. For example:

>>> 'my weird character \x96'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0x96 in position 19: invalid start byte

In this case, the encoding is windows-1252, so you have to do:

>>> 'my weird character \x96'.decode('windows-1252')
u'my weird character \u2013'

Now that you have Unicode, you can safely encode into utf-8.
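
The same round trip can be sketched in Python 3, where the input has to be a bytes object. This is an illustrative example only; the variable names are made up, and windows-1252 is assumed to be the real source encoding, as in the answer above.

```python
# The same bytes as in the interactive session above, as a Python 3 bytes literal.
raw = b"my weird character \x96"

# Decoding as UTF-8 fails because 0x96 cannot start a UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Decoding with the actual source encoding yields text (0x96 is an en dash).
text = raw.decode("windows-1252")

# Once the data is text, it can be encoded to UTF-8 safely.
utf8_bytes = text.encode("utf-8")
```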

Question 12

When reading the CSV, I specified an encoding:

import pandas as pd
dataset = pd.read_csv('sample_data.csv', header=0,
                      encoding='unicode_escape')

Question 14

This solution worked for me:

import pandas as pd
data = pd.read_csv("training.csv", encoding='unicode_escape')

Question 16

Set the default encoding at the top of your code (this works in Python 2 only; sys.setdefaultencoding() does not exist in Python 3):

import sys
reload(sys)
sys.setdefaultencoding("ISO-8859-1")

Question 18

As of 2018-05, this is handled directly with decode(), at least in Python 3.

I'm using the snippet below for invalid start byte and invalid continuation byte errors. Adding errors='ignore' fixed it for me.

with open(out_file, 'rb') as f:
    for line in f:
        print(line.decode(errors='ignore'))

Question 20

Inspired by @aaronpenne and @Soumyaansh

with open("file.txt", "rb") as f:
    text = f.read().decode(errors='replace')
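
To show what the two error handlers mentioned in these answers do to the data, here is a small Python 3 sketch. The sample bytes are invented for illustration. Note that both handlers lose information: 'ignore' drops the offending bytes silently, while 'replace' substitutes U+FFFD so the damage stays visible.

```python
# Hypothetical input: 0xa5 and 0x96 are not valid UTF-8 start bytes.
raw = b"price: \xa5100 \x96 ok"

# 'ignore' silently drops every undecodable byte.
ignored = raw.decode("utf-8", errors="ignore")

# 'replace' substitutes U+FFFD (the replacement character) for each one.
replaced = raw.decode("utf-8", errors="replace")
```

If the real encoding is known, decoding with it is preferable to either handler, because no characters are lost.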

Question 22

Simple Solution:

import pandas as pd
df = pd.read_csv('file_name.csv', engine='python')

Question 24

The following line is hurting the JSON encoder:

now = datetime.datetime.now()
now = datetime.datetime.strftime(now, '%Y-%m-%dT%H:%M:%S.%fZ')
print json.dumps({'current_time': now})  # this is the culprit

I got a temporary fix for it:

print json.dumps({'old_time': now.encode('ISO-8859-1').strip()})

Marking this as correct as a temporary fix (I am not sure about it).

Question 26

If the above methods do not work for you, you may want to change the encoding of the CSV file itself.

Using Excel:

Open the CSV file in Excel
Go to the File menu and click Save As
Click Browse to select a location to save the file
Enter the intended file name
Select the CSV (Comma delimited) (*.csv) option
Click the Tools drop-down box and click Web Options
Under the Encoding tab, select Unicode (UTF-8) from the Save this document as drop-down list
Save the file

Using Notepad:

Open the CSV file in Notepad
Go to File > Save As
Select the location for the file
Set Save as type to All Files (*.*)
Specify the file name with the .csv extension
From the Encoding drop-down list, select UTF-8
Click Save to save the file

After doing this, you should be able to import the CSV file without encountering the UnicodeDecodeError.
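
The same conversion can also be scripted instead of going through Excel or Notepad. The sketch below is a Python 3 illustration under stated assumptions: the reencode_file helper and the file names are hypothetical, and cp1252 is only a guess at the source encoding, which has to be known (or determined) for the real file.

```python
import os
import tempfile


def reencode_file(src_path, dst_path, src_encoding="cp1252"):
    """Read src_path using src_encoding and write the same text as UTF-8.

    newline="" keeps the original line endings untouched.
    """
    with open(src_path, "r", encoding=src_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)


# Demonstration on a throwaway file containing a cp1252 yen sign (0xa5).
tmp_dir = tempfile.mkdtemp()
src = os.path.join(tmp_dir, "legacy.csv")
dst = os.path.join(tmp_dir, "utf8.csv")
with open(src, "wb") as f:
    f.write(b"item,price\ntea,\xa5100\n")

reencode_file(src, dst)

with open(dst, "rb") as f:
    converted = f.read()
```

Writing to a separate destination file keeps the original available in case the assumed source encoding turns out to be wrong.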

Question 28

You may use any standard encoding suited to your specific usage and input.

utf-8 is the default.

iso8859-1 is also popular for Western Europe.

E.g.: bytes_obj.decode('iso8859-1')

See: the Python docs on standard encodings.
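
When the encoding of the input is not known in advance, one option is to try a short list of candidates in order. This is a rough Python 3 sketch; the decode_with_fallback helper and the candidate list are assumptions for illustration. A successful decode does not prove that the encoding is the right one, only that the bytes are valid in it.

```python
def decode_with_fallback(data, encodings=("utf-8", "cp1252", "iso8859-1")):
    """Return (text, encoding_name) for the first candidate that decodes data.

    iso8859-1 maps every byte to a character, so with the default list it
    acts as a catch-all and the ValueError below is never reached.
    """
    for name in encodings:
        try:
            return data.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# 0xa5 is invalid as a UTF-8 start byte but is the yen sign in cp1252.
text, used = decode_with_fallback(b"\xa5 100")
```

Putting utf-8 first is deliberate: it is strict, so data that decodes as UTF-8 is very likely to really be UTF-8.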

Question 30

After trying all the aforementioned workarounds, if it still throws the same error, you can try exporting the file as CSV (a second time if you already have). Especially if you're using scikit-learn, it is best to import the dataset as a CSV file.

I spent hours on this, whereas the solution was this simple. Export the file as a CSV to the directory where Anaconda or your classifier tools are installed and try again.

Question 32

Instead of looking for ways to decode 0xa5 (yen sign ¥) or 0x96 (en dash –), tell MySQL that your client is encoded as "latin1", but you want "utf8" in the database.

See details in "Trouble with UTF-8 characters; what I see is not what I stored".

Question 34

In my case, I had to save the file as UTF-8 with BOM, not just as plain UTF-8; then this error was gone.
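
For reference, Python 3 exposes "UTF-8 with BOM" as the utf-8-sig codec. The short sketch below uses an invented sample payload to show how it differs from plain utf-8 when reading and writing.

```python
import codecs

# Hypothetical file content: a UTF-8 byte order mark followed by a CSV header.
payload = codecs.BOM_UTF8 + "name,price\n".encode("utf-8")

# Plain utf-8 keeps the BOM as a leading U+FEFF character in the text.
plain = payload.decode("utf-8")

# utf-8-sig strips the BOM on decode (and also accepts input without one).
stripped = payload.decode("utf-8-sig")

# On encode, utf-8-sig writes the BOM at the start of the output.
written = "name,price\n".encode("utf-8-sig")
```

The same codec name can be passed as the encoding argument to open() when a tool expects, or produces, a BOM.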
