"for line in…" results in UnicodeDecodeError: 'utf-8' codec can't decode byte

Question: "for line in…" results in UnicodeDecodeError: 'utf-8' codec can't decode byte

Here is my code:

for line in open('u.item'):
    pass  # read each line

Whenever I run this code, it gives the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: invalid continuation byte

I tried to solve this by adding an extra parameter to open(), so the code now looks like this:

for line in open('u.item', encoding='utf-8'):
    pass  # read each line

But it gives the same error again. What should I do? Please help.


Answer 0

As suggested by Mark Ransom, I found the right encoding for this file. The encoding was "ISO-8859-1", so replacing open("u.item", encoding="utf-8") with open('u.item', encoding="ISO-8859-1") solves the problem.
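
A minimal sketch of that fix, assuming the file really is ISO-8859-1 (Latin-1). The sample file written here is a made-up stand-in for u.item, and the with blocks are an addition so that the file gets closed:

```python
import os
import tempfile

# Hypothetical stand-in for u.item: one line containing the byte 0xe9.
path = os.path.join(tempfile.mkdtemp(), "u.item")
with open(path, "wb") as f:
    f.write(b"1|Caf\xe9 Society (1995)\n")

# ISO-8859-1 maps every possible byte to a character, so decoding never raises.
with open(path, encoding="ISO-8859-1") as f:
    lines = [line.rstrip("\n") for line in f]

print(lines)
```

Because ISO-8859-1 accepts any byte sequence, a read that succeeds does not prove the encoding is the right one. It is worth checking that accented characters come out as expected.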


Answer 1

This also worked for me. ISO 8859-1 is going to save a lot of trouble, especially when using speech recognition APIs.

Example:

file = open('../Resources/' + filename, 'r', encoding="ISO-8859-1")

Answer 2

Your file doesn't actually contain UTF-8 encoded data; it uses some other encoding. Figure out what that encoding is and pass it to the open call.

In Windows-1252, for example, the byte 0xe9 is the character é.
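
To illustrate, the byte from the error message can be decoded by hand. This is only a sketch: the byte string is made up, and the list of candidate encodings is an assumption about what is likely, not a detection method.

```python
raw = b"caf\xe9 au lait"  # hypothetical sample containing the byte 0xe9

# In UTF-8, 0xe9 starts a three-byte sequence, so it must be followed by
# continuation bytes (0x80-0xbf). The space that follows (0x20) is not one.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc.reason

# Single-byte encodings each map 0xe9 to some character, right or wrong.
decoded = {name: raw.decode(name) for name in ("cp1252", "iso-8859-1", "cp437")}

print(utf8_error)
for name, text in decoded.items():
    print(name, repr(text))
```

Comparing the candidates against text you expect to see in the file (a name, a title) is usually the quickest way to pick the right one.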


Answer 3

Try reading it with pandas (m_cols is your list of column names):

import pandas as pd
pd.read_csv('u.item', sep='|', names=m_cols, encoding='latin-1')

Answer 4

If you are using Python 2, the following is the solution:

import io
for line in io.open("u.item", encoding="ISO-8859-1"):
    pass  # do something with line

In Python 2 the built-in open() does not accept an encoding parameter, so passing one to it gives the following error:

TypeError: 'encoding' is an invalid keyword argument for this function
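
If the same script has to run under both Python 2 and Python 3, io.open is one option, because in Python 3 it is the same function as the built-in open. A sketch with a made-up sample file standing in for u.item:

```python
import io
import os
import tempfile

# Hypothetical stand-in for u.item, written as raw bytes.
path = os.path.join(tempfile.mkdtemp(), "u.item")
with io.open(path, "wb") as f:
    f.write(b"2|Les Mis\xe9rables (1995)\n")

# io.open accepts encoding= on Python 2 and Python 3 alike.
with io.open(path, encoding="ISO-8859-1") as f:
    titles = [line.strip().split(u"|")[1] for line in f]

print(titles)
```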

Answer 5

You could resolve the problem with:

for line in open(your_file_path, 'rb'):
    pass  # each line is a bytes object

'rb' opens the file in binary mode, so no decoding takes place. Hope this helps!
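
Binary mode sidesteps the error only because nothing is decoded: each line is a bytes object, not a str. A sketch of decoding by hand afterwards, where the sample data and the choice of latin-1 are assumptions:

```python
import os
import tempfile

# Hypothetical stand-in for the real file.
path = os.path.join(tempfile.mkdtemp(), "u.item")
with open(path, "wb") as f:
    f.write(b"3|Caf\xe9\n4|Plain\n")

with open(path, "rb") as f:
    raw_lines = list(f)  # bytes objects, line endings included

# Decode explicitly once the encoding is known (latin-1 is assumed here).
text_lines = [raw.decode("latin-1").rstrip("\n") for raw in raw_lines]

print(raw_lines)
print(text_lines)
```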


Answer 6

This works:

open('filename', encoding='latin-1')

or:

open('filename', encoding="ISO-8859-1")

Answer 7

In case someone is looking for this, here is an example of opening a CSV file for conversion in Python 3:

import csv
from sys import argv

try:
    inputReader = csv.reader(open(argv[1], encoding='ISO-8859-1'), delimiter=',', quotechar='"')
except IOError:
    pass  # note: this silently ignores a file that cannot be opened
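
A variant of the same idea that closes the file and collects the rows, offered as a sketch rather than the answerer's exact method. The file name input.csv and its contents are made up, and newline="" follows the csv module's recommendation for files handed to csv.reader:

```python
import csv
import os
import tempfile

# Hypothetical Latin-1 CSV file with a quoted field containing a comma.
path = os.path.join(tempfile.mkdtemp(), "input.csv")
with open(path, "wb") as f:
    f.write(b'id,title\r\n1,"Caf\xe9, The"\r\n')

# newline="" lets the csv module handle line endings itself.
with open(path, encoding="ISO-8859-1", newline="") as f:
    rows = list(csv.reader(f, delimiter=",", quotechar='"'))

print(rows)
```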

Answer 8

Sometimes you can run into the same error when filepath does not actually point to a file, so first make sure that the file you are trying to open exists:

import os
assert os.path.isfile(filepath)

Hope this helps.


Answer 9

You can also try this, which skips any bytes that cannot be decoded as UTF-8:

open('u.item', encoding='utf8', errors='ignore')
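
The trade-off is that errors='ignore' loses data without telling you. The sketch below compares error handlers on a made-up byte string using bytes.decode, which takes the same handler names as open():

```python
raw = b"Caf\xe9 Society"  # hypothetical Latin-1 bytes

ignored = raw.decode("utf-8", errors="ignore")    # drops the undecodable byte
replaced = raw.decode("utf-8", errors="replace")  # inserts U+FFFD in its place
correct = raw.decode("latin-1")                   # keeps the original character

print(repr(ignored))
print(repr(replaced))
print(repr(correct))
```

If the accented characters matter, decoding with the file's actual encoding is the safer choice; 'replace' at least leaves a visible marker where something was lost.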