Question: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

https://github.com/affinelayer/pix2pix-tensorflow/tree/master/tools

An error occurred when running process.py from the repository above.

python tools/process.py --input_dir data --operation resize --output_dir data2/resize
data/0.jpg -> data2/resize/0.png

Traceback (most recent call last):
  File "tools/process.py", line 235, in <module>
    main()
  File "tools/process.py", line 167, in main
    src = load(src_path)
  File "tools/process.py", line 113, in load
    contents = open(path).read()
  File "/home/user/anaconda3/envs/tensorflow_2/lib/python3.5/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

What causes this error? I am using Python 3.5.2.


Answer 0

Python tries to convert a byte array (a bytes object that it assumes to be a UTF-8-encoded string) to a Unicode string (str). This conversion is a decoding according to UTF-8 rules. While decoding, it encounters a byte sequence that is not allowed in UTF-8-encoded strings (namely the 0xff at position 0).

Since you did not provide any code we could look at, we can only guess at the rest.

From the stack trace we can assume that the triggering action was reading from a file (contents = open(path).read()). I suggest rewriting it like this:

with open(path, 'rb') as f:
  contents = f.read()

The b in the mode specifier of open() states that the file is to be treated as binary, so contents will remain a bytes object. No decoding attempt happens this way.
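To see why the byte at position 0 trips the decoder, the sketch below writes a few bytes that start with 0xff (as JPEG data does) to a temporary file and reads them back both ways. The file name and the sample bytes are made up for illustration; they are not taken from process.py.

```python
import os
import tempfile

# Hypothetical sample: JPEG data starts with the bytes FF D8, which are not valid UTF-8.
sample = b"\xff\xd8\xff\xe0" + b"\x00" * 4

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "0.jpg")
    with open(path, "wb") as f:
        f.write(sample)

    # Text mode with UTF-8 decoding reproduces the error from the question.
    text_mode_error = None
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
    except UnicodeDecodeError as exc:
        text_mode_error = exc

    # Binary mode returns the raw bytes and never tries to decode them.
    with open(path, "rb") as f:
        contents = f.read()

print(type(contents).__name__, contents[:2])
print(text_mode_error)
```

The failure happens while decoding, before any image handling runs, so reading in binary mode removes the error at its source.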


Answer 1

This solution strips out (ignores) the undecodable bytes and returns the string without them. Use it only if you need to strip them, not convert them.

with open(path, encoding="utf8", errors='ignore') as f:
    contents = f.read()

With errors='ignore' you will just lose some characters. In my case I did not care about them: they were extra characters caused by badly formatted data from clients connecting to my socket server, so this was an easy, direct solution.
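A small sketch of what is actually lost may help when weighing this option. The byte string below is invented for illustration; errors='replace' is shown alongside as a variant that leaves a visible marker instead of dropping the byte silently.

```python
# Hypothetical input: one stray 0xff byte followed by valid UTF-8 text.
raw = b"\xffhello \xe4\xb8\x96\xe7\x95\x8c"

# errors="ignore" silently drops the byte that cannot be decoded.
ignored = raw.decode("utf-8", errors="ignore")

# errors="replace" substitutes U+FFFD so the damage stays visible.
replaced = raw.decode("utf-8", errors="replace")

print(repr(ignored))
print(repr(replaced))
```

Note that for a file that is not text at all (an image, for instance), either option produces a meaningless string rather than usable data.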


Answer 2

I had a similar issue and ended up decoding as UTF-16. My code is below.

with open(path_to_file, 'rb') as f:
    contents = f.read()
# Decode first: calling rstrip("\n") on bytes raises TypeError in Python 3.
contents = contents.decode("utf-16").rstrip("\r\n")
contents = contents.split("\r\n")

This reads the file contents as raw bytes, decodes them as UTF-16, and then splits the result into lines.


Answer 3

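Use the ISO-8859-1 encoding to solve the issue. Every byte value maps to a character in ISO-8859-1, so decoding never fails, but the resulting text is only meaningful if the file really uses that encoding.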
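The sketch below illustrates both sides of this choice: ISO-8859-1 decodes any byte sequence without raising, yet UTF-8 data read this way comes out garbled. The sample values are made up for illustration.

```python
# Every possible byte value decodes under ISO-8859-1, so no error is ever raised.
all_bytes = bytes(range(256))
text = all_bytes.decode("iso-8859-1")
round_trip = text.encode("iso-8859-1")

# The catch: UTF-8 data decoded as ISO-8859-1 succeeds but yields the wrong characters.
misread = "é".encode("utf-8").decode("iso-8859-1")

print(len(text), round_trip == all_bytes)
print(misread)
```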


Answer 4

I came across this thread while suffering from the same error. After doing some research I can confirm that this error happens when you try to decode a UTF-16 file as UTF-8.

With UTF-16, the first character (2 bytes in UTF-16) is a byte order mark (BOM), which serves as a decoding hint and does not appear as a character in the decoded string. This means the first byte will be either FE or FF, and the second byte will be the other.
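The BOM check described here can be sketched with the constants from the standard codecs module. The helper name guess_utf16_from_bom is hypothetical, and the check is only a heuristic: other data can also begin with 0xff (JPEG files start with FF D8, and the UTF-32 little-endian BOM begins with FF FE as well).

```python
import codecs

def guess_utf16_from_bom(data):
    """Hypothetical helper: return 'utf-16' if data starts with a UTF-16 BOM, else None."""
    if data.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    return None

utf16_sample = b"\xff\xfeh\x00i\x00"  # BOM FF FE, then "hi" in little-endian UTF-16
jpeg_sample = b"\xff\xd8\xff\xe0"      # starts with 0xff but carries no BOM

encoding = guess_utf16_from_bom(utf16_sample)
decoded = utf16_sample.decode(encoding)  # the "utf-16" codec consumes the BOM

print(encoding, repr(decoded))
print(guess_utf16_from_bom(jpeg_sample))
```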

Heavily edited after I found out the real answer


Answer 5

Use only

base64.b64decode(a) 

instead of

base64.b64decode(a).decode('utf-8')
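As a minimal sketch of why the extra .decode('utf-8') fails: base64 can carry arbitrary binary data, and b64decode already returns bytes. The payload below is invented for illustration.

```python
import base64

# Hypothetical payload: base64 text that encodes binary (non-text) data.
a = base64.b64encode(b"\xff\xd8\xff\xe0")

payload = base64.b64decode(a)  # already a bytes object; use it as-is

# Decoding the payload as UTF-8 only works if it happens to be UTF-8 text.
decode_failed = False
try:
    payload.decode("utf-8")
except UnicodeDecodeError:
    decode_failed = True

print(payload, decode_failed)
```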

Answer 6

If you are on a Mac, check for a hidden file named .DS_Store. After removing the file, my program worked.
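Instead of deleting the file by hand, one option is to filter the directory listing before reading anything. The sketch below assumes the inputs are images with the listed extensions; the helper name and the extension set are hypothetical.

```python
import os

# Assumed set of extensions to accept; adjust to the data actually in use.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def image_files(names):
    """Hypothetical helper: keep visible files with a known image extension."""
    return [
        name
        for name in names
        if not name.startswith(".")
        and os.path.splitext(name)[1].lower() in IMAGE_EXTENSIONS
    ]

listing = [".DS_Store", "0.jpg", "notes.txt", "1.PNG"]
print(image_files(listing))
```

In real use the listing would come from os.listdir() on the input directory.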


Answer 7

Check the path of the file being read. My code kept raising this error until I changed the path to the present working directory. The error was:

newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

Answer 8

If you are receiving data from a serial port, make sure you are using the right baud rate (and other settings). Decoding as UTF-8 with a wrong configuration produces the same error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xff in position 0: invalid start byte

To check your serial port configuration on Linux, use: stty -F /dev/ttyUSBX -a


Answer 9

It simply means that the wrong encoding was chosen to read the file.

On Mac, use file -I file.txt to find the correct encoding. On Linux, use file -i file.txt.


Answer 10

I had the same issue when processing a file generated on Linux. It turned out to be related to files containing question marks.


Answer 11

I had a similar problem.

Solved it by:

import io

with io.open(filename, 'r', encoding='utf-8') as fn:
  lines = fn.readlines()

However, I had another problem. Some HTML files (in my case) were not UTF-8, so I received a similar error. When I excluded those HTML files, everything worked smoothly.

So, apart from fixing the code, also check the files you are reading; there may indeed be an incompatibility there.
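Excluding the incompatible files can also be done in code rather than by hand. The sketch below is one possible approach, not the answerer's actual code: the helper read_utf8_or_none and the two sample files are hypothetical.

```python
import os
import tempfile

def read_utf8_or_none(path):
    """Hypothetical helper: return the file's text, or None if it is not valid UTF-8."""
    try:
        with open(path, encoding="utf-8") as f:
            return f.read()
    except UnicodeDecodeError:
        return None

with tempfile.TemporaryDirectory() as tmp:
    # One file in UTF-8 and one in ISO-8859-1 (its 0xe9 byte is invalid UTF-8 here).
    with open(os.path.join(tmp, "good.html"), "wb") as f:
        f.write("<p>ok</p>".encode("utf-8"))
    with open(os.path.join(tmp, "bad.html"), "wb") as f:
        f.write("<p>é</p>".encode("iso-8859-1"))

    results = {
        name: read_utf8_or_none(os.path.join(tmp, name))
        for name in ("good.html", "bad.html")
    }

print(results)
```

Files that come back as None can then be logged and inspected separately instead of aborting the whole run.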


Answer 12

If possible, open the file in a text editor and try changing the encoding to UTF-8. Otherwise, do it programmatically at the OS level.
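The programmatic route can also be taken in Python itself. The following is a minimal sketch, assuming the source encoding is already known; the function name convert_to_utf8 and the sample file are hypothetical.

```python
import os
import tempfile

def convert_to_utf8(src_path, dst_path, source_encoding):
    """Hypothetical helper: re-encode a text file from source_encoding to UTF-8."""
    # newline="" keeps line endings exactly as they are in the source file.
    with open(src_path, encoding=source_encoding, newline="") as src:
        text = src.read()
    with open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(text)

with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "input.txt")
    dst_path = os.path.join(tmp, "output.txt")
    with open(src_path, "wb") as f:
        f.write("héllo\r\n".encode("utf-16"))

    convert_to_utf8(src_path, dst_path, "utf-16")

    with open(dst_path, "rb") as f:
        converted = f.read()

print(converted)
```

This only applies to files that are text in some other encoding; binary files such as images should be left alone and read in binary mode.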


Answer 13

I had a similar problem. I tried to run an example in tensorflow/models/objective_detection and got the same message. Try changing from Python 3 to Python 2.

