Tag archive: byte

What does the b prefix before a Python string mean?

Question: What does the b prefix before a Python string mean?

In some Python source code I stumbled upon, I saw a small b before a string, as in:

b"abcdef"

I know about the u prefix signifying a unicode string, and the r prefix for a raw string literal.

What does the b stand for, and in which kind of source code is it useful? It seems to be exactly like a plain string without any prefix.


Answer 0

This is a Python 3 bytes literal. This prefix is absent in Python 2.5 and older (it is equivalent to a plain string of 2.x, while a plain string of 3.x is equivalent to a literal with the u prefix in 2.x). In Python 2.6+ it is equivalent to a plain string, for compatibility with 3.x.


Answer 1

The b prefix signifies a bytes string literal.

If you see it used in Python 3 source code, the expression creates a bytes object, not a regular Unicode str object. If you see it echoed in your Python shell or as part of a list, dict or other container contents, then you see a bytes object represented using this notation.

bytes objects basically contain a sequence of integers in the range 0-255, but when represented, Python displays these bytes as ASCII codepoints to make it easier to read their contents. Any bytes outside the printable range of ASCII characters are shown as escape sequences (e.g. \n, \x82, etc.). Inversely, you can use both ASCII characters and escape sequences to define byte values; for ASCII values their numeric value is used (e.g. b'A' == b'\x41').

Because a bytes object consists of a sequence of integers, you can construct a bytes object from any other sequence of integers with values in the 0-255 range, like a list:

bytes([72, 101, 108, 108, 111])

and indexing gives you back the integers (but slicing produces a new bytes value; for the above example, value[0] gives you 72, but value[:1] is b'H' as 72 is the ASCII code point for the capital letter H).
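
The indexing and slicing behaviour described above can be checked with a short sketch. The variable names are only illustrative:

```python
# Build a bytes object from a list of integers in the 0-255 range
value = bytes([72, 101, 108, 108, 111])

first_item = value[0]     # indexing yields an int
first_slice = value[:1]   # slicing yields a new bytes object

print(value)        # b'Hello'
print(first_item)   # 72
print(first_slice)  # b'H'

# An ASCII character and its escape sequence define the same byte value
same_byte = (b'A' == b'\x41')
print(same_byte)    # True
```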

bytes model binary data, including encoded text. If your bytes value does contain text, you need to first decode it, using the correct codec. If the data is encoded as UTF-8, for example, you can obtain a Unicode str value with:

strvalue = bytesvalue.decode('utf-8')

Conversely, to go from text in a str object to bytes you need to encode. You need to decide on an encoding to use; the default is to use UTF-8, but what you will need is highly dependent on your use case:

bytesvalue = strvalue.encode('utf-8')

You can also use the constructor, bytes(strvalue, encoding) to do the same.

Both the decoding and encoding methods take an extra argument to specify how errors should be handled.
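
As a rough illustration of that extra argument, the sketch below uses the standard errors parameter with the 'strict', 'replace' and 'ignore' handlers. The sample data is made up for the example:

```python
raw = b'caf\xe9'  # 'café' encoded as Latin-1; this is not valid UTF-8

# errors='strict' is the default and raises UnicodeDecodeError
try:
    raw.decode('utf-8')
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

replaced = raw.decode('utf-8', errors='replace')  # invalid byte becomes U+FFFD
ignored = raw.decode('utf-8', errors='ignore')    # invalid byte is dropped

# Encoding accepts the same argument; the unencodable character becomes b'?'
ascii_bytes = 'café'.encode('ascii', errors='replace')

print(strict_failed, repr(replaced), repr(ignored), ascii_bytes)
```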

Python 2 versions 2.6 and 2.7 also support the b'..' string literal syntax, to ease writing code that works on both Python 2 and 3.

bytes objects are immutable, just like str strings are. Use a bytearray() object if you need to have a mutable bytes value.
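
A minimal sketch of the difference between the immutable bytes type and the mutable bytearray type (the names are illustrative only):

```python
data = b'hello'

# Item assignment on bytes raises TypeError because bytes is immutable
try:
    data[0] = 72
    mutated = True
except TypeError:
    mutated = False

buf = bytearray(data)   # mutable copy of the same byte values
buf[0] = 72             # 72 is the ASCII code point for 'H'
buf.extend(b' world')   # bytearray can also grow in place
result = bytes(buf)     # convert back to an immutable bytes object

print(mutated, result)
```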


What is the difference between a string and a byte string?

Question: What is the difference between a string and a byte string?

I am working with a library which returns a byte string and I need to convert this to a string.

Although I’m not sure what the difference is – if any.


Answer 0

Assuming Python 3 (in Python 2, this difference is a little less well-defined): a string is a sequence of characters, i.e. Unicode code points; these are an abstract concept, and can’t be directly stored on disk. A byte string is a sequence of, unsurprisingly, bytes: things that can be stored on disk. The mapping between them is an encoding. There are quite a lot of these (and infinitely many are possible), and you need to know which applies in the particular case in order to do the conversion, since a different encoding may map the same bytes to a different string:

>>> b'\xcf\x84o\xcf\x81\xce\xbdo\xcf\x82'.decode('utf-16')
'蓏콯캁澽苏'
>>> b'\xcf\x84o\xcf\x81\xce\xbdo\xcf\x82'.decode('utf-8')
'τoρνoς'

Once you know which one to use, you can use the .decode() method of the byte string to get the right character string from it as above. For completeness, the .encode() method of a character string goes the opposite way:

>>> 'τoρνoς'.encode('utf-8')
b'\xcf\x84o\xcf\x81\xce\xbdo\xcf\x82'

Answer 1

The only thing that a computer can store is bytes.

To store anything in a computer, you must first encode it, i.e. convert it to bytes. For example:

  • If you want to store music, you must first encode it using MP3, WAV, etc.
  • If you want to store a picture, you must first encode it using PNG, JPEG, etc.
  • If you want to store text, you must first encode it using ASCII, UTF-8, etc.

MP3, WAV, PNG, JPEG, ASCII and UTF-8 are examples of encodings. An encoding is a format to represent audio, images, text, etc in bytes.

In Python, a byte string is just that: a sequence of bytes. It isn’t human-readable. Under the hood, everything must be converted to a byte string before it can be stored in a computer.

On the other hand, a character string, often just called a “string”, is a sequence of characters. It is human-readable. A character string can’t be directly stored in a computer, it has to be encoded first (converted into a byte string). There are multiple encodings through which a character string can be converted into a byte string, such as ASCII and UTF-8.

'I am a string'.encode('ASCII')

The above Python code will encode the string 'I am a string' using the encoding ASCII. The result of the above code will be a byte string. If you print it, Python will represent it as b'I am a string'. Remember, however, that byte strings aren’t human-readable, it’s just that Python decodes them from ASCII when you print them. In Python, a byte string is represented by a b, followed by the byte string’s ASCII representation.

A byte string can be decoded back into a character string, if you know the encoding that was used to encode it.

b'I am a string'.decode('ASCII')

The above code will return the original string 'I am a string'.

Encoding and decoding are inverse operations. Everything must be encoded before it can be written to disk, and it must be decoded before it can be read by a human.


Answer 2

Note: I will focus my answer on Python 3, since Python 2 is very close to its end of life.

In Python 3

bytes consists of sequences of 8-bit unsigned values, while str consists of sequences of Unicode code points that represent textual characters from human languages.

>>> # bytes
>>> b = b'h\x65llo'
>>> type(b)
<class 'bytes'>
>>> list(b)
[104, 101, 108, 108, 111]
>>> print(b)
b'hello'
>>>
>>> # str
>>> s = 'nai\u0308ve'
>>> type(s)
<class 'str'>
>>> list(s)
['n', 'a', 'i', '̈', 'v', 'e']
>>> print(s)
naïve

Even though bytes and str seem to work the same way, their instances are not compatible with each other, i.e., bytes and str instances can’t be used together with operators like > and +. In addition, keep in mind that comparing bytes and str instances for equality, i.e. using ==, will always evaluate to False, even when they contain exactly the same characters.

>>> # concatenation
>>> b'hi' + b'bye' # this is possible
b'hibye'
>>> 'hi' + 'bye' # this is also possible
'hibye'
>>> b'hi' + 'bye' # this will fail
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat str to bytes
>>> 'hi' + b'bye' # this will also fail
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can only concatenate str (not "bytes") to str
>>>
>>> # comparison
>>> b'red' > b'blue' # this is possible
True
>>> 'red'> 'blue' # this is also possible
True
>>> b'red' > 'blue' # you can't compare bytes with str
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'bytes' and 'str'
>>> 'red' > b'blue' # you can't compare str with bytes
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'str' and 'bytes'
>>> b'blue' == 'red' # equality between str and bytes always evaluates to False
False
>>> b'blue' == 'blue' # equality between str and bytes always evaluates to False
False

Another issue with bytes and str arises when working with file objects returned by the open built-in function. On one hand, if you want to read or write binary data from/to a file, always open the file in a binary mode like ‘rb’ or ‘wb’. On the other hand, if you want to read or write Unicode data from/to a file, be aware of the default encoding of your computer, and pass the encoding parameter if necessary to avoid surprises.
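
A small sketch of the two kinds of file mode, assuming a throwaway file in a temporary directory (the file name and sample text are made up for the example):

```python
import os
import tempfile

text = 'naïve τoρνoς'

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'example.txt')  # hypothetical file name

    # Text mode: pass encoding explicitly instead of relying on the platform default
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)

    # Binary mode: read() returns bytes, exactly as stored on disk
    with open(path, 'rb') as f:
        raw = f.read()

    # Text mode: read() returns str, decoded with the given encoding
    with open(path, 'r', encoding='utf-8') as f:
        decoded = f.read()

print(type(raw).__name__, type(decoded).__name__)
```

Passing encoding explicitly keeps the result the same across machines whose default encodings differ.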

In Python 2

str consists of sequences of 8-bit values, while unicode consists of sequences of Unicode characters. One thing to keep in mind is that str and unicode can be used together with operators if str only consists of 7-bit ASCII characters.

It might be useful to use helper functions to convert between str and unicode in Python 2, and between bytes and str in Python 3.
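
For Python 3, such helpers might look like the sketch below. The names to_str and to_bytes and the UTF-8 default are assumptions made for illustration, not part of any library:

```python
def to_str(value, encoding='utf-8'):
    """Return value as str, decoding it first if it is bytes."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def to_bytes(value, encoding='utf-8'):
    """Return value as bytes, encoding it first if it is str."""
    if isinstance(value, str):
        return value.encode(encoding)
    return value


print(to_str(b'hi'), to_str('hi'))
print(to_bytes('hi'), to_bytes(b'hi'))
```

Calling these at the boundaries of a program (file, socket, library call) lets the rest of the code work with a single type.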


Answer 3

From What is Unicode:

Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one.

……

Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language.

So when a computer represents a string, it looks up each character by its unique Unicode number and stores those numbers in memory. But you can’t directly write the string to disk or transmit it over a network using those Unicode numbers, because they are just plain numbers. You have to encode the string into a byte string first, for example with UTF-8. UTF-8 is a character encoding capable of encoding all possible characters, and it stores characters as bytes. The encoded string can be used almost anywhere, because UTF-8 is supported nearly everywhere. When you open a UTF-8 encoded text file from another system, your computer decodes it and displays its characters via their unique Unicode numbers. When a browser receives UTF-8 encoded string data from the network, it decodes the data into a string (assuming the browser uses UTF-8) and displays the string.

In Python 3, you can convert between strings and byte strings:

>>> print('中文'.encode('utf-8'))
b'\xe4\xb8\xad\xe6\x96\x87'
>>> print(b'\xe4\xb8\xad\xe6\x96\x87'.decode('utf-8'))
中文 

In short, a string is for displaying text to humans, and a byte string is for storing data on disk and transmitting it.


Answer 4

Unicode is an agreed-upon format for the binary representation of characters and various kinds of formatting (e.g. lower case/upper case, new line, carriage return), and other “things” (e.g. emojis). A computer is no less capable of storing a unicode representation (a series of bits), whether in memory or in a file, than it is of storing an ascii representation (a different series of bits), or any other representation (series of bits).

For communication to take place, the parties to the communication must agree on what representation will be used.

Because unicode seeks to represent all the possible characters (and other “things”) used in inter-human and inter-computer communication, it requires a greater number of bits for the representation of many characters (or things) than other systems of representation that seek to represent a more limited set of characters/things. To “simplify,” and perhaps to accommodate historical usage, unicode representation is almost exclusively converted to some other system of representation (e.g. ascii) for the purpose of storing characters in files.

It is not the case that unicode cannot be used for storing characters in files, or transmitting them through any communications channel, simply that it is not.

The term “string” is not precisely defined. “String,” in its common usage, refers to a set of characters/things. In a computer, those characters may be stored in any one of many different bit-by-bit representations. A “byte string” is a set of characters stored using a representation that uses eight bits (eight bits being referred to as a byte). Since, these days, computers use the unicode system (characters represented by a variable number of bytes) to store characters in memory, and byte strings (characters represented by single bytes) to store characters to files, a conversion must be applied before characters represented in memory are moved into storage in files.


Answer 5

Let’s have a simple one-character string 'š' and encode it into a sequence of bytes:

>>> 'š'.encode('utf-8')
b'\xc5\xa1'

For the purpose of this example let’s display the sequence of bytes in its binary form:

>>> bin(int(b'\xc5\xa1'.hex(), 16))
'0b1100010110100001'

Now it is generally not possible to decode the information back without knowing how it was encoded. Only if you know that the UTF-8 text encoding was used can you follow the algorithm for decoding UTF-8 and recover the original string:

11000101 10100001
   ^^^^^   ^^^^^^
   00101   100001

You can display the binary number 101100001 back as a string:

>>> chr(int('101100001', 2))
'š'
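
The manual decoding steps above can be sketched with bit operations. This handles only the two-byte UTF-8 form (110xxxxx 10xxxxxx) used in this example; it is not a general decoder:

```python
encoded = 'š'.encode('utf-8')   # b'\xc5\xa1'
first, second = encoded         # iterating over bytes yields ints: 0xC5, 0xA1

# Check the marker bits of a two-byte sequence: 110xxxxx 10xxxxxx
assert first >> 5 == 0b110
assert second >> 6 == 0b10

payload_hi = first & 0b00011111    # low 5 bits of the leading byte
payload_lo = second & 0b00111111   # low 6 bits of the continuation byte
code_point = (payload_hi << 6) | payload_lo

print(bin(code_point))   # 0b101100001
print(chr(code_point))   # š
```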

Answer 6

The Python language includes str and bytes as standard “Built-in Types”. In other words, they are both classes. I don’t think it’s worthwhile trying to rationalize why Python has been implemented this way.

Having said that, str and bytes are very similar to one another. Both share most of the same methods. The following methods are unique to the str class:

casefold
encode
format
format_map
isdecimal
isidentifier
isnumeric
isprintable

The following methods are unique to the bytes class:

decode
fromhex
hex
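
Lists like these can be reproduced for whatever interpreter is in use by comparing the public attribute names of the two classes. The exact output depends on the Python version, so the sketch below does not assume it matches the lists above:

```python
def public_methods(cls):
    """Return the set of attribute names on cls that do not start with an underscore."""
    return {name for name in dir(cls) if not name.startswith('_')}


str_only = sorted(public_methods(str) - public_methods(bytes))
bytes_only = sorted(public_methods(bytes) - public_methods(str))

print(str_only)
print(bytes_only)
```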

TypeError: a bytes-like object is required, not 'str' when writing to a file in Python3

Question: TypeError: a bytes-like object is required, not 'str' when writing to a file in Python3

I’ve very recently migrated to Py 3.5. This code was working properly in Python 2.7:

with open(fname, 'rb') as f:
    lines = [x.strip() for x in f.readlines()]

for line in lines:
    tmp = line.strip().lower()
    if 'some-pattern' in tmp: continue
    # ... code

After upgrading to 3.5, I’m getting the:

TypeError: a bytes-like object is required, not 'str'

error on the last line (the pattern search code).

I’ve tried using the .decode() function on either side of the statement, also tried:

if tmp.find('some-pattern') != -1: continue

– to no avail.

I was able to resolve almost all 2:3 issues quickly, but this little statement is bugging me.


Answer 0

You opened the file in binary mode:

with open(fname, 'rb') as f:

This means that all data read from the file is returned as bytes objects, not str. You cannot then use a string in a containment test:

if 'some-pattern' in tmp: continue

You’d have to use a bytes object to test against tmp instead:

if b'some-pattern' in tmp: continue

or open the file as a textfile instead by replacing the 'rb' mode with 'r'.
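
Both fixes can be tried side by side in a short sketch. The temporary file, its contents and the UTF-8 encoding are assumptions made for the example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    fname = os.path.join(tmpdir, 'sample.txt')  # hypothetical file name
    with open(fname, 'w', encoding='utf-8') as f:
        f.write('First line\nHas SOME-PATTERN here\n')

    # Option 1: keep binary mode and test against a bytes pattern
    with open(fname, 'rb') as f:
        binary_hits = [line for line in f if b'some-pattern' in line.strip().lower()]

    # Option 2: open in text mode and test against a str pattern
    with open(fname, 'r', encoding='utf-8') as f:
        text_hits = [line for line in f if 'some-pattern' in line.strip().lower()]

# Mixing the two types in a containment test raises TypeError
try:
    'some-pattern' in b'has some-pattern here'
    mixed_ok = True
except TypeError:
    mixed_ok = False

print(len(binary_hits), len(text_hits), mixed_ok)
```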


Answer 1

You can encode your string by using .encode()

Example:

'Hello World'.encode()

Answer 2

As has already been mentioned, you are reading the file in binary mode and then creating a list of bytes. In the for loop that follows, you are comparing a string to bytes, and that is where the code fails.

Decoding the bytes while adding to the list should work. The changed code should look as follows:

with open(fname, 'rb') as f:
    lines = [x.decode('utf8').strip() for x in f.readlines()]

The bytes type was introduced in Python 3, and that is why your code worked in Python 2. In Python 2 there was no separate data type for bytes; the name bytes is simply an alias for str:

>>> s=bytes('hello')
>>> type(s)
<type 'str'>

Answer 3

You have to change from wb to w:

def __init__(self):
    self.myCsv = csv.writer(open('Item.csv', 'wb')) 
    self.myCsv.writerow(['title', 'link'])

to

def __init__(self):
    self.myCsv = csv.writer(open('Item.csv', 'w'))
    self.myCsv.writerow(['title', 'link'])

After changing this, the error disappears, but you can’t write to the file (in my case). So after all, I don’t have an answer?

Source: How to remove ^M

Changing to ‘rb’ brings me the other error: io.UnsupportedOperation: write
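
For reference, the csv module documentation for Python 3 recommends opening the file in text mode with newline=''. The sketch below follows that advice and uses a with block so the file is flushed and closed. Whether a missing close was the cause of the write problem described above is only a guess, and the temporary path and the second row are made up for the example:

```python
import csv
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'Item.csv')

    # Text mode ('w', not 'wb'); newline='' lets the csv module control line endings
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['title', 'link'])
        writer.writerow(['Example', 'http://example.com'])  # illustrative row

    # Read the file back to confirm the rows were written
    with open(path, 'r', newline='', encoding='utf-8') as f:
        rows = list(csv.reader(f))

print(rows)
```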


Answer 4

For this small example:

import socket

mysock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
mysock.connect(('www.py4inf.com', 80))
# send() needs bytes in Python 3, hence the b prefix on the request
mysock.send(b'GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\n\n')

while True:
    data = mysock.recv(512)
    if len(data) < 1:
        break
    print(data)

mysock.close()

Adding the “b” before ‘GET http://www.py4inf.com/code/romeo.txt HTTP/1.0\n\n’ solved my problem.


Answer 5

Call the encode() function on the hardcoded string value given in single quotes.

Ex:

file.write(answers[i] + '\n'.encode())

OR

line.split(' +++$+++ '.encode())

Answer 6

You opened the file in binary mode:

The following code will throw a TypeError: a bytes-like object is required, not ‘str’.

for line in lines:
    print(type(line))# <class 'bytes'>
    if 'substring' in line:
       print('success')

The following code will work – you have to use the decode() function:

for line in lines:
    line = line.decode()
    print(type(line))# <class 'str'>
    if 'substring' in line:
       print('success')

Answer 7

Why not try opening your file as text?

with open(fname, 'rt') as f:
    lines = [x.strip() for x in f.readlines()]

Additionally here is a link for python 3.x on the official page: https://docs.python.org/3/library/io.html And this is the open function: https://docs.python.org/3/library/functions.html#open

If you really need to handle it as binary, then consider encoding your string.


Answer 8

I got this error when I was trying to convert a char (or string) to bytes, the code was something like this with Python 2.7:

# -*- coding: utf-8 -*-
print( bytes('ò') )

This is how Python 2.7 handles Unicode characters.

This won’t work with Python 3.6, since bytes requires an extra argument for the encoding. This can be a little tricky, since different encodings may produce different results:

print( bytes('ò', 'iso_8859_1') ) # prints: b'\xf2'
print( bytes('ò', 'utf-8') ) # prints: b'\xc3\xb2'

In my case I had to use iso_8859_1 when encoding bytes in order to solve the issue.

Hope this helps someone.