What does the 'b' character do in front of a string literal?

Apparently, the following is valid syntax:

my_string = b'The string'

I would like to know:

  1. What does this b character in front of the string mean?
  2. What are the effects of using it?
  3. What are appropriate situations to use it?

I found a related question right here on SO, but it is about PHP. It states that the b indicates that the string is binary, as opposed to Unicode, which was needed to keep code written for PHP < 6 compatible when migrating to PHP 6. I don’t think this applies to Python.

I did find this documentation on the Python site about using a u character in the same syntax to specify a string as Unicode. Unfortunately, it doesn’t mention the b character anywhere in that document.

Also, just out of curiosity, are there more symbols than the b and u that do other things?


Answer 0

To quote the Python 2.x documentation:

A prefix of ‘b’ or ‘B’ is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A ‘u’ or ‘b’ prefix may be followed by an ‘r’ prefix.

The Python 3 documentation states:

Bytes literals are always prefixed with ‘b’ or ‘B’; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
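
As a rough illustration of the Python 3 rule quoted above, here is a minimal sketch (the variable names are only illustrative). Note that writing a non-ASCII character directly inside b'...' is a syntax error, which is why the example uses escapes.

```python
# Python 3: a b'...' literal produces bytes, a plain literal produces str.
text = 'The string'
data = b'The string'

print(type(text).__name__)   # str
print(type(data).__name__)   # bytes

# Indexing bytes gives an integer in range(256), not a one-character string.
first_byte = data[0]         # 84, the ASCII code of 'T'

# Bytes with a value of 128 or more have to be written with escapes.
cafe_bytes = b'caf\xc3\xa9'  # the UTF-8 encoding of 'caf' + U+00E9
cafe_text = cafe_bytes.decode('utf-8')
```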


Answer 1

Python 3.x makes a clear distinction between the types:

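  • str = '...' literals = a sequence of Unicode characters (since Python 3.3 stored internally with 1, 2, or 4 bytes per character depending on the string's contents; before that as UTF-16 or UTF-32, depending on how Python was compiled)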
  • bytes = b'...' literals = a sequence of octets (integers between 0 and 255)

If you’re familiar with Java or C#, think of str as String and bytes as byte[]. If you’re familiar with SQL, think of str as NVARCHAR and bytes as BINARY or BLOB. If you’re familiar with the Windows registry, think of str as REG_SZ and bytes as REG_BINARY. If you’re familiar with C(++), then forget everything you’ve learned about char and strings, because A CHARACTER IS NOT A BYTE. That idea is long obsolete.

You use str when you want to represent text.

print('שלום עולם')

You use bytes when you want to represent low-level binary data like structs.

import struct
NaN = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')[0]

You can encode a str to a bytes object.

>>> '\uFEFF'.encode('UTF-8')
b'\xef\xbb\xbf'

And you can decode a bytes into a str.

>>> b'\xE2\x82\xAC'.decode('UTF-8')
'€'

But you can’t freely mix the two types.

>>> b'\xEF\xBB\xBF' + 'Text with a UTF-8 BOM'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
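
To combine the two, one side has to be converted explicitly first. A minimal sketch, reusing the BOM example from the traceback (the variable names are made up for illustration):

```python
bom = b'\xEF\xBB\xBF'
text = 'Text with a UTF-8 BOM'

# Option 1: bring the text down to bytes, then join bytes with bytes.
as_bytes = bom + text.encode('utf-8')

# Option 2: bring the bytes up to text, then join str with str.
as_text = bom.decode('utf-8') + text

# Mixing the two directly raises TypeError.
try:
    bom + text
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Which direction to convert depends on what the result is for: bytes if it is about to be written to a file or socket, str if it is going to be processed as text.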

The b'...' notation is somewhat confusing in that it allows the bytes 0x01-0x7F to be specified with ASCII characters instead of hex numbers.

>>> b'A' == b'\x41'
True

But I must emphasize, a character is not a byte.

>>> 'A' == b'A'
False

In Python 2.x

Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:

  • unicode = u'...' literals = sequence of Unicode characters = 3.x str
  • str = '...' literals = sequences of confounded bytes/characters
    • Usually text, encoded in some unspecified encoding.
    • But also used to represent binary data like struct.pack output.

To ease the 2.x-to-3.x transition, the b'...' literal syntax was backported to Python 2.6, so that binary strings (which should be bytes in 3.x) can be distinguished from text strings (which should be str in 3.x). The b prefix does nothing in 2.x, but tells the 2to3 script not to convert the literal to a Unicode string in 3.x.

So yes, b'...' literals in Python have the same purpose that they do in PHP.

Also, just out of curiosity, are there more symbols than the b and u that do other things?

The r prefix creates a raw string (e.g., r'\t' is a backslash + t instead of a tab), and triple quotes '''...''' or """...""" allow multi-line string literals.
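
A short sketch of those prefixes side by side, assuming Python 3 (the names are illustrative only):

```python
tab = '\t'            # one character: a tab
raw = r'\t'           # two characters: a backslash and 't'
raw_bytes = rb'\t'    # r and b combined: two bytes, 0x5c and 0x74

# Triple quotes keep the embedded newline as part of the string.
multi = """line one
line two"""
```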


Answer 2

The b denotes a byte string.

Bytes are the actual data. Strings are an abstraction.

If you had a multi-character string object and you took a single character from it, the result would be a string, and it might be more than 1 byte in size depending on the encoding.

If you took 1 byte from a byte string, you’d get a single 8-bit value from 0 to 255, and it might not represent a complete character if the encoding uses more than 1 byte for that character.

TBH I’d use strings unless I had some specific low level reason to use bytes.
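
The difference shows up as soon as a character needs more than one byte. A minimal sketch, assuming UTF-8 as the encoding (the answer does not name one):

```python
word = 'h\u00e9llo'              # 'h' + e-acute + 'llo', five characters
encoded = word.encode('utf-8')   # b'h\xc3\xa9llo', six bytes

char = word[1]        # the e-acute, still a str of length 1
byte = encoded[1]     # 195 (0xC3), only the first half of that character

# A slice that cuts a multi-byte character in half cannot be decoded.
try:
    encoded[:2].decode('utf-8')
    partial_ok = True
except UnicodeDecodeError:
    partial_ok = False
```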


Answer 3

On the server side, any response we send goes out as bytes, so printing it unchanged on the client shows b'Response from server'.

To get rid of the b'...', encode on the server and decode on the client:

Server file:

# c is assumed to be the connected socket returned by accept()
stri = "Response from server"
c.send(stri.encode())  # str -> bytes

Client file:

# s is assumed to be the client socket connected to the server
print(s.recv(1024).decode())  # bytes -> str

This prints Response from server
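
The same round trip can be tried without a real server. This is only a sketch: socket.socketpair() stands in for the server-side c and client-side s, which the answer does not show being created.

```python
import socket

# Two already-connected sockets, standing in for the server's `c`
# and the client's `s`.
c, s = socket.socketpair()
try:
    c.sendall('Response from server'.encode())   # str -> bytes (UTF-8 by default)
    raw = s.recv(1024)                           # what arrives is bytes
    message = raw.decode()                       # bytes -> str
finally:
    c.close()
    s.close()

print(repr(raw))   # the bytes object, shown with its b prefix
print(message)     # the decoded text, no prefix
```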


Answer 4

Here’s an example where leaving out the b raises a TypeError in Python 3.x:

>>> f=open("new", "wb")
>>> f.write("Hello Python!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface

Adding a b prefix would fix the problem.
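
A sketch of the working version, written to a temporary directory so it can be run anywhere (the path handling is an addition for illustration). Newer 3.x releases word the error differently ("a bytes-like object is required, not 'str'"), but the cause and the fix are the same.

```python
import os
import tempfile

# Hypothetical location for the file called "new" in the example above.
path = os.path.join(tempfile.mkdtemp(), 'new')

with open(path, 'wb') as f:
    f.write(b'Hello Python!')           # a bytes literal works in binary mode
    f.write(' Bye!'.encode('utf-8'))    # so does an explicitly encoded str

with open(path, 'rb') as f:
    content = f.read()
```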


Answer 5

It turns it into a bytes literal (or str in 2.x), and is valid for 2.6+.

The r prefix causes backslashes to be “uninterpreted” (not ignored, and the difference does matter).
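
One place where "uninterpreted" and "ignored" differ is a backslash in front of a quote. A small sketch:

```python
plain = '\''          # escape is interpreted: one character, the quote
raw = r'\''           # the backslash still protects the quote from ending
                      # the literal, but it also stays in the string

newline = '\n'        # one character: a line feed
raw_newline = r'\n'   # two characters: a backslash and 'n'
```

If the backslash were simply ignored, r'\'' would not be a valid literal at all, because the quote would close the string.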


Answer 6

In addition to what others have said, note that a single Unicode character can take up multiple bytes once it is encoded.

The way UTF-8 (the most common Unicode encoding) works is that it keeps the old ASCII format (7-bit codes that look like 0xxx xxxx) and adds multi-byte sequences in which every byte starts with 1 (1xxx xxxx) to represent characters beyond ASCII, so that it stays backwards-compatible with ASCII.

>>> len('Öl')  # German word for 'oil' with 2 characters
2
>>> 'Öl'.encode('UTF-8')  # convert str to bytes 
b'\xc3\x96l'
>>> len('Öl'.encode('UTF-8'))  # 3 bytes encode 2 characters !
3
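
The bit patterns can be inspected directly. A small sketch using the same word (the bits name is made up for illustration):

```python
encoded = '\u00d6l'.encode('utf-8')     # same word as above: b'\xc3\x96l'

# Show every byte as eight binary digits.
bits = [format(b, '08b') for b in encoded]
print(bits)

# The two bytes of the non-ASCII letter start with 1,
# the ASCII byte for 'l' starts with 0.
high_bit_set = [b >= 0x80 for b in encoded]
```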

Answer 7

You can use JSON to convert it to a dictionary:

import json
data = b'{"key":"value"}'
print(json.loads(data))

{'key': 'value'}


Flask:

This is an example from Flask. Run this from a Python shell:

import requests
requests.post(url='http://localhost(example)/',json={'key':'value'})

In flask/routes.py:

import json
from flask import request

# app is the Flask application object, defined elsewhere
@app.route('/', methods=['POST'])
def api_script_add():
    print(request.data) # --> b'{"key": "value"}'
    print(json.loads(request.data))
    return json.loads(request.data)

{'key': 'value'}
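
Passing bytes straight to json.loads relies on Python 3.6 or later. On older versions, or to make the encoding explicit, the bytes can be decoded first. A minimal sketch, assuming the payload is UTF-8:

```python
import json

data = b'{"key":"value"}'

# Decode explicitly, then parse the resulting str.
parsed = json.loads(data.decode('utf-8'))

# Going the other way: serialize to str, then encode to bytes.
back = json.dumps(parsed).encode('utf-8')
```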