Tag archive: binary

How do you express binary literals in Python?

Question: How do you express binary literals in Python?

How do you express an integer as a binary number with Python literals?

I was easily able to find the answer for hex:

>>> 0x12AF
4783
>>> 0x100
256

and octal:

>>> 01267
695
>>> 0100
64

How do you use literals to express binary in Python?


Summary of Answers

  • Python 2.5 and earlier: can express binary using int('01010101111',2) but not with a literal.
  • Python 2.5 and earlier: there is no way to express binary literals.
  • Python 2.6 beta: you can write a binary literal as 0b1100111 or 0B1100111.
  • Python 2.6 beta: will also allow 0o27 or 0O27 (second character is the letter O) to represent an octal.
  • Python 3.0 beta: Same as 2.6, but will no longer allow the older 027 syntax for octals.

Answer 0

For reference—future Python possibilities:
Starting with Python 2.6 you can express binary literals using the prefix 0b or 0B:

>>> 0b101111
47

You can also use the new bin function to get the binary representation of a number:

>>> bin(173)
'0b10101101'

Development version of the documentation: What’s New in Python 2.6


Answer 1

>>> print int('01010101111',2)
687
>>> print int('11111111',2)
255

Another way.


Answer 2

How do you express binary literals in Python?

They’re not “binary” literals, but rather, “integer literals”. You can express integer literals with a binary format with a 0 followed by a B or b followed by a series of zeros and ones, for example:

>>> 0b0010101010
170
>>> 0B010101
21

From the Python 3 docs, these are the ways of providing integer literals in Python:

Integer literals are described by the following lexical definitions:

integer      ::=  decinteger | bininteger | octinteger | hexinteger
decinteger   ::=  nonzerodigit (["_"] digit)* | "0"+ (["_"] "0")*
bininteger   ::=  "0" ("b" | "B") (["_"] bindigit)+
octinteger   ::=  "0" ("o" | "O") (["_"] octdigit)+
hexinteger   ::=  "0" ("x" | "X") (["_"] hexdigit)+
nonzerodigit ::=  "1"..."9"
digit        ::=  "0"..."9"
bindigit     ::=  "0" | "1"
octdigit     ::=  "0"..."7"
hexdigit     ::=  digit | "a"..."f" | "A"..."F"

There is no limit for the length of integer literals apart from what can be stored in available memory.

Note that leading zeros in a non-zero decimal number are not allowed. This is for disambiguation with C-style octal literals, which Python used before version 3.0.

Some examples of integer literals:

7     2147483647                        0o177    0b100110111
3     79228162514264337593543950336     0o377    0xdeadbeef
      100_000_000_000                   0b_1110_0101

Changed in version 3.6: Underscores are now allowed for grouping purposes in literals.
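As a small illustration of the underscore rule quoted above (a sketch only; it assumes Python 3.6 or later, and the variable names are made up for this example):

```python
# Underscores in integer literals are purely visual (Python 3.6+).
flags = 0b_1110_0101
same_flags = 0b11100101
big = 100_000_000_000

print(flags == same_flags)   # the underscores do not change the value
print(flags)                 # the literal's value, shown in decimal

# The '_' format option groups binary digits in fours on output.
grouped = format(flags, '_b')
print(grouped)
```

The same '_' option also works with the 'o' and 'x' presentation types.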

Other ways of expressing binary:

You can have the zeros and ones in a string object which can be manipulated (although you should probably just do bitwise operations on the integer in most cases) – just pass int the string of zeros and ones and the base you are converting from (2):

>>> int('010101', 2)
21

You can optionally have the 0b or 0B prefix:

>>> int('0b0010101010', 2)
170

If you pass it 0 as the base, it will assume base 10 if the string doesn’t specify with a prefix:

>>> int('10101', 0)
10101
>>> int('0b10101', 0)
21
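Base 0 is handy when the input may arrive in any of the literal notations. A minimal sketch, where parse_int is a hypothetical helper name and not anything from the standard library:

```python
def parse_int(text):
    """Parse an integer written like a Python literal (hypothetical helper)."""
    # Base 0 tells int() to infer the base from a 0b/0o/0x prefix,
    # falling back to decimal when there is no prefix.
    return int(text, 0)


# Four spellings of the same number.
for text in ('0b10101', '0o25', '0x15', '21'):
    print(text, '->', parse_int(text))
```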

Converting from int back to human readable binary:

You can pass an integer to bin to see the string representation of a binary literal:

>>> bin(21)
'0b10101'

And you can combine bin and int to go back and forth:

>>> bin(int('010101', 2))
'0b10101'

You can use a format specification as well, if you want to have minimum width with preceding zeros:

>>> format(int('010101', 2), '{fill}{width}b'.format(width=10, fill=0))
'0000010101'
>>> format(int('010101', 2), '010b')
'0000010101'
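The same format specification can be used in an f-string, and the '#' flag keeps the 0b prefix. A short sketch, assuming Python 3.6+ for the f-string syntax:

```python
n = 0b10101  # 21

padded = f'{n:010b}'              # zero-padded to 10 digits, no prefix
prefixed = f'{n:#010b}'           # '#' adds '0b'; the prefix counts toward the width
via_zfill = bin(n)[2:].zfill(10)  # strip '0b' by hand, then pad

print(padded)
print(prefixed)
print(via_zfill)
```

Note that with '#' the width of 10 includes the two prefix characters, so only eight binary digits are shown.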

Answer 3

A leading 0 here specifies that the base is 8 (not 10), which is pretty easy to see (this is Python 2 behavior; Python 3 rejects this string when the base is 0):

>>> int('010101', 0)
4161

If you don’t start with a 0, then Python assumes the number is base 10.

>>> int('10101', 0)
10101
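For comparison, here is a sketch of how the same calls behave on Python 3, where a bare leading zero is no longer treated as an octal prefix (the variable names are only for illustration):

```python
# Assumes Python 3; under Python 2 the first call returned 4161.
try:
    value = int('010101', 0)
except ValueError as exc:
    value = None
    print('rejected:', exc)

# The explicit 0o prefix gives the octal reading on 2.6+ and 3.x alike.
octal_value = int('0o10101', 0)
print(octal_value)
```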

Answer 4

As far as I can tell, Python up through 2.5 only supports hexadecimal and octal literals. I did find some discussions about adding binary literals to future versions, but nothing definite.


Answer 5

I am pretty sure this is one of the things due to change in Python 3.0 with perhaps bin() to go with hex() and oct().

EDIT: lbrandy’s answer is correct in all cases.


What does the 'b' character do in front of a string literal?

Question: What does the 'b' character do in front of a string literal?

Apparently, the following is the valid syntax:

my_string = b'The string'

I would like to know:

  1. What does this b character in front of the string mean?
  2. What are the effects of using it?
  3. What are appropriate situations to use it?

I found a related question right here on SO, but that question is about PHP, and it states that the b is used to indicate that the string is binary, as opposed to Unicode, which was needed so that code written for PHP < 6 would stay compatible when migrating to PHP 6. I don’t think this applies to Python.

I did find this documentation on the Python site about using a u character in the same syntax to specify a string as Unicode. Unfortunately, it doesn’t mention the b character anywhere in that document.

Also, just out of curiosity, are there more symbols than the b and u that do other things?


Answer 0

To quote the Python 2.x documentation:

A prefix of ‘b’ or ‘B’ is ignored in Python 2; it indicates that the literal should become a bytes literal in Python 3 (e.g. when code is automatically converted with 2to3). A ‘u’ or ‘b’ prefix may be followed by an ‘r’ prefix.

The Python 3 documentation states:

Bytes literals are always prefixed with ‘b’ or ‘B’; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.
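A quick sketch of what that quoted rule means in practice on Python 3 (the sample text 'café' and the variable names are only for illustration):

```python
data = b'caf\xc3\xa9'        # 'café' as UTF-8, with the non-ASCII bytes escaped
print(type(data))            # bytes, not str
print(len(data))             # 5 bytes
print(data.decode('utf-8'))  # back to a 4-character str

# A non-ASCII character written directly inside b'...' is a syntax error.
# compile() is used so the error can be caught at run time.
try:
    compile("b'caf\u00e9'", '<example>', 'eval')
    literal_ok = True
except SyntaxError:
    literal_ok = False
print(literal_ok)
```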


Answer 1

Python 3.x makes a clear distinction between the types:

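  • str = '...' literals = a sequence of Unicode characters (stored internally as UTF-16 or UTF-32, depending on how Python was compiled; since Python 3.3, CPython instead picks a 1-, 2-, or 4-byte representation per string)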
  • bytes = b'...' literals = a sequence of octets (integers between 0 and 255)
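The difference in element type is easy to see at the prompt. A minimal sketch for Python 3 (the names text and raw are arbitrary):

```python
text = 'AB'
raw = b'AB'

print(text[0])     # a one-character str
print(raw[0])      # an int: the value of the first octet
print(list(raw))   # every element is an integer in range(256)
print(raw[0:1])    # slicing, unlike indexing, still gives bytes

# Values outside 0..255 cannot be stored in a bytes object.
try:
    bytes([256])
except ValueError as exc:
    print('rejected:', exc)
```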

If you’re familiar with Java or C#, think of str as String and bytes as byte[]. If you’re familiar with SQL, think of str as NVARCHAR and bytes as BINARY or BLOB. If you’re familiar with the Windows registry, think of str as REG_SZ and bytes as REG_BINARY. If you’re familiar with C(++), then forget everything you’ve learned about char and strings, because A CHARACTER IS NOT A BYTE. That idea is long obsolete.

You use str when you want to represent text.

print('שלום עולם')

You use bytes when you want to represent low-level binary data like structs.

NaN = struct.unpack('>d', b'\xff\xf8\x00\x00\x00\x00\x00\x00')[0]

You can encode a str to a bytes object.

>>> '\uFEFF'.encode('UTF-8')
b'\xef\xbb\xbf'

And you can decode a bytes into a str.

>>> b'\xE2\x82\xAC'.decode('UTF-8')
'€'

But you can’t freely mix the two types.

>>> b'\xEF\xBB\xBF' + 'Text with a UTF-8 BOM'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't concat bytes to str
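The fix is to convert one side explicitly so that both operands have the same type. A sketch, assuming the text really is meant to be UTF-8:

```python
bom = b'\xEF\xBB\xBF'
text = 'Text with a UTF-8 BOM'

as_bytes = bom + text.encode('UTF-8')   # bytes + bytes
as_text = bom.decode('UTF-8') + text    # str + str

print(as_bytes)
print(repr(as_text))
```

Which direction to convert depends on what the result is for: bytes for writing to a file or socket, str for further text processing.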

The b'...' notation is somewhat confusing in that it allows the bytes 0x01-0x7F to be specified with ASCII characters instead of hex numbers.

>>> b'A' == b'\x41'
True

But I must emphasize, a character is not a byte.

>>> 'A' == b'A'
False

In Python 2.x

Pre-3.0 versions of Python lacked this kind of distinction between text and binary data. Instead, there was:

  • unicode = u'...' literals = sequence of Unicode characters = 3.x str
  • str = '...' literals = sequences of confounded bytes/characters
    • Usually text, encoded in some unspecified encoding.
    • But also used to represent binary data like struct.pack output.

In order to ease the 2.x-to-3.x transition, the b'...' literal syntax was backported to Python 2.6, in order to allow distinguishing binary strings (which should be bytes in 3.x) from text strings (which should be str in 3.x). The b prefix does nothing in 2.x, but tells the 2to3 script not to convert it to a Unicode string in 3.x.

So yes, b'...' literals in Python have the same purpose that they do in PHP.

Also, just out of curiosity, are there more symbols than the b and u that do other things?

The r prefix creates a raw string (e.g., r'\t' is a backslash + t instead of a tab), and triple quotes '''...''' or """...""" allow multi-line string literals.
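A short sketch of those prefixes in action, including the fact that r can be combined with b (the variable name multi is just for illustration):

```python
print(len('\t'))    # 1: the escape is a single tab character
print(len(r'\t'))   # 2: a backslash followed by the letter t

# Prefixes can be combined: br'...' is a raw bytes literal.
print(br'\x00' == b'\\x00')

# Triple quotes let the literal span several lines.
multi = '''first line
second line'''
print(multi.splitlines())
```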


Answer 2

The b denotes a byte string.

Bytes are the actual data. Strings are an abstraction.

If you had a multi-character string object and you took a single character, it would be a string, and it might be more than 1 byte in size depending on the encoding.

If you took 1 byte from a byte string, you’d get a single 8-bit value from 0-255, and it might not represent a complete character if the encoding uses more than 1 byte for that character.

TBH I’d use strings unless I had some specific low level reason to use bytes.
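To make the character-versus-byte point concrete, here is a sketch using UTF-8 on Python 3 (the sample text is arbitrary):

```python
text = 'h\u00e9'               # 'hé': two characters
data = text.encode('utf-8')    # three bytes: b'h\xc3\xa9'

print(text[1])                 # one character, still a str
print(data[1])                 # one byte, an int (195)

# A lone byte from the middle of a multi-byte character is not valid text.
try:
    data[1:2].decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)
```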


Answer 3

On the server side, any response we send goes out as bytes, so it will appear in the client as b'Response from server'.

To get rid of the b'...', simply use the code below.

Server file:

stri = "Response from server"
c.send(stri.encode())  # c: the connection socket (assumed to come from accept()); str -> bytes

Client file:

print(s.recv(1024).decode())  # s: the client socket; bytes -> str

Then it will print Response from server.
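The two snippets above depend on sockets set up elsewhere. For a version that runs on its own, here is a sketch that uses socket.socketpair() as a stand-in for a real server and client connection; that substitution is an assumption made for the demo, not how the original program is structured:

```python
import socket

# socketpair() returns two already-connected sockets, standing in for the
# server-side connection (c) and the client socket (s) from the answer.
c, s = socket.socketpair()
try:
    c.sendall("Response from server".encode())  # str -> bytes before sending
    received = s.recv(1024)                      # recv() always returns bytes
finally:
    c.close()
    s.close()

print(received)           # shows the b'...' form
print(received.decode())  # plain text
```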


Answer 4

Here’s an example where the absence of b would throw a TypeError exception in Python 3.x:

>>> f=open("new", "wb")
>>> f.write("Hello Python!")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'str' does not support the buffer interface

Adding a b prefix would fix the problem.
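A sketch of the working version, writing to a temporary directory so that nothing is left behind (the file name 'new' is taken from the example above; everything else is illustrative):

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'new')

    with open(path, 'wb') as f:
        f.write(b'Hello Python!')           # bytes literal for a binary file

    with open(path, 'ab') as f:
        f.write(' Again.'.encode('utf-8'))  # or encode an existing str

    with open(path, 'rb') as f:
        content = f.read()                  # binary mode reads bytes back

print(content)
```

If the data is really text, opening the file in text mode ('w') and writing a str is usually the simpler choice.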


Answer 5

It turns it into a bytes literal (or str in 2.x), and is valid for 2.6+.

The r prefix causes backslashes to be “uninterpreted” (not ignored, and the difference does matter).


Answer 6

In addition to what others have said, note that a single character in Unicode can consist of multiple bytes.

The way UTF-8 (the most common Unicode encoding) works is that it took the old ASCII format (a 7-bit code that looks like 0xxx xxxx) and added multi-byte sequences in which every byte starts with 1 (1xxx xxxx) to represent characters beyond ASCII, so that it would be backwards-compatible with ASCII.

>>> len('Öl')  # German word for 'oil' with 2 characters
2
>>> 'Öl'.encode('UTF-8')  # convert str to bytes 
b'\xc3\x96l'
>>> len('Öl'.encode('UTF-8'))  # 3 bytes encode 2 characters !
3
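The bit patterns can be inspected directly by formatting each byte in binary. A small sketch (the names encoded and bits are arbitrary; 'Öl' is written with an escape only to keep the source ASCII):

```python
encoded = '\u00d6l'.encode('UTF-8')   # 'Öl' -> b'\xc3\x96l'

# format(byte, '08b') shows each byte as eight binary digits.
bits = [format(byte, '08b') for byte in encoded]
for byte, pattern in zip(encoded, bits):
    print(hex(byte), pattern)
```

The two bytes that encode 'Ö' both start with a 1 bit, while the ASCII letter 'l' starts with 0.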

Answer 7

You can use JSON to convert it to a dictionary:

import json
data = b'{"key":"value"}'
print(json.loads(data))

{'key': 'value'}
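Passing bytes straight to json.loads relies on Python 3.6 or later; on older 3.x versions the bytes have to be decoded first. A sketch showing both routes (the variable names are only illustrative):

```python
import json

data = b'{"key":"value"}'

parsed_direct = json.loads(data)                   # bytes accepted on Python 3.6+
parsed_decoded = json.loads(data.decode('utf-8'))  # explicit decode also works on older 3.x

print(parsed_direct)
print(parsed_direct == parsed_decoded)
```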


FLASK:

This is an example using Flask. Run this from a Python prompt in a terminal:

import requests
requests.post(url='http://localhost(example)/',json={'key':'value'})

In flask/routes.py

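import json
from flask import request  # `app` is assumed to be the Flask instance created elsewhere

@app.route('/', methods=['POST'])
def api_script_add():
    print(request.data)  # raw request body as bytes, e.g. b'{"key": "value"}'
    print(json.loads(request.data))
    return json.loads(request.data)  # returning a dict needs Flask 1.1+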

{'key': 'value'}