Question: Converting int to bytes in Python 3
I was trying to build this bytes object in Python 3:
b'3\r\n'
so I tried the obvious (for me), and found a weird behaviour:
>>> bytes(3) + b'\r\n'
b'\x00\x00\x00\r\n'
Apparently:
>>> bytes(10)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
Reading the documentation, I've been unable to find any pointers on why the bytes conversion works this way. However, I did find some surprising messages in this Python issue about adding format to bytes (see also Python 3 bytes formatting):
http://bugs.python.org/issue3982
This interacts even more poorly with oddities like bytes(int) returning zeroes now
and:
It would be much more convenient for me if bytes(int) returned the ASCIIfication of that int; but honestly, even an error would be better than this behavior. (If I wanted this behavior – which I never have – I’d rather it be a classmethod, invoked like “bytes.zeroes(n)”.)
Can someone explain to me where this behaviour comes from?
Answer 0
That's the way it was designed, and it makes sense because usually you would call bytes on an iterable instead of a single integer:
>>> bytes([3])
b'\x03'
The docs state this, as does the docstring for bytes:
>>> help(bytes)
...
bytes(int) -> bytes object of size given by the parameter initialized with null bytes
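To make the difference concrete, here is a small sketch contrasting the three constructions discussed in the question and in this answer; the variable names are just for illustration:

```python
n = 3

# bytes(int): allocates n null bytes; the value is a size, not content.
zeros = bytes(n)

# bytes(iterable of ints): each int (0-255) becomes one raw byte.
raw = bytes([n])

# The ASCII digits of n come from its text form, encoded.
line = str(n).encode('ascii') + b'\r\n'

print(zeros)  # b'\x00\x00\x00'
print(raw)    # b'\x03'
print(line)   # b'3\r\n'
```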
Answer 1
From Python 3.2 you can do:
>>> (1024).to_bytes(2, byteorder='big')
b'\x04\x00'
https://docs.python.org/3/library/stdtypes.html#int.to_bytes
def int_to_bytes(x: int) -> bytes:
    return x.to_bytes((x.bit_length() + 7) // 8, 'big')

def int_from_bytes(xbytes: bytes) -> int:
    return int.from_bytes(xbytes, 'big')
Accordingly, x == int_from_bytes(int_to_bytes(x)).
Note that the above encoding works only for unsigned (non-negative) integers.
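A quick sketch that re-states the two unsigned helpers and checks the round trip, including the edge cases: 0 encodes to an empty bytes object, and a negative input is rejected by int.to_bytes because signed defaults to False:

```python
def int_to_bytes(x: int) -> bytes:
    # Minimal big-endian length: ceil(bit_length / 8); 0 gives zero bytes.
    return x.to_bytes((x.bit_length() + 7) // 8, 'big')

def int_from_bytes(xbytes: bytes) -> int:
    return int.from_bytes(xbytes, 'big')

for x in (0, 1, 255, 256, 1024, 2**64):
    assert x == int_from_bytes(int_to_bytes(x))

print(int_to_bytes(0))    # b''
print(int_to_bytes(255))  # b'\xff'
print(int_to_bytes(256))  # b'\x01\x00'

# Negative numbers cannot be encoded as unsigned.
try:
    int_to_bytes(-1)
except OverflowError as exc:
    print('OverflowError:', exc)
```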
For signed integers, the bit length is a bit more tricky to calculate:
def int_to_bytes(number: int) -> bytes:
    return number.to_bytes(length=(8 + (number + (number < 0)).bit_length()) // 8, byteorder='big', signed=True)

def int_from_bytes(binary_data: bytes) -> int:
    return int.from_bytes(binary_data, byteorder='big', signed=True)
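The signed helpers above can be checked the same way. This sketch round-trips a range of values and prints the two's-complement boundary cases, which should each land in the smallest width that holds them (for example -128 in one byte, 128 in two):

```python
def int_to_bytes(number: int) -> bytes:
    # Adding 1 to negatives (number < 0 is 0 or 1) makes -2**k measure like
    # -2**k + 1, so e.g. -128 is sized as 7 bits plus a sign bit = 1 byte.
    length = (8 + (number + (number < 0)).bit_length()) // 8
    return number.to_bytes(length, byteorder='big', signed=True)

def int_from_bytes(binary_data: bytes) -> int:
    return int.from_bytes(binary_data, byteorder='big', signed=True)

for n in range(-70000, 70001):
    assert n == int_from_bytes(int_to_bytes(n))

for n in (0, 127, 128, -128, -129):
    print(n, int_to_bytes(n))
```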
Answer 2
You can use pack from the struct module (import struct first):
In [11]: struct.pack(">I", 1)
Out[11]: b'\x00\x00\x00\x01'
The “>” is the byte order (big-endian) and “I” is the format character (an unsigned int, 4 bytes in standard size). So you can be specific if you want to do something else:
In [12]: struct.pack("<H", 1)
Out[12]: b'\x01\x00'
In [13]: struct.pack("B", 1)
Out[13]: b'\x01'
This works the same on both Python 2 and Python 3, except that Python 2 returns a str and Python 3 a bytes object.
Note: the inverse operation (bytes to int) can be done with struct.unpack.
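A short sketch of the inverse with struct.unpack. Note that unpack always returns a tuple, and that the byte-order prefix must match the one used for packing:

```python
import struct

packed = struct.pack('>I', 1)           # b'\x00\x00\x00\x01'
(value,) = struct.unpack('>I', packed)  # unpack returns a 1-tuple here
print(packed, value)

# Reading with the wrong byte order gives a different number.
little = struct.pack('<H', 1)           # b'\x01\x00'
print(struct.unpack('<H', little)[0])   # 1
print(struct.unpack('>H', little)[0])   # 256
```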
Answer 3
Answer 4
The documentation says:
bytes(int) -> bytes object of size given by the parameter initialized with null bytes
The sequence:
b'3\r\n'
consists of the character '3' (decimal 51), the character '\r' (13) and the character '\n' (10).
Therefore, the way to build it is to treat it as such, for example:
>>> bytes([51, 13, 10])
b'3\r\n'
>>> bytes('3', 'utf8') + b'\r\n'
b'3\r\n'
>>> n = 3
>>> bytes(str(n), 'ascii') + b'\r\n'
b'3\r\n'
Tested on IPython 1.1.0 & Python 3.2.3
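Since Python 3.5 (PEP 461), bytes also support %-formatting, which covers the formatting gap discussed in the issue linked in the question. A sketch comparing it with the encode-based approaches (the f-string variant needs Python 3.6+):

```python
n = 3

# PEP 461: %d on a bytes template writes the ASCII digits of n.
a = b'%d\r\n' % n

# Format as text first, then encode to bytes.
b = '{}\r\n'.format(n).encode('ascii')
c = f'{n}\r\n'.encode('ascii')

print(a, b, c)
```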
Answer 5
The ASCIIfication of 3 is "\x33", not "\x03"!
That is what Python does for str(3), but it would be totally wrong for bytes, as they should be considered arrays of binary data and not be abused as strings.
The easiest way to achieve what you want is bytes((3,)), which is better than bytes([3]) because initializing a list is more expensive than a tuple, so prefer tuples when you can. You can convert bigger integers with int.to_bytes, e.g. (3).to_bytes(1, "little"), increasing the length argument as needed.
Initializing bytes with a given length makes sense and is the most useful, as they are often used to create some type of buffer for which you need some memory of given size allocated. I often use this when initializing arrays or expanding some file by writing zeros to it.
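A minimal sketch of the buffer use case described above, assuming an in-memory file (io.BytesIO) stands in for a real file; the field offsets are hypothetical:

```python
import io

# bytes(n): an immutable, zero-filled block of n bytes.
header = bytes(4)

# bytearray(n): the mutable equivalent, handy as a preallocated buffer.
buf = bytearray(8)
buf[0:2] = (0x12).to_bytes(2, 'big')  # write a 2-byte big-endian field
buf[7] = 3                            # write a single byte value
print(bytes(buf))                     # b'\x00\x12\x00\x00\x00\x00\x00\x03'

# Extending a file with zeros: seek to the end and write bytes(n).
f = io.BytesIO(b'data')
f.seek(0, io.SEEK_END)
f.write(bytes(4))
print(f.getvalue())                   # b'data\x00\x00\x00\x00'
```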
Answer 6
An int (including Python 2's long) can be converted to bytes using the following function:
import codecs

def int2bytes(i):
    hex_value = '{0:x}'.format(i)
    # make length of hex_value a multiple of two
    hex_value = '0' * (len(hex_value) % 2) + hex_value
    return codecs.decode(hex_value, 'hex_codec')
The reverse conversion can be done with another function:
import codecs
import six  # should be installed via 'pip install six'

long = six.integer_types[-1]

def bytes2int(b):
    return long(codecs.encode(b, 'hex_codec'), 16)
Both functions work on both Python2 and Python3.
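If only Python 3 matters, the six dependency can be dropped, since int is already arbitrary-precision and int() accepts the bytes that codecs.encode returns. A sketch of that variant, checked against int.to_bytes with a minimal length of at least one byte (negative numbers are not supported, because the hex string would start with '-'):

```python
import codecs

def int2bytes(i: int) -> bytes:
    hex_value = '{0:x}'.format(i)
    # Prepend one '0' when the hex string has an odd number of digits.
    hex_value = '0' * (len(hex_value) % 2) + hex_value
    return codecs.decode(hex_value, 'hex_codec')

def bytes2int(b: bytes) -> int:
    # codecs.encode(..., 'hex_codec') returns bytes, e.g. b'0400'.
    return int(codecs.encode(b, 'hex_codec'), 16)

for i in (0, 1, 255, 256, 2**100):
    assert bytes2int(int2bytes(i)) == i
    assert int2bytes(i) == i.to_bytes(max(1, (i.bit_length() + 7) // 8), 'big')
```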
Answer 7
I was curious about the performance of various methods for a single int in the range [0, 255], so I decided to do some timing tests.
Based on the timings below, and from the general trend I observed from trying many different values and configurations, struct.pack seems to be the fastest, followed by int.to_bytes, then bytes, with str.encode (unsurprisingly) being the slowest. Note that the results varied more than shown here, and int.to_bytes and bytes sometimes switched speed ranking during testing, but struct.pack is clearly the fastest.
Results in CPython 3.7 on Windows:
Testing with 63:
bytes_: 100000 loops, best of 5: 3.3 usec per loop
to_bytes: 100000 loops, best of 5: 2.72 usec per loop
struct_pack: 100000 loops, best of 5: 2.32 usec per loop
chr_encode: 50000 loops, best of 5: 3.66 usec per loop
Test module (named int_to_byte.py):
"""Functions for converting a single int to a bytes object with that int's value."""

import random
import shlex
import struct
import timeit

def bytes_(i):
    """From Tim Pietzcker's answer:
    https://stackoverflow.com/a/21017834/8117067
    """
    return bytes([i])

def to_bytes(i):
    """From brunsgaard's answer:
    https://stackoverflow.com/a/30375198/8117067
    """
    return i.to_bytes(1, byteorder='big')

def struct_pack(i):
    """From Andy Hayden's answer:
    https://stackoverflow.com/a/26920966/8117067
    """
    return struct.pack('B', i)

# Originally, jfs's answer was considered for testing,
# but the result is not identical to the other methods
# https://stackoverflow.com/a/31761722/8117067

def chr_encode(i):
    """Another method, from Quuxplusone's answer here:
    https://codereview.stackexchange.com/a/210789/140921

    Similar to g10guang's answer:
    https://stackoverflow.com/a/51558790/8117067
    """
    return chr(i).encode('latin1')

converters = [bytes_, to_bytes, struct_pack, chr_encode]

def one_byte_equality_test():
    """Test that results are identical for ints in the range [0, 255]."""
    for i in range(256):
        results = [c(i) for c in converters]
        # Test that all results are equal
        start = results[0]
        if any(start != b for b in results):
            raise ValueError(results)

def timing_tests(value=None):
    """Test each of the functions with a random int."""
    if value is None:
        # random.randint takes more time than int to byte conversion
        # so it can't be a part of the timeit call
        value = random.randint(0, 255)
    print(f'Testing with {value}:')
    for c in converters:
        print(f'{c.__name__}: ', end='')
        # Uses technique borrowed from https://stackoverflow.com/q/19062202/8117067
        timeit.main(args=shlex.split(
            f"-s 'from int_to_byte import {c.__name__}; value = {value}' " +
            f"'{c.__name__}(value)'"
        ))
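A variant of the harness above that avoids the shlex/timeit.main string building by passing callables to timeit.repeat, so no importable int_to_byte module is needed. This is a sketch: the compare() helper and its defaults are hypothetical, absolute numbers depend on the machine, and the lambda wrapper adds the same small overhead to every converter:

```python
import struct
import timeit

def bytes_(i):
    return bytes([i])

def to_bytes(i):
    return i.to_bytes(1, byteorder='big')

def struct_pack(i):
    return struct.pack('B', i)

def chr_encode(i):
    return chr(i).encode('latin1')

converters = [bytes_, to_bytes, struct_pack, chr_encode]

def compare(value: int = 63, number: int = 100_000, repeat: int = 5) -> dict:
    """Return the best per-call time in microseconds for each converter."""
    results = {}
    for c in converters:
        # The lambda is timed immediately, so binding c late is not an issue.
        times = timeit.repeat(lambda: c(value), number=number, repeat=repeat)
        results[c.__name__] = min(times) / number * 1e6
    return results

if __name__ == '__main__':
    for name, usec in compare().items():
        print(f'{name}: {usec:.3f} usec per loop')
```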
Answer 8
Although the prior answer by brunsgaard is an efficient encoding, it works only for unsigned integers. This one builds upon it to work for both signed and unsigned integers.
def int_to_bytes(i: int, *, signed: bool = False) -> bytes:
    length = ((i + ((i * signed) < 0)).bit_length() + 7 + signed) // 8
    return i.to_bytes(length, byteorder='big', signed=signed)

def bytes_to_int(b: bytes, *, signed: bool = False) -> int:
    return int.from_bytes(b, byteorder='big', signed=signed)

# Test unsigned:
for i in range(1025):
    assert i == bytes_to_int(int_to_bytes(i))

# Test signed:
for i in range(-1024, 1025):
    assert i == bytes_to_int(int_to_bytes(i, signed=True), signed=True)
For the encoder, (i + ((i * signed) < 0)).bit_length() is used instead of just i.bit_length() because the latter leads to an inefficient encoding of -128, -32768, etc.
Credit: CervEd for fixing a minor inefficiency.
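To see the inefficiency the adjusted formula avoids, this sketch compares the byte length computed from plain i.bit_length() (plus one sign bit) with the length from the answer's formula with signed=True. The helper names are hypothetical; only the powers of two such as -128 and -32768 differ:

```python
def naive_length(i: int) -> int:
    # i.bit_length() plus one sign bit, rounded up to whole bytes.
    return (i.bit_length() + 7 + 1) // 8

def adjusted_length(i: int) -> int:
    # The answer's formula with signed=True: (i * True) < 0 is just i < 0.
    return ((i + (i < 0)).bit_length() + 7 + 1) // 8

for i in (-128, -129, -32768, 127):
    print(i, naive_length(i), adjusted_length(i))
```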
Answer 9
The behaviour comes from the fact that in Python prior to version 3, bytes was just an alias for str. In Python 3.x, bytes is an immutable version of bytearray, a completely new type that is not backwards compatible.
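A small sketch of the relationship described above: bytearray allows item assignment while bytes does not, and both constructors treat an int argument the same way:

```python
data = bytearray(b'3\r\n')
data[0] = ord('4')        # bytearray is mutable
print(bytes(data))        # b'4\r\n'

frozen = bytes(data)
try:
    frozen[0] = ord('5')  # bytes is immutable
except TypeError as exc:
    print('TypeError:', exc)

# Same constructor rules: an int means "that many null bytes".
print(bytes(3) == bytearray(3))  # True
```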
Answer 10
From the bytes docs:
Accordingly, constructor arguments are interpreted as for bytearray().
Then, from the bytearray docs:
The optional source parameter can be used to initialize the array in a few different ways:
- If it is an integer, the array will have that size and will be initialized with null bytes.
Note that this differs from the 2.x (where x >= 6) behavior, where bytes is simply str:
>>> bytes is str
True
PEP 3112:
The 2.6 str differs from 3.0’s bytes type in various ways; most notably, the constructor is completely different.
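For reference, the different ways bytearray() (and therefore bytes()) interprets its source argument, per the docs quoted above, can be seen side by side in this sketch:

```python
# int -> an array of that size, filled with null bytes
from_int = bytearray(3)

# iterable of ints in range(256) -> those byte values
from_ints = bytearray([51, 13, 10])

# str -> needs an encoding argument
from_str = bytearray('3\r\n', 'ascii')

# object supporting the buffer protocol -> a copy of its bytes
from_buffer = bytearray(memoryview(b'3\r\n'))

print(from_int, from_ints, from_str, from_buffer)
```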
Answer 11
Some answers don’t work with large numbers.
Convert the integer to its hex representation, then convert that to bytes:
def int_to_bytes(number):
    hrepr = hex(number).replace('0x', '')
    if len(hrepr) % 2 == 1:
        hrepr = '0' + hrepr
    return bytes.fromhex(hrepr)
Result:
>>> int_to_bytes(2**256 - 1)
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'
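For comparison, int.to_bytes also handles arbitrarily large non-negative ints once the length is computed from bit_length(). This sketch checks that it produces the same bytes as the hex approach; the helper names are hypothetical, and neither version handles negative numbers (hex(-5) gives '-0x5'):

```python
def int_to_bytes_hex(number: int) -> bytes:
    hrepr = hex(number).replace('0x', '')
    if len(hrepr) % 2 == 1:
        hrepr = '0' + hrepr
    return bytes.fromhex(hrepr)

def int_to_bytes_native(number: int) -> bytes:
    # At least one byte, so 0 encodes as b'\x00' like the hex version.
    return number.to_bytes(max(1, (number.bit_length() + 7) // 8), 'big')

for n in (0, 1, 255, 256, 2**256 - 1, 2**1000 + 12345):
    assert int_to_bytes_hex(n) == int_to_bytes_native(n)
    assert int.from_bytes(int_to_bytes_native(n), 'big') == n
```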
Answer 12