在Python 3中将int转换为字节

问题:在Python 3中将int转换为字节

我试图在Python 3中构建此byte对象:

b'3\r\n'

所以我尝试了显而易见的(对我来说),发现了一个奇怪的行为:

>>> bytes(3) + b'\r\n'
b'\x00\x00\x00\r\n'

显然:

>>> bytes(10)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

我无法看到有关为什么字节转换以这种方式阅读文档的任何指示。但是,在此Python问题中,我确实发现了一些有关添加format到字节的惊奇消息(另请参见Python 3字节格式):

http://bugs.python.org/issue3982

现在与byte(int)之类的奇数的相互作用更差,现在返回零

和:

如果bytes(int)返回该int的ASCIIfication,对我来说将更加方便;但老实说,即使是错误也比这种行为要好。(如果我想要这种行为-我从未有过-我宁愿它是一种类方法,就像“ bytes.zeroes(n)”一样被调用。)

有人可以向我解释这种行为的来源吗?

I was trying to build this bytes object in Python 3:

b'3\r\n'

so I tried the obvious (for me), and found a weird behaviour:

>>> bytes(3) + b'\r\n'
b'\x00\x00\x00\r\n'

Apparently:

>>> bytes(10)
b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'

I’ve been unable to see any pointers on why the bytes conversion works this way reading the documentation. However, I did find some surprise messages in this Python issue about adding format to bytes (see also Python 3 bytes formatting):

http://bugs.python.org/issue3982

This interacts even more poorly with oddities like bytes(int) returning zeroes now

and:

It would be much more convenient for me if bytes(int) returned the ASCIIfication of that int; but honestly, even an error would be better than this behavior. (If I wanted this behavior – which I never have – I’d rather it be a classmethod, invoked like “bytes.zeroes(n)”.)

Can someone explain me where this behaviour comes from?


回答 0

那就是它的设计方式-很有道理,因为通常,您将调用bytes一个可迭代而不是单个整数:

>>> bytes([3])
b'\x03'

文档说明这一点,以及文档字符串为bytes

 >>> help(bytes)
 ...
 bytes(int) -> bytes object of size given by the parameter initialized with null bytes

That’s the way it was designed – and it makes sense because usually, you would call bytes on an iterable instead of a single integer:

>>> bytes([3])
b'\x03'

The docs state this, as well as the docstring for bytes:

 >>> help(bytes)
 ...
 bytes(int) -> bytes object of size given by the parameter initialized with null bytes

回答 1

从python 3.2你可以做

>>> (1024).to_bytes(2, byteorder='big')
b'\x04\x00'

https://docs.python.org/3/library/stdtypes.html#int.to_bytes

def int_to_bytes(x: int) -> bytes:
    return x.to_bytes((x.bit_length() + 7) // 8, 'big')

def int_from_bytes(xbytes: bytes) -> int:
    return int.from_bytes(xbytes, 'big')

因此,x == int_from_bytes(int_to_bytes(x))。请注意,此编码仅适用于无符号(非负)整数。

From python 3.2 you can do

>>> (1024).to_bytes(2, byteorder='big')
b'\x04\x00'

https://docs.python.org/3/library/stdtypes.html#int.to_bytes

def int_to_bytes(x: int) -> bytes:
    return x.to_bytes((x.bit_length() + 7) // 8, 'big')
    
def int_from_bytes(xbytes: bytes) -> int:
    return int.from_bytes(xbytes, 'big')

Accordingly, x == int_from_bytes(int_to_bytes(x)). Note that the above encoding works only for unsigned (non-negative) integers.

For signed integers, the bit length is a bit more tricky to calculate:

def int_to_bytes(number: int) -> bytes:
    return number.to_bytes(length=(8 + (number + (number < 0)).bit_length()) // 8, byteorder='big', signed=True)

def int_from_bytes(binary_data: bytes) -> Optional[int]:
    return int.from_bytes(binary_data, byteorder='big', signed=True)

回答 2

您可以使用结构包

In [11]: struct.pack(">I", 1)
Out[11]: '\x00\x00\x00\x01'

“>”是字节顺序(big-endian),而“ I”是格式字符。因此,如果您要执行其他操作,则可以具体说明:

In [12]: struct.pack("<H", 1)
Out[12]: '\x01\x00'

In [13]: struct.pack("B", 1)
Out[13]: '\x01'

这在python 2和python 3上都相同。

注意:逆运算(字节到int)可以使用unpack完成。

You can use the struct’s pack:

In [11]: struct.pack(">I", 1)
Out[11]: '\x00\x00\x00\x01'

The “>” is the byte-order (big-endian) and the “I” is the format character. So you can be specific if you want to do something else:

In [12]: struct.pack("<H", 1)
Out[12]: '\x01\x00'

In [13]: struct.pack("B", 1)
Out[13]: '\x01'

This works the same on both python 2 and python 3.

Note: the inverse operation (bytes to int) can be done with unpack.


回答 3

Python 3.5+ printf为字节引入了%插值(-style格式)

>>> b'%d\r\n' % 3
b'3\r\n'

请参阅PEP 0461-将%格式添加到字节和字节数组

在早期版本中,您可以使用str.encode('ascii')结果:

>>> s = '%d\r\n' % 3
>>> s.encode('ascii')
b'3\r\n'

注意:它与产生的东西int.to_bytes不同:

>>> n = 3
>>> n.to_bytes((n.bit_length() + 7) // 8, 'big') or b'\0'
b'\x03'
>>> b'3' == b'\x33' != '\x03'
True

Python 3.5+ introduces %-interpolation (printf-style formatting) for bytes:

>>> b'%d\r\n' % 3
b'3\r\n'

See PEP 0461 — Adding % formatting to bytes and bytearray.

On earlier versions, you could use str and .encode('ascii') the result:

>>> s = '%d\r\n' % 3
>>> s.encode('ascii')
b'3\r\n'

Note: It is different from what int.to_bytes produces:

>>> n = 3
>>> n.to_bytes((n.bit_length() + 7) // 8, 'big') or b'\0'
b'\x03'
>>> b'3' == b'\x33' != '\x03'
True

回答 4

该文档说:

bytes(int) -> bytes object of size given by the parameter
              initialized with null bytes

序列:

b'3\r\n'

它是字符“ \ r”(13)和“ \ n”(10)的字符“ 3”(十进制51)。

因此,该方式将这样对待它,例如:

>>> bytes([51, 13, 10])
b'3\r\n'

>>> bytes('3', 'utf8') + b'\r\n'
b'3\r\n'

>>> n = 3
>>> bytes(str(n), 'ascii') + b'\r\n'
b'3\r\n'

在IPython 1.1.0和Python 3.2.3上测试

The documentation says:

bytes(int) -> bytes object of size given by the parameter
              initialized with null bytes

The sequence:

b'3\r\n'

It is the character ‘3’ (decimal 51) the character ‘\r’ (13) and ‘\n’ (10).

Therefore, the way would treat it as such, for example:

>>> bytes([51, 13, 10])
b'3\r\n'

>>> bytes('3', 'utf8') + b'\r\n'
b'3\r\n'

>>> n = 3
>>> bytes(str(n), 'ascii') + b'\r\n'
b'3\r\n'

Tested on IPython 1.1.0 & Python 3.2.3


回答 5

3的ASCII "\x33""\x03"

这就是python所做的事情,str(3)但是对于字节来说,这是完全错误的,因为应该将它们视为二进制数据的数组,而不应将其当作字符串来使用。

实现所需内容最简单的方法是bytes((3,)),它比bytes([3])因为初始化列表的开销要大得多,因此更好,因此,在可以使用元组时不要使用列表。您可以使用转换较大的整数int.to_bytes(3, "little")

初始化具有给定长度的字节是有意义的,并且是最有用的,因为它们通常用于创建某种类型的缓冲区,您需要为其分配一些给定大小的内存。我在初始化数组或通过向其写入零来扩展某些文件时经常使用它。

The ASCIIfication of 3 is "\x33" not "\x03"!

That is what python does for str(3) but it would be totally wrong for bytes, as they should be considered arrays of binary data and not be abused as strings.

The most easy way to achieve what you want is bytes((3,)), which is better than bytes([3]) because initializing a list is much more expensive, so never use lists when you can use tuples. You can convert bigger integers by using int.to_bytes(3, "little").

Initializing bytes with a given length makes sense and is the most useful, as they are often used to create some type of buffer for which you need some memory of given size allocated. I often use this when initializing arrays or expanding some file by writing zeros to it.


回答 6

int (包括Python2的 long)可以bytes使用以下函数转换为:

import codecs

def int2bytes(i):
    hex_value = '{0:x}'.format(i)
    # make length of hex_value a multiple of two
    hex_value = '0' * (len(hex_value) % 2) + hex_value
    return codecs.decode(hex_value, 'hex_codec')

反向转换可以由另一种完成:

import codecs
import six  # should be installed via 'pip install six'

long = six.integer_types[-1]

def bytes2int(b):
    return long(codecs.encode(b, 'hex_codec'), 16)

这两个函数都可以在Python2和Python3上使用。

int (including Python2’s long) can be converted to bytes using following function:

import codecs

def int2bytes(i):
    hex_value = '{0:x}'.format(i)
    # make length of hex_value a multiple of two
    hex_value = '0' * (len(hex_value) % 2) + hex_value
    return codecs.decode(hex_value, 'hex_codec')

The reverse conversion can be done by another one:

import codecs
import six  # should be installed via 'pip install six'

long = six.integer_types[-1]

def bytes2int(b):
    return long(codecs.encode(b, 'hex_codec'), 16)

Both functions work on both Python2 and Python3.


回答 7

我很好奇范围内单个int的各种方法的性能[0, 255],因此我决定进行一些时序测试。

基于下面的定时,和从从尝试许多不同的值和结构中观察到的总的趋势,struct.pack似乎是最快,其次int.to_bytesbytes和与str.encode(勿庸置疑)是最慢的。请注意,结果显示出比所显示的更多的变化,int.to_bytes并且bytes有时在测试过程中切换速度排名,但struct.pack显然是最快的。

Windows上CPython 3.7的结果:

Testing with 63:
bytes_: 100000 loops, best of 5: 3.3 usec per loop
to_bytes: 100000 loops, best of 5: 2.72 usec per loop
struct_pack: 100000 loops, best of 5: 2.32 usec per loop
chr_encode: 50000 loops, best of 5: 3.66 usec per loop

测试模块(名为int_to_byte.py):

"""Functions for converting a single int to a bytes object with that int's value."""

import random
import shlex
import struct
import timeit

def bytes_(i):
    """From Tim Pietzcker's answer:
    https://stackoverflow.com/a/21017834/8117067
    """
    return bytes([i])

def to_bytes(i):
    """From brunsgaard's answer:
    https://stackoverflow.com/a/30375198/8117067
    """
    return i.to_bytes(1, byteorder='big')

def struct_pack(i):
    """From Andy Hayden's answer:
    https://stackoverflow.com/a/26920966/8117067
    """
    return struct.pack('B', i)

# Originally, jfs's answer was considered for testing,
# but the result is not identical to the other methods
# https://stackoverflow.com/a/31761722/8117067

def chr_encode(i):
    """Another method, from Quuxplusone's answer here:
    https://codereview.stackexchange.com/a/210789/140921

    Similar to g10guang's answer:
    https://stackoverflow.com/a/51558790/8117067
    """
    return chr(i).encode('latin1')

converters = [bytes_, to_bytes, struct_pack, chr_encode]

def one_byte_equality_test():
    """Test that results are identical for ints in the range [0, 255]."""
    for i in range(256):
        results = [c(i) for c in converters]
        # Test that all results are equal
        start = results[0]
        if any(start != b for b in results):
            raise ValueError(results)

def timing_tests(value=None):
    """Test each of the functions with a random int."""
    if value is None:
        # random.randint takes more time than int to byte conversion
        # so it can't be a part of the timeit call
        value = random.randint(0, 255)
    print(f'Testing with {value}:')
    for c in converters:
        print(f'{c.__name__}: ', end='')
        # Uses technique borrowed from https://stackoverflow.com/q/19062202/8117067
        timeit.main(args=shlex.split(
            f"-s 'from int_to_byte import {c.__name__}; value = {value}' " +
            f"'{c.__name__}(value)'"
        ))

I was curious about performance of various methods for a single int in the range [0, 255], so I decided to do some timing tests.

Based on the timings below, and from the general trend I observed from trying many different values and configurations, struct.pack seems to be the fastest, followed by int.to_bytes, bytes, and with str.encode (unsurprisingly) being the slowest. Note that the results show some more variation than is represented, and int.to_bytes and bytes sometimes switched speed ranking during testing, but struct.pack is clearly the fastest.

Results in CPython 3.7 on Windows:

Testing with 63:
bytes_: 100000 loops, best of 5: 3.3 usec per loop
to_bytes: 100000 loops, best of 5: 2.72 usec per loop
struct_pack: 100000 loops, best of 5: 2.32 usec per loop
chr_encode: 50000 loops, best of 5: 3.66 usec per loop

Test module (named int_to_byte.py):

"""Functions for converting a single int to a bytes object with that int's value."""

import random
import shlex
import struct
import timeit

def bytes_(i):
    """From Tim Pietzcker's answer:
    https://stackoverflow.com/a/21017834/8117067
    """
    return bytes([i])

def to_bytes(i):
    """From brunsgaard's answer:
    https://stackoverflow.com/a/30375198/8117067
    """
    return i.to_bytes(1, byteorder='big')

def struct_pack(i):
    """From Andy Hayden's answer:
    https://stackoverflow.com/a/26920966/8117067
    """
    return struct.pack('B', i)

# Originally, jfs's answer was considered for testing,
# but the result is not identical to the other methods
# https://stackoverflow.com/a/31761722/8117067

def chr_encode(i):
    """Another method, from Quuxplusone's answer here:
    https://codereview.stackexchange.com/a/210789/140921

    Similar to g10guang's answer:
    https://stackoverflow.com/a/51558790/8117067
    """
    return chr(i).encode('latin1')

converters = [bytes_, to_bytes, struct_pack, chr_encode]

def one_byte_equality_test():
    """Test that results are identical for ints in the range [0, 255]."""
    for i in range(256):
        results = [c(i) for c in converters]
        # Test that all results are equal
        start = results[0]
        if any(start != b for b in results):
            raise ValueError(results)

def timing_tests(value=None):
    """Test each of the functions with a random int."""
    if value is None:
        # random.randint takes more time than int to byte conversion
        # so it can't be a part of the timeit call
        value = random.randint(0, 255)
    print(f'Testing with {value}:')
    for c in converters:
        print(f'{c.__name__}: ', end='')
        # Uses technique borrowed from https://stackoverflow.com/q/19062202/8117067
        timeit.main(args=shlex.split(
            f"-s 'from int_to_byte import {c.__name__}; value = {value}' " +
            f"'{c.__name__}(value)'"
        ))

回答 8

尽管brunsgaard的先前答案是有效的编码,但它仅适用于无符号整数。这是它的基础,可同时用于有符号和无符号整数。

def int_to_bytes(i: int, *, signed: bool = False) -> bytes:
    length = ((i + ((i * signed) < 0)).bit_length() + 7 + signed) // 8
    return i.to_bytes(length, byteorder='big', signed=signed)

def bytes_to_int(b: bytes, *, signed: bool = False) -> int:
    return int.from_bytes(b, byteorder='big', signed=signed)

# Test unsigned:
for i in range(1025):
    assert i == bytes_to_int(int_to_bytes(i))

# Test signed:
for i in range(-1024, 1025):
    assert i == bytes_to_int(int_to_bytes(i, signed=True), signed=True)

对于编码器,不仅(i + ((i * signed) < 0)).bit_length()要使用编码器,i.bit_length()因为后者会导致-128,-32768等的无效编码。


图片来源:CervEd,用于解决效率低下的问题。

Although the prior answer by brunsgaard is an efficient encoding, it works only for unsigned integers. This one builds upon it to work for both signed and unsigned integers.

def int_to_bytes(i: int, *, signed: bool = False) -> bytes:
    length = ((i + ((i * signed) < 0)).bit_length() + 7 + signed) // 8
    return i.to_bytes(length, byteorder='big', signed=signed)

def bytes_to_int(b: bytes, *, signed: bool = False) -> int:
    return int.from_bytes(b, byteorder='big', signed=signed)

# Test unsigned:
for i in range(1025):
    assert i == bytes_to_int(int_to_bytes(i))

# Test signed:
for i in range(-1024, 1025):
    assert i == bytes_to_int(int_to_bytes(i, signed=True), signed=True)

For the encoder, (i + ((i * signed) < 0)).bit_length() is used instead of just i.bit_length() because the latter leads to an inefficient encoding of -128, -32768, etc.


Credit: CervEd for fixing a minor inefficiency.


回答 9

该行为来自以下事实:在版本3之前的Python中,bytes它只是的别名str。在Python3.x中bytes是的不可变版本bytearray-全新类型,不向后兼容。

The behaviour comes from the fact that in Python prior to version 3 bytes was just an alias for str. In Python3.x bytes is an immutable version of bytearray – completely new type, not backwards compatible.


回答 10

字节文档

因此,构造函数参数被解释为针对bytearray()。

然后,从bytearray docs

可选的source参数可以通过几种不同的方式用于初始化数组:

  • 如果它是整数,则数组将具有该大小,并将使用空字节初始化。

请注意,这与2.x(其中x> = 6)行为不同,其中bytes只是str

>>> bytes is str
True

PEP 3112

2.6 str与3.0的字节类型在各种方面有所不同。最值得注意的是,构造函数完全不同。

From bytes docs:

Accordingly, constructor arguments are interpreted as for bytearray().

Then, from bytearray docs:

The optional source parameter can be used to initialize the array in a few different ways:

  • If it is an integer, the array will have that size and will be initialized with null bytes.

Note, that differs from 2.x (where x >= 6) behavior, where bytes is simply str:

>>> bytes is str
True

PEP 3112:

The 2.6 str differs from 3.0’s bytes type in various ways; most notably, the constructor is completely different.


回答 11

有些答案不能大量使用。

将整数转换为十六进制表示形式,然后将其转换为字节:

def int_to_bytes(number):
    hrepr = hex(number).replace('0x', '')
    if len(hrepr) % 2 == 1:
        hrepr = '0' + hrepr
    return bytes.fromhex(hrepr)

结果:

>>> int_to_bytes(2**256 - 1)
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'

Some answers don’t work with large numbers.

Convert integer to the hex representation, then convert it to bytes:

def int_to_bytes(number):
    hrepr = hex(number).replace('0x', '')
    if len(hrepr) % 2 == 1:
        hrepr = '0' + hrepr
    return bytes.fromhex(hrepr)

Result:

>>> int_to_bytes(2**256 - 1)
b'\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff\xff'

回答 12

如果问题是如何将整数本身(而不是等效的字符串)转换为字节,我认为可靠的答案是:

>>> i = 5
>>> i.to_bytes(2, 'big')
b'\x00\x05'
>>> int.from_bytes(i.to_bytes(2, 'big'), byteorder='big')
5

有关这些方法的更多信息,请参见:

  1. https://docs.python.org/3.8/library/stdtypes.html#int.to_bytes
  2. https://docs.python.org/3.8/library/stdtypes.html#int.from_bytes

If the question is how to convert an integer itself (not its string equivalent) into bytes, I think the robust answer is:

>>> i = 5
>>> i.to_bytes(2, 'big')
b'\x00\x05'
>>> int.from_bytes(i.to_bytes(2, 'big'), byteorder='big')
5

More information on these methods here:

  1. https://docs.python.org/3.8/library/stdtypes.html#int.to_bytes
  2. https://docs.python.org/3.8/library/stdtypes.html#int.from_bytes