Question: Convert a string to binary in Python
I need a way to get the binary representation of a string in Python, e.g.:
st = "hello world"
toBinary(st)
Is there a module or some neat way of doing this?
Answer 0
Something like this?
>>> st = "hello world"
>>> ' '.join(format(ord(x), 'b') for x in st)
'1101000 1100101 1101100 1101100 1101111 100000 1110111 1101111 1110010 1101100 1100100'
# using `bytearray`
>>> ' '.join(format(x, 'b') for x in bytearray(st, 'utf-8'))
'1101000 1100101 1101100 1101100 1101111 100000 1110111 1101111 1110010 1101100 1100100'
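The snippets above print each value without leading zeros, so the groups have different widths. If fixed 8-bit groups are wanted, something along these lines might work. This is a minimal sketch, assuming the text is encoded as UTF-8; to_binary and from_binary are hypothetical helper names, not part of any library.

```python
def to_binary(text, encoding="utf-8"):
    """Return the bytes of `text` as space-separated 8-bit groups."""
    return " ".join(format(byte, "08b") for byte in text.encode(encoding))


def from_binary(bits, encoding="utf-8"):
    """Inverse of to_binary: parse 8-bit groups back into a string."""
    return bytes(int(group, 2) for group in bits.split()).decode(encoding)


st = "hello world"
encoded = to_binary(st)
print(encoded)
print(from_binary(encoded))  # round-trips back to the original string
```

Working on encoded bytes rather than on ord() keeps every group within 8 bits, which is what makes the reverse conversion straightforward.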
Answer 1
As a more Pythonic way, you can first convert your string to a byte array and then apply the bin function to it with map (the first two snippets are Python 2):
>>> st = "hello world"
>>> map(bin,bytearray(st))
['0b1101000', '0b1100101', '0b1101100', '0b1101100', '0b1101111', '0b100000', '0b1110111', '0b1101111', '0b1110010', '0b1101100', '0b1100100']
Or you can join it:
>>> ' '.join(map(bin,bytearray(st)))
'0b1101000 0b1100101 0b1101100 0b1101100 0b1101111 0b100000 0b1110111 0b1101111 0b1110010 0b1101100 0b1100100'
Note that in Python 3 you need to specify an encoding for the bytearray function:
>>> ' '.join(map(bin,bytearray(st,'utf8')))
'0b1101000 0b1100101 0b1101100 0b1101100 0b1101111 0b100000 0b1110111 0b1101111 0b1110010 0b1101100 0b1100100'
You can also use the binascii module in Python 2:
>>> import binascii
>>> bin(int(binascii.hexlify(st),16))
'0b110100001100101011011000110110001101111001000000111011101101111011100100110110001100100'
hexlify returns the hexadecimal representation of the binary data. You can then convert that to an int by specifying 16 as the base, and convert the int to binary with bin.
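The binascii snippet above is written for Python 2, where a str is already a byte string. A possible Python 3 adaptation is sketched below, assuming UTF-8 encoding. The reverse step using int.to_bytes is an addition for illustration and is not part of the original answer.

```python
import binascii

st = "hello world"
data = st.encode("utf-8")  # hexlify needs bytes in Python 3

# hex digits -> int -> binary literal; bin() drops any leading zero bits
as_int = int(binascii.hexlify(data), 16)
bits = bin(as_int)
print(bits)

# Reverse: rebuild the bytes from the integer, then decode
n_bytes = (as_int.bit_length() + 7) // 8
restored = as_int.to_bytes(n_bytes, "big").decode("utf-8")
print(restored)
```

Because the whole string becomes one integer, leading zero bits (and any leading NUL bytes) are lost, so this form suits display better than storage.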
Answer 2
We just need to encode it.
'string'.encode('ascii')
Answer 3
You can access the code values for the characters in your string using the ord() built-in function. If you then need to format these in binary, the built-in format() function (or the str.format() method) will do the job.
a = "test"
print(' '.join(format(ord(x), 'b') for x in a))
(Thanks to Ashwini Chaudhary for posting that code snippet.)
While the above code works in Python 3, this matter gets more complicated if you’re assuming any encoding other than UTF-8. In Python 2, strings are byte sequences, and ASCII encoding is assumed by default. In Python 3, strings are assumed to be Unicode, and there’s a separate bytes type that acts more like a Python 2 string. If you wish to assume any encoding other than UTF-8, you’ll need to specify the encoding.
In Python 3, then, you can do something like this:
a = "test"
a_bytes = bytes(a, "ascii")
print(' '.join(["{0:b}".format(x) for x in a_bytes]))
The differences between UTF-8 and ASCII encoding won’t be obvious for simple alphanumeric strings, but they become important if you’re processing text that includes characters outside the ASCII character set.
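To make that difference concrete, here is a small sketch using a single non-ASCII character. The euro sign is chosen only as an illustration; any character outside ASCII would behave similarly.

```python
text = "€"  # U+20AC, outside the ASCII range

# ord() gives the Unicode code point of the character
code_point_bits = format(ord(text), "b")

# Encoding first gives the individual UTF-8 bytes instead
utf8_bits = " ".join(format(b, "08b") for b in text.encode("utf-8"))

print(code_point_bits)  # 10000010101100
print(utf8_bits)        # 11100010 10000010 10101100

# ASCII cannot represent this character at all
try:
    bytes(text, "ascii")
except UnicodeEncodeError as exc:
    print("ascii failed:", exc.reason)
```

The ord() version yields one 14-bit number, while the UTF-8 version yields three bytes, so the two approaches only agree for characters in the ASCII range.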
Answer 4
In Python 3.6 and above, you can use an f-string to format the result.
st = "hello world"  # named st to avoid shadowing the built-in str
print(" ".join(f"{ord(i):08b}" for i in st))
01101000 01100101 01101100 01101100 01101111 00100000 01110111 01101111 01110010 01101100 01100100
The left side of the colon, ord(i), is the actual object whose value
will be formatted and inserted into the output. Using ord() gives you
the base-10 code point for a single str character.
The right hand side of the colon is the format specifier. 08 means
width 8, 0 padded, and the b functions as a sign to output the
resulting number in base 2 (binary).
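A few variations on that format specifier, as a quick sketch. The character "A" and the euro sign are arbitrary examples.

```python
ch = "A"  # code point 65

print(f"{ord(ch):b}")      # 1000001    (no padding)
print(f"{ord(ch):08b}")    # 01000001   (zero-padded to width 8)
print(f"{ord(ch):#010b}")  # 0b01000001 (prefix counts toward the width)
print(format(ord(ch), "08b"))  # same spec via the built-in format()

# The width is a minimum: larger code points are not truncated
wide = f"{ord('€'):08b}"
print(wide, len(wide))     # 10000010101100 14
```

Since the width is only a minimum, fixed 8-bit groups are guaranteed only for code points below 256.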
Answer 5
This is an update to the existing answers that use bytearray(), which no longer works that way in Python 3:
>>> st = "hello world"
>>> map(bin, bytearray(st))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding
Because, as the bytearray() documentation explains, if the source is a string, you must also give the encoding:
>>> map(bin, bytearray(st, encoding='utf-8'))
<map object at 0x7f14dfb1ff28>
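The result shown is a map object because map() is lazy in Python 3. A short sketch of how the values might be materialized:

```python
st = "hello world"

# map() is lazy in Python 3; list() forces evaluation
as_list = list(map(bin, bytearray(st, encoding="utf-8")))
print(as_list)

# str.join consumes the iterator directly, so no list() is needed here
joined = " ".join(map(bin, bytearray(st, encoding="utf-8")))
print(joined)
```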
Answer 6
def method_a(sample_string):
    # Convert each character to its code point, then format it as binary.
    return ' '.join(format(ord(x), 'b') for x in sample_string)

def method_b(sample_string):
    # Encode to bytes first, then apply bin() to each byte.
    return ' '.join(map(bin, bytearray(sample_string, encoding='utf-8')))

if __name__ == '__main__':
    from timeit import timeit

    sample_string = 'Convert this ascii string to binary.'
    print(
        timeit(f'method_a("{sample_string}")', setup='from __main__ import method_a'),
        timeit(f'method_b("{sample_string}")', setup='from __main__ import method_b')
    )

# 9.564299999998184 2.943955828988692
method_b is substantially more efficient because converting to a byte array makes low-level function calls, instead of manually transforming every character to an integer and then converting that integer into its binary value.
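One caveat when reading these timings: the two methods do not build the same string, because bin() adds a 0b prefix to every value. A like-for-like comparison might look like the sketch below. via_ord and via_bytes are hypothetical names, the repetition count is arbitrary, and timings will vary by machine and Python version, so no result is claimed here.

```python
from timeit import timeit


def via_ord(text):
    # One ord() call plus one format() call per character
    return " ".join(format(ord(ch), "b") for ch in text)


def via_bytes(text):
    # Encode once, then format each byte (already an int)
    return " ".join(format(b, "b") for b in text.encode("utf-8"))


sample = "Convert this ascii string to binary."

# Both functions agree on ASCII-only input, so the comparison is fair
assert via_ord(sample) == via_bytes(sample)

t_ord = timeit(lambda: via_ord(sample), number=10_000)
t_bytes = timeit(lambda: via_bytes(sample), number=10_000)
print(t_ord, t_bytes)
```

Passing callables to timeit avoids building the statement with an f-string, which would break if the sample text contained a double quote.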
Answer 7
a = list(input("Enter a string\t: "))

def fun(a):
    # Strip the '0b' prefix from bin() and left-pad each value to 8 digits.
    c = ' '.join([bin(ord(i))[2:].zfill(8) for i in a])
    return c

print(fun(a))