# Fast way of counting non-zero bits in positive integer

## Question: Fast way of counting non-zero bits in positive integer

I need a fast way to count the number of set bits in an integer in Python. My current solution is

``````bin(n).count("1")
``````

but I am wondering if there is any faster way of doing this?

PS: I am representing a big 2D binary array as a single list of numbers and doing bitwise operations on it, which brings the time down from hours to minutes. Now I would like to get rid of those extra minutes.

Edit: 1. It has to be in Python 2.7 or 2.6.

Optimizing for small numbers does not matter much, since that would not be a clear bottleneck, but I do have numbers with 10,000+ bits in some places.

For example, this is a 2000-bit case:

``````12448057941136394342297748548545082997815840357634948550739612798732309975923280685245876950055614362283769710705811182976142803324242407017104841062064840113262840137625582646683068904149296501029754654149991842951570880471230098259905004533869130509989042199261339990315125973721454059973605358766253998615919997174542922163484086066438120268185904663422979603026066685824578356173882166747093246377302371176167843247359636030248569148734824287739046916641832890744168385253915508446422276378715722482359321205673933317512861336054835392844676749610712462818600179225635467147870208L
``````

## Answer 0

For arbitrary-length integers, `bin(n).count("1")` is the fastest I could find in pure Python.

I tried adapting Óscar’s and Adam’s solutions to process the integer in 64-bit and 32-bit chunks, respectively. Both were at least ten times slower than `bin(n).count("1")` (the 32-bit version took about half again as much time).

On the other hand, gmpy `popcount()` took about 1/20th of the time of `bin(n).count("1")`. So if you can install gmpy, use that.

To answer a question in the comments, for bytes I’d use a lookup table. You can generate it at runtime:

``````counts = bytes(bin(x).count("1") for x in range(256))  # py2: use bytearray
``````

Or just define it literally:

``````counts = (b'\x00\x01\x01\x02\x01\x02\x02\x03\x01\x02\x02\x03\x02\x03\x03\x04'
b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
b'\x04\x05\x05\x06\x05\x06\x06\x07\x05\x06\x06\x07\x06\x07\x07\x08')
``````

Then it’s `counts[x]` to get the number of 1 bits in `x` where 0 ≤ x ≤ 255.
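As a rough sketch of how the table might be applied to a whole integer rather than a single byte (Python 3 only, since it relies on `int.to_bytes`; `popcount_bytes` is a hypothetical helper name, and nothing here claims it beats `bin(n).count("1")`):

```python
# Build the 256-entry lookup table once (Python 3).
counts = bytes(bin(x).count("1") for x in range(256))

def popcount_bytes(n):
    """Count set bits of a non-negative int by summing per-byte table lookups."""
    # Serialize the integer to bytes; use at least one byte so n == 0 works.
    raw = n.to_bytes((n.bit_length() + 7) // 8 or 1, "little")
    return sum(counts[b] for b in raw)

print(popcount_bytes(255))          # 8
print(popcount_bytes(2**1024 - 1))  # 1024
```

Iterating over a `bytes` object yields integers in Python 3, which is why `counts[b]` can be indexed directly.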

## Answer 1

You can adapt the following algorithm:

``````def CountBits(n):
    n = (n & 0x5555555555555555) + ((n & 0xAAAAAAAAAAAAAAAA) >> 1)
    n = (n & 0x3333333333333333) + ((n & 0xCCCCCCCCCCCCCCCC) >> 2)
    n = (n & 0x0F0F0F0F0F0F0F0F) + ((n & 0xF0F0F0F0F0F0F0F0) >> 4)
    n = (n & 0x00FF00FF00FF00FF) + ((n & 0xFF00FF00FF00FF00) >> 8)
    n = (n & 0x0000FFFF0000FFFF) + ((n & 0xFFFF0000FFFF0000) >> 16)
    n = (n & 0x00000000FFFFFFFF) + ((n & 0xFFFFFFFF00000000) >> 32) # This last & isn't strictly necessary.
    return n
``````

This works for 64-bit positive numbers, but it is easily extendable, and the number of steps grows with the logarithm of the bit-size of the argument (six steps for 64 bits).

In order to understand how this works imagine that you divide the entire 64-bit string into 64 1-bit buckets. Each bucket’s value is equal to the number of bits set in the bucket (0 if no bits are set and 1 if one bit is set). The first transformation results in an analogous state, but with 32 buckets each 2-bit long. This is achieved by appropriately shifting the buckets and adding their values (one addition takes care of all buckets since no carry can occur across buckets – n-bit number is always long enough to encode number n). Further transformations lead to states with exponentially decreasing number of buckets of exponentially growing size until we arrive at one 64-bit long bucket. This gives the number of bits set in the original argument.
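The bucket description can be followed by hand on a small case. Here is a minimal sketch that shrinks the same idea to 8 bits (`count_bits_8` is a name made up for this illustration):

```python
def count_bits_8(n):
    """Same bucket-merging idea, shrunk to 8 bits so each step is easy to read."""
    n = (n & 0x55) + ((n & 0xAA) >> 1)  # 4 buckets of 2 bits each
    n = (n & 0x33) + ((n & 0xCC) >> 2)  # 2 buckets of 4 bits each
    n = (n & 0x0F) + ((n & 0xF0) >> 4)  # 1 bucket of 8 bits
    return n

n = 0b11010110
step1 = (n & 0x55) + ((n & 0xAA) >> 1)
print(format(step1, "08b"))  # 10010101 -> buckets 10|01|01|01 = 2, 1, 1, 1
print(count_bits_8(n))       # 5
```

After the first step each 2-bit bucket holds the number of set bits in the corresponding pair of input bits (`11`, `01`, `01`, `10`), and the later steps add neighbouring buckets together.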

## Answer 2

Here’s a Python implementation of the population count algorithm, as explained in this post:

``````def numberOfSetBits(i):
    i = i - ((i >> 1) & 0x55555555)
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
    return ((((i + (i >> 4)) & 0xF0F0F0F) * 0x1010101) & 0xffffffff) >> 24
``````

It will work for `0 <= i < 0x100000000`.
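For integers wider than 32 bits, one possible adaptation is to feed the value through the 32-bit routine one chunk at a time. This is only a sketch with a hypothetical wrapper name (`popcount_chunked`); a chunked adaptation of this kind was reported in another answer to be much slower than `bin(n).count("1")`, so treat it as an illustration rather than an optimization:

```python
def number_of_set_bits_32(i):
    """Population count for 0 <= i < 2**32 (same steps as the function above)."""
    i = i - ((i >> 1) & 0x55555555)
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
    return ((((i + (i >> 4)) & 0x0F0F0F0F) * 0x01010101) & 0xFFFFFFFF) >> 24

def popcount_chunked(n):
    """Hypothetical wrapper: count bits of a non-negative int 32 bits at a time."""
    total = 0
    while n:
        total += number_of_set_bits_32(n & 0xFFFFFFFF)  # lowest 32 bits
        n >>= 32                                        # move to the next chunk
    return total

print(popcount_chunked(2**1024 - 1))  # 1024
```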

## Answer 3

According to this post, this seems to be one of the fastest implementations of the Hamming weight (if you don’t mind using about 64KB of memory).

``````#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE16 = [0] * 2**16
for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])
``````

On Python 2.x you should replace `range` with `xrange`.

## Edit

If you need better performance (and your numbers are big integers), have a look at the `GMP` library. It contains hand-written assembly implementations for many different architectures.

`gmpy` is a C-coded Python extension module that wraps the GMP library.

``````>>> import gmpy
>>> gmpy.popcount(2**1024-1)
1024
``````

## Answer 4

I really like this method. It’s simple and pretty fast, and it is not limited in bit length since Python integers have arbitrary precision.

It’s actually more cunning than it looks, because it avoids wasting time scanning the zeros. For example it will take the same time to count the set bits in 1000000000000000000000010100000001 as in 1111.

``````def get_bit_count(value):
    n = 0
    while value:
        n += 1
        value &= value-1
    return n
``````
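To see why the loop runs once per set bit, it may help to print the intermediate values. A small sketch (the generator name `get_bit_count_trace` is invented for this illustration):

```python
def get_bit_count_trace(value):
    """Yield each intermediate value as the lowest set bit is cleared."""
    while value:
        value &= value - 1   # value - 1 flips the lowest set bit and everything below it
        yield value

for v in get_bit_count_trace(0b101100):
    print(format(v, "06b"))
# 101000
# 100000
# 000000
```

Three values are produced for an input with three set bits, regardless of how many zero bits sit between them.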

## Answer 5

You can use the algorithm for getting the binary string [1] of an integer, but instead of concatenating the string, count the number of ones:

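``````def count_ones(a):
    s = 0
    t = {'0':0, '1':1, '2':1, '3':2, '4':1, '5':2, '6':2, '7':3}
    # '%o' % a gives the bare octal digits on both Python 2 and 3,
    # without the '0'/'0o' prefix or the 'L' suffix that oct() can add.
    for c in '%o' % a:
        s += t[c]
    return s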
``````

## Answer 6

You said Numpy was too slow. Were you using it to store individual bits? Why not extend the idea of using ints as bit arrays but use Numpy to store those?

Store n bits as an array of `ceil(n/32.)` 32-bit ints. You can then work with the numpy array the same (well, similar enough) way you use ints, including using them to index another array.

The algorithm is basically to compute, in parallel, the number of bits set in each cell, and then sum up the bit count of each cell.

``````setup = """
import numpy as np
#Using Paolo Moretti's answer http://stackoverflow.com/a/9829855/2963903
POPCOUNT_TABLE16 = np.zeros(2**16, dtype=int) #has to be an array

for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

def count1s(v):
    return popcount32_table16(v).sum()

v1 = np.arange(1000)*1234567                       #numpy array
v2 = sum(int(x)<<(32*i) for i, x in enumerate(v1)) #single int
"""
from timeit import timeit

timeit("count1s(v1)", setup=setup)        #49.55184188873349
timeit("bin(v2).count('1')", setup=setup) #225.1857464598633
``````

Though I’m surprised no one suggested you write a C module.

## Answer 7

``````#Python prg to count set bits
#Function to count set bits
def count_set_bits(n):  # renamed from bin() so the built-in is not shadowed
    count=0
    while(n>=1):
        if(n%2==0):
            n=n//2
        else:
            count+=1
            n=n//2
    print("Count of set bits:",count)
#Fetch the input from user
num=int(input("Enter number: "))
#Output
count_set_bits(num)
``````

## Answer 8

It turns out your starting representation is a list of lists of ints which are either 1 or 0. Simply count them in that representation.

The number of bits in an integer is constant in python.

However, if you want to count the number of set bits, the fastest way is to create a list conforming to the following pseudocode: `[numberofsetbits(n) for n in range(MAXINT)]`

This will provide you a constant time lookup after you have generated the list. See @PaoloMoretti’s answer for a good implementation of this. Of course, you don’t have to keep this all in memory – you could use some sort of persistent key-value store, or even MySql. (Another option would be to implement your own simple disk-based storage).

# Convert string to binary in python

## Question: Convert string to binary in python

I am in need of a way to get the binary representation of a string in python. e.g.

``````st = "hello world"
toBinary(st)
``````

Is there a module or some neat way of doing this?

## Answer 0

Something like this?

``````>>> st = "hello world"
>>> ' '.join(format(ord(x), 'b') for x in st)
'1101000 1100101 1101100 1101100 1101111 100000 1110111 1101111 1110010 1101100 1100100'

#using `bytearray`
>>> ' '.join(format(x, 'b') for x in bytearray(st, 'utf-8'))
'1101000 1100101 1101100 1101100 1101111 100000 1110111 1101111 1110010 1101100 1100100'
``````
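If the output needs to be converted back later, fixed-width groups are easier to parse. The following is a sketch under that assumption; `to_binary` and `from_binary` are hypothetical helper names, and UTF-8 is assumed as the encoding (Python 3):

```python
def to_binary(text):
    """Encode text as space-separated 8-bit groups (UTF-8 bytes)."""
    return ' '.join(format(b, '08b') for b in bytearray(text, 'utf-8'))

def from_binary(bits):
    """Inverse of to_binary: parse each group as base 2 and decode as UTF-8."""
    return bytearray(int(group, 2) for group in bits.split()).decode('utf-8')

encoded = to_binary("hi")
print(encoded)               # 01101000 01101001
print(from_binary(encoded))  # hi
```

The `'08b'` format pads every byte to eight digits, so the groups line up even for bytes below 128.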

## Answer 1

As a more Pythonic way, you can first convert your string to a byte array and then use the `bin` function with `map`:

``````>>> st = "hello world"
>>> map(bin,bytearray(st))
['0b1101000', '0b1100101', '0b1101100', '0b1101100', '0b1101111', '0b100000', '0b1110111', '0b1101111', '0b1110010', '0b1101100', '0b1100100']
``````

Or you can join it:

``````>>> ' '.join(map(bin,bytearray(st)))
'0b1101000 0b1100101 0b1101100 0b1101100 0b1101111 0b100000 0b1110111 0b1101111 0b1110010 0b1101100 0b1100100'
``````

Note that in Python 3 you need to specify an encoding for the `bytearray` function:

``````>>> ' '.join(map(bin,bytearray(st,'utf8')))
'0b1101000 0b1100101 0b1101100 0b1101100 0b1101111 0b100000 0b1110111 0b1101111 0b1110010 0b1101100 0b1100100'
``````

You can also use the `binascii` module in Python 2:

``````>>> import binascii
>>> bin(int(binascii.hexlify(st),16))
'0b110100001100101011011000110110001101111001000000111011101101111011100100110110001100100'
``````

`hexlify` returns the hexadecimal representation of the binary data; you can then convert it to an int by specifying 16 as the base, and convert that to binary with `bin`.
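On Python 3, `hexlify` needs bytes rather than a `str`. A comparable result can be sketched with `int.from_bytes`, assuming UTF-8 as the encoding:

```python
st = "hello world"
# int.from_bytes interprets the encoded bytes as one big-endian integer,
# which is the same number that int(hexlify(...), 16) produces.
as_int = int.from_bytes(st.encode('utf-8'), 'big')
print(bin(as_int))

# The conversion can be reversed as long as the byte length is known.
print(as_int.to_bytes(len(st.encode('utf-8')), 'big').decode('utf-8'))  # hello world
```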

## Answer 2

We just need to encode it.

``````'string'.encode('ascii')
``````

## Answer 3

You can access the code values for the characters in your string using the `ord()` built-in function. If you then need to format this in binary, the `string.format()` method will do the job.

``````a = "test"
print(' '.join(format(ord(x), 'b') for x in a))
``````

(Thanks to Ashwini Chaudhary for posting that code snippet.)

While the above code works in Python 3, this matter gets more complicated if you’re assuming any encoding other than UTF-8. In Python 2, strings are byte sequences, and ASCII encoding is assumed by default. In Python 3, strings are assumed to be Unicode, and there’s a separate `bytes` type that acts more like a Python 2 string. If you wish to assume any encoding other than UTF-8, you’ll need to specify the encoding.

In Python 3, then, you can do something like this:

``````a = "test"
a_bytes = bytes(a, "ascii")
print(' '.join(["{0:b}".format(x) for x in a_bytes]))
``````

The differences between UTF-8 and ascii encoding won’t be obvious for simple alphanumeric strings, but will become important if you’re processing text that includes characters not in the ascii character set.
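A short sketch of where the two encodings diverge, using a string with one non-ASCII character (the sample string is just an illustration):

```python
text = "caf\u00e9"   # "café": the last character is outside the ASCII range

# UTF-8 encodes the accented letter as two bytes, so five groups are
# printed for four characters.
print(' '.join(format(b, '08b') for b in text.encode('utf-8')))

# Encoding the same string as ASCII fails outright.
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    print("not ASCII:", exc.reason)
```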

## Answer 4

In Python 3.6 and above you can use an f-string to format the result.

``````text = "hello world"  # renamed from str so the built-in type is not shadowed
print(" ".join(f"{ord(i):08b}" for i in text))

# 01101000 01100101 01101100 01101100 01101111 00100000 01110111 01101111 01110010 01101100 01100100
``````
• The left side of the colon, ord(i), is the actual object whose value will be formatted and inserted into the output. Using ord() gives you the base-10 code point for a single str character.

• The right hand side of the colon is the format specifier. 08 means width 8, 0 padded, and the b functions as a sign to output the resulting number in base 2 (binary).

## Answer 5

This is an update for the existing answers that used `bytearray()`, which no longer works that way in Python 3:

``````>>> st = "hello world"
>>> map(bin, bytearray(st))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: string argument without an encoding
``````

Because, as explained in the link above, if the source is a string, you must also give the encoding:

``````>>> map(bin, bytearray(st, encoding='utf-8'))
<map object at 0x7f14dfb1ff28>
``````
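The `<map object ...>` output appears because `map()` is lazy in Python 3. A minimal sketch of getting at the actual values:

```python
st = "hello world"
# Wrap the lazy map object in list() (or pass it to ' '.join) to see the values.
binary = list(map(bin, bytearray(st, encoding='utf-8')))
print(binary[:3])        # ['0b1101000', '0b1100101', '0b1101100']
print(' '.join(binary))
```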

## Answer 6

``````def method_a(sample_string):
    binary = ' '.join(format(ord(x), 'b') for x in sample_string)

def method_b(sample_string):
    binary = ' '.join(map(bin,bytearray(sample_string,encoding='utf-8')))

if __name__ == '__main__':

    from timeit import timeit

    sample_string = 'Convert this ascii strong to binary.'

    print(
        timeit(f'method_a("{sample_string}")',setup='from __main__ import method_a'),
        timeit(f'method_b("{sample_string}")',setup='from __main__ import method_b')
    )

# 9.564299999998184 2.943955828988692
``````

method_b is substantially more efficient at converting to a byte array because it makes low level function calls instead of manually transforming every character to an integer, and then converting that integer into its binary value.

## Answer 7

``````a = list(input("Enter a string\t: "))
def fun(a):
    c =' '.join(['0'*(8-len(bin(ord(i))[2:]))+(bin(ord(i))[2:]) for i in a])
    return c
print(fun(a))
``````

# How can I detect if a file is binary (non-text) in python?

## Question: How can I detect if a file is binary (non-text) in python?

How can I tell if a file is binary (non-text) in Python?

I am searching through a large set of files in Python, and keep getting matches in binary files. This makes the output look incredibly messy.

I know I could use `grep -I`, but I am doing more with the data than what grep allows for.

In the past, I would have just searched for characters greater than `0x7f`, but `utf8` and the like make that impossible on modern systems. Ideally, the solution would be fast.

## Answer 0

You can also use the mimetypes module:

``````import mimetypes
...
mime = mimetypes.guess_type(file)
``````

It’s fairly easy to compile a list of binary MIME types. For example, Apache ships with a mime.types file that you could parse into two lists, binary and text, and then check whether the MIME type is in your text or binary list.
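A simpler variant of the same idea, sketched here as an assumption rather than a full solution, is to treat every `text/*` type as text. `looks_like_text` is a hypothetical helper; note that `guess_type` only looks at the file name, never at the contents:

```python
import mimetypes

def looks_like_text(filename):
    """Guess from the file name alone whether a file is text.

    Returns True or False when the type is known, or None when it is not.
    """
    mime, _encoding = mimetypes.guess_type(filename)
    if mime is None:
        return None                  # unknown extension: no opinion
    return mime.startswith("text/")

print(looks_like_text("notes.txt"))
print(looks_like_text("photo.png"))
```

Types such as `application/json` are textual but do not start with `text/`, so a real list would need a few explicit additions.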

## Answer 1

Yet another method based on file(1) behavior:

``````>>> textchars = bytearray({7,8,9,10,12,13,27} | set(range(0x20, 0x100)) - {0x7f})
>>> is_binary_string = lambda bytes: bool(bytes.translate(None, textchars))
``````

Example:

``````>>> is_binary_string(open('/usr/bin/python', 'rb').read(1024))
True
False
``````
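The same check, written out as a regular function with comments and two in-memory samples so it can be tried without touching the file system (a sketch; the upper-case constant name is my own choice):

```python
# Bytes treated as "text": a few common control characters plus everything
# from 0x20 upward except DEL (0x7f).
TEXTCHARS = bytearray({7, 8, 9, 10, 12, 13, 27} | set(range(0x20, 0x100)) - {0x7f})

def is_binary_string(data):
    """Return True if data (a bytes object) contains any non-text byte."""
    # translate(None, delete) removes every byte listed in TEXTCHARS;
    # anything left over is a byte outside the text set.
    return bool(data.translate(None, TEXTCHARS))

print(is_binary_string(b"plain text\n"))      # False
print(is_binary_string(b"\x00\x01\x02ELF"))   # True
```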

## Answer 2

If you’re using Python 3 with UTF-8, it is straightforward: just open the file in text mode and stop processing if you get a `UnicodeDecodeError`. Python 3 uses Unicode when handling files in text mode (and bytes in binary mode), so if your encoding can’t decode arbitrary files, it’s quite likely that you will get a `UnicodeDecodeError`.

Example:

``````try:
    with open(filename, "r") as f:
        for l in f:
            process_line(l)
except UnicodeDecodeError:
    pass # Found non-text data
``````

## Answer 3

If it helps, many binary file types begin with a magic number. Here is a list of file signatures.
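A minimal sketch of checking a file header against a handful of well-known signatures. The table below is deliberately tiny and `sniff_signature` is a hypothetical helper name:

```python
# A few well-known file signatures (magic numbers); far from exhaustive.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "PNG image",
    b"%PDF-": "PDF document",
    b"PK\x03\x04": "ZIP archive",
    b"GIF87a": "GIF image",
    b"GIF89a": "GIF image",
    b"\x7fELF": "ELF executable",
}

def sniff_signature(header):
    """Return a description if header (bytes) starts with a known signature, else None."""
    for magic, description in SIGNATURES.items():
        if header.startswith(magic):
            return description
    return None

print(sniff_signature(b"%PDF-1.4\n"))   # PDF document
print(sniff_signature(b"hello world"))  # None
```

In practice `header` would be the first few bytes of a file opened in `'rb'` mode. A missing signature does not prove the file is text, so this works best as a quick pre-filter.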

## Answer 4

Try this:

``````def is_binary(filename):
    """Return true if the given filename is binary.
    @raise EnvironmentError: if the file does not exist or cannot be accessed.
    @attention: found @ http://bytes.com/topic/python/answers/21222-determine-file-type-binary-text on 6/08/2010
    @author: Trent Mick <TrentM@ActiveState.com>
    @author: Jorge Orpinel <jorge@orpinel.com>"""
    fin = open(filename, 'rb')
    try:
        CHUNKSIZE = 1024
        while 1:
            chunk = fin.read(CHUNKSIZE)  # read the next block of the file
            if b'\0' in chunk: # found null byte
                return True
            if len(chunk) < CHUNKSIZE:
                break # done
    # No "except:" clause is needed here; try/finally alone is valid Python.
    finally:
        fin.close()

    return False
``````

## Answer 5

Here’s a suggestion that uses the Unix file command:

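``````import re
import subprocess

def istext(path):
    # universal_newlines=True makes the output text rather than bytes on Python 3
    return (re.search(r':.* text',
                      subprocess.Popen(["file", '-L', path],
                                       stdout=subprocess.PIPE,
                                       universal_newlines=True).stdout.read())
            is not None)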
``````

Example usage:

```>>> istext('/etc/motd')
True
>>> istext('/vmlinuz')
False
>>> open('/tmp/japanese').read()
'\xe3\x81\x93\xe3\x82\x8c\xe3\x81\xaf\xe3\x80\x81\xe3\x81\xbf\xe3\x81\x9a\xe3\x81\x8c\xe3\x82\x81\xe5\xba\xa7\xe3\x81\xae\xe6\x99\x82\xe4\xbb\xa3\xe3\x81\xae\xe5\xb9\x95\xe9\x96\x8b\xe3\x81\x91\xe3\x80\x82\n'
>>> istext('/tmp/japanese') # works on UTF-8
True
```

It has the downsides of not being portable to Windows (unless you have something like the `file` command there), and having to spawn an external process for each file, which might not be palatable.

## Answer 6

Use binaryornot library (GitHub).

It is very simple and based on the code found in this stackoverflow question.

You can actually write this in 2 lines of code, however this package saves you from having to write and thoroughly test those 2 lines of code with all sorts of weird file types, cross-platform.

## Answer 7

Usually you have to guess.

You can look at the extensions as one clue, if the files have them.

You can also recognise known binary formats, and ignore those.

Otherwise see what proportion of non-printable ASCII bytes you have and take a guess from that.

You can also try decoding from UTF-8 and see if that produces sensible output.
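The last two guesses can be combined into one rough heuristic. This is only a sketch: `guess_is_text` is a hypothetical name and the 30% cut-off is an assumed default, not a recommendation from this answer:

```python
def guess_is_text(data, threshold=0.30):
    """Guess whether a block of bytes is text. threshold is an assumed cut-off."""
    if not data:
        return True                      # treat empty input as text
    if b"\0" in data:
        return False                     # NUL bytes are rare in text files
    try:
        data.decode("utf-8")             # valid UTF-8 is a strong hint of text
        return True
    except UnicodeDecodeError:
        pass
    # Fall back to the proportion of bytes outside printable ASCII + whitespace.
    printable = set(range(32, 127)) | {9, 10, 13}
    odd = sum(1 for b in bytearray(data) if b not in printable)
    return odd / len(data) <= threshold

print(guess_is_text(b"hello\n"))       # True
print(guess_is_text(b"\x00\x01\x02"))  # False
```

A block that happens to end in the middle of a multi-byte UTF-8 character fails the decode step and falls through to the proportion check, which is one reason this remains a guess.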

## Answer 8

A shorter solution, with a UTF-16 warning:

``````def is_binary(filename):
    """
    Return true if the given filename appears to be binary.
    File is considered to be binary if it contains a NULL byte.
    FIXME: This approach incorrectly reports UTF-16 as binary.
    """
    with open(filename, 'rb') as f:
        for block in f:
            if b'\0' in block:
                return True
    return False
``````

## Answer 9

We can use Python itself to check whether a file is binary, because reading a binary file in text mode fails with a `UnicodeDecodeError`:

``````def is_binary(file_name):
    try:
        with open(file_name, 'tr') as check_file:  # try open file in text mode
            check_file.read()  # decoding happens on read, not on open
            return False
    except UnicodeDecodeError:  # decoding failed, so the file is non-text (binary)
        return True
``````

## Answer 10

If you’re not on Windows, you can use Python Magic to determine the filetype. Then you can check if it is a text/ mime type.

## Answer 11

Here’s a function that first checks if the file starts with a BOM and if not looks for a zero byte within the initial 8192 bytes:

``````import codecs

#: BOMs to indicate that a file is a text file even if it contains zero bytes.
_TEXT_BOMS = (
    codecs.BOM_UTF16_BE,
    codecs.BOM_UTF16_LE,
    codecs.BOM_UTF32_BE,
    codecs.BOM_UTF32_LE,
    codecs.BOM_UTF8,
)

def is_binary_file(source_path):
    with open(source_path, 'rb') as source_file:
        initial_bytes = source_file.read(8192)  # only the first 8192 bytes are inspected
    return not any(initial_bytes.startswith(bom) for bom in _TEXT_BOMS) \
           and b'\0' in initial_bytes
``````

Technically, the check for the UTF-8 BOM is unnecessary because UTF-8 text should not contain zero bytes for all practical purposes. But as it is a very common encoding, it’s quicker to check for the BOM at the beginning instead of scanning all 8192 bytes for 0.

## Answer 12

Try using the currently maintained python-magic which is not the same module in @Kami Kisiel’s answer. This does support all platforms including Windows however you will need the `libmagic` binary files. This is explained in the README.

Unlike the mimetypes module, it doesn’t use the file’s extension and instead inspects the contents of the file.

``````>>> import magic
>>> magic.from_file("testdata/test.pdf", mime=True)
'application/pdf'
>>> magic.from_file("testdata/test.pdf")
'PDF document, version 1.2'
``````

## Answer 13

I came here looking for exactly the same thing: a comprehensive solution provided by the standard library to detect binary or text. After reviewing the options people suggested, the *nix `file` command looks to be the best choice (I’m only developing for Linux boxen). Some others posted solutions using `file`, but they are unnecessarily complicated in my opinion, so here’s what I came up with:

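``````import shlex
import subprocess

def test_file_isbinary(filename):
    cmd = shlex.split("file -b -e soft '{}'".format(filename))
    # check_output returns bytes on Python 3, so compare against bytes prefixes
    if subprocess.check_output(cmd)[:4] in {b'ASCI', b'UTF-'}:
        return False
    return True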
``````

It should go without saying, but the code that calls this function should make sure the file is readable before testing it; otherwise this will mistakenly detect the file as binary.

## Answer 14

I guess that the best solution is to use the `guess_type` function. It holds a list of several MIME types, and you can also include your own types. Here is the script that I wrote to solve my problem:

``````from mimetypes import guess_type
from mimetypes import add_type
from os import listdir

# The methods below belong to a class whose header is not shown.
def __init__(self):
    pass

def __listDir(self,path):
    try:
        return listdir(path)
    except IOError:
        print ("The directory {0} could not be accessed".format(path))

def getTextFiles(self, path):
    asciiFiles = []
    for files in self.__listDir(path):
        mime = guess_type(files)[0]  # None when the type cannot be guessed
        if mime and mime.split("/")[0] == "text":
            asciiFiles.append(files)
    try:
        return asciiFiles
    except NameError:
        print ("No text files in directory: {0}".format(path))
    finally:
        del asciiFiles
``````

It is inside a class, as you can see from the structure of the code, but you can change things as needed to use it inside your application. It's quite simple to use. The method `getTextFiles` returns a list of all the text files that reside in the directory you pass in the `path` variable.

## on *NIX:

### If you have access to the `file` shell-command, shlex can help make the subprocess module more usable:

``````from os.path import realpath
from subprocess import check_output
from shlex import split

filepath = realpath('rel/or/abs/path/to/file')
# Lower-case the output of `file` (bytes on Python 3), not the command line
assert b'ascii' in check_output(split('file {}'.format(filepath))).lower()
``````

### Or, you could also stick that in a for-loop to get output for all files in the current dir using:

``````import os
for afile in [x for x in os.listdir('.') if os.path.isfile(x)]:
    assert b'ascii' in check_output(split('file {}'.format(afile))).lower()
``````

### or for all subdirs:

``````# os.walk yields (dirpath, dirnames, filenames) tuples
for curdir, _subdirs, filelist in os.walk('.'):
    for afile in filelist:
        path = os.path.join(curdir, afile)
        assert b'ascii' in check_output(split('file {}'.format(path))).lower()
``````

## Answer 16

Most programs consider a file to be binary (that is, any file that is not "line-oriented") if it contains a NUL character.

Here is perl’s version of `pp_fttext()` (`pp_sys.c`) implemented in Python:

``````import sys
PY3 = sys.version_info[0] == 3

# A function that takes an integer in the 8-bit range and returns
# a single-character byte object in py3 / a single-character string
# in py2.
#
int2byte = (lambda x: bytes((x,))) if PY3 else chr

_text_characters = (
        b''.join(int2byte(i) for i in range(32, 127)) +
        b'\n\r\t\f\b')

def istextfile(fileobj, blocksize=512):
    """ Uses heuristics to guess whether the given file is text or binary,
        by reading a single block of bytes from the file.
        If more than 30% of the chars in the block are non-text, or there
        are NUL ('\x00') bytes in the block, assume this is a binary file.
    """
    # fileobj must be opened in binary mode so that read() returns bytes
    block = fileobj.read(blocksize)
    if b'\x00' in block:
        # Files with null bytes are binary
        return False
    elif not block:
        # An empty file is considered a valid text file
        return True

    # Use translate's 'deletechars' argument to efficiently remove all
    # occurrences of _text_characters from the block
    nontext = block.translate(None, _text_characters)
    return float(len(nontext)) / len(block) <= 0.30
``````

Note also that this code was written to run on both Python 2 and Python 3 without changes.

## Answer 17

Are you on Unix? If so, then try:

``````isBinary = os.system("file -b " + name + " | grep text > /dev/null")
``````

Shell return values are inverted (0 means success), so if `grep` finds "text" the call returns 0, which Python treats as false; a non-zero result means the file was not reported as text.

## Answer 18

A simpler way is to check whether the file contains a NUL character (`\x00`) using the `in` operator, for instance:

``````b'\x00' in open("foo.bar", 'rb').read()
``````

See below the complete example:

``````#!/usr/bin/env python3
import argparse
if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # Positional argument; nargs=1 stores a one-element list, hence args.file[0]
    parser.add_argument('file', nargs=1)
    args = parser.parse_args()
    with open(args.file[0], 'rb') as f:
        if b'\x00' in f.read():
            print('The file is binary!')
        else:
            print('The file is not binary!')
``````

Sample usage:

``````$ ./is_binary.py /etc/hosts
The file is not binary!
$ ./is_binary.py `which which`
The file is binary!
``````

## Answer 19

``````from binaryornot.check import is_binary
is_binary('filename')
``````

Documentation

# Convert hex to binary

## Question: Convert hex to binary

I have ABC123EFFF.

I want to have 001010101111000001001000111110111111111111 (i.e. binary repr. with, say, 42 digits and leading zeroes).

How?

## Answer 0

To solve the problem of leading zeros being dropped:

``````my_hexdata = "1a"

scale = 16 ## equals to hexadecimal

num_of_bits = 8

bin(int(my_hexdata, scale))[2:].zfill(num_of_bits)
``````

It will give 00011010 instead of the trimmed version.

## Answer 1

``````import binascii

binary_string = binascii.unhexlify(hex_string)
``````

binascii.unhexlify

Return the binary data represented by the hexadecimal string specified as the parameter.
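Note that `unhexlify` returns raw bytes rather than a string of `0` and `1` characters. A minimal sketch of getting from those bytes to a bit string follows; the helper name `hex_to_bit_string` is made up for illustration, and `unhexlify` needs an even number of hex digits:

```python
import binascii

def hex_to_bit_string(hex_string):
    """Return the bits of hex_string as text, 8 bits per decoded byte.

    hex_to_bit_string is a hypothetical helper, not part of binascii.
    """
    raw = binascii.unhexlify(hex_string)  # raw bytes, one per pair of hex digits
    # bytearray yields integers on both Python 2 and Python 3
    return ''.join(format(byte, '08b') for byte in bytearray(raw))

print(hex_to_bit_string('abc123efff'))
```

Because every byte is rendered with eight digits, leading zeros inside each byte are kept, but the result is not padded to an arbitrary width such as 42.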

## Answer 2

``````bin(int("abc123efff", 16))[2:]
``````

## Answer 3

## Convert hex to binary

I have ABC123EFFF.

I want to have 001010101111000001001000111110111111111111 (i.e. binary repr. with, say, 42 digits and leading zeroes).

The new f-strings in Python 3.6 allow you to do this using very terse syntax:

``````>>> f'{0xABC123EFFF:0>42b}'
'001010101111000001001000111110111111111111'
``````

or to break that up with the semantics:

``````>>> number, pad, rjust, size, kind = 0xABC123EFFF, '0', '>', 42, 'b'
>>> f'{number:{pad}{rjust}{size}{kind}}'
'001010101111000001001000111110111111111111'
``````

What you are actually saying is that you have a value in a hexadecimal representation, and you want to represent an equivalent value in binary.

The value of equivalence is an integer. But you may begin with a string, and to view in binary, you must end with a string.

## Convert hex to binary, 42 digits and leading zeros?

We have several direct ways to accomplish this goal, without hacks using slices.

First, before we can do any binary manipulation at all, convert to int (I presume this is in a string format, not as a literal):

``````>>> integer = int('ABC123EFFF', 16)
>>> integer
737679765503
``````

alternatively we could use an integer literal as expressed in hexadecimal form:

``````>>> integer = 0xABC123EFFF
>>> integer
737679765503
``````

Now we need to express our integer in a binary representation.

## Use the builtin function, `format`

Then pass to `format`:

``````>>> format(integer, '0>42b')
'001010101111000001001000111110111111111111'
``````

This uses the formatting specification’s mini-language.

To break that down, here’s the grammar form of it:

``````[[fill]align][sign][#][0][width][,][.precision][type]
``````

To make that into a specification for our needs, we just exclude the things we don’t need:

``````>>> spec = '{fill}{align}{width}{type}'.format(fill='0', align='>', width=42, type='b')
>>> spec
'0>42b'
``````

and just pass that to format

``````>>> bin_representation = format(integer, spec)
>>> bin_representation
'001010101111000001001000111110111111111111'
>>> print(bin_representation)
001010101111000001001000111110111111111111
``````

## String Formatting (Templating) with `str.format`

We can use that in a string using `str.format` method:

``````>>> 'here is the binary form: {0:{spec}}'.format(integer, spec=spec)
'here is the binary form: 001010101111000001001000111110111111111111'
``````

Or just put the spec directly in the original string:

``````>>> 'here is the binary form: {0:0>42b}'.format(integer)
'here is the binary form: 001010101111000001001000111110111111111111'
``````

## String Formatting with the new f-strings

Let’s demonstrate the new f-strings. They use the same mini-language formatting rules:

``````>>> integer = 0xABC123EFFF
>>> length = 42
>>> f'{integer:0>{length}b}'
'001010101111000001001000111110111111111111'
``````

Now let’s put this functionality into a function to encourage reusability:

``````def bin_format(integer, length):
return f'{integer:0>{length}b}'
``````

And now:

``````>>> bin_format(0xABC123EFFF, 42)
'001010101111000001001000111110111111111111'
``````

## Aside

If you actually just wanted to encode the data as a string of bytes in memory or on disk, you can use the `int.to_bytes` method, which is only available in Python 3:

``````>>> help(int.to_bytes)
to_bytes(...)
int.to_bytes(length, byteorder, *, signed=False) -> bytes
...
``````

And since 42 bits divided by 8 bits per byte is 5.25, we round up to 6 bytes:

``````>>> integer.to_bytes(6, 'big')
b'\x00\xab\xc1#\xef\xff'
``````
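The byte count can also be computed rather than worked out by hand. This is a small sketch for Python 3 only; the helper name `int_to_min_bytes` is an assumption made for illustration:

```python
def int_to_min_bytes(value, bit_width):
    """Encode a non-negative int big-endian in the fewest bytes holding bit_width bits.

    int_to_min_bytes is a hypothetical helper name.
    """
    n_bytes = (bit_width + 7) // 8  # ceiling division: 42 bits need 6 bytes
    return value.to_bytes(n_bytes, 'big')

encoded = int_to_min_bytes(0xABC123EFFF, 42)
print(encoded)
# int.from_bytes reverses the encoding
print(int.from_bytes(encoded, 'big') == 0xABC123EFFF)
```

`to_bytes` raises `OverflowError` if the value does not fit in the requested number of bytes, so an undersized `bit_width` fails loudly instead of truncating.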

## Answer 4

``````>>> bin( 0xABC123EFFF )
``````

'0b1010101111000001001000111110111111111111'

## Answer 5

``````"{0:020b}".format(int('ABC123EFFF', 16))
``````

## Answer 6

Here’s a fairly raw way to do it using bit fiddling to generate the binary strings.

The key bit to understand is:

``(n & (1 << i)) and 1``

This evaluates to 1 if the i-th bit of n is set, and to 0 otherwise.

``````
import binascii

def byte_to_binary(n):
    # Emit the 8 bits of n, most significant first
    return ''.join(str((n & (1 << i)) and 1) for i in reversed(range(8)))

def hex_to_binary(h):
    # bytearray yields integers on both Python 2 and Python 3
    return ''.join(byte_to_binary(b) for b in bytearray(binascii.unhexlify(h)))

print(hex_to_binary('abc123efff'))

>>> 1010101111000001001000111110111111111111
``````

Edit: using the “new” conditional expression, this:

``(n & (1 << i)) and 1``

Would become:

``1 if n & (1 << i) else 0``

(Which TBH I’m not sure how readable that is)
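A third spelling, shown here only as a sketch, shifts the wanted bit down to position 0 and masks it, which always yields exactly 0 or 1. This is a swapped-in variant rather than the expression used in the answer, and the names `bit_is_set` and `to_bits` are made up for illustration:

```python
def bit_is_set(n, i):
    """Return 1 if bit i of n (0 = least significant) is set, else 0."""
    return (n >> i) & 1

def to_bits(n, width=8):
    """Render the low `width` bits of n, most significant bit first."""
    return ''.join(str(bit_is_set(n, i)) for i in reversed(range(width)))

print(to_bits(0xAB))
```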

## Answer 7

This is a slight touch up to Glen Maynard’s solution, which I think is the right way to do it. It just adds the padding element.

``````
def hextobin(self, hexval):
    '''
    Takes a string representation of hex data with
    arbitrary length and converts to string representation
    of binary.  Includes padding 0s
    '''
    thelen = len(hexval)*4
    binval = bin(int(hexval, 16))[2:]
    while ((len(binval)) < thelen):
        binval = '0' + binval
    return binval

``````

Pulled it out of a class. Just take out `self, ` if you’re working in a stand-alone script.

## Answer 8

Use the built-in `format()` and `int()` functions. It's simple and easy to understand. This is a slightly simplified version of Aaron's answer.

int()

``````int(string, base)
``````

format()

``````format(integer, format_spec)  # format_spec sets the number of bits, e.g. "040b"
``````

Example

``````# w/o 0b prefix
>> format(int("ABC123EFFF", 16), "040b")
1010101111000001001000111110111111111111

# with 0b prefix
>> format(int("ABC123EFFF", 16), "#042b")
0b1010101111000001001000111110111111111111

# w/o 0b prefix + 64bit
>> format(int("ABC123EFFF", 16), "064b")
0000000000000000000000001010101111000001001000111110111111111111
``````

## Answer 9

Replace each hex digit with the corresponding 4 binary digits:

``````1 - 0001
2 - 0010
...
a - 1010
b - 1011
...
f - 1111
``````

## Answer 10

Convert the hex string to an integer, then the integer to binary:

``````# integer to binary
def d2b(n):
    bStr = ''
    if n < 0: raise ValueError("must be a non-negative integer")
    if n == 0: return '0'
    while n > 0:
        bStr = str(n % 2) + bStr
        n = n >> 1
    return bStr

# hex to binary
def h2b(hex):
    return d2b(int(hex,16))
``````

## Answer 11

Another way:

``````def hextobinary(hex_string):
    s = int(hex_string, 16)
    # bit_length() is exact; ceil(log2(s)) undercounts for powers of two
    num_digits = max(s.bit_length(), 1)
    digit_lst = ['0'] * num_digits
    idx = num_digits
    while s > 0:
        idx -= 1
        if s % 2 == 1: digit_lst[idx] = '1'
        s = s // 2  # integer division on both Python 2 and 3
    return ''.join(digit_lst)

print(hextobinary('abc123efff'))
``````

## Answer 12

I added the calculation for the number of bits to fill to Onedinkenedi’s solution. Here is the resulting function:

``````def hextobin(h):
return bin(int(h, 16))[2:].zfill(len(h) * 4)
``````

Where 16 is the base you’re converting from (hexadecimal), and 4 is how many bits you need to represent each digit, or log base 2 of the scale.

## Answer 13

``````# Python 2 syntax (print statements, integer "/" division)
def conversion(e):  # e: upper-case hex string; a parameter here because the original never defined it
    e1=("a","b","c","d","e","f")
    e2=(10,11,12,13,14,15)
    e3=1
    e4=len(e)
    e5=()
    while e3<=e4:
        e5=e5+(e[e3-1],)
        e3=e3+1
    print e5
    e6=1
    e8=()
    while e6<=e4:
        e7=e5[e6-1]
        if e7=="A":
            e7=10
        if e7=="B":
            e7=11
        if e7=="C":
            e7=12
        if e7=="D":
            e7=13
        if e7=="E":
            e7=14
        if e7=="F":
            e7=15
        else:
            e7=int(e7)
        e8=e8+(e7,)
        e6=e6+1
    print e8

    e9=1
    e10=len(e8)
    e11=()
    while e9<=e10:
        e12=e8[e9-1]
        a1=e12
        a2=()
        a3=1
        while a3<=1:
            a4=a1%2
            a2=a2+(a4,)
            a1=a1/2
            if a1<2:
                if a1==1:
                    a2=a2+(1,)
                if a1==0:
                    a2=a2+(0,)
                a3=a3+1
        a5=len(a2)
        a6=1
        a7=""
        a56=a5
        while a6<=a5:
            a7=a7+str(a2[a56-1])
            a6=a6+1
            a56=a56-1
        if a5<=3:
            if a5==1:
                a8="000"
                a7=a8+a7
            if a5==2:
                a8="00"
                a7=a8+a7
            if a5==3:
                a8="0"
                a7=a8+a7
        else:
            a7=a7
        print a7,
        e9=e9+1
``````

## Answer 14

I have a short snippet; I hope it helps 🙂

``````input = 'ABC123EFFF'
for index, value in enumerate(input):
    print(value)
    print(bin(int(value,16)+16)[3:])

string = ''.join([bin(int(x,16)+16)[3:] for y,x in enumerate(input)])
print(string)
``````

First I take your input and enumerate it to get each symbol. Then I convert each symbol to binary and keep everything from index 3 to the end. The trick to keep the leading zeros is to add 16 to each digit: the result always has five bits with a leading 1, so dropping `0b1` leaves exactly four bits 🙂

The short form uses the join method. Enjoy.

## Answer 15

``````# Python Program - Convert Hexadecimal to Binary
hexdec = input("Enter Hexadecimal string: ")
print(hexdec, " in Binary = ", end="")    # end is by default "\n" which prints a new line
for _hex in hexdec:
    dec = int(_hex, 16)    # 16 means base-16, which is hexadecimal
    print(bin(dec)[2:].rjust(4, "0"), end="")    # [2:] skips the 0b prefix; rjust pads each digit to 4 bits
``````

## Answer 16

The binary version of ABC123EFFF is actually 1010101111000001001000111110111111111111

For almost all applications you want the binary version to have a length that is a multiple of 4 with leading padding of 0s.

To get this in Python:

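``````def hex_to_binary( hex_code ):
    bin_code = bin( hex_code )[2:]
    # Number of zeros needed to reach the next multiple of 4 digits
    padding = (4 - len(bin_code) % 4) % 4
    return '0'*padding + bin_code
``````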

Example 1:

``````>>> hex_to_binary( 0xABC123EFFF )
'1010101111000001001000111110111111111111'
``````

Example 2:

``````>>> hex_to_binary( 0x7123 )
'0111000100100011'
``````

Note that this also works in Micropython 🙂

## Answer 17

Just use the module coden (note: I am the author of the module).

You can convert hexadecimal to binary with it.

1. Install using pip
``````pip install coden
``````
1. Convert
``````a_hexadecimal_number = "f1ff"
``````

The conversion keywords are:

• hex for hexadecimal
• bin for binary
• int for decimal
• _to_ : the converting keyword for the function

So you can also write, e.g., `hexadecimal_output = bin_to_hex(a_binary_number)`

## Answer 18

``````HEX_TO_BINARY_CONVERSION_TABLE = {'0': '0000',
                                  '1': '0001',
                                  '2': '0010',
                                  '3': '0011',
                                  '4': '0100',
                                  '5': '0101',
                                  '6': '0110',
                                  '7': '0111',
                                  '8': '1000',
                                  '9': '1001',
                                  'a': '1010',
                                  'b': '1011',
                                  'c': '1100',
                                  'd': '1101',
                                  'e': '1110',
                                  'f': '1111'}

def hex_to_binary(hex_string):
    binary_string = ""
    for character in hex_string:
        binary_string += HEX_TO_BINARY_CONVERSION_TABLE[character]
    return binary_string
``````

## Answer 19

``````# Python 2 (raw_input, print statement)
a = raw_input('hex number\n')
length = len(a)
ab = bin(int(a, 16))[2:]
while len(ab)<(length * 4):
    ab = '0' + ab
print ab
``````

## Answer 20

``````hexa_input = input('Enter hex String to convert to Binary: ')
pad_bits = len(hexa_input) * 4    # 4 bits per hex digit
Integer_output = int(hexa_input, 16)
Binary_output = bin(Integer_output)[2:].zfill(pad_bits)
print(Binary_output)
"""zfill(x) i.e. x no of 0 s to be padded left - Integers will overwrite 0 s
starting from right side but remaining 0 s will display till quantity x
[y:] where y is no of output chars which need to destroy starting from left"""
``````

## Answer 21

``````no=raw_input("Enter your number in hexa decimal :")
def convert(a):
    if a=="0":
        c="0000"
    elif a=="1":
        c="0001"
    elif a=="2":
        c="0010"
    elif a=="3":
        c="0011"
    elif a=="4":
        c="0100"
    elif a=="5":
        c="0101"
    elif a=="6":
        c="0110"
    elif a=="7":
        c="0111"
    elif a=="8":
        c="1000"
    elif a=="9":
        c="1001"
    elif a=="A":
        c="1010"
    elif a=="B":
        c="1011"
    elif a=="C":
        c="1100"
    elif a=="D":
        c="1101"
    elif a=="E":
        c="1110"
    elif a=="F":
        c="1111"
    else:
        c="invalid"
    return c

a=len(no)
b=0
l=""
while b<a:
    l=l+convert(no[b])
    b+=1
print l
``````

# Reading binary files with Python

## Question: Reading binary files with Python

I find reading binary files with Python particularly difficult. Can you give me a hand? I need to read this file, which in Fortran 90 is easily read by

``````int*4 n_particles, n_groups
real*4 group_id(n_particles)
read (*) n_particles, n_groups
``````

In detail, the file format is:

``````Bytes 1-4 -- The integer 8.
Bytes 5-8 -- The number of particles, N.
Bytes 9-12 -- The number of groups.
Bytes 13-16 -- The integer 8.
Bytes 17-20 -- The integer 4*N.
Next many bytes -- The group ID numbers for all the particles.
Last 4 bytes -- The integer 4*N.
``````

How can I read this with Python? I tried everything but it never worked. Is there any chance I might use a f90 program in python, reading this binary file and then save the data that I need to use?

## Answer 0

Read the binary file content like this:

``````with open(fileName, mode='rb') as file: # b is important -> binary
    fileContent = file.read()
``````

then “unpack” binary data using struct.unpack:

The start bytes: `struct.unpack("iiiii", fileContent[:20])`

The body: ignore the 20 header bytes and the 4 trailing bytes (24 bytes in total). The remaining part forms the body; an integer division of its length by 4 gives the number of 4-byte values in it. The string `'i'` is repeated that many times to create the correct format for the unpack method:

``````struct.unpack("i" * ((len(fileContent) -24) // 4), fileContent[20:-4])
``````

The end byte: `struct.unpack("i", fileContent[-4:])`
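Putting the three pieces together, a rough sketch could look like the following. It assumes native byte order and 4-byte record markers as in the snippets above, and it reads the body as floats (`f`) because the question declares `group_id` as `real*4`; if the IDs are really stored as integers, `i` is the right code instead. The function name `read_group_file` and the sample bytes are made up for illustration:

```python
import struct

def read_group_file(content):
    """Parse the record layout described in the question from a bytes object."""
    # Header: marker (8), n_particles, n_groups, marker (8), body length (4*N)
    head, n_particles, n_groups, tail, body_len = struct.unpack("iiiii", content[:20])
    if head != 8 or tail != 8 or body_len != 4 * n_particles:
        raise ValueError("unexpected record markers")
    # Body: N four-byte values (assumed real*4, hence the "f" format code)
    group_ids = struct.unpack("%df" % n_particles, content[20:20 + body_len])
    # Trailing marker repeats the body length
    (end_marker,) = struct.unpack("i", content[20 + body_len:24 + body_len])
    if end_marker != body_len:
        raise ValueError("trailing marker does not match")
    return n_particles, n_groups, group_ids

# Synthetic sample with N = 3 particles and 2 groups, built only for this demo
sample = (struct.pack("iiiii", 8, 3, 2, 8, 12)
          + struct.pack("3f", 1.0, 0.0, 2.0)
          + struct.pack("i", 12))
print(read_group_file(sample))
```

Checking the record markers up front is a cheap way to notice a byte-order mismatch: a file written on a machine with the opposite endianness would not yield 8 in the first field.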

## Answer 1

In general, I would recommend that you look into using Python’s struct module for this. It’s standard with Python, and it should be easy to translate your question’s specification into a formatting string suitable for `struct.unpack()`.

Do note that if there’s “invisible” padding between/around the fields, you will need to figure that out and include it in the `unpack()` call, or you will read the wrong bits.

Reading the contents of the file in order to have something to unpack is pretty trivial:

``````import struct

data = open("from_fortran.bin", "rb").read()

# "@II" describes exactly 8 bytes, so pass only the first 8 bytes of the file
(eight, N) = struct.unpack("@II", data[:8])
``````

This unpacks the first two fields, assuming they start at the very beginning of the file (no padding or extraneous data), and also assuming native byte-order (the `@` symbol). The `I`s in the formatting string mean “unsigned integer, 32 bits”.

## Answer 2

You could use `numpy.fromfile`, which can read data from both text and binary files. You would first construct a data type, which represents your file format, using `numpy.dtype`, and then read this type from file using `numpy.fromfile`.

## Answer 3

To read a binary file to a `bytes` object:

``````from pathlib import Path
data = Path('/path/to/file').read_bytes()  # Python 3.5+
``````

To create an `int` from bytes 0-3 of the data:

``````i = int.from_bytes(data[:4], byteorder='little', signed=False)
``````

To unpack multiple `int`s from the data:

``````import struct
ints = struct.unpack('iiii', data[:16])
``````
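The format string `'iiii'` uses the machine's native byte order and sizes. When the file's byte order is known, it can be stated explicitly; the sketch below assumes little-endian data and uses made-up sample bytes:

```python
import struct

# Made-up sample: four little-endian 32-bit integers (8, 5, 3, 8)
data = bytes([8, 0, 0, 0, 5, 0, 0, 0, 3, 0, 0, 0, 8, 0, 0, 0])

# "<" forces little-endian with standard sizes, so "4i" is always 16 bytes
fmt = "<4i"
size = struct.calcsize(fmt)
ints = struct.unpack(fmt, data[:size])
print(size, ints)

# The same first field read with int.from_bytes
first = int.from_bytes(data[:4], byteorder="little", signed=True)
print(first)
```

Using `struct.calcsize` to slice the input keeps the format string and the slice length from drifting apart when the format changes.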

## Answer 4

I too found Python lacking when it comes to reading and writing binary files, so I wrote a small module (for Python 3.6+).

With binaryfile you’d do something like this (I’m guessing, since I don’t know Fortran):

``````import binaryfile

def particle_file(f):
    f.array('group_ids')  # Declare group_ids to be an array (so we can use it in a loop)
    f.skip(4)  # Bytes 1-4
    num_particles = f.count('num_particles', 'group_ids', 4)  # Bytes 5-8
    f.int('num_groups', 4)  # Bytes 9-12
    f.skip(8)  # Bytes 13-20
    for i in range(num_particles):
        f.struct('group_ids', '>f')  # 4 bytes x num_particles
    f.skip(4)

with open('myfile.bin', 'rb') as fh:
    result = binaryfile.read(fh, particle_file)
print(result)
``````

Which produces an output like this:

``````{
'group_ids': [(1.0,), (0.0,), (2.0,), (0.0,), (1.0,)],
'__skipped': [b'\x00\x00\x00\x08', b'\x00\x00\x00\x08\x00\x00\x00\x14', b'\x00\x00\x00\x14'],
'num_particles': 5,
'num_groups': 3
}
``````

I used skip() to skip the additional data Fortran adds, but you may want to add a utility to handle Fortran records properly instead. If you do, a pull request would be welcome.

## Answer 5

``````import pickle
f=open("filename.dat","rb")
try:
    while True:
        x=pickle.load(f)  # raises EOFError once the file is exhausted
        print(x)
except EOFError:
    pass
f.close()
``````

# Bitwise operation and usage

## Question: Bitwise operation and usage

Consider this code:

``````x = 1        # 0001
x << 2       # Shift left 2 bits: 0100
# Result: 4

x | 2        # Bitwise OR: 0011
# Result: 3

x & 1        # Bitwise AND: 0001
# Result: 1
``````

I can understand the arithmetic operators in Python (and other languages), but I never understood ‘bitwise’ operators quite well. In the above example (from a Python book), I understand the left-shift but not the other two.

Also, what are bitwise operators actually used for? I’d appreciate some examples.

## Answer 0

Bitwise operators are operators that work on multi-bit values, but conceptually one bit at a time.

• `AND` is 1 only if both of its inputs are 1, otherwise it’s 0.
• `OR` is 1 if one or both of its inputs are 1, otherwise it’s 0.
• `XOR` is 1 only if exactly one of its inputs is 1, otherwise it’s 0.
• `NOT` is 1 only if its input is 0, otherwise it’s 0.

These can often be best shown as truth tables. Input possibilities are on the top and left, the resultant bit is one of the four (two in the case of NOT since it only has one input) values shown at the intersection of the inputs.

``````AND | 0 1     OR | 0 1     XOR | 0 1    NOT | 0 1
----+-----    ---+----     ----+----    ----+----
 0  | 0 0      0 | 0 1       0 | 0 1        | 1 0
 1  | 0 1      1 | 1 1       1 | 1 0
``````

One example is if you only want the lower 4 bits of an integer, you AND it with 15 (binary 1111) so:

``````    201: 1100 1001
AND  15: 0000 1111
------------------
IS   9  0000 1001
``````

The zero bits in 15 in that case effectively act as a filter, forcing the bits in the result to be zero as well.

In addition, `>>` and `<<` are often included as bitwise operators, and they “shift” a value respectively right and left by a certain number of bits, throwing away bits that roll off the end you’re shifting towards, and feeding in zero bits at the other end.

So, for example:

``````1001 0101 >> 2 gives 0010 0101
1111 1111 << 4 gives 1111 0000
``````

Note that the left shift in Python is unusual in that it’s not using a fixed width where bits are discarded – while many languages use a fixed width based on the data type, Python simply expands the width to cater for extra bits. In order to get the discarding behaviour in Python, you can follow a left shift with a bitwise `and` such as in an 8-bit value shifting left four bits:

``````bits8 = (bits8 << 4) & 255
``````

With that in mind, another example of bitwise operators is if you have two 4-bit values that you want to pack into an 8-bit one, you can use all three of your operators (`left-shift`, `and` and `or`):

``````packed_val = ((val1 & 15) << 4) | (val2 & 15)
``````
• The `& 15` operation will make sure that both values only have the lower 4 bits.
• The `<< 4` is a 4-bit shift left to move `val1` into the top 4 bits of an 8-bit value.
• The `|` simply combines these two together.

If `val1` is 7 and `val2` is 4:

``````                val1            val2
                ====            ====
 & 15 (and)   xxxx-0111       xxxx-0100  & 15
 << 4 (left)  0111-0000           |
                  |               |
                  +-------+-------+
                          |
| (or)                0111-0100
``````
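
The packing step can be undone with the same operators. Here is a minimal Python 3 sketch; `pack_nibbles` and `unpack_nibbles` are illustrative names, not something from the answer above:

```python
def pack_nibbles(val1, val2):
    """Pack two 4-bit values into one 8-bit value (val1 in the high nibble)."""
    return ((val1 & 15) << 4) | (val2 & 15)


def unpack_nibbles(packed):
    """Split an 8-bit value back into its high and low 4-bit halves."""
    return (packed >> 4) & 15, packed & 15


packed = pack_nibbles(7, 4)
print(format(packed, '08b'))   # 01110100
print(unpack_nibbles(packed))  # (7, 4)
```

Because of the `& 15` masks, any bits above the low four in either input are silently dropped rather than leaking into the other half.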

## Answer 1

One typical usage:

`|` is used to set a certain bit to 1

`&` is used to test or clear a certain bit

• Set a bit (where n is the bit number, and 0 is the least significant bit):

`a |= (1 << n);`

• Clear a bit:

`b &= ~(1 << n);`

• Toggle a bit:

`c ^= (1 << n);`

• Test a bit:

`unsigned char e = d & (1 << n);`

Take the case of your list for example:

`x | 2` is used to set bit 1 of `x` to 1

`x & 1` is used to test if bit 0 of `x` is 1 or 0
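
Since the question is about Python, the same four operations might be written like this. This is only a sketch, and the helper names are made up for illustration:

```python
def set_bit(value, n):
    """Return value with bit n forced to 1."""
    return value | (1 << n)


def clear_bit(value, n):
    """Return value with bit n forced to 0."""
    return value & ~(1 << n)


def toggle_bit(value, n):
    """Return value with bit n flipped."""
    return value ^ (1 << n)


def is_bit_set(value, n):
    """Return True if bit n of value is 1."""
    return bool((value >> n) & 1)


x = 1
print(set_bit(x, 1))      # 3, same as x | 2
print(is_bit_set(x, 0))   # True, same test as x & 1
print(clear_bit(3, 0))    # 2
print(toggle_bit(2, 1))   # 0
```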

## Answer 2

what are bitwise operators actually used for? I’d appreciate some examples.

One of the most common uses of bitwise operations is for parsing hexadecimal colours.

For example, here’s a Python function that accepts a String like `#FF09BE` and returns a tuple of its Red, Green and Blue values.

``````def hexToRgb(value):
    # Convert string to hexadecimal number (base 16)
    num = (int(value.lstrip("#"), 16))

    # Shift 16 bits to the right, and then binary AND to obtain 8 bits representing red
    r = ((num >> 16) & 0xFF)

    # Shift 8 bits to the right, and then binary AND to obtain 8 bits representing green
    g = ((num >> 8) & 0xFF)

    # Simply binary AND to obtain 8 bits representing blue
    b = (num & 0xFF)
    return (r, g, b)
``````

I know that there are more efficient ways to achieve this, but I believe that this is a really concise example illustrating both shifts and bitwise boolean operations.
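
Going the other way uses left shifts and OR instead. The function below is a sketch of that reverse direction; `rgb_to_hex` is a hypothetical name and is not part of the answer above:

```python
def rgb_to_hex(r, g, b):
    """Pack three 0-255 channel values into a '#RRGGBB' string."""
    # Mask each channel to 8 bits, then move it to its position and combine.
    num = ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF)
    return '#{:06X}'.format(num)


print(rgb_to_hex(255, 9, 190))  # #FF09BE
```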

## Answer 3

I think that the second part of the question:

Also, what are bitwise operators actually used for? I’d appreciate some examples.

Has been only partially addressed. These are my two cents on that matter.

Bitwise operations in programming languages play a fundamental role when dealing with a lot of applications. Almost all low-level computing must be done using this kind of operations.

In all applications that need to send data between two nodes, such as:

• computer networks;

• telecommunication applications (cellular phones, satellite communications, etc).

At the lowest layers of communication, the data is usually sent in what are called frames. Frames are just strings of bytes that are sent through a physical channel. These frames usually contain the actual data plus some other fields (coded in bytes) that are part of what is called the header. The header usually contains bytes that encode some information related to the status of the communication (e.g., with flags (bits)), frame counters, correction and error detection codes, etc. To get the transmitted data out of a frame, and to build the frames to send data, you will certainly need bitwise operations.

In general, when dealing with that kind of applications, an API is available so you don’t have to deal with all those details. For example, all modern programming languages provide libraries for socket connections, so you don’t actually need to build the TCP/IP communication frames. But think about the good people that programmed those APIs for you, they had to deal with frame construction for sure; using all kinds of bitwise operations to go back and forth from the low-level to the higher-level communication.

As a concrete example, imagine someone gives you a file that contains raw data captured directly by telecommunication hardware. In this case, in order to find the frames, you will need to read the raw bytes in the file and try to find some kind of synchronization words by scanning the data bit by bit. After identifying the synchronization words, you will need to get the actual frames, and SHIFT them if necessary (and that is just the start of the story) to get the actual data that is being transmitted.

Another very different low-level family of applications is when you need to control hardware using some (kind of ancient) ports, such as parallel and serial ports. These ports are controlled by setting some bytes, and each bit of those bytes has a specific meaning, in terms of instructions, for that port (see for instance http://en.wikipedia.org/wiki/Parallel_port). If you want to build software that does something with that hardware, you will need bitwise operations to translate the instructions you want to execute into the bytes that the port understands.

For example, if you have some physical buttons connected to the parallel port to control some other device, this is a line of code that you might find in the software application:

``````read = ((read ^ 0x80) >> 4) & 0x0f;
``````
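
Read step by step, that expression flips bit 7, moves the upper four bits down, and keeps only those four bits. The sketch below restates it in Python; `decode_status` and the sample input bytes are illustrative assumptions and do not describe any particular device:

```python
def decode_status(read):
    """Restate ((read ^ 0x80) >> 4) & 0x0f one operation at a time."""
    flipped = read ^ 0x80      # toggle bit 7, leave the other bits alone
    shifted = flipped >> 4     # move bits 4-7 down into bits 0-3
    return shifted & 0x0F      # keep only the low four bits


for raw in (0x80, 0x00, 0xF0):
    print(format(raw, '08b'), '->', format(decode_status(raw), '04b'))
```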

Hope this contributes.

## Answer 4

I hope this clarifies those two:

``````x | 2

0001 //x
0010 //2

0011 //result = 3
``````

``````x & 1

0001 //x
0001 //1

0001 //result = 1
``````

## Answer 5

Think of 0 as false and 1 as true. Then bitwise and (`&`) and or (`|`) work just like regular `and` and `or`, except that they operate on all of the bits in the value at once. Typically you will see them used for flags: if you have 30 options that can be set (say, draw styles on a window), you don’t want to pass in 30 separate boolean values to set or unset each one, so you use `|` to combine options into a single value and then use `&` to check whether each option is set. This style of flag passing is heavily used by OpenGL. Since each bit is a separate flag, the flag values are powers of two (numbers that have only one bit set): 1 (2^0), 2 (2^1), 4 (2^2), 8 (2^3). The power of two tells you which bit is set if the flag is on.

Also note that 2 is binary 10, so x|2 is 110 (6), not 111 (7). If none of the bits overlap (which is true in this case), | acts like addition.
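
The flag pattern could look something like this in Python. The `STYLE_*` names and `has_flag` are invented for the example and are not taken from OpenGL or any real API:

```python
# Hypothetical draw-style flags; each one occupies its own bit.
STYLE_BORDER = 1   # 2**0
STYLE_SHADOW = 2   # 2**1
STYLE_FILLED = 4   # 2**2
STYLE_DASHED = 8   # 2**3


def has_flag(options, flag):
    """Return True if every bit of `flag` is set in `options`."""
    return (options & flag) == flag


options = STYLE_BORDER | STYLE_FILLED        # combine two options into one value
print(has_flag(options, STYLE_FILLED))       # True
print(has_flag(options, STYLE_SHADOW))       # False
# No bits overlap, so | gives the same result as +.
print(options == STYLE_BORDER + STYLE_FILLED)  # True
```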

## Answer 6

I didn’t see it mentioned above but you will also see some people use left and right shift for arithmetic operations. A left shift by x is equivalent to multiplying by 2^x (as long as it doesn’t overflow) and a right shift is equivalent to dividing by 2^x.

Recently I’ve seen people using x << 1 and x >> 1 for doubling and halving, although I’m not sure if they are just trying to be clever or if there really is a distinct advantage over the normal operators.
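
A quick check of that equivalence in Python, where integers never overflow. Note that the right shift matches floor division (`//`), which differs from truncating division for negative numbers:

```python
for x in (1, 5, 12, -7):
    assert x << 3 == x * 2 ** 3   # left shift by 3 multiplies by 2**3
    assert x >> 1 == x // 2       # right shift by 1 is floor division by 2

print(-7 >> 1)       # -4 (rounds toward negative infinity, like -7 // 2)
print(int(-7 / 2))   # -3 (truncates toward zero instead)
```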

## Answer 7

Sets

Sets can be combined using mathematical operations.

• The union operator `|` combines two sets to form a new one containing items in either.
• The intersection operator `&` gets items only in both.
• The difference operator `-` gets items in the first set but not in the second.
• The symmetric difference operator `^` gets items in either set, but not both.

Try It Yourself:

``````first = {1, 2, 3, 4, 5, 6}
second = {4, 5, 6, 7, 8, 9}

print(first | second)

print(first & second)

print(first - second)

print(second - first)

print(first ^ second)
``````

Result:

``````{1, 2, 3, 4, 5, 6, 7, 8, 9}

{4, 5, 6}

{1, 2, 3}

{8, 9, 7}

{1, 2, 3, 7, 8, 9}
``````

## Answer 8

This example will show you the operations for all four 2 bit values:

``````10 | 12

1010 #decimal 10
1100 #decimal 12

1110 #result = 14
``````

``````10 & 12

1010 #decimal 10
1100 #decimal 12

1000 #result = 8
``````

Here is one example of usage:

``````x = int(raw_input('Enter a number:'))  # raw_input returns a string, so convert it first
print 'x is %s.' % ('even', 'odd')[x&1]
``````

## Answer 9

Another common use-case is manipulating/testing file permissions. See the Python stat module: http://docs.python.org/library/stat.html.

For example, to compare a file’s permissions to a desired permission set, you could do something like:

``````import os
import stat

#Get the actual mode of a file
mode = os.stat('file.txt').st_mode

#File should be a regular file, readable and writable by its owner
#Each permission value has a single 'on' bit.  Use bitwise or to combine
#them.
desired_mode = stat.S_IFREG|stat.S_IRUSR|stat.S_IWUSR

#check for exact match:
mode == desired_mode
#check for at least one bit matching:
bool(mode & desired_mode)
#check for at least one bit 'on' in one, and not in the other:
bool(mode ^ desired_mode)
#check that all bits from desired_mode are set in mode, but I don't care about
# other bits.
not bool((mode^desired_mode)&desired_mode)
``````

I cast the results as booleans, because I only care about the truth or falsehood, but it would be a worthwhile exercise to print out the bin() values for each one.

## Answer 10

Bit representations of integers are often used in scientific computing to represent arrays of true-false information because a bitwise operation is much faster than iterating through an array of booleans. (Higher level languages may use the idea of a bit array.)

A nice and fairly simple example of this is the general solution to the game of Nim. Take a look at the Python code on the Wikipedia page. It makes heavy use of bitwise exclusive or, `^`.
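
To give a flavour of it, here is a small sketch of the XOR-based strategy, assuming normal play (whoever takes the last object wins). This is my own illustration, not the code from the Wikipedia page, and the function names are made up:

```python
from functools import reduce
from operator import xor


def nim_sum(heaps):
    """XOR of all heap sizes; zero means the player to move is losing."""
    return reduce(xor, heaps, 0)


def winning_move(heaps):
    """Return (heap_index, new_size) that leaves a nim-sum of zero, or None."""
    total = nim_sum(heaps)
    if total == 0:
        return None
    for i, heap in enumerate(heaps):
        target = heap ^ total
        if target < heap:          # only moves that remove objects are legal
            return i, target


print(nim_sum([3, 4, 5]))       # 2
print(winning_move([3, 4, 5]))  # (0, 1)
```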

## Answer 11

There may be a better way to find where an array element is between two values, but as this example shows, the & works here, whereas and does not.

``````import numpy as np
a=np.array([1.2, 2.3, 3.4])
np.where((a>2) and (a<3))
#Result: Value Error
np.where((a>2) & (a<3))
#Result: (array([1]),)
``````

## Answer 12

I didn’t see it mentioned: this example shows the subtraction (`-`) operation on small bit values, A - B, which only removes bits cleanly if A contains all the bits of B.

This operation is needed when a variable in our program holds a set of bits. Sometimes we need to add bits (as above) and sometimes we need to remove bits (if the variable contains them).

``````111 #decimal 7
-
100 #decimal 4
--------------
011 #decimal 3
``````

With Python: `7 & ~4 = 3` (remove from 7 the bits that represent 4).

``````001 #decimal 1
-
100 #decimal 4
--------------
001 #decimal 1
``````

With Python: `1 & ~4 = 1` (remove from 1 the bits that represent 4; in this case 1 does not contain 4).

## Answer 13

Whilst manipulating bits of an integer is useful, often for network protocols, which may be specified down to the bit, one can require manipulation of longer byte sequences (which aren’t easily converted into one integer). In this case it is useful to employ the bitstring library which allows for bitwise operations on data – e.g. one can import the string ‘ABCDEFGHIJKLMNOPQ’ as a string or as hex and bit shift it (or perform other bitwise operations):

``````>>> import bitstring
>>> bitstring.BitArray(bytes='ABCDEFGHIJKLMNOPQ') << 4
BitArray('0x142434445464748494a4b4c4d4e4f50510')
>>> bitstring.BitArray(hex='0x4142434445464748494a4b4c4d4e4f5051') << 4
BitArray('0x142434445464748494a4b4c4d4e4f50510')
``````

## Answer 14

the following bitwise operators: &, |, ^, and ~ return values (based on their input) in the same way logic gates affect signals. You could use them to emulate circuits.

## Answer 15

To flip bits (i.e. 1’s complement/invert) you can do the following:

Since a value XORed with all 1s is inverted, for a given bit width you can use XOR to invert the bits.

``````# In binary:
#   a         = 1010   (0xA, decimal 10)
#   b         = 1111   (0xF, decimal 15)
#   c = b ^ a = 0101   (0x5, decimal 5)

# In Python:
a = 10
b = 15
c = a ^ b      # 0b0101
print(bin(c))  # gives '0b101'
``````
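
For an arbitrary width, the all-ones mask can be built with a shift. A small sketch follows; `invert_bits` is just an illustrative name:

```python
def invert_bits(value, width):
    """Flip the lowest `width` bits of `value` by XOR-ing with an all-ones mask."""
    mask = (1 << width) - 1    # e.g. width=4 -> 0b1111
    return value ^ mask


print(format(invert_bits(0b1010, 4), '04b'))  # 0101
print(~0b1010)                                # -11
```

The second line shows why the mask is useful: Python integers have no fixed width, so the plain `~` operator gives a negative number rather than a 4-bit result.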

# Convert to binary and keep leading zeros in Python

## Question: Convert to binary and keep leading zeros in Python

I’m trying to convert an integer to binary using the bin() function in Python. However, it always removes the leading zeros, which I actually need, such that the result is always 8-bit:

Example:

``````bin(1) -> 0b1

# What I would like:
bin(1) -> 0b00000001
``````

Is there a way of doing this?

## Answer 0

Use the `format()` function:

``````>>> format(14, '#010b')
'0b00001110'
``````

The `format()` function simply formats the input following the Format Specification mini language. The `#` makes the format include the `0b` prefix, and the `010` size formats the output to fit in 10 characters width, with `0` padding; 2 characters for the `0b` prefix, the other 8 for the binary digits.

This is the most compact and direct option.

If you are putting the result in a larger string, use a formatted string literal (3.6+) or use `str.format()` and put the second argument for the `format()` function after the colon of the placeholder `{:..}`:

``````>>> value = 14
>>> f'The produced output, in binary, is: {value:#010b}'
'The produced output, in binary, is: 0b00001110'
>>> 'The produced output, in binary, is: {:#010b}'.format(value)
'The produced output, in binary, is: 0b00001110'
``````

As it happens, even for just formatting a single value (so without putting the result in a larger string), using a formatted string literal is faster than using `format()`:

``````>>> import timeit
>>> timeit.timeit("f_(v, '#010b')", "v = 14; f_ = format")  # use a local for performance
0.40298633499332936
>>> timeit.timeit("f'{v:#010b}'", "v = 14")
0.2850222919951193
``````

But I’d use that only if performance in a tight loop matters, as `format(...)` communicates the intent better.

If you did not want the `0b` prefix, simply drop the `#` and adjust the length of the field:

``````>>> format(14, '08b')
'00001110'
``````

## Answer 1

``````>>> '{:08b}'.format(1)
'00000001'
``````

Note for Python 2.6 or older, you cannot omit the positional argument identifier before `:`, so use

``````>>> '{0:08b}'.format(1)
'00000001'
``````

## Answer 2

I am using

``````bin(1)[2:].zfill(8)
``````

will print

``````'00000001'
``````

## Answer 3

You can use the string formatting mini language:

``````def binary(num, pre='0b', length=8, spacer=0):
    return '{0}{{:{1}>{2}}}'.format(pre, spacer, length).format(bin(num)[2:])
``````

Demo:

``````print binary(1)
``````

Output:

``````'0b00000001'
``````

EDIT: based on @Martijn Pieters idea

``````def binary(num, length=8):
    return format(num, '#0{}b'.format(length + 2))
``````

## Answer 4

When using Python `>= 3.6`, the cleanest way is to use f-strings with string formatting:

``````>>> var = 23
>>> f"{var:#010b}"
'0b00010111'
``````

Explanation:

• `var` the variable to format
• `:` everything after this is the format specifier
• `#` use the alternative form (adds the `0b` prefix)
• `0` pad with zeros
• `10` pad to a total length of 10 (this includes the 2 chars for `0b`)
• `b` use binary representation for the number

## Answer 5

Sometimes you just want a simple one liner:

``````binary = ''.join(['{0:08b}'.format(ord(x)) for x in input])
``````

Python 3

## Answer 6

You can use something like this

``````("{:0%db}"%length).format(num)
``````
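
The same variable-width idea can also be written without `%`, by nesting the width inside the format spec. A minimal sketch, where `to_binary` is a name made up for the example:

```python
def to_binary(num, length):
    """Zero-pad the binary form of `num` to at least `length` digits."""
    return format(num, '0{}b'.format(length))


width = 8
print(to_binary(5, width))    # 00000101
print(f'{5:0{width}b}')       # 00000101 (f-string form, Python 3.6+)
print(to_binary(300, width))  # 100101100
```

As the last line shows, the width is a minimum: a number that needs more digits is not truncated.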

## Answer 7

You can use the `rjust` string method. Syntax: `string.rjust(length, fillchar)`; `fillchar` is optional.

For your question you can write it like this:

``````'0b' + '1'.rjust(8, '0')
``````

so it will be ‘0b00000001’.

## Answer 8

You can use zfill:

``````print str(1).zfill(2)
print str(10).zfill(2)
print str(100).zfill(2)
``````

prints:

``````01
10
100
``````

I like this solution, as it helps not only when outputting the number, but when you need to assign it to a variable… e.g. – x = str(datetime.date.today().month).zfill(2) will return x as ’02’ for the month of feb.

# Converting integer to binary in Python

## Question: Converting integer to binary in Python

In order to convert an integer to a binary, I have used this code :

``````>>> bin(6)
'0b110'
``````

and when to erase the ‘0b’, I use this :

``````>>> bin(6)[2:]
'110'
``````

What can I do if I want to show `6` as `00000110` instead of `110`?

## Answer 0

``````>>> '{0:08b}'.format(6)
'00000110'
``````

Just to explain the parts of the formatting string:

• `{}` places a variable into a string
• `0` takes the variable at argument position 0
• `:` adds formatting options for this variable (otherwise it would represent decimal `6`)
• `08` formats the number to eight digits zero-padded on the left
• `b` converts the number to its binary representation

If you’re using a version of Python 3.6 or above, you can also use f-strings:

``````>>> f'{6:08b}'
'00000110'
``````

## Answer 1

Just another idea:

``````>>> bin(6)[2:].zfill(8)
'00000110'
``````

Shorter way via string interpolation (Python 3.6+):

``````>>> f'{6:08b}'
'00000110'
``````

## Answer 2

A bit twiddling method…

``````>>> bin8 = lambda x : ''.join(reversed( [str((x >> i) & 1) for i in range(8)] ) )
>>> bin8(6)
'00000110'
>>> bin8(-3)
'11111101'
``````

## Answer 3

Just use the format function

``````format(6, "08b")
``````

The general form is

``````format(<the_integer>, "<0><width_of_string><format_specifier>")
``````

## Answer 4

eumiro’s answer is better, however I’m just posting this for variety:

``````>>> "%08d" % int(bin(6)[2:])
'00000110'
``````

## Answer 5

.. or if you’re not sure it should always be 8 digits, you can pass it as a parameter:

``````>>> '%0*d' % (8, int(bin(6)[2:]))
'00000110'
``````

## Answer 6

Going Old School always works

``````def intoBinary(number):
    binarynumber = ""
    if number != 0:
        while number >= 1:
            if number % 2 == 0:
                binarynumber = binarynumber + "0"
                number = number // 2  # integer division, so this also works on Python 3
            else:
                binarynumber = binarynumber + "1"
                number = (number - 1) // 2
    else:
        binarynumber = "0"

    return "".join(reversed(binarynumber))
``````

## Answer 7

### `numpy.binary_repr(num, width=None)` has a magic width argument

Relevant examples from the documentation linked above:

``````>>> np.binary_repr(3, width=4)
'0011'
``````

The two’s complement is returned when the input number is negative and width is specified:

``````>>> np.binary_repr(-3, width=5)
'11101'
``````
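
If NumPy is not available, a similar fixed-width two's complement string can be produced by masking. This is a sketch of the idea, not a drop-in replacement for `binary_repr`; `twos_complement` is a made-up name, and values that don't fit in `width` bits are silently truncated here:

```python
def twos_complement(num, width):
    """Binary string of `num` as a `width`-bit two's-complement value."""
    mask = (1 << width) - 1          # e.g. width=5 -> 0b11111
    return format(num & mask, '0{}b'.format(width))


print(twos_complement(3, 4))    # 0011
print(twos_complement(-3, 5))   # 11101
```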

## Answer 8

``````('0' * 7 + bin(6)[2:])[-8:]
``````

or

``````right_side = bin(6)[2:]
'0' * ( 8 - len( right_side )) + right_side
``````

## Answer 9

Assuming you want to take the number of digits from a variable rather than a constant, a good way is to use `numpy.binary_repr`.

This could be useful when you apply binary representations to power sets.

``````import numpy as np
np.binary_repr(6, width=8)
``````

## Answer 10

even an easier way

``````my_num = 6
print(f'{my_num:b}')
``````

## Answer 11

The best way is to specify the format.

``````format(a, 'b')
``````

returns the binary value of a in string format.

To convert a binary string back to integer, use int() function.

``````int('110', 2)
``````

returns integer value of binary string.

## Answer 12

``````def int_to_bin(num, fill):
    bin_result = ''

    def int_to_binary(number):
        nonlocal bin_result
        if number > 1:
            int_to_binary(number // 2)
        bin_result = bin_result + str(number % 2)

    int_to_binary(num)
    return bin_result.zfill(fill)
``````

## Answer 13

You can use just:

``````"{0:b}".format(n)
``````

In my opinion this is the easiest way!

# How to convert a “binary string” to a normal string in Python 3?

## Question: How to convert a “binary string” to a normal string in Python 3?

For example, I have a string like this(return value of `subprocess.check_output`):

``````>>> b'a string'
b'a string'
``````

Whatever I did to it, it is always printed with the annoying `b'` before the string:

``````>>> print(b'a string')
b'a string'
>>> print(str(b'a string'))
b'a string'
``````

Does anyone have any ideas about how to use it as a normal string or convert it into a normal string?

## Answer 0

Decode it.

``````>>> b'a string'.decode('ascii')
'a string'
``````

To get bytes from string, encode it.

``````>>> 'a string'.encode('ascii')
b'a string'
``````

## Answer 1

If the answer from falsetru didn’t work you could also try:

``````>>> b'a string'.decode('utf-8')
'a string'
``````

## Answer 2

Please see the official `encode()` and `decode()` documentation from the `codecs` library. `utf-8` is the default encoding for these functions, but there are several standard encodings in Python 3, like `latin_1` or `utf_32`.
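
To illustrate why the choice of encoding matters, here is a small sketch. The sample text is arbitrary, and the results are shown with `ascii()` so the output does not depend on the terminal:

```python
raw = 'caf\u00e9'.encode('utf-8')      # two bytes (0xC3 0xA9) encode the accented letter
as_utf8 = raw.decode('utf-8')          # round-trips to the original text
as_latin1 = raw.decode('latin_1')      # wrong codec: no error, but two wrong characters

# Bytes that are invalid in the chosen encoding raise UnicodeDecodeError
# unless an error handler such as 'replace' is given.
replaced = b'abc\xff'.decode('utf-8', errors='replace')

print(ascii(as_utf8), ascii(as_latin1), ascii(replaced))
```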

# Reading a binary file and looping over each byte

## Question: Reading a binary file and looping over each byte

In Python, how do I read in a binary file and loop over each byte of that file?

## Answer 0

Python 2.4 and Earlier

``````f = open("myfile", "rb")
try:
    byte = f.read(1)
    while byte != "":
        # Do stuff with byte.
        byte = f.read(1)
finally:
    f.close()
``````

Python 2.5-2.7

``````with open("myfile", "rb") as f:
    byte = f.read(1)
    while byte != "":
        # Do stuff with byte.
        byte = f.read(1)
``````

Note that the with statement is not available in versions of Python below 2.5. To use it in v 2.5 you’ll need to import it:

``````from __future__ import with_statement
``````

In 2.6 this is not needed.

Python 3

In Python 3, it’s a bit different. We will no longer get raw characters from the stream in byte mode but byte objects, thus we need to alter the condition:

``````with open("myfile", "rb") as f:
    byte = f.read(1)
    while byte != b"":
        # Do stuff with byte.
        byte = f.read(1)
``````

Or as benhoyt says, skip the not equal and take advantage of the fact that `b""` evaluates to false. This makes the code compatible between 2.6 and 3.x without any changes. It would also save you from changing the condition if you go from byte mode to text or the reverse.

``````with open("myfile", "rb") as f:
    byte = f.read(1)
    while byte:
        # Do stuff with byte.
        byte = f.read(1)
``````

Python 3.8

From 3.8 onwards, thanks to the `:=` operator, the above code can be written in a shorter way.

``````with open("myfile", "rb") as f:
    while (byte := f.read(1)):
        # Do stuff with byte.
``````

## Answer 1

This generator yields bytes from a file, reading the file in chunks:

``````def bytes_from_file(filename, chunksize=8192):
    with open(filename, "rb") as f:
        while True:
            chunk = f.read(chunksize)
            if chunk:
                for b in chunk:
                    yield b
            else:
                break

# example:
for b in bytes_from_file('filename'):
    do_stuff_with(b)
``````

See the Python documentation for information on iterators and generators.

## Answer 2

If the file is not too big that holding it in memory is a problem:

``````with open("filename", "rb") as f:
    bytes_read = f.read()
for b in bytes_read:
    process_byte(b)
``````

where process_byte represents some operation you want to perform on the passed-in byte.

If you want to process a chunk at a time:

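``````with open("filename", "rb") as f:
    bytes_read = f.read(CHUNKSIZE)  # CHUNKSIZE: a chunk size in bytes that you choose
    while bytes_read:
        for b in bytes_read:
            process_byte(b)
        bytes_read = f.read(CHUNKSIZE)
``````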

The `with` statement is available in Python 2.5 and greater.

## Answer 3

To read a file — one byte at a time (ignoring the buffering) — you could use the two-argument `iter(callable, sentinel)` built-in function:

``````with open(filename, 'rb') as file:
    for byte in iter(lambda: file.read(1), b''):
        # Do stuff with byte
``````

It calls `file.read(1)` until it returns nothing `b''` (empty bytestring). The memory doesn’t grow unlimited for large files. You could pass `buffering=0` to `open()`, to disable the buffering — it guarantees that only one byte is read per iteration (slow).

`with`-statement closes the file automatically — including the case when the code underneath raises an exception.

Despite the presence of internal buffering by default, it is still inefficient to process one byte at a time. For example, here’s the `blackhole.py` utility that eats everything it is given:

``````#!/usr/bin/env python3
"""Discard all input. `cat > /dev/null` analog."""
import sys
from functools import partial
from collections import deque

chunksize = int(sys.argv[1]) if len(sys.argv) > 1 else (1 << 15)
deque(iter(partial(sys.stdin.detach().read, chunksize), b''), maxlen=0)
``````

Example:

``````$ dd if=/dev/zero bs=1M count=1000 | python3 blackhole.py
``````

It processes ~1.5 GB/s when `chunksize == 32768` on my machine and only ~7.5 MB/s when `chunksize == 1`. That is, it is 200 times slower to read one byte at a time. Take it into account if you can rewrite your processing to use more than one byte at a time and if you need performance.

`mmap` allows you to treat a file as a `bytearray` and a file object simultaneously. It can serve as an alternative to loading the whole file into memory if you need access to both interfaces. In particular, you can iterate one byte at a time over a memory-mapped file using a plain `for` loop:

``````from mmap import ACCESS_READ, mmap

with open(filename, 'rb', 0) as f, mmap(f.fileno(), 0, access=ACCESS_READ) as s:
    for byte in s:  # length is equal to the current file size
        pass  # Do stuff with byte
``````

`mmap` supports the slice notation. For example, `mm[i:i+len]` returns `len` bytes from the file starting at position `i`. The context manager protocol is not supported before Python 3.2; you need to call `mm.close()` explicitly in this case. Iterating over each byte using `mmap` consumes more memory than `file.read(1)`, but `mmap` is an order of magnitude faster.
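
The slice notation can be tried out with a short sketch like the one below. It writes a tiny temporary file first; the file content and the names `sample_path`, `middle`, `first` and `total` are only illustrative. It calls `close()` explicitly rather than relying on the context manager.

```python
import mmap
import os
import tempfile

# Create a small sample file (hypothetical content, for illustration only).
fd, sample_path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as out:
    out.write(b"0123456789")

with open(sample_path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    try:
        middle = mm[3:6]   # slicing returns a bytes object
        first = mm[0]      # indexing returns an int in Python 3
        total = len(mm)    # length equals the file size at mapping time
    finally:
        mm.close()         # explicit close, as needed before Python 3.2

os.remove(sample_path)
```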

## Reading binary file in Python and looping over each byte

The `pathlib` module (in the standard library since Python 3.4) gained a convenience method in Python 3.5, `Path.read_bytes`, that reads a file in as bytes, allowing us to iterate over the bytes. I consider this a decent (if quick and dirty) answer:

``````import pathlib

for byte in pathlib.Path(path).read_bytes():
    print(byte)
``````

Interesting that this is the only answer to mention `pathlib`.

In Python 2, you probably would do this (as Vinay Sajip also suggests):

``````with open(path, 'rb') as file:  # 'rb': a bare 'b' is not a valid mode
    for byte in file.read():
        print(byte)
``````

In the case that the file may be too large to iterate over in-memory, you would chunk it, idiomatically, using the `iter` function with the `callable, sentinel` signature – the Python 2 version:

``````with open(path, 'rb') as file:  # 'rb': a bare 'b' is not a valid mode
    callable = lambda: file.read(1024)
    sentinel = bytes() # or b''
    for chunk in iter(callable, sentinel):
        for byte in chunk:
            print(byte)
``````

(Several other answers mention this, but few offer a sensible read size.)

## Best practice for large files or buffered/interactive reading

Let’s create a function to do this, including idiomatic uses of the standard library for Python 3.5+:

``````from pathlib import Path
from functools import partial
from io import DEFAULT_BUFFER_SIZE

def file_byte_iterator(path):
    """given a path, return an iterator over the file
    that lazily loads the file
    """
    path = Path(path)
    with path.open('rb') as file:
        reader = partial(file.read1, DEFAULT_BUFFER_SIZE)
        file_iterator = iter(reader, bytes())
        for chunk in file_iterator:
            yield from chunk
``````

Note that we use `file.read1`. `file.read` blocks until it gets all the bytes requested of it or reaches `EOF`. `file.read1` makes at most one call to the underlying raw stream, so it can return sooner with whatever data is available. No other answer mentions this either.
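
The difference between `read` and `read1` is easiest to see on a stream whose writer is still open. The following is a minimal sketch using an OS pipe as a stand-in for an interactive source; the variable names are only illustrative.

```python
import os

read_fd, write_fd = os.pipe()
reader = os.fdopen(read_fd, "rb")  # buffered binary reader over the pipe
os.write(write_fd, b"abc")         # the writer stays open, so no EOF yet

# read1 performs at most one raw read and returns what is available now.
available = reader.read1(1024)

# reader.read(1024) at this point would block, waiting for more data or EOF.

os.close(write_fd)                 # closing the writer makes EOF reachable
rest = reader.read()               # nothing left to read
reader.close()
```

For regular files on disk the two calls usually behave alike; the distinction matters mostly for pipes, sockets and terminals.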

### Demonstration of best practice usage:

Let’s make a file with a megabyte (actually mebibyte) of pseudorandom data:

``````import random
import pathlib
path = 'pseudorandom_bytes'
pathobj = pathlib.Path(path)

pathobj.write_bytes(
    bytes(random.randint(0, 255) for _ in range(2**20)))
``````

Now let’s iterate over it and materialize it in memory:

``````>>> l = list(file_byte_iterator(path))
>>> len(l)
1048576
``````

We can inspect any part of the data, for example, the last 100 and first 100 bytes:

``````>>> l[-100:]
[208, 5, 156, 186, 58, 107, 24, 12, 75, 15, 1, 252, 216, 183, 235, 6, 136, 50, 222, 218, 7, 65, 234, 129, 240, 195, 165, 215, 245, 201, 222, 95, 87, 71, 232, 235, 36, 224, 190, 185, 12, 40, 131, 54, 79, 93, 210, 6, 154, 184, 82, 222, 80, 141, 117, 110, 254, 82, 29, 166, 91, 42, 232, 72, 231, 235, 33, 180, 238, 29, 61, 250, 38, 86, 120, 38, 49, 141, 17, 190, 191, 107, 95, 223, 222, 162, 116, 153, 232, 85, 100, 97, 41, 61, 219, 233, 237, 55, 246, 181]
>>> l[:100]
[28, 172, 79, 126, 36, 99, 103, 191, 146, 225, 24, 48, 113, 187, 48, 185, 31, 142, 216, 187, 27, 146, 215, 61, 111, 218, 171, 4, 160, 250, 110, 51, 128, 106, 3, 10, 116, 123, 128, 31, 73, 152, 58, 49, 184, 223, 17, 176, 166, 195, 6, 35, 206, 206, 39, 231, 89, 249, 21, 112, 168, 4, 88, 169, 215, 132, 255, 168, 129, 127, 60, 252, 244, 160, 80, 155, 246, 147, 234, 227, 157, 137, 101, 84, 115, 103, 77, 44, 84, 134, 140, 77, 224, 176, 242, 254, 171, 115, 193, 29]
``````

### Don’t iterate by lines for binary files

Don’t do the following – this pulls a chunk of arbitrary size until it gets to a newline character – too slow when the chunks are too small, and possibly too large as well:

``````    with open(path, 'rb') as file:
        for chunk in file: # text newline iteration - not for bytes
            yield from chunk
``````

The above is only good for what are semantically human readable text files (like plain text, code, markup, markdown etc… essentially anything ascii, utf, latin, etc… encoded) that you should open without the `'b'` flag.
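
To see why line iteration gives chunks of arbitrary size, consider this small sketch with made-up data in an `io.BytesIO` object. The pieces are split wherever a newline byte (`0x0a`) happens to occur.

```python
import io

# Binary data that happens to contain newline bytes (sample data is an assumption).
binary = io.BytesIO(b"ab\ncdefg\n\nh")

# Iterating a binary file object splits on b'\n', whatever the data means.
pieces = list(binary)
sizes = [len(piece) for piece in pieces]

print(pieces)
print(sizes)
```

With real binary data the newline byte may be absent for megabytes or appear on every other byte, so the chunk sizes are unpredictable.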

## Answer 5

To sum up all the brilliant points of chrispy, Skurmedel, Ben Hoyt and Peter Hansen, this would be the optimal solution for processing a binary file one byte at a time:

``````with open("myfile", "rb") as f:
    while True:
        byte = f.read(1)
        if not byte:
            break
        do_stuff_with(ord(byte))
``````

For python versions 2.6 and above, because:

• python buffers internally – no need to read chunks
• DRY principle – do not repeat the read line
• with statement ensures a clean file close
• ‘byte’ evaluates to false when there are no more bytes (not when a byte is zero)
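
The last point can be checked directly. This short sketch shows that a bytes object holding a single zero byte is still truthy, while only the empty bytes object (what `read(1)` returns at end of file) is falsy; the names are only illustrative.

```python
end_of_file = b""      # what f.read(1) returns when no bytes are left
zero_byte = b"\x00"    # a single byte whose value is zero

eof_is_truthy = bool(end_of_file)
zero_is_truthy = bool(zero_byte)
zero_value = ord(zero_byte)  # integer value of the byte

print(eof_is_truthy, zero_is_truthy, zero_value)
```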

Or use J. F. Sebastian's solution for improved speed:

``````from functools import partial

with open(filename, 'rb') as file:
    for byte in iter(partial(file.read, 1), b''):
        pass  # Do stuff with byte
``````

Or, if you want it as a generator function, as demonstrated by codeape:

``````def bytes_from_file(filename):
    with open(filename, "rb") as f:
        while True:
            byte = f.read(1)
            if not byte:
                break
            yield(ord(byte))

# example:
for b in bytes_from_file('filename'):
    do_stuff_with(b)
``````

## Answer 6

Python 3, read all of the file at once:

``````with open("filename", "rb") as binary_file:
    # Read the whole file at once
    data = binary_file.read()
    print(data)
``````

You can then iterate over the `data` variable however you like.
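
One detail worth keeping in mind when iterating: in Python 3, looping over a bytes object yields integers, while slicing yields bytes objects. A minimal sketch with a made-up `data` value:

```python
data = b"\x00A\xff"  # stand-in for the result of binary_file.read()

# Iterating yields integers in the range 0..255.
as_ints = [b for b in data]

# One-element slices keep each byte as a bytes object.
as_bytes = [data[i:i + 1] for i in range(len(data))]

print(as_ints)
print(as_bytes)
```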

## Answer 7

After trying all the above and using the answer from @Aaron Hall, I was getting memory errors for a ~90 MB file on a computer running Windows 10 with 8 GB RAM and 32-bit Python 3.5. A colleague recommended that I use `numpy` instead, and it works wonders.

By far the fastest way to read an entire binary file (that I have tested) is:

``````import numpy as np

file = "binary_file.bin"
data = np.fromfile(file, 'u1')
``````

Many times faster than any of the other methods so far. Hope it helps someone!

## Answer 8

If you have a lot of binary data to read, you might want to consider the `struct` module. It is documented as converting “between C and Python types”, but of course, bytes are bytes, and whether those were created as C types does not matter. For example, if your binary data contains two 2-byte integers and one 4-byte integer, you can read them as follows (example adapted from the `struct` documentation, with an explicit big-endian `>` prefix so the result does not depend on the platform):

``````>>> import struct
>>> struct.unpack('>hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
``````

You might find this more convenient, faster, or both, than explicitly looping over the content of a file.
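
As a sketch of how this could replace an explicit loop, the example below uses `struct.Struct` for a fixed record layout and `struct.iter_unpack` to walk the same bytes one unsigned byte at a time. The payload and the variable names are only illustrative.

```python
import struct

payload = b"\x00\x01\x00\x02\x00\x00\x00\x03"

# '>' selects big-endian byte order with standard sizes: 2 + 2 + 4 bytes.
record = struct.Struct(">hhl")
fields = record.unpack(payload)

# iter_unpack yields one tuple per repetition of the format.
single_bytes = [value for (value,) in struct.iter_unpack("B", payload)]

print(record.size, fields, single_bytes)
```

`iter_unpack` requires the buffer length to be a multiple of the format size, which is always true for the one-byte format `"B"`.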

## Answer 9

This post itself is not a direct answer to the question. What it is instead is a data-driven extensible benchmark that can be used to compare many of the answers (and variations of utilizing new features added in later, more modern, versions of Python) that have been posted to this question — and should therefore be helpful in determining which has the best performance.

In a few cases I’ve modified the code in the referenced answer to make it compatible with the benchmark framework.

First, here are the results for what currently are the latest versions of Python 2 & 3:

``````Fastest to slowest execution speeds with 32-bit Python 2.7.16
numpy version 1.16.5
Test file size: 1,024 KiB
100 executions, best of 3 repetitions

1                  Tcll (array.array) :   3.8943 secs, rel speed   1.00x,   0.00% slower (262.95 KiB/sec)
2  Vinay Sajip (read all into memory) :   4.1164 secs, rel speed   1.06x,   5.71% slower (248.76 KiB/sec)
3            codeape + iter + partial :   4.1616 secs, rel speed   1.07x,   6.87% slower (246.06 KiB/sec)
4                             codeape :   4.1889 secs, rel speed   1.08x,   7.57% slower (244.46 KiB/sec)
5               Vinay Sajip (chunked) :   4.1977 secs, rel speed   1.08x,   7.79% slower (243.94 KiB/sec)
6           Aaron Hall (Py 2 version) :   4.2417 secs, rel speed   1.09x,   8.92% slower (241.41 KiB/sec)
7                     gerrit (struct) :   4.2561 secs, rel speed   1.09x,   9.29% slower (240.59 KiB/sec)
8                     Rick M. (numpy) :   8.1398 secs, rel speed   2.09x, 109.02% slower (125.80 KiB/sec)
9                           Skurmedel :  31.3264 secs, rel speed   8.04x, 704.42% slower ( 32.69 KiB/sec)

Benchmark runtime (min:sec) - 03:26
``````

``````Fastest to slowest execution speeds with 32-bit Python 3.8.0
numpy version 1.17.4
Test file size: 1,024 KiB
100 executions, best of 3 repetitions

1  Vinay Sajip + "yield from" + "walrus operator" :   3.5235 secs, rel speed   1.00x,   0.00% slower (290.62 KiB/sec)
2                       Aaron Hall + "yield from" :   3.5284 secs, rel speed   1.00x,   0.14% slower (290.22 KiB/sec)
3         codeape + iter + partial + "yield from" :   3.5303 secs, rel speed   1.00x,   0.19% slower (290.06 KiB/sec)
4                      Vinay Sajip + "yield from" :   3.5312 secs, rel speed   1.00x,   0.22% slower (289.99 KiB/sec)
5      codeape + "yield from" + "walrus operator" :   3.5370 secs, rel speed   1.00x,   0.38% slower (289.51 KiB/sec)
6                          codeape + "yield from" :   3.5390 secs, rel speed   1.00x,   0.44% slower (289.35 KiB/sec)
7                                      jfs (mmap) :   4.0612 secs, rel speed   1.15x,  15.26% slower (252.14 KiB/sec)
8              Vinay Sajip (read all into memory) :   4.5948 secs, rel speed   1.30x,  30.40% slower (222.86 KiB/sec)
9                        codeape + iter + partial :   4.5994 secs, rel speed   1.31x,  30.54% slower (222.64 KiB/sec)
10                                        codeape :   4.5995 secs, rel speed   1.31x,  30.54% slower (222.63 KiB/sec)
11                          Vinay Sajip (chunked) :   4.6110 secs, rel speed   1.31x,  30.87% slower (222.08 KiB/sec)
12                      Aaron Hall (Py 2 version) :   4.6292 secs, rel speed   1.31x,  31.38% slower (221.20 KiB/sec)
13                             Tcll (array.array) :   4.8627 secs, rel speed   1.38x,  38.01% slower (210.58 KiB/sec)
14                                gerrit (struct) :   5.0816 secs, rel speed   1.44x,  44.22% slower (201.51 KiB/sec)
15                 Rick M. (numpy) + "yield from" :  11.8084 secs, rel speed   3.35x, 235.13% slower ( 86.72 KiB/sec)
16                                      Skurmedel :  11.8806 secs, rel speed   3.37x, 237.18% slower ( 86.19 KiB/sec)
17                                Rick M. (numpy) :  13.3860 secs, rel speed   3.80x, 279.91% slower ( 76.50 KiB/sec)

Benchmark runtime (min:sec) - 04:47
``````

I also ran it with a much larger 10 MiB test file (which took nearly an hour to run) and got performance results which were comparable to those shown above.
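
Before reading the full benchmark, the core timing idea can be sketched in a few lines: run each generator to exhaustion into a zero-length `deque` (a data sink) and take the best of several `timeit.repeat` runs. This is a simplified illustration, not the benchmark itself; `SAMPLE`, `byte_at_a_time`, `chunked` and `best_time` are hypothetical names, and it reads from memory rather than from a file.

```python
import io
import timeit
from collections import deque

SAMPLE = bytes(range(256)) * 64  # 16 KiB of in-memory test data (assumption)

def byte_at_a_time(stream):
    """Yield 1-byte bytes objects, one read call per byte."""
    for byte in iter(lambda: stream.read(1), b""):
        yield byte

def chunked(stream, chunksize=8192):
    """Yield ints from larger reads (Python 3 semantics)."""
    for chunk in iter(lambda: stream.read(chunksize), b""):
        for b in chunk:
            yield b

def best_time(generator_func, repeat=3, number=5):
    """Best total time of `repeat` runs, each exhausting the generator `number` times."""
    def run():
        deque(generator_func(io.BytesIO(SAMPLE)), maxlen=0)  # consume and discard
    return min(timeit.repeat(run, repeat=repeat, number=number))

if __name__ == "__main__":
    for func in (byte_at_a_time, chunked):
        print(func.__name__, best_time(func))
```

Taking the minimum of the repeats, as the benchmark below also does, reduces the influence of other activity on the machine.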

Here’s the code used to do the benchmarking:

``````from __future__ import print_function
import array
import atexit
from collections import deque, namedtuple
import io
from mmap import ACCESS_READ, mmap
import numpy as np
from operator import attrgetter
import os
import random
import struct
import sys
import tempfile
from textwrap import dedent
import time
import timeit
import traceback

try:
    xrange
except NameError:  # Python 3
    xrange = range

class KiB(int):
    """ KibiBytes - multiples of the byte units for quantities of information. """
    def __new__(self, value=0):
        return 1024*value

BIG_TEST_FILE = 1  # MiBs or 0 for a small file.
SML_TEST_FILE = KiB(64)
EXECUTIONS = 100  # Number of times each "algorithm" is executed per timing run.
TIMINGS = 3  # Number of timing runs.
CHUNK_SIZE = KiB(8)
if BIG_TEST_FILE:
    FILE_SIZE = KiB(1024) * BIG_TEST_FILE
else:
    FILE_SIZE = SML_TEST_FILE  # For quicker testing.

# Common setup for all algorithms -- prefixed to each algorithm's setup.
COMMON_SETUP = dedent("""
    # Make accessible in algorithms.
    from __main__ import array, deque, get_buffer_size, mmap, np, struct
    from __main__ import ACCESS_READ, CHUNK_SIZE, FILE_SIZE, TEMP_FILENAME
    from functools import partial
    try:
        xrange
    except NameError:  # Python 3
        xrange = range
""")

def get_buffer_size(path):
    """ Determine optimal buffer size for reading files. """
    st = os.stat(path)
    try:
        bufsize = st.st_blksize # Available on some Unix systems (like Linux)
    except AttributeError:
        bufsize = io.DEFAULT_BUFFER_SIZE
    return bufsize

# Utility primarily for use when embedding additional algorithms into benchmark.
VERIFY_NUM_READ = """
    # Verify generator reads correct number of bytes (assumes values are correct).
    bytes_read = sum(1 for _ in file_byte_iterator(TEMP_FILENAME))
    assert bytes_read == FILE_SIZE, \
           'Wrong number of bytes generated: got {:,} instead of {:,}'.format(
                bytes_read, FILE_SIZE)
"""

TIMING = namedtuple('TIMING', 'label, exec_time')

class Algorithm(namedtuple('CodeFragments', 'setup, test')):

    # Default timeit "stmt" code fragment.
    _TEST = """
        #for b in file_byte_iterator(TEMP_FILENAME):  # Loop over every byte.
        #    pass  # Do stuff with byte...
        deque(file_byte_iterator(TEMP_FILENAME), maxlen=0)  # Data sink.
    """

    # Must overload __new__ because (named)tuples are immutable.
    def __new__(cls, setup, test=None):
        """ Dedent (unindent) code fragment string arguments.
        Args:
          `setup` -- Code fragment that defines things used by `test` code.
                     In this case it should define a generator function named
                     `file_byte_iterator()` that will be passed that name of a test file
                     of binary data. This code is not timed.
          `test` -- Code fragment that uses things defined in `setup` code.
                    Defaults to _TEST. This is the code that's timed.
        """
        test =  cls._TEST if test is None else test  # Use default unless one is provided.

        # Uncomment to replace all performance tests with one that verifies the correct
        # number of bytes values are being generated by the file_byte_iterator function.
        #test = VERIFY_NUM_READ

        return tuple.__new__(cls, (dedent(setup), dedent(test)))

algorithms = {

    'Aaron Hall (Py 2 version)': Algorithm("""
        def file_byte_iterator(path):
            with open(path, "rb") as file:
                callable = partial(file.read, 1024)
                sentinel = bytes() # or b''
                for chunk in iter(callable, sentinel):
                    for byte in chunk:
                        yield byte
    """),

    "codeape": Algorithm("""
        def file_byte_iterator(filename, chunksize=CHUNK_SIZE):
            with open(filename, "rb") as f:
                while True:
                    chunk = f.read(chunksize)
                    if chunk:
                        for b in chunk:
                            yield b
                    else:
                        break
    """),

    "codeape + iter + partial": Algorithm("""
        def file_byte_iterator(filename, chunksize=CHUNK_SIZE):
            with open(filename, "rb") as f:
                for chunk in iter(partial(f.read, chunksize), b''):
                    for b in chunk:
                        yield b
    """),

    "gerrit (struct)": Algorithm("""
        def file_byte_iterator(filename):
            with open(filename, "rb") as f:
                fmt = '{}B'.format(FILE_SIZE)  # Reads entire file at once.
                for b in struct.unpack(fmt, f.read()):
                    yield b
    """),

    'Rick M. (numpy)': Algorithm("""
        def file_byte_iterator(filename):
            for byte in np.fromfile(filename, 'u1'):
                yield byte
    """),

    "Skurmedel": Algorithm("""
        def file_byte_iterator(filename):
            with open(filename, "rb") as f:
                byte = f.read(1)
                while byte:
                    yield byte
                    byte = f.read(1)
    """),

    "Tcll (array.array)": Algorithm("""
        def file_byte_iterator(filename):
            with open(filename, "rb") as f:
                arr = array.array('B')
                arr.fromfile(f, FILE_SIZE)  # Reads entire file at once.
                for b in arr:
                    yield b
    """),

    "Vinay Sajip (read all into memory)": Algorithm("""
        def file_byte_iterator(filename):
            with open(filename, "rb") as f:
                bytes_read = f.read()
            for b in bytes_read:
                yield b
    """),

    "Vinay Sajip (chunked)": Algorithm("""
        def file_byte_iterator(filename, chunksize=CHUNK_SIZE):
            with open(filename, "rb") as f:
                chunk = f.read(chunksize)
                while chunk:
                    for b in chunk:
                        yield b
                    chunk = f.read(chunksize)
    """),

}  # End algorithms

#
# Versions of algorithms that will only work in certain releases (or better) of Python.
#
if sys.version_info >= (3, 3):
    algorithms.update({

        'codeape + iter + partial + "yield from"': Algorithm("""
            def file_byte_iterator(filename, chunksize=CHUNK_SIZE):
                with open(filename, "rb") as f:
                    for chunk in iter(partial(f.read, chunksize), b''):
                        yield from chunk
        """),

        'codeape + "yield from"': Algorithm("""
            def file_byte_iterator(filename, chunksize=CHUNK_SIZE):
                with open(filename, "rb") as f:
                    while True:
                        chunk = f.read(chunksize)
                        if chunk:
                            yield from chunk
                        else:
                            break
        """),

        "jfs (mmap)": Algorithm("""
            def file_byte_iterator(filename):
                with open(filename, "rb") as f, \
                     mmap(f.fileno(), 0, access=ACCESS_READ) as s:
                    yield from s
        """),

        'Rick M. (numpy) + "yield from"': Algorithm("""
            def file_byte_iterator(filename):
            #    data = np.fromfile(filename, 'u1')
                yield from np.fromfile(filename, 'u1')
        """),

        'Vinay Sajip + "yield from"': Algorithm("""
            def file_byte_iterator(filename, chunksize=CHUNK_SIZE):
                with open(filename, "rb") as f:
                    chunk = f.read(chunksize)
                    while chunk:
                        yield from chunk  # Added in Py 3.3
                        chunk = f.read(chunksize)
        """),

    })  # End Python 3.3 update.

if sys.version_info >= (3, 5):
    algorithms.update({

        'Aaron Hall + "yield from"': Algorithm("""
            from pathlib import Path

            def file_byte_iterator(path):
                ''' Given a path, return an iterator over the file
                    that lazily loads the file.
                '''
                path = Path(path)
                bufsize = get_buffer_size(path)

                with path.open('rb') as file:
                    reader = partial(file.read1, bufsize)
                    for chunk in iter(reader, bytes()):
                        yield from chunk
        """),

    })  # End Python 3.5 update.

if sys.version_info >= (3, 8, 0):
    algorithms.update({

        'Vinay Sajip + "yield from" + "walrus operator"': Algorithm("""
            def file_byte_iterator(filename, chunksize=CHUNK_SIZE):
                with open(filename, "rb") as f:
                    while chunk := f.read(chunksize):
                        yield from chunk  # Added in Py 3.3
        """),

        'codeape + "yield from" + "walrus operator"': Algorithm("""
            def file_byte_iterator(filename, chunksize=CHUNK_SIZE):
                with open(filename, "rb") as f:
                    while chunk := f.read(chunksize):
                        yield from chunk
        """),

    })  # End Python 3.8.0 update.

#### Main ####

def main():
    global TEMP_FILENAME

    def cleanup():
        """ Clean up after testing is completed. """
        try:
            os.remove(TEMP_FILENAME)  # Delete the temporary file.
        except Exception:
            pass

    atexit.register(cleanup)

    # Create a named temporary binary file of pseudo-random bytes for testing.
    fd, TEMP_FILENAME = tempfile.mkstemp('.bin')
    with os.fdopen(fd, 'wb') as file:
        file.write(bytearray(random.randrange(256) for _ in range(FILE_SIZE)))

    # Execute and time each algorithm, gather results.
    start_time = time.time()  # To determine how long testing itself takes.

    timings = []
    for label in algorithms:
        try:
            timing = TIMING(label,
                            min(timeit.repeat(algorithms[label].test,
                                              setup=COMMON_SETUP + algorithms[label].setup,
                                              repeat=TIMINGS, number=EXECUTIONS)))
        except Exception as exc:
            print('{} occurred timing the algorithm: "{}"\n  {}'.format(
                    type(exc).__name__, label, exc))
            traceback.print_exc(file=sys.stdout)  # Redirect to stdout.
            sys.exit(1)
        timings.append(timing)

    # Report results.
    print('Fastest to slowest execution speeds with {}-bit Python {}.{}.{}'.format(
            64 if sys.maxsize > 2**32 else 32, *sys.version_info[:3]))
    print('  numpy version {}'.format(np.version.full_version))
    print('  Test file size: {:,} KiB'.format(FILE_SIZE // KiB(1)))
    print('  {:,d} executions, best of {:d} repetitions'.format(EXECUTIONS, TIMINGS))
    print()

    longest = max(len(timing.label) for timing in timings)  # Len of longest identifier.
    ranked = sorted(timings, key=attrgetter('exec_time'))  # Sort so fastest is first.
    fastest = ranked[0].exec_time
    for rank, timing in enumerate(ranked, 1):
        print('{:<2d} {:>{width}} : {:8.4f} secs, rel speed {:6.2f}x, {:6.2f}% slower '
              '({:6.2f} KiB/sec)'.format(
                    rank,
                    timing.label, timing.exec_time, round(timing.exec_time/fastest, 2),
                    round((timing.exec_time/fastest - 1) * 100, 2),
                    (FILE_SIZE/timing.exec_time) / KiB(1),  # per sec.
                    width=longest))
    print()
    mins, secs = divmod(time.time()-start_time, 60)
    print('Benchmark runtime (min:sec) - {:02d}:{:02d}'.format(int(mins),
                                                               int(round(secs))))

main()
``````
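
The timing technique the benchmark relies on (a setup fragment that defines the generator, a timed fragment that drains it, and the best of several `timeit.repeat` runs) can be reduced to a few lines. The following is only a minimal sketch, not the benchmark itself: the names `SETUP`, `TEST`, `byte_iterator` and `best_time`, and the in-memory stand-in for the test file, are assumptions made for illustration.

```python
import timeit
from textwrap import dedent

# Hypothetical setup fragment: defines the generator under test (not timed).
SETUP = dedent("""
    from collections import deque

    DATA = bytes(bytearray(range(256))) * 64  # stand-in for file contents

    def byte_iterator(data, chunksize=1024):
        for start in range(0, len(data), chunksize):
            for b in data[start:start + chunksize]:
                yield b
""")

# Timed fragment: drain the generator into a zero-length deque (a data sink).
TEST = "deque(byte_iterator(DATA), maxlen=0)"


def best_time(stmt, setup, repeat=3, number=10):
    """Return the fastest of `repeat` timing runs of `number` executions each."""
    return min(timeit.repeat(stmt, setup=setup, repeat=repeat, number=number))


elapsed = best_time(TEST, SETUP)
print('{:.4f} secs, best of 3 runs'.format(elapsed))
```

Taking the minimum rather than the mean is the usual choice with `timeit.repeat`, since slower runs mostly reflect interference from other processes rather than the code being measured.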

## Answer 10

If you are looking for something speedy, here's a method I've been using that has worked for years:

``````from array import array

with open( path, 'rb' ) as file:
    data = array( 'B', file.read() ) # buffer the file

# evaluate its data
for byte in data:
    v = byte # int value
    c = chr(byte)
``````

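If you want to iterate over chars instead of ints, you can simply use `data = file.read()`. Note that this yields one-character strings only in Python 2; in Python 3 the result is a `bytes` object, and iterating over it yields ints as well.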
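
To make the int-versus-char distinction concrete, here is a small self-contained sketch of the `array('B', ...)` approach. The temporary file and its three-byte contents are made up for illustration; the variable names `ints` and `chars` are likewise assumptions.

```python
import os
import tempfile
from array import array

# Hypothetical sample file so the example is self-contained.
fd, path = tempfile.mkstemp('.bin')
with os.fdopen(fd, 'wb') as file:
    file.write(b'ABC')

try:
    with open(path, 'rb') as file:
        data = array('B', file.read())  # unsigned bytes; iteration yields ints

    ints = list(data)
    chars = [chr(byte) for byte in data]
finally:
    os.remove(path)  # clean up the sample file

print(ints)   # [65, 66, 67]
print(chars)  # ['A', 'B', 'C']
```

Because the array holds unsigned bytes, iteration gives ints on both Python 2 and Python 3, and `chr()` converts each one back to a character when needed.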