快速计数正整数中的非零位的方法

Question 1

I need a fast way to count the number of bits in an integer in python. My current solution is

bin(n).count("1")

but I am wondering if there is any faster way of doing this?

PS: (i am representing a big 2D binary array as a single list of numbers and doing bitwise operations, and that brings the time down from hours to minutes. and now I would like to get rid of those extra minutes.

Edit: 1. it has to be in python 2.7 or 2.6

and optimizing for small numbers does not matter that much since that would not be a clear bottleneck, but I do have numbers with 10 000 + bits at some places

for example this is a 2000 bit case:

12448057941136394342297748548545082997815840357634948550739612798732309975923280685245876950055614362283769710705811182976142803324242407017104841062064840113262840137625582646683068904149296501029754654149991842951570880471230098259905004533869130509989042199261339990315125973721454059973605358766253998615919997174542922163484086066438120268185904663422979603026066685824578356173882166747093246377302371176167843247359636030248569148734824287739046916641832890744168385253915508446422276378715722482359321205673933317512861336054835392844676749610712462818600179225635467147870208L

Question 2

For arbitrary-length integers, bin(n).count("1") is the fastest I could find in pure Python.

I tried adapting Óscar’s and Adam’s solutions to process the integer in 64-bit and 32-bit chunks, respectively. Both were at least ten times slower than bin(n).count("1") (the 32-bit version took about half again as much time).

On the other hand, gmpy popcount() took about 1/20th of the time of bin(n).count("1"). So if you can install gmpy, use that.

To answer a question in the comments, for bytes I’d use a lookup table. You can generate it at runtime:

counts = bytes(bin(x).count("1") for x in range(256))  # py2: use bytearray

Or just define it literally:

counts = (b'\x00\x01\x01\x02\x01\x02\x02\x03\x01\x02\x02\x03\x02\x03\x03\x04'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x04\x05\x05\x06\x05\x06\x06\x07\x05\x06\x06\x07\x06\x07\x07\x08')

Then it’s counts[x] to get the number of 1 bits in x where 0 ≤ x ≤ 255.

Question 3

You can adapt the following algorithm:

def CountBits(n):
  n = (n & 0x5555555555555555) + ((n & 0xAAAAAAAAAAAAAAAA) >> 1)
  n = (n & 0x3333333333333333) + ((n & 0xCCCCCCCCCCCCCCCC) >> 2)
  n = (n & 0x0F0F0F0F0F0F0F0F) + ((n & 0xF0F0F0F0F0F0F0F0) >> 4)
  n = (n & 0x00FF00FF00FF00FF) + ((n & 0xFF00FF00FF00FF00) >> 8)
  n = (n & 0x0000FFFF0000FFFF) + ((n & 0xFFFF0000FFFF0000) >> 16)
  n = (n & 0x00000000FFFFFFFF) + ((n & 0xFFFFFFFF00000000) >> 32) # This last & isn't strictly necessary.
  return n

This works for 64-bit positive numbers, but it’s easily extendable and the number of operations growth with the logarithm of the argument (i.e. linearly with the bit-size of the argument).

In order to understand how this works imagine that you divide the entire 64-bit string into 64 1-bit buckets. Each bucket’s value is equal to the number of bits set in the bucket (0 if no bits are set and 1 if one bit is set). The first transformation results in an analogous state, but with 32 buckets each 2-bit long. This is achieved by appropriately shifting the buckets and adding their values (one addition takes care of all buckets since no carry can occur across buckets – n-bit number is always long enough to encode number n). Further transformations lead to states with exponentially decreasing number of buckets of exponentially growing size until we arrive at one 64-bit long bucket. This gives the number of bits set in the original argument.

Question 4

Here’s a Python implementation of the population count algorithm, as explained in this post:

def numberOfSetBits(i):
    i = i - ((i >> 1) & 0x55555555)
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
    return (((i + (i >> 4) & 0xF0F0F0F) * 0x1010101) & 0xffffffff) >> 24

It will work for 0 <= i < 0x100000000.

Question 5

According to this post, this seems to be one the fastest implementation of the Hamming weight (if you don’t mind using about 64KB of memory).

#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE16 = [0] * 2**16
for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

On Python 2.x you should replace range with xrange.

Edit

If you need better performance (and your numbers are big integers), have a look at the GMP library. It contains hand-written assembly implementations for many different architectures.

gmpy is A C-coded Python extension module that wraps the GMP library.

>>> import gmpy
>>> gmpy.popcount(2**1024-1)
1024

Question 6

I really like this method. Its simple and pretty fast but also not limited in the bit length since python has infinite integers.

It’s actually more cunning than it looks, because it avoids wasting time scanning the zeros. For example it will take the same time to count the set bits in 1000000000000000000000010100000001 as in 1111.

def get_bit_count(value):
   n = 0
   while value:
      n += 1
      value &= value-1
   return n

Question 7

You can use the algorithm to get the binary string [1] of an integer but instead of concatenating the string, counting the number of ones:

def count_ones(a):
    s = 0
    t = {'0':0, '1':1, '2':1, '3':2, '4':1, '5':2, '6':2, '7':3}
    for c in oct(a)[1:]:
        s += t[c]
    return s

[1] https://wiki.python.org/moin/BitManipulation

Question 8

You said Numpy was too slow. Were you using it to store individual bits? Why not extend the idea of using ints as bit arrays but use Numpy to store those?

Store n bits as an array of ceil(n/32.) 32-bit ints. You can then work with the numpy array the same (well, similar enough) way you use ints, including using them to index another array.

The algorithm is basically to compute, in parallel, the number of bits set in each cell, and them sum up the bitcount of each cell.

setup = """
import numpy as np
#Using Paolo Moretti's answer http://stackoverflow.com/a/9829855/2963903
POPCOUNT_TABLE16 = np.zeros(2**16, dtype=int) #has to be an array

for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

def count1s(v):
    return popcount32_table16(v).sum()

v1 = np.arange(1000)*1234567                       #numpy array
v2 = sum(int(x)<<(32*i) for i, x in enumerate(v1)) #single int
"""
from timeit import timeit

timeit("count1s(v1)", setup=setup)        #49.55184188873349
timeit("bin(v2).count('1')", setup=setup) #225.1857464598633

Though I’m surprised no one suggested you write a C module.

Question 9

#Python prg to count set bits
#Function to count set bits
def bin(n):
    count=0
    while(n>=1):
        if(n%2==0):
            n=n//2
        else:
            count+=1
            n=n//2
    print("Count of set bits:",count)
#Fetch the input from user
num=int(input("Enter number: "))
#Output
bin(num)

Question 10

It turns out your starting representation is a list of lists of ints which are either 1 or 0. Simply count them in that representation.

The number of bits in an integer is constant in python.

However, if you want to count the number of set bits, the fastest way is to create a list conforming to the following pseudocode: [numberofsetbits(n) for n in range(MAXINT)]

This will provide you a constant time lookup after you have generated the list. See @PaoloMoretti’s answer for a good implementation of this. Of course, you don’t have to keep this all in memory – you could use some sort of persistent key-value store, or even MySql. (Another option would be to implement your own simple disk-based storage).

快速计数正整数中的非零位的方法

问题：快速计数正整数中的非零位的方法

回答 0

回答 1

回答 2

回答 3

编辑

Edit

回答 4

回答 5

回答 6

回答 7

回答 8

排行榜展示

Python 情人节超强技能导出微信聊天记录生成词云

你不得不知道的python超级文献批量搜索下载工具

7行代码 Python热力图可视化分析缺失数据处理

Python 流程图 — 一键转化代码为流程图

Python 优化—算出每条语句执行时间

你的10W块放哪里能赚最多钱？

文章展示

反转numpy数组的最有效方法

为什么会看到“ TypeError：字符串索引必须为整数”？

子过程中“ shell = True”的实际含义

检查Python列表项是否在另一个字符串中包含一个字符串

如何使python解释器正确处理字符串操作中的非ASCII字符？

如何修复imdb.load_data（）函数的“ allow_pickle = False时无法加载对象数组”？

快速计数正整数中的非零位的方法

问题：快速计数正整数中的非零位的方法

回答 0

回答 1

回答 2

回答 3

编辑

Edit

回答 4

回答 5

回答 6

回答 7

回答 8

相关文章

排行榜展示

文章展示