标签归档:counting

快速计数正整数中的非零位的方法

问题:快速计数正整数中的非零位的方法

我需要一种快速的方法来计算python中整数的位数。我当前的解决方案是

bin(n).count("1")

但我想知道是否有更快的方法?

PS :(我将一个大型2D二进制数组表示为一个数字列表并进行按位运算,这将时间从几小时缩短为几分钟。现在,我想摆脱那些多余的分钟。

编辑:1.它必须在python 2.7或2.6中

对小数字进行优化并不重要,因为这并不是一个明显的瓶颈,但是我确实在某些地方有1万+位的数字

例如,这是一个2000位的情况:

12448057941136394342297748548545082997815840357634948550739612798732309975923280685245876950055614362283769710705811182976142803324242407017104841062064840113262840137625582646683068904149296501029754654149991842951570880471230098259905004533869130509989042199261339990315125973721454059973605358766253998615919997174542922163484086066438120268185904663422979603026066685824578356173882166747093246377302371176167843247359636030248569148734824287739046916641832890744168385253915508446422276378715722482359321205673933317512861336054835392844676749610712462818600179225635467147870208L

I need a fast way to count the number of bits in an integer in python. My current solution is

bin(n).count("1")

but I am wondering if there is any faster way of doing this?

PS: (i am representing a big 2D binary array as a single list of numbers and doing bitwise operations, and that brings the time down from hours to minutes. and now I would like to get rid of those extra minutes.

Edit: 1. it has to be in python 2.7 or 2.6

and optimizing for small numbers does not matter that much since that would not be a clear bottleneck, but I do have numbers with 10 000 + bits at some places

for example this is a 2000 bit case:

12448057941136394342297748548545082997815840357634948550739612798732309975923280685245876950055614362283769710705811182976142803324242407017104841062064840113262840137625582646683068904149296501029754654149991842951570880471230098259905004533869130509989042199261339990315125973721454059973605358766253998615919997174542922163484086066438120268185904663422979603026066685824578356173882166747093246377302371176167843247359636030248569148734824287739046916641832890744168385253915508446422276378715722482359321205673933317512861336054835392844676749610712462818600179225635467147870208L

回答 0

对于任意长度的整数,这bin(n).count("1")是我在纯Python中可以找到的最快的速度。

我尝试修改Óscar和Adam的解决方案以分别处理64位和32位块中的整数。两者都比至少慢了十倍bin(n).count("1")(32位版本花了大约一半的时间)。

另一方面,gmpy popcount()大约花费了时间的1/20 bin(n).count("1")。因此,如果可以安装gmpy,请使用它。

为了回答注释中的问题,对于字节,我将使用查找表。您可以在运行时生成它:

counts = bytes(bin(x).count("1") for x in range(256))  # py2: use bytearray

或者只是按字面意思定义:

counts = (b'\x00\x01\x01\x02\x01\x02\x02\x03\x01\x02\x02\x03\x02\x03\x03\x04'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x04\x05\x05\x06\x05\x06\x06\x07\x05\x06\x06\x07\x06\x07\x07\x08')

然后counts[x]得到0≤x≤255处的1位数目x

For arbitrary-length integers, bin(n).count("1") is the fastest I could find in pure Python.

I tried adapting Óscar’s and Adam’s solutions to process the integer in 64-bit and 32-bit chunks, respectively. Both were at least ten times slower than bin(n).count("1") (the 32-bit version took about half again as much time).

On the other hand, gmpy popcount() took about 1/20th of the time of bin(n).count("1"). So if you can install gmpy, use that.

To answer a question in the comments, for bytes I’d use a lookup table. You can generate it at runtime:

counts = bytes(bin(x).count("1") for x in range(256))  # py2: use bytearray

Or just define it literally:

counts = (b'\x00\x01\x01\x02\x01\x02\x02\x03\x01\x02\x02\x03\x02\x03\x03\x04'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x01\x02\x02\x03\x02\x03\x03\x04\x02\x03\x03\x04\x03\x04\x04\x05'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x02\x03\x03\x04\x03\x04\x04\x05\x03\x04\x04\x05\x04\x05\x05\x06'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x03\x04\x04\x05\x04\x05\x05\x06\x04\x05\x05\x06\x05\x06\x06\x07'
          b'\x04\x05\x05\x06\x05\x06\x06\x07\x05\x06\x06\x07\x06\x07\x07\x08')

Then it’s counts[x] to get the number of 1 bits in x where 0 ≤ x ≤ 255.


回答 1

您可以调整以下算法:

def CountBits(n):
  n = (n & 0x5555555555555555) + ((n & 0xAAAAAAAAAAAAAAAA) >> 1)
  n = (n & 0x3333333333333333) + ((n & 0xCCCCCCCCCCCCCCCC) >> 2)
  n = (n & 0x0F0F0F0F0F0F0F0F) + ((n & 0xF0F0F0F0F0F0F0F0) >> 4)
  n = (n & 0x00FF00FF00FF00FF) + ((n & 0xFF00FF00FF00FF00) >> 8)
  n = (n & 0x0000FFFF0000FFFF) + ((n & 0xFFFF0000FFFF0000) >> 16)
  n = (n & 0x00000000FFFFFFFF) + ((n & 0xFFFFFFFF00000000) >> 32) # This last & isn't strictly necessary.
  return n

这适用于64位正数,但是它很容易扩展,并且运算数量随参数的对数增长(即与参数的位大小成线性关系)。

为了了解其工作原理,可以想象将整个64位字符串分成64个1位存储桶。每个存储区的值等于存储区中设置的位数(如果未设置,则为0,如果设置为1,则为1)。第一次转换会产生类似状态,但有32个存储桶,每个存储桶2位长。这可以通过适当地移动存储桶并添加其值来实现(一次加法处理所有存储桶,因为在存储桶之间不会发生进位-n位数字始终足够长以对数字n进行编码)。进一步的转换导致状态的桶数呈指数级下降,而大小呈指数增长,直到我们得到一个64位长的桶。这给出了原始参数中设置的位数。

You can adapt the following algorithm:

def CountBits(n):
  n = (n & 0x5555555555555555) + ((n & 0xAAAAAAAAAAAAAAAA) >> 1)
  n = (n & 0x3333333333333333) + ((n & 0xCCCCCCCCCCCCCCCC) >> 2)
  n = (n & 0x0F0F0F0F0F0F0F0F) + ((n & 0xF0F0F0F0F0F0F0F0) >> 4)
  n = (n & 0x00FF00FF00FF00FF) + ((n & 0xFF00FF00FF00FF00) >> 8)
  n = (n & 0x0000FFFF0000FFFF) + ((n & 0xFFFF0000FFFF0000) >> 16)
  n = (n & 0x00000000FFFFFFFF) + ((n & 0xFFFFFFFF00000000) >> 32) # This last & isn't strictly necessary.
  return n

This works for 64-bit positive numbers, but it’s easily extendable and the number of operations growth with the logarithm of the argument (i.e. linearly with the bit-size of the argument).

In order to understand how this works imagine that you divide the entire 64-bit string into 64 1-bit buckets. Each bucket’s value is equal to the number of bits set in the bucket (0 if no bits are set and 1 if one bit is set). The first transformation results in an analogous state, but with 32 buckets each 2-bit long. This is achieved by appropriately shifting the buckets and adding their values (one addition takes care of all buckets since no carry can occur across buckets – n-bit number is always long enough to encode number n). Further transformations lead to states with exponentially decreasing number of buckets of exponentially growing size until we arrive at one 64-bit long bucket. This gives the number of bits set in the original argument.


回答 2

这里有一个Python实现人口数的算法,如在此解释

def numberOfSetBits(i):
    i = i - ((i >> 1) & 0x55555555)
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
    return (((i + (i >> 4) & 0xF0F0F0F) * 0x1010101) & 0xffffffff) >> 24

它适用于0 <= i < 0x100000000

Here’s a Python implementation of the population count algorithm, as explained in this post:

def numberOfSetBits(i):
    i = i - ((i >> 1) & 0x55555555)
    i = (i & 0x33333333) + ((i >> 2) & 0x33333333)
    return (((i + (i >> 4) & 0xF0F0F0F) * 0x1010101) & 0xffffffff) >> 24

It will work for 0 <= i < 0x100000000.


回答 3

根据这篇文章,这似乎是汉明重量最快的实现之一(如果您不介意使用大约64KB的内存)。

#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE16 = [0] * 2**16
for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

在Python 2.x上,您应该替换rangexrange

编辑

如果您需要更好的性能(并且数字是大整数),请查看GMP库。它包含用于许多不同体系结构的手写程序集实现。

gmpy 是包装GMP库的C编码Python扩展模块。

>>> import gmpy
>>> gmpy.popcount(2**1024-1)
1024

According to this post, this seems to be one the fastest implementation of the Hamming weight (if you don’t mind using about 64KB of memory).

#http://graphics.stanford.edu/~seander/bithacks.html#CountBitsSetTable
POPCOUNT_TABLE16 = [0] * 2**16
for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

On Python 2.x you should replace range with xrange.

Edit

If you need better performance (and your numbers are big integers), have a look at the GMP library. It contains hand-written assembly implementations for many different architectures.

gmpy is A C-coded Python extension module that wraps the GMP library.

>>> import gmpy
>>> gmpy.popcount(2**1024-1)
1024

回答 4

我真的很喜欢这种方法。它简单而快速,但由于python具有无限整数,因此位长不受限制。

实际上,它比看上去要狡猾得多,因为它避免了浪费时间扫描零。例如,与1111中一样,将需要花费相同的时间来计算1000000000000000000000010100000001中的设置位。

def get_bit_count(value):
   n = 0
   while value:
      n += 1
      value &= value-1
   return n

I really like this method. Its simple and pretty fast but also not limited in the bit length since python has infinite integers.

It’s actually more cunning than it looks, because it avoids wasting time scanning the zeros. For example it will take the same time to count the set bits in 1000000000000000000000010100000001 as in 1111.

def get_bit_count(value):
   n = 0
   while value:
      n += 1
      value &= value-1
   return n

回答 5

您可以使用该算法获取整数的二进制字符串[1],而不是将字符串连接起来,而是计算一个数字:

def count_ones(a):
    s = 0
    t = {'0':0, '1':1, '2':1, '3':2, '4':1, '5':2, '6':2, '7':3}
    for c in oct(a)[1:]:
        s += t[c]
    return s

[1] https://wiki.python.org/moin/BitManipulation

You can use the algorithm to get the binary string [1] of an integer but instead of concatenating the string, counting the number of ones:

def count_ones(a):
    s = 0
    t = {'0':0, '1':1, '2':1, '3':2, '4':1, '5':2, '6':2, '7':3}
    for c in oct(a)[1:]:
        s += t[c]
    return s

[1] https://wiki.python.org/moin/BitManipulation


回答 6

你说Numpy太慢了。您是否使用它来存储单个位?为什么不扩展使用int作为位数组,而是使用Numpy来存储这些位的想法呢?

将n位存储为ceil(n/32.)32位int 数组。然后,您可以使用int的方式(很好,非常相似)使用numpy数组,包括使用它们为另一个数组建立索引。

该算法基本上是并行计算每个单元中设置的位数,并且它们求和每个单元的位数。

setup = """
import numpy as np
#Using Paolo Moretti's answer http://stackoverflow.com/a/9829855/2963903
POPCOUNT_TABLE16 = np.zeros(2**16, dtype=int) #has to be an array

for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

def count1s(v):
    return popcount32_table16(v).sum()

v1 = np.arange(1000)*1234567                       #numpy array
v2 = sum(int(x)<<(32*i) for i, x in enumerate(v1)) #single int
"""
from timeit import timeit

timeit("count1s(v1)", setup=setup)        #49.55184188873349
timeit("bin(v2).count('1')", setup=setup) #225.1857464598633

尽管我很惊讶,但没有人建议您编写C模块。

You said Numpy was too slow. Were you using it to store individual bits? Why not extend the idea of using ints as bit arrays but use Numpy to store those?

Store n bits as an array of ceil(n/32.) 32-bit ints. You can then work with the numpy array the same (well, similar enough) way you use ints, including using them to index another array.

The algorithm is basically to compute, in parallel, the number of bits set in each cell, and them sum up the bitcount of each cell.

setup = """
import numpy as np
#Using Paolo Moretti's answer http://stackoverflow.com/a/9829855/2963903
POPCOUNT_TABLE16 = np.zeros(2**16, dtype=int) #has to be an array

for index in range(len(POPCOUNT_TABLE16)):
    POPCOUNT_TABLE16[index] = (index & 1) + POPCOUNT_TABLE16[index >> 1]

def popcount32_table16(v):
    return (POPCOUNT_TABLE16[ v        & 0xffff] +
            POPCOUNT_TABLE16[(v >> 16) & 0xffff])

def count1s(v):
    return popcount32_table16(v).sum()

v1 = np.arange(1000)*1234567                       #numpy array
v2 = sum(int(x)<<(32*i) for i, x in enumerate(v1)) #single int
"""
from timeit import timeit

timeit("count1s(v1)", setup=setup)        #49.55184188873349
timeit("bin(v2).count('1')", setup=setup) #225.1857464598633

Though I’m surprised no one suggested you write a C module.


回答 7

#Python prg to count set bits
#Function to count set bits
def bin(n):
    count=0
    while(n>=1):
        if(n%2==0):
            n=n//2
        else:
            count+=1
            n=n//2
    print("Count of set bits:",count)
#Fetch the input from user
num=int(input("Enter number: "))
#Output
bin(num)
#Python prg to count set bits
#Function to count set bits
def bin(n):
    count=0
    while(n>=1):
        if(n%2==0):
            n=n//2
        else:
            count+=1
            n=n//2
    print("Count of set bits:",count)
#Fetch the input from user
num=int(input("Enter number: "))
#Output
bin(num)


回答 8

事实证明,您的起始表示形式是一个整数列表,该整数列表为1或0。只需在该表示形式中对其进行计数即可。


整数中的位数在python中是恒定的。

但是,如果要计算设置的位数,最快的方法是创建符合以下伪代码的列表: [numberofsetbits(n) for n in range(MAXINT)]

生成列表后,这将为您提供恒定时间的查找。有关此的良好实现,请参见@PaoloMoretti的答案。当然,您不必将所有内容都保留在内存中-您可以使用某种持久键值存储,甚至MySql。(另一种选择是实现您自己的基于磁盘的简单存储)。

It turns out your starting representation is a list of lists of ints which are either 1 or 0. Simply count them in that representation.


The number of bits in an integer is constant in python.

However, if you want to count the number of set bits, the fastest way is to create a list conforming to the following pseudocode: [numberofsetbits(n) for n in range(MAXINT)]

This will provide you a constant time lookup after you have generated the list. See @PaoloMoretti’s answer for a good implementation of this. Of course, you don’t have to keep this all in memory – you could use some sort of persistent key-value store, or even MySql. (Another option would be to implement your own simple disk-based storage).


计算Python列表中的True布尔数

问题:计算Python列表中的True布尔数

我有一个布尔值列表:

[True, True, False, False, False, True]

并且我正在寻找一种方法来计算True列表中的数目(因此,在上面的示例中,我希望返回值是3。)我已经找到了一些示例,这些示例寻找特定元素的出现次数。因为我正在使用布尔运算,所以有效的方法吗?我在想类似all或的东西any

I have a list of Booleans:

[True, True, False, False, False, True]

and I am looking for a way to count the number of True in the list (so in the example above, I want the return to be 3.) I have found examples of looking for the number of occurrences of specific elements, but is there a more efficient way to do it since I’m working with Booleans? I’m thinking of something analogous to all or any.


回答 0

True等于1

>>> sum([True, True, False, False, False, True])
3

True is equal to 1.

>>> sum([True, True, False, False, False, True])
3

回答 1

list有一个count方法:

>>> [True,True,False].count(True)
2

这实际上比效率更高sum,并且对意图更明确,因此没有理由使用sum

In [1]: import random

In [2]: x = [random.choice([True, False]) for i in range(100)]

In [3]: %timeit x.count(True)
970 ns ± 41.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit sum(x)
1.72 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

list has a count method:

>>> [True,True,False].count(True)
2

This is actually more efficient than sum, as well as being more explicit about the intent, so there’s no reason to use sum:

In [1]: import random

In [2]: x = [random.choice([True, False]) for i in range(100)]

In [3]: %timeit x.count(True)
970 ns ± 41.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [4]: %timeit sum(x)
1.72 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

回答 2

如果只关心常量True,那么简单sum就可以了。但是,请记住,在Python中,其他值True也会计算。一个更可靠的解决方案是使用bool内置的:

>>> l = [1, 2, True, False]
>>> sum(bool(x) for x in l)
3

更新:这是另一个类似的健壮解决方案,具有更透明的优点:

>>> sum(1 for x in l if x)
3

PS Python琐事:True 可能不为1 可能为真。警告:请勿在工作中尝试!

>>> True = 2
>>> if True: print('true')
... 
true
>>> l = [True, True, False, True]
>>> sum(l)
6
>>> sum(bool(x) for x in l)
3
>>> sum(1 for x in l if x)
3

更多邪恶:

True = False

If you are only concerned with the constant True, a simple sum is fine. However, keep in mind that in Python other values evaluate as True as well. A more robust solution would be to use the bool builtin:

>>> l = [1, 2, True, False]
>>> sum(bool(x) for x in l)
3

UPDATE: Here’s another similarly robust solution that has the advantage of being more transparent:

>>> sum(1 for x in l if x)
3

P.S. Python trivia: True could be true without being 1. Warning: do not try this at work!

>>> True = 2
>>> if True: print('true')
... 
true
>>> l = [True, True, False, True]
>>> sum(l)
6
>>> sum(bool(x) for x in l)
3
>>> sum(1 for x in l if x)
3

Much more evil:

True = False

回答 3

您可以使用sum()

>>> sum([True, True, False, False, False, True])
3

You can use sum():

>>> sum([True, True, False, False, False, True])
3

回答 4

仅出于完整性考虑(sum通常更可取),我想提一下我们也可以filter用来获取真实值。在通常情况下,filter接受一个函数作为第一个参数,但是如果传递它None,它将为所有“真实的”值过滤。这个功能有些令人惊讶,但是有充分的文档记录,并且可以在Python 2和3中使用。

版本之间的区别在于,在Python 2中filter返回一个列表,因此我们可以使用len

>>> bool_list = [True, True, False, False, False, True]
>>> filter(None, bool_list)
[True, True, True]
>>> len(filter(None, bool_list))
3

但是在Python 3中,filter返回了一个迭代器,因此我们不能使用len,并且如果我们想避免使用sum(出于任何原因),我们需要求助于将迭代器转换为列表(这使它变得不那么美观了):

>>> bool_list = [True, True, False, False, False, True]
>>> filter(None, bool_list)
<builtins.filter at 0x7f64feba5710>
>>> list(filter(None, bool_list))
[True, True, True]
>>> len(list(filter(None, bool_list)))
3

Just for completeness’ sake (sum is usually preferable), I wanted to mention that we can also use filter to get the truthy values. In the usual case, filter accepts a function as the first argument, but if you pass it None, it will filter for all “truthy” values. This feature is somewhat surprising, but is well documented and works in both Python 2 and 3.

The difference between the versions, is that in Python 2 filter returns a list, so we can use len:

>>> bool_list = [True, True, False, False, False, True]
>>> filter(None, bool_list)
[True, True, True]
>>> len(filter(None, bool_list))
3

But in Python 3, filter returns an iterator, so we can’t use len, and if we want to avoid using sum (for any reason) we need to resort to converting the iterator to a list (which makes this much less pretty):

>>> bool_list = [True, True, False, False, False, True]
>>> filter(None, bool_list)
<builtins.filter at 0x7f64feba5710>
>>> list(filter(None, bool_list))
[True, True, True]
>>> len(list(filter(None, bool_list)))
3

回答 5

阅读有关该问题的所有答案和评论后,我想做一个小实验。

我产生50000个随机布尔值,并呼吁sumcount他们。

这是我的结果:

>>> a = [bool(random.getrandbits(1)) for x in range(50000)]
>>> len(a)
50000
>>> a.count(False)
24884
>>> a.count(True)
25116
>>> def count_it(a):
...   curr = time.time()
...   counting = a.count(True)
...   print("Count it = " + str(time.time() - curr))
...   return counting
... 
>>> def sum_it(a):
...   curr = time.time()
...   counting = sum(a)
...   print("Sum it = " + str(time.time() - curr))
...   return counting
... 
>>> count_it(a)
Count it = 0.00121307373046875
25015
>>> sum_it(a)
Sum it = 0.004102230072021484
25015

可以肯定的是,我又重复了几次:

>>> count_it(a)
Count it = 0.0013530254364013672
25015
>>> count_it(a)
Count it = 0.0014507770538330078
25015
>>> count_it(a)
Count it = 0.0013344287872314453
25015
>>> sum_it(a)
Sum it = 0.003480195999145508
25015
>>> sum_it(a)
Sum it = 0.0035257339477539062
25015
>>> sum_it(a)
Sum it = 0.003350496292114258
25015
>>> sum_it(a)
Sum it = 0.003744363784790039
25015

如您所见,count速度比快3倍sum。因此,我建议count像在一样使用count_it

Python版本:3.6.7
CPU核心:4
RAM大小:16 GB
操作系统:Ubuntu 18.04.1 LTS

After reading all the answers and comments on this question, I thought to do a small experiment.

I generated 50,000 random booleans and called sum and count on them.

Here are my results:

>>> a = [bool(random.getrandbits(1)) for x in range(50000)]
>>> len(a)
50000
>>> a.count(False)
24884
>>> a.count(True)
25116
>>> def count_it(a):
...   curr = time.time()
...   counting = a.count(True)
...   print("Count it = " + str(time.time() - curr))
...   return counting
... 
>>> def sum_it(a):
...   curr = time.time()
...   counting = sum(a)
...   print("Sum it = " + str(time.time() - curr))
...   return counting
... 
>>> count_it(a)
Count it = 0.00121307373046875
25015
>>> sum_it(a)
Sum it = 0.004102230072021484
25015

Just to be sure, I repeated it several more times:

>>> count_it(a)
Count it = 0.0013530254364013672
25015
>>> count_it(a)
Count it = 0.0014507770538330078
25015
>>> count_it(a)
Count it = 0.0013344287872314453
25015
>>> sum_it(a)
Sum it = 0.003480195999145508
25015
>>> sum_it(a)
Sum it = 0.0035257339477539062
25015
>>> sum_it(a)
Sum it = 0.003350496292114258
25015
>>> sum_it(a)
Sum it = 0.003744363784790039
25015

And as you can see, count is 3 times faster than sum. So I would suggest to use count as I did in count_it.

Python version: 3.6.7
CPU cores: 4
RAM size: 16 GB
OS: Ubuntu 18.04.1 LTS


回答 6

首先进行比较安全bool。这很容易做到:

>>> sum(map(bool,[True, True, False, False, False, True]))
3

然后,您将在适当的存储区中捕获Python认为是或否的所有内容:

>>> allTrue=[True, not False, True+1,'0', ' ', 1, [0], {0:0}, set([0])]
>>> list(map(bool,allTrue))
[True, True, True, True, True, True, True, True, True]

如果您愿意,可以使用一种理解:

>>> allFalse=['',[],{},False,0,set(),(), not True, True-1]
>>> [bool(i) for i in allFalse]
[False, False, False, False, False, False, False, False, False]

It is safer to run through bool first. This is easily done:

>>> sum(map(bool,[True, True, False, False, False, True]))
3

Then you will catch everything that Python considers True or False into the appropriate bucket:

>>> allTrue=[True, not False, True+1,'0', ' ', 1, [0], {0:0}, set([0])]
>>> list(map(bool,allTrue))
[True, True, True, True, True, True, True, True, True]

If you prefer, you can use a comprehension:

>>> allFalse=['',[],{},False,0,set(),(), not True, True-1]
>>> [bool(i) for i in allFalse]
[False, False, False, False, False, False, False, False, False]

回答 7

我更喜欢len([b for b in boollist if b is True])(或等效的generator-expression),因为它很容易解释。比伊格纳西奥·巴斯克斯(Ignacio Vazquez-Abrams)提出的答案更“神奇”。

或者,您可以执行此操作,该操作仍假定bool可转换为int,但不对True的值进行任何假设: ntrue = sum(boollist) / int(True)

I prefer len([b for b in boollist if b is True]) (or the generator-expression equivalent), as it’s quite self-explanatory. Less ‘magical’ than the answer proposed by Ignacio Vazquez-Abrams.

Alternatively, you can do this, which still assumes that bool is convertable to int, but makes no assumptions about the value of True: ntrue = sum(boollist) / int(True)