标签归档:count

计算CSV Python中有多少行?

问题:计算CSV Python中有多少行?

我正在使用python(Django Framework)读取CSV文件。如您所见,我仅从该CSV中提取了2行。我一直在尝试将CS​​V的总行数存储在变量中。

如何获得总行数?

file = object.myfilePath
fileObject = csv.reader(file)
for i in range(2):
    data.append(fileObject.next()) 

我努力了:

len(fileObject)
fileObject.length

I’m using python (Django Framework) to read a CSV file. I pull just 2 lines out of this CSV as you can see. What I have been trying to do is store in a variable the total number of rows the CSV also.

How can I get the total number of rows?

file = object.myfilePath
fileObject = csv.reader(file)
for i in range(2):
    data.append(fileObject.next()) 

I have tried:

len(fileObject)
fileObject.length

回答 0

您需要计算行数:

row_count = sum(1 for row in fileObject)  # fileObject is your csv.reader

使用sum()与生成器表达使一个有效的计数器,从而避免在存储器中存储整个文件。

如果您已经开始阅读两行,那么您需要将这两行加到总计中;已读取的行不计在内。

You need to count the number of rows:

row_count = sum(1 for row in fileObject)  # fileObject is your csv.reader

Using sum() with a generator expression makes for an efficient counter, avoiding storing the whole file in memory.

If you already read 2 rows to start with, then you need to add those 2 rows to your total; rows that have already been read are not being counted.


回答 1

2018-10-29编辑

谢谢你的意见。

我测试了几种代码来获取csv文件中的行数(以速度为单位)。最好的方法如下。

with open(filename) as f:
    sum(1 for line in f)

这是经过测试的代码。

import timeit
import csv
import pandas as pd

filename = './sample_submission.csv'

def talktime(filename, funcname, func):
    print(f"# {funcname}")
    t = timeit.timeit(f'{funcname}("{filename}")', setup=f'from __main__ import {funcname}', number = 100) / 100
    print('Elapsed time : ', t)
    print('n = ', func(filename))
    print('\n')

def sum1forline(filename):
    with open(filename) as f:
        return sum(1 for line in f)
talktime(filename, 'sum1forline', sum1forline)

def lenopenreadlines(filename):
    with open(filename) as f:
        return len(f.readlines())
talktime(filename, 'lenopenreadlines', lenopenreadlines)

def lenpd(filename):
    return len(pd.read_csv(filename)) + 1
talktime(filename, 'lenpd', lenpd)

def csvreaderfor(filename):
    cnt = 0
    with open(filename) as f:
        cr = csv.reader(f)
        for row in cr:
            cnt += 1
    return cnt
talktime(filename, 'csvreaderfor', csvreaderfor)

def openenum(filename):
    cnt = 0
    with open(filename) as f:
        for i, line in enumerate(f,1):
            cnt += 1
    return cnt
talktime(filename, 'openenum', openenum)

结果如下。

# sum1forline
Elapsed time :  0.6327946722068599
n =  2528244


# lenopenreadlines
Elapsed time :  0.655304473598555
n =  2528244


# lenpd
Elapsed time :  0.7561274056295324
n =  2528244


# csvreaderfor
Elapsed time :  1.5571560935772661
n =  2528244


# openenum
Elapsed time :  0.773000013928679
n =  2528244

总之,sum(1 for line in f)是最快的。但是与可能没有太大区别len(f.readlines())

sample_submission.csv 是30.2MB,具有3100万个字符。

2018-10-29 EDIT

Thank you for the comments.

I tested several kinds of code to get the number of lines in a csv file in terms of speed. The best method is below.

with open(filename) as f:
    sum(1 for line in f)

Here is the code tested.

import timeit
import csv
import pandas as pd

filename = './sample_submission.csv'

def talktime(filename, funcname, func):
    print(f"# {funcname}")
    t = timeit.timeit(f'{funcname}("{filename}")', setup=f'from __main__ import {funcname}', number = 100) / 100
    print('Elapsed time : ', t)
    print('n = ', func(filename))
    print('\n')

def sum1forline(filename):
    with open(filename) as f:
        return sum(1 for line in f)
talktime(filename, 'sum1forline', sum1forline)

def lenopenreadlines(filename):
    with open(filename) as f:
        return len(f.readlines())
talktime(filename, 'lenopenreadlines', lenopenreadlines)

def lenpd(filename):
    return len(pd.read_csv(filename)) + 1
talktime(filename, 'lenpd', lenpd)

def csvreaderfor(filename):
    cnt = 0
    with open(filename) as f:
        cr = csv.reader(f)
        for row in cr:
            cnt += 1
    return cnt
talktime(filename, 'csvreaderfor', csvreaderfor)

def openenum(filename):
    cnt = 0
    with open(filename) as f:
        for i, line in enumerate(f,1):
            cnt += 1
    return cnt
talktime(filename, 'openenum', openenum)

The result was below.

# sum1forline
Elapsed time :  0.6327946722068599
n =  2528244


# lenopenreadlines
Elapsed time :  0.655304473598555
n =  2528244


# lenpd
Elapsed time :  0.7561274056295324
n =  2528244


# csvreaderfor
Elapsed time :  1.5571560935772661
n =  2528244


# openenum
Elapsed time :  0.773000013928679
n =  2528244

In conclusion, sum(1 for line in f) is fastest. But there might not be significant difference from len(f.readlines()).

sample_submission.csv is 30.2MB and has 31 million characters.


回答 2

为此,您需要像我的示例一样有一些代码:

file = open("Task1.csv")
numline = len(file.readlines())
print (numline)

希望对大家有帮助。

To do it you need to have a bit of code like my example here:

file = open("Task1.csv")
numline = len(file.readlines())
print (numline)

I hope this helps everyone.


回答 3

上面的一些建议计算了csv文件中的LINES数量。但是某些CSV文件将包含带引号的字符串,这些字符串本身包含换行符。MS CSV文件通常用\ r \ n分隔记录,但在带引号的字符串中单独使用\ n。

对于这样的文件,计算文件中的文本行(由换行符分隔)将导致太大的结果。因此,为了获得准确的计数,您需要使用csv.reader来读取记录。

Several of the above suggestions count the number of LINES in the csv file. But some CSV files will contain quoted strings which themselves contain newline characters. MS CSV files usually delimit records with \r\n, but use \n alone within quoted strings.

For a file like this, counting lines of text (as delimited by newline) in the file will give too large a result. So for an accurate count you need to use csv.reader to read the records.


回答 4

首先,您必须使用open打开文件

input_file = open("nameOfFile.csv","r+")

然后使用csv.reader打开csv

reader_file = csv.reader(input_file)

最后,您可以使用“ len”指令获取行数

value = len(list(reader_file))

总代码是这样的:

input_file = open("nameOfFile.csv","r+")
reader_file = csv.reader(input_file)
value = len(list(reader_file))

请记住,如果要重复使用csv文件,则必须创建一个input_file.fseek(0),因为当您使用reader_file的列表时,它将读取所有文件,并且文件中的指针会更改其位置

First you have to open the file with open

input_file = open("nameOfFile.csv","r+")

Then use the csv.reader for open the csv

reader_file = csv.reader(input_file)

At the last, you can take the number of row with the instruction ‘len’

value = len(list(reader_file))

The total code is this:

input_file = open("nameOfFile.csv","r+")
reader_file = csv.reader(input_file)
value = len(list(reader_file))

Remember that if you want to reuse the csv file, you have to make a input_file.fseek(0), because when you use a list for the reader_file, it reads all file, and the pointer in the file change its position


回答 5

row_count = sum(1 for line in open(filename)) 为我工作。

注意:sum(1 for line in csv.reader(filename))似乎要计算第一行的长度

row_count = sum(1 for line in open(filename)) worked for me.

Note : sum(1 for line in csv.reader(filename)) seems to calculate the length of first line


回答 6

numline = len(file_read.readlines())
numline = len(file_read.readlines())

回答 7

当实例化一个csv.reader对象并遍历整个文件时,可以访问提供行数的名为line_num的实例变量:

import csv
with open('csv_path_file') as f:
    csv_reader = csv.reader(f)
    for row in csv_reader:
        pass
    print(csv_reader.line_num)

when you instantiate a csv.reader object and you iter the whole file then you can access an instance variable called line_num providing the row count:

import csv
with open('csv_path_file') as f:
    csv_reader = csv.reader(f)
    for row in csv_reader:
        pass
    print(csv_reader.line_num)

回答 8

import csv
count = 0
with open('filename.csv', 'rb') as count_file:
    csv_reader = csv.reader(count_file)
    for row in csv_reader:
        count += 1

print count
import csv
count = 0
with open('filename.csv', 'rb') as count_file:
    csv_reader = csv.reader(count_file)
    for row in csv_reader:
        count += 1

print count

回答 9

使用“列表”适合更可行的对象。

然后,您可以计数,跳过,变异,直到您的内心渴望:

list(fileObject) #list values

len(list(fileObject)) # get length of file lines

list(fileObject)[10:] # skip first 10 lines

Use “list” to fit a more workably object.

You can then count, skip, mutate till your heart’s desire:

list(fileObject) #list values

len(list(fileObject)) # get length of file lines

list(fileObject)[10:] # skip first 10 lines

回答 10

您还可以使用经典的for循环:

import pandas as pd
df = pd.read_csv('your_file.csv')

count = 0
for i in df['a_column']:
    count = count + 1

print(count)

You can also use a classic for loop:

import pandas as pd
df = pd.read_csv('your_file.csv')

count = 0
for i in df['a_column']:
    count = count + 1

print(count)

回答 11

可能想在命令行中尝试以下简单的操作:

sed -n '$=' filename 要么 wc -l filename

might want to try something as simple as below in the command line:

sed -n '$=' filename

or

wc -l filename

回答 12

这适用于csv和所有基于Unix的操作系统中包含字符串的文件:

import os

numOfLines = int(os.popen('wc -l < file.csv').read()[:-1])

如果csv文件包含一个字段行,则可以从numOfLines上面扣除一个:

numOfLines = numOfLines - 1

This works for csv and all files containing strings in Unix-based OSes:

import os

numOfLines = int(os.popen('wc -l < file.csv').read()[:-1])

In case the csv file contains a fields row you can deduct one from numOfLines above:

numOfLines = numOfLines - 1

回答 13

我认为我们可以改善最佳答案,我正在使用:

len = sum(1 for _ in reader)

此外,我们不应忘记pythonic代码在项目中并非总是具有最佳性能。例如:如果我们可以在同一数据集中同时执行更多操作,最好在同一个气泡中全部进行操作,而不是制作两个或多个pythonic气泡。

I think we can improve the best answer a little bit, I’m using:

len = sum(1 for _ in reader)

Moreover, we shouldnt forget pythonic code not always have the best performance in the project. In example: If we can do more operations at the same time in the same data set Its better to do all in the same bucle instead make two or more pythonic bucles.


回答 14

尝试

data = pd.read_csv("data.csv")
data.shape

在输出中,您可以看到类似(aa,bb)的图形,其中aa是行数

try

data = pd.read_csv("data.csv")
data.shape

and in the output you can see something like (aa,bb) where aa is the # of rows


回答 15

import pandas as pd
data = pd.read_csv('data.csv') 
totalInstances=len(data)
import pandas as pd
data = pd.read_csv('data.csv') 
totalInstances=len(data)

sqlalchemy中的分组和计数功能

问题:sqlalchemy中的分组和计数功能

我想在sqlalchemy中使用“分组并计数”命令。我怎样才能做到这一点?

I want a “group by and count” command in sqlalchemy. How can I do this?


回答 0

关于计数文档说,对于group_by查询,最好使用func.count()

from sqlalchemy import func
session.query(Table.column, func.count(Table.column)).group_by(Table.column).all()

The documentation on counting says that for group_by queries it is better to use func.count():

from sqlalchemy import func
session.query(Table.column, func.count(Table.column)).group_by(Table.column).all()

回答 1

如果您使用Table.query属性:

from sqlalchemy import func
Table.query.with_entities(Table.column, func.count(Table.column)).group_by(Table.column).all()

如果您使用session.query()方法(如miniwark的回答所述):

from sqlalchemy import func
session.query(Table.column, func.count(Table.column)).group_by(Table.column).all()

If you are using Table.query property:

from sqlalchemy import func
Table.query.with_entities(Table.column, func.count(Table.column)).group_by(Table.column).all()

If you are using session.query() method (as stated in miniwark’s answer):

from sqlalchemy import func
session.query(Table.column, func.count(Table.column)).group_by(Table.column).all()

回答 2

您还可以依靠多个组及其交集:

self.session.query(func.count(Table.column1),Table.column1, Table.column2).group_by(Table.column1, Table.column2).all()

上面的查询将返回两列中所有可能的值组合的计数。

You can also count on multiple groups and their intersection:

self.session.query(func.count(Table.column1),Table.column1, Table.column2).group_by(Table.column1, Table.column2).all()

The query above will return counts for all possible combinations of values from both columns.


如何使用Python计算目录中的文件数

问题:如何使用Python计算目录中的文件数

我需要计算使用Python的目录中的文件数。

我猜最简单的方法是len(glob.glob('*')),但这也将目录本身视为文件。

有什么方法可以只计算目录中的文件吗?

I need to count the number of files in a directory using Python.

I guess the easiest way is len(glob.glob('*')), but that also counts the directory itself as a file.

Is there any way to count only the files in a directory?


回答 0

os.listdir()比使用会更有效率glob.glob。要测试文件名是否是普通文件(而不是目录或其他实体),请使用os.path.isfile()

import os, os.path

# simple version for working with CWD
print len([name for name in os.listdir('.') if os.path.isfile(name)])

# path joining version for other paths
DIR = '/tmp'
print len([name for name in os.listdir(DIR) if os.path.isfile(os.path.join(DIR, name))])

os.listdir() will be slightly more efficient than using glob.glob. To test if a filename is an ordinary file (and not a directory or other entity), use os.path.isfile():

import os, os.path

# simple version for working with CWD
print len([name for name in os.listdir('.') if os.path.isfile(name)])

# path joining version for other paths
DIR = '/tmp'
print len([name for name in os.listdir(DIR) if os.path.isfile(os.path.join(DIR, name))])

回答 1

import os

path, dirs, files = next(os.walk("/usr/lib"))
file_count = len(files)
import os

path, dirs, files = next(os.walk("/usr/lib"))
file_count = len(files)

回答 2

对于所有类型的文件,都包含以下子目录:

import os

list = os.listdir(dir) # dir is your directory path
number_files = len(list)
print number_files

仅文件(避免使用子目录):

import os

onlyfiles = next(os.walk(dir))[2] #dir is your directory path as string
print len(onlyfiles)

For all kind of files, subdirectories included:

import os

list = os.listdir(dir) # dir is your directory path
number_files = len(list)
print number_files

Only files (avoiding subdirectories):

import os

onlyfiles = next(os.walk(dir))[2] #dir is your directory path as string
print len(onlyfiles)

回答 3

这是fnmatch派上用场的地方:

import fnmatch

print len(fnmatch.filter(os.listdir(dirpath), '*.txt'))

更多详细信息:http : //docs.python.org/2/library/fnmatch.html

This is where fnmatch comes very handy:

import fnmatch

print len(fnmatch.filter(os.listdir(dirpath), '*.txt'))

More details: http://docs.python.org/2/library/fnmatch.html


回答 4

如果要计算目录中的所有文件(包括子目录中的文件),则最Python化的方法是:

import os

file_count = sum(len(files) for _, _, files in os.walk(r'C:\Dropbox'))
print(file_count)

我们使用的总和要比显式添加文件计数(正在等待的时间)更快

If you want to count all files in the directory – including files in subdirectories, the most pythonic way is:

import os

file_count = sum(len(files) for _, _, files in os.walk(r'C:\Dropbox'))
print(file_count)

We use sum that is faster than explicitly adding the file counts (timings pending)


回答 5

import os
print len(os.listdir(os.getcwd()))
import os
print len(os.listdir(os.getcwd()))

回答 6

def directory(path,extension):
  list_dir = []
  list_dir = os.listdir(path)
  count = 0
  for file in list_dir:
    if file.endswith(extension): # eg: '.txt'
      count += 1
  return count
def directory(path,extension):
  list_dir = []
  list_dir = os.listdir(path)
  count = 0
  for file in list_dir:
    if file.endswith(extension): # eg: '.txt'
      count += 1
  return count

回答 7

我很惊讶没有人提到os.scandir

def count_files(dir):
    return len([1 for x in list(os.scandir(dir)) if x.is_file()])

I am surprised that nobody mentioned os.scandir:

def count_files(dir):
    return len([1 for x in list(os.scandir(dir)) if x.is_file()])

回答 8

这可以使用os.listdir并适用于任何目录:

import os
directory = 'mydirpath'

number_of_files = len([item for item in os.listdir(directory) if os.path.isfile(os.path.join(directory, item))])

可以使用生成器简化此过程,并通过以下方法使速度更快一点:

import os
isfile = os.path.isfile
join = os.path.join

directory = 'mydirpath'
number_of_files = sum(1 for item in os.listdir(directory) if isfile(join(directory, item)))

This uses os.listdir and works for any directory:

import os
directory = 'mydirpath'

number_of_files = len([item for item in os.listdir(directory) if os.path.isfile(os.path.join(directory, item))])

this can be simplified with a generator and made a little bit faster with:

import os
isfile = os.path.isfile
join = os.path.join

directory = 'mydirpath'
number_of_files = sum(1 for item in os.listdir(directory) if isfile(join(directory, item)))

回答 9

def count_em(valid_path):
   x = 0
   for root, dirs, files in os.walk(valid_path):
       for f in files:
            x = x+1
print "There are", x, "files in this directory."
return x

取自这篇文章

def count_em(valid_path):
   x = 0
   for root, dirs, files in os.walk(valid_path):
       for f in files:
            x = x+1
print "There are", x, "files in this directory."
return x

Taked from this post


回答 10

import os

def count_files(in_directory):
    joiner= (in_directory + os.path.sep).__add__
    return sum(
        os.path.isfile(filename)
        for filename
        in map(joiner, os.listdir(in_directory))
    )

>>> count_files("/usr/lib")
1797
>>> len(os.listdir("/usr/lib"))
2049
import os

def count_files(in_directory):
    joiner= (in_directory + os.path.sep).__add__
    return sum(
        os.path.isfile(filename)
        for filename
        in map(joiner, os.listdir(in_directory))
    )

>>> count_files("/usr/lib")
1797
>>> len(os.listdir("/usr/lib"))
2049

回答 11

卢克的代码重新格式化。

import os

print len(os.walk('/usr/lib').next()[2])

Luke’s code reformat.

import os

print len(os.walk('/usr/lib').next()[2])

回答 12

这是我发现有用的简单单行命令:

print int(os.popen("ls | wc -l").read())

Here is a simple one-line command that I found useful:

print int(os.popen("ls | wc -l").read())

回答 13

虽然我同意@DanielStutzbach提供的答案:os.listdir()将比使用效率更高glob.glob

但是,如果要计算文件夹中特定文件的数量,则需要额外的精度len(glob.glob())。例如,如果您要计算要使用的文件夹中的所有pdf,请执行以下操作:

pdfCounter = len(glob.glob1(myPath,"*.pdf"))

While I agree with the answer provided by @DanielStutzbach: os.listdir() will be slightly more efficient than using glob.glob.

However, an extra precision, if you do want to count the number of specific files in folder, you want to use len(glob.glob()). For instance if you were to count all the pdfs in a folder you want to use:

pdfCounter = len(glob.glob1(myPath,"*.pdf"))

回答 14

很简单:

print(len([iq for iq in os.scandir('PATH')]))

它只计算目录中的文件数,我使用列表推导技术遍历特定目录,以返回所有文件。“ len(返回列表)”返回文件数。

It is simple:

print(len([iq for iq in os.scandir('PATH')]))

it simply counts number of files in directory , i have used list comprehension technique to iterate through specific directory returning all files in return . “len(returned list)” returns number of files.


回答 15

import os

total_con=os.listdir('<directory path>')

files=[]

for f_n in total_con:
   if os.path.isfile(f_n):
     files.append(f_n)


print len(files)
import os

total_con=os.listdir('<directory path>')

files=[]

for f_n in total_con:
   if os.path.isfile(f_n):
     files.append(f_n)


print len(files)

回答 16

如果您将使用操作系统的标准外壳,则可以更快地获得结果,而不是使用纯pythonic方式。

Windows示例:

import os
import subprocess

def get_num_files(path):
    cmd = 'DIR \"%s\" /A-D /B /S | FIND /C /V ""' % path
    return int(subprocess.check_output(cmd, shell=True))

If you’ll be using the standard shell of the operating system, you can get the result much faster rather than using pure pythonic way.

Example for Windows:

import os
import subprocess

def get_num_files(path):
    cmd = 'DIR \"%s\" /A-D /B /S | FIND /C /V ""' % path
    return int(subprocess.check_output(cmd, shell=True))

回答 17

我找到了另一个可能是正确的答案。

for root, dirs, files in os.walk(input_path):    
for name in files:
    if os.path.splitext(name)[1] == '.TXT' or os.path.splitext(name)[1] == '.txt':
        datafiles.append(os.path.join(root,name)) 


print len(files) 

I found another answer which may be correct as accepted answer.

for root, dirs, files in os.walk(input_path):    
for name in files:
    if os.path.splitext(name)[1] == '.TXT' or os.path.splitext(name)[1] == '.txt':
        datafiles.append(os.path.join(root,name)) 


print len(files) 

回答 18

我用过glob.iglob类似的目录结构

data
└───train
   └───subfolder1
   |      file111.png
   |      file112.png
   |      ...
   |
   └───subfolder2
          file121.png
          file122.png
          ...
└───test
       file221.png
       file222.png

以下两个选项均返回4(如预期,即不计算子文件夹本身

  • len(list(glob.iglob("data/train/*/*.png", recursive=True)))
  • sum(1 for i in glob.iglob("data/train/*/*.png"))

I used glob.iglob for a directory structure similar to

data
└───train
│   └───subfolder1
│   |   │   file111.png
│   |   │   file112.png
│   |   │   ...
│   |
│   └───subfolder2
│       │   file121.png
│       │   file122.png
│       │   ...
└───test
    │   file221.png
    │   file222.png

Both of the following options return 4 (as expected, i.e. does not count the subfolders themselves)

  • len(list(glob.iglob("data/train/*/*.png", recursive=True)))
  • sum(1 for i in glob.iglob("data/train/*/*.png"))

回答 19

我这样做了,这返回了文件夹(Attack_Data)中文件的数量。

import os
def fcount(path):
    #Counts the number of files in a directory
    count = 0
    for f in os.listdir(path):
        if os.path.isfile(os.path.join(path, f)):
            count += 1

    return count
path = r"C:\Users\EE EKORO\Desktop\Attack_Data" #Read files in folder
print (fcount(path))

i did this and this returned the number of files in the folder(Attack_Data)…this works fine.

import os
def fcount(path):
    #Counts the number of files in a directory
    count = 0
    for f in os.listdir(path):
        if os.path.isfile(os.path.join(path, f)):
            count += 1

    return count
path = r"C:\Users\EE EKORO\Desktop\Attack_Data" #Read files in folder
print (fcount(path))

在python中计算字典中关键字的数量

问题:在python中计算字典中关键字的数量

我在词典中有一个单词列表,其值=关键字的重复,但是我只想要一个不同单词的列表,因此我想计算关键字的数量。有没有一种方法可以计算关键字的数量,或者还有另一种方法我应该寻找不同的单词?

I have a list of words in a dictionary with the value = the repetition of the keyword but I only want a list of distinct words so I wanted to count the number of keywords. Is there a way to count the number of keywords or is there another way I should look for distinct words?


回答 0

len(yourdict.keys())

要不就

len(yourdict)

如果您想计算文件中的唯一单词,则可以使用set并执行以下操作

len(set(open(yourdictfile).read().split()))
len(yourdict.keys())

or just

len(yourdict)

If you like to count unique words in the file, you could just use set and do like

len(set(open(yourdictfile).read().split()))

回答 1

可以使用该len()功能找到不同单词的数量(即词典中的条目数)。

> a = {'foo':42, 'bar':69}
> len(a)
2

要获取所有不同的单词(即键),请使用.keys()方法。

> list(a.keys())
['foo', 'bar']

The number of distinct words (i.e. count of entries in the dictionary) can be found using the len() function.

> a = {'foo':42, 'bar':69}
> len(a)
2

To get all the distinct words (i.e. the keys), use the .keys() method.

> list(a.keys())
['foo', 'bar']

回答 2

len()直接在您的字典上进行调用是有效的,并且比构建迭代器,d.keys()和对其进行调用要快len(),但是与您的程序正在执行的其他操作相比,两者的速度都可以忽略不计。

d = {x: x**2 for x in range(1000)}

len(d)
# 1000

len(d.keys())
# 1000

%timeit len(d)
# 41.9 ns ± 0.244 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit len(d.keys())
# 83.3 ns ± 0.41 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Calling len() directly on your dictionary works, and is faster than building an iterator, d.keys(), and calling len() on it, but the speed of either will negligible in comparison to whatever else your program is doing.

d = {x: x**2 for x in range(1000)}

len(d)
# 1000

len(d.keys())
# 1000

%timeit len(d)
# 41.9 ns ± 0.244 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit len(d.keys())
# 83.3 ns ± 0.41 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

回答 3

如果问题是关于计算关键字数量,则建议使用类似

def countoccurrences(store, value):
    try:
        store[value] = store[value] + 1
    except KeyError as e:
        store[value] = 1
    return

在main函数中有一些循环数据并将值传递给countoccurrences函数的东西

if __name__ == "__main__":
    store = {}
    list = ('a', 'a', 'b', 'c', 'c')
    for data in list:
        countoccurrences(store, data)
    for k, v in store.iteritems():
        print "Key " + k + " has occurred "  + str(v) + " times"

代码输出

Key a has occurred 2 times
Key c has occurred 2 times
Key b has occurred 1 times

If the question is about counting the number of keywords then would recommend something like

def countoccurrences(store, value):
    try:
        store[value] = store[value] + 1
    except KeyError as e:
        store[value] = 1
    return

in the main function have something that loops through the data and pass the values to countoccurrences function

if __name__ == "__main__":
    store = {}
    list = ('a', 'a', 'b', 'c', 'c')
    for data in list:
        countoccurrences(store, data)
    for k, v in store.iteritems():
        print "Key " + k + " has occurred "  + str(v) + " times"

The code outputs

Key a has occurred 2 times
Key c has occurred 2 times
Key b has occurred 1 times

回答 4

在发布的答案UnderWaterKremlin上进行了一些修改,以使其成为python3证明。下面是一个令人惊讶的结果作为答案。

系统规格:

  • python = 3.7.4,
  • 康达= 4.8.0
  • 3.6Ghz,8核心,16gb。
import timeit

d = {x: x**2 for x in range(1000)}
#print (d)
print (len(d))
# 1000

print (len(d.keys()))
# 1000

print (timeit.timeit('len({x: x**2 for x in range(1000)})', number=100000))        # 1

print (timeit.timeit('len({x: x**2 for x in range(1000)}.keys())', number=100000)) # 2

结果:

1)= 37.0100378

2)= 37.002148899999995

因此,len(d.keys())目前看来比使用来得快len()

Some modifications were made on posted answer UnderWaterKremlin to make it python3 proof. A surprising result below as answer.

System specs:

  • python =3.7.4,
  • conda = 4.8.0
  • 3.6Ghz, 8 core, 16gb.
import timeit

d = {x: x**2 for x in range(1000)}
#print (d)
print (len(d))
# 1000

print (len(d.keys()))
# 1000

print (timeit.timeit('len({x: x**2 for x in range(1000)})', number=100000))        # 1

print (timeit.timeit('len({x: x**2 for x in range(1000)}.keys())', number=100000)) # 2

Result:

1) = 37.0100378

2) = 37.002148899999995

So it seems that len(d.keys()) is currently faster than just using len().


Python中整数的长度

问题:Python中整数的长度

在Python中,如何找到整数位数?

In Python, how do you find the number of digits in an integer?


回答 0

如果您希望整数的长度与整数的位数一样,则可以始终将其转换为string,str(133)并找到其长度,如len(str(123))

If you want the length of an integer as in the number of digits in the integer, you can always convert it to string like str(133) and find its length like len(str(123)).


回答 1

不转换为字符串

import math
digits = int(math.log10(n))+1

同时处理零和负数

import math
if n > 0:
    digits = int(math.log10(n))+1
elif n == 0:
    digits = 1
else:
    digits = int(math.log10(-n))+2 # +1 if you don't count the '-' 

您可能希望将其放入函数中:)

这是一些基准。在len(str())已经落后的甚至是相当小的数字

timeit math.log10(2**8)
1000000 loops, best of 3: 746 ns per loop
timeit len(str(2**8))
1000000 loops, best of 3: 1.1 µs per loop

timeit math.log10(2**100)
1000000 loops, best of 3: 775 ns per loop
 timeit len(str(2**100))
100000 loops, best of 3: 3.2 µs per loop

timeit math.log10(2**10000)
1000000 loops, best of 3: 844 ns per loop
timeit len(str(2**10000))
100 loops, best of 3: 10.3 ms per loop

Without conversion to string

import math
digits = int(math.log10(n))+1

To also handle zero and negative numbers

import math
if n > 0:
    digits = int(math.log10(n))+1
elif n == 0:
    digits = 1
else:
    digits = int(math.log10(-n))+2 # +1 if you don't count the '-' 

You’d probably want to put that in a function :)

Here are some benchmarks. The len(str()) is already behind for even quite small numbers

timeit math.log10(2**8)
1000000 loops, best of 3: 746 ns per loop
timeit len(str(2**8))
1000000 loops, best of 3: 1.1 µs per loop

timeit math.log10(2**100)
1000000 loops, best of 3: 775 ns per loop
 timeit len(str(2**100))
100000 loops, best of 3: 3.2 µs per loop

timeit math.log10(2**10000)
1000000 loops, best of 3: 844 ns per loop
timeit len(str(2**10000))
100 loops, best of 3: 10.3 ms per loop

回答 2

所有math.log10解决方案都会给您带来问题。

math.log10速度很快,但是当您的数字大于999999999999997时会出现问题。这是因为浮点数太多.9s,导致结果四舍五入。

解决方案是对大于该阈值的数字使用while计数器方法。

为了使速度更快,请创建10 ^ 16、10 ^ 17,依此类推,并作为变量存储在列表中。这样,它就像一个表查找。

def getIntegerPlaces(theNumber):
    if theNumber <= 999999999999997:
        return int(math.log10(theNumber)) + 1
    else:
        counter = 15
        while theNumber >= 10**counter:
            counter += 1
        return counter

All math.log10 solutions will give you problems.

math.log10 is fast but gives problem when your number is greater than 999999999999997. This is because the float have too many .9s, causing the result to round up.

The solution is to use a while counter method for numbers above that threshold.

To make this even faster, create 10^16, 10^17 so on so forth and store as variables in a list. That way, it is like a table lookup.

def getIntegerPlaces(theNumber):
    if theNumber <= 999999999999997:
        return int(math.log10(theNumber)) + 1
    else:
        counter = 15
        while theNumber >= 10**counter:
            counter += 1
        return counter

回答 3

Python 2.* int占用4或8个字节(32或64位),具体取决于您的Python构建。 sys.maxint2**31-1对于32位整数,2**63-1对于64位整数)将告诉您获得两种可能性中的哪一种。

在Python 3中,ints(类似于longPython 2中的)可以采用任意大小,直到可用内存量为止。sys.getsizeof为您提供任何给定值一个很好的迹象,但它确实也算一些固定开销:

>>> import sys
>>> sys.getsizeof(0)
12
>>> sys.getsizeof(2**99)
28

如果,正如其他答案所暗示的那样,如果您正在考虑某个整数值的字符串表示形式,则只需采用该len表示形式的形式即可,无论是以10还是10为底!

Python 2.* ints take either 4 or 8 bytes (32 or 64 bits), depending on your Python build. sys.maxint (2**31-1 for 32-bit ints, 2**63-1 for 64-bit ints) will tell you which of the two possibilities obtains.

In Python 3, ints (like longs in Python 2) can take arbitrary sizes up to the amount of available memory; sys.getsizeof gives you a good indication for any given value, although it does also count some fixed overhead:

>>> import sys
>>> sys.getsizeof(0)
12
>>> sys.getsizeof(2**99)
28

If, as other answers suggests, you’re thinking about some string representation of the integer value, then just take the len of that representation, be it in base 10 or otherwise!


回答 4

自问这个问题以来已经有好几年了,但是我已经为几种计算整数长度的方法编制了基准。

def libc_size(i): 
    return libc.snprintf(buf, 100, c_char_p(b'%i'), i) # equivalent to `return snprintf(buf, 100, "%i", i);`

def str_size(i):
    return len(str(i)) # Length of `i` as a string

def math_size(i):
    return 1 + math.floor(math.log10(i)) # 1 + floor of log10 of i

def exp_size(i):
    return int("{:.5e}".format(i).split("e")[1]) + 1 # e.g. `1e10` -> `10` + 1 -> 11

def mod_size(i):
    return len("%i" % i) # Uses string modulo instead of str(i)

def fmt_size(i):
    return len("{0}".format(i)) # Same as above but str.format

(libc函数需要一些设置,我没有包括在内)

size_exp感谢Brian Preslopsky,size_str感谢GeekTantra,以及size_math John La Rooy

结果如下:

Time for libc size:      1.2204 μs
Time for string size:    309.41 ns
Time for math size:      329.54 ns
Time for exp size:       1.4902 μs
Time for mod size:       249.36 ns
Time for fmt size:       336.63 ns
In order of speed (fastest first):
+ mod_size (1.000000x)
+ str_size (1.240835x)
+ math_size (1.321577x)
+ fmt_size (1.350007x)
+ libc_size (4.894290x)
+ exp_size (5.976219x)

(免责声明:该函数在输入1到1,000,000上运行)

下面是结果sys.maxsize - 100000sys.maxsize

Time for libc size:      1.4686 μs
Time for string size:    395.76 ns
Time for math size:      485.94 ns
Time for exp size:       1.6826 μs
Time for mod size:       364.25 ns
Time for fmt size:       453.06 ns
In order of speed (fastest first):
+ mod_size (1.000000x)
+ str_size (1.086498x)
+ fmt_size (1.243817x)
+ math_size (1.334066x)
+ libc_size (4.031780x)
+ exp_size (4.619188x)

如您所见,mod_sizelen("%i" % i))是最快的,比使用速度稍快,str(i)并且比其他速度快得多。

It’s been several years since this question was asked, but I have compiled a benchmark of several methods to calculate the length of an integer.

def libc_size(i): 
    return libc.snprintf(buf, 100, c_char_p(b'%i'), i) # equivalent to `return snprintf(buf, 100, "%i", i);`

def str_size(i):
    return len(str(i)) # Length of `i` as a string

def math_size(i):
    return 1 + math.floor(math.log10(i)) # 1 + floor of log10 of i

def exp_size(i):
    return int("{:.5e}".format(i).split("e")[1]) + 1 # e.g. `1e10` -> `10` + 1 -> 11

def mod_size(i):
    return len("%i" % i) # Uses string modulo instead of str(i)

def fmt_size(i):
    return len("{0}".format(i)) # Same as above but str.format

(the libc function requires some setup, which I haven’t included)

size_exp is thanks to Brian Preslopsky, size_str is thanks to GeekTantra, and size_math is thanks to John La Rooy

Here are the results:

Time for libc size:      1.2204 μs
Time for string size:    309.41 ns
Time for math size:      329.54 ns
Time for exp size:       1.4902 μs
Time for mod size:       249.36 ns
Time for fmt size:       336.63 ns
In order of speed (fastest first):
+ mod_size (1.000000x)
+ str_size (1.240835x)
+ math_size (1.321577x)
+ fmt_size (1.350007x)
+ libc_size (4.894290x)
+ exp_size (5.976219x)

(Disclaimer: the function is run on inputs 1 to 1,000,000)

Here are the results for sys.maxsize - 100000 to sys.maxsize:

Time for libc size:      1.4686 μs
Time for string size:    395.76 ns
Time for math size:      485.94 ns
Time for exp size:       1.6826 μs
Time for mod size:       364.25 ns
Time for fmt size:       453.06 ns
In order of speed (fastest first):
+ mod_size (1.000000x)
+ str_size (1.086498x)
+ fmt_size (1.243817x)
+ math_size (1.334066x)
+ libc_size (4.031780x)
+ exp_size (4.619188x)

As you can see, mod_size (len("%i" % i)) is the fastest, slightly faster than using str(i) and significantly faster than others.


回答 5

设数字为,n则数字的位数n为:

math.floor(math.log10(n))+1

请注意,这将为+ ve整数<10e15给出正确的答案。除此之外,返回类型的精度限制math.log10和答案可能会相差1 len(str(n))。这需要的O(log(n))时间与对10的幂进行迭代相同。

感谢@SetiVolkylany将我的注意力带到了这个限制上。看起来正确的解决方案在实现细节上有一些警告,这令人惊讶。

Let the number be n then the number of digits in n is given by:

math.floor(math.log10(n))+1

Note that this will give correct answers for +ve integers < 10e15. Beyond that the precision limits of the return type of math.log10 kicks in and the answer may be off by 1. I would simply use len(str(n)) beyond that; this requires O(log(n)) time which is same as iterating over powers of 10.

Thanks to @SetiVolkylany for bringing my attenstion to this limitation. Its amazing how seemingly correct solutions have caveats in implementation details.


回答 6

好吧,如果不转换为字符串,我将执行以下操作:

def lenDigits(x): 
    """
    Assumes int(x)
    """

    x = abs(x)

    if x < 10:
        return 1

    return 1 + lenDigits(x / 10)

极简递归FTW

Well, without converting to string I would do something like:

def lenDigits(x): 
    """
    Assumes int(x)
    """

    x = abs(x)

    if x < 10:
        return 1

    return 1 + lenDigits(x / 10)

Minimalist recursion FTW


回答 7

计算不将整数转换为字符串的位数:

x=123
x=abs(x)
i = 0
while x >= 10**i:
    i +=1
# i is the number of digits

Count the number of digits w/o convert integer to a string:

x=123
x=abs(x)
i = 0
while x >= 10**i:
    i +=1
# i is the number of digits

回答 8

如亲爱的用户@Calvintwr所提到的,该函数math.log10在范围[-999999999999997,999999999999997]之外的数字中存在问题,在此我们会出现浮点错误。我在JavaScript(Google V8和NodeJS)和C(GNU GCC编译器)上遇到了这个问题,因此'purely mathematically'这里没有解决方案。


基于这个要点答案,亲爱的用户@Calvintwr

import math


def get_count_digits(number: int):
    """Return number of digits in a number."""

    if number == 0:
        return 1

    number = abs(number)

    if number <= 999999999999997:
        return math.floor(math.log10(number)) + 1

    count = 0
    while number:
        count += 1
        number //= 10
    return count

我测试了长度不超过20(含)的数字,还好。这必须足够,因为在64位系统上,最大长度为整数(19 len(str(sys.maxsize)) == 19)。

assert get_count_digits(-99999999999999999999) == 20
assert get_count_digits(-10000000000000000000) == 20
assert get_count_digits(-9999999999999999999) == 19
assert get_count_digits(-1000000000000000000) == 19
assert get_count_digits(-999999999999999999) == 18
assert get_count_digits(-100000000000000000) == 18
assert get_count_digits(-99999999999999999) == 17
assert get_count_digits(-10000000000000000) == 17
assert get_count_digits(-9999999999999999) == 16
assert get_count_digits(-1000000000000000) == 16
assert get_count_digits(-999999999999999) == 15
assert get_count_digits(-100000000000000) == 15
assert get_count_digits(-99999999999999) == 14
assert get_count_digits(-10000000000000) == 14
assert get_count_digits(-9999999999999) == 13
assert get_count_digits(-1000000000000) == 13
assert get_count_digits(-999999999999) == 12
assert get_count_digits(-100000000000) == 12
assert get_count_digits(-99999999999) == 11
assert get_count_digits(-10000000000) == 11
assert get_count_digits(-9999999999) == 10
assert get_count_digits(-1000000000) == 10
assert get_count_digits(-999999999) == 9
assert get_count_digits(-100000000) == 9
assert get_count_digits(-99999999) == 8
assert get_count_digits(-10000000) == 8
assert get_count_digits(-9999999) == 7
assert get_count_digits(-1000000) == 7
assert get_count_digits(-999999) == 6
assert get_count_digits(-100000) == 6
assert get_count_digits(-99999) == 5
assert get_count_digits(-10000) == 5
assert get_count_digits(-9999) == 4
assert get_count_digits(-1000) == 4
assert get_count_digits(-999) == 3
assert get_count_digits(-100) == 3
assert get_count_digits(-99) == 2
assert get_count_digits(-10) == 2
assert get_count_digits(-9) == 1
assert get_count_digits(-1) == 1
assert get_count_digits(0) == 1
assert get_count_digits(1) == 1
assert get_count_digits(9) == 1
assert get_count_digits(10) == 2
assert get_count_digits(99) == 2
assert get_count_digits(100) == 3
assert get_count_digits(999) == 3
assert get_count_digits(1000) == 4
assert get_count_digits(9999) == 4
assert get_count_digits(10000) == 5
assert get_count_digits(99999) == 5
assert get_count_digits(100000) == 6
assert get_count_digits(999999) == 6
assert get_count_digits(1000000) == 7
assert get_count_digits(9999999) == 7
assert get_count_digits(10000000) == 8
assert get_count_digits(99999999) == 8
assert get_count_digits(100000000) == 9
assert get_count_digits(999999999) == 9
assert get_count_digits(1000000000) == 10
assert get_count_digits(9999999999) == 10
assert get_count_digits(10000000000) == 11
assert get_count_digits(99999999999) == 11
assert get_count_digits(100000000000) == 12
assert get_count_digits(999999999999) == 12
assert get_count_digits(1000000000000) == 13
assert get_count_digits(9999999999999) == 13
assert get_count_digits(10000000000000) == 14
assert get_count_digits(99999999999999) == 14
assert get_count_digits(100000000000000) == 15
assert get_count_digits(999999999999999) == 15
assert get_count_digits(1000000000000000) == 16
assert get_count_digits(9999999999999999) == 16
assert get_count_digits(10000000000000000) == 17
assert get_count_digits(99999999999999999) == 17
assert get_count_digits(100000000000000000) == 18
assert get_count_digits(999999999999999999) == 18
assert get_count_digits(1000000000000000000) == 19
assert get_count_digits(9999999999999999999) == 19
assert get_count_digits(10000000000000000000) == 20
assert get_count_digits(99999999999999999999) == 20

使用Python 3.5测试的所有代码示例

As mentioned the dear user @Calvintwr, the function math.log10 has problem in a number outside of a range [-999999999999997, 999999999999997], where we get floating point errors. I had this problem with the JavaScript (the Google V8 and the NodeJS) and the C (the GNU GCC compiler), so a 'purely mathematically' solution is impossible here.


Based on this gist and the answer the dear user @Calvintwr

import math


def get_count_digits(number: int):
    """Return number of digits in a number."""

    if number == 0:
        return 1

    number = abs(number)

    if number <= 999999999999997:
        return math.floor(math.log10(number)) + 1

    count = 0
    while number:
        count += 1
        number //= 10
    return count

I tested it on numbers with length up to 20 (inclusive) and all right. It must be enough, because the length max integer number on a 64-bit system is 19 (len(str(sys.maxsize)) == 19).

assert get_count_digits(-99999999999999999999) == 20
assert get_count_digits(-10000000000000000000) == 20
assert get_count_digits(-9999999999999999999) == 19
assert get_count_digits(-1000000000000000000) == 19
assert get_count_digits(-999999999999999999) == 18
assert get_count_digits(-100000000000000000) == 18
assert get_count_digits(-99999999999999999) == 17
assert get_count_digits(-10000000000000000) == 17
assert get_count_digits(-9999999999999999) == 16
assert get_count_digits(-1000000000000000) == 16
assert get_count_digits(-999999999999999) == 15
assert get_count_digits(-100000000000000) == 15
assert get_count_digits(-99999999999999) == 14
assert get_count_digits(-10000000000000) == 14
assert get_count_digits(-9999999999999) == 13
assert get_count_digits(-1000000000000) == 13
assert get_count_digits(-999999999999) == 12
assert get_count_digits(-100000000000) == 12
assert get_count_digits(-99999999999) == 11
assert get_count_digits(-10000000000) == 11
assert get_count_digits(-9999999999) == 10
assert get_count_digits(-1000000000) == 10
assert get_count_digits(-999999999) == 9
assert get_count_digits(-100000000) == 9
assert get_count_digits(-99999999) == 8
assert get_count_digits(-10000000) == 8
assert get_count_digits(-9999999) == 7
assert get_count_digits(-1000000) == 7
assert get_count_digits(-999999) == 6
assert get_count_digits(-100000) == 6
assert get_count_digits(-99999) == 5
assert get_count_digits(-10000) == 5
assert get_count_digits(-9999) == 4
assert get_count_digits(-1000) == 4
assert get_count_digits(-999) == 3
assert get_count_digits(-100) == 3
assert get_count_digits(-99) == 2
assert get_count_digits(-10) == 2
assert get_count_digits(-9) == 1
assert get_count_digits(-1) == 1
assert get_count_digits(0) == 1
assert get_count_digits(1) == 1
assert get_count_digits(9) == 1
assert get_count_digits(10) == 2
assert get_count_digits(99) == 2
assert get_count_digits(100) == 3
assert get_count_digits(999) == 3
assert get_count_digits(1000) == 4
assert get_count_digits(9999) == 4
assert get_count_digits(10000) == 5
assert get_count_digits(99999) == 5
assert get_count_digits(100000) == 6
assert get_count_digits(999999) == 6
assert get_count_digits(1000000) == 7
assert get_count_digits(9999999) == 7
assert get_count_digits(10000000) == 8
assert get_count_digits(99999999) == 8
assert get_count_digits(100000000) == 9
assert get_count_digits(999999999) == 9
assert get_count_digits(1000000000) == 10
assert get_count_digits(9999999999) == 10
assert get_count_digits(10000000000) == 11
assert get_count_digits(99999999999) == 11
assert get_count_digits(100000000000) == 12
assert get_count_digits(999999999999) == 12
assert get_count_digits(1000000000000) == 13
assert get_count_digits(9999999999999) == 13
assert get_count_digits(10000000000000) == 14
assert get_count_digits(99999999999999) == 14
assert get_count_digits(100000000000000) == 15
assert get_count_digits(999999999999999) == 15
assert get_count_digits(1000000000000000) == 16
assert get_count_digits(9999999999999999) == 16
assert get_count_digits(10000000000000000) == 17
assert get_count_digits(99999999999999999) == 17
assert get_count_digits(100000000000000000) == 18
assert get_count_digits(999999999999999999) == 18
assert get_count_digits(1000000000000000000) == 19
assert get_count_digits(9999999999999999999) == 19
assert get_count_digits(10000000000000000000) == 20
assert get_count_digits(99999999999999999999) == 20

All example of codes tested with the Python 3.5


回答 9

对于后代来说,毫无疑问,这是迄今为止最慢的解决方案:

def num_digits(num, number_of_calls=1):
    "Returns the number of digits of an integer num."
    if num == 0 or num == -1:
        return 1 if number_of_calls == 1 else 0
    else:
        return 1 + num_digits(num/10, number_of_calls+1)

For posterity, no doubt by far the slowest solution to this problem:

def num_digits(num, number_of_calls=1):
    "Returns the number of digits of an integer num."
    if num == 0 or num == -1:
        return 1 if number_of_calls == 1 else 0
    else:
        return 1 + num_digits(num/10, number_of_calls+1)

回答 10

from math import log10
digits = lambda n: ((n==0) and 1) or int(log10(abs(n)))+1
from math import log10
digits = lambda n: ((n==0) and 1) or int(log10(abs(n)))+1

回答 11

假设您要求存储的最大数字整数,则该值取决于实现。我建议您在使用python时不要以这种方式思考。无论如何,可以在python’integer’中存储很大的值。记住,Python使用鸭子类型!

编辑: 我在澄清询问者要求数字位数之前给出了答案。为此,我同意接受的答案所建议的方法。没有什么可添加的!

Assuming you are asking for the largest number you can store in an integer, the value is implementation dependent. I suggest that you don’t think in that way when using python. In any case, quite a large value can be stored in a python ‘integer’. Remember, Python uses duck typing!

Edit: I gave my answer before the clarification that the asker wanted the number of digits. For that, I agree with the method suggested by the accepted answer. Nothing more to add!


回答 12

def length(i):
  return len(str(i))
def length(i):
  return len(str(i))

回答 13

可以使用以下方法快速完成整数操作:

len(str(abs(1234567890)))

它获取绝对值“ 1234567890”的字符串的长度

abs返回没有任何负数的数字(仅数字的大小),str将其转换/转换为字符串,并len返回该字符串的字符串长度。

如果您希望它适用于浮点数,则可以使用以下两种方法之一:

# Ignore all after decimal place
len(str(abs(0.1234567890)).split(".")[0])

# Ignore just the decimal place
len(str(abs(0.1234567890)))-1

备查。

It can be done for integers quickly by using:

len(str(abs(1234567890)))

Which gets the length of the string of the absolute value of “1234567890”

abs returns the number WITHOUT any negatives (only the magnitude of the number), str casts/converts it to a string and len returns the string length of that string.

If you want it to work for floats, you can use either of the following:

# Ignore all after decimal place
len(str(abs(0.1234567890)).split(".")[0])

# Ignore just the decimal place
len(str(abs(0.1234567890)))-1

For future reference.


回答 14

以科学记数法表示格式并得出指数:

int("{:.5e}".format(1000000).split("e")[1]) + 1

我不知道速度,但这很简单。

请注意,小数点后的有效位数(如果将科学计数的小数部分四舍五入到另一个数字,则“ .5e”中的“ 5”可能是一个问题。我将其设置为任意大,但可以反映您知道的最大数字的长度。

Format in scientific notation and pluck off the exponent:

int("{:.5e}".format(1000000).split("e")[1]) + 1

I don’t know about speed, but it’s simple.

Please note the number of significant digits after the decimal (the “5” in the “.5e” can be an issue if it rounds up the decimal part of the scientific notation to another digit. I set it arbitrarily large, but could reflect the length of the largest number you know about.


回答 15

def count_digit(number):
  if number >= 10:
    count = 2
  else:
    count = 1
  while number//10 > 9:
    count += 1
    number = number//10
  return count
def count_digit(number):
  if number >= 10:
    count = 2
  else:
    count = 1
  while number//10 > 9:
    count += 1
    number = number//10
  return count

回答 16

如果必须要求用户输入内容,然后必须计算有多少个数字,则可以按照以下步骤操作:

count_number = input('Please enter a number\t')

print(len(count_number))

注意:切勿将int用作用户输入。

If you have to ask an user to give input and then you have to count how many numbers are there then you can follow this:

count_number = input('Please enter a number\t')

print(len(count_number))

Note: Never take an int as user input.


回答 17

def digits(n)
    count = 0
    if n == 0:
        return 1
    while (n >= 10**count):
        count += 1
        n += n%10
    return count
print(digits(25))   # Should print 2
print(digits(144))  # Should print 3
print(digits(1000)) # Should print 4
print(digits(0))    # Should print 1
def digits(n)
    count = 0
    if n == 0:
        return 1
    
    if n < 0:
        n *= -1

    while (n >= 10**count):
        count += 1
        n += n%10

    return count

print(digits(25))   # Should print 2
print(digits(144))  # Should print 3
print(digits(1000)) # Should print 4
print(digits(0))    # Should print 1

回答 18

我的代码与此相同;我使用了log10方法:

from math import *

def digit_count(数字):

if number>1 and round(log10(number))>=log10(number) and number%10!=0 :
    return round(log10(number))
elif  number>1 and round(log10(number))<log10(number) and number%10!=0:
    return round(log10(number))+1
elif number%10==0 and number!=0:
    return int(log10(number)+1)
elif number==1 or number==0:
    return 1

我必须指定1和0的情况,因为log10(1)= 0和log10(0)= ND,因此不满足提到的条件。但是,此代码仅适用于整数。

My code for the same is as follows;i have used the log10 method:

from math import *

def digit_count(number):

if number>1 and round(log10(number))>=log10(number) and number%10!=0 :
    return round(log10(number))
elif  number>1 and round(log10(number))<log10(number) and number%10!=0:
    return round(log10(number))+1
elif number%10==0 and number!=0:
    return int(log10(number)+1)
elif number==1 or number==0:
    return 1

I had to specify in case of 1 and 0 because log10(1)=0 and log10(0)=ND and hence the condition mentioned isn’t satisfied. However, this code works only for whole numbers.


回答 19

这是一个笨重但快速的版本:

def nbdigit ( x ):
    if x >= 10000000000000000 : # 17 -
        return len( str( x ))
    if x < 100000000 : # 1 - 8
        if x < 10000 : # 1 - 4
            if x < 100             : return (x >= 10)+1 
            else                   : return (x >= 1000)+3
        else: # 5 - 8                                                 
            if x < 1000000         : return (x >= 100000)+5 
            else                   : return (x >= 10000000)+7
    else: # 9 - 16 
        if x < 1000000000000 : # 9 - 12
            if x < 10000000000     : return (x >= 1000000000)+9 
            else                   : return (x >= 100000000000)+11
        else: # 13 - 16
            if x < 100000000000000 : return (x >= 10000000000000)+13 
            else                   : return (x >= 1000000000000000)+15

对于不太大的数字,只有5个比较。在我的计算机上,它比该math.log10版本快30%,比该版本快5%len( str())。好吧…如果您不疯狂使用它,那么没有吸引力。

这是我用来测试/测量功能的一组数字:

n = [ int( (i+1)**( 17/7. )) for i in xrange( 1000000 )] + [0,10**16-1,10**16,10**16+1]

注意:它不管理负数,但适应很容易…

Here is a bulky but fast version :

def nbdigit ( x ):
    if x >= 10000000000000000 : # 17 -
        return len( str( x ))
    if x < 100000000 : # 1 - 8
        if x < 10000 : # 1 - 4
            if x < 100             : return (x >= 10)+1 
            else                   : return (x >= 1000)+3
        else: # 5 - 8                                                 
            if x < 1000000         : return (x >= 100000)+5 
            else                   : return (x >= 10000000)+7
    else: # 9 - 16 
        if x < 1000000000000 : # 9 - 12
            if x < 10000000000     : return (x >= 1000000000)+9 
            else                   : return (x >= 100000000000)+11
        else: # 13 - 16
            if x < 100000000000000 : return (x >= 10000000000000)+13 
            else                   : return (x >= 1000000000000000)+15

Only 5 comparisons for not too big numbers. On my computer it is about 30% faster than the math.log10 version and 5% faster than the len( str()) one. Ok… no so attractive if you don’t use it furiously.

And here is the set of numbers I used to test/measure my function:

n = [ int( (i+1)**( 17/7. )) for i in xrange( 1000000 )] + [0,10**16-1,10**16,10**16+1]

NB: it does not manage negative numbers, but the adaptation is easy…


回答 20

>>> a=12345
>>> a.__str__().__len__()
5
>>> a=12345
>>> a.__str__().__len__()
5

等效的熊猫数(不同)

问题:等效的熊猫数(不同)

我使用pandas作为数据库替代品,因为我有多个数据库(oracle,mssql等),并且无法对SQL等效命令进行一系列命令。

我在DataFrame中加载了一个带有一些列的表:

YEARMONTH, CLIENTCODE, SIZE, .... etc etc

在SQL中,每年计算不同客户端的数量将是:

SELECT count(distinct CLIENTCODE) FROM table GROUP BY YEARMONTH;

结果将是

201301    5000
201302    13245

如何在熊猫中做到这一点?

I am using pandas as a db substitute as I have multiple databases (oracle, mssql, etc) and I am unable to make a sequence of commands to a SQL equivalent.

I have a table loaded in a DataFrame with some columns:

YEARMONTH, CLIENTCODE, SIZE, .... etc etc

In SQL, to count the amount of different clients per year would be:

SELECT count(distinct CLIENTCODE) FROM table GROUP BY YEARMONTH;

And the result would be

201301    5000
201302    13245

How can I do that in pandas?


回答 0

我相信这就是您想要的:

table.groupby('YEARMONTH').CLIENTCODE.nunique()

例:

In [2]: table
Out[2]: 
   CLIENTCODE  YEARMONTH
0           1     201301
1           1     201301
2           2     201301
3           1     201302
4           2     201302
5           2     201302
6           3     201302

In [3]: table.groupby('YEARMONTH').CLIENTCODE.nunique()
Out[3]: 
YEARMONTH
201301       2
201302       3

I believe this is what you want:

table.groupby('YEARMONTH').CLIENTCODE.nunique()

Example:

In [2]: table
Out[2]: 
   CLIENTCODE  YEARMONTH
0           1     201301
1           1     201301
2           2     201301
3           1     201302
4           2     201302
5           2     201302
6           3     201302

In [3]: table.groupby('YEARMONTH').CLIENTCODE.nunique()
Out[3]: 
YEARMONTH
201301       2
201302       3

回答 1

这是另一种方法,非常简单,可以说您的数据框名称为daat,列名称为YEARMONTH

daat.YEARMONTH.value_counts()

Here is another method, much simple, lets say your dataframe name is daat and column name is YEARMONTH

daat.YEARMONTH.value_counts()

回答 2

有趣的是,通常len(unique())速度比快几倍(3x-15x)nunique()

Interestingly enough, very often len(unique()) is a few times (3x-15x) faster than nunique().


回答 3

使用crosstab,这将返回比groupby nunique

pd.crosstab(df.YEARMONTH,df.CLIENTCODE)
Out[196]: 
CLIENTCODE  1  2  3
YEARMONTH          
201301      2  1  0
201302      1  2  1

稍作修改后,产生结果

pd.crosstab(df.YEARMONTH,df.CLIENTCODE).ne(0).sum(1)
Out[197]: 
YEARMONTH
201301    2
201302    3
dtype: int64

Using crosstab, this will return more information than groupby nunique

pd.crosstab(df.YEARMONTH,df.CLIENTCODE)
Out[196]: 
CLIENTCODE  1  2  3
YEARMONTH          
201301      2  1  0
201302      1  2  1

After a little bit modify ,yield the result

pd.crosstab(df.YEARMONTH,df.CLIENTCODE).ne(0).sum(1)
Out[197]: 
YEARMONTH
201301    2
201302    3
dtype: int64

回答 4

我也在使用,nunique但是如果您必须使用诸如'min', 'max', 'count' or 'mean'etc之类的聚合函数,它将非常有帮助。

df.groupby('YEARMONTH')['CLIENTCODE'].transform('nunique') #count(distinct)
df.groupby('YEARMONTH')['CLIENTCODE'].transform('min')     #min
df.groupby('YEARMONTH')['CLIENTCODE'].transform('max')     #max
df.groupby('YEARMONTH')['CLIENTCODE'].transform('mean')    #average
df.groupby('YEARMONTH')['CLIENTCODE'].transform('count')   #count

I am also using nunique but it will be very helpful if you have to use an aggregate function like 'min', 'max', 'count' or 'mean' etc.

df.groupby('YEARMONTH')['CLIENTCODE'].transform('nunique') #count(distinct)
df.groupby('YEARMONTH')['CLIENTCODE'].transform('min')     #min
df.groupby('YEARMONTH')['CLIENTCODE'].transform('max')     #max
df.groupby('YEARMONTH')['CLIENTCODE'].transform('mean')    #average
df.groupby('YEARMONTH')['CLIENTCODE'].transform('count')   #count

回答 5

使用新的Pandas版本,很容易获得数据框

unique_count = pd.groupby(['YEARMONTH'], as_index=False).agg(uniq_CLIENTCODE =('CLIENTCODE',pd.Series.count))

With new pandas version, it is easy to get as dataframe

unique_count = pd.groupby(['YEARMONTH'], as_index=False).agg(uniq_CLIENTCODE =('CLIENTCODE',pd.Series.count))

回答 6

这是一种在多个列上具有不同计数的方法。让我们来一些数据:

data = {'CLIENT_CODE':[1,1,2,1,2,2,3],
        'YEAR_MONTH':[201301,201301,201301,201302,201302,201302,201302],
        'PRODUCT_CODE': [100,150,220,400,50,80,100]
       }
table = pd.DataFrame(data)
table

CLIENT_CODE YEAR_MONTH  PRODUCT_CODE
0   1       201301      100
1   1       201301      150
2   2       201301      220
3   1       201302      400
4   2       201302      50
5   2       201302      80
6   3       201302      100

现在,列出感兴趣的列,并使用经过稍微修改的语法的groupby:

columns = ['YEAR_MONTH', 'PRODUCT_CODE']
table[columns].groupby(table['CLIENT_CODE']).nunique()

我们获得:

YEAR_MONTH  PRODUCT_CODE CLIENT_CODE        
1           2            3
2           2            3
3           1            1

Here an approach to have count distinct over multiple columns. Let’s have some data:

data = {'CLIENT_CODE':[1,1,2,1,2,2,3],
        'YEAR_MONTH':[201301,201301,201301,201302,201302,201302,201302],
        'PRODUCT_CODE': [100,150,220,400,50,80,100]
       }
table = pd.DataFrame(data)
table

CLIENT_CODE YEAR_MONTH  PRODUCT_CODE
0   1       201301      100
1   1       201301      150
2   2       201301      220
3   1       201302      400
4   2       201302      50
5   2       201302      80
6   3       201302      100

Now, list the columns of interest and use groupby in a slightly modified syntax:

columns = ['YEAR_MONTH', 'PRODUCT_CODE']
table[columns].groupby(table['CLIENT_CODE']).nunique()

We obtain:

YEAR_MONTH  PRODUCT_CODE CLIENT_CODE        
1           2            3
2           2            3
3           1            1

回答 7

列的不同以及其他列上的聚合

要获取任何列(CLIENTCODE在您的情况下)的不同数量的值,可以使用nunique。我们可以将输入作为字典传递给agg函数,以及其他列的聚合:

grp_df = df.groupby('YEARMONTH').agg({'CLIENTCODE': ['nunique'],
                                      'other_col_1': ['sum', 'count']})

# to flatten the multi-level columns
grp_df.columns = ["_".join(col).strip() for col in grp_df.columns.values]

# if you wish to reset the index
grp_df.reset_index(inplace=True)

Distinct of column along with aggregations on other columns

To get the distinct number of values for any column (CLIENTCODE in your case), we can use nunique. We can pass the input as a dictionary in agg function, along with aggregations on other columns:

grp_df = df.groupby('YEARMONTH').agg({'CLIENTCODE': ['nunique'],
                                      'other_col_1': ['sum', 'count']})

# to flatten the multi-level columns
grp_df.columns = ["_".join(col).strip() for col in grp_df.columns.values]

# if you wish to reset the index
grp_df.reset_index(inplace=True)

如何计算Python中ndarray中某些项目的出现?

问题:如何计算Python中ndarray中某些项目的出现?

在Python中,我有一个ndarray y 打印为array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

我试图计算这个数组中有多少个0和多少个1

但是当我输入y.count(0)or时y.count(1),它说

numpy.ndarray 对象没有属性 count

我该怎么办?

In Python, I have an ndarray y that is printed as array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

I’m trying to count how many 0s and how many 1s are there in this array.

But when I type y.count(0) or y.count(1), it says

numpy.ndarray object has no attribute count

What should I do?


回答 0

>>> a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
>>> unique, counts = numpy.unique(a, return_counts=True)
>>> dict(zip(unique, counts))
{0: 7, 1: 4, 2: 1, 3: 2, 4: 1}

非numpy方式

使用collections.Counter;

>> import collections, numpy

>>> a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
>>> collections.Counter(a)
Counter({0: 7, 1: 4, 3: 2, 2: 1, 4: 1})
>>> a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
>>> unique, counts = numpy.unique(a, return_counts=True)
>>> dict(zip(unique, counts))
{0: 7, 1: 4, 2: 1, 3: 2, 4: 1}

Non-numpy way:

Use collections.Counter;

>> import collections, numpy

>>> a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
>>> collections.Counter(a)
Counter({0: 7, 1: 4, 3: 2, 2: 1, 4: 1})

回答 1

那使用numpy.count_nonzero什么呢

>>> import numpy as np
>>> y = np.array([1, 2, 2, 2, 2, 0, 2, 3, 3, 3, 0, 0, 2, 2, 0])

>>> np.count_nonzero(y == 1)
1
>>> np.count_nonzero(y == 2)
7
>>> np.count_nonzero(y == 3)
3

What about using numpy.count_nonzero, something like

>>> import numpy as np
>>> y = np.array([1, 2, 2, 2, 2, 0, 2, 3, 3, 3, 0, 0, 2, 2, 0])

>>> np.count_nonzero(y == 1)
1
>>> np.count_nonzero(y == 2)
7
>>> np.count_nonzero(y == 3)
3

回答 2

就个人而言,我会去: (y == 0).sum()(y == 1).sum()

例如

import numpy as np
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
num_zeros = (y == 0).sum()
num_ones = (y == 1).sum()

Personally, I’d go for: (y == 0).sum() and (y == 1).sum()

E.g.

import numpy as np
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
num_zeros = (y == 0).sum()
num_ones = (y == 1).sum()

回答 3

对于您的情况,您还可以查看numpy.bincount

In [56]: a = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

In [57]: np.bincount(a)
Out[57]: array([8, 4])  #count of zeros is at index 0 : 8
                        #count of ones is at index 1 : 4

For your case you could also look into numpy.bincount

In [56]: a = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

In [57]: np.bincount(a)
Out[57]: array([8, 4])  #count of zeros is at index 0 : 8
                        #count of ones is at index 1 : 4

回答 4

将数组转换y为列表l,然后执行l.count(1)l.count(0)

>>> y = numpy.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
>>> l = list(y)
>>> l.count(1)
4
>>> l.count(0)
8 

Convert your array y to list l and then do l.count(1) and l.count(0)

>>> y = numpy.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
>>> l = list(y)
>>> l.count(1)
4
>>> l.count(0)
8 

回答 5

y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

如果你知道,他们只是01

np.sum(y)

给你的数量。 np.sum(1-y)给出零。

为了稍微概括起见,如果要计数0而不是零(但可能是2或3):

np.count_nonzero(y)

给出非零的数量。

但是,如果您需要更复杂的东西,我认为numpy不会提供一个不错的count选择。在这种情况下,请转到集合:

import collections
collections.Counter(y)
> Counter({0: 8, 1: 4})

这就像一个字典

collections.Counter(y)[0]
> 8
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

If you know that they are just 0 and 1:

np.sum(y)

gives you the number of ones. np.sum(1-y) gives the zeroes.

For slight generality, if you want to count 0 and not zero (but possibly 2 or 3):

np.count_nonzero(y)

gives the number of nonzero.

But if you need something more complicated, I don’t think numpy will provide a nice count option. In that case, go to collections:

import collections
collections.Counter(y)
> Counter({0: 8, 1: 4})

This behaves like a dict

collections.Counter(y)[0]
> 8

回答 6

如果您确切知道要查找的号码,则可以使用以下代码;

lst = np.array([1,1,2,3,3,6,6,6,3,2,1])
(lst == 2).sum()

返回数组中发生2的次数。

If you know exactly which number you’re looking for, you can use the following;

lst = np.array([1,1,2,3,3,6,6,6,3,2,1])
(lst == 2).sum()

returns how many times 2 is occurred in your array.


回答 7

老实说,我发现将其转换为pandas系列或DataFrame最简单:

import pandas as pd
import numpy as np

df = pd.DataFrame({'data':np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])})
print df['data'].value_counts()

或罗伯特·穆伊(Robert Muil)提出的这一好话:

pd.Series([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]).value_counts()

Honestly I find it easiest to convert to a pandas Series or DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({'data':np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])})
print df['data'].value_counts()

Or this nice one-liner suggested by Robert Muil:

pd.Series([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]).value_counts()

回答 8

没有人建议使用numpy.bincount(input, minlength)minlength = np.size(input),但它似乎是一个很好的解决方案,并且绝对是最快的

In [1]: choices = np.random.randint(0, 100, 10000)

In [2]: %timeit [ np.sum(choices == k) for k in range(min(choices), max(choices)+1) ]
100 loops, best of 3: 2.67 ms per loop

In [3]: %timeit np.unique(choices, return_counts=True)
1000 loops, best of 3: 388 µs per loop

In [4]: %timeit np.bincount(choices, minlength=np.size(choices))
100000 loops, best of 3: 16.3 µs per loop

numpy.unique(x, return_counts=True)和之间的疯狂加速numpy.bincount(x, minlength=np.max(x))

No one suggested to use numpy.bincount(input, minlength) with minlength = np.size(input), but it seems to be a good solution, and definitely the fastest:

In [1]: choices = np.random.randint(0, 100, 10000)

In [2]: %timeit [ np.sum(choices == k) for k in range(min(choices), max(choices)+1) ]
100 loops, best of 3: 2.67 ms per loop

In [3]: %timeit np.unique(choices, return_counts=True)
1000 loops, best of 3: 388 µs per loop

In [4]: %timeit np.bincount(choices, minlength=np.size(choices))
100000 loops, best of 3: 16.3 µs per loop

That’s a crazy speedup between numpy.unique(x, return_counts=True) and numpy.bincount(x, minlength=np.max(x)) !


回答 9

怎么样len(y[y==0])len(y[y==1])

What about len(y[y==0]) and len(y[y==1]) ?


回答 10

y.tolist().count(val)

使用val 0或1

由于python列表具有本机函数count,因此在使用该函数之前将其转换为list是一个简单的解决方案。

y.tolist().count(val)

with val 0 or 1

Since a python list has a native function count, converting to list before using that function is a simple solution.


回答 11

另一个简单的解决方案可能是使用numpy.count_nonzero()

import numpy as np
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
y_nonzero_num = np.count_nonzero(y==1)
y_zero_num = np.count_nonzero(y==0)
y_nonzero_num
4
y_zero_num
8

不要让名称误导您,如果您像示例中那样将其与布尔值一起使用,就可以解决问题。

Yet another simple solution might be to use numpy.count_nonzero():

import numpy as np
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
y_nonzero_num = np.count_nonzero(y==1)
y_zero_num = np.count_nonzero(y==0)
y_nonzero_num
4
y_zero_num
8

Don’t let the name mislead you, if you use it with the boolean just like in the example, it will do the trick.


回答 12

要计算出现次数,可以使用np.unique(array, return_counts=True)

In [75]: boo = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

# use bool value `True` or equivalently `1`
In [77]: uniq, cnts = np.unique(boo, return_counts=1)
In [81]: uniq
Out[81]: array([0, 1])   #unique elements in input array are: 0, 1

In [82]: cnts
Out[82]: array([8, 4])   # 0 occurs 8 times, 1 occurs 4 times

To count the number of occurrences, you can use np.unique(array, return_counts=True):

In [75]: boo = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

# use bool value `True` or equivalently `1`
In [77]: uniq, cnts = np.unique(boo, return_counts=1)
In [81]: uniq
Out[81]: array([0, 1])   #unique elements in input array are: 0, 1

In [82]: cnts
Out[82]: array([8, 4])   # 0 occurs 8 times, 1 occurs 4 times

回答 13

我会使用np.where:

how_many_0 = len(np.where(a==0.)[0])
how_many_1 = len(np.where(a==1.)[0])

I’d use np.where:

how_many_0 = len(np.where(a==0.)[0])
how_many_1 = len(np.where(a==1.)[0])

回答 14

利用系列提供的方法:

>>> import pandas as pd
>>> y = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]
>>> pd.Series(y).value_counts()
0    8
1    4
dtype: int64

take advantage of the methods offered by a Series:

>>> import pandas as pd
>>> y = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]
>>> pd.Series(y).value_counts()
0    8
1    4
dtype: int64

回答 15

一个简单的一般答案是:

numpy.sum(MyArray==x)   # sum of a binary list of the occurence of x (=0 or 1) in MyArray

这将导致完整的代码作为示例

import numpy
MyArray=numpy.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])  # array we want to search in
x=0   # the value I want to count (can be iterator, in a list, etc.)
numpy.sum(MyArray==0)   # sum of a binary list of the occurence of x in MyArray

现在,如果MyArray具有多个维度,并且您要计算行中值分布的出现次数(此后为pattern)

MyArray=numpy.array([[6, 1],[4, 5],[0, 7],[5, 1],[2, 5],[1, 2],[3, 2],[0, 2],[2, 5],[5, 1],[3, 0]])
x=numpy.array([5,1])   # the value I want to count (can be iterator, in a list, etc.)
temp = numpy.ascontiguousarray(MyArray).view(numpy.dtype((numpy.void, MyArray.dtype.itemsize * MyArray.shape[1])))  # convert the 2d-array into an array of analyzable patterns
xt=numpy.ascontiguousarray(x).view(numpy.dtype((numpy.void, x.dtype.itemsize * x.shape[0])))  # convert what you search into one analyzable pattern
numpy.sum(temp==xt)  # count of the searched pattern in the list of patterns

A general and simple answer would be:

numpy.sum(MyArray==x)   # sum of a binary list of the occurence of x (=0 or 1) in MyArray

which would result into this full code as exemple

import numpy
MyArray=numpy.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])  # array we want to search in
x=0   # the value I want to count (can be iterator, in a list, etc.)
numpy.sum(MyArray==0)   # sum of a binary list of the occurence of x in MyArray

Now if MyArray is in multiple dimensions and you want to count the occurence of a distribution of values in line (= pattern hereafter)

MyArray=numpy.array([[6, 1],[4, 5],[0, 7],[5, 1],[2, 5],[1, 2],[3, 2],[0, 2],[2, 5],[5, 1],[3, 0]])
x=numpy.array([5,1])   # the value I want to count (can be iterator, in a list, etc.)
temp = numpy.ascontiguousarray(MyArray).view(numpy.dtype((numpy.void, MyArray.dtype.itemsize * MyArray.shape[1])))  # convert the 2d-array into an array of analyzable patterns
xt=numpy.ascontiguousarray(x).view(numpy.dtype((numpy.void, x.dtype.itemsize * x.shape[0])))  # convert what you search into one analyzable pattern
numpy.sum(temp==xt)  # count of the searched pattern in the list of patterns

回答 16

您可以使用字典理解来创建整齐的单线。可以在这里找到有关字典理解的更多信息

>>>counts = {int(value): list(y).count(value) for value in set(y)}
>>>print(counts)
{0: 8, 1: 4}

这将创建一个字典,将ndarray中的值作为键,并将值的计数分别作为键的值。

每当您要计算此格式数组中某个值的出现次数时,此方法都将起作用。

You can use dictionary comprehension to create a neat one-liner. More about dictionary comprehension can be found here

>>>counts = {int(value): list(y).count(value) for value in set(y)}
>>>print(counts)
{0: 8, 1: 4}

This will create a dictionary with the values in your ndarray as keys, and the counts of the values as the values for the keys respectively.

This will work whenever you want to count occurences of a value in arrays of this format.


回答 17

尝试这个:

a = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
list(a).count(1)

Try this:

a = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
list(a).count(1)

回答 18

这可以通过以下方法轻松完成

y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
y.tolist().count(1)

This can be done easily in the following method

y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
y.tolist().count(1)

回答 19

由于您的ndarray仅包含0和1,因此您可以使用sum()获得1的出现,并使用len()-sum()获得0的出现。

num_of_ones = sum(array)
num_of_zeros = len(array)-sum(array)

Since your ndarray contains only 0 and 1, you can use sum() to get the occurrence of 1s and len()-sum() to get the occurrence of 0s.

num_of_ones = sum(array)
num_of_zeros = len(array)-sum(array)

回答 20

您有一个只有1和0的特殊数组。所以一个诀窍是使用

np.mean(x)

这将为您提供数组中1s的百分比。或者,使用

np.sum(x)
np.sum(1-x)

将为您提供数组中1和0的绝对数。

You have a special array with only 1 and 0 here. So a trick is to use

np.mean(x)

which gives you the percentage of 1s in your array. Alternatively, use

np.sum(x)
np.sum(1-x)

will give you the absolute number of 1 and 0 in your array.


回答 21

dict(zip(*numpy.unique(y, return_counts=True)))

刚刚在此处复制了Seppo Enarvi的评论,这应该是一个正确的答案

dict(zip(*numpy.unique(y, return_counts=True)))

Just copied Seppo Enarvi’s comment here which deserves to be a proper answer


回答 22

它涉及更多的步骤,但是对2d数组和更复杂的过滤器也适用的更灵活的解决方案是创建一个布尔掩码,然后在掩码上使用.sum()。

>>>>y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
>>>>mask = y == 0
>>>>mask.sum()
8

It involves one more step, but a more flexible solution which would also work for 2d arrays and more complicated filters is to create a boolean mask and then use .sum() on the mask.

>>>>y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
>>>>mask = y == 0
>>>>mask.sum()
8

回答 23

如果您不想使用numpy或collections模块,则可以使用字典:

d = dict()
a = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]
for item in a:
    try:
        d[item]+=1
    except KeyError:
        d[item]=1

结果:

>>>d
{0: 8, 1: 4}

当然,您也可以使用if / else语句。我认为Counter函数的功能几乎相同,但这更加透明。

If you don’t want to use numpy or a collections module you can use a dictionary:

d = dict()
a = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]
for item in a:
    try:
        d[item]+=1
    except KeyError:
        d[item]=1

result:

>>>d
{0: 8, 1: 4}

Of course you can also use an if/else statement. I think the Counter function does almost the same thing but this is more transparant.


回答 24

对于通用条目:

x = np.array([11, 2, 3, 5, 3, 2, 16, 10, 10, 3, 11, 4, 5, 16, 3, 11, 4])
n = {i:len([j for j in np.where(x==i)[0]]) for i in set(x)}
ix = {i:[j for j in np.where(x==i)[0]] for i in set(x)}

将输出一个计数:

{2: 2, 3: 4, 4: 2, 5: 2, 10: 2, 11: 3, 16: 2}

和索引:

{2: [1, 5],
3: [2, 4, 9, 14],
4: [11, 16],
5: [3, 12],
10: [7, 8],
11: [0, 10, 15],
16: [6, 13]}

For generic entries:

x = np.array([11, 2, 3, 5, 3, 2, 16, 10, 10, 3, 11, 4, 5, 16, 3, 11, 4])
n = {i:len([j for j in np.where(x==i)[0]]) for i in set(x)}
ix = {i:[j for j in np.where(x==i)[0]] for i in set(x)}

Will output a count:

{2: 2, 3: 4, 4: 2, 5: 2, 10: 2, 11: 3, 16: 2}

And indices:

{2: [1, 5],
3: [2, 4, 9, 14],
4: [11, 16],
5: [3, 12],
10: [7, 8],
11: [0, 10, 15],
16: [6, 13]}

回答 25

这里有一些东西,通过它您可以计算出特定数字的出现次数:根据您的代码

count_of_zero = list(y [y == 0])。count(0)

打印(count_of_zero)

//根据匹配项,将有布尔值,根据True值,将返回数字0

here I have something, through which you can count the number of occurrence of a particular number: according to your code

count_of_zero=list(y[y==0]).count(0)

print(count_of_zero)

// according to the match there will be boolean values and according to True value the number 0 will be return


回答 26

如果您对最快的执行感兴趣,那么您会事先知道要查找的值,并且您的数组是一维的,否则您对展平数组上的结果感兴趣(在这种情况下,函数的输入应是np.flatten(arr)不是只arr),然后Numba是你的朋友:

import numba as nb


@nb.jit
def count_nb(arr, value):
    result = 0
    for x in arr:
        if x == value:
            result += 1
    return result

或者,对于超大型阵列,并行化可能会有所帮助:

@nb.jit(parallel=True)
def count_nbp(arr, value):
    result = 0
    for i in nb.prange(arr.size):
        if arr[i] == value:
            result += 1
    return result

对这些基准进行基准测试np.count_nonzero()(也存在创建可以避免的临时数组的问题)和np.unique()基于-的解决方案

import numpy as np


def count_np(arr, value):
    return np.count_nonzero(arr == value)
import numpy as np


def count_np2(arr, value):
    uniques, counts = np.unique(a, return_counts=True)
    counter = dict(zip(uniques, counts))
    return counter[value] if value in counter else 0 

用于使用以下命令生成的输入:

def gen_input(n, a=0, b=100):
    return np.random.randint(a, b, n)

获得以下图(图的第二行是对更快方法的放大):

bm_full bm_zoom

表明基于Numba的解决方案比NumPy的解决方案明显更快,并且对于非常大的输入,并行方法比朴素的方法要快。


完整的代码在这里

If you are interested in the fastest execution, you know in advance which value(s) to look for, and your array is 1D, or you are otherwise interested in the result on the flattened array (in which case the input of the function should be np.flatten(arr) rather than just arr), then Numba is your friend:

import numba as nb


@nb.jit
def count_nb(arr, value):
    result = 0
    for x in arr:
        if x == value:
            result += 1
    return result

or, for very large arrays where parallelization may be beneficial:

@nb.jit(parallel=True)
def count_nbp(arr, value):
    result = 0
    for i in nb.prange(arr.size):
        if arr[i] == value:
            result += 1
    return result

Benchmarking these against np.count_nonzero() (which also has a problem of creating a temporary array which may be avoided) and np.unique()-based solution

import numpy as np


def count_np(arr, value):
    return np.count_nonzero(arr == value)
import numpy as np


def count_np2(arr, value):
    uniques, counts = np.unique(a, return_counts=True)
    counter = dict(zip(uniques, counts))
    return counter[value] if value in counter else 0 

for input generated with:

def gen_input(n, a=0, b=100):
    return np.random.randint(a, b, n)

the following plots are obtained (the second row of plots is a zoom on the faster approach):

bm_full bm_zoom

Showing that Numba-based solution are noticeably faster than the NumPy counterparts, and, for very large inputs, the parallel approach is faster than the naive one.


Full code available here.


回答 27

如果使用生成器处理非常大的数组,则可以选择。令人高兴的是,这种方法对数组和列表都适用,并且您不需要任何其他程序包。此外,您没有使用太多的内存。

my_array = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
sum(1 for val in my_array if val==0)
Out: 8

if you are dealing with very large arrays using generators could be an option. The nice thing here it that this approach works fine for both arrays and lists and you dont need any additional package. Additionally, you are not using that much memory.

my_array = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
sum(1 for val in my_array if val==0)
Out: 8

回答 28

Numpy为此提供了一个模块。只是一个小技巧。将您的输入数组作为垃圾箱。

numpy.histogram(y, bins=y)

输出是2个数组。一个带有值本身,另一个带有相应的频率。

Numpy has a module for this. Just a small hack. Put your input array as bins.

numpy.histogram(y, bins=y)

The output are 2 arrays. One with the values itself, other with the corresponding frequencies.


回答 29

using numpy.count

$ a = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]

$ np.count(a, 1)
using numpy.count

$ a = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]

$ np.count(a, 1)

计算字符串中字符的出现次数

问题:计算字符串中字符的出现次数

计算字符串中字符出现次数的最简单方法是什么?

例如计算'a'出现在其中的次数'Mary had a little lamb'

What’s the simplest way to count the number of occurrences of a character in a string?

e.g. count the number of times 'a' appears in 'Mary had a little lamb'


回答 0

str.count(sub [,start [,end]])

返回sub范围中的子字符串不重叠的次数[start, end]。可选参数startend并按片表示法解释。

>>> sentence = 'Mary had a little lamb'
>>> sentence.count('a')
4

str.count(sub[, start[, end]])

Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.

>>> sentence = 'Mary had a little lamb'
>>> sentence.count('a')
4

回答 1

您可以使用count()

>>> 'Mary had a little lamb'.count('a')
4

You can use count() :

>>> 'Mary had a little lamb'.count('a')
4

回答 2

正如其他答案所说,使用字符串方法count()可能是最简单的方法,但是如果您经常这样做,请查看collections.Counter

from collections import Counter
my_str = "Mary had a little lamb"
counter = Counter(my_str)
print counter['a']

As other answers said, using the string method count() is probably the simplest, but if you’re doing this frequently, check out collections.Counter:

from collections import Counter
my_str = "Mary had a little lamb"
counter = Counter(my_str)
print counter['a']

回答 3

正则表达式可能吗?

import re
my_string = "Mary had a little lamb"
len(re.findall("a", my_string))

Regular expressions maybe?

import re
my_string = "Mary had a little lamb"
len(re.findall("a", my_string))

回答 4

myString.count('a');

更多信息在这里

myString.count('a');

more info here


回答 5

Python-3.x:

"aabc".count("a")

str.count(sub [,start [,end]])

返回子字符串sub在[start,end]范围内不重叠的次数。可选参数start和end解释为切片表示法。

Python-3.x:

"aabc".count("a")

str.count(sub[, start[, end]])

Return the number of non-overlapping occurrences of substring sub in the range [start, end]. Optional arguments start and end are interpreted as in slice notation.


回答 6

str.count(a)是计算字符串中单个字符的最佳解决方案。但是,如果您需要计算更多的字符,则必须读取整个字符串与要计算的字符一样多的次数。

这项工作的更好方法是:

from collections import defaultdict

text = 'Mary had a little lamb'
chars = defaultdict(int)

for char in text:
    chars[char] += 1

因此,您将拥有一个dict,它返回字符串中每个字母(0如果不存在)的出现次数。

>>>chars['a']
4
>>>chars['x']
0

对于不区分大小写的计数器,您可以通过子类化来覆盖mutator和accessor方法defaultdict(基类的方法是只读的):

class CICounter(defaultdict):
    def __getitem__(self, k):
        return super().__getitem__(k.lower())

    def __setitem__(self, k, v):
        super().__setitem__(k.lower(), v)


chars = CICounter(int)

for char in text:
    chars[char] += 1

>>>chars['a']
4
>>>chars['M']
2
>>>chars['x']
0

str.count(a) is the best solution to count a single character in a string. But if you need to count more characters you would have to read the whole string as many times as characters you want to count.

A better approach for this job would be:

from collections import defaultdict

text = 'Mary had a little lamb'
chars = defaultdict(int)

for char in text:
    chars[char] += 1

So you’ll have a dict that returns the number of occurrences of every letter in the string and 0 if it isn’t present.

>>>chars['a']
4
>>>chars['x']
0

For a case insensitive counter you could override the mutator and accessor methods by subclassing defaultdict (base class’ ones are read-only):

class CICounter(defaultdict):
    def __getitem__(self, k):
        return super().__getitem__(k.lower())

    def __setitem__(self, k, v):
        super().__setitem__(k.lower(), v)


chars = CICounter(int)

for char in text:
    chars[char] += 1

>>>chars['a']
4
>>>chars['M']
2
>>>chars['x']
0

回答 7

这个简单而直接的功能可能会有所帮助:

def check_freq(x):
    freq = {}
    for c in x:
       freq[c] = str.count(c)
    return freq

check_freq("abbabcbdbabdbdbabababcbcbab")
{'a': 7, 'b': 14, 'c': 3, 'd': 3}

This easy and straight forward function might help:

def check_freq(x):
    freq = {}
    for c in x:
       freq[c] = str.count(c)
    return freq

check_freq("abbabcbdbabdbdbabababcbcbab")
{'a': 7, 'b': 14, 'c': 3, 'd': 3}

回答 8

如果要区分大小写(当然还有正则表达式的全部功能),则正则表达式非常有用。

my_string = "Mary had a little lamb"
# simplest solution, using count, is case-sensitive
my_string.count("m")   # yields 1
import re
# case-sensitive with regex
len(re.findall("m", my_string))
# three ways to get case insensitivity - all yield 2
len(re.findall("(?i)m", my_string))
len(re.findall("m|M", my_string))
len(re.findall(re.compile("m",re.IGNORECASE), my_string))

请注意,正则表达式版本的运行时间大约是其十倍,这仅在my_string非常长或代码处于深循环内时才可能是一个问题。

Regular expressions are very useful if you want case-insensitivity (and of course all the power of regex).

my_string = "Mary had a little lamb"
# simplest solution, using count, is case-sensitive
my_string.count("m")   # yields 1
import re
# case-sensitive with regex
len(re.findall("m", my_string))
# three ways to get case insensitivity - all yield 2
len(re.findall("(?i)m", my_string))
len(re.findall("m|M", my_string))
len(re.findall(re.compile("m",re.IGNORECASE), my_string))

Be aware that the regex version takes on the order of ten times as long to run, which will likely be an issue only if my_string is tremendously long, or the code is inside a deep loop.


回答 9

a = 'have a nice day'
symbol = 'abcdefghijklmnopqrstuvwxyz'
for key in symbol:
    print key, a.count(key)
a = 'have a nice day'
symbol = 'abcdefghijklmnopqrstuvwxyz'
for key in symbol:
    print key, a.count(key)

回答 10

str = "count a character occurance"

List = list(str)
print (List)
Uniq = set(List)
print (Uniq)

for key in Uniq:
    print (key, str.count(key))
str = "count a character occurance"

List = list(str)
print (List)
Uniq = set(List)
print (Uniq)

for key in Uniq:
    print (key, str.count(key))

回答 11

另一种方式来获得所有的字符数不使用Counter()count和正则表达式

counts_dict = {}
for c in list(sentence):
  if c not in counts_dict:
    counts_dict[c] = 0
  counts_dict[c] += 1

for key, value in counts_dict.items():
    print(key, value)

An alternative way to get all the character counts without using Counter(), count and regex

counts_dict = {}
for c in list(sentence):
  if c not in counts_dict:
    counts_dict[c] = 0
  counts_dict[c] += 1

for key, value in counts_dict.items():
    print(key, value)

回答 12

count绝对是计算字符串中字符出现次数的最简洁,最有效的方法,但是我尝试使用解决方案lambda,例如:

sentence = 'Mary had a little lamb'
sum(map(lambda x : 1 if 'a' in x else 0, sentence))

这将导致:

4

同样,这样做还有一个好处,如果该句子是包含与上述相同字符的子字符串列表,则由于使用,这也会给出正确的结果in。看一看 :

sentence = ['M', 'ar', 'y', 'had', 'a', 'little', 'l', 'am', 'b']
sum(map(lambda x : 1 if 'a' in x else 0, sentence))

这也导致:

4

当然,这仅在检查单个字符的出现(例如'a'在这种特殊情况下)时才起作用。

count is definitely the most concise and efficient way of counting the occurrence of a character in a string but I tried to come up with a solution using lambda, something like this :

sentence = 'Mary had a little lamb'
sum(map(lambda x : 1 if 'a' in x else 0, sentence))

This will result in :

4

Also, there is one more advantage to this is if the sentence is a list of sub-strings containing same characters as above, then also this gives the correct result because of the use of in. Have a look :

sentence = ['M', 'ar', 'y', 'had', 'a', 'little', 'l', 'am', 'b']
sum(map(lambda x : 1 if 'a' in x else 0, sentence))

This also results in :

4

But Of-course this will work only when checking occurrence of single character such as 'a' in this particular case.


回答 13

“不使用count来查找想要的字符串中的字符”方法。

import re

def count(s, ch):

   pass

def main():

   s = raw_input ("Enter strings what you like, for example, 'welcome': ")  

   ch = raw_input ("Enter you want count characters, but best result to find one character: " )

   print ( len (re.findall ( ch, s ) ) )

main()

“Without using count to find you want character in string” method.

import re

def count(s, ch):

   pass

def main():

   s = raw_input ("Enter strings what you like, for example, 'welcome': ")  

   ch = raw_input ("Enter you want count characters, but best result to find one character: " )

   print ( len (re.findall ( ch, s ) ) )

main()

回答 14

我是熊猫图书馆的粉丝,尤其是value_counts()方法。您可以使用它来计算字符串中每个字符的出现:

>>> import pandas as pd
>>> phrase = "I love the pandas library and its `value_counts()` method"
>>> pd.Series(list(phrase)).value_counts()
     8
a    5
e    4
t    4
o    3
n    3
s    3
d    3
l    3
u    2
i    2
r    2
v    2
`    2
h    2
p    1
b    1
I    1
m    1
(    1
y    1
_    1
)    1
c    1
dtype: int64

I am a fan of the pandas library, in particular the value_counts() method. You could use it to count the occurrence of each character in your string:

>>> import pandas as pd
>>> phrase = "I love the pandas library and its `value_counts()` method"
>>> pd.Series(list(phrase)).value_counts()
     8
a    5
e    4
t    4
o    3
n    3
s    3
d    3
l    3
u    2
i    2
r    2
v    2
`    2
h    2
p    1
b    1
I    1
m    1
(    1
y    1
_    1
)    1
c    1
dtype: int64

回答 15

spam = 'have a nice day'
var = 'd'


def count(spam, var):
    found = 0
    for key in spam:
        if key == var:
            found += 1
    return found
count(spam, var)
print 'count %s is: %s ' %(var, count(spam, var))
spam = 'have a nice day'
var = 'd'


def count(spam, var):
    found = 0
    for key in spam:
        if key == var:
            found += 1
    return found
count(spam, var)
print 'count %s is: %s ' %(var, count(spam, var))

回答 16

Python 3

有两种方法可以实现此目的:

1)内置函数count()

sentence = 'Mary had a little lamb'
print(sentence.count('a'))`

2)不使用功能

sentence = 'Mary had a little lamb'    
count = 0

for i in sentence:
    if i == "a":
        count = count + 1

print(count)

Python 3

Ther are two ways to achieve this:

1) With built-in function count()

sentence = 'Mary had a little lamb'
print(sentence.count('a'))`

2) Without using a function

sentence = 'Mary had a little lamb'    
count = 0

for i in sentence:
    if i == "a":
        count = count + 1

print(count)

回答 17

仅此恕我直言-您可以添加上限或下限方法

def count_letter_in_str(string,letter):
    return string.count(letter)

No more than this IMHO – you can add the upper or lower methods

def count_letter_in_str(string,letter):
    return string.count(letter)

如何根据对象的属性对对象列表进行排序?

问题:如何根据对象的属性对对象列表进行排序?

我有一个Python对象列表,我想按对象本身的属性对其进行排序。该列表如下所示:

>>> ut
[<Tag: 128>, <Tag: 2008>, <Tag: <>, <Tag: actionscript>, <Tag: addresses>,
 <Tag: aes>, <Tag: ajax> ...]

每个对象都有一个计数:

>>> ut[1].count
1L

我需要按递减计数对列表进行排序。

我已经看到了几种方法,但是我正在寻找Python的最佳实践。

I’ve got a list of Python objects that I’d like to sort by an attribute of the objects themselves. The list looks like:

>>> ut
[<Tag: 128>, <Tag: 2008>, <Tag: <>, <Tag: actionscript>, <Tag: addresses>,
 <Tag: aes>, <Tag: ajax> ...]

Each object has a count:

>>> ut[1].count
1L

I need to sort the list by number of counts descending.

I’ve seen several methods for this, but I’m looking for best practice in Python.


回答 0

# To sort the list in place...
ut.sort(key=lambda x: x.count, reverse=True)

# To return a new list, use the sorted() built-in function...
newlist = sorted(ut, key=lambda x: x.count, reverse=True)

有关按键排序的更多信息。

# To sort the list in place...
ut.sort(key=lambda x: x.count, reverse=True)

# To return a new list, use the sorted() built-in function...
newlist = sorted(ut, key=lambda x: x.count, reverse=True)

More on sorting by keys.


回答 1

可以使用最快的方法,尤其是在您的列表中有很多记录的情况下operator.attrgetter("count")。但是,它可以在预操作者版本的Python上运行,因此具有后备机制会很好。然后,您可能需要执行以下操作:

try: import operator
except ImportError: keyfun= lambda x: x.count # use a lambda if no operator module
else: keyfun= operator.attrgetter("count") # use operator since it's faster than lambda

ut.sort(key=keyfun, reverse=True) # sort in-place

A way that can be fastest, especially if your list has a lot of records, is to use operator.attrgetter("count"). However, this might run on an pre-operator version of Python, so it would be nice to have a fallback mechanism. You might want to do the following, then:

try: import operator
except ImportError: keyfun= lambda x: x.count # use a lambda if no operator module
else: keyfun= operator.attrgetter("count") # use operator since it's faster than lambda

ut.sort(key=keyfun, reverse=True) # sort in-place

回答 2

读者应注意,key =方法:

ut.sort(key=lambda x: x.count, reverse=True)

比向对象添加丰富的比较运算符快许多倍。我很惊讶地阅读了这篇文章(“ Python in a Nutshell”的第485页)。您可以通过在这个小程序上运行测试来确认这一点:

#!/usr/bin/env python
import random

class C:
    def __init__(self,count):
        self.count = count

    def __cmp__(self,other):
        return cmp(self.count,other.count)

longList = [C(random.random()) for i in xrange(1000000)] #about 6.1 secs
longList2 = longList[:]

longList.sort() #about 52 - 6.1 = 46 secs
longList2.sort(key = lambda c: c.count) #about 9 - 6.1 = 3 secs

我的非常少的测试表明,第一种方法的运行速度要慢10倍以上,但书中说,一般而言,它仅慢5倍左右。他们说的原因是由于python(timsort)中使用了高度优化的排序算法。

仍然,.sort(lambda)比普通的旧.sort()快是很奇怪的。我希望他们能解决这个问题。

Readers should notice that the key= method:

ut.sort(key=lambda x: x.count, reverse=True)

is many times faster than adding rich comparison operators to the objects. I was surprised to read this (page 485 of “Python in a Nutshell”). You can confirm this by running tests on this little program:

#!/usr/bin/env python
import random

class C:
    def __init__(self,count):
        self.count = count

    def __cmp__(self,other):
        return cmp(self.count,other.count)

longList = [C(random.random()) for i in xrange(1000000)] #about 6.1 secs
longList2 = longList[:]

longList.sort() #about 52 - 6.1 = 46 secs
longList2.sort(key = lambda c: c.count) #about 9 - 6.1 = 3 secs

My, very minimal, tests show the first sort is more than 10 times slower, but the book says it is only about 5 times slower in general. The reason they say is due to the highly optimizes sort algorithm used in python (timsort).

Still, its very odd that .sort(lambda) is faster than plain old .sort(). I hope they fix that.


回答 3

面向对象的方法

最好将对象排序逻辑(如果适用)设置为类的属性,而不是在每个实例中都要求进行排序。

这样可以确保一致性,并且不需要样板代码。

至少,您应该指定__eq____lt__操作此功能。然后使用sorted(list_of_objects)

class Card(object):

    def __init__(self, rank, suit):
        self.rank = rank
        self.suit = suit

    def __eq__(self, other):
        return self.rank == other.rank and self.suit == other.suit

    def __lt__(self, other):
        return self.rank < other.rank

hand = [Card(10, 'H'), Card(2, 'h'), Card(12, 'h'), Card(13, 'h'), Card(14, 'h')]
hand_order = [c.rank for c in hand]  # [10, 2, 12, 13, 14]

hand_sorted = sorted(hand)
hand_sorted_order = [c.rank for c in hand_sorted]  # [2, 10, 12, 13, 14]

Object-oriented approach

It’s good practice to make object sorting logic, if applicable, a property of the class rather than incorporated in each instance the ordering is required.

This ensures consistency and removes the need for boilerplate code.

At a minimum, you should specify __eq__ and __lt__ operations for this to work. Then just use sorted(list_of_objects).

class Card(object):

    def __init__(self, rank, suit):
        self.rank = rank
        self.suit = suit

    def __eq__(self, other):
        return self.rank == other.rank and self.suit == other.suit

    def __lt__(self, other):
        return self.rank < other.rank

hand = [Card(10, 'H'), Card(2, 'h'), Card(12, 'h'), Card(13, 'h'), Card(14, 'h')]
hand_order = [c.rank for c in hand]  # [10, 2, 12, 13, 14]

hand_sorted = sorted(hand)
hand_sorted_order = [c.rank for c in hand_sorted]  # [2, 10, 12, 13, 14]

回答 4

from operator import attrgetter
ut.sort(key = attrgetter('count'), reverse = True)
from operator import attrgetter
ut.sort(key = attrgetter('count'), reverse = True)

回答 5

它看起来很像Django ORM模型实例的列表。

为什么不对这样的查询进行排序:

ut = Tag.objects.order_by('-count')

It looks much like a list of Django ORM model instances.

Why not sort them on query like this:

ut = Tag.objects.order_by('-count')

回答 6

将丰富的比较运算符添加到对象类,然后使用列表的sort()方法。
参见python中的丰富比较


更新:尽管此方法可行,但我认为Triptych的解决方案更简单,因此更适合您的情况。

Add rich comparison operators to the object class, then use sort() method of the list.
See rich comparison in python.


Update: Although this method would work, I think solution from Triptych is better suited to your case because way simpler.


回答 7

如果要排序的属性property,则可以避免导入,operator.attrgetter而可以使用属性的fget方法。

例如,对于Circle具有属性的类,radius我们可以circles按如下所示对半径列表进行排序:

result = sorted(circles, key=Circle.radius.fget)

这不是最知名的功能,但通常使我免于导入的麻烦。

If the attribute you want to sort by is a property, then you can avoid importing operator.attrgetter and use the property’s fget method instead.

For example, for a class Circle with a property radius we could sort a list of circles by radii as follows:

result = sorted(circles, key=Circle.radius.fget)

This is not the most well-known feature but often saves me a line with the import.


如何计算列表项的出现?

问题:如何计算列表项的出现?

给定一个项目,我如何计算它在Python列表中的出现次数?

Given an item, how can I count its occurrences in a list in Python?


回答 0

如果只需要一项的计数,请使用以下count方法:

>>> [1, 2, 3, 4, 1, 4, 1].count(1)
3

如果您要计算多个项目,请不要使用它。count循环调用需要为每个count调用单独遍历列表,这可能会对性能造成灾难性影响。如果您要计算所有项目,甚至只是多个项目,请使用Counter,如其他答案中所述。

If you only want one item’s count, use the count method:

>>> [1, 2, 3, 4, 1, 4, 1].count(1)
3

Don’t use this if you want to count multiple items. Calling count in a loop requires a separate pass over the list for every count call, which can be catastrophic for performance. If you want to count all items, or even just multiple items, use Counter, as explained in the other answers.


回答 1

使用Counter如果你正在使用Python 2.7或3.x和你想出现的每个元素的数量:

>>> from collections import Counter
>>> z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
>>> Counter(z)
Counter({'blue': 3, 'red': 2, 'yellow': 1})

Use Counter if you are using Python 2.7 or 3.x and you want the number of occurrences for each element:

>>> from collections import Counter
>>> z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
>>> Counter(z)
Counter({'blue': 3, 'red': 2, 'yellow': 1})

回答 2

计算列表中一项的出现

仅计算一个列表项的出现次数即可 count()

>>> l = ["a","b","b"]
>>> l.count("a")
1
>>> l.count("b")
2

计算列表中所有项目的出现次数也称为“对列表进行计数”或创建计数计数器。

用count()计算所有项目

要计算l一个项目的出现次数,只需使用列表理解和count()方法

[[x,l.count(x)] for x in set(l)]

(或类似的字典dict((x,l.count(x)) for x in set(l))

例:

>>> l = ["a","b","b"]
>>> [[x,l.count(x)] for x in set(l)]
[['a', 1], ['b', 2]]
>>> dict((x,l.count(x)) for x in set(l))
{'a': 1, 'b': 2}

用Counter()计算所有项目

或者,库中有更快的Countercollections

Counter(l)

例:

>>> l = ["a","b","b"]
>>> from collections import Counter
>>> Counter(l)
Counter({'b': 2, 'a': 1})

计数器快多少?

我检查Counter了清单的计算速度。我尝试了两种方法的几个值,n并且看起来Counter快了大约2的常数。

这是我使用的脚本:

from __future__ import print_function
import timeit

t1=timeit.Timer('Counter(l)', \
                'import random;import string;from collections import Counter;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
                )

t2=timeit.Timer('[[x,l.count(x)] for x in set(l)]',
                'import random;import string;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
                )

print("Counter(): ", t1.repeat(repeat=3,number=10000))
print("count():   ", t2.repeat(repeat=3,number=10000)

并输出:

Counter():  [0.46062711701961234, 0.4022796869976446, 0.3974247490405105]
count():    [7.779430688009597, 7.962715800967999, 8.420845870045014]

Counting the occurrences of one item in a list

For counting the occurrences of just one list item you can use count()

>>> l = ["a","b","b"]
>>> l.count("a")
1
>>> l.count("b")
2

Counting the occurrences of all items in a list is also known as “tallying” a list, or creating a tally counter.

Counting all items with count()

To count the occurrences of items in l one can simply use a list comprehension and the count() method

[[x,l.count(x)] for x in set(l)]

(or similarly with a dictionary dict((x,l.count(x)) for x in set(l)))

Example:

>>> l = ["a","b","b"]
>>> [[x,l.count(x)] for x in set(l)]
[['a', 1], ['b', 2]]
>>> dict((x,l.count(x)) for x in set(l))
{'a': 1, 'b': 2}

Counting all items with Counter()

Alternatively, there’s the faster Counter class from the collections library

Counter(l)

Example:

>>> l = ["a","b","b"]
>>> from collections import Counter
>>> Counter(l)
Counter({'b': 2, 'a': 1})

How much faster is Counter?

I checked how much faster Counter is for tallying lists. I tried both methods out with a few values of n and it appears that Counter is faster by a constant factor of approximately 2.

Here is the script I used:

from __future__ import print_function
import timeit

t1=timeit.Timer('Counter(l)', \
                'import random;import string;from collections import Counter;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
                )

t2=timeit.Timer('[[x,l.count(x)] for x in set(l)]',
                'import random;import string;n=1000;l=[random.choice(string.ascii_letters) for x in range(n)]'
                )

print("Counter(): ", t1.repeat(repeat=3,number=10000))
print("count():   ", t2.repeat(repeat=3,number=10000)

And the output:

Counter():  [0.46062711701961234, 0.4022796869976446, 0.3974247490405105]
count():    [7.779430688009597, 7.962715800967999, 8.420845870045014]

回答 3

获取字典中每个项目出现次数的另一种方法是:

dict((i, a.count(i)) for i in a)

Another way to get the number of occurrences of each item, in a dictionary:

dict((i, a.count(i)) for i in a)

回答 4

list.count(x)返回x出现在列表中的次数

请参阅:http : //docs.python.org/tutorial/datastructures.html#more-on-lists

list.count(x) returns the number of times x appears in a list

see: http://docs.python.org/tutorial/datastructures.html#more-on-lists


回答 5

给定一个项目,我如何计算它在Python列表中的出现次数?

这是一个示例列表:

>>> l = list('aaaaabbbbcccdde')
>>> l
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'e']

list.count

list.count方法

>>> l.count('b')
4

这适用于任何列表。元组也有这种方法:

>>> t = tuple('aabbbffffff')
>>> t
('a', 'a', 'b', 'b', 'b', 'f', 'f', 'f', 'f', 'f', 'f')
>>> t.count('f')
6

collections.Counter

然后是collections.Counter。您可以将任何可迭代的对象转储到Counter中,而不仅仅是列表,并且Counter将保留元素计数的数据结构。

用法:

>>> from collections import Counter
>>> c = Counter(l)
>>> c['b']
4

计数器基于Python字典,它们的键是元素,因此键必须是可哈希的。它们基本上就像允许多余元素进入的集合。

的进一步使用 collections.Counter

您可以从计数器中添加或减去可迭代项:

>>> c.update(list('bbb'))
>>> c['b']
7
>>> c.subtract(list('bbb'))
>>> c['b']
4

您还可以使用计数器进行多组操作:

>>> c2 = Counter(list('aabbxyz'))
>>> c - c2                   # set difference
Counter({'a': 3, 'c': 3, 'b': 2, 'd': 2, 'e': 1})
>>> c + c2                   # addition of all elements
Counter({'a': 7, 'b': 6, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
>>> c | c2                   # set union
Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
>>> c & c2                   # set intersection
Counter({'a': 2, 'b': 2})

为什么不熊猫呢?

另一个答案表明:

为什么不使用熊猫?

熊猫是一个公共库,但不在标准库中。根据需要添加它并非易事。

在列表对象本身以及标准库中都有针对此用例的内置解决方案。

如果您的项目不再需要熊猫,那么仅将其作为此功能的要求是愚蠢的。

Given an item, how can I count its occurrences in a list in Python?

Here’s an example list:

>>> l = list('aaaaabbbbcccdde')
>>> l
['a', 'a', 'a', 'a', 'a', 'b', 'b', 'b', 'b', 'c', 'c', 'c', 'd', 'd', 'e']

list.count

There’s the list.count method

>>> l.count('b')
4

This works fine for any list. Tuples have this method as well:

>>> t = tuple('aabbbffffff')
>>> t
('a', 'a', 'b', 'b', 'b', 'f', 'f', 'f', 'f', 'f', 'f')
>>> t.count('f')
6

collections.Counter

And then there’s collections.Counter. You can dump any iterable into a Counter, not just a list, and the Counter will retain a data structure of the counts of the elements.

Usage:

>>> from collections import Counter
>>> c = Counter(l)
>>> c['b']
4

Counters are based on Python dictionaries, their keys are the elements, so the keys need to be hashable. They are basically like sets that allow redundant elements into them.

Further usage of collections.Counter

You can add or subtract with iterables from your counter:

>>> c.update(list('bbb'))
>>> c['b']
7
>>> c.subtract(list('bbb'))
>>> c['b']
4

And you can do multi-set operations with the counter as well:

>>> c2 = Counter(list('aabbxyz'))
>>> c - c2                   # set difference
Counter({'a': 3, 'c': 3, 'b': 2, 'd': 2, 'e': 1})
>>> c + c2                   # addition of all elements
Counter({'a': 7, 'b': 6, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
>>> c | c2                   # set union
Counter({'a': 5, 'b': 4, 'c': 3, 'd': 2, 'e': 1, 'y': 1, 'x': 1, 'z': 1})
>>> c & c2                   # set intersection
Counter({'a': 2, 'b': 2})

Why not pandas?

Another answer suggests:

Why not use pandas?

Pandas is a common library, but it’s not in the standard library. Adding it as a requirement is non-trivial.

There are builtin solutions for this use-case in the list object itself as well as in the standard library.

If your project does not already require pandas, it would be foolish to make it a requirement just for this functionality.


回答 6

我已经将所有建议的解决方案(以及一些新的解决方案)与perfplot(我的一个小项目)进行了比较。

计数一个项目

对于足够大的阵列,事实证明

numpy.sum(numpy.array(a) == 1) 

比其他解决方案快一点。

在此处输入图片说明

计算所有项目

由于之前建立的

numpy.bincount(a)

是你想要的。

在此处输入图片说明


复制代码的代码:

from collections import Counter
from collections import defaultdict
import numpy
import operator
import pandas
import perfplot


def counter(a):
    return Counter(a)


def count(a):
    return dict((i, a.count(i)) for i in set(a))


def bincount(a):
    return numpy.bincount(a)


def pandas_value_counts(a):
    return pandas.Series(a).value_counts()


def occur_dict(a):
    d = {}
    for i in a:
        if i in d:
            d[i] = d[i]+1
        else:
            d[i] = 1
    return d


def count_unsorted_list_items(items):
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)


def operator_countof(a):
    return dict((i, operator.countOf(a, i)) for i in set(a))


perfplot.show(
    setup=lambda n: list(numpy.random.randint(0, 100, n)),
    n_range=[2**k for k in range(20)],
    kernels=[
        counter, count, bincount, pandas_value_counts, occur_dict,
        count_unsorted_list_items, operator_countof
        ],
    equality_check=None,
    logx=True,
    logy=True,
    )

2。

from collections import Counter
from collections import defaultdict
import numpy
import operator
import pandas
import perfplot


def counter(a):
    return Counter(a)


def count(a):
    return dict((i, a.count(i)) for i in set(a))


def bincount(a):
    return numpy.bincount(a)


def pandas_value_counts(a):
    return pandas.Series(a).value_counts()


def occur_dict(a):
    d = {}
    for i in a:
        if i in d:
            d[i] = d[i]+1
        else:
            d[i] = 1
    return d


def count_unsorted_list_items(items):
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)


def operator_countof(a):
    return dict((i, operator.countOf(a, i)) for i in set(a))


perfplot.show(
    setup=lambda n: list(numpy.random.randint(0, 100, n)),
    n_range=[2**k for k in range(20)],
    kernels=[
        counter, count, bincount, pandas_value_counts, occur_dict,
        count_unsorted_list_items, operator_countof
        ],
    equality_check=None,
    logx=True,
    logy=True,
    )

I’ve compared all suggested solutions (and a few new ones) with perfplot (a small project of mine).

Counting one item

For large enough arrays, it turns out that

numpy.sum(numpy.array(a) == 1) 

is slightly faster than the other solutions.

enter image description here

Counting all items

As established before,

numpy.bincount(a)

is what you want.

enter image description here


Code to reproduce the plots:

from collections import Counter
from collections import defaultdict
import numpy
import operator
import pandas
import perfplot


def counter(a):
    return Counter(a)


def count(a):
    return dict((i, a.count(i)) for i in set(a))


def bincount(a):
    return numpy.bincount(a)


def pandas_value_counts(a):
    return pandas.Series(a).value_counts()


def occur_dict(a):
    d = {}
    for i in a:
        if i in d:
            d[i] = d[i]+1
        else:
            d[i] = 1
    return d


def count_unsorted_list_items(items):
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)


def operator_countof(a):
    return dict((i, operator.countOf(a, i)) for i in set(a))


perfplot.show(
    setup=lambda n: list(numpy.random.randint(0, 100, n)),
    n_range=[2**k for k in range(20)],
    kernels=[
        counter, count, bincount, pandas_value_counts, occur_dict,
        count_unsorted_list_items, operator_countof
        ],
    equality_check=None,
    logx=True,
    logy=True,
    )

2.

from collections import Counter
from collections import defaultdict
import numpy
import operator
import pandas
import perfplot


def counter(a):
    return Counter(a)


def count(a):
    return dict((i, a.count(i)) for i in set(a))


def bincount(a):
    return numpy.bincount(a)


def pandas_value_counts(a):
    return pandas.Series(a).value_counts()


def occur_dict(a):
    d = {}
    for i in a:
        if i in d:
            d[i] = d[i]+1
        else:
            d[i] = 1
    return d


def count_unsorted_list_items(items):
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)


def operator_countof(a):
    return dict((i, operator.countOf(a, i)) for i in set(a))


perfplot.show(
    setup=lambda n: list(numpy.random.randint(0, 100, n)),
    n_range=[2**k for k in range(20)],
    kernels=[
        counter, count, bincount, pandas_value_counts, occur_dict,
        count_unsorted_list_items, operator_countof
        ],
    equality_check=None,
    logx=True,
    logy=True,
    )

回答 7

如果您想一次计算所有值,则可以使用numpy数组快速完成bincount,如下所示

import numpy as np
a = np.array([1, 2, 3, 4, 1, 4, 1])
np.bincount(a)

这使

>>> array([0, 3, 1, 1, 2])

If you want to count all values at once you can do it very fast using numpy arrays and bincount as follows

import numpy as np
a = np.array([1, 2, 3, 4, 1, 4, 1])
np.bincount(a)

which gives

>>> array([0, 3, 1, 1, 2])

回答 8

如果可以使用pandas,则value_counts可以在那里进行救援。

>>> import pandas as pd
>>> a = [1, 2, 3, 4, 1, 4, 1]
>>> pd.Series(a).value_counts()
1    3
4    2
3    1
2    1
dtype: int64

它还会根据频率自动对结果进行排序。

如果您希望结果在列表列表中,请执行以下操作

>>> pd.Series(a).value_counts().reset_index().values.tolist()
[[1, 3], [4, 2], [3, 1], [2, 1]]

If you can use pandas, then value_counts is there for rescue.

>>> import pandas as pd
>>> a = [1, 2, 3, 4, 1, 4, 1]
>>> pd.Series(a).value_counts()
1    3
4    2
3    1
2    1
dtype: int64

It automatically sorts the result based on frequency as well.

If you want the result to be in a list of list, do as below

>>> pd.Series(a).value_counts().reset_index().values.tolist()
[[1, 3], [4, 2], [3, 1], [2, 1]]

回答 9

为什么不使用熊猫呢?

import pandas as pd

l = ['a', 'b', 'c', 'd', 'a', 'd', 'a']

# converting the list to a Series and counting the values
my_count = pd.Series(l).value_counts()
my_count

输出:

a    3
d    2
b    1
c    1
dtype: int64

如果要查找特定元素的数量,请说a,请尝试:

my_count['a']

输出:

3

Why not using Pandas?

import pandas as pd

l = ['a', 'b', 'c', 'd', 'a', 'd', 'a']

# converting the list to a Series and counting the values
my_count = pd.Series(l).value_counts()
my_count

Output:

a    3
d    2
b    1
c    1
dtype: int64

If you are looking for a count of a particular element, say a, try:

my_count['a']

Output:

3

回答 10

我今天遇到了这个问题,在考虑检查SO之前推出了自己的解决方案。这个:

dict((i,a.count(i)) for i in a)

对于大型列表,真的非常慢。我的解决方案

def occurDict(items):
    d = {}
    for i in items:
        if i in d:
            d[i] = d[i]+1
        else:
            d[i] = 1
return d

实际上比Counter解决方案要快一点,至少对于Python 2.7而言。

I had this problem today and rolled my own solution before I thought to check SO. This:

dict((i,a.count(i)) for i in a)

is really, really slow for large lists. My solution

def occurDict(items):
    d = {}
    for i in items:
        if i in d:
            d[i] = d[i]+1
        else:
            d[i] = 1
return d

is actually a bit faster than the Counter solution, at least for Python 2.7.


回答 11

# Python >= 2.6 (defaultdict) && < 2.7 (Counter, OrderedDict)
from collections import defaultdict
def count_unsorted_list_items(items):
    """
    :param items: iterable of hashable items to count
    :type items: iterable

    :returns: dict of counts like Py2.7 Counter
    :rtype: dict
    """
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)


# Python >= 2.2 (generators)
def count_sorted_list_items(items):
    """
    :param items: sorted iterable of items to count
    :type items: sorted iterable

    :returns: generator of (item, count) tuples
    :rtype: generator
    """
    if not items:
        return
    elif len(items) == 1:
        yield (items[0], 1)
        return
    prev_item = items[0]
    count = 1
    for item in items[1:]:
        if prev_item == item:
            count += 1
        else:
            yield (prev_item, count)
            count = 1
            prev_item = item
    yield (item, count)
    return


import unittest
class TestListCounters(unittest.TestCase):
    def test_count_unsorted_list_items(self):
        D = (
            ([], []),
            ([2], [(2,1)]),
            ([2,2], [(2,2)]),
            ([2,2,2,2,3,3,5,5], [(2,4), (3,2), (5,2)]),
            )
        for inp, exp_outp in D:
            counts = count_unsorted_list_items(inp) 
            print inp, exp_outp, counts
            self.assertEqual(counts, dict( exp_outp ))

        inp, exp_outp = UNSORTED_WIN = ([2,2,4,2], [(2,3), (4,1)])
        self.assertEqual(dict( exp_outp ), count_unsorted_list_items(inp) )


    def test_count_sorted_list_items(self):
        D = (
            ([], []),
            ([2], [(2,1)]),
            ([2,2], [(2,2)]),
            ([2,2,2,2,3,3,5,5], [(2,4), (3,2), (5,2)]),
            )
        for inp, exp_outp in D:
            counts = list( count_sorted_list_items(inp) )
            print inp, exp_outp, counts
            self.assertEqual(counts, exp_outp)

        inp, exp_outp = UNSORTED_FAIL = ([2,2,4,2], [(2,3), (4,1)])
        self.assertEqual(exp_outp, list( count_sorted_list_items(inp) ))
        # ... [(2,2), (4,1), (2,1)]
# Python >= 2.6 (defaultdict) && < 2.7 (Counter, OrderedDict)
from collections import defaultdict
def count_unsorted_list_items(items):
    """
    :param items: iterable of hashable items to count
    :type items: iterable

    :returns: dict of counts like Py2.7 Counter
    :rtype: dict
    """
    counts = defaultdict(int)
    for item in items:
        counts[item] += 1
    return dict(counts)


# Python >= 2.2 (generators)
def count_sorted_list_items(items):
    """
    :param items: sorted iterable of items to count
    :type items: sorted iterable

    :returns: generator of (item, count) tuples
    :rtype: generator
    """
    if not items:
        return
    elif len(items) == 1:
        yield (items[0], 1)
        return
    prev_item = items[0]
    count = 1
    for item in items[1:]:
        if prev_item == item:
            count += 1
        else:
            yield (prev_item, count)
            count = 1
            prev_item = item
    yield (item, count)
    return


import unittest
class TestListCounters(unittest.TestCase):
    def test_count_unsorted_list_items(self):
        D = (
            ([], []),
            ([2], [(2,1)]),
            ([2,2], [(2,2)]),
            ([2,2,2,2,3,3,5,5], [(2,4), (3,2), (5,2)]),
            )
        for inp, exp_outp in D:
            counts = count_unsorted_list_items(inp) 
            print inp, exp_outp, counts
            self.assertEqual(counts, dict( exp_outp ))

        inp, exp_outp = UNSORTED_WIN = ([2,2,4,2], [(2,3), (4,1)])
        self.assertEqual(dict( exp_outp ), count_unsorted_list_items(inp) )


    def test_count_sorted_list_items(self):
        D = (
            ([], []),
            ([2], [(2,1)]),
            ([2,2], [(2,2)]),
            ([2,2,2,2,3,3,5,5], [(2,4), (3,2), (5,2)]),
            )
        for inp, exp_outp in D:
            counts = list( count_sorted_list_items(inp) )
            print inp, exp_outp, counts
            self.assertEqual(counts, exp_outp)

        inp, exp_outp = UNSORTED_FAIL = ([2,2,4,2], [(2,3), (4,1)])
        self.assertEqual(exp_outp, list( count_sorted_list_items(inp) ))
        # ... [(2,2), (4,1), (2,1)]

回答 12

以下是三种解决方案:

最快的是使用for循环并将其存储在Dict中。

import time
from collections import Counter


def countElement(a):
    g = {}
    for i in a:
        if i in g: 
            g[i] +=1
        else: 
            g[i] =1
    return g


z = [1,1,1,1,2,2,2,2,3,3,4,5,5,234,23,3,12,3,123,12,31,23,13,2,4,23,42,42,34,234,23,42,34,23,423,42,34,23,423,4,234,23,42,34,23,4,23,423,4,23,4]


#Solution 1 - Faster
st = time.monotonic()
for i in range(1000000):
    b = countElement(z)
et = time.monotonic()
print(b)
print('Simple for loop and storing it in dict - Duration: {}'.format(et - st))

#Solution 2 - Fast
st = time.monotonic()
for i in range(1000000):
    a = Counter(z)
et = time.monotonic()
print (a)
print('Using collections.Counter - Duration: {}'.format(et - st))

#Solution 3 - Slow
st = time.monotonic()
for i in range(1000000):
    g = dict([(i, z.count(i)) for i in set(z)])
et = time.monotonic()
print(g)
print('Using list comprehension - Duration: {}'.format(et - st))

结果

#Solution 1 - Faster
{1: 4, 2: 5, 3: 4, 4: 6, 5: 2, 234: 3, 23: 10, 12: 2, 123: 1, 31: 1, 13: 1, 42: 5, 34: 4, 423: 3}
Simple for loop and storing it in dict - Duration: 12.032000000000153
#Solution 2 - Fast
Counter({23: 10, 4: 6, 2: 5, 42: 5, 1: 4, 3: 4, 34: 4, 234: 3, 423: 3, 5: 2, 12: 2, 123: 1, 31: 1, 13: 1})
Using collections.Counter - Duration: 15.889999999999418
#Solution 3 - Slow
{1: 4, 2: 5, 3: 4, 4: 6, 5: 2, 34: 4, 423: 3, 234: 3, 42: 5, 12: 2, 13: 1, 23: 10, 123: 1, 31: 1}
Using list comprehension - Duration: 33.0

Below are the three solutions:

Fastest is using a for loop and storing it in a Dict.

import time
from collections import Counter


def countElement(a):
    g = {}
    for i in a:
        if i in g: 
            g[i] +=1
        else: 
            g[i] =1
    return g


z = [1,1,1,1,2,2,2,2,3,3,4,5,5,234,23,3,12,3,123,12,31,23,13,2,4,23,42,42,34,234,23,42,34,23,423,42,34,23,423,4,234,23,42,34,23,4,23,423,4,23,4]


#Solution 1 - Faster
st = time.monotonic()
for i in range(1000000):
    b = countElement(z)
et = time.monotonic()
print(b)
print('Simple for loop and storing it in dict - Duration: {}'.format(et - st))

#Solution 2 - Fast
st = time.monotonic()
for i in range(1000000):
    a = Counter(z)
et = time.monotonic()
print (a)
print('Using collections.Counter - Duration: {}'.format(et - st))

#Solution 3 - Slow
st = time.monotonic()
for i in range(1000000):
    g = dict([(i, z.count(i)) for i in set(z)])
et = time.monotonic()
print(g)
print('Using list comprehension - Duration: {}'.format(et - st))

Result

#Solution 1 - Faster
{1: 4, 2: 5, 3: 4, 4: 6, 5: 2, 234: 3, 23: 10, 12: 2, 123: 1, 31: 1, 13: 1, 42: 5, 34: 4, 423: 3}
Simple for loop and storing it in dict - Duration: 12.032000000000153
#Solution 2 - Fast
Counter({23: 10, 4: 6, 2: 5, 42: 5, 1: 4, 3: 4, 34: 4, 234: 3, 423: 3, 5: 2, 12: 2, 123: 1, 31: 1, 13: 1})
Using collections.Counter - Duration: 15.889999999999418
#Solution 3 - Slow
{1: 4, 2: 5, 3: 4, 4: 6, 5: 2, 34: 4, 423: 3, 234: 3, 42: 5, 12: 2, 13: 1, 23: 10, 123: 1, 31: 1}
Using list comprehension - Duration: 33.0

回答 13

所有元素的计数 itertools.groupby()

获取列表中所有元素的计数的另一种可能性是使用itertools.groupby()

具有“重复”计数

from itertools import groupby

L = ['a', 'a', 'a', 't', 'q', 'a', 'd', 'a', 'd', 'c']  # Input list

counts = [(i, len(list(c))) for i,c in groupby(L)]      # Create value-count pairs as list of tuples 
print(counts)

退货

[('a', 3), ('t', 1), ('q', 1), ('a', 1), ('d', 1), ('a', 1), ('d', 1), ('c', 1)]

请注意,它是如何将前三个组合在一起a作为第一组的,而其他组合a则位于列表的下方。发生这种情况是因为未对输入列表L进行排序。如果组实际上应该分开,那么有时这可能是一个好处。

具有独特的计数

如果需要唯一的组计数,只需对输入列表进行排序:

counts = [(i, len(list(c))) for i,c in groupby(sorted(L))]
print(counts)

退货

[('a', 5), ('c', 1), ('d', 2), ('q', 1), ('t', 1)]

注意:为了创建唯一计数,与groupby解决方案相比,许多其他答案都提供了更轻松,更易读的代码。但是这里显示它与重复计数示例相似。

Count of all elements with itertools.groupby()

Antoher possiblity for getting the count of all elements in the list could be by means of itertools.groupby().

With “duplicate” counts

from itertools import groupby

L = ['a', 'a', 'a', 't', 'q', 'a', 'd', 'a', 'd', 'c']  # Input list

counts = [(i, len(list(c))) for i,c in groupby(L)]      # Create value-count pairs as list of tuples 
print(counts)

Returns

[('a', 3), ('t', 1), ('q', 1), ('a', 1), ('d', 1), ('a', 1), ('d', 1), ('c', 1)]

Notice how it combined the first three a‘s as the first group, while other groups of a are present further down the list. This happens because the input list L was not sorted. This can be a benefit sometimes if the groups should in fact be separate.

With unique counts

If unique group counts are desired, just sort the input list:

counts = [(i, len(list(c))) for i,c in groupby(sorted(L))]
print(counts)

Returns

[('a', 5), ('c', 1), ('d', 2), ('q', 1), ('t', 1)]

Note: For creating unique counts, many of the other answers provide easier and more readable code compared to the groupby solution. But it is shown here to draw a parallel to the duplicate count example.


回答 14

建议使用numpy的bincount,但是它仅适用于具有非负整数的一维数组。此外,结果数组可能会造成混淆(它包含原始列表的最小值到最大值的整数的出现,并将丢失的整数设置为0)。

使用numpy更好的方法是使用属性设置为True 的唯一函数return_counts。它返回一个元组,该元组具有唯一值的数组和每个唯一值的出现的数组。

# a = [1, 1, 0, 2, 1, 0, 3, 3]
a_uniq, counts = np.unique(a, return_counts=True)  # array([0, 1, 2, 3]), array([2, 3, 1, 2]

然后我们可以将它们配对为

dict(zip(a_uniq, counts))  # {0: 2, 1: 3, 2: 1, 3: 2}

它还可以与其他数据类型和“ 2d列表”一起使用,例如

>>> a = [['a', 'b', 'b', 'b'], ['a', 'c', 'c', 'a']]
>>> dict(zip(*np.unique(a, return_counts=True)))
{'a': 3, 'b': 3, 'c': 2}

It was suggested to use numpy’s bincount, however it works only for 1d arrays with non-negative integers. Also, the resulting array might be confusing (it contains the occurrences of the integers from min to max of the original list, and sets to 0 the missing integers).

A better way to do it with numpy is to use the unique function with the attribute return_counts set to True. It returns a tuple with an array of the unique values and an array of the occurrences of each unique value.

# a = [1, 1, 0, 2, 1, 0, 3, 3]
a_uniq, counts = np.unique(a, return_counts=True)  # array([0, 1, 2, 3]), array([2, 3, 1, 2]

and then we can pair them as

dict(zip(a_uniq, counts))  # {0: 2, 1: 3, 2: 1, 3: 2}

It also works with other data types and “2d lists”, e.g.

>>> a = [['a', 'b', 'b', 'b'], ['a', 'c', 'c', 'a']]
>>> dict(zip(*np.unique(a, return_counts=True)))
{'a': 3, 'b': 3, 'c': 2}

回答 15

计算具有共同类型的各种元素的数量:

li = ['A0','c5','A8','A2','A5','c2','A3','A9']

print sum(1 for el in li if el[0]=='A' and el[1] in '01234')

3 ,而不是6

To count the number of diverse elements having a common type:

li = ['A0','c5','A8','A2','A5','c2','A3','A9']

print sum(1 for el in li if el[0]=='A' and el[1] in '01234')

gives

3 , not 6


回答 16

虽然这是一个非常古老的问题,但是由于我没有找到一支,所以我做了一支。

# original numbers in list
l = [1, 2, 2, 3, 3, 3, 4]

# empty dictionary to hold pair of number and its count
d = {}

# loop through all elements and store count
[ d.update( {i:d.get(i, 0)+1} ) for i in l ]

print(d)

Although it is very old question, but as i didn’t find a one liner, i made one.

# original numbers in list
l = [1, 2, 2, 3, 3, 3, 4]

# empty dictionary to hold pair of number and its count
d = {}

# loop through all elements and store count
[ d.update( {i:d.get(i, 0)+1} ) for i in l ]

print(d)

回答 17

您也可以使用countOf内置模块的方法operator

>>> import operator
>>> operator.countOf([1, 2, 3, 4, 1, 4, 1], 1)
3

You can also use countOf method of a built-in module operator.

>>> import operator
>>> operator.countOf([1, 2, 3, 4, 1, 4, 1], 1)
3

回答 18

可能不是最有效的,需要额外的通行证才能删除重复项。

功能实现:

arr = np.array(['a','a','b','b','b','c'])
print(set(map(lambda x  : (x , list(arr).count(x)) , arr)))

返回:

{('c', 1), ('b', 3), ('a', 2)}

或返回为dict

print(dict(map(lambda x  : (x , list(arr).count(x)) , arr)))

返回:

{'b': 3, 'c': 1, 'a': 2}

May not be the most efficient, requires an extra pass to remove duplicates.

Functional implementation :

arr = np.array(['a','a','b','b','b','c'])
print(set(map(lambda x  : (x , list(arr).count(x)) , arr)))

returns :

{('c', 1), ('b', 3), ('a', 2)}

or return as dict :

print(dict(map(lambda x  : (x , list(arr).count(x)) , arr)))

returns :

{'b': 3, 'c': 1, 'a': 2}

回答 19

sum([1 for elem in <yourlist> if elem==<your_value>])

这将返回your_value的出现次数

sum([1 for elem in <yourlist> if elem==<your_value>])

This will return the amount of occurences of your_value


回答 20

我将使用filter()Lukasz的示例:

>>> lst = [1, 2, 3, 4, 1, 4, 1]
>>> len(filter(lambda x: x==1, lst))
3

I would use filter(), take Lukasz’s example:

>>> lst = [1, 2, 3, 4, 1, 4, 1]
>>> len(filter(lambda x: x==1, lst))
3

回答 21

如果您希望特定元素出现多次:

>>> from collections import Counter
>>> z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
>>> single_occurrences = Counter(z)
>>> print(single_occurrences.get("blue"))
3
>>> print(single_occurrences.values())
dict_values([3, 2, 1])

if you want a number of occurrences for the particular element:

>>> from collections import Counter
>>> z = ['blue', 'red', 'blue', 'yellow', 'blue', 'red']
>>> single_occurrences = Counter(z)
>>> print(single_occurrences.get("blue"))
3
>>> print(single_occurrences.values())
dict_values([3, 2, 1])

回答 22

def countfrequncyinarray(arr1):
    r=len(arr1)
    return {i:arr1.count(i) for i in range(1,r+1)}
arr1=[4,4,4,4]
a=countfrequncyinarray(arr1)
print(a)
def countfrequncyinarray(arr1):
    r=len(arr1)
    return {i:arr1.count(i) for i in range(1,r+1)}
arr1=[4,4,4,4]
a=countfrequncyinarray(arr1)
print(a)

回答 23

l2=[1,"feto",["feto",1,["feto"]],['feto',[1,2,3,['feto']]]]
count=0
 def Test(l):   
        global count 
        if len(l)==0:
             return count
        count=l.count("feto")
        for i in l:
             if type(i) is list:
                count+=Test(i)
        return count   
    print(Test(l2))

这将递归计数或搜索列表中的项目,即使它在列表列表中

l2=[1,"feto",["feto",1,["feto"]],['feto',[1,2,3,['feto']]]]
count=0
 def Test(l):   
        global count 
        if len(l)==0:
             return count
        count=l.count("feto")
        for i in l:
             if type(i) is list:
                count+=Test(i)
        return count   
    print(Test(l2))

this will recursive count or search for the item in the list even if it in list of lists