问题:在python中计算字典中关键字的数量

我在词典中有一个单词列表,其值=关键字的重复,但是我只想要一个不同单词的列表,因此我想计算关键字的数量。有没有一种方法可以计算关键字的数量,或者还有另一种方法我应该寻找不同的单词?

I have a list of words in a dictionary with the value = the repetition of the keyword but I only want a list of distinct words so I wanted to count the number of keywords. Is there a way to count the number of keywords or is there another way I should look for distinct words?


回答 0

len(yourdict.keys())

要不就

len(yourdict)

如果您想计算文件中的唯一单词,则可以使用set并执行以下操作

len(set(open(yourdictfile).read().split()))
len(yourdict.keys())

or just

len(yourdict)

If you like to count unique words in the file, you could just use set and do like

len(set(open(yourdictfile).read().split()))

回答 1

可以使用该len()功能找到不同单词的数量(即词典中的条目数)。

> a = {'foo':42, 'bar':69}
> len(a)
2

要获取所有不同的单词(即键),请使用.keys()方法。

> list(a.keys())
['foo', 'bar']

The number of distinct words (i.e. count of entries in the dictionary) can be found using the len() function.

> a = {'foo':42, 'bar':69}
> len(a)
2

To get all the distinct words (i.e. the keys), use the .keys() method.

> list(a.keys())
['foo', 'bar']

回答 2

len()直接在您的字典上进行调用是有效的,并且比构建迭代器,d.keys()和对其进行调用要快len(),但是与您的程序正在执行的其他操作相比,两者的速度都可以忽略不计。

d = {x: x**2 for x in range(1000)}

len(d)
# 1000

len(d.keys())
# 1000

%timeit len(d)
# 41.9 ns ± 0.244 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit len(d.keys())
# 83.3 ns ± 0.41 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

Calling len() directly on your dictionary works, and is faster than building an iterator, d.keys(), and calling len() on it, but the speed of either will negligible in comparison to whatever else your program is doing.

d = {x: x**2 for x in range(1000)}

len(d)
# 1000

len(d.keys())
# 1000

%timeit len(d)
# 41.9 ns ± 0.244 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

%timeit len(d.keys())
# 83.3 ns ± 0.41 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

回答 3

如果问题是关于计算关键字数量,则建议使用类似

def countoccurrences(store, value):
    try:
        store[value] = store[value] + 1
    except KeyError as e:
        store[value] = 1
    return

在main函数中有一些循环数据并将值传递给countoccurrences函数的东西

if __name__ == "__main__":
    store = {}
    list = ('a', 'a', 'b', 'c', 'c')
    for data in list:
        countoccurrences(store, data)
    for k, v in store.iteritems():
        print "Key " + k + " has occurred "  + str(v) + " times"

代码输出

Key a has occurred 2 times
Key c has occurred 2 times
Key b has occurred 1 times

If the question is about counting the number of keywords then would recommend something like

def countoccurrences(store, value):
    try:
        store[value] = store[value] + 1
    except KeyError as e:
        store[value] = 1
    return

in the main function have something that loops through the data and pass the values to countoccurrences function

if __name__ == "__main__":
    store = {}
    list = ('a', 'a', 'b', 'c', 'c')
    for data in list:
        countoccurrences(store, data)
    for k, v in store.iteritems():
        print "Key " + k + " has occurred "  + str(v) + " times"

The code outputs

Key a has occurred 2 times
Key c has occurred 2 times
Key b has occurred 1 times

回答 4

在发布的答案UnderWaterKremlin上进行了一些修改,以使其成为python3证明。下面是一个令人惊讶的结果作为答案。

系统规格:

  • python = 3.7.4,
  • 康达= 4.8.0
  • 3.6Ghz,8核心,16gb。
import timeit

d = {x: x**2 for x in range(1000)}
#print (d)
print (len(d))
# 1000

print (len(d.keys()))
# 1000

print (timeit.timeit('len({x: x**2 for x in range(1000)})', number=100000))        # 1

print (timeit.timeit('len({x: x**2 for x in range(1000)}.keys())', number=100000)) # 2

结果:

1)= 37.0100378

2)= 37.002148899999995

因此,len(d.keys())目前看来比使用来得快len()

Some modifications were made on posted answer UnderWaterKremlin to make it python3 proof. A surprising result below as answer.

System specs:

  • python =3.7.4,
  • conda = 4.8.0
  • 3.6Ghz, 8 core, 16gb.
import timeit

d = {x: x**2 for x in range(1000)}
#print (d)
print (len(d))
# 1000

print (len(d.keys()))
# 1000

print (timeit.timeit('len({x: x**2 for x in range(1000)})', number=100000))        # 1

print (timeit.timeit('len({x: x**2 for x in range(1000)}.keys())', number=100000)) # 2

Result:

1) = 37.0100378

2) = 37.002148899999995

So it seems that len(d.keys()) is currently faster than just using len().


声明:本站所有文章,如无特殊说明或标注,均为本站原创发布。任何个人或组织,在未征得本站同意时,禁止复制、盗用、采集、发布本站内容到任何网站、书籍等各类媒体平台。如若本站内容侵犯了原著者的合法权益,可联系我们进行处理。