Python: list of dicts, increment a dict value if the entry exists, otherwise append a new dict

Question:

I would like to do something like this:

list_of_urls = ['http://www.google.fr/', 'http://www.google.fr/', 
                'http://www.google.cn/', 'http://www.google.com/', 
                'http://www.google.fr/', 'http://www.google.fr/', 
                'http://www.google.fr/', 'http://www.google.com/', 
                'http://www.google.fr/', 'http://www.google.com/', 
                'http://www.google.cn/']

urls = [{'url': 'http://www.google.fr/', 'nbr': 1}]

for url in list_of_urls:
    if url in [f['url'] for f in urls]:
         urls[??]['nbr'] += 1
    else:
         urls.append({'url': url, 'nbr': 1})

How can I do this? I don't know whether I should fetch the matching entry and edit it, or work out its index in the list.

Any help?


Answer 0

That is a very strange way to organize things. If you store the counts in a dictionary keyed by URL, this is easy:

# This example should work in any version of Python.
# urls_d will contain URL keys, with counts as values, like: {'http://www.google.fr/' : 1 }
urls_d = {}
for url in list_of_urls:
    if url not in urls_d:
        urls_d[url] = 1
    else:
        urls_d[url] += 1

This code for updating a dictionary of counts is a common “pattern” in Python. It is so common that there is a special data structure, defaultdict, created just to make this even easier:

from collections import defaultdict  # available in Python 2.5 and newer

urls_d = defaultdict(int)
for url in list_of_urls:
    urls_d[url] += 1

If you access the defaultdict using a key, and the key is not already in the defaultdict, the key is automatically added with a default value. The defaultdict takes the callable you passed in, and calls it to get the default value. In this case, we passed in class int; when Python calls int() it returns a zero value. So, the first time you reference a URL, its count is initialized to zero, and then you add one to the count.
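The behaviour described above can be checked directly. This is a minimal sketch; the name counts and the keys "a" and "b" are made up for illustration.

```python
from collections import defaultdict

# int() called with no arguments returns 0, which is why it works as a counter default.
print(int())           # 0

counts = defaultdict(int)
print("a" in counts)   # False: nothing has been stored yet
counts["a"] += 1       # missing key: the default 0 is inserted, then incremented
print(counts["a"])     # 1

# Merely reading a missing key also inserts it with the default value.
_ = counts["b"]
print(dict(counts))    # {'a': 1, 'b': 0}
```

The last two lines show a side effect worth knowing: looking up a key that is absent adds it, so use `in` or `.get()` when you only want to test for presence.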

But a dictionary full of counts is also a common pattern, so Python provides a ready-to-use class: collections.Counter. You create a Counter instance by calling the class and passing in any iterable; it builds a dictionary where the keys are the values from the iterable, and the values are counts of how many times each key appeared in the iterable. The above example then becomes:

from collections import Counter  # available in Python 2.7 and newer

urls_d = Counter(list_of_urls)
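As a quick illustration using the list from the question, the sketch below shows that a Counter can be read like an ordinary dictionary. The name url_counts and the example.invalid URL are only for illustration.

```python
from collections import Counter

list_of_urls = ['http://www.google.fr/', 'http://www.google.fr/',
                'http://www.google.cn/', 'http://www.google.com/',
                'http://www.google.fr/', 'http://www.google.fr/',
                'http://www.google.fr/', 'http://www.google.com/',
                'http://www.google.fr/', 'http://www.google.com/',
                'http://www.google.cn/']

url_counts = Counter(list_of_urls)

print(url_counts['http://www.google.fr/'])     # 6
print(url_counts['http://example.invalid/'])   # 0: a missing key reads as zero
print('http://example.invalid/' in url_counts) # False: reading did not insert it
print(url_counts.most_common(1))               # [('http://www.google.fr/', 6)]
```

Unlike defaultdict, reading a missing key from a Counter returns 0 without adding the key.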

If you really need to do it the way you showed, the easiest and fastest way would be to use any one of these three examples, and then build the one you need.

from collections import defaultdict  # available in Python 2.5 and newer

urls_d = defaultdict(int)
for url in list_of_urls:
    urls_d[url] += 1

urls = [{"url": key, "nbr": value} for key, value in urls_d.items()]

If you are using Python 2.7 or newer you can do it in a one-liner:

from collections import Counter

urls = [{"url": key, "nbr": value} for key, value in Counter(list_of_urls).items()]

Answer 1

Using defaultdict works, but so does this, inside the loop over list_of_urls:

urls[url] = urls.get(url, 0) + 1

With .get, you can supply a value to return when the key doesn't exist. Without a second argument it returns None, but in the line above it returns 0.
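Put into a complete loop, the line above might look like the following sketch. It assumes urls starts out as an empty dict, and it uses a shortened sample list.

```python
list_of_urls = ['http://www.google.fr/', 'http://www.google.cn/',
                'http://www.google.fr/']

urls = {}  # assumption: a plain dict mapping URL -> count
for url in list_of_urls:
    # get() returns the stored count, or 0 if the URL has not been seen yet
    urls[url] = urls.get(url, 0) + 1

print(urls)  # {'http://www.google.fr/': 2, 'http://www.google.cn/': 1}

# get() never inserts the key it looks up
print(urls.get('http://www.google.com/'))  # None
print('http://www.google.com/' in urls)    # False
```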


Answer 2

Use defaultdict:

from collections import defaultdict

urls = defaultdict(int)

for url in list_of_urls:
    urls[url] += 1

Answer 3

This always works fine for me:

for url in list_of_urls:
    urls.setdefault(url, 0)
    urls[url] += 1
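The snippet above presumably assumes that urls is a dict that already exists. Here is a small sketch of what setdefault does on each call, starting from an empty dict:

```python
urls = {}  # assumption: a plain dict mapping URL -> count

# Key is missing: 0 is stored under the key and returned.
print(urls.setdefault('http://www.google.fr/', 0))  # 0
urls['http://www.google.fr/'] += 1

# Key is present: the stored value is returned and left untouched.
print(urls.setdefault('http://www.google.fr/', 0))  # 1
urls['http://www.google.fr/'] += 1

print(urls)  # {'http://www.google.fr/': 2}
```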

Answer 4

To do it exactly your way? You could use the for…else structure, where the else branch runs only if the inner loop finishes without hitting break:

for url in list_of_urls:
    for url_dict in urls:
        if url_dict['url'] == url:
            url_dict['nbr'] += 1
            break
    else:
        urls.append(dict(url=url, nbr=1))

But it is quite inelegant. Do you really have to store the visited URLs as a list? If you store them in a dict, indexed by URL string, for example, it would be way cleaner:

urls = {'http://www.google.fr/': dict(url='http://www.google.fr/', nbr=1)}

for url in list_of_urls:
    if url in urls:
        urls[url]['nbr'] += 1
    else:
        urls[url] = dict(url=url, nbr=1)

A few things to note in that second example:

  • See how using a dict for urls removes the need to go through the whole urls list when testing for one single url. This approach will be faster.
  • Using dict(...) with keyword arguments instead of braces saves quoting the keys, which makes the code a little shorter.
  • Using list_of_urls, urls and url as variable names makes the code quite hard to parse. It’s better to find something clearer, such as urls_to_visit, urls_already_visited and current_url. I know, it’s longer. But it’s clearer.
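For illustration, here is the second example again with the clearer names suggested in the last bullet. It is only a sketch: the sample list is shortened, and the final conversion back to a list of dicts is an added step for readers who still need the original structure.

```python
urls_to_visit = ['http://www.google.fr/', 'http://www.google.cn/',
                 'http://www.google.fr/']

urls_already_visited = {}  # URL string -> {'url': ..., 'nbr': ...}

for current_url in urls_to_visit:
    if current_url in urls_already_visited:   # dict lookup, no scan of a list
        urls_already_visited[current_url]['nbr'] += 1
    else:
        urls_already_visited[current_url] = dict(url=current_url, nbr=1)

# If the list-of-dicts shape from the question is still needed:
urls = list(urls_already_visited.values())
print(urls)
```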

And of course I’m assuming that dict(url='http://www.google.fr/', nbr=1) is a simplification of your own data structure, because otherwise urls could simply be:

urls = {'http://www.google.fr/': 1}

for url in list_of_urls:
    if url in urls:
        urls[url] += 1
    else:
        urls[url] = 1

Which can get very elegant with defaultdict:

import collections

urls = collections.defaultdict(int)
for url in list_of_urls:
    urls[url] += 1

Answer 5

With the if/else version, the "is this key missing?" test fails every time a word (here, a URL) is seen, except the first. If you are counting a large number of items, many will probably occur multiple times. In a situation where the initialization of a value happens only once and the increment of that value happens many times, it can be cheaper to use a try statement:

urls_d = {}
for url in list_of_urls:
    try:
        urls_d[url] += 1
    except KeyError:
        urls_d[url] = 1

You can read more about this here: https://wiki.python.org/moin/PythonSpeed/PerformanceTips
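Whether try/except is actually faster depends on the Python version and on how often keys repeat, so it is worth measuring on your own data. The sketch below is one way to do that with timeit; the function names, the sample data and the repeat count are all made up for illustration, and no particular result is implied.

```python
import timeit

def count_with_if(items):
    counts = {}
    for item in items:
        if item in counts:
            counts[item] += 1
        else:
            counts[item] = 1
    return counts

def count_with_try(items):
    counts = {}
    for item in items:
        try:
            counts[item] += 1
        except KeyError:      # raised only the first time each key is seen
            counts[item] = 1
    return counts

# Hypothetical data: three distinct URLs, each repeated 1000 times.
sample = ['http://www.google.fr/', 'http://www.google.cn/',
          'http://www.google.com/'] * 1000

# Both approaches must agree before comparing their speed.
assert count_with_if(sample) == count_with_try(sample)

for func in (count_with_if, count_with_try):
    seconds = timeit.timeit(lambda: func(sample), number=100)
    print(func.__name__, round(seconds, 4))
```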