Python：字典列表，如果存在，则增加一个字典值，如果不增加新字典

Question 1

I would like do something like that.

list_of_urls = ['http://www.google.fr/', 'http://www.google.fr/', 
                'http://www.google.cn/', 'http://www.google.com/', 
                'http://www.google.fr/', 'http://www.google.fr/', 
                'http://www.google.fr/', 'http://www.google.com/', 
                'http://www.google.fr/', 'http://www.google.com/', 
                'http://www.google.cn/']

urls = [{'url': 'http://www.google.fr/', 'nbr': 1}]

for url in list_of_urls:
    if url in [f['url'] for f in urls]:
         urls[??]['nbr'] += 1
    else:
         urls.append({'url': url, 'nbr': 1})

How can I do ? I don’t know if I should take the tuple to edit it or figure out the tuple indices?

Any help ?

Question 2

That is a very strange way to organize things. If you stored in a dictionary, this is easy:

# This example should work in any version of Python.
# urls_d will contain URL keys, with counts as values, like: {'http://www.google.fr/' : 1 }
urls_d = {}
for url in list_of_urls:
    if not url in urls_d:
        urls_d[url] = 1
    else:
        urls_d[url] += 1

This code for updating a dictionary of counts is a common “pattern” in Python. It is so common that there is a special data structure, defaultdict, created just to make this even easier:

from collections import defaultdict  # available in Python 2.5 and newer

urls_d = defaultdict(int)
for url in list_of_urls:
    urls_d[url] += 1

If you access the defaultdict using a key, and the key is not already in the defaultdict, the key is automatically added with a default value. The defaultdict takes the callable you passed in, and calls it to get the default value. In this case, we passed in class int; when Python calls int() it returns a zero value. So, the first time you reference a URL, its count is initialized to zero, and then you add one to the count.

But a dictionary full of counts is also a common pattern, so Python provides a ready-to-use class: containers.Counter You just create a Counter instance by calling the class, passing in any iterable; it builds a dictionary where the keys are values from the iterable, and the values are counts of how many times the key appeared in the iterable. The above example then becomes:

from collections import Counter  # available in Python 2.7 and newer

urls_d = Counter(list_of_urls)

If you really need to do it the way you showed, the easiest and fastest way would be to use any one of these three examples, and then build the one you need.

from collections import defaultdict  # available in Python 2.5 and newer

urls_d = defaultdict(int)
for url in list_of_urls:
    urls_d[url] += 1

urls = [{"url": key, "nbr": value} for key, value in urls_d.items()]

If you are using Python 2.7 or newer you can do it in a one-liner:

from collections import Counter

urls = [{"url": key, "nbr": value} for key, value in Counter(list_of_urls).items()]

Question 3

Using the default works, but so does:

urls[url] = urls.get(url, 0) + 1

using .get, you can get a default return if it doesn’t exist. By default it’s None, but in the case I sent you, it would be 0.

Question 4

Use defaultdict:

from collections import defaultdict

urls = defaultdict(int)

for url in list_of_urls:
    urls[url] += 1

Question 5

This always works fine for me:

for url in list_of_urls:
    urls.setdefault(url, 0)
    urls[url] += 1

Question 6

To do it exactly your way? You could use the for…else structure

for url in list_of_urls:
    for url_dict in urls:
        if url_dict['url'] == url:
            url_dict['nbr'] += 1
            break
    else:
        urls.append(dict(url=url, nbr=1))

But it is quite inelegant. Do you really have to store the visited urls as a LIST? If you sort it as a dict, indexed by url string, for example, it would be way cleaner:

urls = {'http://www.google.fr/': dict(url='http://www.google.fr/', nbr=1)}

for url in list_of_urls:
    if url in urls:
        urls[url]['nbr'] += 1
    else:
        urls[url] = dict(url=url, nbr=1)

A few things to note in that second example:

see how using a dict for urls removes the need for going through the whole urls list when testing for one single url. This approach will be faster.
Using dict( ) instead of braces makes your code shorter
using list_of_urls, urls and url as variable names make the code quite hard to parse. It’s better to find something clearer, such as urls_to_visit, urls_already_visited and current_url. I know, it’s longer. But it’s clearer.

And of course I’m assuming that dict(url='http://www.google.fr', nbr=1) is a simplification of your own data structure, because otherwise, urls could simply be:

urls = {'http://www.google.fr':1}

for url in list_of_urls:
    if url in urls:
        urls[url] += 1
    else:
        urls[url] = 1

Which can get very elegant with the defaultdict stance:

urls = collections.defaultdict(int)
for url in list_of_urls:
    urls[url] += 1

Question 7

Except for the first time, each time a word is seen the if statement’s test fails. If you are counting a large number of words, many will probably occur multiple times. In a situation where the initialization of a value is only going to occur once and the augmentation of that value will occur many times it is cheaper to use a try statement:

urls_d = {}
for url in list_of_urls:
    try:
        urls_d[url] += 1
    except KeyError:
        urls_d[url] = 1

you can read more about this: https://wiki.python.org/moin/PythonSpeed/PerformanceTips

Python：字典列表，如果存在，则增加一个字典值，如果不增加新字典

排行榜展示

Python 情人节超强技能导出微信聊天记录生成词云

你不得不知道的python超级文献批量搜索下载工具

Python 流程图 — 一键转化代码为流程图

7行代码 Python热力图可视化分析缺失数据处理

Python 优化—算出每条语句执行时间

你的10W块放哪里能赚最多钱？

文章展示

正在获取“在库libxml2中找不到函数xmlCheckVersion。是否已安装libxml2？” 通过pip安装lxml时

从字符串中删除所有特殊字符，标点和空格

从Python中的字符串中删除特定字符

如何改善我的爪子检测？

如何按字典值对字典列表进行排序？

从子类调用父类的方法？

Python：字典列表，如果存在，则增加一个字典值，如果不增加新字典

问题：Python：字典列表，如果存在，则增加一个字典值，如果不增加新字典

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

相关文章

排行榜展示

文章展示