Tag archive: collections

How to create a set of sets in Python?

Question: How to create a set of sets in Python?

I’m trying to make a set of sets in Python. I can’t figure out how to do it.

Starting with the empty set xx:

xx = set([])
# Now we have some other set, for example
elements = set([2,3,4])
xx.add(elements)

but I get

TypeError: unhashable type: 'list'

or

TypeError: unhashable type: 'set'

Is it possible to have a set of sets in Python?

I am dealing with a large collection of sets and I want to avoid having to deal with duplicate sets (a set B of sets A1, A2, …, An would “cancel” two sets if Ai = Aj).


Answer 0

Python’s complaining because the inner set objects are mutable and thus not hashable. The solution is to use frozenset for the inner sets, to indicate that you have no intention of modifying them.
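A minimal sketch of the frozenset approach, reusing the question's example elements (the name outer is just for illustration). Equal frozensets hash equally, so adding the same elements in a different order does not create a duplicate:

```python
# Set elements must be hashable, so the inner "sets" are frozensets.
outer = set()
outer.add(frozenset([2, 3, 4]))
outer.add(frozenset([4, 3, 2]))  # equal to the first one, so nothing new is added
outer.add(frozenset([2, 3, 5]))

print(len(outer))                     # 2
print(frozenset([3, 2, 4]) in outer)  # True: membership ignores element order

# A regular (mutable) set is rejected with a TypeError.
try:
    outer.add({1, 2})
except TypeError as exc:
    print("rejected:", exc)
```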


Answer 1

People have already mentioned that you can do this with frozenset(), so I will just add some code showing how to achieve it:

For example, suppose you want to create a set of sets from the following list of lists:

t = [[], [1, 2], [5], [1, 2, 5], [1, 2, 3, 4], [1, 2, 3, 6]]

you can create your set in the following way:

t1 = set(frozenset(i) for i in t)
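To see the deduplication the question asks about, here is a sketch with a slightly modified input list (the extra [2, 1] and repeated [5] are added for illustration). Because frozenset ignores order and repeated values, those entries collapse into members that are already present:

```python
t = [[], [1, 2], [5], [1, 2, 5], [2, 1], [5]]

# [2, 1] equals frozenset([1, 2]) and the second [5] repeats an earlier one.
t1 = set(frozenset(i) for i in t)
print(len(t1))  # 4

# Frozensets are unordered; sort them if you need a stable display.
print(sorted(sorted(s) for s in t1))  # [[], [1, 2], [1, 2, 5], [5]]
```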

Answer 2

Use frozenset for the inner sets.


Answer 3

So I had the exact same problem. I wanted to make a data structure that works as a set of sets. The problem is that set elements must be hashable (in practice, immutable). So what you can do is simply make it a set of tuples. That worked fine for me!

A = set()
A.add((2, 3, 4))  # adds the element
A.add((2, 3, 4))  # does not add the same element again
A.add((2, 3, 5))  # adds the element, because it is different!
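One caveat worth a sketch: tuples compare position by position, so (2, 3, 4) and (4, 3, 2) count as different elements even though they hold the same values. Sorting each tuple before adding it is one way to normalize them, assuming the elements are mutually comparable. Unlike frozenset, this still keeps repeated values, so (2, 2, 3) and (2, 3) stay distinct.

```python
A = set()
A.add((2, 3, 4))
A.add((4, 3, 2))  # same values, different order: stored as a separate element
print(len(A))     # 2

# Normalize each tuple by sorting it before inserting.
B = set()
for items in [(2, 3, 4), (4, 3, 2), (2, 3, 5)]:
    B.add(tuple(sorted(items)))
print(len(B))     # 2
print(sorted(B))  # [(2, 3, 4), (2, 3, 5)]
```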

Answer 4

As of 2020, the official Python documentation advises using frozenset to represent sets of sets.


How to sort a Counter by value? - Python

Question: How to sort a Counter by value? - Python

Other than doing list comprehensions of reversed list comprehensions, is there a Pythonic way to sort a Counter by value? If so, is it faster than this:

>>> from collections import Counter
>>> x = Counter({'a':5, 'b':3, 'c':7})
>>> sorted(x)
['a', 'b', 'c']
>>> sorted(x.items())
[('a', 5), ('b', 3), ('c', 7)]
>>> [(l,k) for k,l in sorted([(j,i) for i,j in x.items()])]
[('b', 3), ('a', 5), ('c', 7)]
>>> [(l,k) for k,l in sorted([(j,i) for i,j in x.items()], reverse=True)]
[('c', 7), ('a', 5), ('b', 3)]

Answer 0

Use the Counter.most_common() method; it’ll sort the items for you:

>>> from collections import Counter
>>> x = Counter({'a':5, 'b':3, 'c':7})
>>> x.most_common()
[('c', 7), ('a', 5), ('b', 3)]

It’ll do so in the most efficient manner possible; if you ask for a Top N instead of all values, a heapq is used instead of a straight sort:

>>> x.most_common(1)
[('c', 7)]

Outside of counters, sorting can always be adjusted based on a key function; .sort() and sorted() both take a callable that lets you specify a value on which to sort the input sequence. For example, sorted(x, key=x.get, reverse=True) gives you the same ordering as x.most_common(), but returns only the keys:

>>> sorted(x, key=x.get, reverse=True)
['c', 'a', 'b']

or you can sort the (key, value) pairs on the value only:

>>> sorted(x.items(), key=lambda pair: pair[1], reverse=True)
[('c', 7), ('a', 5), ('b', 3)]

See the Python sorting howto for more information.
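To make the heapq remark concrete, here is a small sketch that computes the same top-N result with heapq.nlargest directly; operator.itemgetter(1) plays the role of the lambda pair: pair[1] key used above. It illustrates the idea and is not a copy of Counter's internals.

```python
import heapq
from collections import Counter
from operator import itemgetter

x = Counter({'a': 5, 'b': 3, 'c': 7})

# Top 2 by count, without fully sorting every item.
top2 = heapq.nlargest(2, x.items(), key=itemgetter(1))
print(top2)              # [('c', 7), ('a', 5)]
print(x.most_common(2))  # [('c', 7), ('a', 5)]
```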


Answer 1

A rather nice addition to @MartijnPieters’ answer is to get back a dictionary sorted by occurrence, since Counter.most_common only returns a list of tuples. I often couple this with JSON output for handy log files:

import json
from collections import Counter, OrderedDict

x = Counter({'a':5, 'b':3, 'c':7})
y = OrderedDict(x.most_common())
print(y)
print(json.dumps(y, indent=2))

With the output:

OrderedDict([('c', 7), ('a', 5), ('b', 3)])
{
  "c": 7,
  "a": 5,
  "b": 3
}
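On Python 3.7 and later a plain dict also preserves insertion order, so the same result can be sketched without OrderedDict; json.dumps with indent=2 produces the indented form shown above.

```python
import json
from collections import Counter

x = Counter({'a': 5, 'b': 3, 'c': 7})

# Python 3.7+: a regular dict keeps the order produced by most_common().
y = dict(x.most_common())
print(json.dumps(y, indent=2))
```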

Answer 2

Yes:

>>> from collections import Counter
>>> x = Counter({'a':5, 'b':3, 'c':7})

Using sorted() with the key keyword argument and a lambda function:

>>> sorted(x.items(), key=lambda i: i[1])
[('b', 3), ('a', 5), ('c', 7)]
>>> sorted(x.items(), key=lambda i: i[1], reverse=True)
[('c', 7), ('a', 5), ('b', 3)]

This works for all dictionaries. However, Counter has a dedicated method that already returns the items sorted from most frequent to least frequent. It’s called most_common():

>>> x.most_common()
[('c', 7), ('a', 5), ('b', 3)]
>>> list(reversed(x.most_common()))  # in order of least to most
[('b', 3), ('a', 5), ('c', 7)]

You can also specify how many items you want to see:

>>> x.most_common(2)  # specify number you want
[('c', 7), ('a', 5)]
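The collections documentation also lists a slicing recipe for the n least common elements, which avoids building a reversed copy first. A short sketch with n = 2:

```python
from collections import Counter

x = Counter({'a': 5, 'b': 3, 'c': 7})
n = 2

# Slice most_common() from the end, stepping backwards.
least = x.most_common()[:-n-1:-1]
print(least)  # [('b', 3), ('a', 5)]
```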

Answer 3

A more general approach is sorted() with the key keyword argument, which defines what to sort on; a minus sign before a numeric value gives descending order:

>>> x = Counter({'a':5, 'b':3, 'c':7})
>>> sorted(x.items(), key=lambda k: -k[1])  # Descending
[('c', 7), ('a', 5), ('b', 3)]
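Negating the count also makes it easy to add a tie-breaker. The sketch below (with an extra key 'd' added for illustration) sorts by count descending and then by key ascending, which reverse=True alone cannot express. Negation only works for numeric values.

```python
from collections import Counter

x = Counter({'a': 5, 'b': 3, 'c': 7, 'd': 5})

# Descending by count, ascending by key when counts are equal.
ranked = sorted(x.items(), key=lambda kv: (-kv[1], kv[0]))
print(ranked)  # [('c', 7), ('a', 5), ('d', 5), ('b', 3)]
```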

Accessing items in a collections.OrderedDict by index

Question: Accessing items in a collections.OrderedDict by index

Let’s say I have the following code:

import collections
d = collections.OrderedDict()
d['foo'] = 'python'
d['bar'] = 'spam'

Is there a way I can access the items in a numbered manner, like:

d(0) #foo's Output
d(1) #bar's Output

Answer 0

If it’s an OrderedDict(), you can easily access the elements by index by getting the tuples of (key, value) pairs, as follows (in Python 2):

>>> import collections
>>> d = collections.OrderedDict()
>>> d['foo'] = 'python'
>>> d['bar'] = 'spam'
>>> d.items()
[('foo', 'python'), ('bar', 'spam')]
>>> d.items()[0]
('foo', 'python')
>>> d.items()[1]
('bar', 'spam')

Note for Python 3.X

dict.items returns an iterable dict view object rather than a list. We need to wrap the call in a list in order to make indexing possible:

>>> items = list(d.items())
>>> items
[('foo', 'python'), ('bar', 'spam')]
>>> items[0]
('foo', 'python')
>>> items[1]
('bar', 'spam')

Answer 1

Do you have to use an OrderedDict, or do you specifically want a map-like type that’s ordered in some way with fast positional indexing? If the latter, then consider one of the many sorted dict types available for Python (which order key-value pairs based on key sort order). Some implementations also support fast indexing. For example, the sortedcontainers project has a SortedDict type for just this purpose.

>>> from sortedcontainers import SortedDict
>>> sd = SortedDict()
>>> sd['foo'] = 'python'
>>> sd['bar'] = 'spam'
>>> sd.keys()[0]  # Note that 'bar' comes before 'foo' in sort order.
'bar'
>>> # If you want the value, then simply do a key lookup:
>>> sd[sd.keys()[1]]
'python'

Answer 2

Here is a special case if you want the first entry (or close to it) in an OrderedDict, without creating a list. (This has been updated to Python 3):

>>> from collections import OrderedDict
>>> 
>>> d = OrderedDict()
>>> d["foo"] = "one"
>>> d["bar"] = "two"
>>> d["baz"] = "three"
>>> next(iter(d.items()))
('foo', 'one')
>>> next(iter(d.values()))
'one'

(The first time you say “next()”, it really means “first.”)

In my informal test, next(iter(d.items())) with a small OrderedDict is only a tiny bit faster than items()[0]. With an OrderedDict of 10,000 entries, next(iter(d.items())) was about 200 times faster than items()[0].

BUT if you save the items() list once and then use the list a lot, that could be faster. Or if you repeatedly create an items() iterator and step through it to the position you want, that could be slower.
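The same iterator idea extends to the last entry and to positions near the front. A minimal sketch: reversed() works on an OrderedDict, and itertools.islice skips ahead without building a list (its cost still grows with the position).

```python
from collections import OrderedDict
from itertools import islice

d = OrderedDict()
d["foo"] = "one"
d["bar"] = "two"
d["baz"] = "three"

first = next(iter(d.items()))              # ('foo', 'one')
last_key = next(reversed(d))               # 'baz'
second = next(islice(d.items(), 1, None))  # ('bar', 'two')
print(first, (last_key, d[last_key]), second)
```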


Answer 3

It is dramatically more efficient to use IndexedOrderedDict from the indexed package.

Following Niklas’s comment, I have done a benchmark on OrderedDict and IndexedOrderedDict with 1000 entries.

In [1]: from numpy import *
In [2]: from indexed import IndexedOrderedDict
In [3]: id=IndexedOrderedDict(zip(arange(1000),random.random(1000)))
In [4]: timeit id.keys()[56]
1000000 loops, best of 3: 969 ns per loop

In [8]: from collections import OrderedDict
In [9]: od=OrderedDict(zip(arange(1000),random.random(1000)))
In [10]: timeit od.keys()[56]
10000 loops, best of 3: 104 µs per loop

IndexedOrderedDict is ~100 times faster in indexing elements at specific position in this specific case.


Answer 4

This community wiki attempts to collect existing answers.

Python 2.7

In Python 2, the keys(), values(), and items() methods of OrderedDict return lists. Using values as an example, the simplest way is:

d.values()[0]  # "python"
d.values()[1]  # "spam"

For large collections where you only care about a single index, you can avoid creating the full list using the generator versions, iterkeys, itervalues and iteritems:

import itertools
next(itertools.islice(d.itervalues(), 0, 1))  # "python"
next(itertools.islice(d.itervalues(), 1, 2))  # "spam"

The indexed.py package provides IndexedOrderedDict, which is designed for this use case and will be the fastest option.

from indexed import IndexedOrderedDict
d = IndexedOrderedDict({'foo':'python','bar':'spam'})
d.values()[0]  # "python"
d.values()[1]  # "spam"

Using itervalues can be considerably faster for large dictionaries with random access:

$ python2 -m timeit -s 'from collections import OrderedDict; from random import randint; size = 1000;   d = OrderedDict({i:i for i in range(size)})'  'i = randint(0, size-1); d.values()[i:i+1]'
1000 loops, best of 3: 259 usec per loop
$ python2 -m timeit -s 'from collections import OrderedDict; from random import randint; size = 10000;  d = OrderedDict({i:i for i in range(size)})' 'i = randint(0, size-1); d.values()[i:i+1]'
100 loops, best of 3: 2.3 msec per loop
$ python2 -m timeit -s 'from collections import OrderedDict; from random import randint; size = 100000; d = OrderedDict({i:i for i in range(size)})' 'i = randint(0, size-1); d.values()[i:i+1]'
10 loops, best of 3: 24.5 msec per loop

$ python2 -m timeit -s 'import itertools; from collections import OrderedDict; from random import randint; size = 1000;   d = OrderedDict({i:i for i in range(size)})' 'i = randint(0, size-1); next(itertools.islice(d.itervalues(), i, i+1))'
10000 loops, best of 3: 118 usec per loop
$ python2 -m timeit -s 'import itertools; from collections import OrderedDict; from random import randint; size = 10000;  d = OrderedDict({i:i for i in range(size)})' 'i = randint(0, size-1); next(itertools.islice(d.itervalues(), i, i+1))'
1000 loops, best of 3: 1.26 msec per loop
$ python2 -m timeit -s 'import itertools; from collections import OrderedDict; from random import randint; size = 100000; d = OrderedDict({i:i for i in range(size)})' 'i = randint(0, size-1); next(itertools.islice(d.itervalues(), i, i+1))'
100 loops, best of 3: 10.9 msec per loop

$ python2 -m timeit -s 'from indexed import IndexedOrderedDict; from random import randint; size = 1000;   d = IndexedOrderedDict({i:i for i in range(size)})' 'i = randint(0, size-1); d.values()[i]'
100000 loops, best of 3: 2.19 usec per loop
$ python2 -m timeit -s 'from indexed import IndexedOrderedDict; from random import randint; size = 10000;  d = IndexedOrderedDict({i:i for i in range(size)})' 'i = randint(0, size-1); d.values()[i]'
100000 loops, best of 3: 2.24 usec per loop
$ python2 -m timeit -s 'from indexed import IndexedOrderedDict; from random import randint; size = 100000; d = IndexedOrderedDict({i:i for i in range(size)})' 'i = randint(0, size-1); d.values()[i]'
100000 loops, best of 3: 2.61 usec per loop

+--------+-----------+----------------+--------------+
|  size  | list (ms) | generator (ms) | indexed (ms) |
+--------+-----------+----------------+--------------+
|   1000 | .259      | .118           | .00219       |
|  10000 | 2.3       | 1.26           | .00224       |
| 100000 | 24.5      | 10.9           | .00261       |
+--------+-----------+----------------+--------------+

Python 3.6

Python 3 has the same two basic options (list vs. generator), but the dict methods return lightweight view objects rather than lists by default.

List method:

list(d.values())[0]  # "python"
list(d.values())[1]  # "spam"

Generator method:

import itertools
next(itertools.islice(d.values(), 0, 1))  # "python"
next(itertools.islice(d.values(), 1, 2))  # "spam"

In these benchmarks, Python 3 dictionaries are an order of magnitude faster than in Python 2, with a similar speedup from using generators.

+--------+-----------+----------------+--------------+
|  size  | list (ms) | generator (ms) | indexed (ms) |
+--------+-----------+----------------+--------------+
|   1000 | .0316     | .0165          | .00262       |
|  10000 | .288      | .166           | .00294       |
| 100000 | 3.53      | 1.48           | .00332       |
+--------+-----------+----------------+--------------+
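The generator method can be wrapped in a small helper. nth_value is a hypothetical name; the sketch adds a bounds check because, for an out-of-range index, islice would otherwise make next() raise StopIteration.

```python
from collections import OrderedDict
from itertools import islice


def nth_value(mapping, n):
    """Return the n-th value (0-based) of a mapping in iteration order.

    Uses islice over the values view, so no full list is built,
    but the walk is still O(n).
    """
    if not 0 <= n < len(mapping):
        raise IndexError(f"index {n} out of range")
    return next(islice(mapping.values(), n, None))


d = OrderedDict([('foo', 'python'), ('bar', 'spam')])
print(nth_value(d, 0))  # python
print(nth_value(d, 1))  # spam
```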

Answer 5

It’s a new era: with Python 3.6.1, dictionaries now retain their order. These semantics aren’t explicit because that would require BDFL approval. But Raymond Hettinger is the next best thing (and funnier), and he makes a pretty strong case that dictionaries will be ordered for a very long time.

So now it’s easy to create slices of a dictionary:

test_dict = {
                'first':  1,
                'second': 2,
                'third':  3,
                'fourth': 4
            }

list(test_dict.items())[:2]

Note: Dictionary insertion-order preservation is now official in Python 3.7.
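To take a slice without first copying every item into a list, itertools.islice can be applied to the items view directly. A short sketch on the same test_dict:

```python
from itertools import islice

test_dict = {
    'first': 1,
    'second': 2,
    'third': 3,
    'fourth': 4,
}

# Stop after two items instead of materializing the whole list.
first_two = dict(islice(test_dict.items(), 2))
print(first_two)  # {'first': 1, 'second': 2}
```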


Answer 6

For OrderedDict(), you can access the elements by index by getting the tuples of (key, value) pairs as follows, or by using .values():

>>> import collections
>>> d = collections.OrderedDict()
>>> d['foo'] = 'python'
>>> d['bar'] = 'spam'
>>> d.items()
odict_items([('foo', 'python'), ('bar', 'spam')])
>>> d.values()
odict_values(['python', 'spam'])
>>> list(d.values())
['python', 'spam']
>>> list(d.values())[0]
'python'

Python: defaultdict of defaultdict?

Question: Python: defaultdict of defaultdict?

Is there a way to have a defaultdict(defaultdict(int)) in order to make the following code work?

for x in stuff:
    d[x.a][x.b] += x.c_int

d needs to be built ad-hoc, depending on x.a and x.b elements.

I could use:

for x in stuff:
    d[x.a,x.b] += x.c_int

but then I wouldn’t be able to use:

d.keys()
d[x.a].keys()

Answer 0

Yes, like this:

defaultdict(lambda: defaultdict(int))

The argument of a defaultdict (in this case, lambda: defaultdict(int)) will be called when you try to access a key that doesn’t exist. Its return value will be set as the new value of this key, which means that in our case the value of d[Key_doesnt_exist] will be defaultdict(int).

If you then try to access a key of this inner defaultdict, i.e. d[Key_doesnt_exist][Key_doesnt_exist], it will return 0, which is the return value of the inner defaultdict’s argument, i.e. int().
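A runnable sketch of the question's loop with this approach. Record and the sample stuff values are hypothetical stand-ins for the question's objects with a, b and c_int attributes.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in for the question's objects with a, b and c_int attributes.
Record = namedtuple("Record", ["a", "b", "c_int"])
stuff = [Record("x", "p", 1), Record("x", "q", 2),
         Record("y", "p", 5), Record("x", "p", 3)]

d = defaultdict(lambda: defaultdict(int))
for x in stuff:
    d[x.a][x.b] += x.c_int

print(sorted(d.keys()))       # ['x', 'y']
print(sorted(d["x"].keys()))  # ['p', 'q']
print(d["x"]["p"])            # 4
```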


Answer 1

The parameter to the defaultdict constructor is the function that will be called to build new elements. So let’s use a lambda!

>>> from collections import defaultdict
>>> d = defaultdict(lambda : defaultdict(int))
>>> print(d[0])
defaultdict(<class 'int'>, {})
>>> print(d[0]["x"])
0

Since Python 2.7, there’s an even better solution using Counter:

>>> from collections import Counter
>>> c = Counter()
>>> c["goodbye"]+=1
>>> c["and thank you"]=42
>>> c["for the fish"]-=5
>>> c
Counter({'and thank you': 42, 'goodbye': 1, 'for the fish': -5})

Some bonus features

>>> c.most_common()[:2]
[('and thank you', 42), ('goodbye', 1)]

For more information see PyMOTW – Collections – Container data types and Python Documentation – collections
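The two ideas can also be combined for the question's two-level case: defaultdict(Counter) creates a Counter for each new outer key, so += works at the inner level and every inner mapping gets most_common(). The (a, b, c_int) triples below are hypothetical sample data.

```python
from collections import Counter, defaultdict

# Hypothetical (a, b, c_int) triples standing in for the question's `stuff`.
rows = [("x", "p", 1), ("x", "q", 2), ("y", "p", 5), ("x", "p", 3)]

d = defaultdict(Counter)
for a, b, c_int in rows:
    d[a][b] += c_int

print(sorted(d))              # ['x', 'y']
print(d["x"].most_common(1))  # [('p', 4)]
```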


Answer 2

I find it slightly more elegant to use partial:

import functools
from collections import defaultdict

dd_int = functools.partial(defaultdict, int)
defaultdict(dd_int)

Of course, this is the same as a lambda.
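Both versions build the same structure. One practical difference, sketched below: a defaultdict whose factory is a functools.partial can be pickled, because the partial and its arguments are picklable, while a defaultdict whose factory is a lambda cannot.

```python
import functools
import pickle
from collections import defaultdict

dd_int = functools.partial(defaultdict, int)
d = defaultdict(dd_int)
d["a"]["b"] += 1

restored = pickle.loads(pickle.dumps(d))
print(restored["a"]["b"])  # 1

d_lambda = defaultdict(lambda: defaultdict(int))
try:
    pickle.dumps(d_lambda)
except (pickle.PicklingError, AttributeError, TypeError) as exc:
    print("lambda factory is not picklable:", type(exc).__name__)
```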


Answer 3

For reference, it’s possible to implement a generic nested defaultdict factory function as follows:

from collections import defaultdict
from functools import partial
from itertools import repeat


def nested_defaultdict(default_factory, depth=1):
    result = partial(defaultdict, default_factory)
    for _ in repeat(None, depth - 1):
        result = partial(defaultdict, result)
    return result()

The depth defines the number of nested dictionaries before the type defined in default_factory is used. For example:

my_dict = nested_defaultdict(list, 3)
my_dict['a']['b']['c'].append('e')
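A usage sketch, assuming the nested_defaultdict function above (repeated here so the block runs on its own). With depth=2 and int, it builds the same structure as defaultdict(lambda: defaultdict(int)), so it handles the question's d[x.a][x.b] += x.c_int pattern. The keys are made up for illustration.

```python
from collections import defaultdict
from functools import partial
from itertools import repeat


def nested_defaultdict(default_factory, depth=1):
    result = partial(defaultdict, default_factory)
    for _ in repeat(None, depth - 1):
        result = partial(defaultdict, result)
    return result()


counts = nested_defaultdict(int, 2)
counts["x"]["p"] += 1
counts["x"]["p"] += 3
counts["y"]["q"] += 5
print(counts["x"]["p"])  # 4
print(sorted(counts))    # ['x', 'y']

lists = nested_defaultdict(list, 3)
lists["a"]["b"]["c"].append("e")
print(lists["a"]["b"]["c"])  # ['e']
```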

Answer 4

Previous answers have addressed how to make a two-level or n-level defaultdict. In some cases you want an infinitely nested one:

from collections import defaultdict

def ddict():
    return defaultdict(ddict)

Usage:

>>> d = ddict()
>>> d[1]['a'][True] = 0.5
>>> d[1]['b'] = 3
>>> import pprint; pprint.pprint(d)
defaultdict(<function ddict at 0x7fcac68bf048>,
            {1: defaultdict(<function ddict at 0x7fcac68bf048>,
                            {'a': defaultdict(<function ddict at 0x7fcac68bf048>,
                                              {True: 0.5}),
                             'b': 3})})
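Because every level is a defaultdict, the printed form is noisy, and a plain read creates the key as a side effect. A hypothetical to_dict helper, sketched below, converts the tree into ordinary dicts (for printing or JSON); the in operator tests for a key without creating it.

```python
from collections import defaultdict


def ddict():
    return defaultdict(ddict)


def to_dict(value):
    """Recursively turn nested defaultdicts into plain dicts (hypothetical helper)."""
    if isinstance(value, defaultdict):
        return {key: to_dict(inner) for key, inner in value.items()}
    return value


d = ddict()
d[1]['a'][True] = 0.5
d[1]['b'] = 3
print(to_dict(d))  # {1: {'a': {True: 0.5}, 'b': 3}}

print(2 in d)      # False: membership tests do not create keys
d[2]               # a plain lookup does create the key
print(2 in d)      # True
```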

Answer 5

Others have correctly answered your question of how to get the following to work:

for x in stuff:
    d[x.a][x.b] += x.c_int

An alternative would be to use tuples for keys:

d = defaultdict(int)
for x in stuff:
    d[x.a,x.b] += x.c_int
    # ^^^^^^^ tuple key

The nice thing about this approach is that it is simple and can be easily extended. If you need a mapping three levels deep, just use a three-item tuple as the key.
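The question's concern was losing d.keys() and d[x.a].keys(). With tuple keys, those can still be recovered by scanning the keys, at O(len(d)) cost per query. A sketch with hypothetical sample data:

```python
from collections import defaultdict

# Hypothetical (a, b, c_int) triples standing in for the question's `stuff`.
rows = [("x", "p", 1), ("x", "q", 2), ("y", "p", 5), ("x", "p", 3)]

d = defaultdict(int)
for a, b, c_int in rows:
    d[a, b] += c_int

# Equivalent of d.keys() at the first level.
outer_keys = {a for a, _ in d}
# Equivalent of d["x"].keys().
inner_keys_x = {b for a, b in d if a == "x"}

print(sorted(outer_keys))    # ['x', 'y']
print(sorted(inner_keys_x))  # ['p', 'q']
print(d["x", "p"])           # 4
```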


Is there a short contains function for lists?

Question: Is there a short contains function for lists?

I see people using any to gather another list to see if an item exists in a list, but is there a quick way to just do:

if list.contains(myItem):
    # do something

Answer 0

You can use this syntax:

if myItem in list:
    # do something

Also, the inverse operator:

if myItem not in list:
    # do something

It works for lists, tuples, sets and dicts (for dicts, it checks the keys).

Note that this is an O(n) operation in lists and tuples, but an O(1) operation (on average) in sets and dicts.
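A short sketch of these points with made-up values. For dicts, in looks at keys only, so values need .values(). If you test membership many times against the same large list, converting it to a set once is one way to get the average O(1) lookups mentioned above.

```python
items = [3, 1, 4, 1, 5]
print(4 in items)      # True
print(2 not in items)  # True

prices = {"apple": 1.25, "pear": 0.5}
print("apple" in prices)        # True: dict membership checks keys
print(1.25 in prices)           # False: values are not searched
print(1.25 in prices.values())  # True

# Build the set once, then reuse it for many average O(1) lookups.
lookup = set(items)
print(5 in lookup)     # True
```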


Answer 1

In addition to what others have said, you may also be interested to know that in works by calling the container’s __contains__ method (list.__contains__ for lists). You can define that method on any class you write, which can be extremely handy for using Python to its full extent.

A silly use might be:

>>> class ContainsEverything:
...     def __contains__(self, item):
...         return True
...
>>> a = ContainsEverything()
>>> 3 in a
True
>>> a in a
True
>>> False in a
True
>>> False not in a
False
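A more practical sketch of the same hook: a hypothetical Interval class whose __contains__ answers range checks. (If a class defines no __contains__, in falls back to iterating over the object.)

```python
class Interval:
    """Hypothetical half-open interval [start, stop) that supports `in`."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __contains__(self, value):
        # Called by `value in interval`; the result is used as a truth value.
        return self.start <= value < self.stop


workday = Interval(9, 17)
print(12 in workday)     # True
print(17 in workday)     # False: stop is excluded
print(8 not in workday)  # True
```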

Answer 2

I came up with this one-liner recently for getting True if a list contains any number of occurrences of an item, or False if it contains no occurrences or nothing at all. Using next(...) gives this a default return value (False), and because the generator stops at the first match, it should run significantly faster than building a whole list comprehension.

list_does_contain = next((True for item in list_to_test if item == test_item), False)
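For comparison, a sketch showing that the built-in in operator and any() give the same answer and also stop at the first match, so they are usually the simpler choice. The sample values are made up.

```python
list_to_test = ["spam", "eggs", "ham"]
test_item = "eggs"

via_next = next((True for item in list_to_test if item == test_item), False)
via_any = any(item == test_item for item in list_to_test)
via_in = test_item in list_to_test

print(via_next, via_any, via_in)  # True True True
```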


Answer 3

The list method index raises a ValueError if the item is not present, and returns the index of the first occurrence of the item if it is present. Alternatively, in an if statement you can do the following:

if myItem in list:
    #do things

You can also check if an element is not in a list with the following if statement:

if myItem not in list:
    #do things
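If you want the index but str.find-style behaviour (a sentinel instead of an exception), a small wrapper works. find_index and its default of -1 are hypothetical, chosen to mirror str.find.

```python
def find_index(items, target, default=-1):
    """Return the index of the first occurrence of target, or default."""
    try:
        return items.index(target)
    except ValueError:  # list.index raises when the item is missing
        return default


colors = ["red", "green", "blue", "green"]
print(find_index(colors, "green"))   # 1
print(find_index(colors, "purple"))  # -1
```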

Awesome-python - A curated list of awesome Python frameworks, libraries, software and resources

Awesome Python

A curated list of awesome Python frameworks, libraries, software and resources.

Inspired by awesome-php.


Admin Panels

Libraries for administrative interfaces.

  • ajenti – The admin panel your servers deserve.
  • django-grappelli – A jazzy skin for the Django Admin-Interface.
  • django-jet – Modern responsive template for the Django admin interface with improved functionality.
  • django-suit – Alternative Django Admin-Interface (free only for Non-commercial use).
  • django-xadmin – Drop-in replacement of Django admin comes with lots of goodies.
  • flask-admin – Simple and extensible administrative interface framework for Flask.
  • flower – Real-time monitor and web admin for Celery.
  • jet-bridge – Admin panel framework for any application with nice UI (ex Jet Django)
  • wooey – A Django app which creates automatic web UIs for Python scripts.

Algorithms and Design Patterns

Python implementation of data structures, algorithms and design patterns. Also see awesome-algorithms.

  • Algorithms
    • algorithms – Minimal examples of data structures and algorithms.
    • python-ds – A collection of data structure and algorithms for coding interviews.
    • sortedcontainers – Fast and pure-Python implementation of sorted collections.
    • TheAlgorithms – All Algorithms implemented in Python.
  • Design Patterns
    • PyPattyrn – A simple yet effective library for implementing common design patterns.
    • python-patterns – A collection of design patterns in Python.
    • transitions – A lightweight, object-oriented finite state machine implementation.

ASGI Servers

ASGI-compatible web servers.

  • daphne – A HTTP, HTTP2 and WebSocket protocol server for ASGI and ASGI-HTTP.
  • uvicorn – A lightning-fast ASGI server implementation, using uvloop and httptools.

Asynchronous Programming

  • asyncio – (Python standard library) Asynchronous I/O, event loop, coroutines and tasks.
  • trio – A friendly library for async concurrency and I/O.
  • Twisted – An event-driven networking engine.
  • uvloop – Ultra fast asyncio event loop.
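A minimal sketch of the standard-library asyncio entry: two coroutines run concurrently on a single event loop. The coroutine names and delays are illustrative assumptions.

```python
import asyncio


async def fetch(name, delay):
    # asyncio.sleep() stands in for a non-blocking I/O operation.
    await asyncio.sleep(delay)
    return f"{name} done"


async def main():
    # gather() runs both coroutines concurrently and returns their results
    # in the order the awaitables were passed, not the order they finished.
    return await asyncio.gather(fetch("a", 0.02), fetch("b", 0.01))


results = asyncio.run(main())
print(results)  # ['a done', 'b done']
```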

Audio

Libraries for manipulating audio and its metadata.

  • Audio
    • audioread – Cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding.
    • dejavu – Audio fingerprinting and recognition.
    • kapre – Keras Audio Preprocessors
    • librosa – Python library for audio and music analysis
    • matchering – A library for automated reference audio mastering.
    • mingus – An advanced music theory and notation package with MIDI file and playback support.
    • pyAudioAnalysis – Audio feature extraction, classification, segmentation and applications.
    • pydub – Manipulate audio with a simple and easy high level interface.
    • TimeSide – Open web audio processing framework.
  • Metadata
    • beets – A music library manager and MusicBrainz tagger.
    • eyeD3 – A tool for working with audio files, specifically MP3 files containing ID3 metadata.
    • mutagen – A Python module to handle audio metadata.
    • tinytag – A library for reading music meta data of MP3, OGG, FLAC and Wave files.

Authentication

Libraries for implementing authentication schemes.

  • OAuth
    • authlib – JavaScript Object Signing and Encryption draft implementation.
    • django-allauth – Authentication app for Django that “just works.”
    • django-oauth-toolkit – OAuth 2 goodies for Django.
    • oauthlib – A generic and thorough implementation of the OAuth request-signing logic.
    • python-oauth2 – A fully tested, abstract interface to creating OAuth clients and servers.
    • python-social-auth – An easy-to-setup social authentication mechanism.
  • JWT
    • pyjwt – JSON Web Token implementation in Python.
    • python-jose – A JOSE implementation in Python.
    • python-jwt – A module for generating and verifying JSON Web Tokens.

Build Tools

Compile software from source code.

  • BitBake – A make-like build tool for embedded Linux.
  • buildout – A build system for creating, assembling and deploying applications from multiple parts.
  • PlatformIO – A console tool to build code with different development platforms.
  • pybuilder – A continuous build tool written in pure Python.
  • SCons – A software construction tool.

Built-in Classes Enhancement

Libraries for enhancing Python built-in classes.

  • attrs – Replacement for __init__, __eq__, __repr__, etc. boilerplate in class definitions.
  • bidict – Efficient, Pythonic bidirectional map data structures and related functionality.
  • Box – Python dictionaries with advanced dot notation access.
  • dataclasses – (Python standard library) Data classes.
  • DottedDict – A library that provides a method of accessing lists and dicts with a dotted path notation.
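To illustrate the boilerplate that attrs and dataclasses remove, here is a sketch using only the standard-library dataclasses module; the Point class is hypothetical. The decorator generates __init__, __repr__ and __eq__ from the annotated fields.

```python
from dataclasses import dataclass, field


@dataclass
class Point:
    x: float
    y: float
    # Mutable defaults must go through default_factory, not a shared [].
    tags: list = field(default_factory=list)


p = Point(1.0, 2.0)
print(p)                     # Point(x=1.0, y=2.0, tags=[])
print(p == Point(1.0, 2.0))  # True: __eq__ compares the fields in order
```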

CMS

Content Management Systems.

  • django-cms – An open source enterprise CMS based on Django.
  • feincms – One of the most advanced Content Management Systems built on Django.
  • indico – A feature-rich event management system, made @ CERN.
  • Kotti – A high-level, Pythonic web application framework built on Pyramid.
  • mezzanine – A powerful, consistent, and flexible content management platform.
  • plone – A CMS built on top of the open source application server Zope.
  • quokka – Flexible, extensible, small CMS powered by Flask and MongoDB.
  • wagtail – A Django content management system.

Caching

Libraries for caching data.

  • beaker – A WSGI middleware for sessions and caching.
  • django-cache-machine – Automatic caching and invalidation for Django models.
  • django-cacheops – A slick ORM cache with automatic granular event-driven invalidation.
  • dogpile.cache – dogpile.cache is next generation replacement for Beaker made by same authors.
  • HermesCache – Python caching library with tag-based invalidation and dogpile effect prevention.
  • pylibmc – A Python wrapper around the libmemcached interface.
  • python-diskcache – SQLite and file backed cache backend with faster lookups than memcached and redis.

ChatOps Tools

Libraries for chatbot development.

  • errbot – The easiest and most popular chatbot to implement ChatOps.

Code Analysis

Tools of static analysis, linters and code quality checkers. Also see awesome-static-analysis.

  • Code Analysis
    • coala – Language independent and easily extendable code analysis application.
    • code2flow – Turn your Python and JavaScript code into DOT flowcharts.
    • prospector – A tool to analyse Python code.
    • pycallgraph – A library that visualises the flow (call graph) of your Python application.
    • vulture – A tool for finding and analysing dead Python code.
  • Code Linters
  • Code Formatters
    • black – The uncompromising Python code formatter.
    • isort – A Python utility / library to sort imports.
    • yapf – Yet another Python code formatter from Google.
  • Static Type Checkers, also see awesome-python-typing
    • mypy – Check variable types during compile time.
    • pyre-check – Performant type checking.
    • typeshed – Collection of library stubs for Python, with static types.
  • Static Type Annotations Generators
    • MonkeyType – A system for Python that generates static type annotations by collecting runtime types.
    • pyannotate – Auto-generate PEP-484 annotations.
    • pytype – Pytype checks and infers types for Python code – without requiring type annotations.

Command-line Interface Development

Libraries for building command-line applications.

  • Command-line Application Development
    • cement – CLI Application Framework for Python.
    • click – A package for creating beautiful command line interfaces in a composable way.
    • cliff – A framework for creating command-line programs with multi-level commands.
    • docopt – Pythonic command line arguments parser.
    • python-fire – A library for creating command line interfaces from absolutely any Python object.
    • python-prompt-toolkit – A library for building powerful interactive command lines.
  • Terminal Rendering
    • alive-progress – A new kind of Progress Bar, with real-time throughput, eta and very cool animations.
    • asciimatics – A package to create full-screen text UIs (from interactive forms to ASCII animations).
    • bashplotlib – Making basic plots in the terminal.
    • colorama – Cross-platform colored terminal text.
    • rich – Python library for rich text and beautiful formatting in the terminal. Also provides a great RichHandler log handler.
    • tqdm – Fast, extensible progress bar for loops and CLI.

Command-line Tools

Useful CLI-based tools for productivity.

  • Productivity Tools
    • copier – A library and command-line utility for rendering projects templates.
    • cookiecutter – A command-line utility that creates projects from cookiecutters (project templates).
    • doitlive – A tool for live presentations in the terminal.
    • howdoi – Instant coding answers via the command line.
    • Invoke – A tool for managing shell-oriented subprocesses and organizing executable Python code into CLI-invokable tasks.
    • PathPicker – Select files out of bash output.
    • percol – Adds flavor of interactive selection to the traditional pipe concept on UNIX.
    • thefuck – Correcting your previous console command.
    • tmuxp – A tmux session manager.
    • try – A dead simple CLI to try out python packages – it’s never been easier.
  • CLI Enhancements
    • httpie – A command line HTTP client, a user-friendly cURL replacement.
    • iredis – Redis CLI with autocompletion and syntax highlighting.
    • kube-shell – An integrated shell for working with the Kubernetes CLI.
    • litecli – SQLite CLI with autocompletion and syntax highlighting.
    • mycli – MySQL CLI with autocompletion and syntax highlighting.
    • pgcli – PostgreSQL CLI with autocompletion and syntax highlighting.
    • saws – A Supercharged aws-cli.

Compatibility

Libraries for migrating from Python 2 to 3.

  • python-future – The missing compatibility layer between Python 2 and Python 3.
  • modernize – Modernizes Python code for eventual Python 3 migration.
  • six – Python 2 and 3 compatibility utilities.

Computer Vision

Libraries for computer vision.

  • EasyOCR – Ready-to-use OCR with 40+ languages supported.
  • Face Recognition – Simple facial recognition library.
  • Kornia – Open Source Differentiable Computer Vision Library for PyTorch.
  • OpenCV – Open Source Computer Vision Library.
  • pytesseract – A wrapper for Google Tesseract OCR.
  • SimpleCV – An open source framework for building computer vision applications.
  • tesserocr – Another simple, Pillow-friendly, wrapper around the tesseract-ocr API for OCR.

Concurrency and Parallelism

Libraries for concurrent and parallel execution. Also see awesome-asyncio.

  • concurrent.futures – (Python standard library) A high-level interface for asynchronously executing callables.
  • eventlet – Asynchronous framework with WSGI support.
  • gevent – A coroutine-based Python networking library that uses greenlet.
  • multiprocessing – (Python standard library) Process-based parallelism.
  • scoop – Scalable Concurrent Operations in Python.
  • uvloop – Ultra fast implementation of asyncio event loop on top of libuv.
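A minimal sketch of concurrent.futures from the standard library. It uses ThreadPoolExecutor so it runs anywhere without extra setup; ProcessPoolExecutor exposes the same Executor interface and is usually the better fit for CPU-bound work, though its callables must be picklable. The square function is illustrative.

```python
from concurrent.futures import ThreadPoolExecutor


def square(n):
    return n * n


with ThreadPoolExecutor(max_workers=4) as executor:
    # map() schedules one call per input and yields results in input order.
    results = list(executor.map(square, range(5)))

print(results)  # [0, 1, 4, 9, 16]
```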

Configuration

Libraries for storing and parsing configuration options.

  • configobj – INI file parser with validation.
  • configparser – (Python standard library) INI file parser.
  • hydra – Hydra is a framework for elegantly configuring complex applications.
  • profig – Config from multiple formats with value conversion.
  • python-decouple – Strict separation of settings from code.
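A short sketch of the standard-library configparser reading INI-style text; the section and option names are made up for the example. read_string() is used so the snippet needs no file on disk, whereas read() takes file paths.

```python
import configparser

INI_TEXT = """
[server]
host = localhost
port = 8080
debug = yes
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

# Values are stored as strings; the typed getters convert them.
host = config["server"]["host"]
port = config.getint("server", "port")
debug = config.getboolean("server", "debug")  # accepts yes/no, true/false, on/off, 1/0
print(host, port, debug)  # localhost 8080 True
```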

Cryptography

  • cryptography – A package designed to expose cryptographic primitives and recipes to Python developers.
  • paramiko – The leading native Python SSHv2 protocol library.
  • passlib – Secure password storage/hashing library, very high level.
  • pynacl – Python binding to the Networking and Cryptography (NaCl) library.

Data Analysis

Libraries for data analysis.

  • AWS Data Wrangler – Pandas on AWS.
  • Blaze – NumPy and Pandas interface to Big Data.
  • Open Mining – Business Intelligence (BI) in Pandas interface.
  • Optimus – Agile Data Science Workflows made easy with PySpark.
  • Orange – Data mining, data visualization, analysis and machine learning through visual programming or scripts.
  • Pandas – A library providing high-performance, easy-to-use data structures and data analysis tools.

Data Validation

Libraries for validating data. Used for forms in many cases.

  • Cerberus – A lightweight and extensible data validation library.
  • colander – Validating and deserializing data obtained via XML, JSON, or an HTML form post.
  • jsonschema – An implementation of JSON Schema for Python.
  • schema – A library for validating Python data structures.
  • Schematics – Data Structure Validation.
  • valideer – Lightweight extensible data validation and adaptation library.
  • voluptuous – A Python data validation library.

Data Visualization

Libraries for visualizing data. Also see awesome-javascript.

  • Altair – Declarative statistical visualization library for Python.
  • Bokeh – Interactive Web Plotting for Python.
  • bqplot – Interactive Plotting Library for the Jupyter Notebook
  • Cartopy – A cartographic python library with matplotlib support
  • Dash – Built on top of Flask, React and Plotly aimed at analytical web applications.
  • diagrams – Diagram as Code.
  • Matplotlib – A Python 2D plotting library.
  • plotnine – A grammar of graphics for Python based on ggplot2.
  • Pygal – A Python SVG Charts Creator.
  • PyGraphviz – Python interface to Graphviz.
  • PyQtGraph – Interactive and realtime 2D/3D/Image plotting and science/engineering widgets.
  • Seaborn – Statistical data visualization using Matplotlib.
  • VisPy – High-performance scientific visualization based on OpenGL.

Databases

Databases implemented in Python.

  • pickleDB – A simple and lightweight key-value store for Python.
  • tinydb – A tiny, document-oriented database.
  • ZODB – A native object database for Python. A key-value and object graph database.

Database Drivers

Libraries for connecting and operating databases.

  • MySQL – awesome-mysql
  • PostgreSQL – awesome-postgres
    • psycopg2 – The most popular PostgreSQL adapter for Python.
    • queries – A wrapper of the psycopg2 library for interacting with PostgreSQL.
  • SQLite – awesome-sqlite
    • sqlite3 – (Python standard library) SQLite interface compliant with DB-API 2.0.
    • SuperSQLite – A supercharged SQLite library built on top of apsw.
  • Other Relational Databases
    • pymssql – A simple database interface to Microsoft SQL Server.
    • clickhouse-driver – Python driver with native interface for ClickHouse.
  • NoSQL Databases
    • cassandra-driver – The Python Driver for Apache Cassandra.
    • happybase – A developer-friendly library for Apache HBase.
    • kafka-python – The Python client for Apache Kafka.
    • py2neo – A client library and toolkit for working with Neo4j.
    • pymongo – The official Python client for MongoDB.
    • redis-py – The Python client for Redis.
  • Asynchronous Clients
    • motor – The async Python driver for MongoDB.
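A minimal sketch of the standard-library sqlite3 driver listed above, using an in-memory database so nothing is written to disk. The users table is hypothetical. Other DB-API 2.0 drivers follow the same connect/execute/fetch pattern, although their placeholder style may differ (psycopg2, for example, uses %s rather than ?).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# Use ? placeholders instead of string formatting to avoid SQL injection.
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
conn.commit()

rows = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
print(rows)  # [(1, 'alice'), (2, 'bob')]
conn.close()
```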

Date and Time

Libraries for working with dates and times.

  • Arrow – A Python library that offers a sensible and human-friendly approach to creating, manipulating, formatting and converting dates, times and timestamps.
  • Chronyk – A Python 3 library for parsing human-written times and dates.
  • dateutil – Extensions to the standard Python datetime module.
  • delorean – A library for clearing up the inconvenient truths that arise dealing with datetimes.
  • maya – Datetimes for Humans.
  • moment – A Python library for dealing with dates/times. Inspired by Moment.js.
  • Pendulum – Python datetimes made easy.
  • PyTime – An easy-to-use Python module which aims to operate date/time/datetime by string.
  • pytz – World timezone definitions, modern and historical. Brings the tz database into Python.
  • when.py – Providing user-friendly functions to help perform common date and time actions.

Debugging Tools

Libraries for debugging code.

  • pdb-like Debugger
    • ipdb – IPython-enabled pdb.
    • pdb++ – Another drop-in replacement for pdb.
    • pudb – A full-screen, console-based Python debugger.
    • wdb – An improbable web debugger through WebSockets.
  • Tracing
    • lptrace – strace for Python programs.
    • manhole – Debugging UNIX socket connections and present the stacktraces for all threads and an interactive prompt.
    • pyringe – Debugger capable of attaching to and injecting code into Python processes.
    • python-hunter – A flexible code tracing toolkit.
  • Profiler
    • line_profiler – Line-by-line profiling.
    • memory_profiler – Monitor Memory usage of Python code.
    • py-spy – A sampling profiler for Python programs. Written in Rust.
    • pyflame – A ptracing profiler For Python.
    • vprof – Visual Python profiler.
  • Others
    • django-debug-toolbar – Display various debug information for Django.
    • django-devserver – A drop-in replacement for Django’s runserver.
    • flask-debugtoolbar – A port of the django-debug-toolbar to flask.
    • icecream – Inspect variables, expressions, and program execution with a single, simple function call.
    • pyelftools – Parsing and analyzing ELF files and DWARF debugging information.

Deep Learning

Frameworks for neural networks and deep learning. Also see awesome-deep-learning.

  • caffe – A fast open framework for deep learning.
  • keras – A high-level neural networks library capable of running on top of either TensorFlow or Theano.
  • mxnet – A deep learning framework designed for both efficiency and flexibility.
  • pytorch – Tensors and Dynamic neural networks in Python with strong GPU acceleration.
  • SerpentAI – Game agent framework. Use any video game as a deep learning sandbox.
  • tensorflow – The most popular Deep Learning framework created by Google.
  • Theano – A library for fast numerical computation.

DevOps Tools

Software and libraries for DevOps.

  • Configuration Management
    • ansible – A radically simple IT automation platform.
    • cloudinit – A multi-distribution package that handles early initialization of a cloud instance.
    • OpenStack – Open source software for building private and public clouds.
    • pyinfra – Versatile CLI tools and Python libraries to automate infrastructure.
    • saltstack – Infrastructure automation and management system.
  • SSH-style Deployment
    • cuisine – Chef-like functionality for Fabric.
    • fabric – A simple, Pythonic tool for remote execution and deployment.
    • fabtools – Tools for writing awesome Fabric files.
  • Process Management
    • honcho – A Python clone of Foreman, for managing Procfile-based applications.
    • supervisor – Supervisor process control system for UNIX.
  • Monitoring
    • psutil – A cross-platform process and system utilities module.
  • Backup
    • BorgBackup – A deduplicating archiver with compression and encryption.
  • Others

Distributed Computing

Frameworks and libraries for distributed computing.

  • Batch Processing
    • dask – A flexible parallel computing library for analytic computing.
    • luigi – A module that helps you build complex pipelines of batch jobs.
    • mrjob – Run MapReduce jobs on Hadoop or Amazon Web Services.
    • PySpark – Apache Spark Python API.
    • Ray – A system for parallel and distributed Python that unifies the machine learning ecosystem.
  • Stream Processing

Distribution

Libraries to create packaged executables for release distribution.

  • dh-virtualenv – Build and distribute a virtualenv as a Debian package.
  • Nuitka – Compile scripts, modules, packages to an executable or extension module.
  • py2app – Freezes Python scripts (Mac OS X).
  • py2exe – Freezes Python scripts (Windows).
  • pyarmor – A tool used to obfuscate python scripts, bind obfuscated scripts to fixed machine or expire obfuscated scripts.
  • PyInstaller – Converts Python programs into stand-alone executables (cross-platform).
  • pynsist – A tool to build Windows installers, installers bundle Python itself.
  • shiv – A command line utility for building fully self-contained zipapps (PEP 441), but with all their dependencies included.

Documentation

Libraries for generating project documentation.

  • sphinx – Python Documentation generator.
  • pdoc – Epydoc replacement to auto generate API documentation for Python libraries.
  • pycco – The literate-programming-style documentation generator.

Downloader

Libraries for downloading.

  • akshare – A financial data interface library, built for human beings!
  • s3cmd – A command line tool for managing Amazon S3 and CloudFront.
  • s4cmd – Super S3 command line tool, good for higher performance.
  • you-get – A YouTube/Youku/Niconico video downloader written in Python 3.
  • youtube-dl – A small command-line program to download videos from YouTube.

E-commerce

Frameworks and libraries for e-commerce and payments.

  • alipay – Unofficial Alipay API for Python.
  • Cartridge – A shopping cart app built using Mezzanine.
  • django-oscar – An open-source e-commerce framework for Django.
  • django-shop – A Django based shop system.
  • forex-python – Foreign exchange rates, Bitcoin price index and currency conversion.
  • merchant – A Django app to accept payments from various payment processors.
  • money – Money class with optional CLDR-backed locale-aware formatting and an extensible currency exchange.
  • python-currencies – Display money format and its filthy currencies.
  • saleor – An e-commerce storefront for Django.
  • shoop – An open source E-Commerce platform based on Django.

Editor Plugins and IDEs

  • Emacs
    • elpy – Emacs Python Development Environment.
  • Sublime Text
    • anaconda – Anaconda turns your Sublime Text 3 into a full-featured Python development IDE.
    • SublimeJEDI – A Sublime Text plugin to the awesome auto-complete library Jedi.
  • Vim
    • jedi-vim – Vim bindings for the Jedi auto-completion library for Python.
    • python-mode – An all in one plugin for turning Vim into a Python IDE.
    • YouCompleteMe – Includes Jedi-based completion engine for Python.
  • Visual Studio
    • PTVS – Python Tools for Visual Studio.
  • Visual Studio Code
    • Python – The official VSCode extension with rich support for Python.
  • IDE
    • PyCharm – Commercial Python IDE by JetBrains. Has free community edition available.
    • spyder – Open Source Python IDE.

Email

Libraries for sending and parsing email.

  • Mail Servers
    • modoboa – A mail hosting and management platform including a modern Web UI.
    • salmon – A Python Mail Server.
  • Clients
    • imbox – Python IMAP for Humans.
    • yagmail – Yet another Gmail/SMTP client.
  • Others
    • flanker – An email address and Mime parsing library.
    • mailer – High-performance extensible mail delivery framework.

Enterprise Application Integrations

Platforms and tools for systems integrations in enterprise environments.

  • Zato – ESB, SOA, REST, APIs and Cloud Integrations in Python.

Environment Management

Libraries for Python version and virtual environment management.

  • pyenv – Simple Python version management.
  • virtualenv – A tool to create isolated Python environments.

Files

Libraries for file manipulation and MIME type detection.

  • mimetypes – (Python standard library) Map filenames to MIME types.
  • path.py – A module wrapper for os.path.
  • pathlib – (Python standard library) A cross-platform, object-oriented path library.
  • PyFilesystem2 – Python’s filesystem abstraction layer.
  • python-magic – A Python interface to the libmagic file type identification library.
  • Unipath – An object-oriented approach to file/directory operations.
  • watchdog – API and shell utilities to monitor file system events.
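A small sketch combining two standard-library entries from this list: pathlib for path manipulation and mimetypes for guessing a type from a file name. The path is illustrative and is never touched on disk.

```python
import mimetypes
from pathlib import PurePath

# PurePath does string-level path manipulation only; no filesystem access.
p = PurePath("docs") / "guide" / "index.html"
print(p.name)    # index.html
print(p.stem)    # index
print(p.suffix)  # .html
print(p.parts)   # ('docs', 'guide', 'index.html')

# guess_type() looks only at the extension; it does not open the file.
mime_type, encoding = mimetypes.guess_type(p.name)
print(mime_type, encoding)  # text/html None
```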

Foreign Function Interface

Libraries for providing foreign function interface.

  • cffi – Foreign Function Interface for Python calling C code.
  • ctypes – (Python standard library) Foreign Function Interface for Python calling C code.
  • PyCUDA – A Python wrapper for Nvidia’s CUDA API.
  • SWIG – Simplified Wrapper and Interface Generator.
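A minimal sketch of the standard-library ctypes entry calling the C function strlen(). It assumes a C runtime library can be loaded: msvcrt on Windows, or on POSIX the C library located by ctypes.util.find_library(), falling back to the symbols already loaded into the process. Declaring argtypes and restype tells ctypes how to convert the argument and the return value.

```python
import ctypes
import ctypes.util
import sys

# Assumption: a C runtime exposing strlen() can be loaded on this platform.
if sys.platform == "win32":
    libc = ctypes.cdll.msvcrt
else:
    # find_library() may return None; CDLL(None) then uses the symbols
    # already loaded into the current process, which include libc.
    libc = ctypes.CDLL(ctypes.util.find_library("c"))

strlen = libc.strlen
strlen.argtypes = [ctypes.c_char_p]  # const char *
strlen.restype = ctypes.c_size_t     # size_t

print(strlen(b"hello"))  # 5
```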

Forms

Libraries for working with forms.

  • Deform – Python HTML form generation library influenced by the formish form generation library.
  • django-bootstrap3 – Bootstrap 3 integration with Django.
  • django-bootstrap4 – Bootstrap 4 integration with Django.
  • django-crispy-forms – A Django app which lets you create beautiful forms in a very elegant and DRY way.
  • django-remote-forms – A platform independent Django form serializer.
  • WTForms – A flexible forms validation and rendering library.

Functional Programming

Functional programming with Python.

  • Coconut – A variant of Python built for simple, elegant, Pythonic functional programming.
  • CyToolz – Cython implementation of Toolz: High performance functional utilities.
  • fn.py – Functional programming in Python: implementation of missing features to enjoy FP.
  • funcy – Fancy and practical functional tools.
  • more-itertools – More routines for operating on iterables, beyond itertools.
  • returns – A set of type-safe monads, transformers, and composition utilities.
  • Toolz – A collection of functional utilities for iterators, functions, and dictionaries.

GUI Development

Libraries for working with graphical user interface applications.

  • curses – Built-in wrapper for ncurses used to create terminal GUI applications.
  • Eel – A library for making simple Electron-like offline HTML/JS GUI apps.
  • enaml – Creating beautiful user-interfaces with Declarative Syntax like QML.
  • Flexx – Flexx is a pure Python toolkit for creating GUIs that uses web technology for its rendering.
  • Gooey – Turn command line programs into a full GUI application with one line.
  • kivy – A library for creating NUI applications, running on Windows, Linux, Mac OS X, Android and iOS.
  • pyglet – A cross-platform windowing and multimedia library for Python.
  • PyGObject – Python Bindings for GLib/GObject/GIO/GTK+ (GTK+3).
  • PyQt – Python bindings for the Qt cross-platform application and UI framework.
  • PySimpleGUI – Wrapper for tkinter, Qt, WxPython and Remi.
  • pywebview – A lightweight cross-platform native wrapper around a webview component.
  • Tkinter – Tkinter is Python’s de-facto standard GUI package.
  • Toga – A Python native, OS native GUI toolkit.
  • urwid – A library for creating terminal GUI applications with strong support for widgets, events, rich colors, etc.
  • wxPython – A blending of the wxWidgets C++ class library with the Python.
  • DearPyGui – A Simple GPU accelerated Python GUI framework

GraphQL

Libraries for working with GraphQL.

  • graphene – GraphQL framework for Python.
  • tartiflette-aiohttp – An aiohttp-based wrapper for Tartiflette to expose GraphQL APIs over HTTP.
  • tartiflette-asgi – ASGI support for the Tartiflette GraphQL engine.
  • tartiflette – SDL-first GraphQL engine implementation for Python 3.6+ and asyncio.

Game Development

Awesome game development libraries.

  • Arcade – Arcade is a modern Python framework for crafting games with compelling graphics and sound.
  • Cocos2d – cocos2d is a framework for building 2D games, demos, and other graphical/interactive applications.
  • Harfang3D – Python framework for 3D, VR and game development.
  • Panda3D – 3D game engine developed by Disney.
  • Pygame – Pygame is a set of Python modules designed for writing games.
  • PyOgre – Python bindings for the Ogre 3D render engine, can be used for games, simulations, anything 3D.
  • PyOpenGL – Python ctypes bindings for OpenGL and its related APIs.
  • PySDL2 – A ctypes based wrapper for the SDL2 library.
  • RenPy – A Visual Novel engine.

Geolocation

Libraries for geocoding addresses and working with latitudes and longitudes.

  • django-countries – A Django app that provides a country field for models and forms.
  • GeoDjango – A world-class geographic web framework.
  • GeoIP – Python API for MaxMind GeoIP Legacy Database.
  • geojson – Python bindings and utilities for GeoJSON.
  • geopy – Python Geocoding Toolbox.

HTML Manipulation

Libraries for working with HTML and XML.

  • BeautifulSoup – Providing Pythonic idioms for iterating, searching, and modifying HTML or XML.
  • bleach – A whitelist-based HTML sanitization and text linkification library.
  • cssutils – A CSS library for Python.
  • html5lib – A standards-compliant library for parsing and serializing HTML documents and fragments.
  • lxml – A very fast, easy-to-use and versatile library for handling HTML and XML.
  • MarkupSafe – Implements an XML/HTML/XHTML Markup safe string for Python.
  • pyquery – A jQuery-like library for parsing HTML.
  • untangle – Converts XML documents to Python objects for easy access.
  • WeasyPrint – A visual rendering engine for HTML and CSS that can export to PDF.
  • xmldataset – Simple XML Parsing.
  • xmltodict – Makes working with XML feel like you are working with JSON.

HTTP Clients

Libraries for working with HTTP.

  • grequests – requests + gevent for asynchronous HTTP requests.
  • httplib2 – Comprehensive HTTP client library.
  • httpx – A next generation HTTP client for Python.
  • requests – HTTP Requests for Humans.
  • treq – Python requests like API built on top of Twisted’s HTTP client.
  • urllib3 – A HTTP library with thread-safe connection pooling, file post support, sanity friendly.

Hardware

Libraries for programming with hardware.

  • ino – Command line toolkit for working with Arduino.
  • keyboard – Hook and simulate global keyboard events on Windows and Linux.
  • mouse – Hook and simulate global mouse events on Windows and Linux.
  • Pingo – Pingo provides a uniform API to program devices like the Raspberry Pi, pcDuino, Intel Galileo, etc.
  • PyUserInput – A module for cross-platform control of the mouse and keyboard.
  • scapy – A brilliant packet manipulation library.

Image Processing

Libraries for manipulating images.

  • hmap – Image histogram remapping.
  • imgSeek – A project for searching a collection of images using visual similarity.
  • nude.py – Nudity detection.
  • pagan – Retro identicon (Avatar) generation based on input string and hash.
  • pillow – Pillow is the friendly PIL fork.
  • python-barcode – Create barcodes in Python with no extra dependencies.
  • pygram – Instagram-like image filters.
  • PyMatting – A library for alpha matting.
  • python-qrcode – A pure Python QR Code generator.
  • pywal – A tool that generates color schemes from images.
  • pyvips – A fast image processing library with low memory needs.
  • Quads – Computer art based on quadtrees.
  • scikit-image – A Python library for (scientific) image processing.
  • thumbor – A smart imaging service. It enables on-demand crop, re-sizing and flipping of images.
  • wand – Python bindings for MagickWand, C API for ImageMagick.

Implementations

Implementations of Python.

  • CLPython – Implementation of the Python programming language written in Common Lisp.
  • CPython – Default, most widely used implementation of the Python programming language written in C.
  • Cython – Optimizing Static Compiler for Python.
  • Grumpy – More compiler than interpreter as more powerful CPython2.7 replacement (alpha).
  • IronPython – Implementation of the Python programming language written in C#.
  • Jython – Implementation of Python programming language written in Java for the JVM.
  • MicroPython – A lean and efficient Python programming language implementation.
  • Numba – Python JIT compiler to LLVM aimed at scientific Python.
  • PeachPy – x86-64 assembler embedded in Python.
  • Pyjion – A JIT for Python based upon CoreCLR.
  • PyPy – A very fast and compliant implementation of the Python language.
  • Pyston – A Python implementation using JIT techniques.
  • Stackless Python – An enhanced version of the Python programming language.

Interactive Interpreter

Interactive Python interpreters (REPL).

Internationalization

Libraries for working with i18n.

  • Babel – An internationalization library for Python.
  • PyICU – A wrapper of International Components for Unicode C++ library (ICU).

Job Scheduler

Libraries for scheduling jobs.

  • Airflow – Airflow is a platform to programmatically author, schedule and monitor workflows.
  • APScheduler – A light but powerful in-process task scheduler that lets you schedule functions.
  • django-schedule – A calendaring app for Django.
  • doit – A task runner and build tool.
  • gunnery – Multipurpose task execution tool for distributed systems with web-based interface.
  • Joblib – A set of tools to provide lightweight pipelining in Python.
  • Plan – Writing crontab file in Python like a charm.
  • Prefect – A modern workflow orchestration framework that makes it easy to build, schedule and monitor robust data pipelines.
  • schedule – Python job scheduling for humans.
  • Spiff – A powerful workflow engine implemented in pure Python.
  • TaskFlow – A Python library that helps to make task execution easy, consistent and reliable.

Logging

Libraries for generating and working with logs.

  • logbook – Logging replacement for Python.
  • logging – (Python standard library) Logging facility for Python.
  • loguru – Library which aims to bring enjoyable logging in Python.
  • sentry-python – Sentry SDK for Python.
  • structlog – Structured logging made easy.
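A minimal sketch of the standard-library logging module; the logger name "app" and the format string are illustrative choices. Output goes to an in-memory stream here so it can be inspected; a real program would more typically attach a StreamHandler on stderr or a FileHandler.

```python
import io
import logging

stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))

logger = logging.getLogger("app")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep this demo's records away from the root logger

logger.debug("dropped: below the INFO threshold")
# %-style arguments are only formatted if the record is actually emitted.
logger.info("server started on port %d", 8080)

print(stream.getvalue())  # INFO:app:server started on port 8080
```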

Machine Learning

Libraries for machine learning. Also see awesome-machine-learning.

  • gym – A toolkit for developing and comparing reinforcement learning algorithms.
  • H2O – Open Source Fast Scalable Machine Learning Platform.
  • Metrics – Machine learning evaluation metrics.
  • NuPIC – Numenta Platform for Intelligent Computing.
  • scikit-learn – The most popular Python library for Machine Learning.
  • Spark ML – Apache Spark’s scalable Machine Learning library.
  • vowpal_porpoise – A lightweight Python wrapper for Vowpal Wabbit.
  • xgboost – A scalable, portable, and distributed gradient boosting library.
  • MindsDB – MindsDB is an open source AI layer for existing databases that allows you to effortlessly develop, train and deploy state-of-the-art machine learning models using standard queries.

Microsoft Windows

Python programming on Microsoft Windows.

  • Python(x,y) – Scientific-applications-oriented Python Distribution based on Qt and Spyder.
  • pythonlibs – Unofficial Windows binaries for Python extension packages.
  • PythonNet – Python Integration with the .NET Common Language Runtime (CLR).
  • PyWin32 – Python Extensions for Windows.
  • WinPython – Portable development environment for Windows 7/8.

Miscellaneous

Useful libraries or tools that don’t fit in the categories above.

  • blinker – A fast Python in-process signal/event dispatching system.
  • boltons – A set of pure-Python utilities.
  • itsdangerous – Various helpers to pass trusted data to untrusted environments.
  • magenta – A tool to generate music and art using artificial intelligence.
  • pluginbase – A simple but flexible plugin system for Python.
  • tryton – A general purpose business framework.

Natural Language Processing

Libraries for working with human languages.

  • General
    • gensim – Topic Modeling for Humans.
    • langid.py – Stand-alone language identification system.
    • nltk – A leading platform for building Python programs to work with human language data.
    • pattern – A web mining module.
    • polyglot – Natural language pipeline supporting hundreds of languages.
    • pytext – A natural language modeling framework based on PyTorch.
    • PyTorch-NLP – A toolkit enabling rapid deep learning NLP prototyping for research.
    • spacy – A library for industrial-strength natural language processing in Python and Cython.
    • Stanza – The Stanford NLP Group’s official Python library, supporting 60+ languages.
  • Chinese
    • funNLP – A collection of tools and datasets for Chinese NLP.
    • jieba – The most popular Chinese text segmentation library.
    • pkuseg-python – A toolkit for Chinese word segmentation in various domains.
    • snownlp – A library for processing Chinese text.

Network Virtualization

Tools and libraries for Virtual Networking and SDN (Software Defined Networking).

  • mininet – A popular network emulator and API written in Python.
  • napalm – Cross-vendor API to manipulate network devices.
  • pox – A platform for building Python-based SDN control applications, such as OpenFlow SDN controllers.

News Feed

Libraries for building user activity feeds.

ORM

Libraries that implement Object-Relational Mapping or data mapping techniques.

  • Relational Databases
    • Django Models – The Django ORM.
    • SQLAlchemy – The Python SQL Toolkit and Object Relational Mapper.
    • dataset – Store Python dicts in a database – works with SQLite, MySQL, and PostgreSQL.
    • orator – The Orator ORM provides a simple yet beautiful ActiveRecord implementation.
    • orm – An async ORM.
    • peewee – A small, expressive ORM.
    • pony – ORM that provides a generator-oriented interface to SQL.
    • pydal – A pure Python Database Abstraction Layer.
  • NoSQL Databases
    • hot-redis – Rich Python data types for Redis.
    • mongoengine – A Python Object-Document-Mapper for working with MongoDB.
    • PynamoDB – A Pythonic interface for Amazon DynamoDB.
    • redisco – A Python Library for Simple Models and Containers Persisted in Redis.

Package Management

Libraries for package and dependency management.

  • pip – The package installer for Python.
    • pip-tools – A set of tools to keep your pinned Python dependencies fresh.
    • PyPI
  • conda – Cross-platform, Python-agnostic binary package manager.
  • poetry – Python dependency management and packaging made easy.

Package Repositories

Local PyPI repository servers and proxies.

  • bandersnatch – PyPI mirroring tool provided by Python Packaging Authority (PyPA).
  • devpi – PyPI server and packaging/testing/release tool.
  • localshop – Local PyPI server (custom packages and auto-mirroring of PyPI).
  • warehouse – Next generation Python Package Repository (PyPI).

Penetration Testing

Frameworks and tools for penetration testing.

  • fsociety – A Penetration testing framework.
  • setoolkit – A toolkit for social engineering.
  • sqlmap – Automatic SQL injection and database takeover tool.

Permissions

Libraries that allow or deny users access to data or functionality.

  • django-guardian – Implementation of per-object permissions for Django 1.2+.
  • django-rules – A tiny but powerful app providing object-level permissions to Django, without requiring a database.

Processes

Libraries for starting and communicating with OS processes.

Recommender Systems

Libraries for building recommender systems.

  • annoy – Approximate Nearest Neighbors in C++/Python optimized for memory usage.
  • fastFM – A library for Factorization Machines.
  • implicit – A fast Python implementation of collaborative filtering for implicit datasets.
  • libffm – A library for Field-aware Factorization Machine (FFM).
  • lightfm – A Python implementation of a number of popular recommendation algorithms.
  • spotlight – Deep recommender models using PyTorch.
  • Surprise – A scikit for building and analyzing recommender systems.
  • tensorrec – A Recommendation Engine Framework in TensorFlow.

Refactoring

Refactoring tools and libraries for Python.

  • Bicycle Repair Man – Bicycle Repair Man, a refactoring tool for Python.
  • Bowler – Safe code refactoring for modern Python.
  • Rope – Rope is a Python refactoring library.

RESTful API

Libraries for building RESTful APIs.

  • Django
  • Flask
    • eve – REST API framework powered by Flask, MongoDB and good intentions.
    • flask-api – Browsable Web APIs for Flask.
    • flask-restful – Quickly build REST APIs for Flask.
  • Pyramid
    • cornice – A RESTful framework for Pyramid.
  • Framework agnostic
    • apistar – A smart Web API framework, designed for Python 3.
    • falcon – A high-performance framework for building cloud APIs and web app backends.
    • fastapi – A modern, fast, web framework for building APIs with Python 3.6+ based on standard Python type hints.
    • hug – A Python 3 framework for cleanly exposing APIs.
    • sandman2 – Automated REST APIs for existing database-driven systems.
    • sanic – A Python 3.6+ web server and web framework that’s written to go fast.
    • vibora – Fast, efficient and asynchronous Web framework inspired by Flask.

Robotics

Libraries for robotics.

  • PythonRobotics – This is a compilation of various robotics algorithms with visualizations.
  • rospy – This is a library for ROS (Robot Operating System).

RPC Servers

RPC-compatible servers.

  • RPyC (Remote Python Call) – A transparent and symmetric RPC library for Python
  • zeroRPC – zerorpc is a flexible RPC implementation based on ZeroMQ and MessagePack.

Science

Libraries for scientific computing. Also see Python-for-Scientists.

  • astropy – A community Python library for Astronomy.
  • bcbio-nextgen – Providing best-practice pipelines for fully automated high throughput sequencing analysis.
  • bccb – Collection of useful code related to biological analysis.
  • Biopython – Biopython is a set of freely available tools for biological computation.
  • cclib – A library for parsing and interpreting the results of computational chemistry packages.
  • Colour – Implementing a comprehensive number of colour theory transformations and algorithms.
  • Karate Club – Unsupervised machine learning toolbox for graph structured data.
  • NetworkX – High-productivity software for complex networks.
  • NIPY – A collection of neuroimaging toolkits.
  • NumPy – A fundamental package for scientific computing with Python.
  • ObsPy – A Python toolbox for seismology.
  • Open Babel – A chemical toolbox designed to speak the many languages of chemical data.
  • PyDy – Short for Python Dynamics, used to assist with workflow in the modeling of dynamic motion.
  • PyMC – Markov Chain Monte Carlo sampling toolkit.
  • QuTiP – Quantum Toolbox in Python.
  • RDKit – Cheminformatics and Machine Learning Software.
  • SciPy – A Python-based ecosystem of open-source software for mathematics, science, and engineering.
  • SimPy – A process-based discrete-event simulation framework.
  • statsmodels – Statistical modeling and econometrics in Python.
  • SymPy – A Python library for symbolic mathematics.
  • Zipline – A Pythonic algorithmic trading library.

Search

Libraries and software for indexing and performing search queries on data.

Serialization

Libraries for serializing complex data types.

Serverless Frameworks

Frameworks for developing serverless Python code.

  • python-lambda – A toolkit for developing and deploying Python code in AWS Lambda.
  • Zappa – A tool for deploying WSGI applications on AWS Lambda and API Gateway.

Python-based Shells

  • xonsh – A Python-powered, cross-platform, Unix-gazing shell language and command prompt.

Specific Formats Processing

Libraries for parsing and manipulating specific text formats.

  • General
    • tablib – A module for Tabular Datasets in XLS, CSV, JSON, YAML.
  • Office
    • docxtpl – Edit a docx document using a jinja2 template.
    • openpyxl – A library for reading and writing Excel 2010 xlsx/xlsm/xltx/xltm files.
    • pyexcel – Providing one API for reading, manipulating and writing csv, ods, xls, xlsx and xlsm files.
    • python-docx – Reads, queries and modifies Microsoft Word 2007/2008 docx files.
    • python-pptx – Python library for creating and updating PowerPoint (.pptx) files.
    • unoconv – Convert between any document format supported by LibreOffice/OpenOffice.
    • XlsxWriter – A Python module for creating Excel .xlsx files.
    • xlwings – A BSD-licensed library that makes it easy to call Python from Excel and vice versa.
    • xlwt / xlrd – Writing and reading data and formatting information from Excel files.
  • PDF
    • PDFMiner – A tool for extracting information from PDF documents.
    • PyPDF2 – A library capable of splitting, merging and transforming PDF pages.
    • ReportLab – Allowing rapid creation of rich PDF documents.
  • Markdown
    • Mistune – The fastest, full-featured pure-Python Markdown parser.
    • Python-Markdown – A Python implementation of John Gruber’s Markdown.
  • YAML
    • PyYAML – YAML implementations for Python.
  • CSV
    • csvkit – Utilities for converting to and working with CSV.
  • Archive
    • unp – A command line tool that can unpack archives easily.

Static Site Generator

A static site generator is software that takes some text + templates as input and produces HTML files as output.

  • lektor – An easy to use static CMS and blog engine.
  • mkdocs – Markdown friendly documentation generator.
  • makesite – Simple, lightweight, and magic-free static site/blog generator (< 130 lines).
  • nikola – A static website and blog generator.
  • pelican – Static site generator that supports Markdown and reST syntax.
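The static site generator definition above (text + templates in, HTML out) can be illustrated with a toy sketch. It uses only the standard library's `string.Template` and `html.escape`. The `render_page` helper and its template are hypothetical. Real generators such as pelican or mkdocs layer Markdown parsing, theming and writing files to disk on top of this basic step.

```python
from html import escape
from string import Template

# A hypothetical page template; $title and $body are the placeholders.
PAGE = Template(
    "<html><head><title>$title</title></head>"
    "<body><h1>$title</h1>$body</body></html>"
)

def render_page(title: str, text: str) -> str:
    """Turn plain text into an HTML page: one <p> per blank-line-separated block."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    # Escape user text so characters like & and < cannot break the markup.
    body = "".join(f"<p>{escape(p)}</p>" for p in paragraphs)
    return PAGE.substitute(title=escape(title), body=body)

# Input: some text plus the template. Output: the HTML a generator would write to a file.
html_out = render_page("Hello", "First paragraph.\n\nSecond & last.")
print(html_out)
```

A real generator would run this over every source file in a directory and write each result to a matching `.html` path.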

Tagging

Libraries for tagging items.

Task Queues

Libraries for working with task queues.

  • celery – An asynchronous task queue/job queue based on distributed message passing.
  • dramatiq – A fast and reliable background task processing library for Python 3.
  • huey – Little multi-threaded task queue.
  • mrq – A distributed worker task queue in Python using Redis & gevent.
  • rq – Simple job queues for Python.

Template Engine

Libraries and tools for templating and lexing.

  • Genshi – Python templating toolkit for generation of web-aware output.
  • Jinja2 – A modern and designer-friendly templating language.
  • Mako – Hyperfast and lightweight templating for the Python platform.

Testing

Libraries for testing codebases and generating test data.

  • Testing Frameworks
    • hypothesis – Hypothesis is an advanced Quickcheck style property based testing library.
    • nose2 – The successor to nose, based on unittest2.
    • pytest – A mature full-featured Python testing tool.
    • Robot Framework – A generic test automation framework.
    • unittest – (Python standard library) Unit testing framework.
  • Test Runners
    • green – A clean, colorful test runner.
    • mamba – The definitive testing tool for Python. Born under the banner of BDD.
    • tox – Automatically builds and tests distributions in multiple Python versions.
  • GUI / Web Testing
    • locust – Scalable user load testing tool written in Python.
    • PyAutoGUI – PyAutoGUI is a cross-platform GUI automation Python module for human beings.
    • Schemathesis – A tool for automatic property-based testing of web applications built with OpenAPI / Swagger specifications.
    • Selenium – Python bindings for Selenium WebDriver.
    • sixpack – A language-agnostic A/B Testing framework.
    • splinter – Open source tool for testing web applications.
  • Mock
    • doublex – Powerful test doubles framework for Python.
    • freezegun – Travel through time by mocking the datetime module.
    • httmock – A mocking library for requests for Python 2.6+ and 3.2+.
    • httpretty – HTTP request mock tool for Python.
    • mock – (Python standard library) A mocking and patching library.
    • mocket – A socket mock framework with gevent/asyncio/SSL support.
    • responses – A utility library for mocking out the requests Python library.
    • VCR.py – Record and replay HTTP interactions on your tests.
  • Object Factories
    • factory_boy – A test fixtures replacement for Python.
    • mixer – Another fixtures replacement. Supports Django, Flask, SQLAlchemy, Peewee, etc.
    • model_mommy – Creating random fixtures for testing in Django.
  • Code Coverage
    • coverage – Code coverage measurement.
  • Fake Data
    • fake2db – Fake database generator.
    • faker – A Python package that generates fake data.
    • mimesis – A Python library that helps you generate fake data.
    • radar – Generate random datetime / time.
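This minimal sketch combines two standard-library entries above: `unittest` and `mock`, which ships as `unittest.mock` since Python 3.3. The `fetch_greeting` function and its `client` argument are hypothetical stand-ins for code that talks to a network service.

```python
import unittest
from unittest import mock

def fetch_greeting(client, name):
    """Hypothetical code under test: asks a remote client for a greeting."""
    response = client.get(f"/greet/{name}")
    return response["message"].upper()

class FetchGreetingTest(unittest.TestCase):
    def test_uses_client_and_uppercases(self):
        # A Mock stands in for the real client, so no network call is made.
        client = mock.Mock()
        client.get.return_value = {"message": "hello, ada"}

        self.assertEqual(fetch_greeting(client, "ada"), "HELLO, ADA")
        # Verify the interaction with the dependency, not just the return value.
        client.get.assert_called_once_with("/greet/ada")
```

Saved as, say, `test_greeting.py`, it can be run with `python -m unittest test_greeting`. pytest from the list above will also collect and run `unittest.TestCase` classes.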

Text Processing

Libraries for parsing and manipulating plain texts.

  • General
    • chardet – Python 2/3 compatible character encoding detector.
    • difflib – (Python standard library) Helpers for computing deltas.
    • ftfy – Makes Unicode text less broken and more consistent automagically.
    • fuzzywuzzy – Fuzzy String Matching.
    • Levenshtein – Fast computation of Levenshtein distance and string similarity.
    • pangu.py – Paranoid text spacing.
    • pyfiglet – An implementation of figlet written in Python.
    • pypinyin – Convert Chinese hanzi (漢字) to pinyin (拼音).
    • textdistance – Compute distance between sequences with 30+ algorithms.
    • unidecode – ASCII transliterations of Unicode text.
  • Slugify
    • awesome-slugify – A Python slugify library that can preserve unicode.
    • python-slugify – A Python slugify library that translates unicode to ASCII.
    • unicode-slugify – A slugifier that generates unicode slugs with Django as a dependency.
  • Unique identifiers
    • hashids – Implementation of hashids in Python.
    • shortuuid – A generator library for concise, unambiguous and URL-safe UUIDs.
  • Parser
    • ply – Implementation of lex and yacc parsing tools for Python.
    • pygments – A generic syntax highlighter.
    • pyparsing – A general purpose framework for generating parsers.
    • python-nameparser – Parsing human names into their individual components.
    • python-phonenumbers – Parsing, formatting, storing and validating international phone numbers.
    • python-user-agents – Browser user agent parser.
    • sqlparse – A non-validating SQL parser.
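Here is a small sketch of `difflib`, the standard-library entry in the General list above. It produces a line-based unified diff plus a character-level similarity ratio. The two config snippets and the `a/app.cfg` / `b/app.cfg` labels are made-up inputs.

```python
import difflib

old = ["host = localhost\n", "port = 8000\n", "debug = true\n"]
new = ["host = localhost\n", "port = 8080\n", "debug = true\n"]

# unified_diff yields lines in the familiar `diff -u` format.
diff = list(difflib.unified_diff(old, new, fromfile="a/app.cfg", tofile="b/app.cfg"))
print("".join(diff), end="")
# --- a/app.cfg
# +++ b/app.cfg
# @@ -1,3 +1,3 @@
#  host = localhost
# -port = 8000
# +port = 8080
#  debug = true

# SequenceMatcher.ratio() returns 2*M/T: M matched chars, T total chars in both strings.
score = difflib.SequenceMatcher(None, "port = 8000", "port = 8080").ratio()
print(round(score, 3))  # 0.909 (10 of 11 characters match: 20/22)
```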

Third-party APIs

Libraries for accessing third-party service APIs. Also see List of Python API Wrappers and Libraries.

URL Manipulation

Libraries for parsing URLs.

  • furl – A small Python library that makes parsing and manipulating URLs easy.
  • purl – A simple, immutable URL class with a clean API for interrogation and manipulation.
  • pyshorteners – A pure Python URL shortening lib.
  • webargs – A friendly library for parsing HTTP request arguments with built-in support for popular web frameworks.

Video

Libraries for manipulating video and GIFs.

  • moviepy – A module for script-based movie editing with many formats, including animated GIFs.
  • scikit-video – Video processing routines for SciPy.
  • vidgear – The most powerful multi-threaded video processing framework.

Web Asset Management

Tools for managing, compressing and minifying website assets.

  • django-compressor – Compresses linked and inline JavaScript or CSS into a single cached file.
  • django-pipeline – An asset packaging library for Django.
  • django-storages – A collection of custom storage back ends for Django.
  • fanstatic – Packages, optimizes, and serves static file dependencies as Python packages.
  • fileconveyor – A daemon to detect and sync files to CDNs, S3 and FTP.
  • flask-assets – Helps you integrate webassets into your Flask app.
  • webassets – Bundles, optimizes, and manages unique cache-busting URLs for static resources.

Web Content Extracting

Libraries for extracting web contents.

  • html2text – Convert HTML to Markdown-formatted text.
  • lassie – Web Content Retrieval for Humans.
  • micawber – A small library for extracting rich content from URLs.
  • newspaper – News extraction, article extraction and content curation in Python.
  • python-readability – Fast Python port of arc90’s readability tool.
  • requests-html – Pythonic HTML Parsing for Humans.
  • sumy – A module for automatic summarization of text documents and HTML pages.
  • textract – Extract text from any document, Word, PowerPoint, PDFs, etc.
  • toapi – Every web site provides APIs.

Web Crawling

Libraries to automate web scraping.

  • cola – A distributed crawling framework.
  • feedparser – Universal feed parser.
  • grab – Site scraping framework.
  • MechanicalSoup – A Python library for automating interaction with websites.
  • portia – Visual scraping for Scrapy.
  • pyspider – A powerful spider system.
  • robobrowser – A simple, Pythonic library for browsing the web without a standalone web browser.
  • scrapy – A fast high-level screen scraping and web crawling framework.

Web Frameworks

Traditional full-stack web frameworks. Also see RESTful API.

WebSocket

Libraries for working with WebSocket.

  • autobahn-python – WebSocket & WAMP for Python on Twisted and asyncio.
  • channels – Developer-friendly asynchrony for Django.
  • websockets – A library for building WebSocket servers and clients with a focus on correctness and simplicity.

WSGI Servers

WSGI-compatible web servers.

  • bjoern – Asynchronous, very fast and written in C.
  • gunicorn – Pre-forked, ported from Ruby’s Unicorn project.
  • uWSGI – A project that aims to develop a full stack for building hosting services, written in C.
  • waitress – Multi-threaded, powers Pyramid.
  • werkzeug – A WSGI utility library for Python that powers Flask and can easily be embedded into your own projects.

Resources

Where to discover learning resources or new Python libraries.

Books

Websites

Newsletters

Podcasts

Contributing

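Your contributions are always welcome! Please take a look at the contribution guidelines first.

I will keep some pull requests open if I'm not sure whether those libraries are awesome; you can vote for them by adding :+1: to them. Pull requests will be merged when their votes reach 20.


If you have any questions about this opinionated list, do not hesitate to contact me @VintaChen on Twitter or open an issue on GitHub.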