分类目录归档:知识问答

如何从集合中检索元素而不删除它?

问题:如何从集合中检索元素而不删除它?

假设以下内容:

>>> s = set([1, 2, 3])

如何获得的值(任意值)出来s而不做s.pop()?我想将该项目保留在集合中,直到我确定可以删除它为止-只有在异步调用另一个主机后才能确定。

快速又肮脏:

>>> elem = s.pop()
>>> s.add(elem)

但是您知道更好的方法吗?理想的情况是恒定时间。

Suppose the following:

>>> s = set([1, 2, 3])

How do I get a value (any value) out of s without doing s.pop()? I want to leave the item in the set until I am sure I can remove it – something I can only be sure of after an asynchronous call to another host.

Quick and dirty:

>>> elem = s.pop()
>>> s.add(elem)

But do you know of a better way? Ideally in constant time.


回答 0

不需要复制整个集合的两个选项:

for e in s:
    break
# e is now an element from s

要么…

e = next(iter(s))

但是总的来说,集合不支持索引或切片。

Two options that don’t require copying the whole set:

for e in s:
    break
# e is now an element from s

Or…

e = next(iter(s))

But in general, sets don’t support indexing or slicing.


回答 1

最少的代码是:

>>> s = set([1, 2, 3])
>>> list(s)[0]
1

显然,这将创建一个新列表,其中包含该集合的每个成员,因此如果您的集合很大,就不会很大。

Least code would be:

>>> s = set([1, 2, 3])
>>> list(s)[0]
1

Obviously this would create a new list which contains each member of the set, so not great if your set is very large.


回答 2

我想知道这些功能如何在不同的集合上执行,所以我做了一个基准测试:

from random import sample

def ForLoop(s):
    for e in s:
        break
    return e

def IterNext(s):
    return next(iter(s))

def ListIndex(s):
    return list(s)[0]

def PopAdd(s):
    e = s.pop()
    s.add(e)
    return e

def RandomSample(s):
    return sample(s, 1)

def SetUnpacking(s):
    e, *_ = s
    return e

from simple_benchmark import benchmark

b = benchmark([ForLoop, IterNext, ListIndex, PopAdd, RandomSample, SetUnpacking],
              {2**i: set(range(2**i)) for i in range(1, 20)},
              argument_name='set size',
              function_aliases={first: 'First'})

b.plot()

在此处输入图片说明

该图清楚地表明,一些方法(RandomSampleSetUnpackingListIndex)依赖于集的大小,并应在一般情况下可避免(至少在性能可能是重要的)。正如其他答案已经显示的那样,最快的方法是ForLoop

但是,只要使用恒定时间方法之一,性能差异就可以忽略不计。


iteration_utilities(免责声明:我是作者)包含此用例的便捷功能first

>>> from iteration_utilities import first
>>> first({1,2,3,4})
1

我也将其包含在上述基准中。它可以与其他两个“快速”解决方案竞争,但是两者之间的差异并不大。

I wondered how the functions will perform for different sets, so I did a benchmark:

from random import sample

def ForLoop(s):
    for e in s:
        break
    return e

def IterNext(s):
    return next(iter(s))

def ListIndex(s):
    return list(s)[0]

def PopAdd(s):
    e = s.pop()
    s.add(e)
    return e

def RandomSample(s):
    return sample(s, 1)

def SetUnpacking(s):
    e, *_ = s
    return e

from simple_benchmark import benchmark

b = benchmark([ForLoop, IterNext, ListIndex, PopAdd, RandomSample, SetUnpacking],
              {2**i: set(range(2**i)) for i in range(1, 20)},
              argument_name='set size',
              function_aliases={first: 'First'})

b.plot()

enter image description here

This plot clearly shows that some approaches (RandomSample, SetUnpacking and ListIndex) depend on the size of the set and should be avoided in the general case (at least if performance might be important). As already shown by the other answers the fastest way is ForLoop.

However as long as one of the constant time approaches is used the performance difference will be negligible.


iteration_utilities (Disclaimer: I’m the author) contains a convenience function for this use-case: first:

>>> from iteration_utilities import first
>>> first({1,2,3,4})
1

I also included it in the benchmark above. It can compete with the other two “fast” solutions but the difference isn’t much either way.


回答 3

tl; dr

for first_item in muh_set: break仍然是Python 3.x中的最佳方法。诅咒你,圭多。

do这样做

欢迎使用从wr推断出来的另一组Python 3.x计时出色的特定于Python 2.x的响应。与AChampion同样有用的特定于Python 3.x的响应不同,下面的时间安排建议了上面提到的时间异常解决方案,包括:

欢乐代码段

开启,收听,定时:

from timeit import Timer

stats = [
    "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
    "for i in range(1000): next(iter(s))",
    "for i in range(1000): s.add(s.pop())",
    "for i in range(1000): list(s)[0]",
    "for i in range(1000): random.sample(s, 1)",
]

for stat in stats:
    t = Timer(stat, setup="import random\ns=set(range(100))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

快速过时的计时

看哪!按最快到最慢的片段排序:

$ ./test_get.py
Time for for i in range(1000): 
    for x in s: 
        break:   0.249871
Time for for i in range(1000): next(iter(s)):    0.526266
Time for for i in range(1000): s.add(s.pop()):   0.658832
Time for for i in range(1000): list(s)[0]:   4.117106
Time for for i in range(1000): random.sample(s, 1):  21.851104

全家人的面部植物

毫不奇怪,手动迭代的速度至少是下一个最快解决方案的两倍。尽管与Bad Old Python 2.x相比(以前的手动迭代速度至少快四倍),差距有所减小,但PPE 20狂热者对我而言最冗长的解决方案是最好的解决方案感到失望。至少将一个集转换为一个列表以仅提取该集的第一个元素是预期的那样可怕。感谢Guido,愿他的光芒继续指导我们。

令人惊讶的是,基于RNG的解决方案绝对可怕。列表转换不好,但random 确实会带来糟糕的结果。对于随机数上帝来说如此之多。

我只是希望那些无定形set.get_first()的人已经为我们准备了一种方法。如果您正在阅读本文,他们:“请。做点什么。”

tl;dr

for first_item in muh_set: break remains the optimal approach in Python 3.x. Curse you, Guido.

y u do this

Welcome to yet another set of Python 3.x timings, extrapolated from wr.‘s excellent Python 2.x-specific response. Unlike AChampion‘s equally helpful Python 3.x-specific response, the timings below also time outlier solutions suggested above – including:

Code Snippets for Great Joy

Turn on, tune in, time it:

from timeit import Timer

stats = [
    "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
    "for i in range(1000): next(iter(s))",
    "for i in range(1000): s.add(s.pop())",
    "for i in range(1000): list(s)[0]",
    "for i in range(1000): random.sample(s, 1)",
]

for stat in stats:
    t = Timer(stat, setup="import random\ns=set(range(100))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

Quickly Obsoleted Timeless Timings

Behold! Ordered by fastest to slowest snippets:

$ ./test_get.py
Time for for i in range(1000): 
    for x in s: 
        break:   0.249871
Time for for i in range(1000): next(iter(s)):    0.526266
Time for for i in range(1000): s.add(s.pop()):   0.658832
Time for for i in range(1000): list(s)[0]:   4.117106
Time for for i in range(1000): random.sample(s, 1):  21.851104

Faceplants for the Whole Family

Unsurprisingly, manual iteration remains at least twice as fast as the next fastest solution. Although the gap has decreased from the Bad Old Python 2.x days (in which manual iteration was at least four times as fast), it disappoints the PEP 20 zealot in me that the most verbose solution is the best. At least converting a set into a list just to extract the first element of the set is as horrible as expected. Thank Guido, may his light continue to guide us.

Surprisingly, the RNG-based solution is absolutely horrible. List conversion is bad, but random really takes the awful-sauce cake. So much for the Random Number God.

I just wish the amorphous They would PEP up a set.get_first() method for us already. If you’re reading this, They: “Please. Do something.”


回答 4

为了提供不同方法背后的一些时序图,请考虑以下代码。 get()是我自定义添加到Python的setobject.c中,只是一个pop()而没有删除该元素。

from timeit import *

stats = ["for i in xrange(1000): iter(s).next()   ",
         "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in xrange(1000): s.add(s.pop())   ",
         "for i in xrange(1000): s.get()          "]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100))")
    try:
        print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
    except:
        t.print_exc()

输出为:

$ ./test_get.py
Time for for i in xrange(1000): iter(s).next()   :       0.433080
Time for for i in xrange(1000):
        for x in s:
                break:   0.148695
Time for for i in xrange(1000): s.add(s.pop())   :       0.317418
Time for for i in xrange(1000): s.get()          :       0.146673

这意味着for / break解决方案是最快的(有时比自定义get()解决方案更快)。

To provide some timing figures behind the different approaches, consider the following code. The get() is my custom addition to Python’s setobject.c, being just a pop() without removing the element.

from timeit import *

stats = ["for i in xrange(1000): iter(s).next()   ",
         "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in xrange(1000): s.add(s.pop())   ",
         "for i in xrange(1000): s.get()          "]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100))")
    try:
        print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
    except:
        t.print_exc()

The output is:

$ ./test_get.py
Time for for i in xrange(1000): iter(s).next()   :       0.433080
Time for for i in xrange(1000):
        for x in s:
                break:   0.148695
Time for for i in xrange(1000): s.add(s.pop())   :       0.317418
Time for for i in xrange(1000): s.get()          :       0.146673

This means that the for/break solution is the fastest (sometimes faster than the custom get() solution).


回答 5

由于您需要一个随机元素,因此也可以使用:

>>> import random
>>> s = set([1,2,3])
>>> random.sample(s, 1)
[2]

该文档似乎并未提及的性能random.sample。从包含大量列表和大量集合的非常快速的实证检验来看,列表似乎是恒定时间,但集合却并非如此。同样,对集合的迭代也不是随机的。顺序不确定,但可以预测:

>>> list(set(range(10))) == range(10)
True 

如果随机性很重要,并且您需要在恒定时间内(大量集合)使用一堆元素,那么我会先使用random.sample并转换为列表:

>>> lst = list(s) # once, O(len(s))?
...
>>> e = random.sample(lst, 1)[0] # constant time

Since you want a random element, this will also work:

>>> import random
>>> s = set([1,2,3])
>>> random.sample(s, 1)
[2]

The documentation doesn’t seem to mention performance of random.sample. From a really quick empirical test with a huge list and a huge set, it seems to be constant time for a list but not for the set. Also, iteration over a set isn’t random; the order is undefined but predictable:

>>> list(set(range(10))) == range(10)
True 

If randomness is important and you need a bunch of elements in constant time (large sets), I’d use random.sample and convert to a list first:

>>> lst = list(s) # once, O(len(s))?
...
>>> e = random.sample(lst, 1)[0] # constant time

回答 6

似乎是最紧凑的(6个符号),但是获得集合元素的方法很慢(由PEP 3132实现):

e,*_=s

在Python 3.5+中,您还可以使用以下7符号表达式(感谢PEP 448):

[*s][0]

这两种选择在我的机器上都比for循环方法慢大约1000倍。

Seemingly the most compact (6 symbols) though very slow way to get a set element (made possible by PEP 3132):

e,*_=s

With Python 3.5+ you can also use this 7-symbol expression (thanks to PEP 448):

[*s][0]

Both options are roughly 1000 times slower on my machine than the for-loop method.


回答 7

我使用编写的实用程序函数。它的名称有点误导,因为它暗示它可能是随机物品或类似物品。

def anyitem(iterable):
    try:
        return iter(iterable).next()
    except StopIteration:
        return None

I use a utility function I wrote. Its name is somewhat misleading because it kind of implies it might be a random item or something like that.

def anyitem(iterable):
    try:
        return iter(iterable).next()
    except StopIteration:
        return None

回答 8

在@wr之后。帖子,我得到类似的结果(对于Python3.5)

from timeit import *

stats = ["for i in range(1000): next(iter(s))",
         "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in range(1000): s.add(s.pop())"]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100000))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

输出:

Time for for i in range(1000): next(iter(s)):    0.205888
Time for for i in range(1000): 
    for x in s: 
        break:                                   0.083397
Time for for i in range(1000): s.add(s.pop()):   0.226570

但是,在更改基础集(例如,调用remove())时,对于可迭代的示例(foriter)来说效果很差:

from timeit import *

stats = ["while s:\n\ta = next(iter(s))\n\ts.remove(a)",
         "while s:\n\tfor x in s: break\n\ts.remove(x)",
         "while s:\n\tx=s.pop()\n\ts.add(x)\n\ts.remove(x)"]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100000))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

结果是:

Time for while s:
    a = next(iter(s))
    s.remove(a):             2.938494
Time for while s:
    for x in s: break
    s.remove(x):             2.728367
Time for while s:
    x=s.pop()
    s.add(x)
    s.remove(x):             0.030272

Following @wr. post, I get similar results (for Python3.5)

from timeit import *

stats = ["for i in range(1000): next(iter(s))",
         "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in range(1000): s.add(s.pop())"]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100000))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

Output:

Time for for i in range(1000): next(iter(s)):    0.205888
Time for for i in range(1000): 
    for x in s: 
        break:                                   0.083397
Time for for i in range(1000): s.add(s.pop()):   0.226570

However, when changing the underlying set (e.g. call to remove()) things go badly for the iterable examples (for, iter):

from timeit import *

stats = ["while s:\n\ta = next(iter(s))\n\ts.remove(a)",
         "while s:\n\tfor x in s: break\n\ts.remove(x)",
         "while s:\n\tx=s.pop()\n\ts.add(x)\n\ts.remove(x)"]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100000))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

Results in:

Time for while s:
    a = next(iter(s))
    s.remove(a):             2.938494
Time for while s:
    for x in s: break
    s.remove(x):             2.728367
Time for while s:
    x=s.pop()
    s.add(x)
    s.remove(x):             0.030272

回答 9

我通常对小型馆藏所做的是创建这样的解析器/转换器方法

def convertSetToList(setName):
return list(setName)

然后我可以使用新列表并按索引号访问

userFields = convertSetToList(user)
name = request.json[userFields[0]]

作为列表,您将拥有可能需要使用的所有其他方法

What I usually do for small collections is to create kind of parser/converter method like this

def convertSetToList(setName):
return list(setName)

Then I can use the new list and access by index number

userFields = convertSetToList(user)
name = request.json[userFields[0]]

As a list you will have all the other methods that you may need to work with


回答 10

怎么s.copy().pop()样 我尚未计时,但它应该可以正常工作,而且很简单。但是,它适用于小型集,因为它可以复制整个集。

How about s.copy().pop()? I haven’t timed it, but it should work and it’s simple. It works best for small sets however, as it copies the whole set.


回答 11

另一种选择是将字典与不需要的值一起使用。例如,


poor_man_set = {}
poor_man_set[1] = None
poor_man_set[2] = None
poor_man_set[3] = None
...

您可以将键视为一组,但它们只是一个数组:


keys = poor_man_set.keys()
print "Some key = %s" % keys[0]

这种选择的副作用是您的代码将与较旧set的Python 早期版本向后兼容。这也许不是最佳答案,但这是另一种选择。

编辑:您甚至可以执行以下操作来隐藏使用字典而不是数组或集合的事实:


poor_man_set = {}
poor_man_set[1] = None
poor_man_set[2] = None
poor_man_set[3] = None
poor_man_set = poor_man_set.keys()

Another option is to use a dictionary with values you don’t care about. E.g.,


poor_man_set = {}
poor_man_set[1] = None
poor_man_set[2] = None
poor_man_set[3] = None
...

You can treat the keys as a set except that they’re just an array:


keys = poor_man_set.keys()
print "Some key = %s" % keys[0]

A side effect of this choice is that your code will be backwards compatible with older, pre-set versions of Python. It’s maybe not the best answer but it’s another option.

Edit: You can even do something like this to hide the fact that you used a dict instead of an array or set:


poor_man_set = {}
poor_man_set[1] = None
poor_man_set[2] = None
poor_man_set[3] = None
poor_man_set = poor_man_set.keys()

查找Python对象具有的方法

问题:查找Python对象具有的方法

给定任何种类的Python对象,是否有一种简单的方法来获取该对象具有的所有方法的列表?

要么,

如果这不可能,那么除了简单地检查调用该方法时是否发生错误之外,是否至少有一种简单的方法来检查它是否具有特定的方法?

Given a Python object of any kind, is there an easy way to get the list of all methods that this object has?

Or,

if this is not possible, is there at least an easy way to check if it has a particular method other than simply checking if an error occurs when the method is called?


回答 0

对于许多对象,您可以使用以下代码,将“ object”替换为您感兴趣的对象:

object_methods = [method_name for method_name in dir(object)
                  if callable(getattr(object, method_name))]

我在diveintopython.net上发现了它(现已存档)。希望可以提供更多详细信息!

如果得到AttributeError,则可以改用

getattr(不能容忍熊猫风格的python3.6抽象虚拟子类。此代码与上面的代码相同,并且忽略异常。

import pandas as pd 
df = pd.DataFrame([[10, 20, 30], [100, 200, 300]],  
                  columns=['foo', 'bar', 'baz']) 
def get_methods(object, spacing=20): 
  methodList = [] 
  for method_name in dir(object): 
    try: 
        if callable(getattr(object, method_name)): 
            methodList.append(str(method_name)) 
    except: 
        methodList.append(str(method_name)) 
  processFunc = (lambda s: ' '.join(s.split())) or (lambda s: s) 
  for method in methodList: 
    try: 
        print(str(method.ljust(spacing)) + ' ' + 
              processFunc(str(getattr(object, method).__doc__)[0:90])) 
    except: 
        print(method.ljust(spacing) + ' ' + ' getattr() failed') 


get_methods(df['foo']) 

For many objects, you can use this code, replacing ‘object’ with the object you’re interested in:

object_methods = [method_name for method_name in dir(object)
                  if callable(getattr(object, method_name))]

I discovered it at diveintopython.net (Now archived). Hopefully, that should provide some further detail!

If you get an AttributeError, you can use this instead:

getattr( is intolerant of pandas style python3.6 abstract virtual sub-classes. This code does the same as above and ignores exceptions.

import pandas as pd 
df = pd.DataFrame([[10, 20, 30], [100, 200, 300]],  
                  columns=['foo', 'bar', 'baz']) 
def get_methods(object, spacing=20): 
  methodList = [] 
  for method_name in dir(object): 
    try: 
        if callable(getattr(object, method_name)): 
            methodList.append(str(method_name)) 
    except: 
        methodList.append(str(method_name)) 
  processFunc = (lambda s: ' '.join(s.split())) or (lambda s: s) 
  for method in methodList: 
    try: 
        print(str(method.ljust(spacing)) + ' ' + 
              processFunc(str(getattr(object, method).__doc__)[0:90])) 
    except: 
        print(method.ljust(spacing) + ' ' + ' getattr() failed') 


get_methods(df['foo']) 

回答 1

您可以使用内置dir()函数来获取模块具有的所有属性的列表。在命令行上尝试此操作以查看其工作原理。

>>> import moduleName
>>> dir(moduleName)

另外,您可以使用 hasattr(module_name, "attr_name")函数来查找模块是否具有特定属性。

有关更多信息,请参见Python自省指南

You can use the built in dir() function to get a list of all the attributes a module has. Try this at the command line to see how it works.

>>> import moduleName
>>> dir(moduleName)

Also, you can use the hasattr(module_name, "attr_name") function to find out if a module has a specific attribute.

See the Guide to Python introspection for more information.


回答 2

最简单的方法是使用dir(objectname)。它将显示该对象可用的所有方法。很酷的把戏。

The simplest method is to use dir(objectname). It will display all the methods available for that object. Cool trick.


回答 3

要检查它是否具有特定方法:

hasattr(object,"method")

To check if it has a particular method:

hasattr(object,"method")

回答 4

我相信您想要的是这样的:

来自对象的属性列表

以我的拙见,内置功能dir()可以为您完成这项工作。取自help(dir)Python Shell上的输出:

目录(…)

dir([object]) -> list of strings

如果不带参数调用,则返回当前作用域中的名称。

否则,返回按字母顺序排列的名称列表,该列表包含给定对象的(某些)属性以及从中可以访问的属性。

如果对象提供了一个名为的方法__dir__,则将使用该方法;否则,将使用默认的dir()逻辑并返回:

  • 对于模块对象:模块的属性。
  • 对于一个类对象:其属性,以及递归其基类的属性。
  • 对于任何其他对象:其属性,其类的属性,以及递归地为其类的基类的属性。

例如:

$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> a = "I am a string"
>>>
>>> type(a)
<class 'str'>
>>>
>>> dir(a)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__',
'__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__',
'__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__',
'__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'_formatter_field_name_split', '_formatter_parser', 'capitalize',
'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find',
'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace',
'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition',
'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip',
'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title',
'translate', 'upper', 'zfill']

在检查您的问题时,我决定展示我的思路,并以更好的格式输出 dir()

dir_attributes.py(Python 2.7.6)

#!/usr/bin/python
""" Demonstrates the usage of dir(), with better output. """

__author__ = "ivanleoncz"

obj = "I am a string."
count = 0

print "\nObject Data: %s" % obj
print "Object Type: %s\n" % type(obj)

for method in dir(obj):
    # the comma at the end of the print, makes it printing 
    # in the same line, 4 times (count)
    print "| {0: <20}".format(method),
    count += 1
    if count == 4:
        count = 0
        print

dir_attributes.py(Python 3.4.3)

#!/usr/bin/python3
""" Demonstrates the usage of dir(), with better output. """

__author__ = "ivanleoncz"

obj = "I am a string."
count = 0

print("\nObject Data: ", obj)
print("Object Type: ", type(obj),"\n")

for method in dir(obj):
    # the end=" " at the end of the print statement, 
    # makes it printing in the same line, 4 times (count)
    print("|    {:20}".format(method), end=" ")
    count += 1
    if count == 4:
        count = 0
        print("")

希望我有所贡献:)。

I believe that what you want is something like this:

a list of attributes from an object

In my humble opinion, the built-in function dir() can do this job for you. Taken from help(dir) output on your Python Shell:

dir(…)

dir([object]) -> list of strings

If called without an argument, return the names in the current scope.

Else, return an alphabetized list of names comprising (some of) the attributes of the given object, and of attributes reachable from it.

If the object supplies a method named __dir__, it will be used; otherwise the default dir() logic is used and returns:

  • for a module object: the module’s attributes.
  • for a class object: its attributes, and recursively the attributes of its bases.
  • for any other object: its attributes, its class’s attributes, and recursively the attributes of its class’s base classes.

For example:

$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> a = "I am a string"
>>>
>>> type(a)
<class 'str'>
>>>
>>> dir(a)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__',
'__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__',
'__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__',
'__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'_formatter_field_name_split', '_formatter_parser', 'capitalize',
'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find',
'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace',
'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition',
'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip',
'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title',
'translate', 'upper', 'zfill']

As I was checking your issue, I decided to demonstrate my train of thought, with a better formatting of the output of dir().

dir_attributes.py (Python 2.7.6)

#!/usr/bin/python
""" Demonstrates the usage of dir(), with better output. """

__author__ = "ivanleoncz"

obj = "I am a string."
count = 0

print "\nObject Data: %s" % obj
print "Object Type: %s\n" % type(obj)

for method in dir(obj):
    # the comma at the end of the print, makes it printing 
    # in the same line, 4 times (count)
    print "| {0: <20}".format(method),
    count += 1
    if count == 4:
        count = 0
        print

dir_attributes.py (Python 3.4.3)

#!/usr/bin/python3
""" Demonstrates the usage of dir(), with better output. """

__author__ = "ivanleoncz"

obj = "I am a string."
count = 0

print("\nObject Data: ", obj)
print("Object Type: ", type(obj),"\n")

for method in dir(obj):
    # the end=" " at the end of the print statement, 
    # makes it printing in the same line, 4 times (count)
    print("|    {:20}".format(method), end=" ")
    count += 1
    if count == 4:
        count = 0
        print("")

Hope that I have contributed :).


回答 5

除了更直接的答案外,如果我不提及iPython,我也会不知所措。点击“选项卡”以查看具有自动补全功能的可用方法。

找到方法后,请尝试:

help(object.method) 

查看pydocs,方法签名等。

啊… REPL

On top of the more direct answers, I’d be remiss if I didn’t mention iPython. Hit ‘tab’ to see the available methods, with autocompletion.

And once you’ve found a method, try:

help(object.method) 

to see the pydocs, method signature, etc.

Ahh… REPL.


回答 6

如果您特别想要方法,则应使用inspect.ismethod

对于方法名称:

import inspect
method_names = [attr for attr in dir(self) if inspect.ismethod(getattr(self, attr))]

对于方法本身:

import inspect
methods = [member for member in [getattr(self, attr) for attr in dir(self)] if inspect.ismethod(member)]

有时inspect.isroutine也可能有用(对于内置,C扩展,不带“绑定”编译器指令的Cython)。

If you specifically want methods, you should use inspect.ismethod.

For method names:

import inspect
method_names = [attr for attr in dir(self) if inspect.ismethod(getattr(self, attr))]

For the methods themselves:

import inspect
methods = [member for member in [getattr(self, attr) for attr in dir(self)] if inspect.ismethod(member)]

Sometimes inspect.isroutine can be useful too (for built-ins, C extensions, Cython without the “binding” compiler directive).


回答 7

打开bash shell(在Ubuntu上为ctrl + alt + T)。在其中启动python3 shell。创建对象以观察方法。只需在其后添加一个点,然后按两次“ tab”,您将看到类似的内容:

 user@note:~$ python3
 Python 3.4.3 (default, Nov 17 2016, 01:08:31) 
 [GCC 4.8.4] on linux
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import readline
 >>> readline.parse_and_bind("tab: complete")
 >>> s = "Any object. Now it's a string"
 >>> s. # here tab should be pressed twice
 s.__add__(           s.__rmod__(          s.istitle(
 s.__class__(         s.__rmul__(          s.isupper(
 s.__contains__(      s.__setattr__(       s.join(
 s.__delattr__(       s.__sizeof__(        s.ljust(
 s.__dir__(           s.__str__(           s.lower(
 s.__doc__            s.__subclasshook__(  s.lstrip(
 s.__eq__(            s.capitalize(        s.maketrans(
 s.__format__(        s.casefold(          s.partition(
 s.__ge__(            s.center(            s.replace(
 s.__getattribute__(  s.count(             s.rfind(
 s.__getitem__(       s.encode(            s.rindex(
 s.__getnewargs__(    s.endswith(          s.rjust(
 s.__gt__(            s.expandtabs(        s.rpartition(
 s.__hash__(          s.find(              s.rsplit(
 s.__init__(          s.format(            s.rstrip(
 s.__iter__(          s.format_map(        s.split(
 s.__le__(            s.index(             s.splitlines(
 s.__len__(           s.isalnum(           s.startswith(
 s.__lt__(            s.isalpha(           s.strip(
 s.__mod__(           s.isdecimal(         s.swapcase(
 s.__mul__(           s.isdigit(           s.title(
 s.__ne__(            s.isidentifier(      s.translate(
 s.__new__(           s.islower(           s.upper(
 s.__reduce__(        s.isnumeric(         s.zfill(
 s.__reduce_ex__(     s.isprintable(       
 s.__repr__(          s.isspace(           

Open bash shell (ctrl+alt+T on Ubuntu). Start python3 shell in it. Create object to observe methods of. Just add a dot after it and press twice “tab” and you’ll see something like that:

 user@note:~$ python3
 Python 3.4.3 (default, Nov 17 2016, 01:08:31) 
 [GCC 4.8.4] on linux
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import readline
 >>> readline.parse_and_bind("tab: complete")
 >>> s = "Any object. Now it's a string"
 >>> s. # here tab should be pressed twice
 s.__add__(           s.__rmod__(          s.istitle(
 s.__class__(         s.__rmul__(          s.isupper(
 s.__contains__(      s.__setattr__(       s.join(
 s.__delattr__(       s.__sizeof__(        s.ljust(
 s.__dir__(           s.__str__(           s.lower(
 s.__doc__            s.__subclasshook__(  s.lstrip(
 s.__eq__(            s.capitalize(        s.maketrans(
 s.__format__(        s.casefold(          s.partition(
 s.__ge__(            s.center(            s.replace(
 s.__getattribute__(  s.count(             s.rfind(
 s.__getitem__(       s.encode(            s.rindex(
 s.__getnewargs__(    s.endswith(          s.rjust(
 s.__gt__(            s.expandtabs(        s.rpartition(
 s.__hash__(          s.find(              s.rsplit(
 s.__init__(          s.format(            s.rstrip(
 s.__iter__(          s.format_map(        s.split(
 s.__le__(            s.index(             s.splitlines(
 s.__len__(           s.isalnum(           s.startswith(
 s.__lt__(            s.isalpha(           s.strip(
 s.__mod__(           s.isdecimal(         s.swapcase(
 s.__mul__(           s.isdigit(           s.title(
 s.__ne__(            s.isidentifier(      s.translate(
 s.__new__(           s.islower(           s.upper(
 s.__reduce__(        s.isnumeric(         s.zfill(
 s.__reduce_ex__(     s.isprintable(       
 s.__repr__(          s.isspace(           

回答 8

此处指出的所有方法的问题在于,您不能确定某个方法不存在。

在Python可以拦截点主叫通__getattr____getattribute__,从而可以在“运行时”创建方法

范例:

class MoreMethod(object):
    def some_method(self, x):
        return x
    def __getattr__(self, *args):
        return lambda x: x*2

如果执行它,则可以调用对象字典中不存在的方法…

>>> o = MoreMethod()
>>> o.some_method(5)
5
>>> dir(o)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattr__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'some_method']
>>> o.i_dont_care_of_the_name(5)
10

这就是为什么您在Python中使用Easier而不是权限范式来请求宽恕

The problem with all methods indicated here is that you CAN’T be sure that a method doesn’t exist.

In Python you can intercept the dot calling thru __getattr__ and __getattribute__, making it possible to create method “at runtime”

Exemple:

class MoreMethod(object):
    def some_method(self, x):
        return x
    def __getattr__(self, *args):
        return lambda x: x*2

If you execute it, you can call method non existing in the object dictionary…

>>> o = MoreMethod()
>>> o.some_method(5)
5
>>> dir(o)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattr__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'some_method']
>>> o.i_dont_care_of_the_name(5)
10

And it’s why you use the Easier to ask for forgiveness than permission paradigms in Python.


回答 9

获取任何对象的方法列表的最简单方法是使用help()命令。

%help(object)

它将列出与该对象关联的所有可用/重要方法。

例如:

help(str)

The simplest way to get list of methods of any object is to use help() command.

%help(object)

It will list out all the available/important methods associated with that object.

For example:

help(str)

回答 10

import moduleName
for x in dir(moduleName):
print(x)

这应该工作:)

import moduleName
for x in dir(moduleName):
print(x)

This should work :)


回答 11

可以创建一个getAttrs函数,该函数将返回对象的可调用属性名称

def getAttrs(object):
  return filter(lambda m: callable(getattr(object, m)), dir(object))

print getAttrs('Foo bar'.split(' '))

那会回来

['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
 '__delslice__', '__eq__', '__format__', '__ge__', '__getattribute__', 
 '__getitem__', '__getslice__', '__gt__', '__iadd__', '__imul__', '__init__', 
 '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', 
 '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', 
 '__setattr__', '__setitem__', '__setslice__', '__sizeof__', '__str__', 
 '__subclasshook__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 
 'remove', 'reverse', 'sort']

One can create a getAttrs function that will return an object’s callable property names

def getAttrs(object):
  return filter(lambda m: callable(getattr(object, m)), dir(object))

print getAttrs('Foo bar'.split(' '))

That’d return

['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
 '__delslice__', '__eq__', '__format__', '__ge__', '__getattribute__', 
 '__getitem__', '__getslice__', '__gt__', '__iadd__', '__imul__', '__init__', 
 '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', 
 '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', 
 '__setattr__', '__setitem__', '__setslice__', '__sizeof__', '__str__', 
 '__subclasshook__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 
 'remove', 'reverse', 'sort']

回答 12

没有可靠的方法来列出所有对象的方法。dir(object)通常是有用的,但是在某些情况下,它可能不会列出所有方法。根据dir()文档“使用参数尝试返回该对象的有效属性列表。”

callable(getattr(object, method))如此处已经提到的,可以检查该方法是否存在。

There is no reliable way to list all object’s methods. dir(object) is usually useful, but in some cases it may not list all methods. According to dir() documentation: “With an argument, attempt to return a list of valid attributes for that object.”

Checking that method exists can be done by callable(getattr(object, method)) as already mentioned there.


回答 13

以清单为对象

obj = []

list(filter(lambda x:callable(getattr(obj,x)),obj.__dir__()))

你得到:

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

Take a list as an object

obj = []

list(filter(lambda x:callable(getattr(obj,x)),obj.__dir__()))

You get:

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

回答 14

…至少有一种简单的方法可以检查它是否具有特定的方法,而不仅仅是在调用该方法时检查是否发生错误

虽然“ 比请求更容易获得宽恕 ”无疑是Python方式,但您正在寻找的可能是:

d={'foo':'bar', 'spam':'eggs'}
if 'get' in dir(d):
    d.get('foo')
# OUT: 'bar'

…is there at least an easy way to check if it has a particular method other than simply checking if an error occurs when the method is called

While “Easier to ask for forgiveness than permission” is certainly the Pythonic way, what you are looking for maybe:

d={'foo':'bar', 'spam':'eggs'}
if 'get' in dir(d):
    d.get('foo')
# OUT: 'bar'

回答 15

为了在整个模块中搜索特定方法

for method in dir(module) :
  if "keyword_of_methode" in method :
   print(method, end="\n")

In order to search for a specific method in a whole module

for method in dir(module) :
  if "keyword_of_methode" in method :
   print(method, end="\n")

回答 16

例如,如果您使用的是shell plus,则可以改用以下命令:

>> MyObject??

这样,用“ ??” 在对象之后,它将向您显示该类具有的所有属性/方法。

If you are, for instance, using shell plus you can use this instead:

>> MyObject??

that way, with the ‘??’ just after your object, it’ll show you all the attributes/methods the class has.


为什么某些函数在函数名称前后都有下划线“ __”?

问题:为什么某些函数在函数名称前后都有下划线“ __”?

这种“强调”似乎经常发生,我想知道这是否是Python语言中的要求,还是仅仅是出于约定?

另外,有人可以说出并解释哪些函数倾向于带有下划线,以及为什么(__init__例如)?

This “underscoring” seems to occur a lot, and I was wondering if this was a requirement in the Python language, or merely a matter of convention?

Also, could someone name and explain which functions tend to have the underscores, and why (__init__, for instance)?


回答 0

Python PEP 8-Python代码样式指南

描述性:命名样式

可以识别以下使用前划线或后划线的特殊形式(通常可以将它们与任何大小写惯例结合使用):

  • _single_leading_underscore:“内部使用”指示器较弱。例如from M import *,不导入名称以下划线开头的对象。

  • single_trailing_underscore_:按惯例用于避免与Python关键字发生冲突,例如

    Tkinter.Toplevel(master, class_='ClassName')

  • __double_leading_underscore:在命名类属性时,调用名称修饰(在类FooBar内部,__boo变为_FooBar__boo;见下文)。

  • __double_leading_and_trailing_underscore__:位于用户控制的命名空间中的“魔术”对象或属性。例如__init____import____file__。请勿发明此类名称;仅按记录使用它们。

请注意,带有双引号和尾部下划线的名称本质上是为Python本身保留的:“切勿发明此类名称;仅将其用作文档”。

From the Python PEP 8 — Style Guide for Python Code:

Descriptive: Naming Styles

The following special forms using leading or trailing underscores are recognized (these can generally be combined with any case convention):

  • _single_leading_underscore: weak “internal use” indicator. E.g. from M import * does not import objects whose name starts with an underscore.

  • single_trailing_underscore_: used by convention to avoid conflicts with Python keyword, e.g.

    Tkinter.Toplevel(master, class_='ClassName')

  • __double_leading_underscore: when naming a class attribute, invokes name mangling (inside class FooBar, __boo becomes _FooBar__boo; see below).

  • __double_leading_and_trailing_underscore__: “magic” objects or attributes that live in user-controlled namespaces. E.g. __init__, __import__ or __file__. Never invent such names; only use them as documented.

Note that names with double leading and trailing underscores are essentially reserved for Python itself: “Never invent such names; only use them as documented”.


回答 1

其他受访者在将双下划线和下划线作为“特殊”或“魔术”方法的命名惯例进行描述时是正确的。

尽管您可以直接调用这些方法([10, 20].__len__()例如),但是下划线的存在暗示这些方法旨在间接调用(len([10, 20])例如)。大多数python运算符都有一个关联的“魔术”方法(例如,这a[x]是调用的常用方法a.__getitem__(x))。

The other respondents are correct in describing the double leading and trailing underscores as a naming convention for “special” or “magic” methods.

While you can call these methods directly ([10, 20].__len__() for example), the presence of the underscores is a hint that these methods are intended to be invoked indirectly (len([10, 20]) for example). Most python operators have an associated “magic” method (for example, a[x] is the usual way of invoking a.__getitem__(x)).


回答 2

带有双下划线的名称是Python的“特殊”名称。它们在Python语言参考的第3节“数据模型”中列出。

Names surrounded by double underscores are “special” to Python. They’re listed in the Python Language Reference, section 3, “Data model”.


回答 3

实际上,当需要在父类和子类名称之间进行区分时,我使用_方法名称。我已经阅读了一些使用这种方法创建父子类的代码。例如,我可以提供以下代码:

class ThreadableMixin:
   def start_worker(self):
       threading.Thread(target=self.worker).start()

   def worker(self):
      try:
        self._worker()
    except tornado.web.HTTPError, e:
        self.set_status(e.status_code)
    except:
        logging.error("_worker problem", exc_info=True)
        self.set_status(500)
    tornado.ioloop.IOLoop.instance().add_callback(self.async_callback(self.results))

和具有_worker方法的孩子

class Handler(tornado.web.RequestHandler, ThreadableMixin):
   def _worker(self):
      self.res = self.render_string("template.html",
        title = _("Title"),
        data = self.application.db.query("select ... where object_id=%s", self.object_id)
    )

Actually I use _ method names when I need to differ between parent and child class names. I’ve read some codes that used this way of creating parent-child classes. As an example I can provide this code:

class ThreadableMixin:
   def start_worker(self):
       threading.Thread(target=self.worker).start()

   def worker(self):
      try:
        self._worker()
    except tornado.web.HTTPError, e:
        self.set_status(e.status_code)
    except:
        logging.error("_worker problem", exc_info=True)
        self.set_status(500)
    tornado.ioloop.IOLoop.instance().add_callback(self.async_callback(self.results))

and the child that have a _worker method

class Handler(tornado.web.RequestHandler, ThreadableMixin):
   def _worker(self):
      self.res = self.render_string("template.html",
        title = _("Title"),
        data = self.application.db.query("select ... where object_id=%s", self.object_id)
    )


回答 4

此约定用于诸如__init__和的特殊变量或方法(所谓的“魔术方法”)__len__。这些方法提供特殊的语法功能或执行特殊的操作。

例如,__file__表示__eq__执行a == b表达式时执行的Python文件的位置。

用户当然可以制作一个自定义的特殊方法,这种情况很少见,但是通常可能会修改一些内置的特殊方法(例如,您应该使用该类来初始化类,该类__init__将在类的实例首先执行时初始化)被建造)。

class A:
    def __init__(self, a):  # use special method '__init__' for initializing
        self.a = a
    def __custom__(self):  # custom special method. you might almost do not use it
        pass

This convention is used for special variables or methods (so-called “magic method”) such as __init__ and __len__. These methods provides special syntactic features or do special things.

For example, __file__ indicates the location of Python file, __eq__ is executed when a == b expression is executed.

A user of course can make a custom special method, which is a very rare case, but often might modify some of the built-in special methods (e.g. you should initialize the class with __init__ that will be executed at first when an instance of a class is created).

class A:
    def __init__(self, a):  # use special method '__init__' for initializing
        self.a = a
    def __custom__(self):  # custom special method. you might almost do not use it
        pass

回答 5

添加了一个示例来了解__在python中的用法。这是所有__的列表

https://docs.python.org/3/genindex-all.html#_

某些类别的标识符(除关键字外)具有特殊含义。在任何其他情况下,*名称的任何使用,如果未遵循明确记录的使用,均会在没有警告的情况下发生破损。

使用__的访问限制

"""
Identifiers:
-  Contain only (A-z, 0-9, and _ )
-  Start with a lowercase letter or _.
-  Single leading _ :  private
-  Double leading __ :  strong private
-  Start & End  __ : Language defined Special Name of Object/ Method
-  Class names start with an uppercase letter.
-

"""


class BankAccount(object):
    def __init__(self, name, money, password):
        self.name = name            # Public
        self._money = money         # Private : Package Level
        self.__password = password  # Super Private

    def earn_money(self, amount):
        self._money += amount
        print("Salary Received: ", amount, " Updated Balance is: ", self._money)

    def withdraw_money(self, amount):
        self._money -= amount
        print("Money Withdraw: ", amount, " Updated Balance is: ", self._money)

    def show_balance(self):
        print(" Current Balance is: ", self._money)


account = BankAccount("Hitesh", 1000, "PWD")  # Object Initalization

# Method Call
account.earn_money(100)

# Show Balance
print(account.show_balance())

print("PUBLIC ACCESS:", account.name)  # Public Access

# account._money is accessible because it is only hidden by convention
print("PROTECTED ACCESS:", account._money)  # Protected Access

# account.__password will throw error but account._BankAccount__password will not
# because __password is super private
print("PRIVATE ACCESS:", account._BankAccount__password)

# Method Call
account.withdraw_money(200)

# Show Balance
print(account.show_balance())

# account._money is accessible because it is only hidden by convention
print(account._money)  # Protected Access

Added an example to understand the use of __ in python. Here is the list of All __

https://docs.python.org/3/genindex-all.html#_

Certain classes of identifiers (besides keywords) have special meanings. Any use of * names, in any other context, that does not follow explicitly documented use, is subject to breakage without warning

Access restriction using __

"""
Identifiers:
-  Contain only (A-z, 0-9, and _ )
-  Start with a lowercase letter or _.
-  Single leading _ :  private
-  Double leading __ :  strong private
-  Start & End  __ : Language defined Special Name of Object/ Method
-  Class names start with an uppercase letter.
-

"""


class BankAccount(object):
    def __init__(self, name, money, password):
        self.name = name            # Public
        self._money = money         # Private : Package Level
        self.__password = password  # Super Private

    def earn_money(self, amount):
        self._money += amount
        print("Salary Received: ", amount, " Updated Balance is: ", self._money)

    def withdraw_money(self, amount):
        self._money -= amount
        print("Money Withdraw: ", amount, " Updated Balance is: ", self._money)

    def show_balance(self):
        print(" Current Balance is: ", self._money)


account = BankAccount("Hitesh", 1000, "PWD")  # Object Initalization

# Method Call
account.earn_money(100)

# Show Balance
print(account.show_balance())

print("PUBLIC ACCESS:", account.name)  # Public Access

# account._money is accessible because it is only hidden by convention
print("PROTECTED ACCESS:", account._money)  # Protected Access

# account.__password will throw error but account._BankAccount__password will not
# because __password is super private
print("PRIVATE ACCESS:", account._BankAccount__password)

# Method Call
account.withdraw_money(200)

# Show Balance
print(account.show_balance())

# account._money is accessible because it is only hidden by convention
print(account._money)  # Protected Access

用Python漂亮地打印XML

问题:用Python漂亮地打印XML

在Python中漂亮地打印XML的最佳方法(或多种方法)是什么?

What is the best way (or are the various ways) to pretty print XML in Python?


回答 0

import xml.dom.minidom

dom = xml.dom.minidom.parse(xml_fname) # or xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = dom.toprettyxml()
import xml.dom.minidom

dom = xml.dom.minidom.parse(xml_fname) # or xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = dom.toprettyxml()

回答 1

lxml是最新的,更新的,并且包含漂亮的打印功能

import lxml.etree as etree

x = etree.parse("filename")
print etree.tostring(x, pretty_print=True)

查看lxml教程:http : //lxml.de/tutorial.html

lxml is recent, updated, and includes a pretty print function

import lxml.etree as etree

x = etree.parse("filename")
print etree.tostring(x, pretty_print=True)

Check out the lxml tutorial: http://lxml.de/tutorial.html


回答 2

另一个解决方案是借用indent函数,以与自2.5以来内置在Python中的ElementTree库一起使用。如下所示:

from xml.etree import ElementTree

def indent(elem, level=0):
    i = "\n" + level*"  "
    j = "\n" + (level-1)*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for subelem in elem:
            indent(subelem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = j
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = j
    return elem        

root = ElementTree.parse('/tmp/xmlfile').getroot()
indent(root)
ElementTree.dump(root)

Another solution is to borrow this indent function, for use with the ElementTree library that’s built in to Python since 2.5. Here’s what that would look like:

from xml.etree import ElementTree

def indent(elem, level=0):
    i = "\n" + level*"  "
    j = "\n" + (level-1)*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for subelem in elem:
            indent(subelem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = j
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = j
    return elem        

root = ElementTree.parse('/tmp/xmlfile').getroot()
indent(root)
ElementTree.dump(root)

回答 3

这是我的(hacky?)解决方案,用于解决丑陋的文本节点问题。

uglyXml = doc.toprettyxml(indent='  ')

text_re = re.compile('>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)    
prettyXml = text_re.sub('>\g<1></', uglyXml)

print prettyXml

上面的代码将生成:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>1</id>
    <title>Add Visual Studio 2005 and 2008 solution files</title>
    <details>We need Visual Studio 2005/2008 project files for Windows.</details>
  </issue>
</issues>

代替这个:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>
      1
    </id>
    <title>
      Add Visual Studio 2005 and 2008 solution files
    </title>
    <details>
      We need Visual Studio 2005/2008 project files for Windows.
    </details>
  </issue>
</issues>

免责声明:可能存在一些限制。

Here’s my (hacky?) solution to get around the ugly text node problem.

uglyXml = doc.toprettyxml(indent='  ')

text_re = re.compile('>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)    
prettyXml = text_re.sub('>\g<1></', uglyXml)

print prettyXml

The above code will produce:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>1</id>
    <title>Add Visual Studio 2005 and 2008 solution files</title>
    <details>We need Visual Studio 2005/2008 project files for Windows.</details>
  </issue>
</issues>

Instead of this:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>
      1
    </id>
    <title>
      Add Visual Studio 2005 and 2008 solution files
    </title>
    <details>
      We need Visual Studio 2005/2008 project files for Windows.
    </details>
  </issue>
</issues>

Disclaimer: There are probably some limitations.


回答 4

正如其他人指出的那样,lxml内置了一个漂亮的打印机。

请注意,尽管默认情况下它将CDATA节更改为普通文本,这可能会带来讨厌的结果。

这是一个Python函数,可保留输入文件,仅更改缩进(请注意strip_cdata=False)。此外,它确保输出使用UTF-8作为编码,而不是默认的ASCII(请注意encoding='utf-8'):

from lxml import etree

def prettyPrintXml(xmlFilePathToPrettyPrint):
    assert xmlFilePathToPrettyPrint is not None
    parser = etree.XMLParser(resolve_entities=False, strip_cdata=False)
    document = etree.parse(xmlFilePathToPrettyPrint, parser)
    document.write(xmlFilePathToPrettyPrint, pretty_print=True, encoding='utf-8')

用法示例:

prettyPrintXml('some_folder/some_file.xml')

As others pointed out, lxml has a pretty printer built in.

Be aware though that by default it changes CDATA sections to normal text, which can have nasty results.

Here’s a Python function that preserves the input file and only changes the indentation (notice the strip_cdata=False). Furthermore it makes sure the output uses UTF-8 as encoding instead of the default ASCII (notice the encoding='utf-8'):

from lxml import etree

def prettyPrintXml(xmlFilePathToPrettyPrint):
    assert xmlFilePathToPrettyPrint is not None
    parser = etree.XMLParser(resolve_entities=False, strip_cdata=False)
    document = etree.parse(xmlFilePathToPrettyPrint, parser)
    document.write(xmlFilePathToPrettyPrint, pretty_print=True, encoding='utf-8')

Example usage:

prettyPrintXml('some_folder/some_file.xml')

回答 5

BeautifulSoup有一个易于使用的prettify()方法。

每个缩进级别缩进一个空格。它比lxml的pretty_print好得多,而且又短又可爱。

from bs4 import BeautifulSoup

bs = BeautifulSoup(open(xml_file), 'xml')
print bs.prettify()

BeautifulSoup has a easy to use prettify() method.

It indents one space per indentation level. It works much better than lxml’s pretty_print and is short and sweet.

from bs4 import BeautifulSoup

bs = BeautifulSoup(open(xml_file), 'xml')
print bs.prettify()

回答 6

如果有的xmllint话,可以产生一个子流程并使用它。xmllint --format <file>漂亮地将其输入XML打印到标准输出。

请注意,此方法使用python外部的程序,这使其有点像hack。

def pretty_print_xml(xml):
    proc = subprocess.Popen(
        ['xmllint', '--format', '/dev/stdin'],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    (output, error_output) = proc.communicate(xml);
    return output

print(pretty_print_xml(data))

If you have xmllint you can spawn a subprocess and use it. xmllint --format <file> pretty-prints its input XML to standard output.

Note that this method uses an program external to python, which makes it sort of a hack.

def pretty_print_xml(xml):
    proc = subprocess.Popen(
        ['xmllint', '--format', '/dev/stdin'],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    (output, error_output) = proc.communicate(xml);
    return output

print(pretty_print_xml(data))

回答 7

我尝试编辑上面的“ ade”答案,但是在最初匿名提供反馈后,Stack Overflow不允许我进行编辑。这是用于精巧打印ElementTree的函数的错误版本。

def indent(elem, level=0, more_sibs=False):
    i = "\n"
    if level:
        i += (level-1) * '  '
    num_kids = len(elem)
    if num_kids:
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
            if level:
                elem.text += '  '
        count = 0
        for kid in elem:
            indent(kid, level+1, count < num_kids - 1)
            count += 1
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
            if more_sibs:
                elem.tail += '  '
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i
            if more_sibs:
                elem.tail += '  '

I tried to edit “ade”s answer above, but Stack Overflow wouldn’t let me edit after I had initially provided feedback anonymously. This is a less buggy version of the function to pretty-print an ElementTree.

def indent(elem, level=0, more_sibs=False):
    i = "\n"
    if level:
        i += (level-1) * '  '
    num_kids = len(elem)
    if num_kids:
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
            if level:
                elem.text += '  '
        count = 0
        for kid in elem:
            indent(kid, level+1, count < num_kids - 1)
            count += 1
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
            if more_sibs:
                elem.tail += '  '
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i
            if more_sibs:
                elem.tail += '  '

回答 8

如果您使用的是DOM实现,则每种都有自己的内置漂亮打印形式:

# minidom
#
document.toprettyxml()

# 4DOM
#
xml.dom.ext.PrettyPrint(document, stream)

# pxdom (or other DOM Level 3 LS-compliant imp)
#
serializer.domConfig.setParameter('format-pretty-print', True)
serializer.writeToString(document)

如果您使用的其他东西没有它自己的漂亮打印机-或那些漂亮打印机没有按照您想要的方式做-您可能必须编写或继承自己的序列化器。

If you’re using a DOM implementation, each has their own form of pretty-printing built-in:

# minidom
#
document.toprettyxml()

# 4DOM
#
xml.dom.ext.PrettyPrint(document, stream)

# pxdom (or other DOM Level 3 LS-compliant imp)
#
serializer.domConfig.setParameter('format-pretty-print', True)
serializer.writeToString(document)

If you’re using something else without its own pretty-printer — or those pretty-printers don’t quite do it the way you want —  you’d probably have to write or subclass your own serialiser.


回答 9

我对minidom的漂亮字体有一些疑问。每当我尝试用给定编码之外的字符漂亮地打印文档时,都会出现UnicodeError,例如,如果我在文档中有一个β并且尝试了doc.toprettyxml(encoding='latin-1')。这是我的解决方法:

def toprettyxml(doc, encoding):
    """Return a pretty-printed XML document in a given encoding."""
    unistr = doc.toprettyxml().replace(u'<?xml version="1.0" ?>',
                          u'<?xml version="1.0" encoding="%s"?>' % encoding)
    return unistr.encode(encoding, 'xmlcharrefreplace')

I had some problems with minidom’s pretty print. I’d get a UnicodeError whenever I tried pretty-printing a document with characters outside the given encoding, eg if I had a β in a document and I tried doc.toprettyxml(encoding='latin-1'). Here’s my workaround for it:

def toprettyxml(doc, encoding):
    """Return a pretty-printed XML document in a given encoding."""
    unistr = doc.toprettyxml().replace(u'<?xml version="1.0" ?>',
                          u'<?xml version="1.0" encoding="%s"?>' % encoding)
    return unistr.encode(encoding, 'xmlcharrefreplace')

回答 10

from yattag import indent

pretty_string = indent(ugly_string)

除非您要求使用以下命令,否则它不会在文本节点内添加空格或换行符:

indent(mystring, indent_text = True)

您可以指定缩进单位应该是什么以及换行符应该是什么样。

pretty_xml_string = indent(
    ugly_xml_string,
    indentation = '    ',
    newline = '\r\n'
)

该文档位于http://www.yattag.org主页上。

from yattag import indent

pretty_string = indent(ugly_string)

It won’t add spaces or newlines inside text nodes, unless you ask for it with:

indent(mystring, indent_text = True)

You can specify what the indentation unit should be and what the newline should look like.

pretty_xml_string = indent(
    ugly_xml_string,
    indentation = '    ',
    newline = '\r\n'
)

The doc is on http://www.yattag.org homepage.


回答 11

我编写了一个解决方案,以遍历现有的ElementTree并按照通常期望的那样使用文本/尾部缩进。

def prettify(element, indent='  '):
    queue = [(0, element)]  # (level, element)
    while queue:
        level, element = queue.pop(0)
        children = [(level + 1, child) for child in list(element)]
        if children:
            element.text = '\n' + indent * (level+1)  # for child open
        if queue:
            element.tail = '\n' + indent * queue[0][0]  # for sibling open
        else:
            element.tail = '\n' + indent * (level-1)  # for parent close
        queue[0:0] = children  # prepend so children come before siblings

I wrote a solution to walk through an existing ElementTree and use text/tail to indent it as one typically expects.

def prettify(element, indent='  '):
    queue = [(0, element)]  # (level, element)
    while queue:
        level, element = queue.pop(0)
        children = [(level + 1, child) for child in list(element)]
        if children:
            element.text = '\n' + indent * (level+1)  # for child open
        if queue:
            element.tail = '\n' + indent * queue[0][0]  # for sibling open
        else:
            element.tail = '\n' + indent * (level-1)  # for parent close
        queue[0:0] = children  # prepend so children come before siblings

回答 12

python的XML漂亮打印对于此任务看起来非常不错。(也应适当命名。)

一种替代方法是使用pyXML,它具有PrettyPrint功能

XML pretty print for python looks pretty good for this task. (Appropriately named, too.)

An alternative is to use pyXML, which has a PrettyPrint function.


回答 13

这是一个Python3解决方案,它摆脱了丑陋的换行符问题(大量空白),并且仅使用标准库,而不像大多数其他实现那样。

import xml.etree.ElementTree as ET
import xml.dom.minidom
import os

def pretty_print_xml_given_root(root, output_xml):
    """
    Useful for when you are editing xml data on the fly
    """
    xml_string = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml()
    xml_string = os.linesep.join([s for s in xml_string.splitlines() if s.strip()]) # remove the weird newline issue
    with open(output_xml, "w") as file_out:
        file_out.write(xml_string)

def pretty_print_xml_given_file(input_xml, output_xml):
    """
    Useful for when you want to reformat an already existing xml file
    """
    tree = ET.parse(input_xml)
    root = tree.getroot()
    pretty_print_xml_given_root(root, output_xml)

我在这里找到了解决常见换行问题的方法。

Here’s a Python3 solution that gets rid of the ugly newline issue (tons of whitespace), and it only uses standard libraries unlike most other implementations.

import xml.etree.ElementTree as ET
import xml.dom.minidom
import os

def pretty_print_xml_given_root(root, output_xml):
    """
    Useful for when you are editing xml data on the fly
    """
    xml_string = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml()
    xml_string = os.linesep.join([s for s in xml_string.splitlines() if s.strip()]) # remove the weird newline issue
    with open(output_xml, "w") as file_out:
        file_out.write(xml_string)

def pretty_print_xml_given_file(input_xml, output_xml):
    """
    Useful for when you want to reformat an already existing xml file
    """
    tree = ET.parse(input_xml)
    root = tree.getroot()
    pretty_print_xml_given_root(root, output_xml)

I found how to fix the common newline issue here.


回答 14

您可以将流行的外部库xmltodict与一起使用unparsepretty=True您将获得最佳结果:

xmltodict.unparse(
    xmltodict.parse(my_xml), full_document=False, pretty=True)

full_document=False反对<?xml version="1.0" encoding="UTF-8"?>在顶部。

You can use popular external library xmltodict, with unparse and pretty=True you will get best result:

xmltodict.unparse(
    xmltodict.parse(my_xml), full_document=False, pretty=True)

full_document=False against <?xml version="1.0" encoding="UTF-8"?> at the top.


回答 15

看一下vkbeautify模块。

这是我非常流行的javascript / nodejs插件的同名python版本。它可以漂亮地打印/最小化XML,JSON和CSS文本。输入和输出可以是字符串/文件的任意组合。它非常紧凑,没有任何依赖性。

例子

import vkbeautify as vkb

vkb.xml(text)                       
vkb.xml(text, 'path/to/dest/file')  
vkb.xml('path/to/src/file')        
vkb.xml('path/to/src/file', 'path/to/dest/file') 

Take a look at the vkbeautify module.

It is a python version of my very popular javascript/nodejs plugin with the same name. It can pretty-print/minify XML, JSON and CSS text. Input and output can be string/file in any combinations. It is very compact and doesn’t have any dependency.

Examples:

import vkbeautify as vkb

vkb.xml(text)                       
vkb.xml(text, 'path/to/dest/file')  
vkb.xml('path/to/src/file')        
vkb.xml('path/to/src/file', 'path/to/dest/file') 

回答 16

如果您不想进行重新解析,则可以使用xmlpp.py库和该get_pprint()函数。在我的用例中,它工作得很好且流畅,而无需重新解析为lxml ElementTree对象。

An alternative if you don’t want to have to reparse, there is the xmlpp.py library with the get_pprint() function. It worked nice and smoothly for my use cases, without having to reparse to an lxml ElementTree object.


回答 17

您可以尝试这种变化…

安装BeautifulSoup和后端lxml(解析器)库:

user$ pip3 install lxml bs4

处理您的XML文档:

from bs4 import BeautifulSoup

with open('/path/to/file.xml', 'r') as doc: 
    for line in doc: 
        print(BeautifulSoup(line, 'lxml-xml').prettify())  

You can try this variation…

Install BeautifulSoup and the backend lxml (parser) libraries:

user$ pip3 install lxml bs4

Process your XML document:

from bs4 import BeautifulSoup

with open('/path/to/file.xml', 'r') as doc: 
    for line in doc: 
        print(BeautifulSoup(line, 'lxml-xml').prettify())  

回答 18

我遇到了这个问题,并像这样解决了它:

def write_xml_file (self, file, xml_root_element, xml_declaration=False, pretty_print=False, encoding='unicode', indent='\t'):
    pretty_printed_xml = etree.tostring(xml_root_element, xml_declaration=xml_declaration, pretty_print=pretty_print, encoding=encoding)
    if pretty_print: pretty_printed_xml = pretty_printed_xml.replace('  ', indent)
    file.write(pretty_printed_xml)

在我的代码中,此方法的调用方式如下:

try:
    with open(file_path, 'w') as file:
        file.write('<?xml version="1.0" encoding="utf-8" ?>')

        # create some xml content using etree ...

        xml_parser = XMLParser()
        xml_parser.write_xml_file(file, xml_root, xml_declaration=False, pretty_print=True, encoding='unicode', indent='\t')

except IOError:
    print("Error while writing in log file!")

这仅是因为etree默认情况下会使用two spaces缩进,但我发现并不太强调缩进,因此效果不佳。我无法为etree设置任何设置或为任何函数更改标准etree缩进的参数。我喜欢使用etree多么容易,但这确实让我很烦。

I had this problem and solved it like this:

def write_xml_file (self, file, xml_root_element, xml_declaration=False, pretty_print=False, encoding='unicode', indent='\t'):
    pretty_printed_xml = etree.tostring(xml_root_element, xml_declaration=xml_declaration, pretty_print=pretty_print, encoding=encoding)
    if pretty_print: pretty_printed_xml = pretty_printed_xml.replace('  ', indent)
    file.write(pretty_printed_xml)

In my code this method is called like this:

try:
    with open(file_path, 'w') as file:
        file.write('<?xml version="1.0" encoding="utf-8" ?>')

        # create some xml content using etree ...

        xml_parser = XMLParser()
        xml_parser.write_xml_file(file, xml_root, xml_declaration=False, pretty_print=True, encoding='unicode', indent='\t')

except IOError:
    print("Error while writing in log file!")

This works only because etree by default uses two spaces to indent, which I don’t find very much emphasizing the indentation and therefore not pretty. I couldn’t ind any setting for etree or parameter for any function to change the standard etree indent. I like how easy it is to use etree, but this was really annoying me.


回答 19

要将整个xml文档转换为漂亮的xml文档
(例如:假设您已提取[解压缩] LibreOffice Writer .odt或.ods文件,并且想要将丑陋的“ content.xml”文件转换为自动化git版本控制git difftool.odt / .ods文件的生成,例如我在此处实现的)

import xml.dom.minidom

file = open("./content.xml", 'r')
xml_string = file.read()
file.close()

parsed_xml = xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = parsed_xml.toprettyxml()

file = open("./content_new.xml", 'w')
file.write(pretty_xml_as_string)
file.close()

参考资料:
-感谢本·诺兰德在本页上的回答,这为我提供了大部分帮助。

For converting an entire xml document to a pretty xml document
(ex: assuming you’ve extracted [unzipped] a LibreOffice Writer .odt or .ods file, and you want to convert the ugly “content.xml” file to a pretty one for automated git version control and git difftooling of .odt/.ods files, such as I’m implementing here)

import xml.dom.minidom

file = open("./content.xml", 'r')
xml_string = file.read()
file.close()

parsed_xml = xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = parsed_xml.toprettyxml()

file = open("./content_new.xml", 'w')
file.write(pretty_xml_as_string)
file.close()

References:
– Thanks to Ben Noland’s answer on this page which got me most of the way there.


回答 20

from lxml import etree
import xml.dom.minidom as mmd

xml_root = etree.parse(xml_fiel_path, etree.XMLParser())

def print_xml(xml_root):
    plain_xml = etree.tostring(xml_root).decode('utf-8')
    urgly_xml = ''.join(plain_xml .split())
    good_xml = mmd.parseString(urgly_xml)
    print(good_xml.toprettyxml(indent='    ',))

对于带有中文的xml来说效果很好!

from lxml import etree
import xml.dom.minidom as mmd

xml_root = etree.parse(xml_fiel_path, etree.XMLParser())

def print_xml(xml_root):
    plain_xml = etree.tostring(xml_root).decode('utf-8')
    urgly_xml = ''.join(plain_xml .split())
    good_xml = mmd.parseString(urgly_xml)
    print(good_xml.toprettyxml(indent='    ',))

It’s working well for the xml with Chinese!


回答 21

如果由于某种原因您无法使用其他用户提到的任何Python模块,那么我建议使用以下针对Python 2.7的解决方案:

import subprocess

def makePretty(filepath):
  cmd = "xmllint --format " + filepath
  prettyXML = subprocess.check_output(cmd, shell = True)
  with open(filepath, "w") as outfile:
    outfile.write(prettyXML)

据我所知,该解决方案将在xmllint安装了该软件包的基于Unix的系统上运行。

If for some reason you can’t get your hands on any of the Python modules that other users mentioned, I suggest the following solution for Python 2.7:

import subprocess

def makePretty(filepath):
  cmd = "xmllint --format " + filepath
  prettyXML = subprocess.check_output(cmd, shell = True)
  with open(filepath, "w") as outfile:
    outfile.write(prettyXML)

As far as I know, this solution will work on Unix-based systems that have the xmllint package installed.


回答 22

我用几行代码解决了这个问题,打开文件,遍历文件并添加缩进,然后再次保存。我正在处理小型xml文件,并且不想添加依赖项,也不想为用户安装更多库。无论如何,这就是我最终得到的结果:

    f = open(file_name,'r')
    xml = f.read()
    f.close()

    #Removing old indendations
    raw_xml = ''        
    for line in xml:
        raw_xml += line

    xml = raw_xml

    new_xml = ''
    indent = '    '
    deepness = 0

    for i in range((len(xml))):

        new_xml += xml[i]   
        if(i<len(xml)-3):

            simpleSplit = xml[i:(i+2)] == '><'
            advancSplit = xml[i:(i+3)] == '></'        
            end = xml[i:(i+2)] == '/>'    
            start = xml[i] == '<'

            if(advancSplit):
                deepness += -1
                new_xml += '\n' + indent*deepness
                simpleSplit = False
                deepness += -1
            if(simpleSplit):
                new_xml += '\n' + indent*deepness
            if(start):
                deepness += 1
            if(end):
                deepness += -1

    f = open(file_name,'w')
    f.write(new_xml)
    f.close()

它对我有用,也许有人会使用它:)

I solved this with some lines of code, opening the file, going trough it and adding indentation, then saving it again. I was working with small xml files, and did not want to add dependencies, or more libraries to install for the user. Anyway, here is what I ended up with:

    f = open(file_name,'r')
    xml = f.read()
    f.close()

    #Removing old indendations
    raw_xml = ''        
    for line in xml:
        raw_xml += line

    xml = raw_xml

    new_xml = ''
    indent = '    '
    deepness = 0

    for i in range((len(xml))):

        new_xml += xml[i]   
        if(i<len(xml)-3):

            simpleSplit = xml[i:(i+2)] == '><'
            advancSplit = xml[i:(i+3)] == '></'        
            end = xml[i:(i+2)] == '/>'    
            start = xml[i] == '<'

            if(advancSplit):
                deepness += -1
                new_xml += '\n' + indent*deepness
                simpleSplit = False
                deepness += -1
            if(simpleSplit):
                new_xml += '\n' + indent*deepness
            if(start):
                deepness += 1
            if(end):
                deepness += -1

    f = open(file_name,'w')
    f.write(new_xml)
    f.close()

It works for me, perhaps someone will have some use of it :)


SQLAlchemy按降序排列?

问题:SQLAlchemy按降序排列?

如何descending在如下所示的SQLAlchemy查询中使用ORDER BY ?

此查询有效,但以升序返回:

query = (model.Session.query(model.Entry)
        .join(model.ClassificationItem)
        .join(model.EnumerationValue)
        .filter_by(id=c.row.id)
        .order_by(model.Entry.amount) # This row :)
        )

如果我尝试:

.order_by(desc(model.Entry.amount))

然后我得到:NameError: global name 'desc' is not defined

How can I use ORDER BY descending in a SQLAlchemy query like the following?

This query works, but returns them in ascending order:

query = (model.Session.query(model.Entry)
        .join(model.ClassificationItem)
        .join(model.EnumerationValue)
        .filter_by(id=c.row.id)
        .order_by(model.Entry.amount) # This row :)
        )

If I try:

.order_by(desc(model.Entry.amount))

then I get: NameError: global name 'desc' is not defined.


回答 0

与FYI一样,您也可以将这些内容指定为列属性。例如,我可能已经做过:

.order_by(model.Entry.amount.desc())

这很方便,因为它避免了import,并且可以在其他地方(例如关系定义等)中使用它。

有关更多信息,您可以参考此

Just as an FYI, you can also specify those things as column attributes. For instance, I might have done:

.order_by(model.Entry.amount.desc())

This is handy since it avoids an import, and you can use it on other places such as in a relation definition, etc.

For more information, you can refer this


回答 1

from sqlalchemy import desc
someselect.order_by(desc(table1.mycol))

来自@ jpmc26的用法

from sqlalchemy import desc
someselect.order_by(desc(table1.mycol))

Usage from @jpmc26


回答 2

您可能要做的另一件事是:

.order_by("name desc")

这将导致:ORDER BY名称desc。这里的缺点是按顺序使用显式列名。

One other thing you might do is:

.order_by("name desc")

This will result in: ORDER BY name desc. The disadvantage here is the explicit column name used in order by.


回答 3

您可以.desc()像这样在查询中使用函数

query = (model.Session.query(model.Entry)
        .join(model.ClassificationItem)
        .join(model.EnumerationValue)
        .filter_by(id=c.row.id)
        .order_by(model.Entry.amount.desc())
        )

这将按金额降序排列,或者

query = session.query(
    model.Entry
).join(
    model.ClassificationItem
).join(
    model.EnumerationValue
).filter_by(
    id=c.row.id
).order_by(
    model.Entry.amount.desc()
)
)

使用SQLAlchemy的desc函数

from sqlalchemy import desc
query = session.query(
    model.Entry
).join(
    model.ClassificationItem
).join(
    model.EnumerationValue
).filter_by(
    id=c.row.id
).order_by(
    desc(model.Entry.amount)
)
)

对于官方文档,请使用链接或检查以下代码段

sqlalchemy.sql.expression.desc(column)产生一个降序的ORDER BY子句元素。

例如:

from sqlalchemy import desc

stmt = select([users_table]).order_by(desc(users_table.c.name))

将产生如下SQL:

SELECT id, name FROM user ORDER BY name DESC

desc()函数是ColumnElement.desc()方法的独立版本,可用于所有SQL表达式,例如:

stmt = select([users_table]).order_by(users_table.c.name.desc())

参数column –一个ColumnElement(例如标量SQL表达式),用于应用desc()操作。

也可以看看

asc()

nullsfirst()

nullslast()

Select.order_by()

You can use .desc() function in your query just like this

query = (model.Session.query(model.Entry)
        .join(model.ClassificationItem)
        .join(model.EnumerationValue)
        .filter_by(id=c.row.id)
        .order_by(model.Entry.amount.desc())
        )

This will order by amount in descending order or

query = session.query(
    model.Entry
).join(
    model.ClassificationItem
).join(
    model.EnumerationValue
).filter_by(
    id=c.row.id
).order_by(
    model.Entry.amount.desc()
)
)

Use of desc function of SQLAlchemy

from sqlalchemy import desc
query = session.query(
    model.Entry
).join(
    model.ClassificationItem
).join(
    model.EnumerationValue
).filter_by(
    id=c.row.id
).order_by(
    desc(model.Entry.amount)
)
)

For official docs please use the link or check below snippet

sqlalchemy.sql.expression.desc(column) Produce a descending ORDER BY clause element.

e.g.:

from sqlalchemy import desc

stmt = select([users_table]).order_by(desc(users_table.c.name))

will produce SQL as:

SELECT id, name FROM user ORDER BY name DESC

The desc() function is a standalone version of the ColumnElement.desc() method available on all SQL expressions, e.g.:

stmt = select([users_table]).order_by(users_table.c.name.desc())

Parameters column – A ColumnElement (e.g. scalar SQL expression) with which to apply the desc() operation.

See also

asc()

nullsfirst()

nullslast()

Select.order_by()


回答 4

你可以试试: .order_by(ClientTotal.id.desc())

session = Session()
auth_client_name = 'client3' 
result_by_auth_client = session.query(ClientTotal).filter(ClientTotal.client ==
auth_client_name).order_by(ClientTotal.id.desc()).all()

for rbac in result_by_auth_client:
    print(rbac.id) 
session.close()

You can try: .order_by(ClientTotal.id.desc())

session = Session()
auth_client_name = 'client3' 
result_by_auth_client = session.query(ClientTotal).filter(ClientTotal.client ==
auth_client_name).order_by(ClientTotal.id.desc()).all()

for rbac in result_by_auth_client:
    print(rbac.id) 
session.close()

回答 5

@Radu答案的补充,与SQL一样,如果您有许多具有相同属性的表,则可以在参数中添加表名。

.order_by("TableName.name desc")

Complementary at @Radu answer, As in SQL, you can add the table name in the parameter if you have many table with the same attribute.

.order_by("TableName.name desc")

Python中的字母范围

问题:Python中的字母范围

而不是像这样列出字母字符:

alpha = ['a', 'b', 'c', 'd'.........'z']

有什么办法可以将它分组到某个范围之内?例如,对于数字,可以使用进行分组range()

range(1, 10)

Instead of making a list of alphabet characters like this:

alpha = ['a', 'b', 'c', 'd'.........'z']

is there any way that we can group it to a range or something? For example, for numbers it can be grouped using range():

range(1, 10)

回答 0

>>> import string
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'

如果您确实需要列表:

>>> list(string.ascii_lowercase)
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

并做到这一点 range

>>> list(map(chr, range(97, 123))) #or list(map(chr, range(ord('a'), ord('z')+1)))
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

其他有用的string模块功能:

>>> help(string) # on Python 3
....
DATA
    ascii_letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    ascii_lowercase = 'abcdefghijklmnopqrstuvwxyz'
    ascii_uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    digits = '0123456789'
    hexdigits = '0123456789abcdefABCDEF'
    octdigits = '01234567'
    printable = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
    punctuation = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
    whitespace = ' \t\n\r\x0b\x0c'
>>> import string
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'

If you really need a list:

>>> list(string.ascii_lowercase)
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

And to do it with range

>>> list(map(chr, range(97, 123))) #or list(map(chr, range(ord('a'), ord('z')+1)))
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

Other helpful string module features:

>>> help(string) # on Python 3
....
DATA
    ascii_letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    ascii_lowercase = 'abcdefghijklmnopqrstuvwxyz'
    ascii_uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    digits = '0123456789'
    hexdigits = '0123456789abcdefABCDEF'
    octdigits = '01234567'
    printable = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
    punctuation = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
    whitespace = ' \t\n\r\x0b\x0c'

回答 1

[chr(i) for i in range(ord('a'),ord('z')+1)]
[chr(i) for i in range(ord('a'),ord('z')+1)]

回答 2

在Python 2.7和3中,您可以使用以下代码:

import string
string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'

string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

正如@Zaz所说: string.lowercase已弃用,不再在Python 3中string.ascii_lowercase工作,但在两者中都工作

In Python 2.7 and 3 you can use this:

import string
string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'

string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

As @Zaz says: string.lowercase is deprecated and no longer works in Python 3 but string.ascii_lowercase works in both


回答 3

这是一个简单的字母范围实现:

def letter_range(start, stop="{", step=1):
    """Yield a range of lowercase letters.""" 
    for ord_ in range(ord(start.lower()), ord(stop.lower()), step):
        yield chr(ord_)

演示版

list(letter_range("a", "f"))
# ['a', 'b', 'c', 'd', 'e']

list(letter_range("a", "f", step=2))
# ['a', 'c', 'e']

Here is a simple letter-range implementation:

Code

def letter_range(start, stop="{", step=1):
    """Yield a range of lowercase letters.""" 
    for ord_ in range(ord(start.lower()), ord(stop.lower()), step):
        yield chr(ord_)

Demo

list(letter_range("a", "f"))
# ['a', 'b', 'c', 'd', 'e']

list(letter_range("a", "f", step=2))
# ['a', 'c', 'e']

回答 4

如果要查找letters[1:10]与R 相等的值,则可以使用:

 import string
 list(string.ascii_lowercase[0:10])

If you are looking to an equivalent of letters[1:10] from R, you can use:

 import string
 list(string.ascii_lowercase[0:10])

回答 5

使用内置的range函数在python中打印大写和小写字母

def upperCaseAlphabets():
    print("Upper Case Alphabets")
    for i in range(65, 91):
        print(chr(i), end=" ")
    print()

def lowerCaseAlphabets():
    print("Lower Case Alphabets")
    for i in range(97, 123):
        print(chr(i), end=" ")

upperCaseAlphabets();
lowerCaseAlphabets();

Print the Upper and Lower case alphabets in python using a built-in range function

def upperCaseAlphabets():
    print("Upper Case Alphabets")
    for i in range(65, 91):
        print(chr(i), end=" ")
    print()

def lowerCaseAlphabets():
    print("Lower Case Alphabets")
    for i in range(97, 123):
        print(chr(i), end=" ")

upperCaseAlphabets();
lowerCaseAlphabets();

回答 6

这是我能找出的最简单方法:

#!/usr/bin/python3 for i in range(97, 123): print("{:c}".format(i), end='')

因此,从97到122是与’a’和’z’等效的ASCII码。请注意,小写字母和放置123的需要,因为将不包括在内)。

在打印功能中,请确保设置{:c}(字符)格式,在这种情况下,我们希望它一起打印所有内容,甚至不让最后一行换行,这样end=''就可以了。

结果是这样的: abcdefghijklmnopqrstuvwxyz

This is the easiest way I can figure out:

#!/usr/bin/python3 for i in range(97, 123): print("{:c}".format(i), end='')

So, 97 to 122 are the ASCII number equivalent to ‘a’ to and ‘z’. Notice the lowercase and the need to put 123, since it will not be included).

In print function make sure to set the {:c} (character) format, and, in this case, we want it to print it all together not even letting a new line at the end, so end=''would do the job.

The result is this: abcdefghijklmnopqrstuvwxyz


获取引起异常的异常描述和堆栈跟踪,全部作为字符串

问题:获取引起异常的异常描述和堆栈跟踪,全部作为字符串

我看过很多关于Python中堆栈跟踪和异常的文章。但是还没有找到我所需要的。

我有一段Python 2.7代码可能会引发异常。我想捕获它并将其完整描述和导致错误的堆栈跟踪分配给字符串(只是我们在控制台上看到的所有内容)。我需要此字符串以将其打印到GUI中的文本框中。

像这样:

try:
    method_that_can_raise_an_exception(params)
except Exception as e:
    print_to_textbox(complete_exception_description(e))

问题是:函数什么complete_exception_description

I’ve seen a lot of posts about stack trace and exceptions in Python. But haven’t found what I need.

I have a chunk of Python 2.7 code that may raise an exception. I would like to catch it and assign to a string its full description and the stack trace that caused the error (simply all we use to see on the console). I need this string to print it to a text box in the GUI.

Something like this:

try:
    method_that_can_raise_an_exception(params)
except Exception as e:
    print_to_textbox(complete_exception_description(e))

The problem is: what is the function complete_exception_description?


回答 0

请参阅traceback模块,特别是format_exc()功能。在这里

import traceback

try:
    raise ValueError
except ValueError:
    tb = traceback.format_exc()
else:
    tb = "No error"
finally:
    print tb

See the traceback module, specifically the format_exc() function. Here.

import traceback

try:
    raise ValueError
except ValueError:
    tb = traceback.format_exc()
else:
    tb = "No error"
finally:
    print tb

回答 1

让我们创建一个相当复杂的堆栈跟踪,以证明我们获得了完整的堆栈跟踪:

def raise_error():
    raise RuntimeError('something bad happened!')

def do_something_that_might_error():
    raise_error()

记录完整的堆栈跟踪

最佳做法是为模块设置一个记录器。它将知道模块的名称,并能够更改级别(在其他属性(例如处理程序)中)

import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

我们可以使用此记录器来获取错误:

try:
    do_something_that_might_error()
except Exception as error:
    logger.exception(error)

哪个日志:

ERROR:__main__:something bad happened!
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 2, in do_something_that_might_error
  File "<stdin>", line 2, in raise_error
RuntimeError: something bad happened!

因此,我们得到与发生错误时相同的输出:

>>> do_something_that_might_error()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in do_something_that_might_error
  File "<stdin>", line 2, in raise_error
RuntimeError: something bad happened!

仅获取字符串

如果您真的只想要字符串,请改用traceback.format_exc函数,在此处演示如何记录字符串:

import traceback
try:
    do_something_that_might_error()
except Exception as error:
    just_the_string = traceback.format_exc()
    logger.debug(just_the_string)

哪个日志:

DEBUG:__main__:Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 2, in do_something_that_might_error
  File "<stdin>", line 2, in raise_error
RuntimeError: something bad happened!

Let’s create a decently complicated stacktrace, in order to demonstrate that we get the full stacktrace:

def raise_error():
    raise RuntimeError('something bad happened!')

def do_something_that_might_error():
    raise_error()

Logging the full stacktrace

A best practice is to have a logger set up for your module. It will know the name of the module and be able to change levels (among other attributes, such as handlers)

import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

And we can use this logger to get the error:

try:
    do_something_that_might_error()
except Exception as error:
    logger.exception(error)

Which logs:

ERROR:__main__:something bad happened!
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 2, in do_something_that_might_error
  File "<stdin>", line 2, in raise_error
RuntimeError: something bad happened!

And so we get the same output as when we have an error:

>>> do_something_that_might_error()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in do_something_that_might_error
  File "<stdin>", line 2, in raise_error
RuntimeError: something bad happened!

Getting just the string

If you really just want the string, use the traceback.format_exc function instead, demonstrating logging the string here:

import traceback
try:
    do_something_that_might_error()
except Exception as error:
    just_the_string = traceback.format_exc()
    logger.debug(just_the_string)

Which logs:

DEBUG:__main__:Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 2, in do_something_that_might_error
  File "<stdin>", line 2, in raise_error
RuntimeError: something bad happened!

回答 2

对于Python 3,以下代码将Exception完全按照使用traceback.format_exc()以下命令获得的格式格式化对象:

import traceback

try: 
    method_that_can_raise_an_exception(params)
except Exception as ex:
    print(''.join(traceback.format_exception(etype=type(ex), value=ex, tb=ex.__traceback__)))

优点是仅Exception需要对象(由于记录的__traceback__属性),因此可以更轻松地将其作为参数传递给另一个函数以进行进一步处理。

With Python 3, the following code will format an Exception object exactly as would be obtained using traceback.format_exc():

import traceback

try: 
    method_that_can_raise_an_exception(params)
except Exception as ex:
    print(''.join(traceback.format_exception(etype=type(ex), value=ex, tb=ex.__traceback__)))

The advantage being that only the Exception object is needed (thanks to the recorded __traceback__ attribute), and can therefore be more easily passed as an argument to another function for further processing.


回答 3

>>> import sys
>>> import traceback
>>> try:
...   5 / 0
... except ZeroDivisionError as e:
...   type_, value_, traceback_ = sys.exc_info()
>>> traceback.format_tb(traceback_)
['  File "<stdin>", line 2, in <module>\n']
>>> value_
ZeroDivisionError('integer division or modulo by zero',)
>>> type_
<type 'exceptions.ZeroDivisionError'>
>>>
>>> 5 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero

您使用sys.exc_info()来收集信息和traceback模块中的函数以对其进行格式化。 以下是一些格式化示例。

整个异常字符串位于:

>>> ex = traceback.format_exception(type_, value_, traceback_)
>>> ex
['Traceback (most recent call last):\n', '  File "<stdin>", line 2, in <module>\n', 'ZeroDivisionError: integer division or modulo by zero\n']
>>> import sys
>>> import traceback
>>> try:
...   5 / 0
... except ZeroDivisionError as e:
...   type_, value_, traceback_ = sys.exc_info()
>>> traceback.format_tb(traceback_)
['  File "<stdin>", line 2, in <module>\n']
>>> value_
ZeroDivisionError('integer division or modulo by zero',)
>>> type_
<type 'exceptions.ZeroDivisionError'>
>>>
>>> 5 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero

You use sys.exc_info() to collect the information and the functions in the traceback module to format it. Here are some examples for formatting it.

The whole exception string is at:

>>> ex = traceback.format_exception(type_, value_, traceback_)
>>> ex
['Traceback (most recent call last):\n', '  File "<stdin>", line 2, in <module>\n', 'ZeroDivisionError: integer division or modulo by zero\n']

回答 4

对于那些使用Python-3的人

使用tracebackmodule和exception.__traceback__one可以提取堆栈跟踪,如下所示:

  • 使用获取当前的堆栈跟踪traceback.extract_stack()
  • 删除最后三个元素(因为那些是使我进入调试功能的堆栈中的条目)
  • __traceback__使用以下方法从异常对象追加traceback.extract_tb()
  • 使用格式化整个东西 traceback.format_list()
import traceback
def exception_to_string(excp):
   stack = traceback.extract_stack()[:-3] + traceback.extract_tb(excp.__traceback__)  # add limit=?? 
   pretty = traceback.format_list(stack)
   return ''.join(pretty) + '\n  {} {}'.format(excp.__class__,excp)

一个简单的演示:

def foo():
    try:
        something_invalid()
    except Exception as e:
        print(exception_to_string(e))

def bar():
    return foo()

调用时,将得到以下输出bar()

  File "./test.py", line 57, in <module>
    bar()
  File "./test.py", line 55, in bar
    return foo()
  File "./test.py", line 50, in foo
    something_invalid()

  <class 'NameError'> name 'something_invalid' is not defined

For those using Python-3

Using traceback module and exception.__traceback__ one can extract the stack-trace as follows:

  • grab the current stack-trace using traceback.extract_stack()
  • remove the last three elements (as those are entries in the stack that got me to my debug function)
  • append the __traceback__ from the exception object using traceback.extract_tb()
  • format the whole thing using traceback.format_list()
import traceback
def exception_to_string(excp):
   stack = traceback.extract_stack()[:-3] + traceback.extract_tb(excp.__traceback__)  # add limit=?? 
   pretty = traceback.format_list(stack)
   return ''.join(pretty) + '\n  {} {}'.format(excp.__class__,excp)

A simple demonstration:

def foo():
    try:
        something_invalid()
    except Exception as e:
        print(exception_to_string(e))

def bar():
    return foo()

We get the following output when we call bar():

  File "./test.py", line 57, in <module>
    bar()
  File "./test.py", line 55, in bar
    return foo()
  File "./test.py", line 50, in foo
    something_invalid()

  <class 'NameError'> name 'something_invalid' is not defined

回答 5

您可能还会考虑使用内置的Python模块cgitb来获取一些非常好且格式正确的异常信息,包括局部变量值,源代码上下文,函数参数等。

例如此代码…

import cgitb
cgitb.enable(format='text')

def func2(a, divisor):
    return a / divisor

def func1(a, b):
    c = b - 5
    return func2(a, c)

func1(1, 5)

我们得到这个异常输出…

ZeroDivisionError
Python 3.4.2: C:\tools\python\python.exe
Tue Sep 22 15:29:33 2015

A problem occurred in a Python script.  Here is the sequence of
function calls leading up to the error, in the order they occurred.

 c:\TEMP\cgittest2.py in <module>()
    7 def func1(a, b):
    8   c = b - 5
    9   return func2(a, c)
   10
   11 func1(1, 5)
func1 = <function func1>

 c:\TEMP\cgittest2.py in func1(a=1, b=5)
    7 def func1(a, b):
    8   c = b - 5
    9   return func2(a, c)
   10
   11 func1(1, 5)
global func2 = <function func2>
a = 1
c = 0

 c:\TEMP\cgittest2.py in func2(a=1, divisor=0)
    3
    4 def func2(a, divisor):
    5   return a / divisor
    6
    7 def func1(a, b):
a = 1
divisor = 0
ZeroDivisionError: division by zero
    __cause__ = None
    __class__ = <class 'ZeroDivisionError'>
    __context__ = None
    __delattr__ = <method-wrapper '__delattr__' of ZeroDivisionError object>
    __dict__ = {}
    __dir__ = <built-in method __dir__ of ZeroDivisionError object>
    __doc__ = 'Second argument to a division or modulo operation was zero.'
    __eq__ = <method-wrapper '__eq__' of ZeroDivisionError object>
    __format__ = <built-in method __format__ of ZeroDivisionError object>
    __ge__ = <method-wrapper '__ge__' of ZeroDivisionError object>
    __getattribute__ = <method-wrapper '__getattribute__' of ZeroDivisionError object>
    __gt__ = <method-wrapper '__gt__' of ZeroDivisionError object>
    __hash__ = <method-wrapper '__hash__' of ZeroDivisionError object>
    __init__ = <method-wrapper '__init__' of ZeroDivisionError object>
    __le__ = <method-wrapper '__le__' of ZeroDivisionError object>
    __lt__ = <method-wrapper '__lt__' of ZeroDivisionError object>
    __ne__ = <method-wrapper '__ne__' of ZeroDivisionError object>
    __new__ = <built-in method __new__ of type object>
    __reduce__ = <built-in method __reduce__ of ZeroDivisionError object>
    __reduce_ex__ = <built-in method __reduce_ex__ of ZeroDivisionError object>
    __repr__ = <method-wrapper '__repr__' of ZeroDivisionError object>
    __setattr__ = <method-wrapper '__setattr__' of ZeroDivisionError object>
    __setstate__ = <built-in method __setstate__ of ZeroDivisionError object>
    __sizeof__ = <built-in method __sizeof__ of ZeroDivisionError object>
    __str__ = <method-wrapper '__str__' of ZeroDivisionError object>
    __subclasshook__ = <built-in method __subclasshook__ of type object>
    __suppress_context__ = False
    __traceback__ = <traceback object>
    args = ('division by zero',)
    with_traceback = <built-in method with_traceback of ZeroDivisionError object>

The above is a description of an error in a Python program.  Here is
the original traceback:

Traceback (most recent call last):
  File "cgittest2.py", line 11, in <module>
    func1(1, 5)
  File "cgittest2.py", line 9, in func1
    return func2(a, c)
  File "cgittest2.py", line 5, in func2
    return a / divisor
ZeroDivisionError: division by zero

You might also consider using the built-in Python module, cgitb, to get some really good, nicely formatted exception information including local variable values, source code context, function parameters etc..

For instance for this code…

import cgitb
cgitb.enable(format='text')

def func2(a, divisor):
    return a / divisor

def func1(a, b):
    c = b - 5
    return func2(a, c)

func1(1, 5)

we get this exception output…

ZeroDivisionError
Python 3.4.2: C:\tools\python\python.exe
Tue Sep 22 15:29:33 2015

A problem occurred in a Python script.  Here is the sequence of
function calls leading up to the error, in the order they occurred.

 c:\TEMP\cgittest2.py in <module>()
    7 def func1(a, b):
    8   c = b - 5
    9   return func2(a, c)
   10
   11 func1(1, 5)
func1 = <function func1>

 c:\TEMP\cgittest2.py in func1(a=1, b=5)
    7 def func1(a, b):
    8   c = b - 5
    9   return func2(a, c)
   10
   11 func1(1, 5)
global func2 = <function func2>
a = 1
c = 0

 c:\TEMP\cgittest2.py in func2(a=1, divisor=0)
    3
    4 def func2(a, divisor):
    5   return a / divisor
    6
    7 def func1(a, b):
a = 1
divisor = 0
ZeroDivisionError: division by zero
    __cause__ = None
    __class__ = <class 'ZeroDivisionError'>
    __context__ = None
    __delattr__ = <method-wrapper '__delattr__' of ZeroDivisionError object>
    __dict__ = {}
    __dir__ = <built-in method __dir__ of ZeroDivisionError object>
    __doc__ = 'Second argument to a division or modulo operation was zero.'
    __eq__ = <method-wrapper '__eq__' of ZeroDivisionError object>
    __format__ = <built-in method __format__ of ZeroDivisionError object>
    __ge__ = <method-wrapper '__ge__' of ZeroDivisionError object>
    __getattribute__ = <method-wrapper '__getattribute__' of ZeroDivisionError object>
    __gt__ = <method-wrapper '__gt__' of ZeroDivisionError object>
    __hash__ = <method-wrapper '__hash__' of ZeroDivisionError object>
    __init__ = <method-wrapper '__init__' of ZeroDivisionError object>
    __le__ = <method-wrapper '__le__' of ZeroDivisionError object>
    __lt__ = <method-wrapper '__lt__' of ZeroDivisionError object>
    __ne__ = <method-wrapper '__ne__' of ZeroDivisionError object>
    __new__ = <built-in method __new__ of type object>
    __reduce__ = <built-in method __reduce__ of ZeroDivisionError object>
    __reduce_ex__ = <built-in method __reduce_ex__ of ZeroDivisionError object>
    __repr__ = <method-wrapper '__repr__' of ZeroDivisionError object>
    __setattr__ = <method-wrapper '__setattr__' of ZeroDivisionError object>
    __setstate__ = <built-in method __setstate__ of ZeroDivisionError object>
    __sizeof__ = <built-in method __sizeof__ of ZeroDivisionError object>
    __str__ = <method-wrapper '__str__' of ZeroDivisionError object>
    __subclasshook__ = <built-in method __subclasshook__ of type object>
    __suppress_context__ = False
    __traceback__ = <traceback object>
    args = ('division by zero',)
    with_traceback = <built-in method with_traceback of ZeroDivisionError object>

The above is a description of an error in a Python program.  Here is
the original traceback:

Traceback (most recent call last):
  File "cgittest2.py", line 11, in <module>
    func1(1, 5)
  File "cgittest2.py", line 9, in func1
    return func2(a, c)
  File "cgittest2.py", line 5, in func2
    return a / divisor
ZeroDivisionError: division by zero

回答 6

如果您希望在不处理异常的情况下获得相同的信息,则可以执行以下操作。这样做import traceback,然后:

try:
    ...
except Exception as e:
    print(traceback.print_tb(e.__traceback__))

我正在使用Python 3.7。

If you would like to get the same information given when an exception isn’t handled you can do something like this. Do import traceback and then:

try:
    ...
except Exception as e:
    print(traceback.print_tb(e.__traceback__))

I’m using Python 3.7.


回答 7

对于Python 3.5+

因此,您可以从您的异常以及其他任何异常中获取stacktrace。使用traceback.TracebackException它(只需更换ex您的除外):

print("".join(traceback.TracebackException.from_exception(ex).format())

一个扩展的示例和其他功能可以做到这一点:

import traceback

try:
    1/0
except Exception as ex:
    print("".join(traceback.TracebackException.from_exception(ex).format()) == traceback.format_exc() == "".join(traceback.format_exception(type(ex), ex, ex.__traceback__))) # This is True !!
    print("".join(traceback.TracebackException.from_exception(ex).format()))

输出将是这样的:

True
Traceback (most recent call last):
  File "untidsfsdfsdftled.py", line 29, in <module>
    1/0
ZeroDivisionError: division by zero

For Python 3.5+:

So, you can get the stacktrace from your exception as from any other exception. Use traceback.TracebackException for it (just replace ex with your exception):

print("".join(traceback.TracebackException.from_exception(ex).format())

An extended example and other features to do this:

import traceback

try:
    1/0
except Exception as ex:
    print("".join(traceback.TracebackException.from_exception(ex).format()) == traceback.format_exc() == "".join(traceback.format_exception(type(ex), ex, ex.__traceback__))) # This is True !!
    print("".join(traceback.TracebackException.from_exception(ex).format()))

The output will be something like this:

True
Traceback (most recent call last):
  File "untidsfsdfsdftled.py", line 29, in <module>
    1/0
ZeroDivisionError: division by zero

回答 8

我的2美分:

import sys, traceback
try: 
  ...
except Exception, e:
  T, V, TB = sys.exc_info()
  print ''.join(traceback.format_exception(T,V,TB))

my 2-cents:

import sys, traceback
try: 
  ...
except Exception, e:
  T, V, TB = sys.exc_info()
  print ''.join(traceback.format_exception(T,V,TB))

回答 9

如果您的目标是使异常和stacktrace消息看起来完全像python引发错误时一样,则以下内容在python 2 + 3中均适用:

import sys, traceback


def format_stacktrace():
    parts = ["Traceback (most recent call last):\n"]
    parts.extend(traceback.format_stack(limit=25)[:-2])
    parts.extend(traceback.format_exception(*sys.exc_info())[1:])
    return "".join(parts)

# EXAMPLE BELOW...

def a():
    b()


def b():
    c()


def c():
    d()


def d():
    assert False, "Noooh don't do it."


print("THIS IS THE FORMATTED STRING")
print("============================\n")

try:
    a()
except:
    stacktrace = format_stacktrace()
    print(stacktrace)

print("THIS IS HOW PYTHON DOES IT")
print("==========================\n")
a()

它通过format_stacktrace()从堆栈中删除最后一个调用并加入其余调用来工作。运行时,以上示例给出以下输出:

THIS IS THE FORMATTED STRING
============================

Traceback (most recent call last):
  File "test.py", line 31, in <module>
    a()
  File "test.py", line 12, in a
    b()
  File "test.py", line 16, in b
    c()
  File "test.py", line 20, in c
    d()
  File "test.py", line 24, in d
    assert False, "Noooh don't do it."
AssertionError: Noooh don't do it.

THIS IS HOW PYTHON DOES IT
==========================

Traceback (most recent call last):
  File "test.py", line 38, in <module>
    a()
  File "test.py", line 12, in a
    b()
  File "test.py", line 16, in b
    c()
  File "test.py", line 20, in c
    d()
  File "test.py", line 24, in d
    assert False, "Noooh don't do it."
AssertionError: Noooh don't do it.

If your goal is to make the exception and stacktrace message look exactly like when python throws an error, the following works in both python 2+3:

import sys, traceback


def format_stacktrace():
    parts = ["Traceback (most recent call last):\n"]
    parts.extend(traceback.format_stack(limit=25)[:-2])
    parts.extend(traceback.format_exception(*sys.exc_info())[1:])
    return "".join(parts)

# EXAMPLE BELOW...

def a():
    b()


def b():
    c()


def c():
    d()


def d():
    assert False, "Noooh don't do it."


print("THIS IS THE FORMATTED STRING")
print("============================\n")

try:
    a()
except:
    stacktrace = format_stacktrace()
    print(stacktrace)

print("THIS IS HOW PYTHON DOES IT")
print("==========================\n")
a()

It works by removing the last format_stacktrace() call from the stack and joining the rest. When run, the example above gives the following output:

THIS IS THE FORMATTED STRING
============================

Traceback (most recent call last):
  File "test.py", line 31, in <module>
    a()
  File "test.py", line 12, in a
    b()
  File "test.py", line 16, in b
    c()
  File "test.py", line 20, in c
    d()
  File "test.py", line 24, in d
    assert False, "Noooh don't do it."
AssertionError: Noooh don't do it.

THIS IS HOW PYTHON DOES IT
==========================

Traceback (most recent call last):
  File "test.py", line 38, in <module>
    a()
  File "test.py", line 12, in a
    b()
  File "test.py", line 16, in b
    c()
  File "test.py", line 20, in c
    d()
  File "test.py", line 24, in d
    assert False, "Noooh don't do it."
AssertionError: Noooh don't do it.

回答 10

我定义了以下帮助程序类:

import traceback
class TracedExeptions(object):
    def __init__(self):
        pass
    def __enter__(self):
        pass

    def __exit__(self, etype, value, tb):
      if value :
        if not hasattr(value, 'traceString'):
          value.traceString = "\n".join(traceback.format_exception(etype, value, tb))
        return False
      return True

我以后可以这样使用:

with TracedExeptions():
  #some-code-which-might-throw-any-exception

以后可以像这样消费它:

def log_err(ex):
  if hasattr(ex, 'traceString'):
    print("ERROR:{}".format(ex.traceString));
  else:
    print("ERROR:{}".format(ex));

(背景:我很沮丧,因为将Promises与s一起使用Exception,不幸的是将一个地方引发的异常传递给另一个地方的on_rejected处理程序,因此很难从原始位置进行回溯)

I defined following helper class:

import traceback
class TracedExeptions(object):
    def __init__(self):
        pass
    def __enter__(self):
        pass

    def __exit__(self, etype, value, tb):
      if value :
        if not hasattr(value, 'traceString'):
          value.traceString = "\n".join(traceback.format_exception(etype, value, tb))
        return False
      return True

Which I can later use like this:

with TracedExeptions():
  #some-code-which-might-throw-any-exception

And later can consume it like this:

def log_err(ex):
  if hasattr(ex, 'traceString'):
    print("ERROR:{}".format(ex.traceString));
  else:
    print("ERROR:{}".format(ex));

(Background: I was frustraded because of using Promises together with Exceptions, which unfortunately passes exceptions raised in one place to a on_rejected handler in another place, and thus it is difficult to get the traceback from original location)


SQLAlchemy:flush()和commit()有什么区别?

问题:SQLAlchemy:flush()和commit()有什么区别?

flush()commit()SQLAlchemy 之间有什么区别?

我已经阅读了文档,但没有一个更明智-他们似乎假设了我没有的预见性。

我对它们对内存使用量的影响特别感兴趣。我正在从一系列文件(总共约500万行)中将一些数据加载到数据库中,而我的会话有时会崩溃-这是一个大型数据库,并且一台内存不足的机器。

我想知道我使用的电话太多commit()还是不够flush()-但是如果不真正了解两者之间的区别,很难分辨!

What the difference is between flush() and commit() in SQLAlchemy?

I’ve read the docs, but am none the wiser – they seem to assume a pre-understanding that I don’t have.

I’m particularly interested in their impact on memory usage. I’m loading some data into a database from a series of files (around 5 million rows in total) and my session is occasionally falling over – it’s a large database and a machine with not much memory.

I’m wondering if I’m using too many commit() and not enough flush() calls – but without really understanding what the difference is, it’s hard to tell!


回答 0

会话对象基本上是数据库更改(更新,插入,删除)的正在进行的事务。这些操作在提交之前不会持久化到数据库中(如果您的程序在会话中事务中由于某种原因中止,则其中所有未提交的更改都将丢失)。

会话对象向注册了事务操作session.add(),但是直到session.flush()调用它之前,它们都不会将它们传递给数据库。

session.flush()将一系列操作传达给数据库(插入,更新,删除)。数据库将它们维护为事务中的挂起操作。直到数据库收到当前事务的COMMIT为止,session.commit()所做的更改才会永久保留在磁盘上,或对其他事务可见。

session.commit() 将这些更改提交(持久)到数据库。

flush()始终称为一个呼叫的一部分commit()1)。

当您使用Session对象查询数据库时,查询将同时从数据库及其所保存的未提交事务的已刷新部分返回结果。默认情况下,Session反对autoflush其操作,但是可以禁用它。

希望这个例子可以使这个更加清楚:

#---
s = Session()

s.add(Foo('A')) # The Foo('A') object has been added to the session.
                # It has not been committed to the database yet,
                #   but is returned as part of a query.
print 1, s.query(Foo).all()
s.commit()

#---
s2 = Session()
s2.autoflush = False

s2.add(Foo('B'))
print 2, s2.query(Foo).all() # The Foo('B') object is *not* returned
                             #   as part of this query because it hasn't
                             #   been flushed yet.
s2.flush()                   # Now, Foo('B') is in the same state as
                             #   Foo('A') was above.
print 3, s2.query(Foo).all() 
s2.rollback()                # Foo('B') has not been committed, and rolling
                             #   back the session's transaction removes it
                             #   from the session.
print 4, s2.query(Foo).all()

#---
Output:
1 [<Foo('A')>]
2 [<Foo('A')>]
3 [<Foo('A')>, <Foo('B')>]
4 [<Foo('A')>]

A Session object is basically an ongoing transaction of changes to a database (update, insert, delete). These operations aren’t persisted to the database until they are committed (if your program aborts for some reason in mid-session transaction, any uncommitted changes within are lost).

The session object registers transaction operations with session.add(), but doesn’t yet communicate them to the database until session.flush() is called.

session.flush() communicates a series of operations to the database (insert, update, delete). The database maintains them as pending operations in a transaction. The changes aren’t persisted permanently to disk, or visible to other transactions until the database receives a COMMIT for the current transaction (which is what session.commit() does).

session.commit() commits (persists) those changes to the database.

flush() is always called as part of a call to commit() (1).

When you use a Session object to query the database, the query will return results both from the database and from the flushed parts of the uncommitted transaction it holds. By default, Session objects autoflush their operations, but this can be disabled.

Hopefully this example will make this clearer:

#---
s = Session()

s.add(Foo('A')) # The Foo('A') object has been added to the session.
                # It has not been committed to the database yet,
                #   but is returned as part of a query.
print 1, s.query(Foo).all()
s.commit()

#---
s2 = Session()
s2.autoflush = False

s2.add(Foo('B'))
print 2, s2.query(Foo).all() # The Foo('B') object is *not* returned
                             #   as part of this query because it hasn't
                             #   been flushed yet.
s2.flush()                   # Now, Foo('B') is in the same state as
                             #   Foo('A') was above.
print 3, s2.query(Foo).all() 
s2.rollback()                # Foo('B') has not been committed, and rolling
                             #   back the session's transaction removes it
                             #   from the session.
print 4, s2.query(Foo).all()

#---
Output:
1 [<Foo('A')>]
2 [<Foo('A')>]
3 [<Foo('A')>, <Foo('B')>]
4 [<Foo('A')>]

回答 1

正如@snapshoe所说

flush() 将您的SQL语句发送到数据库

commit() 提交交易。

时间session.autocommit == False

commit()flush()如果您设置,会打电话autoflush == True

时间session.autocommit == True

commit()如果您尚未开始交易,则无法调用(您可能没有,因为您可能仅使用此模式来避免手动管理交易)。

在这种模式下,您必须调用flush()以保存您的ORM更改。刷新还会有效地提交您的数据。

As @snapshoe says

flush() sends your SQL statements to the database

commit() commits the transaction.

When session.autocommit == False:

commit() will call flush() if you set autoflush == True.

When session.autocommit == True:

You can’t call commit() if you haven’t started a transaction (which you probably haven’t since you would probably only use this mode to avoid manually managing transactions).

In this mode, you must call flush() to save your ORM changes. The flush effectively also commits your data.


回答 2

如果可以提交,为什么还要刷新?

作为刚接触数据库和sqlalchemy的新手,以前的答案- flush()将SQL语句发送到数据库并commit()保留它们-对我来说还不清楚。这些定义很有意义,但是从定义中并不清楚为什么您要使用冲洗而不是仅仅提交。

由于提交始终会刷新(https://docs.sqlalchemy.org/en/13/orm/session_basics.html#committing),因此听起来确实很相似。我认为要强调的大问题是,刷新不是永久的并且可以撤消,而提交是永久的,在某种意义上,您不能要求数据库撤消上一次提交(我认为)

@snapshoe突出显示,如果要查询数据库并获取包含新添加对象的结果,则需要先刷新(或提交,然后将为您刷新)。也许这对某些人有用,尽管我不确定为什么您要刷新而不是提交(除了可以撤消的琐碎回答之外)。

在另一个示例中,我正在本地数据库和远程服务器之间同步文档,并且如果用户决定取消,则应撤消所有添加/更新/删除操作(即,不进行部分同步,仅进行完全同步)。更新单个文档时,我决定只删除旧行,然后从远程服务器添加更新的版本。事实证明,由于sqlalchemy的编写方式,不能保证提交时的操作顺序。这导致添加了一个重复版本(在尝试删除旧版本之前),这导致数据库无法通过唯一约束。为了解决这个问题,我使用flush()了顺序,但是如果以后同步过程失败,我仍然可以撤消操作。

在以下位置查看我的帖子:在sqlalchemy中提交时,添加和删除是否有任何顺序

同样,有人想知道提交时是否保持添加顺序,即如果我先添加object1然后添加object2是否在将对象添加到会话时将SQLAlchemy保存object1到数据库之前object2

同样,这里假定使用flush()将确保所需的行为。因此,总而言之,刷新的一种用途是提供顺序保证(我认为),同时又允许自己使用提交不提供的“撤消”选项。

自动刷新和自动提交

注意,自动刷新可用于确保查询对更新的数据库起作用,因为sqlalchemy将在执行查询之前刷新。https://docs.sqlalchemy.org/zh/13/orm/session_api.html#sqlalchemy.orm.session.Session.params.autoflush

自动提交是我尚不完全了解的其他东西,但听起来不鼓励使用它:https : //docs.sqlalchemy.org/en/13/orm/session_api.html#sqlalchemy.orm.session.Session.params。自动提交

内存使用情况

现在,最初的问题实际上是想了解刷新与提交对内存的影响。由于持久性或不持久性是数据库提供的(我认为),因此仅刷新就足以卸载数据库-尽管如果您不关心撤消操作,提交也不会受到损害(实际上可能会有所帮助-见下文)。 。

sqlalchemy对已刷新的对象使用弱引用:https : //docs.sqlalchemy.org/en/13/orm/session_state_management.html#session-referencing-behavior

这意味着,如果您没有将对象明确保留在某个地方(例如列表或dict中),则sqlalchemy不会将其保留在内存中。

但是,这时您需要担心数据库方面的问题。可能在未提交的情况下进行刷新会带来一些内存损失,以维护事务。同样,我对此并不陌生,但是这里的链接似乎恰恰表明了这一点:https : //stackoverflow.com/a/15305650/764365

换句话说,尽管应该在内存和性能之间进行权衡,但是提交应该减少内存的使用。换句话说,您可能不想一次提交每一个数据库更改(出于性能原因),但是等待太长时间会增加内存使用率。

Why flush if you can commit?

As someone new to working with databases and sqlalchemy, the previous answers – that flush() sends SQL statements to the DB and commit() persists them – were not clear to me. The definitions make sense but it isn’t immediately clear from the definitions why you would use a flush instead of just committing.

Since a commit always flushes (https://docs.sqlalchemy.org/en/13/orm/session_basics.html#committing) these sound really similar. I think the big issue to highlight is that a flush is not permanent and can be undone, whereas a commit is permanent, in the sense that you can’t ask the database to undo the last commit (I think)

@snapshoe highlights that if you want to query the database and get results that include newly added objects, you need to have flushed first (or committed, which will flush for you). Perhaps this is useful for some people although I’m not sure why you would want to flush rather than commit (other than the trivial answer that it can be undone).

In another example I was syncing documents between a local DB and a remote server, and if the user decided to cancel, all adds/updates/deletes should be undone (i.e. no partial sync, only a full sync). When updating a single document I’ve decided to simply delete the old row and add the updated version from the remote server. It turns out that due to the way sqlalchemy is written, order of operations when committing is not guaranteed. This resulted in adding a duplicate version (before attempting to delete the old one), which resulted in the DB failing a unique constraint. To get around this I used flush() so that order was maintained, but I could still undo if later the sync process failed.

See my post on this at: Is there any order for add versus delete when committing in sqlalchemy

Similarly, someone wanted to know whether add order is maintained when committing, i.e. if I add object1 then add object2, does object1 get added to the database before object2 Does SQLAlchemy save order when adding objects to session?

Again, here presumably the use of a flush() would ensure the desired behavior. So in summary, one use for flush is to provide order guarantees (I think), again while still allowing yourself an “undo” option that commit does not provide.

Autoflush and Autocommit

Note, autoflush can be used to ensure queries act on an updated database as sqlalchemy will flush before executing the query. https://docs.sqlalchemy.org/en/13/orm/session_api.html#sqlalchemy.orm.session.Session.params.autoflush

Autocommit is something else that I don’t completely understand but it sounds like its use is discouraged: https://docs.sqlalchemy.org/en/13/orm/session_api.html#sqlalchemy.orm.session.Session.params.autocommit

Memory Usage

Now the original question actually wanted to know about the impact of flush vs. commit for memory purposes. As the ability to persist or not is something the database offers (I think), simply flushing should be sufficient to offload to the database – although committing shouldn’t hurt (actually probably helps – see below) if you don’t care about undoing.

sqlalchemy uses weak referencing for objects that have been flushed: https://docs.sqlalchemy.org/en/13/orm/session_state_management.html#session-referencing-behavior

This means if you don’t have an object explicitly held onto somewhere, like in a list or dict, sqlalchemy won’t keep it in memory.

However, then you have the database side of things to worry about. Presumably flushing without committing comes with some memory penalty to maintain the transaction. Again, I’m new to this but here’s a link that seems to suggest exactly this: https://stackoverflow.com/a/15305650/764365

In other words, commits should reduce memory usage, although presumably there is a trade-off between memory and performance here. In other words, you probably don’t want to commit every single database change, one at a time (for performance reasons), but waiting too long will increase memory usage.


回答 3

这并不能严格回答最初的问题,但是有些人提到session.autoflush = True不必与您一起使用session.flush()……而且并非总是如此。

如果要在事务中间使用新创建的对象的ID,则必须调用session.flush()

# Given a model with at least this id
class AModel(Base):
   id = Column(Integer, primary_key=True)  # autoincrement by default on integer primary key

session.autoflush = True

a = AModel()
session.add(a)
a.id  # None
session.flush()
a.id  # autoincremented integer

这是因为autoflush自动填写ID(尽管对象的查询将,这有时会导致混乱,如“为什么这个作品在这里,但不是吗?”但是,snapshoe已涵盖这一部分)。


一个相关方面对我来说似乎很重要,但并未真正提及:

为什么不一直承诺呢?-答案是原子性

可以这么说:所有操作必须全部成功执行,否则任何一个都不会生效。

例如,如果要创建/更新/删除某个对象(A),然后创建/更新/删除另一个对象(B),但是如果(B)失败,则要还原(A)。这意味着这两个操作是原子操作。

因此,如果(B)需要(A)的结果,则要flush在(A)commit之后和(B)之后调用。

另外,如果是session.autoflush is True,除了我上面提到的情况或Jimbo的回答中的其他情况外,您将不需要flush手动调用。

This does not strictly answer the original question but some people have mentioned that with session.autoflush = True you don’t have to use session.flush()… And this is not always true.

If you want to use the id of a newly created object in the middle of a transaction, you must call session.flush().

# Given a model with at least this id
class AModel(Base):
   id = Column(Integer, primary_key=True)  # autoincrement by default on integer primary key

session.autoflush = True

a = AModel()
session.add(a)
a.id  # None
session.flush()
a.id  # autoincremented integer

This is because autoflush does NOT auto fill the id (although a query of the object will, which sometimes can cause confusion as in “why this works here but not there?” But snapshoe already covered this part).


One related aspect that seems pretty important to me and wasn’t really mentioned:

Why would you not commit all the time? – The answer is atomicity.

A fancy word to say: an ensemble of operations have to all be executed successfully OR none of them will take effect.

For example, if you want to create/update/delete some object (A) and then create/update/delete another (B), but if (B) fails you want to revert (A). This means those 2 operations are atomic.

Therefore, if (B) needs a result of (A), you want to call flush after (A) and commit after (B).

Also, if session.autoflush is True, except for the case that I mentioned above or others in Jimbo‘s answer, you will not need to call flush manually.


在Python类中支持等价(“平等”)的优雅方法

问题:在Python类中支持等价(“平等”)的优雅方法

编写自定义类时,通过==!=运算符允许等效性通常很重要。在Python中,这可以通过分别实现__eq____ne__特殊方法来实现。我发现执行此操作的最简单方法是以下方法:

class Foo:
    def __init__(self, item):
        self.item = item

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.__dict__ == other.__dict__
        else:
            return False

    def __ne__(self, other):
        return not self.__eq__(other)

您知道这样做更优雅的方法吗?您知道使用上述__dict__s 比较方法有什么特别的缺点吗?

注意:需要澄清一点-当__eq____ne__未定义时,您会发现以下行为:

>>> a = Foo(1)
>>> b = Foo(1)
>>> a is b
False
>>> a == b
False

也就是说,a == b评估为False因为它确实运行了a is b,所以对身份进行了测试(即“ ab?是同一对象”)。

__eq____ne__定义,你会发现这种行为(这是一个我们后):

>>> a = Foo(1)
>>> b = Foo(1)
>>> a is b
False
>>> a == b
True

When writing custom classes it is often important to allow equivalence by means of the == and != operators. In Python, this is made possible by implementing the __eq__ and __ne__ special methods, respectively. The easiest way I’ve found to do this is the following method:

class Foo:
    def __init__(self, item):
        self.item = item

    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.__dict__ == other.__dict__
        else:
            return False

    def __ne__(self, other):
        return not self.__eq__(other)

Do you know of more elegant means of doing this? Do you know of any particular disadvantages to using the above method of comparing __dict__s?

Note: A bit of clarification–when __eq__ and __ne__ are undefined, you’ll find this behavior:

>>> a = Foo(1)
>>> b = Foo(1)
>>> a is b
False
>>> a == b
False

That is, a == b evaluates to False because it really runs a is b, a test of identity (i.e., “Is a the same object as b?”).

When __eq__ and __ne__ are defined, you’ll find this behavior (which is the one we’re after):

>>> a = Foo(1)
>>> b = Foo(1)
>>> a is b
False
>>> a == b
True

回答 0

考虑这个简单的问题:

class Number:

    def __init__(self, number):
        self.number = number


n1 = Number(1)
n2 = Number(1)

n1 == n2 # False -- oops

因此,默认情况下,Python使用对象标识符进行比较操作:

id(n1) # 140400634555856
id(n2) # 140400634555920

覆盖__eq__函数似乎可以解决问题:

def __eq__(self, other):
    """Overrides the default implementation"""
    if isinstance(other, Number):
        return self.number == other.number
    return False


n1 == n2 # True
n1 != n2 # True in Python 2 -- oops, False in Python 3

Python 2中,请始终记住也要重写该__ne__函数,如文档所述:

比较运算符之间没有隐含的关系。的真相x==y并不意味着那x!=y是错误的。因此,在定义时__eq__(),还应该定义一个,__ne__()以便操作符能够按预期运行。

def __ne__(self, other):
    """Overrides the default implementation (unnecessary in Python 3)"""
    return not self.__eq__(other)


n1 == n2 # True
n1 != n2 # False

Python 3中,不再需要这样做,因为文档指出:

默认情况下,除非为,否则将__ne__()委托给__eq__()结果并将其取反NotImplemented。比较运算符之间没有其他隐含关系,例如,的真相(x<y or x==y)并不意味着x<=y

但这不能解决我们所有的问题。让我们添加一个子类:

class SubNumber(Number):
    pass


n3 = SubNumber(1)

n1 == n3 # False for classic-style classes -- oops, True for new-style classes
n3 == n1 # True
n1 != n3 # True for classic-style classes -- oops, False for new-style classes
n3 != n1 # False

注意: Python 2有两种类:

  • 经典样式(或旧样式)类,它们继承自object,并声明为class A:class A():或者经典样式类class A(B):在哪里B

  • 新样式类,那些从继承object和声明为class A(object)class A(B):其中B一个新式类。Python 3中只被声明为新的样式类class A:class A(object):class A(B):

对于经典风格的类,比较操作始终调用第一个操作数的方法,而对于新风格的类,则始终调用子类操作数的方法,而不管操作数的顺序如何

所以在这里,如果Number是经典样式的类:

  • n1 == n3电话n1.__eq__;
  • n3 == n1电话n3.__eq__;
  • n1 != n3电话n1.__ne__;
  • n3 != n1来电n3.__ne__

如果Number是一个新式类:

  • 双方n1 == n3n3 == n1打电话n3.__eq__;
  • n1 != n3n3 != n1打电话n3.__ne__

要解决Python 2经典样式类的==!=运算符的不可交换性问题,当不支持操作数类型时,__eq____ne__方法应返回NotImplemented值。该文档NotImplemented值定义为:

如果数字方法和丰富比较方法未实现所提供操作数的操作,则可能返回此值。(然后,解释程序将根据操作员尝试执行反射操作或其他回退。)其真实值是true。

在这种情况下操作者的代表的比较操作的反射的方法的的其他操作数。该文档将反映的方法定义为:

这些方法没有交换参数版本(当左参数不支持该操作但右参数支持该操作时使用);相反,__lt__()and __gt__()是彼此的反射,__le__()and __ge__()是彼此的反射,and __eq__()and __ne__()是自己的反射。

结果看起来像这样:

def __eq__(self, other):
    """Overrides the default implementation"""
    if isinstance(other, Number):
        return self.number == other.number
    return NotImplemented

def __ne__(self, other):
    """Overrides the default implementation (unnecessary in Python 3)"""
    x = self.__eq__(other)
    if x is NotImplemented:
        return NotImplemented
    return not x

如果操作数是不相关的类型(无继承),如果需要and 运算符的可交换性,那么即使对于新式类,也要返回NotImplemented值而不是False正确的做法。==!=

我们到了吗?不完全的。我们有多少个唯一数字?

len(set([n1, n2, n3])) # 3 -- oops

集合使用对象的哈希值,默认情况下,Python返回对象标识符的哈希值。让我们尝试覆盖它:

def __hash__(self):
    """Overrides the default implementation"""
    return hash(tuple(sorted(self.__dict__.items())))

len(set([n1, n2, n3])) # 1

最终结果如下所示(我在末尾添加了一些断言以进行验证):

class Number:

    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        """Overrides the default implementation"""
        if isinstance(other, Number):
            return self.number == other.number
        return NotImplemented

    def __ne__(self, other):
        """Overrides the default implementation (unnecessary in Python 3)"""
        x = self.__eq__(other)
        if x is not NotImplemented:
            return not x
        return NotImplemented

    def __hash__(self):
        """Overrides the default implementation"""
        return hash(tuple(sorted(self.__dict__.items())))


class SubNumber(Number):
    pass


n1 = Number(1)
n2 = Number(1)
n3 = SubNumber(1)
n4 = SubNumber(4)

assert n1 == n2
assert n2 == n1
assert not n1 != n2
assert not n2 != n1

assert n1 == n3
assert n3 == n1
assert not n1 != n3
assert not n3 != n1

assert not n1 == n4
assert not n4 == n1
assert n1 != n4
assert n4 != n1

assert len(set([n1, n2, n3, ])) == 1
assert len(set([n1, n2, n3, n4])) == 2

Consider this simple problem:

class Number:

    def __init__(self, number):
        self.number = number


n1 = Number(1)
n2 = Number(1)

n1 == n2 # False -- oops

So, Python by default uses the object identifiers for comparison operations:

id(n1) # 140400634555856
id(n2) # 140400634555920

Overriding the __eq__ function seems to solve the problem:

def __eq__(self, other):
    """Overrides the default implementation"""
    if isinstance(other, Number):
        return self.number == other.number
    return False


n1 == n2 # True
n1 != n2 # True in Python 2 -- oops, False in Python 3

In Python 2, always remember to override the __ne__ function as well, as the documentation states:

There are no implied relationships among the comparison operators. The truth of x==y does not imply that x!=y is false. Accordingly, when defining __eq__(), one should also define __ne__() so that the operators will behave as expected.

def __ne__(self, other):
    """Overrides the default implementation (unnecessary in Python 3)"""
    return not self.__eq__(other)


n1 == n2 # True
n1 != n2 # False

In Python 3, this is no longer necessary, as the documentation states:

By default, __ne__() delegates to __eq__() and inverts the result unless it is NotImplemented. There are no other implied relationships among the comparison operators, for example, the truth of (x<y or x==y) does not imply x<=y.

But that does not solve all our problems. Let’s add a subclass:

class SubNumber(Number):
    pass


n3 = SubNumber(1)

n1 == n3 # False for classic-style classes -- oops, True for new-style classes
n3 == n1 # True
n1 != n3 # True for classic-style classes -- oops, False for new-style classes
n3 != n1 # False

Note: Python 2 has two kinds of classes:

  • classic-style (or old-style) classes, that do not inherit from object and that are declared as class A:, class A(): or class A(B): where B is a classic-style class;

  • new-style classes, that do inherit from object and that are declared as class A(object) or class A(B): where B is a new-style class. Python 3 has only new-style classes that are declared as class A:, class A(object): or class A(B):.

For classic-style classes, a comparison operation always calls the method of the first operand, while for new-style classes, it always calls the method of the subclass operand, regardless of the order of the operands.

So here, if Number is a classic-style class:

  • n1 == n3 calls n1.__eq__;
  • n3 == n1 calls n3.__eq__;
  • n1 != n3 calls n1.__ne__;
  • n3 != n1 calls n3.__ne__.

And if Number is a new-style class:

  • both n1 == n3 and n3 == n1 call n3.__eq__;
  • both n1 != n3 and n3 != n1 call n3.__ne__.

To fix the non-commutativity issue of the == and != operators for Python 2 classic-style classes, the __eq__ and __ne__ methods should return the NotImplemented value when an operand type is not supported. The documentation defines the NotImplemented value as:

Numeric methods and rich comparison methods may return this value if they do not implement the operation for the operands provided. (The interpreter will then try the reflected operation, or some other fallback, depending on the operator.) Its truth value is true.

In this case the operator delegates the comparison operation to the reflected method of the other operand. The documentation defines reflected methods as:

There are no swapped-argument versions of these methods (to be used when the left argument does not support the operation but the right argument does); rather, __lt__() and __gt__() are each other’s reflection, __le__() and __ge__() are each other’s reflection, and __eq__() and __ne__() are their own reflection.

The result looks like this:

def __eq__(self, other):
    """Overrides the default implementation"""
    if isinstance(other, Number):
        return self.number == other.number
    return NotImplemented

def __ne__(self, other):
    """Overrides the default implementation (unnecessary in Python 3)"""
    x = self.__eq__(other)
    if x is NotImplemented:
        return NotImplemented
    return not x

Returning the NotImplemented value instead of False is the right thing to do even for new-style classes if commutativity of the == and != operators is desired when the operands are of unrelated types (no inheritance).

Are we there yet? Not quite. How many unique numbers do we have?

len(set([n1, n2, n3])) # 3 -- oops

Sets use the hashes of objects, and by default Python returns the hash of the identifier of the object. Let’s try to override it:

def __hash__(self):
    """Overrides the default implementation"""
    return hash(tuple(sorted(self.__dict__.items())))

len(set([n1, n2, n3])) # 1

The end result looks like this (I added some assertions at the end for validation):

class Number:

    def __init__(self, number):
        self.number = number

    def __eq__(self, other):
        """Overrides the default implementation"""
        if isinstance(other, Number):
            return self.number == other.number
        return NotImplemented

    def __ne__(self, other):
        """Overrides the default implementation (unnecessary in Python 3)"""
        x = self.__eq__(other)
        if x is not NotImplemented:
            return not x
        return NotImplemented

    def __hash__(self):
        """Overrides the default implementation"""
        return hash(tuple(sorted(self.__dict__.items())))


class SubNumber(Number):
    pass


n1 = Number(1)
n2 = Number(1)
n3 = SubNumber(1)
n4 = SubNumber(4)

assert n1 == n2
assert n2 == n1
assert not n1 != n2
assert not n2 != n1

assert n1 == n3
assert n3 == n1
assert not n1 != n3
assert not n3 != n1

assert not n1 == n4
assert not n4 == n1
assert n1 != n4
assert n4 != n1

assert len(set([n1, n2, n3, ])) == 1
assert len(set([n1, n2, n3, n4])) == 2

回答 1

您需要小心继承:

>>> class Foo:
    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.__dict__ == other.__dict__
        else:
            return False

>>> class Bar(Foo):pass

>>> b = Bar()
>>> f = Foo()
>>> f == b
True
>>> b == f
False

更严格地检查类型,如下所示:

def __eq__(self, other):
    if type(other) is type(self):
        return self.__dict__ == other.__dict__
    return False

除此之外,您的方法会很好地工作,这就是专用方法的目的。

You need to be careful with inheritance:

>>> class Foo:
    def __eq__(self, other):
        if isinstance(other, self.__class__):
            return self.__dict__ == other.__dict__
        else:
            return False

>>> class Bar(Foo):pass

>>> b = Bar()
>>> f = Foo()
>>> f == b
True
>>> b == f
False

Check types more strictly, like this:

def __eq__(self, other):
    if type(other) is type(self):
        return self.__dict__ == other.__dict__
    return False

Besides that, your approach will work fine, that’s what special methods are there for.


回答 2

您描述的方式就是我一直以来所做的方式。由于它是完全通用的,因此您始终可以将该功能分解为mixin类,并在需要该功能的类中继承它。

class CommonEqualityMixin(object):

    def __eq__(self, other):
        return (isinstance(other, self.__class__)
            and self.__dict__ == other.__dict__)

    def __ne__(self, other):
        return not self.__eq__(other)

class Foo(CommonEqualityMixin):

    def __init__(self, item):
        self.item = item

The way you describe is the way I’ve always done it. Since it’s totally generic, you can always break that functionality out into a mixin class and inherit it in classes where you want that functionality.

class CommonEqualityMixin(object):

    def __eq__(self, other):
        return (isinstance(other, self.__class__)
            and self.__dict__ == other.__dict__)

    def __ne__(self, other):
        return not self.__eq__(other)

class Foo(CommonEqualityMixin):

    def __init__(self, item):
        self.item = item

回答 3

这不是一个直接的答案,但似乎足够相关,可以解决,因为它有时可以节省一些冗长的乏味。从文档中直接切出…


functools.total_ordering(cls)

给定一个定义了一个或多个丰富比较排序方法的类,此类装饰器将提供其余的类。这简化了指定所有可能的丰富比较操作所涉及的工作:

这个类必须定义之一__lt__()__le__()__gt__(),或__ge__()。另外,该类应提供一个__eq__()方法。

2.7版中的新功能

@total_ordering
class Student:
    def __eq__(self, other):
        return ((self.lastname.lower(), self.firstname.lower()) ==
                (other.lastname.lower(), other.firstname.lower()))
    def __lt__(self, other):
        return ((self.lastname.lower(), self.firstname.lower()) <
                (other.lastname.lower(), other.firstname.lower()))

Not a direct answer but seemed relevant enough to be tacked on as it saves a bit of verbose tedium on occasion. Cut straight from the docs…


functools.total_ordering(cls)

Given a class defining one or more rich comparison ordering methods, this class decorator supplies the rest. This simplifies the effort involved in specifying all of the possible rich comparison operations:

The class must define one of __lt__(), __le__(), __gt__(), or __ge__(). In addition, the class should supply an __eq__() method.

New in version 2.7

@total_ordering
class Student:
    def __eq__(self, other):
        return ((self.lastname.lower(), self.firstname.lower()) ==
                (other.lastname.lower(), other.firstname.lower()))
    def __lt__(self, other):
        return ((self.lastname.lower(), self.firstname.lower()) <
                (other.lastname.lower(), other.firstname.lower()))

回答 4

您不必覆盖两者,__eq____ne__只能覆盖,__cmp__但这将对==,!==,<,>等结果产生影响。

is测试对象身份。这意味着,当a和b都持有对同一对象的引用时,isb就会出现True。在python中,您始终会在变量中持有对对象的引用,而不是实际对象,因此从本质上来说,如果a为b为true,则其中的对象应位于相同的内存位置。最重要的是,为什么您要继续压倒这种行为?

编辑:我不知道__cmp__从python 3中删除了,所以避免它。

You don’t have to override both __eq__ and __ne__ you can override only __cmp__ but this will make an implication on the result of ==, !==, < , > and so on.

is tests for object identity. This means a is b will be True in the case when a and b both hold the reference to the same object. In python you always hold a reference to an object in a variable not the actual object, so essentially for a is b to be true the objects in them should be located in the same memory location. How and most importantly why would you go about overriding this behaviour?

Edit: I didn’t know __cmp__ was removed from python 3 so avoid it.


回答 5

从这个答案:https : //stackoverflow.com/a/30676267/541136我已经证明了这一点,尽管__ne__用术语定义是正确的__eq__-而不是

def __ne__(self, other):
    return not self.__eq__(other)

您应该使用:

def __ne__(self, other):
    return not self == other

From this answer: https://stackoverflow.com/a/30676267/541136 I have demonstrated that, while it’s correct to define __ne__ in terms __eq__ – instead of

def __ne__(self, other):
    return not self.__eq__(other)

you should use:

def __ne__(self, other):
    return not self == other

回答 6

我认为您要查找的两个术语是相等(==)和同一性(is)。例如:

>>> a = [1,2,3]
>>> b = [1,2,3]
>>> a == b
True       <-- a and b have values which are equal
>>> a is b
False      <-- a and b are not the same list object

I think that the two terms you’re looking for are equality (==) and identity (is). For example:

>>> a = [1,2,3]
>>> b = [1,2,3]
>>> a == b
True       <-- a and b have values which are equal
>>> a is b
False      <-- a and b are not the same list object

回答 7

“ is”测试将使用内置的“ id()”函数测试身份,该函数实质上返回对象的内存地址,因此不可重载。

但是,在测试类的相等性的情况下,您可能希望对测试更加严格一些,只比较类中的数据属性:

import types

class ComparesNicely(object):

    def __eq__(self, other):
        for key, value in self.__dict__.iteritems():
            if (isinstance(value, types.FunctionType) or 
                    key.startswith("__")):
                continue

            if key not in other.__dict__:
                return False

            if other.__dict__[key] != value:
                return False

         return True

该代码将只比较类的非函数数据成员,并且跳过通常需要的任何私有内容。对于普通的旧Python对象,我有一个实现__init__,__str__,__repr__和__eq__的基类,因此我的POPO对象不承担所有额外(在大多数情况下相同)逻辑的负担。

The ‘is’ test will test for identity using the builtin ‘id()’ function which essentially returns the memory address of the object and therefore isn’t overloadable.

However in the case of testing the equality of a class you probably want to be a little bit more strict about your tests and only compare the data attributes in your class:

import types

class ComparesNicely(object):

    def __eq__(self, other):
        for key, value in self.__dict__.iteritems():
            if (isinstance(value, types.FunctionType) or 
                    key.startswith("__")):
                continue

            if key not in other.__dict__:
                return False

            if other.__dict__[key] != value:
                return False

         return True

This code will only compare non function data members of your class as well as skipping anything private which is generally what you want. In the case of Plain Old Python Objects I have a base class which implements __init__, __str__, __repr__ and __eq__ so my POPO objects don’t carry the burden of all that extra (and in most cases identical) logic.


回答 8

我喜欢使用泛型类装饰器,而不是使用子类/混合器

def comparable(cls):
    """ Class decorator providing generic comparison functionality """

    def __eq__(self, other):
        return isinstance(other, self.__class__) and self.__dict__ == other.__dict__

    def __ne__(self, other):
        return not self.__eq__(other)

    cls.__eq__ = __eq__
    cls.__ne__ = __ne__
    return cls

用法:

@comparable
class Number(object):
    def __init__(self, x):
        self.x = x

a = Number(1)
b = Number(1)
assert a == b

Instead of using subclassing/mixins, I like to use a generic class decorator

def comparable(cls):
    """ Class decorator providing generic comparison functionality """

    def __eq__(self, other):
        return isinstance(other, self.__class__) and self.__dict__ == other.__dict__

    def __ne__(self, other):
        return not self.__eq__(other)

    cls.__eq__ = __eq__
    cls.__ne__ = __ne__
    return cls

Usage:

@comparable
class Number(object):
    def __init__(self, x):
        self.x = x

a = Number(1)
b = Number(1)
assert a == b

回答 9

这合并了对Algorias答案的评论,并通过单个属性比较对象,因为我不在乎整个字典。hasattr(other, "id")必须为真,但我知道这是因为我在构造函数中进行了设置。

def __eq__(self, other):
    if other is self:
        return True

    if type(other) is not type(self):
        # delegate to superclass
        return NotImplemented

    return other.id == self.id

This incorporates the comments on Algorias’ answer, and compares objects by a single attribute because I don’t care about the whole dict. hasattr(other, "id") must be true, but I know it is because I set it in the constructor.

def __eq__(self, other):
    if other is self:
        return True

    if type(other) is not type(self):
        # delegate to superclass
        return NotImplemented

    return other.id == self.id

如何获得列表元素的所有可能组合?

问题:如何获得列表元素的所有可能组合?

我有一个包含15个数字的列表,我需要编写一些代码来生成这些数字的所有32,768个组合。

我已经找到了一些代码(通过Googling),这些代码显然可以满足我的需求,但是我发现代码相当不透明并且对使用它很谨慎。另外,我觉得必须有一个更优雅的解决方案。

对我而言,唯一发生的就是循环遍历十进制整数1–32768,并将其转换为二进制,然后使用二进制表示形式作为筛选器来选择适当的数字。

有谁知道更好的方法吗?使用map(),也许?

I have a list with 15 numbers in, and I need to write some code that produces all 32,768 combinations of those numbers.

I’ve found some code (by Googling) that apparently does what I’m looking for, but I found the code fairly opaque and am wary of using it. Plus I have a feeling there must be a more elegant solution.

The only thing that occurs to me would be to just loop through the decimal integers 1–32768 and convert those to binary, and use the binary representation as a filter to pick out the appropriate numbers.

Does anyone know of a better way? Using map(), maybe?


回答 0

看看itertools.combinations

itertools.combinations(iterable, r)

从可迭代的输入中返回元素的r长度子序列。

组合按字典顺序排序。因此,如果将输入的iterable排序,则将按排序顺序生成组合元组。

从2.6开始,包括电池!

Have a look at itertools.combinations:

itertools.combinations(iterable, r)

Return r length subsequences of elements from the input iterable.

Combinations are emitted in lexicographic sort order. So, if the input iterable is sorted, the combination tuples will be produced in sorted order.

Since 2.6, batteries are included!


回答 1

这个答案错过了一个方面:OP要求所有组合……而不仅仅是长度“ r”的组合。

因此,您要么必须遍历所有长度“ L”:

import itertools

stuff = [1, 2, 3]
for L in range(0, len(stuff)+1):
    for subset in itertools.combinations(stuff, L):
        print(subset)

或者-如果您想变得眼花or乱(或者想让以后阅读代码的人都屈服)-您可以生成“ combinations()”生成器链,然后进行迭代:

from itertools import chain, combinations
def all_subsets(ss):
    return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))

for subset in all_subsets(stuff):
    print(subset)

This answer missed one aspect: the OP asked for ALL combinations… not just combinations of length “r”.

So you’d either have to loop through all lengths “L”:

import itertools

stuff = [1, 2, 3]
for L in range(0, len(stuff)+1):
    for subset in itertools.combinations(stuff, L):
        print(subset)

Or — if you want to get snazzy (or bend the brain of whoever reads your code after you) — you can generate the chain of “combinations()” generators, and iterate through that:

from itertools import chain, combinations
def all_subsets(ss):
    return chain(*map(lambda x: combinations(ss, x), range(0, len(ss)+1)))

for subset in all_subsets(stuff):
    print(subset)

回答 2

这是一个懒惰的单行代码,也使用itertools:

from itertools import compress, product

def combinations(items):
    return ( set(compress(items,mask)) for mask in product(*[[0,1]]*len(items)) )
    # alternative:                      ...in product([0,1], repeat=len(items)) )

答案背后的主要思想是:有2 ^ N个组合-与长度为N的二进制字符串的数目相同。对于每个二进制字符串,请选择与“ 1”相对应的所有元素。

items=abc * mask=###
 |
 V
000 -> 
001 ->   c
010 ->  b
011 ->  bc
100 -> a
101 -> a c
110 -> ab
111 -> abc

注意事项:

  • 这就需要你可以调用len(...)items(解决方法:如果items是这样的迭代就像一台生成器,与第一把它变成一个列表items=list(_itemsArg)
  • 这要求迭代的顺序items不是随机的(解决方法:不要疯了)
  • 这就要求项目是独一无二的,要不然{2,2,1}{2,1,1}都将崩溃{2,1}(解决方法:使用collections.Counter作为一个下拉更换set;它基本上是一个多集…尽管你可能需要以后使用tuple(sorted(Counter(...).elements())),如果你需要它是可哈希)

演示版

>>> list(combinations(range(4)))
[set(), {3}, {2}, {2, 3}, {1}, {1, 3}, {1, 2}, {1, 2, 3}, {0}, {0, 3}, {0, 2}, {0, 2, 3}, {0, 1}, {0, 1, 3}, {0, 1, 2}, {0, 1, 2, 3}]

>>> list(combinations('abcd'))
[set(), {'d'}, {'c'}, {'c', 'd'}, {'b'}, {'b', 'd'}, {'c', 'b'}, {'c', 'b', 'd'}, {'a'}, {'a', 'd'}, {'a', 'c'}, {'a', 'c', 'd'}, {'a', 'b'}, {'a', 'b', 'd'}, {'a', 'c', 'b'}, {'a', 'c', 'b', 'd'}]

Here’s a lazy one-liner, also using itertools:

from itertools import compress, product

def combinations(items):
    return ( set(compress(items,mask)) for mask in product(*[[0,1]]*len(items)) )
    # alternative:                      ...in product([0,1], repeat=len(items)) )

Main idea behind this answer: there are 2^N combinations — same as the number of binary strings of length N. For each binary string, you pick all elements corresponding to a “1”.

items=abc * mask=###
 |
 V
000 -> 
001 ->   c
010 ->  b
011 ->  bc
100 -> a
101 -> a c
110 -> ab
111 -> abc

Things to consider:

  • This requires that you can call len(...) on items (workaround: if items is something like an iterable like a generator, turn it into a list first with items=list(_itemsArg))
  • This requires that the order of iteration on items is not random (workaround: don’t be insane)
  • This requires that the items are unique, or else {2,2,1} and {2,1,1} will both collapse to {2,1} (workaround: use collections.Counter as a drop-in replacement for set; it’s basically a multiset… though you may need to later use tuple(sorted(Counter(...).elements())) if you need it to be hashable)

Demo

>>> list(combinations(range(4)))
[set(), {3}, {2}, {2, 3}, {1}, {1, 3}, {1, 2}, {1, 2, 3}, {0}, {0, 3}, {0, 2}, {0, 2, 3}, {0, 1}, {0, 1, 3}, {0, 1, 2}, {0, 1, 2, 3}]

>>> list(combinations('abcd'))
[set(), {'d'}, {'c'}, {'c', 'd'}, {'b'}, {'b', 'd'}, {'c', 'b'}, {'c', 'b', 'd'}, {'a'}, {'a', 'd'}, {'a', 'c'}, {'a', 'c', 'd'}, {'a', 'b'}, {'a', 'b', 'd'}, {'a', 'c', 'b'}, {'a', 'c', 'b', 'd'}]

回答 3

在@Dan H 极力支持的答案的评论powerset()中,itertools文档中提到了该配方—包括Dan本人的配方。但是,到目前为止,还没有人将其发布为答案。由于它可能是解决问题的最佳方法,即使不是最好的方法之一,并且在另一位评论者的鼓励下,它显示如下。该函数产生的所有列表中的元件的独特组合长度可能的(包括那些含有零和所有的元素)。

注意:如果,微妙的不同,目标是获得唯一的独特元素的组合,改线s = list(iterable),以s = list(set(iterable))消除任何重复的元素。无论如何,iterable最终会变成的事实list意味着它将与生成器一起使用(与其他几个答案不同)。

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)  # allows duplicate elements
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

stuff = [1, 2, 3]
for i, combo in enumerate(powerset(stuff), 1):
    print('combo #{}: {}'.format(i, combo))

输出:

combo #1: ()
combo #2: (1,)
combo #3: (2,)
combo #4: (3,)
combo #5: (1, 2)
combo #6: (1, 3)
combo #7: (2, 3)
combo #8: (1, 2, 3)

In comments under the highly upvoted answer by @Dan H, mention is made of the powerset() recipe in the itertools documentation—including one by Dan himself. However, so far no one has posted it as an answer. Since it’s probably one of the better if not the best approach to the problem—and given a little encouragement from another commenter, it’s shown below. The function produces all unique combinations of the list elements of every length possible (including those containing zero and all the elements).

Note: If the, subtly different, goal is to obtain only combinations of unique elements, change the line s = list(iterable) to s = list(set(iterable)) to eliminate any duplicate elements. Regardless, the fact that the iterable is ultimately turned into a list means it will work with generators (unlike several of the other answers).

from itertools import chain, combinations

def powerset(iterable):
    "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)"
    s = list(iterable)  # allows duplicate elements
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

stuff = [1, 2, 3]
for i, combo in enumerate(powerset(stuff), 1):
    print('combo #{}: {}'.format(i, combo))

Output:

combo #1: ()
combo #2: (1,)
combo #3: (2,)
combo #4: (3,)
combo #5: (1, 2)
combo #6: (1, 3)
combo #7: (2, 3)
combo #8: (1, 2, 3)

回答 4

这是使用递归的一种:

>>> import copy
>>> def combinations(target,data):
...     for i in range(len(data)):
...         new_target = copy.copy(target)
...         new_data = copy.copy(data)
...         new_target.append(data[i])
...         new_data = data[i+1:]
...         print new_target
...         combinations(new_target,
...                      new_data)
...                      
... 
>>> target = []
>>> data = ['a','b','c','d']
>>> 
>>> combinations(target,data)
['a']
['a', 'b']
['a', 'b', 'c']
['a', 'b', 'c', 'd']
['a', 'b', 'd']
['a', 'c']
['a', 'c', 'd']
['a', 'd']
['b']
['b', 'c']
['b', 'c', 'd']
['b', 'd']
['c']
['c', 'd']
['d']

Here is one using recursion:

>>> import copy
>>> def combinations(target,data):
...     for i in range(len(data)):
...         new_target = copy.copy(target)
...         new_data = copy.copy(data)
...         new_target.append(data[i])
...         new_data = data[i+1:]
...         print new_target
...         combinations(new_target,
...                      new_data)
...                      
... 
>>> target = []
>>> data = ['a','b','c','d']
>>> 
>>> combinations(target,data)
['a']
['a', 'b']
['a', 'b', 'c']
['a', 'b', 'c', 'd']
['a', 'b', 'd']
['a', 'c']
['a', 'c', 'd']
['a', 'd']
['b']
['b', 'c']
['b', 'c', 'd']
['b', 'd']
['c']
['c', 'd']
['d']

回答 5

这种单行代码为您提供了所有组合(如果原始列表/集合包含不同的元素,则在0n项之间n),并使用本机方法itertools.combinations

Python 2

from itertools import combinations

input = ['a', 'b', 'c', 'd']

output = sum([map(list, combinations(input, i)) for i in range(len(input) + 1)], [])

Python 3

from itertools import combinations

input = ['a', 'b', 'c', 'd']

output = sum([list(map(list, combinations(input, i))) for i in range(len(input) + 1)], [])

输出将是:

[[],
 ['a'],
 ['b'],
 ['c'],
 ['d'],
 ['a', 'b'],
 ['a', 'c'],
 ['a', 'd'],
 ['b', 'c'],
 ['b', 'd'],
 ['c', 'd'],
 ['a', 'b', 'c'],
 ['a', 'b', 'd'],
 ['a', 'c', 'd'],
 ['b', 'c', 'd'],
 ['a', 'b', 'c', 'd']]

在线尝试:

http://ideone.com/COghfX

This one-liner gives you all the combinations (between 0 and n items if the original list/set contains n distinct elements) and uses the native method itertools.combinations:

Python 2

from itertools import combinations

input = ['a', 'b', 'c', 'd']

output = sum([map(list, combinations(input, i)) for i in range(len(input) + 1)], [])

Python 3

from itertools import combinations

input = ['a', 'b', 'c', 'd']

output = sum([list(map(list, combinations(input, i))) for i in range(len(input) + 1)], [])

The output will be:

[[],
 ['a'],
 ['b'],
 ['c'],
 ['d'],
 ['a', 'b'],
 ['a', 'c'],
 ['a', 'd'],
 ['b', 'c'],
 ['b', 'd'],
 ['c', 'd'],
 ['a', 'b', 'c'],
 ['a', 'b', 'd'],
 ['a', 'c', 'd'],
 ['b', 'c', 'd'],
 ['a', 'b', 'c', 'd']]

Try it online:

http://ideone.com/COghfX


回答 6

我同意Dan H的观点,Ben确实要求所有组合。itertools.combinations()没有给出所有组合。

另一个问题是,如果可迭代的输入很大,则最好返回一个生成器而不是列表中的所有内容:

iterable = range(10)
for s in xrange(len(iterable)+1):
  for comb in itertools.combinations(iterable, s):
    yield comb

I agree with Dan H that Ben indeed asked for all combinations. itertools.combinations() does not give all combinations.

Another issue is, if the input iterable is big, it is perhaps better to return a generator instead of everything in a list:

iterable = range(10)
for s in xrange(len(iterable)+1):
  for comb in itertools.combinations(iterable, s):
    yield comb

回答 7

这是一种可以轻松转移到支持递归的所有编程语言的方法(没有itertools,没有yield,没有列表理解)

def combs(a):
    if len(a) == 0:
        return [[]]
    cs = []
    for c in combs(a[1:]):
        cs += [c, c+[a[0]]]
    return cs

>>> combs([1,2,3,4,5])
[[], [1], [2], [2, 1], [3], [3, 1], [3, 2], ..., [5, 4, 3, 2, 1]]

This is an approach that can be easily transfered to all programming languages supporting recursion (no itertools, no yield, no list comprehension):

def combs(a):
    if len(a) == 0:
        return [[]]
    cs = []
    for c in combs(a[1:]):
        cs += [c, c+[a[0]]]
    return cs

>>> combs([1,2,3,4,5])
[[], [1], [2], [2, 1], [3], [3, 1], [3, 2], ..., [5, 4, 3, 2, 1]]

回答 8

您可以使用以下简单代码在python中生成列表的所有组合

import itertools

a = [1,2,3,4]
for i in xrange(0,len(a)+1):
   print list(itertools.combinations(a,i))

结果将是:

[()]
[(1,), (2,), (3,), (4,)]
[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
[(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
[(1, 2, 3, 4)]

You can generating all combinations of a list in python using this simple code

import itertools

a = [1,2,3,4]
for i in xrange(0,len(a)+1):
   print list(itertools.combinations(a,i))

Result would be :

[()]
[(1,), (2,), (3,), (4,)]
[(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
[(1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4)]
[(1, 2, 3, 4)]

回答 9

我认为我可以为那些寻求答案的人添加此功能,而无需导入itertools或任何其他额外的库。

def powerSet(items):
    """
    Power set generator: get all possible combinations of a list’s elements

    Input:
        items is a list
    Output:
        returns 2**n combination lists one at a time using a generator 

    Reference: edx.org 6.00.2x Lecture 2 - Decision Trees and dynamic programming
    """

    N = len(items)
    # enumerate the 2**N possible combinations
    for i in range(2**N):
        combo = []
        for j in range(N):
            # test bit jth of integer i
            if (i >> j) % 2 == 1:
                combo.append(items[j])
        yield combo

简单良率生成器用法:

for i in powerSet([1,2,3,4]):
    print (i, ", ",  end="")

上面用法示例的输出:

[],[1],[2],[1,2],[3],[1,3],[2,3],[1,2,3],[4],[1,4] ,[2、4],[1、2、4],[3、4],[1、3、4],[2、3、4],[1、2、3、4],

I thought I would add this function for those seeking an answer without importing itertools or any other extra libraries.

def powerSet(items):
    """
    Power set generator: get all possible combinations of a list’s elements

    Input:
        items is a list
    Output:
        returns 2**n combination lists one at a time using a generator 

    Reference: edx.org 6.00.2x Lecture 2 - Decision Trees and dynamic programming
    """

    N = len(items)
    # enumerate the 2**N possible combinations
    for i in range(2**N):
        combo = []
        for j in range(N):
            # test bit jth of integer i
            if (i >> j) % 2 == 1:
                combo.append(items[j])
        yield combo

Simple Yield Generator Usage:

for i in powerSet([1,2,3,4]):
    print (i, ", ",  end="")

Output from Usage example above:

[] , [1] , [2] , [1, 2] , [3] , [1, 3] , [2, 3] , [1, 2, 3] , [4] , [1, 4] , [2, 4] , [1, 2, 4] , [3, 4] , [1, 3, 4] , [2, 3, 4] , [1, 2, 3, 4] ,


回答 10

这是涉及使用itertools.combinations函数的另一种解决方案(单行),但是这里我们使用了双重列表理解(与for循环或求和相对):

def combs(x):
    return [c for i in range(len(x)+1) for c in combinations(x,i)]

演示:

>>> combs([1,2,3,4])
[(), 
 (1,), (2,), (3,), (4,), 
 (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), 
 (1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4), 
 (1, 2, 3, 4)]

Here is yet another solution (one-liner), involving using the itertools.combinations function, but here we use a double list comprehension (as opposed to a for loop or sum):

def combs(x):
    return [c for i in range(len(x)+1) for c in combinations(x,i)]

Demo:

>>> combs([1,2,3,4])
[(), 
 (1,), (2,), (3,), (4,), 
 (1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4), 
 (1, 2, 3), (1, 2, 4), (1, 3, 4), (2, 3, 4), 
 (1, 2, 3, 4)]

回答 11

from itertools import permutations, combinations


features = ['A', 'B', 'C']
tmp = []
for i in range(len(features)):
    oc = combinations(features, i + 1)
    for c in oc:
        tmp.append(list(c))

输出

[
 ['A'],
 ['B'],
 ['C'],
 ['A', 'B'],
 ['A', 'C'],
 ['B', 'C'],
 ['A', 'B', 'C']
]
from itertools import permutations, combinations


features = ['A', 'B', 'C']
tmp = []
for i in range(len(features)):
    oc = combinations(features, i + 1)
    for c in oc:
        tmp.append(list(c))

output

[
 ['A'],
 ['B'],
 ['C'],
 ['A', 'B'],
 ['A', 'C'],
 ['B', 'C'],
 ['A', 'B', 'C']
]

回答 12

下面是一个“标准递归答案”,类似于其他类似的答案https://stackoverflow.com/a/23743696/711085。(我们实际上不必担心会耗尽堆栈空间,因为我们不可能处理所有N!个置换。)

它依次访问每个元素,然后接受或离开它(我们可以从该算法直接看到2 ^ N基数)。

def combs(xs, i=0):
    if i==len(xs):
        yield ()
        return
    for c in combs(xs,i+1):
        yield c
        yield c+(xs[i],)

演示:

>>> list( combs(range(5)) )
[(), (0,), (1,), (1, 0), (2,), (2, 0), (2, 1), (2, 1, 0), (3,), (3, 0), (3, 1), (3, 1, 0), (3, 2), (3, 2, 0), (3, 2, 1), (3, 2, 1, 0), (4,), (4, 0), (4, 1), (4, 1, 0), (4, 2), (4, 2, 0), (4, 2, 1), (4, 2, 1, 0), (4, 3), (4, 3, 0), (4, 3, 1), (4, 3, 1, 0), (4, 3, 2), (4, 3, 2, 0), (4, 3, 2, 1), (4, 3, 2, 1, 0)]

>>> list(sorted( combs(range(5)), key=len))
[(), 
 (0,), (1,), (2,), (3,), (4,), 
 (1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2), (4, 0), (4, 1), (4, 2), (4, 3), 
 (2, 1, 0), (3, 1, 0), (3, 2, 0), (3, 2, 1), (4, 1, 0), (4, 2, 0), (4, 2, 1), (4, 3, 0), (4, 3, 1), (4, 3, 2), 
 (3, 2, 1, 0), (4, 2, 1, 0), (4, 3, 1, 0), (4, 3, 2, 0), (4, 3, 2, 1), 
 (4, 3, 2, 1, 0)]

>>> len(set(combs(range(5))))
32

Below is a “standard recursive answer”, similar to the other similar answer https://stackoverflow.com/a/23743696/711085 . (We don’t realistically have to worry about running out of stack space since there’s no way we could process all N! permutations.)

It visits every element in turn, and either takes it or leaves it (we can directly see the 2^N cardinality from this algorithm).

def combs(xs, i=0):
    if i==len(xs):
        yield ()
        return
    for c in combs(xs,i+1):
        yield c
        yield c+(xs[i],)

Demo:

>>> list( combs(range(5)) )
[(), (0,), (1,), (1, 0), (2,), (2, 0), (2, 1), (2, 1, 0), (3,), (3, 0), (3, 1), (3, 1, 0), (3, 2), (3, 2, 0), (3, 2, 1), (3, 2, 1, 0), (4,), (4, 0), (4, 1), (4, 1, 0), (4, 2), (4, 2, 0), (4, 2, 1), (4, 2, 1, 0), (4, 3), (4, 3, 0), (4, 3, 1), (4, 3, 1, 0), (4, 3, 2), (4, 3, 2, 0), (4, 3, 2, 1), (4, 3, 2, 1, 0)]

>>> list(sorted( combs(range(5)), key=len))
[(), 
 (0,), (1,), (2,), (3,), (4,), 
 (1, 0), (2, 0), (2, 1), (3, 0), (3, 1), (3, 2), (4, 0), (4, 1), (4, 2), (4, 3), 
 (2, 1, 0), (3, 1, 0), (3, 2, 0), (3, 2, 1), (4, 1, 0), (4, 2, 0), (4, 2, 1), (4, 3, 0), (4, 3, 1), (4, 3, 2), 
 (3, 2, 1, 0), (4, 2, 1, 0), (4, 3, 1, 0), (4, 3, 2, 0), (4, 3, 2, 1), 
 (4, 3, 2, 1, 0)]

>>> len(set(combs(range(5))))
32

回答 13

使用列表理解:

def selfCombine( list2Combine, length ):
    listCombined = str( ['list2Combine[i' + str( i ) + ']' for i in range( length )] ).replace( "'", '' ) \
                     + 'for i0 in range(len( list2Combine ) )'
    if length > 1:
        listCombined += str( [' for i' + str( i ) + ' in range( i' + str( i - 1 ) + ', len( list2Combine ) )' for i in range( 1, length )] )\
            .replace( "', '", ' ' )\
            .replace( "['", '' )\
            .replace( "']", '' )

    listCombined = '[' + listCombined + ']'
    listCombined = eval( listCombined )

    return listCombined

list2Combine = ['A', 'B', 'C']
listCombined = selfCombine( list2Combine, 2 )

输出为:

['A', 'A']
['A', 'B']
['A', 'C']
['B', 'B']
['B', 'C']
['C', 'C']

Using list comprehension:

def selfCombine( list2Combine, length ):
    listCombined = str( ['list2Combine[i' + str( i ) + ']' for i in range( length )] ).replace( "'", '' ) \
                     + 'for i0 in range(len( list2Combine ) )'
    if length > 1:
        listCombined += str( [' for i' + str( i ) + ' in range( i' + str( i - 1 ) + ', len( list2Combine ) )' for i in range( 1, length )] )\
            .replace( "', '", ' ' )\
            .replace( "['", '' )\
            .replace( "']", '' )

    listCombined = '[' + listCombined + ']'
    listCombined = eval( listCombined )

    return listCombined

list2Combine = ['A', 'B', 'C']
listCombined = selfCombine( list2Combine, 2 )

Output would be:

['A', 'A']
['A', 'B']
['A', 'C']
['B', 'B']
['B', 'C']
['C', 'C']

回答 14

该代码采用带有嵌套列表的简单算法…

# FUNCTION getCombos: To generate all combos of an input list, consider the following sets of nested lists...
#
#           [ [ [] ] ]
#           [ [ [] ], [ [A] ] ]
#           [ [ [] ], [ [A],[B] ],         [ [A,B] ] ]
#           [ [ [] ], [ [A],[B],[C] ],     [ [A,B],[A,C],[B,C] ],                   [ [A,B,C] ] ]
#           [ [ [] ], [ [A],[B],[C],[D] ], [ [A,B],[A,C],[B,C],[A,D],[B,D],[C,D] ], [ [A,B,C],[A,B,D],[A,C,D],[B,C,D] ], [ [A,B,C,D] ] ]
#
#  There is a set of lists for each number of items that will occur in a combo (including an empty set).
#  For each additional item, begin at the back of the list by adding an empty list, then taking the set of
#  lists in the previous column (e.g., in the last list, for sets of 3 items you take the existing set of
#  3-item lists and append to it additional lists created by appending the item (4) to the lists in the
#  next smallest item count set. In this case, for the three sets of 2-items in the previous list. Repeat
#  for each set of lists back to the initial list containing just the empty list.
#

def getCombos(listIn = ['A','B','C','D','E','F'] ):
    listCombos = [ [ [] ] ]     # list of lists of combos, seeded with a list containing only the empty list
    listSimple = []             # list to contain the final returned list of items (e.g., characters)

    for item in listIn:
        listCombos.append([])   # append an emtpy list to the end for each new item added
        for index in xrange(len(listCombos)-1, 0, -1):  # set the index range to work through the list
            for listPrev in listCombos[index-1]:        # retrieve the lists from the previous column
                listCur = listPrev[:]                   # create a new temporary list object to update
                listCur.append(item)                    # add the item to the previous list to make it current
                listCombos[index].append(listCur)       # list length and append it to the current list

                itemCombo = ''                          # Create a str to concatenate list items into a str
                for item in listCur:                    # concatenate the members of the lists to create
                    itemCombo += item                   # create a string of items
                listSimple.append(itemCombo)            # add to the final output list

    return [listSimple, listCombos]
# END getCombos()

This code employs a simple algorithm with nested lists…

# FUNCTION getCombos: To generate all combos of an input list, consider the following sets of nested lists...
#
#           [ [ [] ] ]
#           [ [ [] ], [ [A] ] ]
#           [ [ [] ], [ [A],[B] ],         [ [A,B] ] ]
#           [ [ [] ], [ [A],[B],[C] ],     [ [A,B],[A,C],[B,C] ],                   [ [A,B,C] ] ]
#           [ [ [] ], [ [A],[B],[C],[D] ], [ [A,B],[A,C],[B,C],[A,D],[B,D],[C,D] ], [ [A,B,C],[A,B,D],[A,C,D],[B,C,D] ], [ [A,B,C,D] ] ]
#
#  There is a set of lists for each number of items that will occur in a combo (including an empty set).
#  For each additional item, begin at the back of the list by adding an empty list, then taking the set of
#  lists in the previous column (e.g., in the last list, for sets of 3 items you take the existing set of
#  3-item lists and append to it additional lists created by appending the item (4) to the lists in the
#  next smallest item count set. In this case, for the three sets of 2-items in the previous list. Repeat
#  for each set of lists back to the initial list containing just the empty list.
#

def getCombos(listIn = ['A','B','C','D','E','F'] ):
    listCombos = [ [ [] ] ]     # list of lists of combos, seeded with a list containing only the empty list
    listSimple = []             # list to contain the final returned list of items (e.g., characters)

    for item in listIn:
        listCombos.append([])   # append an emtpy list to the end for each new item added
        for index in xrange(len(listCombos)-1, 0, -1):  # set the index range to work through the list
            for listPrev in listCombos[index-1]:        # retrieve the lists from the previous column
                listCur = listPrev[:]                   # create a new temporary list object to update
                listCur.append(item)                    # add the item to the previous list to make it current
                listCombos[index].append(listCur)       # list length and append it to the current list

                itemCombo = ''                          # Create a str to concatenate list items into a str
                for item in listCur:                    # concatenate the members of the lists to create
                    itemCombo += item                   # create a string of items
                listSimple.append(itemCombo)            # add to the final output list

    return [listSimple, listCombos]
# END getCombos()

回答 15

我知道这是更为实际使用itertools得到所有的组合,但你可以想代码实现这个部分,只有列表中理解,如果你这么碰巧的愿望,给予了很多

对于两对组合:

    lambda l: [(a, b) for i, a in enumerate(l) for b in l[i+1:]]


而且,对于三对组合,就这么简单:

    lambda l: [(a, b, c) for i, a in enumerate(l) for ii, b in enumerate(l[i+1:]) for c in l[i+ii+2:]]


结果与使用itertools.combinations相同:

import itertools
combs_3 = lambda l: [
    (a, b, c) for i, a in enumerate(l) 
    for ii, b in enumerate(l[i+1:]) 
    for c in l[i+ii+2:]
]
data = ((1, 2), 5, "a", None)
print("A:", list(itertools.combinations(data, 3)))
print("B:", combs_3(data))
# A: [((1, 2), 5, 'a'), ((1, 2), 5, None), ((1, 2), 'a', None), (5, 'a', None)]
# B: [((1, 2), 5, 'a'), ((1, 2), 5, None), ((1, 2), 'a', None), (5, 'a', None)]

I know it’s far more practical to use itertools to get the all the combinations, but you can achieve this partly with only list comprehension if you so happen to desire, granted you want to code a lot

For combinations of two pairs:

    lambda l: [(a, b) for i, a in enumerate(l) for b in l[i+1:]]


And, for combinations of three pairs, it’s as easy as this:

    lambda l: [(a, b, c) for i, a in enumerate(l) for ii, b in enumerate(l[i+1:]) for c in l[i+ii+2:]]


The result is identical to using itertools.combinations:
import itertools
combs_3 = lambda l: [
    (a, b, c) for i, a in enumerate(l) 
    for ii, b in enumerate(l[i+1:]) 
    for c in l[i+ii+2:]
]
data = ((1, 2), 5, "a", None)
print("A:", list(itertools.combinations(data, 3)))
print("B:", combs_3(data))
# A: [((1, 2), 5, 'a'), ((1, 2), 5, None), ((1, 2), 'a', None), (5, 'a', None)]
# B: [((1, 2), 5, 'a'), ((1, 2), 5, None), ((1, 2), 'a', None), (5, 'a', None)]

回答 16

不使用itertools:

def combine(inp):
    return combine_helper(inp, [], [])


def combine_helper(inp, temp, ans):
    for i in range(len(inp)):
        current = inp[i]
        remaining = inp[i + 1:]
        temp.append(current)
        ans.append(tuple(temp))
        combine_helper(remaining, temp, ans)
        temp.pop()
    return ans


print(combine(['a', 'b', 'c', 'd']))

Without using itertools:

def combine(inp):
    return combine_helper(inp, [], [])


def combine_helper(inp, temp, ans):
    for i in range(len(inp)):
        current = inp[i]
        remaining = inp[i + 1:]
        temp.append(current)
        ans.append(tuple(temp))
        combine_helper(remaining, temp, ans)
        temp.pop()
    return ans


print(combine(['a', 'b', 'c', 'd']))

回答 17

这是两个实现 itertools.combinations

返回列表的一个

def combinations(lst, depth, start=0, items=[]):
    if depth <= 0:
        return [items]
    out = []
    for i in range(start, len(lst)):
        out += combinations(lst, depth - 1, i + 1, items + [lst[i]])
    return out

一个还生成器

def combinations(lst, depth, start=0, prepend=[]):
    if depth <= 0:
        yield prepend
    else:
        for i in range(start, len(lst)):
            for c in combinations(lst, depth - 1, i + 1, prepend + [lst[i]]):
                yield c

请注意,建议为这些函数提供帮助函数,因为prepend参数是静态的,并且不会在每次调用时更改

print([c for c in combinations([1, 2, 3, 4], 3)])
# [[1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4]]

# get a hold of prepend
prepend = [c for c in combinations([], -1)][0]
prepend.append(None)

print([c for c in combinations([1, 2, 3, 4], 3)])
# [[None, 1, 2, 3], [None, 1, 2, 4], [None, 1, 3, 4], [None, 2, 3, 4]]

这是一个非常肤浅的案例,但总比后悔要安全

Here are two implementations of itertools.combinations

One that returns a list

def combinations(lst, depth, start=0, items=[]):
    if depth <= 0:
        return [items]
    out = []
    for i in range(start, len(lst)):
        out += combinations(lst, depth - 1, i + 1, items + [lst[i]])
    return out

One returns a generator

def combinations(lst, depth, start=0, prepend=[]):
    if depth <= 0:
        yield prepend
    else:
        for i in range(start, len(lst)):
            for c in combinations(lst, depth - 1, i + 1, prepend + [lst[i]]):
                yield c

Please note that providing a helper function to those is advised because the prepend argument is static and is not changing with every call

print([c for c in combinations([1, 2, 3, 4], 3)])
# [[1, 2, 3], [1, 2, 4], [1, 3, 4], [2, 3, 4]]

# get a hold of prepend
prepend = [c for c in combinations([], -1)][0]
prepend.append(None)

print([c for c in combinations([1, 2, 3, 4], 3)])
# [[None, 1, 2, 3], [None, 1, 2, 4], [None, 1, 3, 4], [None, 2, 3, 4]]

This is a very superficial case but better be safe than sorry


回答 18

怎么样..使用字符串而不是列表,但是同样的事情..字符串可以像Python中的列表一样对待:

def comb(s, res):
    if not s: return
    res.add(s)
    for i in range(0, len(s)):
        t = s[0:i] + s[i + 1:]
        comb(t, res)

res = set()
comb('game', res) 

print(res)

How about this.. used a string instead of list, but same thing.. string can be treated like a list in Python:

def comb(s, res):
    if not s: return
    res.add(s)
    for i in range(0, len(s)):
        t = s[0:i] + s[i + 1:]
        comb(t, res)

res = set()
comb('game', res) 

print(res)

回答 19

来自itertools的组合

import itertools
col_names = ["aa","bb", "cc", "dd"]
all_combinations = itertools.chain(*[itertools.combinations(col_names,i+1) for i,_ in enumerate(col_names)])
print(list(all_combinations))

谢谢

Combination from itertools

import itertools
col_names = ["aa","bb", "cc", "dd"]
all_combinations = itertools.chain(*[itertools.combinations(col_names,i+1) for i,_ in enumerate(col_names)])
print(list(all_combinations))

Thanks


回答 20

没有 itertools Python 3,您可以执行以下操作:

def combinations(arr, carry):
    for i in range(len(arr)):
        yield carry + arr[i]
        yield from combinations(arr[i + 1:], carry + arr[i])

最初在哪里 carry = "".

Without itertools in Python 3 you could do something like this:

def combinations(arr, carry):
    for i in range(len(arr)):
        yield carry + arr[i]
        yield from combinations(arr[i + 1:], carry + arr[i])

where initially carry = "".


回答 21

3个功能:

  1. n个元素的所有组合列表
  2. n个元素的所有组合列出了顺序不同的地方
  3. 所有排列
import sys

def permutations(a):
    return combinations(a, len(a))

def combinations(a, n):
    if n == 1:
        for x in a:
            yield [x]
    else:
        for i in range(len(a)):
            for x in combinations(a[:i] + a[i+1:], n-1):
                yield [a[i]] + x

def combinationsNoOrder(a, n):
    if n == 1:
        for x in a:
            yield [x]
    else:
        for i in range(len(a)):
            for x in combinationsNoOrder(a[:i], n-1):
                yield [a[i]] + x

if __name__ == "__main__":
    for s in combinations(list(map(int, sys.argv[2:])), int(sys.argv[1])):
        print(s)

3 functions:

  1. all combinations of n elements list
  2. all combinations of n elements list where order is not distinct
  3. all permutations
import sys

def permutations(a):
    return combinations(a, len(a))

def combinations(a, n):
    if n == 1:
        for x in a:
            yield [x]
    else:
        for i in range(len(a)):
            for x in combinations(a[:i] + a[i+1:], n-1):
                yield [a[i]] + x

def combinationsNoOrder(a, n):
    if n == 1:
        for x in a:
            yield [x]
    else:
        for i in range(len(a)):
            for x in combinationsNoOrder(a[:i], n-1):
                yield [a[i]] + x

if __name__ == "__main__":
    for s in combinations(list(map(int, sys.argv[2:])), int(sys.argv[1])):
        print(s)

回答 22

这是我的实现

    def get_combinations(list_of_things):
    """gets every combination of things in a list returned as a list of lists

    Should be read : add all combinations of a certain size to the end of a list for every possible size in the
    the list_of_things.

    """
    list_of_combinations = [list(combinations_of_a_certain_size)
                            for possible_size_of_combinations in range(1,  len(list_of_things))
                            for combinations_of_a_certain_size in itertools.combinations(list_of_things,
                                                                                         possible_size_of_combinations)]
    return list_of_combinations

This is my implementation

    def get_combinations(list_of_things):
    """gets every combination of things in a list returned as a list of lists

    Should be read : add all combinations of a certain size to the end of a list for every possible size in the
    the list_of_things.

    """
    list_of_combinations = [list(combinations_of_a_certain_size)
                            for possible_size_of_combinations in range(1,  len(list_of_things))
                            for combinations_of_a_certain_size in itertools.combinations(list_of_things,
                                                                                         possible_size_of_combinations)]
    return list_of_combinations

回答 23

您也可以使用优质包装中的Powerset功能more_itertools

from more_itertools import powerset

l = [1,2,3]
list(powerset(l))

# [(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]

我们还可以验证它是否符合OP的要求

from more_itertools import ilen

assert ilen(powerset(range(15))) == 32_768

You can also use the powerset function from the excellent more_itertools package.

from more_itertools import powerset

l = [1,2,3]
list(powerset(l))

# [(), (1,), (2,), (3,), (1, 2), (1, 3), (2, 3), (1, 2, 3)]

We can also verify, that it meets OP’s requirement

from more_itertools import ilen

assert ilen(powerset(range(15))) == 32_768

回答 24

def combinations(iterable, r):
# combinations('ABCD', 2) --> AB AC AD BC BD CD
# combinations(range(4), 3) --> 012 013 023 123
pool = tuple(iterable)
n = len(pool)
if r > n:
    return
indices = range(r)
yield tuple(pool[i] for i in indices)
while True:
    for i in reversed(range(r)):
        if indices[i] != i + n - r:
            break
    else:
        return
    indices[i] += 1
    for j in range(i+1, r):
        indices[j] = indices[j-1] + 1
    yield tuple(pool[i] for i in indices)


x = [2, 3, 4, 5, 1, 6, 4, 7, 8, 3, 9]
for i in combinations(x, 2):
    print i
def combinations(iterable, r):
# combinations('ABCD', 2) --> AB AC AD BC BD CD
# combinations(range(4), 3) --> 012 013 023 123
pool = tuple(iterable)
n = len(pool)
if r > n:
    return
indices = range(r)
yield tuple(pool[i] for i in indices)
while True:
    for i in reversed(range(r)):
        if indices[i] != i + n - r:
            break
    else:
        return
    indices[i] += 1
    for j in range(i+1, r):
        indices[j] = indices[j-1] + 1
    yield tuple(pool[i] for i in indices)


x = [2, 3, 4, 5, 1, 6, 4, 7, 8, 3, 9]
for i in combinations(x, 2):
    print i

回答 25

如果有人正在寻找反向列表,就像我曾经那样:

stuff = [1, 2, 3, 4]

def reverse(bla, y):
    for subset in itertools.combinations(bla, len(bla)-y):
        print list(subset)
    if y != len(bla):
        y += 1
        reverse(bla, y)

reverse(stuff, 1)

If someone is looking for a reversed list, like I was:

stuff = [1, 2, 3, 4]

def reverse(bla, y):
    for subset in itertools.combinations(bla, len(bla)-y):
        print list(subset)
    if y != len(bla):
        y += 1
        reverse(bla, y)

reverse(stuff, 1)

回答 26

flag = 0
requiredCals =12
from itertools import chain, combinations

def powerset(iterable):
    s = list(iterable)  # allows duplicate elements
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

stuff = [2,9,5,1,6]
for i, combo in enumerate(powerset(stuff), 1):
    if(len(combo)>0):
        #print(combo , sum(combo))
        if(sum(combo)== requiredCals):
            flag = 1
            break
if(flag==1):
    print('True')
else:
    print('else')
flag = 0
requiredCals =12
from itertools import chain, combinations

def powerset(iterable):
    s = list(iterable)  # allows duplicate elements
    return chain.from_iterable(combinations(s, r) for r in range(len(s)+1))

stuff = [2,9,5,1,6]
for i, combo in enumerate(powerset(stuff), 1):
    if(len(combo)>0):
        #print(combo , sum(combo))
        if(sum(combo)== requiredCals):
            flag = 1
            break
if(flag==1):
    print('True')
else:
    print('else')