标签归档:Python

检查字符串是否以XXXX开头

问题:检查字符串是否以XXXX开头

我想知道如何检查Python中字符串是否以“ hello”开头。

在Bash中,我通常这样做:

if [[ "$string" =~ ^hello ]]; then
 do something here
fi

如何在Python中实现相同的目标?

I would like to know how to check whether a string starts with “hello” in Python.

In Bash I usually do:

if [[ "$string" =~ ^hello ]]; then
 do something here
fi

How do I achieve the same in Python?


回答 0

aString = "hello world"
aString.startswith("hello")

有关的更多信息startswith

aString = "hello world"
aString.startswith("hello")

More info about startswith.


回答 1

RanRag已经回答了您的特定问题。

但是,更一般地说,您在做什么

if [[ "$string" =~ ^hello ]]

正则表达式匹配。要在Python中执行相同的操作,您可以执行以下操作:

import re
if re.match(r'^hello', somestring):
    # do stuff

显然,在这种情况下somestring.startswith('hello')更好。

RanRag has already answered it for your specific question.

However, more generally, what you are doing with

if [[ "$string" =~ ^hello ]]

is a regex match. To do the same in Python, you would do:

import re
if re.match(r'^hello', somestring):
    # do stuff

Obviously, in this case, somestring.startswith('hello') is better.


回答 2

如果您想将多个单词与魔术单词匹配,则可以将单词匹配为元组:

>>> magicWord = 'zzzTest'
>>> magicWord.startswith(('zzz', 'yyy', 'rrr'))
True

注意startswithstr or a tuple of str

请参阅文档

In case you want to match multiple words to your magic word you can pass the words to match as a tuple:

>>> magicWord = 'zzzTest'
>>> magicWord.startswith(('zzz', 'yyy', 'rrr'))
True

Note: startswith takes str or a tuple of str

See the docs.


回答 3

也可以这样

regex=re.compile('^hello')

## THIS WAY YOU CAN CHECK FOR MULTIPLE STRINGS
## LIKE
## regex=re.compile('^hello|^john|^world')

if re.match(regex, somestring):
    print("Yes")

Can also be done this way..

regex=re.compile('^hello')

## THIS WAY YOU CAN CHECK FOR MULTIPLE STRINGS
## LIKE
## regex=re.compile('^hello|^john|^world')

if re.match(regex, somestring):
    print("Yes")

我可以将JSON加载到OrderedDict吗?

问题:我可以将JSON加载到OrderedDict吗?

好的,所以我可以在中使用OrderedDict json.dump。也就是说,OrderedDict可以用作JSON的输入。

但是可以用作输出吗?如果可以,怎么办?就我而言,我想load放入OrderedDict,以便可以将键的顺序保留在文件中。

如果没有,是否有某种解决方法?

Ok so I can use an OrderedDict in json.dump. That is, an OrderedDict can be used as an input to JSON.

But can it be used as an output? If so how? In my case I’d like to load into an OrderedDict so I can keep the order of the keys in the file.

If not, is there some kind of workaround?


回答 0

是的你可以。通过指定JSONDecoderobject_pairs_hook参数。实际上,这是文档中给出的确切示例。

>>> json.JSONDecoder(object_pairs_hook=collections.OrderedDict).decode('{"foo":1, "bar": 2}')
OrderedDict([('foo', 1), ('bar', 2)])
>>> 

您可以将此参数传递给json.loads(如果不需要出于其他目的的Decoder实例),如下所示:

>>> import json
>>> from collections import OrderedDict
>>> data = json.loads('{"foo":1, "bar": 2}', object_pairs_hook=OrderedDict)
>>> print json.dumps(data, indent=4)
{
    "foo": 1,
    "bar": 2
}
>>> 

使用json.load以相同的方式完成:

>>> data = json.load(open('config.json'), object_pairs_hook=OrderedDict)

Yes, you can. By specifying the object_pairs_hook argument to JSONDecoder. In fact, this is the exact example given in the documentation.

>>> json.JSONDecoder(object_pairs_hook=collections.OrderedDict).decode('{"foo":1, "bar": 2}')
OrderedDict([('foo', 1), ('bar', 2)])
>>> 

You can pass this parameter to json.loads (if you don’t need a Decoder instance for other purposes) like so:

>>> import json
>>> from collections import OrderedDict
>>> data = json.loads('{"foo":1, "bar": 2}', object_pairs_hook=OrderedDict)
>>> print json.dumps(data, indent=4)
{
    "foo": 1,
    "bar": 2
}
>>> 

Using json.load is done in the same way:

>>> data = json.load(open('config.json'), object_pairs_hook=OrderedDict)

回答 1

适用于Python 2.7+的简单版本

my_ordered_dict = json.loads(json_str, object_pairs_hook=collections.OrderedDict)

或适用于Python 2.4至2.6

import simplejson as json
import ordereddict

my_ordered_dict = json.loads(json_str, object_pairs_hook=ordereddict.OrderedDict)

Simple version for Python 2.7+

my_ordered_dict = json.loads(json_str, object_pairs_hook=collections.OrderedDict)

Or for Python 2.4 to 2.6

import simplejson as json
import ordereddict

my_ordered_dict = json.loads(json_str, object_pairs_hook=ordereddict.OrderedDict)

回答 2

一些好消息!从3.6版开始,cPython实现保留了字典的插入顺序(https://mail.python.org/pipermail/python-dev/2016-September/146327.html)。这意味着json库现在默认保留顺序。观察python 3.5和3.6之间的行为差​​异。编码:

import json
data = json.loads('{"foo":1, "bar":2, "fiddle":{"bar":2, "foo":1}}')
print(json.dumps(data, indent=4))

在py3.5中,结果顺序是不确定的:

{
    "fiddle": {
        "bar": 2,
        "foo": 1
    },
    "bar": 2,
    "foo": 1
}

在python 3.6的cPython实现中:

{
    "foo": 1,
    "bar": 2,
    "fiddle": {
        "bar": 2,
        "foo": 1
    }
}

真正的好消息是,这已成为python 3.7的语言规范(与cPython 3.6+的实现细节相反):https ://mail.python.org/pipermail/python-dev/2017-December/151283 .html

因此,您的问题的答案现在变成:升级到python 3.6!:)

Some great news! Since version 3.6 the cPython implementation has preserved the insertion order of dictionaries (https://mail.python.org/pipermail/python-dev/2016-September/146327.html). This means that the json library is now order preserving by default. Observe the difference in behaviour between python 3.5 and 3.6. The code:

import json
data = json.loads('{"foo":1, "bar":2, "fiddle":{"bar":2, "foo":1}}')
print(json.dumps(data, indent=4))

In py3.5 the resulting order is undefined:

{
    "fiddle": {
        "bar": 2,
        "foo": 1
    },
    "bar": 2,
    "foo": 1
}

In the cPython implementation of python 3.6:

{
    "foo": 1,
    "bar": 2,
    "fiddle": {
        "bar": 2,
        "foo": 1
    }
}

The really great news is that this has become a language specification as of python 3.7 (as opposed to an implementation detail of cPython 3.6+): https://mail.python.org/pipermail/python-dev/2017-December/151283.html

So the answer to your question now becomes: upgrade to python 3.6! :)


回答 3

除了转储字典,您总是可以写出密钥列表,然后OrderedDict通过遍历列表来重建密钥?

You could always write out the list of keys in addition to dumping the dict, and then reconstruct the OrderedDict by iterating through the list?


回答 4

除了在字典旁边转储键的有序列表之外,另一种具有显式优点的低技术解决方案是转储键-值对的(有序)列表ordered_dict.items()。加载很简单OrderedDict(<list of key-value pairs>)。尽管JSON没有这个概念(JSON字典没有顺序),但这仍然可以处理有序字典。

利用json以正确顺序转储OrderedDict 的事实确实很好。但是,通常必须将所有 JSON字典作为OrderedDict 读取(通过object_pairs_hook参数)是不必要的繁琐操作,也不一定有意义,因此显式转换必须排序的字典也是有意义的。

In addition to dumping the ordered list of keys alongside the dictionary, another low-tech solution, which has the advantage of being explicit, is to dump the (ordered) list of key-value pairs ordered_dict.items(); loading is a simple OrderedDict(<list of key-value pairs>). This handles an ordered dictionary despite the fact that JSON does not have this concept (JSON dictionaries have no order).

It is indeed nice to take advantage of the fact that json dumps the OrderedDict in the correct order. However, it is in general unnecessarily heavy and not necessarily meaningful to have to read all JSON dictionaries as an OrderedDict (through the object_pairs_hook argument), so an explicit conversion of only the dictionaries that must be ordered makes sense too.


回答 5

如果指定object_pairs_hook参数,则通常使用的load命令将起作用:

import json
from  collections import OrderedDict
with open('foo.json', 'r') as fp:
    metrics_types = json.load(fp, object_pairs_hook=OrderedDict)

The normally used load command will work if you specify the object_pairs_hook parameter:

import json
from  collections import OrderedDict
with open('foo.json', 'r') as fp:
    metrics_types = json.load(fp, object_pairs_hook=OrderedDict)

如何从集合中检索元素而不删除它?

问题:如何从集合中检索元素而不删除它?

假设以下内容:

>>> s = set([1, 2, 3])

如何获得的值(任意值)出来s而不做s.pop()?我想将该项目保留在集合中,直到我确定可以删除它为止-只有在异步调用另一个主机后才能确定。

快速又肮脏:

>>> elem = s.pop()
>>> s.add(elem)

但是您知道更好的方法吗?理想的情况是恒定时间。

Suppose the following:

>>> s = set([1, 2, 3])

How do I get a value (any value) out of s without doing s.pop()? I want to leave the item in the set until I am sure I can remove it – something I can only be sure of after an asynchronous call to another host.

Quick and dirty:

>>> elem = s.pop()
>>> s.add(elem)

But do you know of a better way? Ideally in constant time.


回答 0

不需要复制整个集合的两个选项:

for e in s:
    break
# e is now an element from s

要么…

e = next(iter(s))

但是总的来说,集合不支持索引或切片。

Two options that don’t require copying the whole set:

for e in s:
    break
# e is now an element from s

Or…

e = next(iter(s))

But in general, sets don’t support indexing or slicing.


回答 1

最少的代码是:

>>> s = set([1, 2, 3])
>>> list(s)[0]
1

显然,这将创建一个新列表,其中包含该集合的每个成员,因此如果您的集合很大,就不会很大。

Least code would be:

>>> s = set([1, 2, 3])
>>> list(s)[0]
1

Obviously this would create a new list which contains each member of the set, so not great if your set is very large.


回答 2

我想知道这些功能如何在不同的集合上执行,所以我做了一个基准测试:

from random import sample

def ForLoop(s):
    for e in s:
        break
    return e

def IterNext(s):
    return next(iter(s))

def ListIndex(s):
    return list(s)[0]

def PopAdd(s):
    e = s.pop()
    s.add(e)
    return e

def RandomSample(s):
    return sample(s, 1)

def SetUnpacking(s):
    e, *_ = s
    return e

from simple_benchmark import benchmark

b = benchmark([ForLoop, IterNext, ListIndex, PopAdd, RandomSample, SetUnpacking],
              {2**i: set(range(2**i)) for i in range(1, 20)},
              argument_name='set size',
              function_aliases={first: 'First'})

b.plot()

在此处输入图片说明

该图清楚地表明,一些方法(RandomSampleSetUnpackingListIndex)依赖于集的大小,并应在一般情况下可避免(至少在性能可能是重要的)。正如其他答案已经显示的那样,最快的方法是ForLoop

但是,只要使用恒定时间方法之一,性能差异就可以忽略不计。


iteration_utilities(免责声明:我是作者)包含此用例的便捷功能first

>>> from iteration_utilities import first
>>> first({1,2,3,4})
1

我也将其包含在上述基准中。它可以与其他两个“快速”解决方案竞争,但是两者之间的差异并不大。

I wondered how the functions will perform for different sets, so I did a benchmark:

from random import sample

def ForLoop(s):
    for e in s:
        break
    return e

def IterNext(s):
    return next(iter(s))

def ListIndex(s):
    return list(s)[0]

def PopAdd(s):
    e = s.pop()
    s.add(e)
    return e

def RandomSample(s):
    return sample(s, 1)

def SetUnpacking(s):
    e, *_ = s
    return e

from simple_benchmark import benchmark

b = benchmark([ForLoop, IterNext, ListIndex, PopAdd, RandomSample, SetUnpacking],
              {2**i: set(range(2**i)) for i in range(1, 20)},
              argument_name='set size',
              function_aliases={first: 'First'})

b.plot()

enter image description here

This plot clearly shows that some approaches (RandomSample, SetUnpacking and ListIndex) depend on the size of the set and should be avoided in the general case (at least if performance might be important). As already shown by the other answers the fastest way is ForLoop.

However as long as one of the constant time approaches is used the performance difference will be negligible.


iteration_utilities (Disclaimer: I’m the author) contains a convenience function for this use-case: first:

>>> from iteration_utilities import first
>>> first({1,2,3,4})
1

I also included it in the benchmark above. It can compete with the other two “fast” solutions but the difference isn’t much either way.


回答 3

tl; dr

for first_item in muh_set: break仍然是Python 3.x中的最佳方法。诅咒你,圭多。

do这样做

欢迎使用从wr推断出来的另一组Python 3.x计时出色的特定于Python 2.x的响应。与AChampion同样有用的特定于Python 3.x的响应不同,下面的时间安排建议了上面提到的时间异常解决方案,包括:

欢乐代码段

开启,收听,定时:

from timeit import Timer

stats = [
    "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
    "for i in range(1000): next(iter(s))",
    "for i in range(1000): s.add(s.pop())",
    "for i in range(1000): list(s)[0]",
    "for i in range(1000): random.sample(s, 1)",
]

for stat in stats:
    t = Timer(stat, setup="import random\ns=set(range(100))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

快速过时的计时

看哪!按最快到最慢的片段排序:

$ ./test_get.py
Time for for i in range(1000): 
    for x in s: 
        break:   0.249871
Time for for i in range(1000): next(iter(s)):    0.526266
Time for for i in range(1000): s.add(s.pop()):   0.658832
Time for for i in range(1000): list(s)[0]:   4.117106
Time for for i in range(1000): random.sample(s, 1):  21.851104

全家人的面部植物

毫不奇怪,手动迭代的速度至少是下一个最快解决方案的两倍。尽管与Bad Old Python 2.x相比(以前的手动迭代速度至少快四倍),差距有所减小,但PPE 20狂热者对我而言最冗长的解决方案是最好的解决方案感到失望。至少将一个集转换为一个列表以仅提取该集的第一个元素是预期的那样可怕。感谢Guido,愿他的光芒继续指导我们。

令人惊讶的是,基于RNG的解决方案绝对可怕。列表转换不好,但random 确实会带来糟糕的结果。对于随机数上帝来说如此之多。

我只是希望那些无定形set.get_first()的人已经为我们准备了一种方法。如果您正在阅读本文,他们:“请。做点什么。”

tl;dr

for first_item in muh_set: break remains the optimal approach in Python 3.x. Curse you, Guido.

y u do this

Welcome to yet another set of Python 3.x timings, extrapolated from wr.‘s excellent Python 2.x-specific response. Unlike AChampion‘s equally helpful Python 3.x-specific response, the timings below also time outlier solutions suggested above – including:

Code Snippets for Great Joy

Turn on, tune in, time it:

from timeit import Timer

stats = [
    "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
    "for i in range(1000): next(iter(s))",
    "for i in range(1000): s.add(s.pop())",
    "for i in range(1000): list(s)[0]",
    "for i in range(1000): random.sample(s, 1)",
]

for stat in stats:
    t = Timer(stat, setup="import random\ns=set(range(100))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

Quickly Obsoleted Timeless Timings

Behold! Ordered by fastest to slowest snippets:

$ ./test_get.py
Time for for i in range(1000): 
    for x in s: 
        break:   0.249871
Time for for i in range(1000): next(iter(s)):    0.526266
Time for for i in range(1000): s.add(s.pop()):   0.658832
Time for for i in range(1000): list(s)[0]:   4.117106
Time for for i in range(1000): random.sample(s, 1):  21.851104

Faceplants for the Whole Family

Unsurprisingly, manual iteration remains at least twice as fast as the next fastest solution. Although the gap has decreased from the Bad Old Python 2.x days (in which manual iteration was at least four times as fast), it disappoints the PEP 20 zealot in me that the most verbose solution is the best. At least converting a set into a list just to extract the first element of the set is as horrible as expected. Thank Guido, may his light continue to guide us.

Surprisingly, the RNG-based solution is absolutely horrible. List conversion is bad, but random really takes the awful-sauce cake. So much for the Random Number God.

I just wish the amorphous They would PEP up a set.get_first() method for us already. If you’re reading this, They: “Please. Do something.”


回答 4

为了提供不同方法背后的一些时序图,请考虑以下代码。 get()是我自定义添加到Python的setobject.c中,只是一个pop()而没有删除该元素。

from timeit import *

stats = ["for i in xrange(1000): iter(s).next()   ",
         "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in xrange(1000): s.add(s.pop())   ",
         "for i in xrange(1000): s.get()          "]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100))")
    try:
        print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
    except:
        t.print_exc()

输出为:

$ ./test_get.py
Time for for i in xrange(1000): iter(s).next()   :       0.433080
Time for for i in xrange(1000):
        for x in s:
                break:   0.148695
Time for for i in xrange(1000): s.add(s.pop())   :       0.317418
Time for for i in xrange(1000): s.get()          :       0.146673

这意味着for / break解决方案是最快的(有时比自定义get()解决方案更快)。

To provide some timing figures behind the different approaches, consider the following code. The get() is my custom addition to Python’s setobject.c, being just a pop() without removing the element.

from timeit import *

stats = ["for i in xrange(1000): iter(s).next()   ",
         "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in xrange(1000): s.add(s.pop())   ",
         "for i in xrange(1000): s.get()          "]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100))")
    try:
        print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
    except:
        t.print_exc()

The output is:

$ ./test_get.py
Time for for i in xrange(1000): iter(s).next()   :       0.433080
Time for for i in xrange(1000):
        for x in s:
                break:   0.148695
Time for for i in xrange(1000): s.add(s.pop())   :       0.317418
Time for for i in xrange(1000): s.get()          :       0.146673

This means that the for/break solution is the fastest (sometimes faster than the custom get() solution).


回答 5

由于您需要一个随机元素,因此也可以使用:

>>> import random
>>> s = set([1,2,3])
>>> random.sample(s, 1)
[2]

该文档似乎并未提及的性能random.sample。从包含大量列表和大量集合的非常快速的实证检验来看,列表似乎是恒定时间,但集合却并非如此。同样,对集合的迭代也不是随机的。顺序不确定,但可以预测:

>>> list(set(range(10))) == range(10)
True 

如果随机性很重要,并且您需要在恒定时间内(大量集合)使用一堆元素,那么我会先使用random.sample并转换为列表:

>>> lst = list(s) # once, O(len(s))?
...
>>> e = random.sample(lst, 1)[0] # constant time

Since you want a random element, this will also work:

>>> import random
>>> s = set([1,2,3])
>>> random.sample(s, 1)
[2]

The documentation doesn’t seem to mention performance of random.sample. From a really quick empirical test with a huge list and a huge set, it seems to be constant time for a list but not for the set. Also, iteration over a set isn’t random; the order is undefined but predictable:

>>> list(set(range(10))) == range(10)
True 

If randomness is important and you need a bunch of elements in constant time (large sets), I’d use random.sample and convert to a list first:

>>> lst = list(s) # once, O(len(s))?
...
>>> e = random.sample(lst, 1)[0] # constant time

回答 6

似乎是最紧凑的(6个符号),但是获得集合元素的方法很慢(由PEP 3132实现):

e,*_=s

在Python 3.5+中,您还可以使用以下7符号表达式(感谢PEP 448):

[*s][0]

这两种选择在我的机器上都比for循环方法慢大约1000倍。

Seemingly the most compact (6 symbols) though very slow way to get a set element (made possible by PEP 3132):

e,*_=s

With Python 3.5+ you can also use this 7-symbol expression (thanks to PEP 448):

[*s][0]

Both options are roughly 1000 times slower on my machine than the for-loop method.


回答 7

我使用编写的实用程序函数。它的名称有点误导,因为它暗示它可能是随机物品或类似物品。

def anyitem(iterable):
    try:
        return iter(iterable).next()
    except StopIteration:
        return None

I use a utility function I wrote. Its name is somewhat misleading because it kind of implies it might be a random item or something like that.

def anyitem(iterable):
    try:
        return iter(iterable).next()
    except StopIteration:
        return None

回答 8

在@wr之后。帖子,我得到类似的结果(对于Python3.5)

from timeit import *

stats = ["for i in range(1000): next(iter(s))",
         "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in range(1000): s.add(s.pop())"]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100000))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

输出:

Time for for i in range(1000): next(iter(s)):    0.205888
Time for for i in range(1000): 
    for x in s: 
        break:                                   0.083397
Time for for i in range(1000): s.add(s.pop()):   0.226570

但是,在更改基础集(例如,调用remove())时,对于可迭代的示例(foriter)来说效果很差:

from timeit import *

stats = ["while s:\n\ta = next(iter(s))\n\ts.remove(a)",
         "while s:\n\tfor x in s: break\n\ts.remove(x)",
         "while s:\n\tx=s.pop()\n\ts.add(x)\n\ts.remove(x)"]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100000))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

结果是:

Time for while s:
    a = next(iter(s))
    s.remove(a):             2.938494
Time for while s:
    for x in s: break
    s.remove(x):             2.728367
Time for while s:
    x=s.pop()
    s.add(x)
    s.remove(x):             0.030272

Following @wr. post, I get similar results (for Python3.5)

from timeit import *

stats = ["for i in range(1000): next(iter(s))",
         "for i in range(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in range(1000): s.add(s.pop())"]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100000))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

Output:

Time for for i in range(1000): next(iter(s)):    0.205888
Time for for i in range(1000): 
    for x in s: 
        break:                                   0.083397
Time for for i in range(1000): s.add(s.pop()):   0.226570

However, when changing the underlying set (e.g. call to remove()) things go badly for the iterable examples (for, iter):

from timeit import *

stats = ["while s:\n\ta = next(iter(s))\n\ts.remove(a)",
         "while s:\n\tfor x in s: break\n\ts.remove(x)",
         "while s:\n\tx=s.pop()\n\ts.add(x)\n\ts.remove(x)"]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100000))")
    try:
        print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
    except:
        t.print_exc()

Results in:

Time for while s:
    a = next(iter(s))
    s.remove(a):             2.938494
Time for while s:
    for x in s: break
    s.remove(x):             2.728367
Time for while s:
    x=s.pop()
    s.add(x)
    s.remove(x):             0.030272

回答 9

我通常对小型馆藏所做的是创建这样的解析器/转换器方法

def convertSetToList(setName):
return list(setName)

然后我可以使用新列表并按索引号访问

userFields = convertSetToList(user)
name = request.json[userFields[0]]

作为列表,您将拥有可能需要使用的所有其他方法

What I usually do for small collections is to create kind of parser/converter method like this

def convertSetToList(setName):
return list(setName)

Then I can use the new list and access by index number

userFields = convertSetToList(user)
name = request.json[userFields[0]]

As a list you will have all the other methods that you may need to work with


回答 10

怎么s.copy().pop()样 我尚未计时,但它应该可以正常工作,而且很简单。但是,它适用于小型集,因为它可以复制整个集。

How about s.copy().pop()? I haven’t timed it, but it should work and it’s simple. It works best for small sets however, as it copies the whole set.


回答 11

另一种选择是将字典与不需要的值一起使用。例如,


poor_man_set = {}
poor_man_set[1] = None
poor_man_set[2] = None
poor_man_set[3] = None
...

您可以将键视为一组,但它们只是一个数组:


keys = poor_man_set.keys()
print "Some key = %s" % keys[0]

这种选择的副作用是您的代码将与较旧set的Python 早期版本向后兼容。这也许不是最佳答案,但这是另一种选择。

编辑:您甚至可以执行以下操作来隐藏使用字典而不是数组或集合的事实:


poor_man_set = {}
poor_man_set[1] = None
poor_man_set[2] = None
poor_man_set[3] = None
poor_man_set = poor_man_set.keys()

Another option is to use a dictionary with values you don’t care about. E.g.,


poor_man_set = {}
poor_man_set[1] = None
poor_man_set[2] = None
poor_man_set[3] = None
...

You can treat the keys as a set except that they’re just an array:


keys = poor_man_set.keys()
print "Some key = %s" % keys[0]

A side effect of this choice is that your code will be backwards compatible with older, pre-set versions of Python. It’s maybe not the best answer but it’s another option.

Edit: You can even do something like this to hide the fact that you used a dict instead of an array or set:


poor_man_set = {}
poor_man_set[1] = None
poor_man_set[2] = None
poor_man_set[3] = None
poor_man_set = poor_man_set.keys()

查找Python对象具有的方法

问题:查找Python对象具有的方法

给定任何种类的Python对象,是否有一种简单的方法来获取该对象具有的所有方法的列表?

要么,

如果这不可能,那么除了简单地检查调用该方法时是否发生错误之外,是否至少有一种简单的方法来检查它是否具有特定的方法?

Given a Python object of any kind, is there an easy way to get the list of all methods that this object has?

Or,

if this is not possible, is there at least an easy way to check if it has a particular method other than simply checking if an error occurs when the method is called?


回答 0

对于许多对象,您可以使用以下代码,将“ object”替换为您感兴趣的对象:

object_methods = [method_name for method_name in dir(object)
                  if callable(getattr(object, method_name))]

我在diveintopython.net上发现了它(现已存档)。希望可以提供更多详细信息!

如果得到AttributeError,则可以改用

getattr(不能容忍熊猫风格的python3.6抽象虚拟子类。此代码与上面的代码相同,并且忽略异常。

import pandas as pd 
df = pd.DataFrame([[10, 20, 30], [100, 200, 300]],  
                  columns=['foo', 'bar', 'baz']) 
def get_methods(object, spacing=20): 
  methodList = [] 
  for method_name in dir(object): 
    try: 
        if callable(getattr(object, method_name)): 
            methodList.append(str(method_name)) 
    except: 
        methodList.append(str(method_name)) 
  processFunc = (lambda s: ' '.join(s.split())) or (lambda s: s) 
  for method in methodList: 
    try: 
        print(str(method.ljust(spacing)) + ' ' + 
              processFunc(str(getattr(object, method).__doc__)[0:90])) 
    except: 
        print(method.ljust(spacing) + ' ' + ' getattr() failed') 


get_methods(df['foo']) 

For many objects, you can use this code, replacing ‘object’ with the object you’re interested in:

object_methods = [method_name for method_name in dir(object)
                  if callable(getattr(object, method_name))]

I discovered it at diveintopython.net (Now archived). Hopefully, that should provide some further detail!

If you get an AttributeError, you can use this instead:

getattr( is intolerant of pandas style python3.6 abstract virtual sub-classes. This code does the same as above and ignores exceptions.

import pandas as pd 
df = pd.DataFrame([[10, 20, 30], [100, 200, 300]],  
                  columns=['foo', 'bar', 'baz']) 
def get_methods(object, spacing=20): 
  methodList = [] 
  for method_name in dir(object): 
    try: 
        if callable(getattr(object, method_name)): 
            methodList.append(str(method_name)) 
    except: 
        methodList.append(str(method_name)) 
  processFunc = (lambda s: ' '.join(s.split())) or (lambda s: s) 
  for method in methodList: 
    try: 
        print(str(method.ljust(spacing)) + ' ' + 
              processFunc(str(getattr(object, method).__doc__)[0:90])) 
    except: 
        print(method.ljust(spacing) + ' ' + ' getattr() failed') 


get_methods(df['foo']) 

回答 1

您可以使用内置dir()函数来获取模块具有的所有属性的列表。在命令行上尝试此操作以查看其工作原理。

>>> import moduleName
>>> dir(moduleName)

另外,您可以使用 hasattr(module_name, "attr_name")函数来查找模块是否具有特定属性。

有关更多信息,请参见Python自省指南

You can use the built in dir() function to get a list of all the attributes a module has. Try this at the command line to see how it works.

>>> import moduleName
>>> dir(moduleName)

Also, you can use the hasattr(module_name, "attr_name") function to find out if a module has a specific attribute.

See the Guide to Python introspection for more information.


回答 2

最简单的方法是使用dir(objectname)。它将显示该对象可用的所有方法。很酷的把戏。

The simplest method is to use dir(objectname). It will display all the methods available for that object. Cool trick.


回答 3

要检查它是否具有特定方法:

hasattr(object,"method")

To check if it has a particular method:

hasattr(object,"method")

回答 4

我相信您想要的是这样的:

来自对象的属性列表

以我的拙见,内置功能dir()可以为您完成这项工作。取自help(dir)Python Shell上的输出:

目录(…)

dir([object]) -> list of strings

如果不带参数调用,则返回当前作用域中的名称。

否则,返回按字母顺序排列的名称列表,该列表包含给定对象的(某些)属性以及从中可以访问的属性。

如果对象提供了一个名为的方法__dir__,则将使用该方法;否则,将使用默认的dir()逻辑并返回:

  • 对于模块对象:模块的属性。
  • 对于一个类对象:其属性,以及递归其基类的属性。
  • 对于任何其他对象:其属性,其类的属性,以及递归地为其类的基类的属性。

例如:

$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> a = "I am a string"
>>>
>>> type(a)
<class 'str'>
>>>
>>> dir(a)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__',
'__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__',
'__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__',
'__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'_formatter_field_name_split', '_formatter_parser', 'capitalize',
'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find',
'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace',
'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition',
'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip',
'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title',
'translate', 'upper', 'zfill']

在检查您的问题时,我决定展示我的思路,并以更好的格式输出 dir()

dir_attributes.py(Python 2.7.6)

#!/usr/bin/python
""" Demonstrates the usage of dir(), with better output. """

__author__ = "ivanleoncz"

obj = "I am a string."
count = 0

print "\nObject Data: %s" % obj
print "Object Type: %s\n" % type(obj)

for method in dir(obj):
    # the comma at the end of the print, makes it printing 
    # in the same line, 4 times (count)
    print "| {0: <20}".format(method),
    count += 1
    if count == 4:
        count = 0
        print

dir_attributes.py(Python 3.4.3)

#!/usr/bin/python3
""" Demonstrates the usage of dir(), with better output. """

__author__ = "ivanleoncz"

obj = "I am a string."
count = 0

print("\nObject Data: ", obj)
print("Object Type: ", type(obj),"\n")

for method in dir(obj):
    # the end=" " at the end of the print statement, 
    # makes it printing in the same line, 4 times (count)
    print("|    {:20}".format(method), end=" ")
    count += 1
    if count == 4:
        count = 0
        print("")

希望我有所贡献:)。

I believe that what you want is something like this:

a list of attributes from an object

In my humble opinion, the built-in function dir() can do this job for you. Taken from help(dir) output on your Python Shell:

dir(…)

dir([object]) -> list of strings

If called without an argument, return the names in the current scope.

Else, return an alphabetized list of names comprising (some of) the attributes of the given object, and of attributes reachable from it.

If the object supplies a method named __dir__, it will be used; otherwise the default dir() logic is used and returns:

  • for a module object: the module’s attributes.
  • for a class object: its attributes, and recursively the attributes of its bases.
  • for any other object: its attributes, its class’s attributes, and recursively the attributes of its class’s base classes.

For example:

$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.

>>> a = "I am a string"
>>>
>>> type(a)
<class 'str'>
>>>
>>> dir(a)
['__add__', '__class__', '__contains__', '__delattr__', '__doc__',
'__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__',
'__getnewargs__', '__getslice__', '__gt__', '__hash__', '__init__',
'__le__', '__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__',
'__reduce__', '__reduce_ex__', '__repr__', '__rmod__', '__rmul__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__',
'_formatter_field_name_split', '_formatter_parser', 'capitalize',
'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find',
'format', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace',
'istitle', 'isupper', 'join', 'ljust', 'lower', 'lstrip', 'partition',
'replace', 'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip',
'split', 'splitlines', 'startswith', 'strip', 'swapcase', 'title',
'translate', 'upper', 'zfill']

As I was checking your issue, I decided to demonstrate my train of thought, with a better formatting of the output of dir().

dir_attributes.py (Python 2.7.6)

#!/usr/bin/python
""" Demonstrates the usage of dir(), with better output. """

__author__ = "ivanleoncz"

obj = "I am a string."
count = 0

print "\nObject Data: %s" % obj
print "Object Type: %s\n" % type(obj)

for method in dir(obj):
    # the comma at the end of the print, makes it printing 
    # in the same line, 4 times (count)
    print "| {0: <20}".format(method),
    count += 1
    if count == 4:
        count = 0
        print

dir_attributes.py (Python 3.4.3)

#!/usr/bin/python3
""" Demonstrates the usage of dir(), with better output. """

__author__ = "ivanleoncz"

obj = "I am a string."
count = 0

print("\nObject Data: ", obj)
print("Object Type: ", type(obj),"\n")

for method in dir(obj):
    # the end=" " at the end of the print statement, 
    # makes it printing in the same line, 4 times (count)
    print("|    {:20}".format(method), end=" ")
    count += 1
    if count == 4:
        count = 0
        print("")

Hope that I have contributed :).


回答 5

除了更直接的答案外,如果我不提及iPython,我也会不知所措。点击“选项卡”以查看具有自动补全功能的可用方法。

找到方法后,请尝试:

help(object.method) 

查看pydocs,方法签名等。

啊… REPL

On top of the more direct answers, I’d be remiss if I didn’t mention iPython. Hit ‘tab’ to see the available methods, with autocompletion.

And once you’ve found a method, try:

help(object.method) 

to see the pydocs, method signature, etc.

Ahh… REPL.


回答 6

如果您特别想要方法,则应使用inspect.ismethod

对于方法名称:

import inspect
method_names = [attr for attr in dir(self) if inspect.ismethod(getattr(self, attr))]

对于方法本身:

import inspect
methods = [member for member in [getattr(self, attr) for attr in dir(self)] if inspect.ismethod(member)]

有时inspect.isroutine也可能有用(对于内置,C扩展,不带“绑定”编译器指令的Cython)。

If you specifically want methods, you should use inspect.ismethod.

For method names:

import inspect
method_names = [attr for attr in dir(self) if inspect.ismethod(getattr(self, attr))]

For the methods themselves:

import inspect
methods = [member for member in [getattr(self, attr) for attr in dir(self)] if inspect.ismethod(member)]

Sometimes inspect.isroutine can be useful too (for built-ins, C extensions, Cython without the “binding” compiler directive).


回答 7

打开bash shell(在Ubuntu上为ctrl + alt + T)。在其中启动python3 shell。创建对象以观察方法。只需在其后添加一个点,然后按两次“ tab”,您将看到类似的内容:

 user@note:~$ python3
 Python 3.4.3 (default, Nov 17 2016, 01:08:31) 
 [GCC 4.8.4] on linux
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import readline
 >>> readline.parse_and_bind("tab: complete")
 >>> s = "Any object. Now it's a string"
 >>> s. # here tab should be pressed twice
 s.__add__(           s.__rmod__(          s.istitle(
 s.__class__(         s.__rmul__(          s.isupper(
 s.__contains__(      s.__setattr__(       s.join(
 s.__delattr__(       s.__sizeof__(        s.ljust(
 s.__dir__(           s.__str__(           s.lower(
 s.__doc__            s.__subclasshook__(  s.lstrip(
 s.__eq__(            s.capitalize(        s.maketrans(
 s.__format__(        s.casefold(          s.partition(
 s.__ge__(            s.center(            s.replace(
 s.__getattribute__(  s.count(             s.rfind(
 s.__getitem__(       s.encode(            s.rindex(
 s.__getnewargs__(    s.endswith(          s.rjust(
 s.__gt__(            s.expandtabs(        s.rpartition(
 s.__hash__(          s.find(              s.rsplit(
 s.__init__(          s.format(            s.rstrip(
 s.__iter__(          s.format_map(        s.split(
 s.__le__(            s.index(             s.splitlines(
 s.__len__(           s.isalnum(           s.startswith(
 s.__lt__(            s.isalpha(           s.strip(
 s.__mod__(           s.isdecimal(         s.swapcase(
 s.__mul__(           s.isdigit(           s.title(
 s.__ne__(            s.isidentifier(      s.translate(
 s.__new__(           s.islower(           s.upper(
 s.__reduce__(        s.isnumeric(         s.zfill(
 s.__reduce_ex__(     s.isprintable(       
 s.__repr__(          s.isspace(           

Open bash shell (ctrl+alt+T on Ubuntu). Start python3 shell in it. Create object to observe methods of. Just add a dot after it and press twice “tab” and you’ll see something like that:

 user@note:~$ python3
 Python 3.4.3 (default, Nov 17 2016, 01:08:31) 
 [GCC 4.8.4] on linux
 Type "help", "copyright", "credits" or "license" for more information.
 >>> import readline
 >>> readline.parse_and_bind("tab: complete")
 >>> s = "Any object. Now it's a string"
 >>> s. # here tab should be pressed twice
 s.__add__(           s.__rmod__(          s.istitle(
 s.__class__(         s.__rmul__(          s.isupper(
 s.__contains__(      s.__setattr__(       s.join(
 s.__delattr__(       s.__sizeof__(        s.ljust(
 s.__dir__(           s.__str__(           s.lower(
 s.__doc__            s.__subclasshook__(  s.lstrip(
 s.__eq__(            s.capitalize(        s.maketrans(
 s.__format__(        s.casefold(          s.partition(
 s.__ge__(            s.center(            s.replace(
 s.__getattribute__(  s.count(             s.rfind(
 s.__getitem__(       s.encode(            s.rindex(
 s.__getnewargs__(    s.endswith(          s.rjust(
 s.__gt__(            s.expandtabs(        s.rpartition(
 s.__hash__(          s.find(              s.rsplit(
 s.__init__(          s.format(            s.rstrip(
 s.__iter__(          s.format_map(        s.split(
 s.__le__(            s.index(             s.splitlines(
 s.__len__(           s.isalnum(           s.startswith(
 s.__lt__(            s.isalpha(           s.strip(
 s.__mod__(           s.isdecimal(         s.swapcase(
 s.__mul__(           s.isdigit(           s.title(
 s.__ne__(            s.isidentifier(      s.translate(
 s.__new__(           s.islower(           s.upper(
 s.__reduce__(        s.isnumeric(         s.zfill(
 s.__reduce_ex__(     s.isprintable(       
 s.__repr__(          s.isspace(           

回答 8

此处指出的所有方法的问题在于,您不能确定某个方法不存在。

在Python可以拦截点主叫通__getattr____getattribute__,从而可以在“运行时”创建方法

范例:

class MoreMethod(object):
    def some_method(self, x):
        return x
    def __getattr__(self, *args):
        return lambda x: x*2

如果执行它,则可以调用对象字典中不存在的方法…

>>> o = MoreMethod()
>>> o.some_method(5)
5
>>> dir(o)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattr__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'some_method']
>>> o.i_dont_care_of_the_name(5)
10

这就是为什么您在Python中使用Easier而不是权限范式来请求宽恕

The problem with all methods indicated here is that you CAN’T be sure that a method doesn’t exist.

In Python you can intercept the dot calling thru __getattr__ and __getattribute__, making it possible to create method “at runtime”

Exemple:

class MoreMethod(object):
    def some_method(self, x):
        return x
    def __getattr__(self, *args):
        return lambda x: x*2

If you execute it, you can call method non existing in the object dictionary…

>>> o = MoreMethod()
>>> o.some_method(5)
5
>>> dir(o)
['__class__', '__delattr__', '__dict__', '__doc__', '__format__', '__getattr__', '__getattribute__', '__hash__', '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', '__weakref__', 'some_method']
>>> o.i_dont_care_of_the_name(5)
10

And it’s why you use the Easier to ask for forgiveness than permission paradigms in Python.


回答 9

获取任何对象的方法列表的最简单方法是使用help()命令。

%help(object)

它将列出与该对象关联的所有可用/重要方法。

例如:

help(str)

The simplest way to get list of methods of any object is to use help() command.

%help(object)

It will list out all the available/important methods associated with that object.

For example:

help(str)

回答 10

import moduleName
for x in dir(moduleName):
print(x)

这应该工作:)

import moduleName
for x in dir(moduleName):
print(x)

This should work :)


回答 11

可以创建一个getAttrs函数,该函数将返回对象的可调用属性名称

def getAttrs(object):
  return filter(lambda m: callable(getattr(object, m)), dir(object))

print getAttrs('Foo bar'.split(' '))

那会回来

['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
 '__delslice__', '__eq__', '__format__', '__ge__', '__getattribute__', 
 '__getitem__', '__getslice__', '__gt__', '__iadd__', '__imul__', '__init__', 
 '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', 
 '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', 
 '__setattr__', '__setitem__', '__setslice__', '__sizeof__', '__str__', 
 '__subclasshook__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 
 'remove', 'reverse', 'sort']

One can create a getAttrs function that will return an object’s callable property names

def getAttrs(object):
  return filter(lambda m: callable(getattr(object, m)), dir(object))

print getAttrs('Foo bar'.split(' '))

That’d return

['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
 '__delslice__', '__eq__', '__format__', '__ge__', '__getattribute__', 
 '__getitem__', '__getslice__', '__gt__', '__iadd__', '__imul__', '__init__', 
 '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', 
 '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', 
 '__setattr__', '__setitem__', '__setslice__', '__sizeof__', '__str__', 
 '__subclasshook__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 
 'remove', 'reverse', 'sort']

回答 12

没有可靠的方法来列出所有对象的方法。dir(object)通常是有用的,但是在某些情况下,它可能不会列出所有方法。根据dir()文档“使用参数尝试返回该对象的有效属性列表。”

callable(getattr(object, method))如此处已经提到的,可以检查该方法是否存在。

There is no reliable way to list all object’s methods. dir(object) is usually useful, but in some cases it may not list all methods. According to dir() documentation: “With an argument, attempt to return a list of valid attributes for that object.”

Checking that method exists can be done by callable(getattr(object, method)) as already mentioned there.


回答 13

以清单为对象

obj = []

list(filter(lambda x:callable(getattr(obj,x)),obj.__dir__()))

你得到:

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

Take a list as an object

obj = []

list(filter(lambda x:callable(getattr(obj,x)),obj.__dir__()))

You get:

['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__iadd__',
 '__imul__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__reversed__',
 '__rmul__',
 '__setattr__',
 '__setitem__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'append',
 'clear',
 'copy',
 'count',
 'extend',
 'index',
 'insert',
 'pop',
 'remove',
 'reverse',
 'sort']

回答 14

…至少有一种简单的方法可以检查它是否具有特定的方法,而不仅仅是在调用该方法时检查是否发生错误

虽然“ 比请求更容易获得宽恕 ”无疑是Python方式,但您正在寻找的可能是:

d={'foo':'bar', 'spam':'eggs'}
if 'get' in dir(d):
    d.get('foo')
# OUT: 'bar'

…is there at least an easy way to check if it has a particular method other than simply checking if an error occurs when the method is called

While “Easier to ask for forgiveness than permission” is certainly the Pythonic way, what you are looking for maybe:

d={'foo':'bar', 'spam':'eggs'}
if 'get' in dir(d):
    d.get('foo')
# OUT: 'bar'

回答 15

为了在整个模块中搜索特定方法

for method in dir(module) :
  if "keyword_of_methode" in method :
   print(method, end="\n")

In order to search for a specific method in a whole module

for method in dir(module) :
  if "keyword_of_methode" in method :
   print(method, end="\n")

回答 16

例如,如果您使用的是shell plus,则可以改用以下命令:

>> MyObject??

这样,用“ ??” 在对象之后,它将向您显示该类具有的所有属性/方法。

If you are, for instance, using shell plus you can use this instead:

>> MyObject??

that way, with the ‘??’ just after your object, it’ll show you all the attributes/methods the class has.


为什么某些函数在函数名称前后都有下划线“ __”?

问题:为什么某些函数在函数名称前后都有下划线“ __”?

这种“强调”似乎经常发生,我想知道这是否是Python语言中的要求,还是仅仅是出于约定?

另外,有人可以说出并解释哪些函数倾向于带有下划线,以及为什么(__init__例如)?

This “underscoring” seems to occur a lot, and I was wondering if this was a requirement in the Python language, or merely a matter of convention?

Also, could someone name and explain which functions tend to have the underscores, and why (__init__, for instance)?


回答 0

Python PEP 8-Python代码样式指南

描述性:命名样式

可以识别以下使用前划线或后划线的特殊形式(通常可以将它们与任何大小写惯例结合使用):

  • _single_leading_underscore:“内部使用”指示器较弱。例如from M import *,不导入名称以下划线开头的对象。

  • single_trailing_underscore_:按惯例用于避免与Python关键字发生冲突,例如

    Tkinter.Toplevel(master, class_='ClassName')

  • __double_leading_underscore:在命名类属性时,调用名称修饰(在类FooBar内部,__boo变为_FooBar__boo;见下文)。

  • __double_leading_and_trailing_underscore__:位于用户控制的命名空间中的“魔术”对象或属性。例如__init____import____file__。请勿发明此类名称;仅按记录使用它们。

请注意,带有双引号和尾部下划线的名称本质上是为Python本身保留的:“切勿发明此类名称;仅将其用作文档”。

From the Python PEP 8 — Style Guide for Python Code:

Descriptive: Naming Styles

The following special forms using leading or trailing underscores are recognized (these can generally be combined with any case convention):

  • _single_leading_underscore: weak “internal use” indicator. E.g. from M import * does not import objects whose name starts with an underscore.

  • single_trailing_underscore_: used by convention to avoid conflicts with Python keyword, e.g.

    Tkinter.Toplevel(master, class_='ClassName')

  • __double_leading_underscore: when naming a class attribute, invokes name mangling (inside class FooBar, __boo becomes _FooBar__boo; see below).

  • __double_leading_and_trailing_underscore__: “magic” objects or attributes that live in user-controlled namespaces. E.g. __init__, __import__ or __file__. Never invent such names; only use them as documented.

Note that names with double leading and trailing underscores are essentially reserved for Python itself: “Never invent such names; only use them as documented”.


回答 1

其他受访者在将双下划线和下划线作为“特殊”或“魔术”方法的命名惯例进行描述时是正确的。

尽管您可以直接调用这些方法([10, 20].__len__()例如),但是下划线的存在暗示这些方法旨在间接调用(len([10, 20])例如)。大多数python运算符都有一个关联的“魔术”方法(例如,这a[x]是调用的常用方法a.__getitem__(x))。

The other respondents are correct in describing the double leading and trailing underscores as a naming convention for “special” or “magic” methods.

While you can call these methods directly ([10, 20].__len__() for example), the presence of the underscores is a hint that these methods are intended to be invoked indirectly (len([10, 20]) for example). Most python operators have an associated “magic” method (for example, a[x] is the usual way of invoking a.__getitem__(x)).


回答 2

带有双下划线的名称是Python的“特殊”名称。它们在Python语言参考的第3节“数据模型”中列出。

Names surrounded by double underscores are “special” to Python. They’re listed in the Python Language Reference, section 3, “Data model”.


回答 3

实际上,当需要在父类和子类名称之间进行区分时,我使用_方法名称。我已经阅读了一些使用这种方法创建父子类的代码。例如,我可以提供以下代码:

class ThreadableMixin:
   def start_worker(self):
       threading.Thread(target=self.worker).start()

   def worker(self):
      try:
        self._worker()
    except tornado.web.HTTPError, e:
        self.set_status(e.status_code)
    except:
        logging.error("_worker problem", exc_info=True)
        self.set_status(500)
    tornado.ioloop.IOLoop.instance().add_callback(self.async_callback(self.results))

和具有_worker方法的孩子

class Handler(tornado.web.RequestHandler, ThreadableMixin):
   def _worker(self):
      self.res = self.render_string("template.html",
        title = _("Title"),
        data = self.application.db.query("select ... where object_id=%s", self.object_id)
    )

Actually I use _ method names when I need to differ between parent and child class names. I’ve read some codes that used this way of creating parent-child classes. As an example I can provide this code:

class ThreadableMixin:
   def start_worker(self):
       threading.Thread(target=self.worker).start()

   def worker(self):
      try:
        self._worker()
    except tornado.web.HTTPError, e:
        self.set_status(e.status_code)
    except:
        logging.error("_worker problem", exc_info=True)
        self.set_status(500)
    tornado.ioloop.IOLoop.instance().add_callback(self.async_callback(self.results))

and the child that have a _worker method

class Handler(tornado.web.RequestHandler, ThreadableMixin):
   def _worker(self):
      self.res = self.render_string("template.html",
        title = _("Title"),
        data = self.application.db.query("select ... where object_id=%s", self.object_id)
    )


回答 4

此约定用于诸如__init__和的特殊变量或方法(所谓的“魔术方法”)__len__。这些方法提供特殊的语法功能或执行特殊的操作。

例如,__file__表示__eq__执行a == b表达式时执行的Python文件的位置。

用户当然可以制作一个自定义的特殊方法,这种情况很少见,但是通常可能会修改一些内置的特殊方法(例如,您应该使用该类来初始化类,该类__init__将在类的实例首先执行时初始化)被建造)。

class A:
    def __init__(self, a):  # use special method '__init__' for initializing
        self.a = a
    def __custom__(self):  # custom special method. you might almost do not use it
        pass

This convention is used for special variables or methods (so-called “magic method”) such as __init__ and __len__. These methods provides special syntactic features or do special things.

For example, __file__ indicates the location of Python file, __eq__ is executed when a == b expression is executed.

A user of course can make a custom special method, which is a very rare case, but often might modify some of the built-in special methods (e.g. you should initialize the class with __init__ that will be executed at first when an instance of a class is created).

class A:
    def __init__(self, a):  # use special method '__init__' for initializing
        self.a = a
    def __custom__(self):  # custom special method. you might almost do not use it
        pass

回答 5

添加了一个示例来了解__在python中的用法。这是所有__的列表

https://docs.python.org/3/genindex-all.html#_

某些类别的标识符(除关键字外)具有特殊含义。在任何其他情况下,*名称的任何使用,如果未遵循明确记录的使用,均会在没有警告的情况下发生破损。

使用__的访问限制

"""
Identifiers:
-  Contain only (A-z, 0-9, and _ )
-  Start with a lowercase letter or _.
-  Single leading _ :  private
-  Double leading __ :  strong private
-  Start & End  __ : Language defined Special Name of Object/ Method
-  Class names start with an uppercase letter.
-

"""


class BankAccount(object):
    def __init__(self, name, money, password):
        self.name = name            # Public
        self._money = money         # Private : Package Level
        self.__password = password  # Super Private

    def earn_money(self, amount):
        self._money += amount
        print("Salary Received: ", amount, " Updated Balance is: ", self._money)

    def withdraw_money(self, amount):
        self._money -= amount
        print("Money Withdraw: ", amount, " Updated Balance is: ", self._money)

    def show_balance(self):
        print(" Current Balance is: ", self._money)


account = BankAccount("Hitesh", 1000, "PWD")  # Object Initalization

# Method Call
account.earn_money(100)

# Show Balance
print(account.show_balance())

print("PUBLIC ACCESS:", account.name)  # Public Access

# account._money is accessible because it is only hidden by convention
print("PROTECTED ACCESS:", account._money)  # Protected Access

# account.__password will throw error but account._BankAccount__password will not
# because __password is super private
print("PRIVATE ACCESS:", account._BankAccount__password)

# Method Call
account.withdraw_money(200)

# Show Balance
print(account.show_balance())

# account._money is accessible because it is only hidden by convention
print(account._money)  # Protected Access

Added an example to understand the use of __ in python. Here is the list of All __

https://docs.python.org/3/genindex-all.html#_

Certain classes of identifiers (besides keywords) have special meanings. Any use of * names, in any other context, that does not follow explicitly documented use, is subject to breakage without warning

Access restriction using __

"""
Identifiers:
-  Contain only (A-z, 0-9, and _ )
-  Start with a lowercase letter or _.
-  Single leading _ :  private
-  Double leading __ :  strong private
-  Start & End  __ : Language defined Special Name of Object/ Method
-  Class names start with an uppercase letter.
-

"""


class BankAccount(object):
    def __init__(self, name, money, password):
        self.name = name            # Public
        self._money = money         # Private : Package Level
        self.__password = password  # Super Private

    def earn_money(self, amount):
        self._money += amount
        print("Salary Received: ", amount, " Updated Balance is: ", self._money)

    def withdraw_money(self, amount):
        self._money -= amount
        print("Money Withdraw: ", amount, " Updated Balance is: ", self._money)

    def show_balance(self):
        print(" Current Balance is: ", self._money)


account = BankAccount("Hitesh", 1000, "PWD")  # Object Initalization

# Method Call
account.earn_money(100)

# Show Balance
print(account.show_balance())

print("PUBLIC ACCESS:", account.name)  # Public Access

# account._money is accessible because it is only hidden by convention
print("PROTECTED ACCESS:", account._money)  # Protected Access

# account.__password will throw error but account._BankAccount__password will not
# because __password is super private
print("PRIVATE ACCESS:", account._BankAccount__password)

# Method Call
account.withdraw_money(200)

# Show Balance
print(account.show_balance())

# account._money is accessible because it is only hidden by convention
print(account._money)  # Protected Access

用Python漂亮地打印XML

问题:用Python漂亮地打印XML

在Python中漂亮地打印XML的最佳方法(或多种方法)是什么?

What is the best way (or are the various ways) to pretty print XML in Python?


回答 0

import xml.dom.minidom

dom = xml.dom.minidom.parse(xml_fname) # or xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = dom.toprettyxml()
import xml.dom.minidom

dom = xml.dom.minidom.parse(xml_fname) # or xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = dom.toprettyxml()

回答 1

lxml是最新的,更新的,并且包含漂亮的打印功能

import lxml.etree as etree

x = etree.parse("filename")
print etree.tostring(x, pretty_print=True)

查看lxml教程:http : //lxml.de/tutorial.html

lxml is recent, updated, and includes a pretty print function

import lxml.etree as etree

x = etree.parse("filename")
print etree.tostring(x, pretty_print=True)

Check out the lxml tutorial: http://lxml.de/tutorial.html


回答 2

另一个解决方案是借用indent函数,以与自2.5以来内置在Python中的ElementTree库一起使用。如下所示:

from xml.etree import ElementTree

def indent(elem, level=0):
    i = "\n" + level*"  "
    j = "\n" + (level-1)*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for subelem in elem:
            indent(subelem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = j
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = j
    return elem        

root = ElementTree.parse('/tmp/xmlfile').getroot()
indent(root)
ElementTree.dump(root)

Another solution is to borrow this indent function, for use with the ElementTree library that’s built in to Python since 2.5. Here’s what that would look like:

from xml.etree import ElementTree

def indent(elem, level=0):
    i = "\n" + level*"  "
    j = "\n" + (level-1)*"  "
    if len(elem):
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
        for subelem in elem:
            indent(subelem, level+1)
        if not elem.tail or not elem.tail.strip():
            elem.tail = j
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = j
    return elem        

root = ElementTree.parse('/tmp/xmlfile').getroot()
indent(root)
ElementTree.dump(root)

回答 3

这是我的(hacky?)解决方案,用于解决丑陋的文本节点问题。

uglyXml = doc.toprettyxml(indent='  ')

text_re = re.compile('>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)    
prettyXml = text_re.sub('>\g<1></', uglyXml)

print prettyXml

上面的代码将生成:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>1</id>
    <title>Add Visual Studio 2005 and 2008 solution files</title>
    <details>We need Visual Studio 2005/2008 project files for Windows.</details>
  </issue>
</issues>

代替这个:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>
      1
    </id>
    <title>
      Add Visual Studio 2005 and 2008 solution files
    </title>
    <details>
      We need Visual Studio 2005/2008 project files for Windows.
    </details>
  </issue>
</issues>

免责声明:可能存在一些限制。

Here’s my (hacky?) solution to get around the ugly text node problem.

uglyXml = doc.toprettyxml(indent='  ')

text_re = re.compile('>\n\s+([^<>\s].*?)\n\s+</', re.DOTALL)    
prettyXml = text_re.sub('>\g<1></', uglyXml)

print prettyXml

The above code will produce:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>1</id>
    <title>Add Visual Studio 2005 and 2008 solution files</title>
    <details>We need Visual Studio 2005/2008 project files for Windows.</details>
  </issue>
</issues>

Instead of this:

<?xml version="1.0" ?>
<issues>
  <issue>
    <id>
      1
    </id>
    <title>
      Add Visual Studio 2005 and 2008 solution files
    </title>
    <details>
      We need Visual Studio 2005/2008 project files for Windows.
    </details>
  </issue>
</issues>

Disclaimer: There are probably some limitations.


回答 4

正如其他人指出的那样,lxml内置了一个漂亮的打印机。

请注意,尽管默认情况下它将CDATA节更改为普通文本,这可能会带来讨厌的结果。

这是一个Python函数,可保留输入文件,仅更改缩进(请注意strip_cdata=False)。此外,它确保输出使用UTF-8作为编码,而不是默认的ASCII(请注意encoding='utf-8'):

from lxml import etree

def prettyPrintXml(xmlFilePathToPrettyPrint):
    assert xmlFilePathToPrettyPrint is not None
    parser = etree.XMLParser(resolve_entities=False, strip_cdata=False)
    document = etree.parse(xmlFilePathToPrettyPrint, parser)
    document.write(xmlFilePathToPrettyPrint, pretty_print=True, encoding='utf-8')

用法示例:

prettyPrintXml('some_folder/some_file.xml')

As others pointed out, lxml has a pretty printer built in.

Be aware though that by default it changes CDATA sections to normal text, which can have nasty results.

Here’s a Python function that preserves the input file and only changes the indentation (notice the strip_cdata=False). Furthermore it makes sure the output uses UTF-8 as encoding instead of the default ASCII (notice the encoding='utf-8'):

from lxml import etree

def prettyPrintXml(xmlFilePathToPrettyPrint):
    assert xmlFilePathToPrettyPrint is not None
    parser = etree.XMLParser(resolve_entities=False, strip_cdata=False)
    document = etree.parse(xmlFilePathToPrettyPrint, parser)
    document.write(xmlFilePathToPrettyPrint, pretty_print=True, encoding='utf-8')

Example usage:

prettyPrintXml('some_folder/some_file.xml')

回答 5

BeautifulSoup有一个易于使用的prettify()方法。

每个缩进级别缩进一个空格。它比lxml的pretty_print好得多,而且又短又可爱。

from bs4 import BeautifulSoup

bs = BeautifulSoup(open(xml_file), 'xml')
print bs.prettify()

BeautifulSoup has a easy to use prettify() method.

It indents one space per indentation level. It works much better than lxml’s pretty_print and is short and sweet.

from bs4 import BeautifulSoup

bs = BeautifulSoup(open(xml_file), 'xml')
print bs.prettify()

回答 6

如果有的xmllint话,可以产生一个子流程并使用它。xmllint --format <file>漂亮地将其输入XML打印到标准输出。

请注意,此方法使用python外部的程序,这使其有点像hack。

def pretty_print_xml(xml):
    proc = subprocess.Popen(
        ['xmllint', '--format', '/dev/stdin'],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    (output, error_output) = proc.communicate(xml);
    return output

print(pretty_print_xml(data))

If you have xmllint you can spawn a subprocess and use it. xmllint --format <file> pretty-prints its input XML to standard output.

Note that this method uses an program external to python, which makes it sort of a hack.

def pretty_print_xml(xml):
    proc = subprocess.Popen(
        ['xmllint', '--format', '/dev/stdin'],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
    )
    (output, error_output) = proc.communicate(xml);
    return output

print(pretty_print_xml(data))

回答 7

我尝试编辑上面的“ ade”答案,但是在最初匿名提供反馈后,Stack Overflow不允许我进行编辑。这是用于精巧打印ElementTree的函数的错误版本。

def indent(elem, level=0, more_sibs=False):
    i = "\n"
    if level:
        i += (level-1) * '  '
    num_kids = len(elem)
    if num_kids:
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
            if level:
                elem.text += '  '
        count = 0
        for kid in elem:
            indent(kid, level+1, count < num_kids - 1)
            count += 1
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
            if more_sibs:
                elem.tail += '  '
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i
            if more_sibs:
                elem.tail += '  '

I tried to edit “ade”s answer above, but Stack Overflow wouldn’t let me edit after I had initially provided feedback anonymously. This is a less buggy version of the function to pretty-print an ElementTree.

def indent(elem, level=0, more_sibs=False):
    i = "\n"
    if level:
        i += (level-1) * '  '
    num_kids = len(elem)
    if num_kids:
        if not elem.text or not elem.text.strip():
            elem.text = i + "  "
            if level:
                elem.text += '  '
        count = 0
        for kid in elem:
            indent(kid, level+1, count < num_kids - 1)
            count += 1
        if not elem.tail or not elem.tail.strip():
            elem.tail = i
            if more_sibs:
                elem.tail += '  '
    else:
        if level and (not elem.tail or not elem.tail.strip()):
            elem.tail = i
            if more_sibs:
                elem.tail += '  '

回答 8

如果您使用的是DOM实现,则每种都有自己的内置漂亮打印形式:

# minidom
#
document.toprettyxml()

# 4DOM
#
xml.dom.ext.PrettyPrint(document, stream)

# pxdom (or other DOM Level 3 LS-compliant imp)
#
serializer.domConfig.setParameter('format-pretty-print', True)
serializer.writeToString(document)

如果您使用的其他东西没有它自己的漂亮打印机-或那些漂亮打印机没有按照您想要的方式做-您可能必须编写或继承自己的序列化器。

If you’re using a DOM implementation, each has their own form of pretty-printing built-in:

# minidom
#
document.toprettyxml()

# 4DOM
#
xml.dom.ext.PrettyPrint(document, stream)

# pxdom (or other DOM Level 3 LS-compliant imp)
#
serializer.domConfig.setParameter('format-pretty-print', True)
serializer.writeToString(document)

If you’re using something else without its own pretty-printer — or those pretty-printers don’t quite do it the way you want —  you’d probably have to write or subclass your own serialiser.


回答 9

我对minidom的漂亮字体有一些疑问。每当我尝试用给定编码之外的字符漂亮地打印文档时,都会出现UnicodeError,例如,如果我在文档中有一个β并且尝试了doc.toprettyxml(encoding='latin-1')。这是我的解决方法:

def toprettyxml(doc, encoding):
    """Return a pretty-printed XML document in a given encoding."""
    unistr = doc.toprettyxml().replace(u'<?xml version="1.0" ?>',
                          u'<?xml version="1.0" encoding="%s"?>' % encoding)
    return unistr.encode(encoding, 'xmlcharrefreplace')

I had some problems with minidom’s pretty print. I’d get a UnicodeError whenever I tried pretty-printing a document with characters outside the given encoding, eg if I had a β in a document and I tried doc.toprettyxml(encoding='latin-1'). Here’s my workaround for it:

def toprettyxml(doc, encoding):
    """Return a pretty-printed XML document in a given encoding."""
    unistr = doc.toprettyxml().replace(u'<?xml version="1.0" ?>',
                          u'<?xml version="1.0" encoding="%s"?>' % encoding)
    return unistr.encode(encoding, 'xmlcharrefreplace')

回答 10

from yattag import indent

pretty_string = indent(ugly_string)

除非您要求使用以下命令,否则它不会在文本节点内添加空格或换行符:

indent(mystring, indent_text = True)

您可以指定缩进单位应该是什么以及换行符应该是什么样。

pretty_xml_string = indent(
    ugly_xml_string,
    indentation = '    ',
    newline = '\r\n'
)

该文档位于http://www.yattag.org主页上。

from yattag import indent

pretty_string = indent(ugly_string)

It won’t add spaces or newlines inside text nodes, unless you ask for it with:

indent(mystring, indent_text = True)

You can specify what the indentation unit should be and what the newline should look like.

pretty_xml_string = indent(
    ugly_xml_string,
    indentation = '    ',
    newline = '\r\n'
)

The doc is on http://www.yattag.org homepage.


回答 11

我编写了一个解决方案,以遍历现有的ElementTree并按照通常期望的那样使用文本/尾部缩进。

def prettify(element, indent='  '):
    queue = [(0, element)]  # (level, element)
    while queue:
        level, element = queue.pop(0)
        children = [(level + 1, child) for child in list(element)]
        if children:
            element.text = '\n' + indent * (level+1)  # for child open
        if queue:
            element.tail = '\n' + indent * queue[0][0]  # for sibling open
        else:
            element.tail = '\n' + indent * (level-1)  # for parent close
        queue[0:0] = children  # prepend so children come before siblings

I wrote a solution to walk through an existing ElementTree and use text/tail to indent it as one typically expects.

def prettify(element, indent='  '):
    queue = [(0, element)]  # (level, element)
    while queue:
        level, element = queue.pop(0)
        children = [(level + 1, child) for child in list(element)]
        if children:
            element.text = '\n' + indent * (level+1)  # for child open
        if queue:
            element.tail = '\n' + indent * queue[0][0]  # for sibling open
        else:
            element.tail = '\n' + indent * (level-1)  # for parent close
        queue[0:0] = children  # prepend so children come before siblings

回答 12

python的XML漂亮打印对于此任务看起来非常不错。(也应适当命名。)

一种替代方法是使用pyXML,它具有PrettyPrint功能

XML pretty print for python looks pretty good for this task. (Appropriately named, too.)

An alternative is to use pyXML, which has a PrettyPrint function.


回答 13

这是一个Python3解决方案,它摆脱了丑陋的换行符问题(大量空白),并且仅使用标准库,而不像大多数其他实现那样。

import xml.etree.ElementTree as ET
import xml.dom.minidom
import os

def pretty_print_xml_given_root(root, output_xml):
    """
    Useful for when you are editing xml data on the fly
    """
    xml_string = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml()
    xml_string = os.linesep.join([s for s in xml_string.splitlines() if s.strip()]) # remove the weird newline issue
    with open(output_xml, "w") as file_out:
        file_out.write(xml_string)

def pretty_print_xml_given_file(input_xml, output_xml):
    """
    Useful for when you want to reformat an already existing xml file
    """
    tree = ET.parse(input_xml)
    root = tree.getroot()
    pretty_print_xml_given_root(root, output_xml)

我在这里找到了解决常见换行问题的方法。

Here’s a Python3 solution that gets rid of the ugly newline issue (tons of whitespace), and it only uses standard libraries unlike most other implementations.

import xml.etree.ElementTree as ET
import xml.dom.minidom
import os

def pretty_print_xml_given_root(root, output_xml):
    """
    Useful for when you are editing xml data on the fly
    """
    xml_string = xml.dom.minidom.parseString(ET.tostring(root)).toprettyxml()
    xml_string = os.linesep.join([s for s in xml_string.splitlines() if s.strip()]) # remove the weird newline issue
    with open(output_xml, "w") as file_out:
        file_out.write(xml_string)

def pretty_print_xml_given_file(input_xml, output_xml):
    """
    Useful for when you want to reformat an already existing xml file
    """
    tree = ET.parse(input_xml)
    root = tree.getroot()
    pretty_print_xml_given_root(root, output_xml)

I found how to fix the common newline issue here.


回答 14

您可以将流行的外部库xmltodict与一起使用unparsepretty=True您将获得最佳结果:

xmltodict.unparse(
    xmltodict.parse(my_xml), full_document=False, pretty=True)

full_document=False反对<?xml version="1.0" encoding="UTF-8"?>在顶部。

You can use popular external library xmltodict, with unparse and pretty=True you will get best result:

xmltodict.unparse(
    xmltodict.parse(my_xml), full_document=False, pretty=True)

full_document=False against <?xml version="1.0" encoding="UTF-8"?> at the top.


回答 15

看一下vkbeautify模块。

这是我非常流行的javascript / nodejs插件的同名python版本。它可以漂亮地打印/最小化XML,JSON和CSS文本。输入和输出可以是字符串/文件的任意组合。它非常紧凑,没有任何依赖性。

例子

import vkbeautify as vkb

vkb.xml(text)                       
vkb.xml(text, 'path/to/dest/file')  
vkb.xml('path/to/src/file')        
vkb.xml('path/to/src/file', 'path/to/dest/file') 

Take a look at the vkbeautify module.

It is a python version of my very popular javascript/nodejs plugin with the same name. It can pretty-print/minify XML, JSON and CSS text. Input and output can be string/file in any combinations. It is very compact and doesn’t have any dependency.

Examples:

import vkbeautify as vkb

vkb.xml(text)                       
vkb.xml(text, 'path/to/dest/file')  
vkb.xml('path/to/src/file')        
vkb.xml('path/to/src/file', 'path/to/dest/file') 

回答 16

如果您不想进行重新解析,则可以使用xmlpp.py库和该get_pprint()函数。在我的用例中,它工作得很好且流畅,而无需重新解析为lxml ElementTree对象。

An alternative if you don’t want to have to reparse, there is the xmlpp.py library with the get_pprint() function. It worked nice and smoothly for my use cases, without having to reparse to an lxml ElementTree object.


回答 17

您可以尝试这种变化…

安装BeautifulSoup和后端lxml(解析器)库:

user$ pip3 install lxml bs4

处理您的XML文档:

from bs4 import BeautifulSoup

with open('/path/to/file.xml', 'r') as doc: 
    for line in doc: 
        print(BeautifulSoup(line, 'lxml-xml').prettify())  

You can try this variation…

Install BeautifulSoup and the backend lxml (parser) libraries:

user$ pip3 install lxml bs4

Process your XML document:

from bs4 import BeautifulSoup

with open('/path/to/file.xml', 'r') as doc: 
    for line in doc: 
        print(BeautifulSoup(line, 'lxml-xml').prettify())  

回答 18

我遇到了这个问题,并像这样解决了它:

def write_xml_file (self, file, xml_root_element, xml_declaration=False, pretty_print=False, encoding='unicode', indent='\t'):
    pretty_printed_xml = etree.tostring(xml_root_element, xml_declaration=xml_declaration, pretty_print=pretty_print, encoding=encoding)
    if pretty_print: pretty_printed_xml = pretty_printed_xml.replace('  ', indent)
    file.write(pretty_printed_xml)

在我的代码中,此方法的调用方式如下:

try:
    with open(file_path, 'w') as file:
        file.write('<?xml version="1.0" encoding="utf-8" ?>')

        # create some xml content using etree ...

        xml_parser = XMLParser()
        xml_parser.write_xml_file(file, xml_root, xml_declaration=False, pretty_print=True, encoding='unicode', indent='\t')

except IOError:
    print("Error while writing in log file!")

这仅是因为etree默认情况下会使用two spaces缩进,但我发现并不太强调缩进,因此效果不佳。我无法为etree设置任何设置或为任何函数更改标准etree缩进的参数。我喜欢使用etree多么容易,但这确实让我很烦。

I had this problem and solved it like this:

def write_xml_file (self, file, xml_root_element, xml_declaration=False, pretty_print=False, encoding='unicode', indent='\t'):
    pretty_printed_xml = etree.tostring(xml_root_element, xml_declaration=xml_declaration, pretty_print=pretty_print, encoding=encoding)
    if pretty_print: pretty_printed_xml = pretty_printed_xml.replace('  ', indent)
    file.write(pretty_printed_xml)

In my code this method is called like this:

try:
    with open(file_path, 'w') as file:
        file.write('<?xml version="1.0" encoding="utf-8" ?>')

        # create some xml content using etree ...

        xml_parser = XMLParser()
        xml_parser.write_xml_file(file, xml_root, xml_declaration=False, pretty_print=True, encoding='unicode', indent='\t')

except IOError:
    print("Error while writing in log file!")

This works only because etree by default uses two spaces to indent, which I don’t find very much emphasizing the indentation and therefore not pretty. I couldn’t ind any setting for etree or parameter for any function to change the standard etree indent. I like how easy it is to use etree, but this was really annoying me.


回答 19

要将整个xml文档转换为漂亮的xml文档
(例如:假设您已提取[解压缩] LibreOffice Writer .odt或.ods文件,并且想要将丑陋的“ content.xml”文件转换为自动化git版本控制git difftool.odt / .ods文件的生成,例如我在此处实现的)

import xml.dom.minidom

file = open("./content.xml", 'r')
xml_string = file.read()
file.close()

parsed_xml = xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = parsed_xml.toprettyxml()

file = open("./content_new.xml", 'w')
file.write(pretty_xml_as_string)
file.close()

参考资料:
-感谢本·诺兰德在本页上的回答,这为我提供了大部分帮助。

For converting an entire xml document to a pretty xml document
(ex: assuming you’ve extracted [unzipped] a LibreOffice Writer .odt or .ods file, and you want to convert the ugly “content.xml” file to a pretty one for automated git version control and git difftooling of .odt/.ods files, such as I’m implementing here)

import xml.dom.minidom

file = open("./content.xml", 'r')
xml_string = file.read()
file.close()

parsed_xml = xml.dom.minidom.parseString(xml_string)
pretty_xml_as_string = parsed_xml.toprettyxml()

file = open("./content_new.xml", 'w')
file.write(pretty_xml_as_string)
file.close()

References:
– Thanks to Ben Noland’s answer on this page which got me most of the way there.


回答 20

from lxml import etree
import xml.dom.minidom as mmd

xml_root = etree.parse(xml_fiel_path, etree.XMLParser())

def print_xml(xml_root):
    plain_xml = etree.tostring(xml_root).decode('utf-8')
    urgly_xml = ''.join(plain_xml .split())
    good_xml = mmd.parseString(urgly_xml)
    print(good_xml.toprettyxml(indent='    ',))

对于带有中文的xml来说效果很好!

from lxml import etree
import xml.dom.minidom as mmd

xml_root = etree.parse(xml_fiel_path, etree.XMLParser())

def print_xml(xml_root):
    plain_xml = etree.tostring(xml_root).decode('utf-8')
    urgly_xml = ''.join(plain_xml .split())
    good_xml = mmd.parseString(urgly_xml)
    print(good_xml.toprettyxml(indent='    ',))

It’s working well for the xml with Chinese!


回答 21

如果由于某种原因您无法使用其他用户提到的任何Python模块,那么我建议使用以下针对Python 2.7的解决方案:

import subprocess

def makePretty(filepath):
  cmd = "xmllint --format " + filepath
  prettyXML = subprocess.check_output(cmd, shell = True)
  with open(filepath, "w") as outfile:
    outfile.write(prettyXML)

据我所知,该解决方案将在xmllint安装了该软件包的基于Unix的系统上运行。

If for some reason you can’t get your hands on any of the Python modules that other users mentioned, I suggest the following solution for Python 2.7:

import subprocess

def makePretty(filepath):
  cmd = "xmllint --format " + filepath
  prettyXML = subprocess.check_output(cmd, shell = True)
  with open(filepath, "w") as outfile:
    outfile.write(prettyXML)

As far as I know, this solution will work on Unix-based systems that have the xmllint package installed.


回答 22

我用几行代码解决了这个问题,打开文件,遍历文件并添加缩进,然后再次保存。我正在处理小型xml文件,并且不想添加依赖项,也不想为用户安装更多库。无论如何,这就是我最终得到的结果:

    f = open(file_name,'r')
    xml = f.read()
    f.close()

    #Removing old indendations
    raw_xml = ''        
    for line in xml:
        raw_xml += line

    xml = raw_xml

    new_xml = ''
    indent = '    '
    deepness = 0

    for i in range((len(xml))):

        new_xml += xml[i]   
        if(i<len(xml)-3):

            simpleSplit = xml[i:(i+2)] == '><'
            advancSplit = xml[i:(i+3)] == '></'        
            end = xml[i:(i+2)] == '/>'    
            start = xml[i] == '<'

            if(advancSplit):
                deepness += -1
                new_xml += '\n' + indent*deepness
                simpleSplit = False
                deepness += -1
            if(simpleSplit):
                new_xml += '\n' + indent*deepness
            if(start):
                deepness += 1
            if(end):
                deepness += -1

    f = open(file_name,'w')
    f.write(new_xml)
    f.close()

它对我有用,也许有人会使用它:)

I solved this with some lines of code, opening the file, going trough it and adding indentation, then saving it again. I was working with small xml files, and did not want to add dependencies, or more libraries to install for the user. Anyway, here is what I ended up with:

    f = open(file_name,'r')
    xml = f.read()
    f.close()

    #Removing old indendations
    raw_xml = ''        
    for line in xml:
        raw_xml += line

    xml = raw_xml

    new_xml = ''
    indent = '    '
    deepness = 0

    for i in range((len(xml))):

        new_xml += xml[i]   
        if(i<len(xml)-3):

            simpleSplit = xml[i:(i+2)] == '><'
            advancSplit = xml[i:(i+3)] == '></'        
            end = xml[i:(i+2)] == '/>'    
            start = xml[i] == '<'

            if(advancSplit):
                deepness += -1
                new_xml += '\n' + indent*deepness
                simpleSplit = False
                deepness += -1
            if(simpleSplit):
                new_xml += '\n' + indent*deepness
            if(start):
                deepness += 1
            if(end):
                deepness += -1

    f = open(file_name,'w')
    f.write(new_xml)
    f.close()

It works for me, perhaps someone will have some use of it :)


SQLAlchemy按降序排列?

问题:SQLAlchemy按降序排列?

如何descending在如下所示的SQLAlchemy查询中使用ORDER BY ?

此查询有效,但以升序返回:

query = (model.Session.query(model.Entry)
        .join(model.ClassificationItem)
        .join(model.EnumerationValue)
        .filter_by(id=c.row.id)
        .order_by(model.Entry.amount) # This row :)
        )

如果我尝试:

.order_by(desc(model.Entry.amount))

然后我得到:NameError: global name 'desc' is not defined

How can I use ORDER BY descending in a SQLAlchemy query like the following?

This query works, but returns them in ascending order:

query = (model.Session.query(model.Entry)
        .join(model.ClassificationItem)
        .join(model.EnumerationValue)
        .filter_by(id=c.row.id)
        .order_by(model.Entry.amount) # This row :)
        )

If I try:

.order_by(desc(model.Entry.amount))

then I get: NameError: global name 'desc' is not defined.


回答 0

与FYI一样,您也可以将这些内容指定为列属性。例如,我可能已经做过:

.order_by(model.Entry.amount.desc())

这很方便,因为它避免了import,并且可以在其他地方(例如关系定义等)中使用它。

有关更多信息,您可以参考此

Just as an FYI, you can also specify those things as column attributes. For instance, I might have done:

.order_by(model.Entry.amount.desc())

This is handy since it avoids an import, and you can use it on other places such as in a relation definition, etc.

For more information, you can refer this


回答 1

from sqlalchemy import desc
someselect.order_by(desc(table1.mycol))

来自@ jpmc26的用法

from sqlalchemy import desc
someselect.order_by(desc(table1.mycol))

Usage from @jpmc26


回答 2

您可能要做的另一件事是:

.order_by("name desc")

这将导致:ORDER BY名称desc。这里的缺点是按顺序使用显式列名。

One other thing you might do is:

.order_by("name desc")

This will result in: ORDER BY name desc. The disadvantage here is the explicit column name used in order by.


回答 3

您可以.desc()像这样在查询中使用函数

query = (model.Session.query(model.Entry)
        .join(model.ClassificationItem)
        .join(model.EnumerationValue)
        .filter_by(id=c.row.id)
        .order_by(model.Entry.amount.desc())
        )

这将按金额降序排列,或者

query = session.query(
    model.Entry
).join(
    model.ClassificationItem
).join(
    model.EnumerationValue
).filter_by(
    id=c.row.id
).order_by(
    model.Entry.amount.desc()
)
)

使用SQLAlchemy的desc函数

from sqlalchemy import desc
query = session.query(
    model.Entry
).join(
    model.ClassificationItem
).join(
    model.EnumerationValue
).filter_by(
    id=c.row.id
).order_by(
    desc(model.Entry.amount)
)
)

对于官方文档,请使用链接或检查以下代码段

sqlalchemy.sql.expression.desc(column)产生一个降序的ORDER BY子句元素。

例如:

from sqlalchemy import desc

stmt = select([users_table]).order_by(desc(users_table.c.name))

将产生如下SQL:

SELECT id, name FROM user ORDER BY name DESC

desc()函数是ColumnElement.desc()方法的独立版本,可用于所有SQL表达式,例如:

stmt = select([users_table]).order_by(users_table.c.name.desc())

参数column –一个ColumnElement(例如标量SQL表达式),用于应用desc()操作。

也可以看看

asc()

nullsfirst()

nullslast()

Select.order_by()

You can use .desc() function in your query just like this

query = (model.Session.query(model.Entry)
        .join(model.ClassificationItem)
        .join(model.EnumerationValue)
        .filter_by(id=c.row.id)
        .order_by(model.Entry.amount.desc())
        )

This will order by amount in descending order or

query = session.query(
    model.Entry
).join(
    model.ClassificationItem
).join(
    model.EnumerationValue
).filter_by(
    id=c.row.id
).order_by(
    model.Entry.amount.desc()
)
)

Use of desc function of SQLAlchemy

from sqlalchemy import desc
query = session.query(
    model.Entry
).join(
    model.ClassificationItem
).join(
    model.EnumerationValue
).filter_by(
    id=c.row.id
).order_by(
    desc(model.Entry.amount)
)
)

For official docs please use the link or check below snippet

sqlalchemy.sql.expression.desc(column) Produce a descending ORDER BY clause element.

e.g.:

from sqlalchemy import desc

stmt = select([users_table]).order_by(desc(users_table.c.name))

will produce SQL as:

SELECT id, name FROM user ORDER BY name DESC

The desc() function is a standalone version of the ColumnElement.desc() method available on all SQL expressions, e.g.:

stmt = select([users_table]).order_by(users_table.c.name.desc())

Parameters column – A ColumnElement (e.g. scalar SQL expression) with which to apply the desc() operation.

See also

asc()

nullsfirst()

nullslast()

Select.order_by()


回答 4

你可以试试: .order_by(ClientTotal.id.desc())

session = Session()
auth_client_name = 'client3' 
result_by_auth_client = session.query(ClientTotal).filter(ClientTotal.client ==
auth_client_name).order_by(ClientTotal.id.desc()).all()

for rbac in result_by_auth_client:
    print(rbac.id) 
session.close()

You can try: .order_by(ClientTotal.id.desc())

session = Session()
auth_client_name = 'client3' 
result_by_auth_client = session.query(ClientTotal).filter(ClientTotal.client ==
auth_client_name).order_by(ClientTotal.id.desc()).all()

for rbac in result_by_auth_client:
    print(rbac.id) 
session.close()

回答 5

@Radu答案的补充,与SQL一样,如果您有许多具有相同属性的表,则可以在参数中添加表名。

.order_by("TableName.name desc")

Complementary at @Radu answer, As in SQL, you can add the table name in the parameter if you have many table with the same attribute.

.order_by("TableName.name desc")

Python中的字母范围

问题:Python中的字母范围

而不是像这样列出字母字符:

alpha = ['a', 'b', 'c', 'd'.........'z']

有什么办法可以将它分组到某个范围之内?例如,对于数字,可以使用进行分组range()

range(1, 10)

Instead of making a list of alphabet characters like this:

alpha = ['a', 'b', 'c', 'd'.........'z']

is there any way that we can group it to a range or something? For example, for numbers it can be grouped using range():

range(1, 10)

回答 0

>>> import string
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'

如果您确实需要列表:

>>> list(string.ascii_lowercase)
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

并做到这一点 range

>>> list(map(chr, range(97, 123))) #or list(map(chr, range(ord('a'), ord('z')+1)))
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

其他有用的string模块功能:

>>> help(string) # on Python 3
....
DATA
    ascii_letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    ascii_lowercase = 'abcdefghijklmnopqrstuvwxyz'
    ascii_uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    digits = '0123456789'
    hexdigits = '0123456789abcdefABCDEF'
    octdigits = '01234567'
    printable = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
    punctuation = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
    whitespace = ' \t\n\r\x0b\x0c'
>>> import string
>>> string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'

If you really need a list:

>>> list(string.ascii_lowercase)
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

And to do it with range

>>> list(map(chr, range(97, 123))) #or list(map(chr, range(ord('a'), ord('z')+1)))
['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z']

Other helpful string module features:

>>> help(string) # on Python 3
....
DATA
    ascii_letters = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'
    ascii_lowercase = 'abcdefghijklmnopqrstuvwxyz'
    ascii_uppercase = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    digits = '0123456789'
    hexdigits = '0123456789abcdefABCDEF'
    octdigits = '01234567'
    printable = '0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'
    punctuation = '!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
    whitespace = ' \t\n\r\x0b\x0c'

回答 1

[chr(i) for i in range(ord('a'),ord('z')+1)]
[chr(i) for i in range(ord('a'),ord('z')+1)]

回答 2

在Python 2.7和3中,您可以使用以下代码:

import string
string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'

string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

正如@Zaz所说: string.lowercase已弃用,不再在Python 3中string.ascii_lowercase工作,但在两者中都工作

In Python 2.7 and 3 you can use this:

import string
string.ascii_lowercase
'abcdefghijklmnopqrstuvwxyz'

string.ascii_uppercase
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

As @Zaz says: string.lowercase is deprecated and no longer works in Python 3 but string.ascii_lowercase works in both


回答 3

这是一个简单的字母范围实现:

def letter_range(start, stop="{", step=1):
    """Yield a range of lowercase letters.""" 
    for ord_ in range(ord(start.lower()), ord(stop.lower()), step):
        yield chr(ord_)

演示版

list(letter_range("a", "f"))
# ['a', 'b', 'c', 'd', 'e']

list(letter_range("a", "f", step=2))
# ['a', 'c', 'e']

Here is a simple letter-range implementation:

Code

def letter_range(start, stop="{", step=1):
    """Yield a range of lowercase letters.""" 
    for ord_ in range(ord(start.lower()), ord(stop.lower()), step):
        yield chr(ord_)

Demo

list(letter_range("a", "f"))
# ['a', 'b', 'c', 'd', 'e']

list(letter_range("a", "f", step=2))
# ['a', 'c', 'e']

回答 4

如果要查找letters[1:10]与R 相等的值,则可以使用:

 import string
 list(string.ascii_lowercase[0:10])

If you are looking to an equivalent of letters[1:10] from R, you can use:

 import string
 list(string.ascii_lowercase[0:10])

回答 5

使用内置的range函数在python中打印大写和小写字母

def upperCaseAlphabets():
    print("Upper Case Alphabets")
    for i in range(65, 91):
        print(chr(i), end=" ")
    print()

def lowerCaseAlphabets():
    print("Lower Case Alphabets")
    for i in range(97, 123):
        print(chr(i), end=" ")

upperCaseAlphabets();
lowerCaseAlphabets();

Print the Upper and Lower case alphabets in python using a built-in range function

def upperCaseAlphabets():
    print("Upper Case Alphabets")
    for i in range(65, 91):
        print(chr(i), end=" ")
    print()

def lowerCaseAlphabets():
    print("Lower Case Alphabets")
    for i in range(97, 123):
        print(chr(i), end=" ")

upperCaseAlphabets();
lowerCaseAlphabets();

回答 6

这是我能找出的最简单方法:

#!/usr/bin/python3 for i in range(97, 123): print("{:c}".format(i), end='')

因此,从97到122是与’a’和’z’等效的ASCII码。请注意,小写字母和放置123的需要,因为将不包括在内)。

在打印功能中,请确保设置{:c}(字符)格式,在这种情况下,我们希望它一起打印所有内容,甚至不让最后一行换行,这样end=''就可以了。

结果是这样的: abcdefghijklmnopqrstuvwxyz

This is the easiest way I can figure out:

#!/usr/bin/python3 for i in range(97, 123): print("{:c}".format(i), end='')

So, 97 to 122 are the ASCII number equivalent to ‘a’ to and ‘z’. Notice the lowercase and the need to put 123, since it will not be included).

In print function make sure to set the {:c} (character) format, and, in this case, we want it to print it all together not even letting a new line at the end, so end=''would do the job.

The result is this: abcdefghijklmnopqrstuvwxyz


获取引起异常的异常描述和堆栈跟踪,全部作为字符串

问题:获取引起异常的异常描述和堆栈跟踪,全部作为字符串

我看过很多关于Python中堆栈跟踪和异常的文章。但是还没有找到我所需要的。

我有一段Python 2.7代码可能会引发异常。我想捕获它并将其完整描述和导致错误的堆栈跟踪分配给字符串(只是我们在控制台上看到的所有内容)。我需要此字符串以将其打印到GUI中的文本框中。

像这样:

try:
    method_that_can_raise_an_exception(params)
except Exception as e:
    print_to_textbox(complete_exception_description(e))

问题是:函数什么complete_exception_description

I’ve seen a lot of posts about stack trace and exceptions in Python. But haven’t found what I need.

I have a chunk of Python 2.7 code that may raise an exception. I would like to catch it and assign to a string its full description and the stack trace that caused the error (simply all we use to see on the console). I need this string to print it to a text box in the GUI.

Something like this:

try:
    method_that_can_raise_an_exception(params)
except Exception as e:
    print_to_textbox(complete_exception_description(e))

The problem is: what is the function complete_exception_description?


回答 0

请参阅traceback模块,特别是format_exc()功能。在这里

import traceback

try:
    raise ValueError
except ValueError:
    tb = traceback.format_exc()
else:
    tb = "No error"
finally:
    print tb

See the traceback module, specifically the format_exc() function. Here.

import traceback

try:
    raise ValueError
except ValueError:
    tb = traceback.format_exc()
else:
    tb = "No error"
finally:
    print tb

回答 1

让我们创建一个相当复杂的堆栈跟踪,以证明我们获得了完整的堆栈跟踪:

def raise_error():
    raise RuntimeError('something bad happened!')

def do_something_that_might_error():
    raise_error()

记录完整的堆栈跟踪

最佳做法是为模块设置一个记录器。它将知道模块的名称,并能够更改级别(在其他属性(例如处理程序)中)

import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

我们可以使用此记录器来获取错误:

try:
    do_something_that_might_error()
except Exception as error:
    logger.exception(error)

哪个日志:

ERROR:__main__:something bad happened!
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 2, in do_something_that_might_error
  File "<stdin>", line 2, in raise_error
RuntimeError: something bad happened!

因此,我们得到与发生错误时相同的输出:

>>> do_something_that_might_error()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in do_something_that_might_error
  File "<stdin>", line 2, in raise_error
RuntimeError: something bad happened!

仅获取字符串

如果您真的只想要字符串,请改用traceback.format_exc函数,在此处演示如何记录字符串:

import traceback
try:
    do_something_that_might_error()
except Exception as error:
    just_the_string = traceback.format_exc()
    logger.debug(just_the_string)

哪个日志:

DEBUG:__main__:Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 2, in do_something_that_might_error
  File "<stdin>", line 2, in raise_error
RuntimeError: something bad happened!

Let’s create a decently complicated stacktrace, in order to demonstrate that we get the full stacktrace:

def raise_error():
    raise RuntimeError('something bad happened!')

def do_something_that_might_error():
    raise_error()

Logging the full stacktrace

A best practice is to have a logger set up for your module. It will know the name of the module and be able to change levels (among other attributes, such as handlers)

import logging
logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

And we can use this logger to get the error:

try:
    do_something_that_might_error()
except Exception as error:
    logger.exception(error)

Which logs:

ERROR:__main__:something bad happened!
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 2, in do_something_that_might_error
  File "<stdin>", line 2, in raise_error
RuntimeError: something bad happened!

And so we get the same output as when we have an error:

>>> do_something_that_might_error()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in do_something_that_might_error
  File "<stdin>", line 2, in raise_error
RuntimeError: something bad happened!

Getting just the string

If you really just want the string, use the traceback.format_exc function instead, demonstrating logging the string here:

import traceback
try:
    do_something_that_might_error()
except Exception as error:
    just_the_string = traceback.format_exc()
    logger.debug(just_the_string)

Which logs:

DEBUG:__main__:Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 2, in do_something_that_might_error
  File "<stdin>", line 2, in raise_error
RuntimeError: something bad happened!

回答 2

对于Python 3,以下代码将Exception完全按照使用traceback.format_exc()以下命令获得的格式格式化对象:

import traceback

try: 
    method_that_can_raise_an_exception(params)
except Exception as ex:
    print(''.join(traceback.format_exception(etype=type(ex), value=ex, tb=ex.__traceback__)))

优点是仅Exception需要对象(由于记录的__traceback__属性),因此可以更轻松地将其作为参数传递给另一个函数以进行进一步处理。

With Python 3, the following code will format an Exception object exactly as would be obtained using traceback.format_exc():

import traceback

try: 
    method_that_can_raise_an_exception(params)
except Exception as ex:
    print(''.join(traceback.format_exception(etype=type(ex), value=ex, tb=ex.__traceback__)))

The advantage being that only the Exception object is needed (thanks to the recorded __traceback__ attribute), and can therefore be more easily passed as an argument to another function for further processing.


回答 3

>>> import sys
>>> import traceback
>>> try:
...   5 / 0
... except ZeroDivisionError as e:
...   type_, value_, traceback_ = sys.exc_info()
>>> traceback.format_tb(traceback_)
['  File "<stdin>", line 2, in <module>\n']
>>> value_
ZeroDivisionError('integer division or modulo by zero',)
>>> type_
<type 'exceptions.ZeroDivisionError'>
>>>
>>> 5 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero

您使用sys.exc_info()来收集信息和traceback模块中的函数以对其进行格式化。 以下是一些格式化示例。

整个异常字符串位于:

>>> ex = traceback.format_exception(type_, value_, traceback_)
>>> ex
['Traceback (most recent call last):\n', '  File "<stdin>", line 2, in <module>\n', 'ZeroDivisionError: integer division or modulo by zero\n']
>>> import sys
>>> import traceback
>>> try:
...   5 / 0
... except ZeroDivisionError as e:
...   type_, value_, traceback_ = sys.exc_info()
>>> traceback.format_tb(traceback_)
['  File "<stdin>", line 2, in <module>\n']
>>> value_
ZeroDivisionError('integer division or modulo by zero',)
>>> type_
<type 'exceptions.ZeroDivisionError'>
>>>
>>> 5 / 0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ZeroDivisionError: integer division or modulo by zero

You use sys.exc_info() to collect the information and the functions in the traceback module to format it. Here are some examples for formatting it.

The whole exception string is at:

>>> ex = traceback.format_exception(type_, value_, traceback_)
>>> ex
['Traceback (most recent call last):\n', '  File "<stdin>", line 2, in <module>\n', 'ZeroDivisionError: integer division or modulo by zero\n']

回答 4

对于那些使用Python-3的人

使用tracebackmodule和exception.__traceback__one可以提取堆栈跟踪,如下所示:

  • 使用获取当前的堆栈跟踪traceback.extract_stack()
  • 删除最后三个元素(因为那些是使我进入调试功能的堆栈中的条目)
  • __traceback__使用以下方法从异常对象追加traceback.extract_tb()
  • 使用格式化整个东西 traceback.format_list()
import traceback
def exception_to_string(excp):
   stack = traceback.extract_stack()[:-3] + traceback.extract_tb(excp.__traceback__)  # add limit=?? 
   pretty = traceback.format_list(stack)
   return ''.join(pretty) + '\n  {} {}'.format(excp.__class__,excp)

一个简单的演示:

def foo():
    try:
        something_invalid()
    except Exception as e:
        print(exception_to_string(e))

def bar():
    return foo()

调用时,将得到以下输出bar()

  File "./test.py", line 57, in <module>
    bar()
  File "./test.py", line 55, in bar
    return foo()
  File "./test.py", line 50, in foo
    something_invalid()

  <class 'NameError'> name 'something_invalid' is not defined

For those using Python-3

Using traceback module and exception.__traceback__ one can extract the stack-trace as follows:

  • grab the current stack-trace using traceback.extract_stack()
  • remove the last three elements (as those are entries in the stack that got me to my debug function)
  • append the __traceback__ from the exception object using traceback.extract_tb()
  • format the whole thing using traceback.format_list()
import traceback
def exception_to_string(excp):
   stack = traceback.extract_stack()[:-3] + traceback.extract_tb(excp.__traceback__)  # add limit=?? 
   pretty = traceback.format_list(stack)
   return ''.join(pretty) + '\n  {} {}'.format(excp.__class__,excp)

A simple demonstration:

def foo():
    try:
        something_invalid()
    except Exception as e:
        print(exception_to_string(e))

def bar():
    return foo()

We get the following output when we call bar():

  File "./test.py", line 57, in <module>
    bar()
  File "./test.py", line 55, in bar
    return foo()
  File "./test.py", line 50, in foo
    something_invalid()

  <class 'NameError'> name 'something_invalid' is not defined

回答 5

您可能还会考虑使用内置的Python模块cgitb来获取一些非常好且格式正确的异常信息,包括局部变量值,源代码上下文,函数参数等。

例如此代码…

import cgitb
cgitb.enable(format='text')

def func2(a, divisor):
    return a / divisor

def func1(a, b):
    c = b - 5
    return func2(a, c)

func1(1, 5)

我们得到这个异常输出…

ZeroDivisionError
Python 3.4.2: C:\tools\python\python.exe
Tue Sep 22 15:29:33 2015

A problem occurred in a Python script.  Here is the sequence of
function calls leading up to the error, in the order they occurred.

 c:\TEMP\cgittest2.py in <module>()
    7 def func1(a, b):
    8   c = b - 5
    9   return func2(a, c)
   10
   11 func1(1, 5)
func1 = <function func1>

 c:\TEMP\cgittest2.py in func1(a=1, b=5)
    7 def func1(a, b):
    8   c = b - 5
    9   return func2(a, c)
   10
   11 func1(1, 5)
global func2 = <function func2>
a = 1
c = 0

 c:\TEMP\cgittest2.py in func2(a=1, divisor=0)
    3
    4 def func2(a, divisor):
    5   return a / divisor
    6
    7 def func1(a, b):
a = 1
divisor = 0
ZeroDivisionError: division by zero
    __cause__ = None
    __class__ = <class 'ZeroDivisionError'>
    __context__ = None
    __delattr__ = <method-wrapper '__delattr__' of ZeroDivisionError object>
    __dict__ = {}
    __dir__ = <built-in method __dir__ of ZeroDivisionError object>
    __doc__ = 'Second argument to a division or modulo operation was zero.'
    __eq__ = <method-wrapper '__eq__' of ZeroDivisionError object>
    __format__ = <built-in method __format__ of ZeroDivisionError object>
    __ge__ = <method-wrapper '__ge__' of ZeroDivisionError object>
    __getattribute__ = <method-wrapper '__getattribute__' of ZeroDivisionError object>
    __gt__ = <method-wrapper '__gt__' of ZeroDivisionError object>
    __hash__ = <method-wrapper '__hash__' of ZeroDivisionError object>
    __init__ = <method-wrapper '__init__' of ZeroDivisionError object>
    __le__ = <method-wrapper '__le__' of ZeroDivisionError object>
    __lt__ = <method-wrapper '__lt__' of ZeroDivisionError object>
    __ne__ = <method-wrapper '__ne__' of ZeroDivisionError object>
    __new__ = <built-in method __new__ of type object>
    __reduce__ = <built-in method __reduce__ of ZeroDivisionError object>
    __reduce_ex__ = <built-in method __reduce_ex__ of ZeroDivisionError object>
    __repr__ = <method-wrapper '__repr__' of ZeroDivisionError object>
    __setattr__ = <method-wrapper '__setattr__' of ZeroDivisionError object>
    __setstate__ = <built-in method __setstate__ of ZeroDivisionError object>
    __sizeof__ = <built-in method __sizeof__ of ZeroDivisionError object>
    __str__ = <method-wrapper '__str__' of ZeroDivisionError object>
    __subclasshook__ = <built-in method __subclasshook__ of type object>
    __suppress_context__ = False
    __traceback__ = <traceback object>
    args = ('division by zero',)
    with_traceback = <built-in method with_traceback of ZeroDivisionError object>

The above is a description of an error in a Python program.  Here is
the original traceback:

Traceback (most recent call last):
  File "cgittest2.py", line 11, in <module>
    func1(1, 5)
  File "cgittest2.py", line 9, in func1
    return func2(a, c)
  File "cgittest2.py", line 5, in func2
    return a / divisor
ZeroDivisionError: division by zero

You might also consider using the built-in Python module, cgitb, to get some really good, nicely formatted exception information including local variable values, source code context, function parameters etc..

For instance for this code…

import cgitb
cgitb.enable(format='text')

def func2(a, divisor):
    return a / divisor

def func1(a, b):
    c = b - 5
    return func2(a, c)

func1(1, 5)

we get this exception output…

ZeroDivisionError
Python 3.4.2: C:\tools\python\python.exe
Tue Sep 22 15:29:33 2015

A problem occurred in a Python script.  Here is the sequence of
function calls leading up to the error, in the order they occurred.

 c:\TEMP\cgittest2.py in <module>()
    7 def func1(a, b):
    8   c = b - 5
    9   return func2(a, c)
   10
   11 func1(1, 5)
func1 = <function func1>

 c:\TEMP\cgittest2.py in func1(a=1, b=5)
    7 def func1(a, b):
    8   c = b - 5
    9   return func2(a, c)
   10
   11 func1(1, 5)
global func2 = <function func2>
a = 1
c = 0

 c:\TEMP\cgittest2.py in func2(a=1, divisor=0)
    3
    4 def func2(a, divisor):
    5   return a / divisor
    6
    7 def func1(a, b):
a = 1
divisor = 0
ZeroDivisionError: division by zero
    __cause__ = None
    __class__ = <class 'ZeroDivisionError'>
    __context__ = None
    __delattr__ = <method-wrapper '__delattr__' of ZeroDivisionError object>
    __dict__ = {}
    __dir__ = <built-in method __dir__ of ZeroDivisionError object>
    __doc__ = 'Second argument to a division or modulo operation was zero.'
    __eq__ = <method-wrapper '__eq__' of ZeroDivisionError object>
    __format__ = <built-in method __format__ of ZeroDivisionError object>
    __ge__ = <method-wrapper '__ge__' of ZeroDivisionError object>
    __getattribute__ = <method-wrapper '__getattribute__' of ZeroDivisionError object>
    __gt__ = <method-wrapper '__gt__' of ZeroDivisionError object>
    __hash__ = <method-wrapper '__hash__' of ZeroDivisionError object>
    __init__ = <method-wrapper '__init__' of ZeroDivisionError object>
    __le__ = <method-wrapper '__le__' of ZeroDivisionError object>
    __lt__ = <method-wrapper '__lt__' of ZeroDivisionError object>
    __ne__ = <method-wrapper '__ne__' of ZeroDivisionError object>
    __new__ = <built-in method __new__ of type object>
    __reduce__ = <built-in method __reduce__ of ZeroDivisionError object>
    __reduce_ex__ = <built-in method __reduce_ex__ of ZeroDivisionError object>
    __repr__ = <method-wrapper '__repr__' of ZeroDivisionError object>
    __setattr__ = <method-wrapper '__setattr__' of ZeroDivisionError object>
    __setstate__ = <built-in method __setstate__ of ZeroDivisionError object>
    __sizeof__ = <built-in method __sizeof__ of ZeroDivisionError object>
    __str__ = <method-wrapper '__str__' of ZeroDivisionError object>
    __subclasshook__ = <built-in method __subclasshook__ of type object>
    __suppress_context__ = False
    __traceback__ = <traceback object>
    args = ('division by zero',)
    with_traceback = <built-in method with_traceback of ZeroDivisionError object>

The above is a description of an error in a Python program.  Here is
the original traceback:

Traceback (most recent call last):
  File "cgittest2.py", line 11, in <module>
    func1(1, 5)
  File "cgittest2.py", line 9, in func1
    return func2(a, c)
  File "cgittest2.py", line 5, in func2
    return a / divisor
ZeroDivisionError: division by zero

回答 6

如果您希望在不处理异常的情况下获得相同的信息,则可以执行以下操作。这样做import traceback,然后:

try:
    ...
except Exception as e:
    print(traceback.print_tb(e.__traceback__))

我正在使用Python 3.7。

If you would like to get the same information given when an exception isn’t handled you can do something like this. Do import traceback and then:

try:
    ...
except Exception as e:
    print(traceback.print_tb(e.__traceback__))

I’m using Python 3.7.


回答 7

对于Python 3.5+

因此,您可以从您的异常以及其他任何异常中获取stacktrace。使用traceback.TracebackException它(只需更换ex您的除外):

print("".join(traceback.TracebackException.from_exception(ex).format())

一个扩展的示例和其他功能可以做到这一点:

import traceback

try:
    1/0
except Exception as ex:
    print("".join(traceback.TracebackException.from_exception(ex).format()) == traceback.format_exc() == "".join(traceback.format_exception(type(ex), ex, ex.__traceback__))) # This is True !!
    print("".join(traceback.TracebackException.from_exception(ex).format()))

输出将是这样的:

True
Traceback (most recent call last):
  File "untidsfsdfsdftled.py", line 29, in <module>
    1/0
ZeroDivisionError: division by zero

For Python 3.5+:

So, you can get the stacktrace from your exception as from any other exception. Use traceback.TracebackException for it (just replace ex with your exception):

print("".join(traceback.TracebackException.from_exception(ex).format())

An extended example and other features to do this:

import traceback

try:
    1/0
except Exception as ex:
    print("".join(traceback.TracebackException.from_exception(ex).format()) == traceback.format_exc() == "".join(traceback.format_exception(type(ex), ex, ex.__traceback__))) # This is True !!
    print("".join(traceback.TracebackException.from_exception(ex).format()))

The output will be something like this:

True
Traceback (most recent call last):
  File "untidsfsdfsdftled.py", line 29, in <module>
    1/0
ZeroDivisionError: division by zero

回答 8

我的2美分:

import sys, traceback
try: 
  ...
except Exception, e:
  T, V, TB = sys.exc_info()
  print ''.join(traceback.format_exception(T,V,TB))

my 2-cents:

import sys, traceback
try: 
  ...
except Exception, e:
  T, V, TB = sys.exc_info()
  print ''.join(traceback.format_exception(T,V,TB))

回答 9

如果您的目标是使异常和stacktrace消息看起来完全像python引发错误时一样,则以下内容在python 2 + 3中均适用:

import sys, traceback


def format_stacktrace():
    parts = ["Traceback (most recent call last):\n"]
    parts.extend(traceback.format_stack(limit=25)[:-2])
    parts.extend(traceback.format_exception(*sys.exc_info())[1:])
    return "".join(parts)

# EXAMPLE BELOW...

def a():
    b()


def b():
    c()


def c():
    d()


def d():
    assert False, "Noooh don't do it."


print("THIS IS THE FORMATTED STRING")
print("============================\n")

try:
    a()
except:
    stacktrace = format_stacktrace()
    print(stacktrace)

print("THIS IS HOW PYTHON DOES IT")
print("==========================\n")
a()

它通过format_stacktrace()从堆栈中删除最后一个调用并加入其余调用来工作。运行时,以上示例给出以下输出:

THIS IS THE FORMATTED STRING
============================

Traceback (most recent call last):
  File "test.py", line 31, in <module>
    a()
  File "test.py", line 12, in a
    b()
  File "test.py", line 16, in b
    c()
  File "test.py", line 20, in c
    d()
  File "test.py", line 24, in d
    assert False, "Noooh don't do it."
AssertionError: Noooh don't do it.

THIS IS HOW PYTHON DOES IT
==========================

Traceback (most recent call last):
  File "test.py", line 38, in <module>
    a()
  File "test.py", line 12, in a
    b()
  File "test.py", line 16, in b
    c()
  File "test.py", line 20, in c
    d()
  File "test.py", line 24, in d
    assert False, "Noooh don't do it."
AssertionError: Noooh don't do it.

If your goal is to make the exception and stacktrace message look exactly like when python throws an error, the following works in both python 2+3:

import sys, traceback


def format_stacktrace():
    parts = ["Traceback (most recent call last):\n"]
    parts.extend(traceback.format_stack(limit=25)[:-2])
    parts.extend(traceback.format_exception(*sys.exc_info())[1:])
    return "".join(parts)

# EXAMPLE BELOW...

def a():
    b()


def b():
    c()


def c():
    d()


def d():
    assert False, "Noooh don't do it."


print("THIS IS THE FORMATTED STRING")
print("============================\n")

try:
    a()
except:
    stacktrace = format_stacktrace()
    print(stacktrace)

print("THIS IS HOW PYTHON DOES IT")
print("==========================\n")
a()

It works by removing the last format_stacktrace() call from the stack and joining the rest. When run, the example above gives the following output:

THIS IS THE FORMATTED STRING
============================

Traceback (most recent call last):
  File "test.py", line 31, in <module>
    a()
  File "test.py", line 12, in a
    b()
  File "test.py", line 16, in b
    c()
  File "test.py", line 20, in c
    d()
  File "test.py", line 24, in d
    assert False, "Noooh don't do it."
AssertionError: Noooh don't do it.

THIS IS HOW PYTHON DOES IT
==========================

Traceback (most recent call last):
  File "test.py", line 38, in <module>
    a()
  File "test.py", line 12, in a
    b()
  File "test.py", line 16, in b
    c()
  File "test.py", line 20, in c
    d()
  File "test.py", line 24, in d
    assert False, "Noooh don't do it."
AssertionError: Noooh don't do it.

回答 10

我定义了以下帮助程序类:

import traceback
class TracedExeptions(object):
    def __init__(self):
        pass
    def __enter__(self):
        pass

    def __exit__(self, etype, value, tb):
      if value :
        if not hasattr(value, 'traceString'):
          value.traceString = "\n".join(traceback.format_exception(etype, value, tb))
        return False
      return True

我以后可以这样使用:

with TracedExeptions():
  #some-code-which-might-throw-any-exception

以后可以像这样消费它:

def log_err(ex):
  if hasattr(ex, 'traceString'):
    print("ERROR:{}".format(ex.traceString));
  else:
    print("ERROR:{}".format(ex));

(背景:我很沮丧,因为将Promises与s一起使用Exception,不幸的是将一个地方引发的异常传递给另一个地方的on_rejected处理程序,因此很难从原始位置进行回溯)

I defined following helper class:

import traceback
class TracedExeptions(object):
    def __init__(self):
        pass
    def __enter__(self):
        pass

    def __exit__(self, etype, value, tb):
      if value :
        if not hasattr(value, 'traceString'):
          value.traceString = "\n".join(traceback.format_exception(etype, value, tb))
        return False
      return True

Which I can later use like this:

with TracedExeptions():
  #some-code-which-might-throw-any-exception

And later can consume it like this:

def log_err(ex):
  if hasattr(ex, 'traceString'):
    print("ERROR:{}".format(ex.traceString));
  else:
    print("ERROR:{}".format(ex));

(Background: I was frustraded because of using Promises together with Exceptions, which unfortunately passes exceptions raised in one place to a on_rejected handler in another place, and thus it is difficult to get the traceback from original location)


SQLAlchemy:flush()和commit()有什么区别?

问题:SQLAlchemy:flush()和commit()有什么区别?

flush()commit()SQLAlchemy 之间有什么区别?

我已经阅读了文档,但没有一个更明智-他们似乎假设了我没有的预见性。

我对它们对内存使用量的影响特别感兴趣。我正在从一系列文件(总共约500万行)中将一些数据加载到数据库中,而我的会话有时会崩溃-这是一个大型数据库,并且一台内存不足的机器。

我想知道我使用的电话太多commit()还是不够flush()-但是如果不真正了解两者之间的区别,很难分辨!

What the difference is between flush() and commit() in SQLAlchemy?

I’ve read the docs, but am none the wiser – they seem to assume a pre-understanding that I don’t have.

I’m particularly interested in their impact on memory usage. I’m loading some data into a database from a series of files (around 5 million rows in total) and my session is occasionally falling over – it’s a large database and a machine with not much memory.

I’m wondering if I’m using too many commit() and not enough flush() calls – but without really understanding what the difference is, it’s hard to tell!


回答 0

会话对象基本上是数据库更改(更新,插入,删除)的正在进行的事务。这些操作在提交之前不会持久化到数据库中(如果您的程序在会话中事务中由于某种原因中止,则其中所有未提交的更改都将丢失)。

会话对象向注册了事务操作session.add(),但是直到session.flush()调用它之前,它们都不会将它们传递给数据库。

session.flush()将一系列操作传达给数据库(插入,更新,删除)。数据库将它们维护为事务中的挂起操作。直到数据库收到当前事务的COMMIT为止,session.commit()所做的更改才会永久保留在磁盘上,或对其他事务可见。

session.commit() 将这些更改提交(持久)到数据库。

flush()始终称为一个呼叫的一部分commit()1)。

当您使用Session对象查询数据库时,查询将同时从数据库及其所保存的未提交事务的已刷新部分返回结果。默认情况下,Session反对autoflush其操作,但是可以禁用它。

希望这个例子可以使这个更加清楚:

#---
s = Session()

s.add(Foo('A')) # The Foo('A') object has been added to the session.
                # It has not been committed to the database yet,
                #   but is returned as part of a query.
print 1, s.query(Foo).all()
s.commit()

#---
s2 = Session()
s2.autoflush = False

s2.add(Foo('B'))
print 2, s2.query(Foo).all() # The Foo('B') object is *not* returned
                             #   as part of this query because it hasn't
                             #   been flushed yet.
s2.flush()                   # Now, Foo('B') is in the same state as
                             #   Foo('A') was above.
print 3, s2.query(Foo).all() 
s2.rollback()                # Foo('B') has not been committed, and rolling
                             #   back the session's transaction removes it
                             #   from the session.
print 4, s2.query(Foo).all()

#---
Output:
1 [<Foo('A')>]
2 [<Foo('A')>]
3 [<Foo('A')>, <Foo('B')>]
4 [<Foo('A')>]

A Session object is basically an ongoing transaction of changes to a database (update, insert, delete). These operations aren’t persisted to the database until they are committed (if your program aborts for some reason in mid-session transaction, any uncommitted changes within are lost).

The session object registers transaction operations with session.add(), but doesn’t yet communicate them to the database until session.flush() is called.

session.flush() communicates a series of operations to the database (insert, update, delete). The database maintains them as pending operations in a transaction. The changes aren’t persisted permanently to disk, or visible to other transactions until the database receives a COMMIT for the current transaction (which is what session.commit() does).

session.commit() commits (persists) those changes to the database.

flush() is always called as part of a call to commit() (1).

When you use a Session object to query the database, the query will return results both from the database and from the flushed parts of the uncommitted transaction it holds. By default, Session objects autoflush their operations, but this can be disabled.

Hopefully this example will make this clearer:

#---
s = Session()

s.add(Foo('A')) # The Foo('A') object has been added to the session.
                # It has not been committed to the database yet,
                #   but is returned as part of a query.
print 1, s.query(Foo).all()
s.commit()

#---
s2 = Session()
s2.autoflush = False

s2.add(Foo('B'))
print 2, s2.query(Foo).all() # The Foo('B') object is *not* returned
                             #   as part of this query because it hasn't
                             #   been flushed yet.
s2.flush()                   # Now, Foo('B') is in the same state as
                             #   Foo('A') was above.
print 3, s2.query(Foo).all() 
s2.rollback()                # Foo('B') has not been committed, and rolling
                             #   back the session's transaction removes it
                             #   from the session.
print 4, s2.query(Foo).all()

#---
Output:
1 [<Foo('A')>]
2 [<Foo('A')>]
3 [<Foo('A')>, <Foo('B')>]
4 [<Foo('A')>]

回答 1

正如@snapshoe所说

flush() 将您的SQL语句发送到数据库

commit() 提交交易。

时间session.autocommit == False

commit()flush()如果您设置,会打电话autoflush == True

时间session.autocommit == True

commit()如果您尚未开始交易,则无法调用(您可能没有,因为您可能仅使用此模式来避免手动管理交易)。

在这种模式下,您必须调用flush()以保存您的ORM更改。刷新还会有效地提交您的数据。

As @snapshoe says

flush() sends your SQL statements to the database

commit() commits the transaction.

When session.autocommit == False:

commit() will call flush() if you set autoflush == True.

When session.autocommit == True:

You can’t call commit() if you haven’t started a transaction (which you probably haven’t since you would probably only use this mode to avoid manually managing transactions).

In this mode, you must call flush() to save your ORM changes. The flush effectively also commits your data.


回答 2

如果可以提交,为什么还要刷新?

作为刚接触数据库和sqlalchemy的新手,以前的答案- flush()将SQL语句发送到数据库并commit()保留它们-对我来说还不清楚。这些定义很有意义,但是从定义中并不清楚为什么您要使用冲洗而不是仅仅提交。

由于提交始终会刷新(https://docs.sqlalchemy.org/en/13/orm/session_basics.html#committing),因此听起来确实很相似。我认为要强调的大问题是,刷新不是永久的并且可以撤消,而提交是永久的,在某种意义上,您不能要求数据库撤消上一次提交(我认为)

@snapshoe突出显示,如果要查询数据库并获取包含新添加对象的结果,则需要先刷新(或提交,然后将为您刷新)。也许这对某些人有用,尽管我不确定为什么您要刷新而不是提交(除了可以撤消的琐碎回答之外)。

在另一个示例中,我正在本地数据库和远程服务器之间同步文档,并且如果用户决定取消,则应撤消所有添加/更新/删除操作(即,不进行部分同步,仅进行完全同步)。更新单个文档时,我决定只删除旧行,然后从远程服务器添加更新的版本。事实证明,由于sqlalchemy的编写方式,不能保证提交时的操作顺序。这导致添加了一个重复版本(在尝试删除旧版本之前),这导致数据库无法通过唯一约束。为了解决这个问题,我使用flush()了顺序,但是如果以后同步过程失败,我仍然可以撤消操作。

在以下位置查看我的帖子:在sqlalchemy中提交时,添加和删除是否有任何顺序

同样,有人想知道提交时是否保持添加顺序,即如果我先添加object1然后添加object2是否在将对象添加到会话时将SQLAlchemy保存object1到数据库之前object2

同样,这里假定使用flush()将确保所需的行为。因此,总而言之,刷新的一种用途是提供顺序保证(我认为),同时又允许自己使用提交不提供的“撤消”选项。

自动刷新和自动提交

注意,自动刷新可用于确保查询对更新的数据库起作用,因为sqlalchemy将在执行查询之前刷新。https://docs.sqlalchemy.org/zh/13/orm/session_api.html#sqlalchemy.orm.session.Session.params.autoflush

自动提交是我尚不完全了解的其他东西,但听起来不鼓励使用它:https : //docs.sqlalchemy.org/en/13/orm/session_api.html#sqlalchemy.orm.session.Session.params。自动提交

内存使用情况

现在,最初的问题实际上是想了解刷新与提交对内存的影响。由于持久性或不持久性是数据库提供的(我认为),因此仅刷新就足以卸载数据库-尽管如果您不关心撤消操作,提交也不会受到损害(实际上可能会有所帮助-见下文)。 。

sqlalchemy对已刷新的对象使用弱引用:https : //docs.sqlalchemy.org/en/13/orm/session_state_management.html#session-referencing-behavior

这意味着,如果您没有将对象明确保留在某个地方(例如列表或dict中),则sqlalchemy不会将其保留在内存中。

但是,这时您需要担心数据库方面的问题。可能在未提交的情况下进行刷新会带来一些内存损失,以维护事务。同样,我对此并不陌生,但是这里的链接似乎恰恰表明了这一点:https : //stackoverflow.com/a/15305650/764365

换句话说,尽管应该在内存和性能之间进行权衡,但是提交应该减少内存的使用。换句话说,您可能不想一次提交每一个数据库更改(出于性能原因),但是等待太长时间会增加内存使用率。

Why flush if you can commit?

As someone new to working with databases and sqlalchemy, the previous answers – that flush() sends SQL statements to the DB and commit() persists them – were not clear to me. The definitions make sense but it isn’t immediately clear from the definitions why you would use a flush instead of just committing.

Since a commit always flushes (https://docs.sqlalchemy.org/en/13/orm/session_basics.html#committing) these sound really similar. I think the big issue to highlight is that a flush is not permanent and can be undone, whereas a commit is permanent, in the sense that you can’t ask the database to undo the last commit (I think)

@snapshoe highlights that if you want to query the database and get results that include newly added objects, you need to have flushed first (or committed, which will flush for you). Perhaps this is useful for some people although I’m not sure why you would want to flush rather than commit (other than the trivial answer that it can be undone).

In another example I was syncing documents between a local DB and a remote server, and if the user decided to cancel, all adds/updates/deletes should be undone (i.e. no partial sync, only a full sync). When updating a single document I’ve decided to simply delete the old row and add the updated version from the remote server. It turns out that due to the way sqlalchemy is written, order of operations when committing is not guaranteed. This resulted in adding a duplicate version (before attempting to delete the old one), which resulted in the DB failing a unique constraint. To get around this I used flush() so that order was maintained, but I could still undo if later the sync process failed.

See my post on this at: Is there any order for add versus delete when committing in sqlalchemy

Similarly, someone wanted to know whether add order is maintained when committing, i.e. if I add object1 then add object2, does object1 get added to the database before object2 Does SQLAlchemy save order when adding objects to session?

Again, here presumably the use of a flush() would ensure the desired behavior. So in summary, one use for flush is to provide order guarantees (I think), again while still allowing yourself an “undo” option that commit does not provide.

Autoflush and Autocommit

Note, autoflush can be used to ensure queries act on an updated database as sqlalchemy will flush before executing the query. https://docs.sqlalchemy.org/en/13/orm/session_api.html#sqlalchemy.orm.session.Session.params.autoflush

Autocommit is something else that I don’t completely understand but it sounds like its use is discouraged: https://docs.sqlalchemy.org/en/13/orm/session_api.html#sqlalchemy.orm.session.Session.params.autocommit

Memory Usage

Now the original question actually wanted to know about the impact of flush vs. commit for memory purposes. As the ability to persist or not is something the database offers (I think), simply flushing should be sufficient to offload to the database – although committing shouldn’t hurt (actually probably helps – see below) if you don’t care about undoing.

sqlalchemy uses weak referencing for objects that have been flushed: https://docs.sqlalchemy.org/en/13/orm/session_state_management.html#session-referencing-behavior

This means if you don’t have an object explicitly held onto somewhere, like in a list or dict, sqlalchemy won’t keep it in memory.

However, then you have the database side of things to worry about. Presumably flushing without committing comes with some memory penalty to maintain the transaction. Again, I’m new to this but here’s a link that seems to suggest exactly this: https://stackoverflow.com/a/15305650/764365

In other words, commits should reduce memory usage, although presumably there is a trade-off between memory and performance here. In other words, you probably don’t want to commit every single database change, one at a time (for performance reasons), but waiting too long will increase memory usage.


回答 3

这并不能严格回答最初的问题,但是有些人提到session.autoflush = True不必与您一起使用session.flush()……而且并非总是如此。

如果要在事务中间使用新创建的对象的ID,则必须调用session.flush()

# Given a model with at least this id
class AModel(Base):
   id = Column(Integer, primary_key=True)  # autoincrement by default on integer primary key

session.autoflush = True

a = AModel()
session.add(a)
a.id  # None
session.flush()
a.id  # autoincremented integer

这是因为autoflush自动填写ID(尽管对象的查询将,这有时会导致混乱,如“为什么这个作品在这里,但不是吗?”但是,snapshoe已涵盖这一部分)。


一个相关方面对我来说似乎很重要,但并未真正提及:

为什么不一直承诺呢?-答案是原子性

可以这么说:所有操作必须全部成功执行,否则任何一个都不会生效。

例如,如果要创建/更新/删除某个对象(A),然后创建/更新/删除另一个对象(B),但是如果(B)失败,则要还原(A)。这意味着这两个操作是原子操作。

因此,如果(B)需要(A)的结果,则要flush在(A)commit之后和(B)之后调用。

另外,如果是session.autoflush is True,除了我上面提到的情况或Jimbo的回答中的其他情况外,您将不需要flush手动调用。

This does not strictly answer the original question but some people have mentioned that with session.autoflush = True you don’t have to use session.flush()… And this is not always true.

If you want to use the id of a newly created object in the middle of a transaction, you must call session.flush().

# Given a model with at least this id
class AModel(Base):
   id = Column(Integer, primary_key=True)  # autoincrement by default on integer primary key

session.autoflush = True

a = AModel()
session.add(a)
a.id  # None
session.flush()
a.id  # autoincremented integer

This is because autoflush does NOT auto fill the id (although a query of the object will, which sometimes can cause confusion as in “why this works here but not there?” But snapshoe already covered this part).


One related aspect that seems pretty important to me and wasn’t really mentioned:

Why would you not commit all the time? – The answer is atomicity.

A fancy word to say: an ensemble of operations have to all be executed successfully OR none of them will take effect.

For example, if you want to create/update/delete some object (A) and then create/update/delete another (B), but if (B) fails you want to revert (A). This means those 2 operations are atomic.

Therefore, if (B) needs a result of (A), you want to call flush after (A) and commit after (B).

Also, if session.autoflush is True, except for the case that I mentioned above or others in Jimbo‘s answer, you will not need to call flush manually.