如何检查以下所有项目是否都在列表中?

问题:如何检查以下所有项目是否都在列表中?

我发现存在一个有关的问题,即如何查找列表中是否至少有一项:如何检查列表中
是否有以下一项?

但是,找到列表中是否存在所有项的最佳方式是什么?

搜索文档后,我发现此解决方案:

>>> l = ['a', 'b', 'c']
>>> set(['a', 'b']) <= set(l)
True
>>> set(['a', 'x']) <= set(l)
False

其他解决方案是这样的:

>>> l = ['a', 'b', 'c']
>>> all(x in l for x in ['a', 'b'])
True
>>> all(x in l for x in ['a', 'x'])
False

但是在这里您必须进行更多的键入。

还有其他解决方案吗?

I found, that there is related question, about how to find if at least one item exists in a list:
How to check if one of the following items is in a list?

But what is the best and pythonic way to find whether all items exists in a list?

Searching through the docs I found this solution:

>>> l = ['a', 'b', 'c']
>>> set(['a', 'b']) <= set(l)
True
>>> set(['a', 'x']) <= set(l)
False

Other solution would be this:

>>> l = ['a', 'b', 'c']
>>> all(x in l for x in ['a', 'b'])
True
>>> all(x in l for x in ['a', 'x'])
False

But here you must do more typing.

Is there any other solutions?


回答 0

<=Python 这样的运算符通常不会被覆盖以表示与“小于或等于”明显不同的东西。对于标准库来说,这样做是不寻常的-对我来说,它听起来像是旧版API。

使用等效的且更明确命名的方法set.issubset。注意,您不需要将参数转换为集合;如有需要,它将为您完成。

set(['a', 'b']).issubset(['a', 'b', 'c'])

Operators like <= in Python are generally not overriden to mean something significantly different than “less than or equal to”. It’s unusual for the standard library does this–it smells like legacy API to me.

Use the equivalent and more clearly-named method, set.issubset. Note that you don’t need to convert the argument to a set; it’ll do that for you if needed.

set(['a', 'b']).issubset(['a', 'b', 'c'])

回答 1

我可能会set以以下方式使用:

set(l).issuperset(set(['a','b'])) 

或反过来:

set(['a','b']).issubset(set(l)) 

我觉得它更具可读性,但可能会过分杀人。集合对于计算集合之间的并集/交叉点/差异特别有用,但是在这种情况下,它可能不是最佳选择。

I would probably use set in the following manner :

set(l).issuperset(set(['a','b'])) 

or the other way round :

set(['a','b']).issubset(set(l)) 

I find it a bit more readable, but it may be over-kill. Sets are particularly useful to compute union/intersection/differences between collections, but it may not be the best option in this situation …


回答 2

我喜欢这两个,因为它们看起来最合乎逻辑,后者更短并且可能最快(此处使用set文字语法显示,该文字语法已 反向移植到Python 2.7):

all(x in {'a', 'b', 'c'} for x in ['a', 'b'])
#   or
{'a', 'b'}.issubset({'a', 'b', 'c'})

I like these two because they seem the most logical, the latter being shorter and probably fastest (shown here using set literal syntax which has been backported to Python 2.7):

all(x in {'a', 'b', 'c'} for x in ['a', 'b'])
#   or
{'a', 'b'}.issubset({'a', 'b', 'c'})

回答 3

如果您的列表包含这样的重复项,该怎么办:

v1 = ['s', 'h', 'e', 'e', 'p']
v2 = ['s', 's', 'h']

集不包含重复项。因此,以下行返回True。

set(v2).issubset(v1)

要计算重复项,可以使用以下代码:

v1 = sorted(v1)
v2 = sorted(v2)


def is_subseq(v2, v1):
    """Check whether v2 is a subsequence of v1."""
    it = iter(v1)
    return all(c in it for c in v2) 

因此,以下行返回False。

is_subseq(v2, v1)

What if your lists contain duplicates like this:

v1 = ['s', 'h', 'e', 'e', 'p']
v2 = ['s', 's', 'h']

Sets do not contain duplicates. So, the following line returns True.

set(v2).issubset(v1)

To count for duplicates, you can use the code:

v1 = sorted(v1)
v2 = sorted(v2)


def is_subseq(v2, v1):
    """Check whether v2 is a subsequence of v1."""
    it = iter(v1)
    return all(c in it for c in v2) 

So, the following line returns False.

is_subseq(v2, v1)

回答 4

这就是我在网上搜索的内容,但不幸的是,我在python解释器上进行实验时发现不是在线的。

>>> case  = "caseCamel"
>>> label = "Case Camel"
>>> list  = ["apple", "banana"]
>>>
>>> (case or label) in list
False
>>> list = ["apple", "caseCamel"]
>>> (case or label) in list
True
>>> (case and label) in list
False
>>> list = ["case", "caseCamel", "Case Camel"]
>>> (case and label) in list
True
>>>

如果您有一个完整的变量列表 sublist variable

>>>
>>> list  = ["case", "caseCamel", "Case Camel"]
>>> label = "Case Camel"
>>> case  = "caseCamel"
>>>
>>> sublist = ["unique banana", "very unique banana"]
>>>
>>> # example for if any (at least one) item contained in superset (or statement)
...
>>> next((True for item in sublist if next((True for x in list if x == item), False)), False)
False
>>>
>>> sublist[0] = label
>>>
>>> next((True for item in sublist if next((True for x in list if x == item), False)), False)
True
>>>
>>> # example for whether a subset (all items) contained in superset (and statement)
...
>>> # a bit of demorgan's law
...
>>> next((False for item in sublist if item not in list), True)
False
>>>
>>> sublist[1] = case
>>>
>>> next((False for item in sublist if item not in list), True)
True
>>>
>>> next((True for item in sublist if next((True for x in list if x == item), False)), False)
True
>>>
>>>

This was what I was searching online but unfortunately found not online but while experimenting on python interpreter.

>>> case  = "caseCamel"
>>> label = "Case Camel"
>>> list  = ["apple", "banana"]
>>>
>>> (case or label) in list
False
>>> list = ["apple", "caseCamel"]
>>> (case or label) in list
True
>>> (case and label) in list
False
>>> list = ["case", "caseCamel", "Case Camel"]
>>> (case and label) in list
True
>>>

and if you have a looong list of variables held in a sublist variable

>>>
>>> list  = ["case", "caseCamel", "Case Camel"]
>>> label = "Case Camel"
>>> case  = "caseCamel"
>>>
>>> sublist = ["unique banana", "very unique banana"]
>>>
>>> # example for if any (at least one) item contained in superset (or statement)
...
>>> next((True for item in sublist if next((True for x in list if x == item), False)), False)
False
>>>
>>> sublist[0] = label
>>>
>>> next((True for item in sublist if next((True for x in list if x == item), False)), False)
True
>>>
>>> # example for whether a subset (all items) contained in superset (and statement)
...
>>> # a bit of demorgan's law
...
>>> next((False for item in sublist if item not in list), True)
False
>>>
>>> sublist[1] = case
>>>
>>> next((False for item in sublist if item not in list), True)
True
>>>
>>> next((True for item in sublist if next((True for x in list if x == item), False)), False)
True
>>>
>>>

回答 5

如何使用lambda表达式执行此操作的示例如下:

issublist = lambda x, y: 0 in [_ in x for _ in y]

An example of how to do this using a lambda expression would be:

issublist = lambda x, y: 0 in [_ in x for _ in y]

回答 6

不是OP的情况,但是-对于任何想要在字典中声明交集 并由于不良谷歌搜索而最终到这里的人(例如我)-您需要与dict.items

>>> a = {'key': 'value'}
>>> b = {'key': 'value', 'extra_key': 'extra_value'}
>>> all(item in a.items() for item in b.items())
True
>>> all(item in b.items() for item in a.items())
False

那是因为dict.items返回键/值对的元组,就像Python中的任何对象一样,它们可以互换地比较

Not OP’s case, but – for anyone who wants to assert intersection in dicts and ended up here due to poor googling (e.g. me) – you need to work with dict.items:

>>> a = {'key': 'value'}
>>> b = {'key': 'value', 'extra_key': 'extra_value'}
>>> all(item in a.items() for item in b.items())
True
>>> all(item in b.items() for item in a.items())
False

That’s because dict.items returns tuples of key/value pairs, and much like any object in Python, they’re interchangeably comparable


更改源代码时自动加载gunicorn

问题:更改源代码时自动加载gunicorn

最后,我将开发环境从runserver迁移到gunicorn / nginx。

将runserver的自动重载功能复制到gunicorn会很方便,因此当源更改时,服务器会自动重新启动。否则,我必须使用手动重新启动服务器kill -HUP

有什么方法可以避免手动重启?

Finally I migrated my development env from runserver to gunicorn/nginx.

It’d be convenient to replicate the autoreload feature of runserver to gunicorn, so the server automatically restarts when source changes. Otherwise I have to restart the server manually with kill -HUP.

Any way to avoid the manual restart?


回答 0

尽管这是个老问题,但仅出于一致性考虑-因为19.0版本的gunicorn可以--reload选择。因此,不再需要第三方工具。

While this is old question you need to know that ever since version 19.0 gunicorn has had the --reload option. So now no third party tools are needed.


回答 1

一种选择是使用–max-requests通过添加--max-requests 1到启动选项来将每个生成的进程限制为仅服务一个请求。每个新产生的进程都应该看到您的代码更改,并且在开发环境中,每个请求的额外启动时间可以忽略不计。

One option would be to use the –max-requests to limit each spawned process to serving only one request by adding --max-requests 1 to the startup options. Every newly spawned process should see your code changes and in a development environment the extra startup time per request should be negligible.


回答 2

Bryan Helmig提出了这个建议,我对其进行了修改,以使其直接使用run_gunicorn而不是gunicorn直接启动,以便可以将这3个命令剪切并粘贴到django项目根文件夹中的shell中(激活了virtualenv):

pip install watchdog -U
watchmedo shell-command --patterns="*.py;*.html;*.css;*.js" --recursive --command='echo "${watch_src_path}" && kill -HUP `cat gunicorn.pid`' . &
python manage.py run_gunicorn 127.0.0.1:80 --pid=gunicorn.pid

Bryan Helmig came up with this and I modified it to use run_gunicorn instead of launching gunicorn directly, to make it possible to just cut and paste these 3 commands into a shell in your django project root folder (with your virtualenv activated):

pip install watchdog -U
watchmedo shell-command --patterns="*.py;*.html;*.css;*.js" --recursive --command='echo "${watch_src_path}" && kill -HUP `cat gunicorn.pid`' . &
python manage.py run_gunicorn 127.0.0.1:80 --pid=gunicorn.pid

回答 3

我使用git push部署到生产环境,并设置git挂钩来运行脚本。这种方法的优点是您还可以同时进行迁移和软件包安装。 https://mikeeverhart.net/2013/01/using-git-to-deploy-code/

mkdir -p /home/git/project_name.git
cd /home/git/project_name.git
git init --bare

然后创建一个脚本/home/git/project_name.git/hooks/post-receive

#!/bin/bash
GIT_WORK_TREE=/path/to/project git checkout -f
source /path/to/virtualenv/activate
pip install -r /path/to/project/requirements.txt
python /path/to/project/manage.py migrate
sudo supervisorctl restart project_name

确保为chmod u+x post-receive,并将用户添加到sudoers。允许它sudo supervisorctl不带密码运行。 https://www.cyberciti.biz/faq/linux-unix-running-sudo-command-without-a-password/

在本地/开发服务器上,我进行了设置,git remote从而可以推送到生产服务器

git remote add production ssh://user_name@production-server/home/git/project_name.git

# initial push
git push production +master:refs/heads/master

# subsequent push
git push production master

另外,在脚本运行时,您将看到所有提示。因此,您将看到迁移/软件包安装/主管重启是否存在任何问题。

I use git push to deploy to production and set up git hooks to run a script. The advantage of this approach is you can also do your migration and package installation at the same time. https://mikeeverhart.net/2013/01/using-git-to-deploy-code/

mkdir -p /home/git/project_name.git
cd /home/git/project_name.git
git init --bare

Then create a script /home/git/project_name.git/hooks/post-receive.

#!/bin/bash
GIT_WORK_TREE=/path/to/project git checkout -f
source /path/to/virtualenv/activate
pip install -r /path/to/project/requirements.txt
python /path/to/project/manage.py migrate
sudo supervisorctl restart project_name

Make sure to chmod u+x post-receive, and add user to sudoers. Allow it to run sudo supervisorctl without password. https://www.cyberciti.biz/faq/linux-unix-running-sudo-command-without-a-password/

From my local / development server, I set up git remote that allows me to push to the production server

git remote add production ssh://user_name@production-server/home/git/project_name.git

# initial push
git push production +master:refs/heads/master

# subsequent push
git push production master

As a bonus, you will get to see all the prompts as the script is running. So you will see if there is any issue with the migration/package installation/supervisor restart.


查找列表元素之间的差异

问题:查找列表元素之间的差异

给定一个数字列表,人们如何发现i第()个元素与其第()个元素之间的差异i+1

使用lambda表达式还是列表理解更好?

例如:

给定一个列表t=[1,3,6,...],我们的目标是要找到一个列表v=[2,3,...],因为3-1=26-3=3等等。

Given a list of numbers, how does one find differences between every (i)-th elements and its (i+1)-th?

Is it better to use a lambda expression or maybe a list comprehension?

For example:

Given a list t=[1,3,6,...], the goal is to find a list v=[2,3,...] because 3-1=2, 6-3=3, etc.


回答 0

>>> t
[1, 3, 6]
>>> [j-i for i, j in zip(t[:-1], t[1:])]  # or use itertools.izip in py2k
[2, 3]
>>> t
[1, 3, 6]
>>> [j-i for i, j in zip(t[:-1], t[1:])]  # or use itertools.izip in py2k
[2, 3]

回答 1

其他答案是正确的,但是如果您要进行数值运算,则可能需要考虑使用numpy。使用numpy,答案是:

v = numpy.diff(t)

The other answers are correct but if you’re doing numerical work, you might want to consider numpy. Using numpy, the answer is:

v = numpy.diff(t)

回答 2

如果您不想使用numpynor zip,则可以使用以下解决方案:

>>> t = [1, 3, 6]
>>> v = [t[i+1]-t[i] for i in range(len(t)-1)]
>>> v
[2, 3]

If you don’t want to use numpy nor zip, you can use the following solution:

>>> t = [1, 3, 6]
>>> v = [t[i+1]-t[i] for i in range(len(t)-1)]
>>> v
[2, 3]

回答 3

您可以使用itertools.teezip有效地构建结果:

from itertools import tee
# python2 only:
#from itertools import izip as zip

def differences(seq):
    iterable, copied = tee(seq)
    next(copied)
    for x, y in zip(iterable, copied):
        yield y - x

itertools.islice改为使用:

from itertools import islice

def differences(seq):
    nexts = islice(seq, 1, None)
    for x, y in zip(seq, nexts):
        yield y - x

您也可以避免使用itertools模块:

def differences(seq):
    iterable = iter(seq)
    prev = next(iterable)
    for element in iterable:
        yield element - prev
        prev = element

如果您不需要存储所有结果并支持无限的可迭代对象,那么所有这些解决方案都可以在恒定的空间中工作。


以下是解决方案的一些微观基准:

In [12]: L = range(10**6)

In [13]: from collections import deque
In [15]: %timeit deque(differences_tee(L), maxlen=0)
10 loops, best of 3: 122 ms per loop

In [16]: %timeit deque(differences_islice(L), maxlen=0)
10 loops, best of 3: 127 ms per loop

In [17]: %timeit deque(differences_no_it(L), maxlen=0)
10 loops, best of 3: 89.9 ms per loop

以及其他建议的解决方案:

In [18]: %timeit [x[1] - x[0] for x in zip(L[1:], L)]
10 loops, best of 3: 163 ms per loop

In [19]: %timeit [L[i+1]-L[i] for i in range(len(L)-1)]
1 loops, best of 3: 395 ms per loop

In [20]: import numpy as np

In [21]: %timeit np.diff(L)
1 loops, best of 3: 479 ms per loop

In [35]: %%timeit
    ...: res = []
    ...: for i in range(len(L) - 1):
    ...:     res.append(L[i+1] - L[i])
    ...: 
1 loops, best of 3: 234 ms per loop

注意:

  • zip(L[1:], L)等价于,zip(L[1:], L[:-1])因为zip已经终止于最短的输入,但是它避免了的整个副本L
  • 通过索引访问单个元素非常慢,因为每次索引访问都是python中的方法调用
  • numpy.diff之所以很慢是因为它必须首先将转换listndarray。显然,如果你开始ndarray这将是快:

    In [22]: arr = np.array(L)
    
    In [23]: %timeit np.diff(arr)
    100 loops, best of 3: 3.02 ms per loop

You can use itertools.tee and zip to efficiently build the result:

from itertools import tee
# python2 only:
#from itertools import izip as zip

def differences(seq):
    iterable, copied = tee(seq)
    next(copied)
    for x, y in zip(iterable, copied):
        yield y - x

Or using itertools.islice instead:

from itertools import islice

def differences(seq):
    nexts = islice(seq, 1, None)
    for x, y in zip(seq, nexts):
        yield y - x

You can also avoid using the itertools module:

def differences(seq):
    iterable = iter(seq)
    prev = next(iterable)
    for element in iterable:
        yield element - prev
        prev = element

All these solution work in constant space if you don’t need to store all the results and support infinite iterables.


Here are some micro-benchmarks of the solutions:

In [12]: L = range(10**6)

In [13]: from collections import deque
In [15]: %timeit deque(differences_tee(L), maxlen=0)
10 loops, best of 3: 122 ms per loop

In [16]: %timeit deque(differences_islice(L), maxlen=0)
10 loops, best of 3: 127 ms per loop

In [17]: %timeit deque(differences_no_it(L), maxlen=0)
10 loops, best of 3: 89.9 ms per loop

And the other proposed solutions:

In [18]: %timeit [x[1] - x[0] for x in zip(L[1:], L)]
10 loops, best of 3: 163 ms per loop

In [19]: %timeit [L[i+1]-L[i] for i in range(len(L)-1)]
1 loops, best of 3: 395 ms per loop

In [20]: import numpy as np

In [21]: %timeit np.diff(L)
1 loops, best of 3: 479 ms per loop

In [35]: %%timeit
    ...: res = []
    ...: for i in range(len(L) - 1):
    ...:     res.append(L[i+1] - L[i])
    ...: 
1 loops, best of 3: 234 ms per loop

Note that:

  • zip(L[1:], L) is equivalent to zip(L[1:], L[:-1]) since zip already terminates on the shortest input, however it avoids a whole copy of L.
  • Accessing the single elements by index is very slow because every index access is a method call in python
  • numpy.diff is slow because it has to first convert the list to a ndarray. Obviously if you start with an ndarray it will be much faster:

    In [22]: arr = np.array(L)
    
    In [23]: %timeit np.diff(arr)
    100 loops, best of 3: 3.02 ms per loop
    

回答 4

使用:=Python 3.8+中可用的walrus运算符:

>>> t = [1, 3, 6]
>>> prev = t[0]; [-prev + (prev := x) for x in t[1:]]
[2, 3]

Using the := walrus operator available in Python 3.8+:

>>> t = [1, 3, 6]
>>> prev = t[0]; [-prev + (prev := x) for x in t[1:]]
[2, 3]

回答 5

我建议使用

v = np.diff(t)

这是简单易读的。

但如果你想v有相同的长度,t然后

v = np.diff([t[0]] + t) # for python 3.x

要么

v = np.diff(t + [t[-1]])

仅供参考:这仅适用于列表。

用于numpy数组

v = np.diff(np.append(t[0], t))

I would suggest using

v = np.diff(t)

this is simple and easy to read.

But if you want v to have the same length as t then

v = np.diff([t[0]] + t) # for python 3.x

or

v = np.diff(t + [t[-1]])

FYI: this will only work for lists.

for numpy arrays

v = np.diff(np.append(t[0], t))

回答 6

功能方法:

>>> import operator
>>> a = [1,3,5,7,11,13,17,21]
>>> map(operator.sub, a[1:], a[:-1])
[2, 2, 2, 4, 2, 4, 4]

使用生成器:

>>> import operator, itertools
>>> g1,g2 = itertools.tee((x*x for x in xrange(5)),2)
>>> list(itertools.imap(operator.sub, itertools.islice(g1,1,None), g2))
[1, 3, 5, 7]

使用索引:

>>> [a[i+1]-a[i] for i in xrange(len(a)-1)]
[2, 2, 2, 4, 2, 4, 4]

A functional approach:

>>> import operator
>>> a = [1,3,5,7,11,13,17,21]
>>> map(operator.sub, a[1:], a[:-1])
[2, 2, 2, 4, 2, 4, 4]

Using generator:

>>> import operator, itertools
>>> g1,g2 = itertools.tee((x*x for x in xrange(5)),2)
>>> list(itertools.imap(operator.sub, itertools.islice(g1,1,None), g2))
[1, 3, 5, 7]

Using indices:

>>> [a[i+1]-a[i] for i in xrange(len(a)-1)]
[2, 2, 2, 4, 2, 4, 4]

回答 7

好。我想我找到了正确的解决方案:

v = [x[1]-x[0] for x in zip(t[1:],t[:-1])]

Ok. I think I found the proper solution:

v = [x[1]-x[0] for x in zip(t[1:],t[:-1])]

回答 8

具有周期边界的解决方案

有时,使用数值积分时,您可能希望将具有周期性边界条件的列表与众不同(因此,第一个元素计算与最后一个元素的差。在这种情况下,numpy.roll函数会有所帮助:

v-np.roll(v,1)

零前置解决方案

另一个numpy解决方案(仅出于完整性考虑)是使用

numpy.ediff1d(v)

它作为numpy.diff起作用,但仅在向量上起作用(它使输入数组变平)。它提供了在结果矢量前添加或添加数字的功能。这在处理通常是气象变量(例如,雨水,潜热等)通量变化的累积字段时很有用,因为您想要一个与输入变量长度相同的结果列表,而第一个条目保持不变。

那你会写

np.ediff1d(v,to_begin=v[0])

当然,您也可以使用np.diff命令来执行此操作,在这种情况下,尽管您需要使用prepend关键字在序列前加零:

np.diff(v,prepend=0.0) 

以上所有解决方案都返回一个与输入长度相同的向量。

Solution with periodic boundaries

Sometimes with numerical integration you will want to difference a list with periodic boundary conditions (so the first element calculates the difference to the last. In this case the numpy.roll function is helpful:

v-np.roll(v,1)

Solutions with zero prepended

Another numpy solution (just for completeness) is to use

numpy.ediff1d(v)

This works as numpy.diff, but only on a vector (it flattens the input array). It offers the ability to prepend or append numbers to the resulting vector. This is useful when handling accumulated fields that is often the case fluxes in meteorological variables (e.g. rain, latent heat etc), as you want a resulting list of the same length as the input variable, with the first entry untouched.

Then you would write

np.ediff1d(v,to_begin=v[0])

Of course, you can also do this with the np.diff command, in this case though you need to prepend zero to the series with the prepend keyword:

np.diff(v,prepend=0.0) 

All the above solutions return a vector that is the same length as the input.


回答 9

我的方式

>>>v = [1,2,3,4,5]
>>>[v[i] - v[i-1] for i, value in enumerate(v[1:], 1)]
[1, 1, 1, 1]

My way

>>>v = [1,2,3,4,5]
>>>[v[i] - v[i-1] for i, value in enumerate(v[1:], 1)]
[1, 1, 1, 1]

基于正则表达式的Python拆分字符串

问题:基于正则表达式的Python拆分字符串

"HELLO there HOW are YOU"用大写单词分割字符串的最佳方法是什么(在Python中)?

所以我最终得到一个像这样的数组: results = ['HELLO there', 'HOW are', 'YOU']


编辑:

我努力了:

p = re.compile("\b[A-Z]{2,}\b")
print p.split(page_text)

不过,它似乎不起作用。

What is the best way to split a string like "HELLO there HOW are YOU" by upper case words (in Python)?

So I’d end up with an array like such: results = ['HELLO there', 'HOW are', 'YOU']


EDIT:

I have tried:

p = re.compile("\b[A-Z]{2,}\b")
print p.split(page_text)

It doesn’t seem to work, though.


回答 0

我建议

l = re.compile("(?<!^)\s+(?=[A-Z])(?!.\s)").split(s)

检查这个演示

I suggest

l = re.compile("(?<!^)\s+(?=[A-Z])(?!.\s)").split(s)

Check this demo.


回答 1

您可以先行使用:

re.split(r'[ ](?=[A-Z]+\b)', input)

这将在每个空格处分开,后跟一串大写字母,每个大写字母以单词边界结尾。

请注意,方括号仅出于可读性考虑,也可以省略。

如果一个单词的第一个字母大写就足够了(因此,如果您也想在其前面拆分Hello),它将变得更加容易:

re.split(r'[ ](?=[A-Z])', input)

现在,这会在每个空格处分开,后跟任何大写字母。

You could use a lookahead:

re.split(r'[ ](?=[A-Z]+\b)', input)

This will split at every space that is followed by a string of upper-case letters which end in a word-boundary.

Note that the square brackets are only for readability and could as well be omitted.

If it is enough that the first letter of a word is upper case (so if you would want to split in front of Hello as well) it gets even easier:

re.split(r'[ ](?=[A-Z])', input)

Now this splits at every space followed by any upper-case letter.


回答 2

您的问题包含字符串文字"\b[A-Z]{2,}\b",但是这\b意味着退格,因为没有r-修饰符。

试试:r"\b[A-Z]{2,}\b"

Your question contains the string literal "\b[A-Z]{2,}\b", but that \b will mean backspace, because there is no r-modifier.

Try: r"\b[A-Z]{2,}\b".


在python中创建漂亮的列输出

问题:在python中创建漂亮的列输出

我试图在python中创建一个不错的列列表,以与我创建的命令行管理工具一起使用。

基本上,我想要一个像这样的列表:

[['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]

变成:

a            b            c
aaaaaaaaaa   b            c
a            bbbbbbbbbb   c

使用普通标签不会解决问题,因为我不知道每一行中最长的数据。

这与Linux中的’column -t’相同。

$ echo -e "a b c\naaaaaaaaaa b c\na bbbbbbbbbb c"
a b c
aaaaaaaaaa b c
a bbbbbbbbbb c

$ echo -e "a b c\naaaaaaaaaa b c\na bbbbbbbbbb c" | column -t
a           b           c
aaaaaaaaaa  b           c
a           bbbbbbbbbb  c

我到处寻找各种python库来执行此操作,但找不到任何有用的方法。

I am trying to create a nice column list in python for use with commandline admin tools which I create.

Basicly, I want a list like:

[['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]

To turn into:

a            b            c
aaaaaaaaaa   b            c
a            bbbbbbbbbb   c

Using plain tabs wont do the trick here because I don’t know the longest data in each row.

This is the same behavior as ‘column -t’ in Linux..

$ echo -e "a b c\naaaaaaaaaa b c\na bbbbbbbbbb c"
a b c
aaaaaaaaaa b c
a bbbbbbbbbb c

$ echo -e "a b c\naaaaaaaaaa b c\na bbbbbbbbbb c" | column -t
a           b           c
aaaaaaaaaa  b           c
a           bbbbbbbbbb  c

I have looked around for various python libraries to do this but can’t find anything useful.


回答 0

data = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]

col_width = max(len(word) for row in data for word in row) + 2  # padding
for row in data:
    print "".join(word.ljust(col_width) for word in row)

a            b            c            
aaaaaaaaaa   b            c            
a            bbbbbbbbbb   c   

这样做是计算最长的数据条目以确定列宽,然后.ljust()在打印出每一列时用于添加必要的填充。

data = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]

col_width = max(len(word) for row in data for word in row) + 2  # padding
for row in data:
    print "".join(word.ljust(col_width) for word in row)

a            b            c            
aaaaaaaaaa   b            c            
a            bbbbbbbbbb   c   

What this does is calculate the longest data entry to determine the column width, then use .ljust() to add the necessary padding when printing out each column.


回答 1

从Python 2.6+开始,您可以通过以下方式使用格式字符串,以将列设置为至少20个字符,并将文本向右对齐。

table_data = [
    ['a', 'b', 'c'],
    ['aaaaaaaaaa', 'b', 'c'], 
    ['a', 'bbbbbbbbbb', 'c']
]
for row in table_data:
    print("{: >20} {: >20} {: >20}".format(*row))

输出:

               a                    b                    c
      aaaaaaaaaa                    b                    c
               a           bbbbbbbbbb                    c

Since Python 2.6+, you can use a format string in the following way to set the columns to a minimum of 20 characters and align text to right.

table_data = [
    ['a', 'b', 'c'],
    ['aaaaaaaaaa', 'b', 'c'], 
    ['a', 'bbbbbbbbbb', 'c']
]
for row in table_data:
    print("{: >20} {: >20} {: >20}".format(*row))

Output:

               a                    b                    c
      aaaaaaaaaa                    b                    c
               a           bbbbbbbbbb                    c

回答 2

我来到这里有相同的要求,但@lvc和@Preet的答案似乎与column -t该列中产生的内容具有不同的宽度更加一致:

>>> rows =  [   ['a',           'b',            'c',    'd']
...         ,   ['aaaaaaaaaa',  'b',            'c',    'd']
...         ,   ['a',           'bbbbbbbbbb',   'c',    'd']
...         ]
...

>>> widths = [max(map(len, col)) for col in zip(*rows)]
>>> for row in rows:
...     print "  ".join((val.ljust(width) for val, width in zip(row, widths)))
...
a           b           c  d
aaaaaaaaaa  b           c  d
a           bbbbbbbbbb  c  d

I came here with the same requirements but @lvc and @Preet’s answers seems more inline with what column -t produces in that columns have different widths:

>>> rows =  [   ['a',           'b',            'c',    'd']
...         ,   ['aaaaaaaaaa',  'b',            'c',    'd']
...         ,   ['a',           'bbbbbbbbbb',   'c',    'd']
...         ]
...

>>> widths = [max(map(len, col)) for col in zip(*rows)]
>>> for row in rows:
...     print "  ".join((val.ljust(width) for val, width in zip(row, widths)))
...
a           b           c  d
aaaaaaaaaa  b           c  d
a           bbbbbbbbbb  c  d

回答 3

这对聚会来说有点晚了,是我写的一个无耻的插件,但是您也可以查看Columnar软件包。

它接受一个输入列表和一个标题列表,并输出一个表格式的字符串。此代码段创建了一个docker-esque表:

from columnar import columnar

headers = ['name', 'id', 'host', 'notes']

data = [
    ['busybox', 'c3c37d5d-38d2-409f-8d02-600fd9d51239', 'linuxnode-1-292735', 'Test server.'],
    ['alpine-python', '6bb77855-0fda-45a9-b553-e19e1a795f1e', 'linuxnode-2-249253', 'The one that runs python.'],
    ['redis', 'afb648ba-ac97-4fb2-8953-9a5b5f39663e', 'linuxnode-3-3416918', 'For queues and stuff.'],
    ['app-server', 'b866cd0f-bf80-40c7-84e3-c40891ec68f9', 'linuxnode-4-295918', 'A popular destination.'],
    ['nginx', '76fea0f0-aa53-4911-b7e4-fae28c2e469b', 'linuxnode-5-292735', 'Traffic Cop'],
]

table = columnar(data, headers, no_borders=True)
print(table)

或者,您可以得到一些带有颜色和边框的爱好者。

要了解有关列大小调整算法的更多信息并查看API的其余部分,可以查看上面的链接或查看Columnar GitHub Repo

This is a little late to the party, and a shameless plug for a package I wrote, but you can also check out the Columnar package.

It takes a list of lists of input and a list of headers and outputs a table-formatted string. This snippet creates a docker-esque table:

from columnar import columnar

headers = ['name', 'id', 'host', 'notes']

data = [
    ['busybox', 'c3c37d5d-38d2-409f-8d02-600fd9d51239', 'linuxnode-1-292735', 'Test server.'],
    ['alpine-python', '6bb77855-0fda-45a9-b553-e19e1a795f1e', 'linuxnode-2-249253', 'The one that runs python.'],
    ['redis', 'afb648ba-ac97-4fb2-8953-9a5b5f39663e', 'linuxnode-3-3416918', 'For queues and stuff.'],
    ['app-server', 'b866cd0f-bf80-40c7-84e3-c40891ec68f9', 'linuxnode-4-295918', 'A popular destination.'],
    ['nginx', '76fea0f0-aa53-4911-b7e4-fae28c2e469b', 'linuxnode-5-292735', 'Traffic Cop'],
]

table = columnar(data, headers, no_borders=True)
print(table)

Or you can get a little fancier with colors and borders.

To read more about the column-sizing algorithm and see the rest of the API you can check out the link above or see the Columnar GitHub Repo


回答 4

您必须使用2个通行证来执行此操作:

  1. 获取每列的最大宽度。
  2. 使用格式化使用我们最宽的知识从第一遍列str.ljust()str.rjust()

You have to do this with 2 passes:

  1. get the maximum width of each column.
  2. formatting the columns using our knowledge of max width from the first pass using str.ljust() and str.rjust()

回答 5

像这样转换列是zip的工作:

>>> a = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
>>> list(zip(*a))
[('a', 'aaaaaaaaaa', 'a'), ('b', 'b', 'bbbbbbbbbb'), ('c', 'c', 'c')]

要查找每列所需的长度,可以使用max

>>> trans_a = zip(*a)
>>> [max(len(c) for c in b) for b in trans_a]
[10, 10, 1]

您可以在适当的填充下使用它来构造要传递给的字符串print

>>> col_lenghts = [max(len(c) for c in b) for b in trans_a]
>>> padding = ' ' # You might want more
>>> padding.join(s.ljust(l) for s,l in zip(a[0], col_lenghts))
'a          b          c'

Transposing the columns like that is a job for zip:

>>> a = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
>>> list(zip(*a))
[('a', 'aaaaaaaaaa', 'a'), ('b', 'b', 'bbbbbbbbbb'), ('c', 'c', 'c')]

To find the required length of each column, you can use max:

>>> trans_a = zip(*a)
>>> [max(len(c) for c in b) for b in trans_a]
[10, 10, 1]

Which you can use, with suitable padding, to construct strings to pass to print:

>>> col_lenghts = [max(len(c) for c in b) for b in trans_a]
>>> padding = ' ' # You might want more
>>> padding.join(s.ljust(l) for s,l in zip(a[0], col_lenghts))
'a          b          c'

回答 6

要获得更高级的表

---------------------------------------------------
| First Name | Last Name        | Age | Position  |
---------------------------------------------------
| John       | Smith            | 24  | Software  |
|            |                  |     | Engineer  |
---------------------------------------------------
| Mary       | Brohowski        | 23  | Sales     |
|            |                  |     | Manager   |
---------------------------------------------------
| Aristidis  | Papageorgopoulos | 28  | Senior    |
|            |                  |     | Reseacher |
---------------------------------------------------

您可以使用以下Python配方

'''
From http://code.activestate.com/recipes/267662-table-indentation/
PSF License
'''
import cStringIO,operator

def indent(rows, hasHeader=False, headerChar='-', delim=' | ', justify='left',
           separateRows=False, prefix='', postfix='', wrapfunc=lambda x:x):
    """Indents a table by column.
       - rows: A sequence of sequences of items, one sequence per row.
       - hasHeader: True if the first row consists of the columns' names.
       - headerChar: Character to be used for the row separator line
         (if hasHeader==True or separateRows==True).
       - delim: The column delimiter.
       - justify: Determines how are data justified in their column. 
         Valid values are 'left','right' and 'center'.
       - separateRows: True if rows are to be separated by a line
         of 'headerChar's.
       - prefix: A string prepended to each printed row.
       - postfix: A string appended to each printed row.
       - wrapfunc: A function f(text) for wrapping text; each element in
         the table is first wrapped by this function."""
    # closure for breaking logical rows to physical, using wrapfunc
    def rowWrapper(row):
        newRows = [wrapfunc(item).split('\n') for item in row]
        return [[substr or '' for substr in item] for item in map(None,*newRows)]
    # break each logical row into one or more physical ones
    logicalRows = [rowWrapper(row) for row in rows]
    # columns of physical rows
    columns = map(None,*reduce(operator.add,logicalRows))
    # get the maximum of each column by the string length of its items
    maxWidths = [max([len(str(item)) for item in column]) for column in columns]
    rowSeparator = headerChar * (len(prefix) + len(postfix) + sum(maxWidths) + \
                                 len(delim)*(len(maxWidths)-1))
    # select the appropriate justify method
    justify = {'center':str.center, 'right':str.rjust, 'left':str.ljust}[justify.lower()]
    output=cStringIO.StringIO()
    if separateRows: print >> output, rowSeparator
    for physicalRows in logicalRows:
        for row in physicalRows:
            print >> output, \
                prefix \
                + delim.join([justify(str(item),width) for (item,width) in zip(row,maxWidths)]) \
                + postfix
        if separateRows or hasHeader: print >> output, rowSeparator; hasHeader=False
    return output.getvalue()

# written by Mike Brown
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/148061
def wrap_onspace(text, width):
    """
    A word-wrap function that preserves existing line breaks
    and most spaces in the text. Expects that existing line
    breaks are posix newlines (\n).
    """
    return reduce(lambda line, word, width=width: '%s%s%s' %
                  (line,
                   ' \n'[(len(line[line.rfind('\n')+1:])
                         + len(word.split('\n',1)[0]
                              ) >= width)],
                   word),
                  text.split(' ')
                 )

import re
def wrap_onspace_strict(text, width):
    """Similar to wrap_onspace, but enforces the width constraint:
       words longer than width are split."""
    wordRegex = re.compile(r'\S{'+str(width)+r',}')
    return wrap_onspace(wordRegex.sub(lambda m: wrap_always(m.group(),width),text),width)

import math
def wrap_always(text, width):
    """A simple word-wrap function that wraps text on exactly width characters.
       It doesn't split the text in words."""
    return '\n'.join([ text[width*i:width*(i+1)] \
                       for i in xrange(int(math.ceil(1.*len(text)/width))) ])

if __name__ == '__main__':
    labels = ('First Name', 'Last Name', 'Age', 'Position')
    data = \
    '''John,Smith,24,Software Engineer
       Mary,Brohowski,23,Sales Manager
       Aristidis,Papageorgopoulos,28,Senior Reseacher'''
    rows = [row.strip().split(',')  for row in data.splitlines()]

    print 'Without wrapping function\n'
    print indent([labels]+rows, hasHeader=True)
    # test indent with different wrapping functions
    width = 10
    for wrapper in (wrap_always,wrap_onspace,wrap_onspace_strict):
        print 'Wrapping function: %s(x,width=%d)\n' % (wrapper.__name__,width)
        print indent([labels]+rows, hasHeader=True, separateRows=True,
                     prefix='| ', postfix=' |',
                     wrapfunc=lambda x: wrapper(x,width))

    # output:
    #
    #Without wrapping function
    #
    #First Name | Last Name        | Age | Position         
    #-------------------------------------------------------
    #John       | Smith            | 24  | Software Engineer
    #Mary       | Brohowski        | 23  | Sales Manager    
    #Aristidis  | Papageorgopoulos | 28  | Senior Reseacher 
    #
    #Wrapping function: wrap_always(x,width=10)
    #
    #----------------------------------------------
    #| First Name | Last Name  | Age | Position   |
    #----------------------------------------------
    #| John       | Smith      | 24  | Software E |
    #|            |            |     | ngineer    |
    #----------------------------------------------
    #| Mary       | Brohowski  | 23  | Sales Mana |
    #|            |            |     | ger        |
    #----------------------------------------------
    #| Aristidis  | Papageorgo | 28  | Senior Res |
    #|            | poulos     |     | eacher     |
    #----------------------------------------------
    #
    #Wrapping function: wrap_onspace(x,width=10)
    #
    #---------------------------------------------------
    #| First Name | Last Name        | Age | Position  |
    #---------------------------------------------------
    #| John       | Smith            | 24  | Software  |
    #|            |                  |     | Engineer  |
    #---------------------------------------------------
    #| Mary       | Brohowski        | 23  | Sales     |
    #|            |                  |     | Manager   |
    #---------------------------------------------------
    #| Aristidis  | Papageorgopoulos | 28  | Senior    |
    #|            |                  |     | Reseacher |
    #---------------------------------------------------
    #
    #Wrapping function: wrap_onspace_strict(x,width=10)
    #
    #---------------------------------------------
    #| First Name | Last Name  | Age | Position  |
    #---------------------------------------------
    #| John       | Smith      | 24  | Software  |
    #|            |            |     | Engineer  |
    #---------------------------------------------
    #| Mary       | Brohowski  | 23  | Sales     |
    #|            |            |     | Manager   |
    #---------------------------------------------
    #| Aristidis  | Papageorgo | 28  | Senior    |
    #|            | poulos     |     | Reseacher |
    #---------------------------------------------

Python的配方页面包含了它一些改进。

To get fancier tables like

---------------------------------------------------
| First Name | Last Name        | Age | Position  |
---------------------------------------------------
| John       | Smith            | 24  | Software  |
|            |                  |     | Engineer  |
---------------------------------------------------
| Mary       | Brohowski        | 23  | Sales     |
|            |                  |     | Manager   |
---------------------------------------------------
| Aristidis  | Papageorgopoulos | 28  | Senior    |
|            |                  |     | Reseacher |
---------------------------------------------------

you can use this Python recipe:

'''
From http://code.activestate.com/recipes/267662-table-indentation/
PSF License
'''
import cStringIO,operator

def indent(rows, hasHeader=False, headerChar='-', delim=' | ', justify='left',
           separateRows=False, prefix='', postfix='', wrapfunc=lambda x:x):
    """Indents a table by column.
       - rows: A sequence of sequences of items, one sequence per row.
       - hasHeader: True if the first row consists of the columns' names.
       - headerChar: Character to be used for the row separator line
         (if hasHeader==True or separateRows==True).
       - delim: The column delimiter.
       - justify: Determines how are data justified in their column. 
         Valid values are 'left','right' and 'center'.
       - separateRows: True if rows are to be separated by a line
         of 'headerChar's.
       - prefix: A string prepended to each printed row.
       - postfix: A string appended to each printed row.
       - wrapfunc: A function f(text) for wrapping text; each element in
         the table is first wrapped by this function."""
    # closure for breaking logical rows to physical, using wrapfunc
    def rowWrapper(row):
        newRows = [wrapfunc(item).split('\n') for item in row]
        return [[substr or '' for substr in item] for item in map(None,*newRows)]
    # break each logical row into one or more physical ones
    logicalRows = [rowWrapper(row) for row in rows]
    # columns of physical rows
    columns = map(None,*reduce(operator.add,logicalRows))
    # get the maximum of each column by the string length of its items
    maxWidths = [max([len(str(item)) for item in column]) for column in columns]
    rowSeparator = headerChar * (len(prefix) + len(postfix) + sum(maxWidths) + \
                                 len(delim)*(len(maxWidths)-1))
    # select the appropriate justify method
    justify = {'center':str.center, 'right':str.rjust, 'left':str.ljust}[justify.lower()]
    output=cStringIO.StringIO()
    if separateRows: print >> output, rowSeparator
    for physicalRows in logicalRows:
        for row in physicalRows:
            print >> output, \
                prefix \
                + delim.join([justify(str(item),width) for (item,width) in zip(row,maxWidths)]) \
                + postfix
        if separateRows or hasHeader: print >> output, rowSeparator; hasHeader=False
    return output.getvalue()

# written by Mike Brown
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/148061
def wrap_onspace(text, width):
    """
    A word-wrap function that preserves existing line breaks
    and most spaces in the text. Expects that existing line
    breaks are posix newlines (\n).
    """
    return reduce(lambda line, word, width=width: '%s%s%s' %
                  (line,
                   ' \n'[(len(line[line.rfind('\n')+1:])
                         + len(word.split('\n',1)[0]
                              ) >= width)],
                   word),
                  text.split(' ')
                 )

import re
def wrap_onspace_strict(text, width):
    """Similar to wrap_onspace, but enforces the width constraint:
       words longer than width are split."""
    wordRegex = re.compile(r'\S{'+str(width)+r',}')
    return wrap_onspace(wordRegex.sub(lambda m: wrap_always(m.group(),width),text),width)

import math
def wrap_always(text, width):
    """A simple word-wrap function that wraps text on exactly width characters.
       It doesn't split the text in words."""
    return '\n'.join([ text[width*i:width*(i+1)] \
                       for i in xrange(int(math.ceil(1.*len(text)/width))) ])

if __name__ == '__main__':
    labels = ('First Name', 'Last Name', 'Age', 'Position')
    data = \
    '''John,Smith,24,Software Engineer
       Mary,Brohowski,23,Sales Manager
       Aristidis,Papageorgopoulos,28,Senior Reseacher'''
    rows = [row.strip().split(',')  for row in data.splitlines()]

    print 'Without wrapping function\n'
    print indent([labels]+rows, hasHeader=True)
    # test indent with different wrapping functions
    width = 10
    for wrapper in (wrap_always,wrap_onspace,wrap_onspace_strict):
        print 'Wrapping function: %s(x,width=%d)\n' % (wrapper.__name__,width)
        print indent([labels]+rows, hasHeader=True, separateRows=True,
                     prefix='| ', postfix=' |',
                     wrapfunc=lambda x: wrapper(x,width))

    # output:
    #
    #Without wrapping function
    #
    #First Name | Last Name        | Age | Position         
    #-------------------------------------------------------
    #John       | Smith            | 24  | Software Engineer
    #Mary       | Brohowski        | 23  | Sales Manager    
    #Aristidis  | Papageorgopoulos | 28  | Senior Reseacher 
    #
    #Wrapping function: wrap_always(x,width=10)
    #
    #----------------------------------------------
    #| First Name | Last Name  | Age | Position   |
    #----------------------------------------------
    #| John       | Smith      | 24  | Software E |
    #|            |            |     | ngineer    |
    #----------------------------------------------
    #| Mary       | Brohowski  | 23  | Sales Mana |
    #|            |            |     | ger        |
    #----------------------------------------------
    #| Aristidis  | Papageorgo | 28  | Senior Res |
    #|            | poulos     |     | eacher     |
    #----------------------------------------------
    #
    #Wrapping function: wrap_onspace(x,width=10)
    #
    #---------------------------------------------------
    #| First Name | Last Name        | Age | Position  |
    #---------------------------------------------------
    #| John       | Smith            | 24  | Software  |
    #|            |                  |     | Engineer  |
    #---------------------------------------------------
    #| Mary       | Brohowski        | 23  | Sales     |
    #|            |                  |     | Manager   |
    #---------------------------------------------------
    #| Aristidis  | Papageorgopoulos | 28  | Senior    |
    #|            |                  |     | Reseacher |
    #---------------------------------------------------
    #
    #Wrapping function: wrap_onspace_strict(x,width=10)
    #
    #---------------------------------------------
    #| First Name | Last Name  | Age | Position  |
    #---------------------------------------------
    #| John       | Smith      | 24  | Software  |
    #|            |            |     | Engineer  |
    #---------------------------------------------
    #| Mary       | Brohowski  | 23  | Sales     |
    #|            |            |     | Manager   |
    #---------------------------------------------
    #| Aristidis  | Papageorgo | 28  | Senior    |
    #|            | poulos     |     | Reseacher |
    #---------------------------------------------

The Python recipe page contains a few improvements on it.


回答 7

pandas 创建数据框的基于解决方案:

import pandas as pd
l = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
df = pd.DataFrame(l)

print(df)
            0           1  2
0           a           b  c
1  aaaaaaaaaa           b  c
2           a  bbbbbbbbbb  c

要删除索引和标头值以创建所需的输出,可以使用to_string方法:

result = df.to_string(index=False, header=False)

print(result)
          a           b  c
 aaaaaaaaaa           b  c
          a  bbbbbbbbbb  c

pandas based solution with creating dataframe:

import pandas as pd
l = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
df = pd.DataFrame(l)

print(df)
            0           1  2
0           a           b  c
1  aaaaaaaaaa           b  c
2           a  bbbbbbbbbb  c

To remove index and header values to create output what you want you could use to_string method:

result = df.to_string(index=False, header=False)

print(result)
          a           b  c
 aaaaaaaaaa           b  c
          a  bbbbbbbbbb  c

回答 8

Scolp是一个新的库,可让您在自动调整列宽的同时轻松打印流式列数据。

(免责声明:我是作者)

Scolp is a new library that lets you pretty print streaming columnar data easily while auto-adjusting column width.

(Disclaimer: I am the author)


回答 9

这将基于其他答案中使用的最大度量设置独立的,最适合的列宽。

data = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
padding = 2
col_widths = [max(len(w) for w in [r[cn] for r in data]) + padding for cn in range(len(data[0]))]
format_string = "{{:{}}}{{:{}}}{{:{}}}".format(*col_widths)
for row in data:
    print(format_string.format(*row))

This sets independent, best-fit column widths based on the max-metric used in other answers.

data = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
padding = 2
col_widths = [max(len(w) for w in [r[cn] for r in data]) + padding for cn in range(len(data[0]))]
format_string = "{{:{}}}{{:{}}}{{:{}}}".format(*col_widths)
for row in data:
    print(format_string.format(*row))

回答 10

对于懒惰的人

使用Python 3. *Pandas / Geopandas的代码;通用的简单类内方法(对于“普通”脚本,只需删除self):

函数着色:

    def colorize(self,s,color):
        s = color+str(s)+"\033[0m"
        return s

标头:

print('{0:<23} {1:>24} {2:>26} {3:>26} {4:>11} {5:>11}'.format('Road name','Classification','Function','Form of road','Length','Distance') )

然后是来自Pandas / Geopandas数据框的数据:

            for index, row in clipped.iterrows():
                rdName      = self.colorize(row['name1'],"\033[32m")
                rdClass     = self.colorize(row['roadClassification'],"\033[93m")
                rdFunction  = self.colorize(row['roadFunction'],"\033[33m")
                rdForm      = self.colorize(row['formOfWay'],"\033[94m")
                rdLength    = self.colorize(row['length'],"\033[97m")
                rdDistance  = self.colorize(row['distance'],"\033[96m")
                print('{0:<30} {1:>35} {2:>35} {3:>35} {4:>20} {5:>20}'.format(rdName,rdClass,rdFunction,rdForm,rdLength,rdDistance) )

含义{0:<30} {1:>35} {2:>35} {3:>35} {4:>20} {5:>20}

0, 1, 2, 3, 4, 5 ->列,在这种情况下总共有6个

30, 35, 20->列的宽度(请注意,您必须添加长度\033[96m-对于Python来说,这也是一个字符串),只需进行实验即可:)

>, <->对齐:右,左(也=用于填充零)

如果要区分例如最大值,则必须切换到特殊的Pandas样式函数,但是假设这足以在终端窗口上显示数据。

结果:

For lazy people

that are using Python 3.* and Pandas/Geopandas; universal simple in-class approach (for ‘normal’ script just remove self):

Function colorize:

    def colorize(self,s,color):
        s = color+str(s)+"\033[0m"
        return s

Header:

print('{0:<23} {1:>24} {2:>26} {3:>26} {4:>11} {5:>11}'.format('Road name','Classification','Function','Form of road','Length','Distance') )

and then data from Pandas/Geopandas dataframe:

            for index, row in clipped.iterrows():
                rdName      = self.colorize(row['name1'],"\033[32m")
                rdClass     = self.colorize(row['roadClassification'],"\033[93m")
                rdFunction  = self.colorize(row['roadFunction'],"\033[33m")
                rdForm      = self.colorize(row['formOfWay'],"\033[94m")
                rdLength    = self.colorize(row['length'],"\033[97m")
                rdDistance  = self.colorize(row['distance'],"\033[96m")
                print('{0:<30} {1:>35} {2:>35} {3:>35} {4:>20} {5:>20}'.format(rdName,rdClass,rdFunction,rdForm,rdLength,rdDistance) )

Meaning of {0:<30} {1:>35} {2:>35} {3:>35} {4:>20} {5:>20}:

0, 1, 2, 3, 4, 5 -> columns, there are 6 in total in this case

30, 35, 20 -> width of column (note that you’ll have to add length of \033[96m – this for Python is a string as well), just experiment :)

>, < -> justify: right, left (there is = for filling with zeros as well)

If you want to distinct e.g. max value, you’ll have to switch to special Pandas style function, but suppose that’s far enough to present data on terminal window.

Result:


回答 11

与先前的答案略有不同(我没有足够的代表对此发表评论)。格式库允许您指定元素的宽度和对齐方式,但不能指定元素的开始位置,即可以说“宽20列”,而不能说“从20列开始”。导致此问题:

table_data = [
    ['a', 'b', 'c'],
    ['aaaaaaaaaa', 'b', 'c'], 
    ['a', 'bbbbbbbbbb', 'c']
]

print("first row: {: >20} {: >20} {: >20}".format(*table_data[0]))
print("second row: {: >20} {: >20} {: >20}".format(*table_data[1]))
print("third row: {: >20} {: >20} {: >20}".format(*table_data[2]))

输出量

first row:                    a                    b                    c
second row:           aaaaaaaaaa                    b                    c
third row:                    a           bbbbbbbbbb                    c

当然,答案是格式化文字字符串,它与格式有点奇怪地结合在一起:

table_data = [
    ['a', 'b', 'c'],
    ['aaaaaaaaaa', 'b', 'c'], 
    ['a', 'bbbbbbbbbb', 'c']
]

print(f"{'first row:': <20} {table_data[0][0]: >20} {table_data[0][1]: >20} {table_data[0][2]: >20}")
print("{: <20} {: >20} {: >20} {: >20}".format(*['second row:', *table_data[1]]))
print("{: <20} {: >20} {: >20} {: >20}".format(*['third row:', *table_data[1]]))

输出量

first row:                              a                    b                    c
second row:                    aaaaaaaaaa                    b                    c
third row:                     aaaaaaaaaa                    b                    c

A slight variation on a previous answer (I don’t have enough rep to comment on it). The format library lets you specify the width and alignment of an element but not where it starts, ie, you can say “be 20 columns wide” but not “start in column 20”. Which leads to this issue:

table_data = [
    ['a', 'b', 'c'],
    ['aaaaaaaaaa', 'b', 'c'], 
    ['a', 'bbbbbbbbbb', 'c']
]

print("first row: {: >20} {: >20} {: >20}".format(*table_data[0]))
print("second row: {: >20} {: >20} {: >20}".format(*table_data[1]))
print("third row: {: >20} {: >20} {: >20}".format(*table_data[2]))

Output

first row:                    a                    b                    c
second row:           aaaaaaaaaa                    b                    c
third row:                    a           bbbbbbbbbb                    c

The answer of course is to format the literal strings as well, which combines slightly weirdly with the format:

table_data = [
    ['a', 'b', 'c'],
    ['aaaaaaaaaa', 'b', 'c'], 
    ['a', 'bbbbbbbbbb', 'c']
]

print(f"{'first row:': <20} {table_data[0][0]: >20} {table_data[0][1]: >20} {table_data[0][2]: >20}")
print("{: <20} {: >20} {: >20} {: >20}".format(*['second row:', *table_data[1]]))
print("{: <20} {: >20} {: >20} {: >20}".format(*['third row:', *table_data[1]]))

Output

first row:                              a                    b                    c
second row:                    aaaaaaaaaa                    b                    c
third row:                     aaaaaaaaaa                    b                    c

回答 12

我发现这个答案超级有用且优雅,最初是从这里开始的

matrix = [["A", "B"], ["C", "D"]]

print('\n'.join(['\t'.join([str(cell) for cell in row]) for row in matrix]))

输出量

A   B
C   D

I found this answer super-helpful and elegant, originally from here:

matrix = [["A", "B"], ["C", "D"]]

print('\n'.join(['\t'.join([str(cell) for cell in row]) for row in matrix]))

Output

A   B
C   D

回答 13

这是肖恩·钦(Shawn Chin)答案的一种变体。宽度是固定的,不是每列都固定。第一行下方和各列之间还有一个边框。(icontract库用于执行合同。)

@icontract.pre(
    lambda table: not table or all(len(row) == len(table[0]) for row in table))
@icontract.post(lambda table, result: result == "" if not table else True)
@icontract.post(lambda result: not result.endswith("\n"))
def format_table(table: List[List[str]]) -> str:
    """
    Format the table as equal-spaced columns.

    :param table: rows of cells
    :return: table as string
    """
    cols = len(table[0])

    col_widths = [max(len(row[i]) for row in table) for i in range(cols)]

    lines = []  # type: List[str]
    for i, row in enumerate(table):
        parts = []  # type: List[str]

        for cell, width in zip(row, col_widths):
            parts.append(cell.ljust(width))

        line = " | ".join(parts)
        lines.append(line)

        if i == 0:
            border = []  # type: List[str]

            for width in col_widths:
                border.append("-" * width)

            lines.append("-+-".join(border))

    result = "\n".join(lines)

    return result

这是一个例子:

>>> table = [['column 0', 'another column 1'], ['00', '01'], ['10', '11']]
>>> result = packagery._format_table(table=table)
>>> print(result)
column 0 | another column 1
---------+-----------------
00       | 01              
10       | 11              

Here is a variation of the Shawn Chin’s answer. The width is fixed per column, not over all columns. There is also a border below the first row and between the columns. (icontract library is used to enforce the contracts.)

@icontract.pre(
    lambda table: not table or all(len(row) == len(table[0]) for row in table))
@icontract.post(lambda table, result: result == "" if not table else True)
@icontract.post(lambda result: not result.endswith("\n"))
def format_table(table: List[List[str]]) -> str:
    """
    Format the table as equal-spaced columns.

    :param table: rows of cells
    :return: table as string
    """
    cols = len(table[0])

    col_widths = [max(len(row[i]) for row in table) for i in range(cols)]

    lines = []  # type: List[str]
    for i, row in enumerate(table):
        parts = []  # type: List[str]

        for cell, width in zip(row, col_widths):
            parts.append(cell.ljust(width))

        line = " | ".join(parts)
        lines.append(line)

        if i == 0:
            border = []  # type: List[str]

            for width in col_widths:
                border.append("-" * width)

            lines.append("-+-".join(border))

    result = "\n".join(lines)

    return result

Here is an example:

>>> table = [['column 0', 'another column 1'], ['00', '01'], ['10', '11']]
>>> result = packagery._format_table(table=table)
>>> print(result)
column 0 | another column 1
---------+-----------------
00       | 01              
10       | 11              

回答 14

更新了@Franck Dernoncourt花式食谱以符合python 3和PEP8

import io
import math
import operator
import re
import functools

from itertools import zip_longest


def indent(
    rows,
    has_header=False,
    header_char="-",
    delim=" | ",
    justify="left",
    separate_rows=False,
    prefix="",
    postfix="",
    wrapfunc=lambda x: x,
):
    """Indents a table by column.
       - rows: A sequence of sequences of items, one sequence per row.
       - hasHeader: True if the first row consists of the columns' names.
       - headerChar: Character to be used for the row separator line
         (if hasHeader==True or separateRows==True).
       - delim: The column delimiter.
       - justify: Determines how are data justified in their column.
         Valid values are 'left','right' and 'center'.
       - separateRows: True if rows are to be separated by a line
         of 'headerChar's.
       - prefix: A string prepended to each printed row.
       - postfix: A string appended to each printed row.
       - wrapfunc: A function f(text) for wrapping text; each element in
         the table is first wrapped by this function."""

    # closure for breaking logical rows to physical, using wrapfunc
    def row_wrapper(row):
        new_rows = [wrapfunc(item).split("\n") for item in row]
        return [[substr or "" for substr in item] for item in zip_longest(*new_rows)]

    # break each logical row into one or more physical ones
    logical_rows = [row_wrapper(row) for row in rows]
    # columns of physical rows
    columns = zip_longest(*functools.reduce(operator.add, logical_rows))
    # get the maximum of each column by the string length of its items
    max_widths = [max([len(str(item)) for item in column]) for column in columns]
    row_separator = header_char * (
        len(prefix) + len(postfix) + sum(max_widths) + len(delim) * (len(max_widths) - 1)
    )
    # select the appropriate justify method
    justify = {"center": str.center, "right": str.rjust, "left": str.ljust}[
        justify.lower()
    ]
    output = io.StringIO()
    if separate_rows:
        print(output, row_separator)
    for physicalRows in logical_rows:
        for row in physicalRows:
            print( output, prefix + delim.join(
                [justify(str(item), width) for (item, width) in zip(row, max_widths)]
            ) + postfix)
        if separate_rows or has_header:
            print(output, row_separator)
            has_header = False
    return output.getvalue()


# written by Mike Brown
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/148061
def wrap_onspace(text, width):
    """
    A word-wrap function that preserves existing line breaks
    and most spaces in the text. Expects that existing line
    breaks are posix newlines (\n).
    """
    return functools.reduce(
        lambda line, word, i_width=width: "%s%s%s"
        % (
            line,
            " \n"[
                (
                    len(line[line.rfind("\n") + 1 :]) + len(word.split("\n", 1)[0])
                    >= i_width
                )
            ],
            word,
        ),
        text.split(" "),
    )


def wrap_onspace_strict(text, i_width):
    """Similar to wrap_onspace, but enforces the width constraint:
       words longer than width are split."""
    word_regex = re.compile(r"\S{" + str(i_width) + r",}")
    return wrap_onspace(
        word_regex.sub(lambda m: wrap_always(m.group(), i_width), text), i_width
    )


def wrap_always(text, width):
    """A simple word-wrap function that wraps text on exactly width characters.
       It doesn't split the text in words."""
    return "\n".join(
        [
            text[width * i : width * (i + 1)]
            for i in range(int(math.ceil(1.0 * len(text) / width)))
        ]
    )


if __name__ == "__main__":
    labels = ("First Name", "Last Name", "Age", "Position")
    data = """John,Smith,24,Software Engineer
           Mary,Brohowski,23,Sales Manager
           Aristidis,Papageorgopoulos,28,Senior Reseacher"""
    rows = [row.strip().split(",") for row in data.splitlines()]

    print("Without wrapping function\n")
    print(indent([labels] + rows, has_header=True))

    # test indent with different wrapping functions
    width = 10
    for wrapper in (wrap_always, wrap_onspace, wrap_onspace_strict):
        print("Wrapping function: %s(x,width=%d)\n" % (wrapper.__name__, width))

        print(
            indent(
                [labels] + rows,
                has_header=True,
                separate_rows=True,
                prefix="| ",
                postfix=" |",
                wrapfunc=lambda x: wrapper(x, width),
            )
        )

    # output:
    #
    # Without wrapping function
    #
    # First Name | Last Name        | Age | Position
    # -------------------------------------------------------
    # John       | Smith            | 24  | Software Engineer
    # Mary       | Brohowski        | 23  | Sales Manager
    # Aristidis  | Papageorgopoulos | 28  | Senior Reseacher
    #
    # Wrapping function: wrap_always(x,width=10)
    #
    # ----------------------------------------------
    # | First Name | Last Name  | Age | Position   |
    # ----------------------------------------------
    # | John       | Smith      | 24  | Software E |
    # |            |            |     | ngineer    |
    # ----------------------------------------------
    # | Mary       | Brohowski  | 23  | Sales Mana |
    # |            |            |     | ger        |
    # ----------------------------------------------
    # | Aristidis  | Papageorgo | 28  | Senior Res |
    # |            | poulos     |     | eacher     |
    # ----------------------------------------------
    #
    # Wrapping function: wrap_onspace(x,width=10)
    #
    # ---------------------------------------------------
    # | First Name | Last Name        | Age | Position  |
    # ---------------------------------------------------
    # | John       | Smith            | 24  | Software  |
    # |            |                  |     | Engineer  |
    # ---------------------------------------------------
    # | Mary       | Brohowski        | 23  | Sales     |
    # |            |                  |     | Manager   |
    # ---------------------------------------------------
    # | Aristidis  | Papageorgopoulos | 28  | Senior    |
    # |            |                  |     | Reseacher |
    # ---------------------------------------------------
    #
    # Wrapping function: wrap_onspace_strict(x,width=10)
    #
    # ---------------------------------------------
    # | First Name | Last Name  | Age | Position  |
    # ---------------------------------------------
    # | John       | Smith      | 24  | Software  |
    # |            |            |     | Engineer  |
    # ---------------------------------------------
    # | Mary       | Brohowski  | 23  | Sales     |
    # |            |            |     | Manager   |
    # ---------------------------------------------
    # | Aristidis  | Papageorgo | 28  | Senior    |
    # |            | poulos     |     | Reseacher |
    # ---------------------------------------------

updated @Franck Dernoncourt fancy recipe to be python 3 and PEP8 compliant

import io
import math
import operator
import re
import functools

from itertools import zip_longest


def indent(
    rows,
    has_header=False,
    header_char="-",
    delim=" | ",
    justify="left",
    separate_rows=False,
    prefix="",
    postfix="",
    wrapfunc=lambda x: x,
):
    """Indents a table by column.
       - rows: A sequence of sequences of items, one sequence per row.
       - hasHeader: True if the first row consists of the columns' names.
       - headerChar: Character to be used for the row separator line
         (if hasHeader==True or separateRows==True).
       - delim: The column delimiter.
       - justify: Determines how are data justified in their column.
         Valid values are 'left','right' and 'center'.
       - separateRows: True if rows are to be separated by a line
         of 'headerChar's.
       - prefix: A string prepended to each printed row.
       - postfix: A string appended to each printed row.
       - wrapfunc: A function f(text) for wrapping text; each element in
         the table is first wrapped by this function."""

    # closure for breaking logical rows to physical, using wrapfunc
    def row_wrapper(row):
        new_rows = [wrapfunc(item).split("\n") for item in row]
        return [[substr or "" for substr in item] for item in zip_longest(*new_rows)]

    # break each logical row into one or more physical ones
    logical_rows = [row_wrapper(row) for row in rows]
    # columns of physical rows
    columns = zip_longest(*functools.reduce(operator.add, logical_rows))
    # get the maximum of each column by the string length of its items
    max_widths = [max([len(str(item)) for item in column]) for column in columns]
    row_separator = header_char * (
        len(prefix) + len(postfix) + sum(max_widths) + len(delim) * (len(max_widths) - 1)
    )
    # select the appropriate justify method
    justify = {"center": str.center, "right": str.rjust, "left": str.ljust}[
        justify.lower()
    ]
    output = io.StringIO()
    if separate_rows:
        print(output, row_separator)
    for physicalRows in logical_rows:
        for row in physicalRows:
            print( output, prefix + delim.join(
                [justify(str(item), width) for (item, width) in zip(row, max_widths)]
            ) + postfix)
        if separate_rows or has_header:
            print(output, row_separator)
            has_header = False
    return output.getvalue()


# written by Mike Brown
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/148061
def wrap_onspace(text, width):
    """
    A word-wrap function that preserves existing line breaks
    and most spaces in the text. Expects that existing line
    breaks are posix newlines (\n).
    """
    return functools.reduce(
        lambda line, word, i_width=width: "%s%s%s"
        % (
            line,
            " \n"[
                (
                    len(line[line.rfind("\n") + 1 :]) + len(word.split("\n", 1)[0])
                    >= i_width
                )
            ],
            word,
        ),
        text.split(" "),
    )


def wrap_onspace_strict(text, i_width):
    """Similar to wrap_onspace, but enforces the width constraint:
       words longer than width are split."""
    word_regex = re.compile(r"\S{" + str(i_width) + r",}")
    return wrap_onspace(
        word_regex.sub(lambda m: wrap_always(m.group(), i_width), text), i_width
    )


def wrap_always(text, width):
    """A simple word-wrap function that wraps text on exactly width characters.
       It doesn't split the text in words."""
    return "\n".join(
        [
            text[width * i : width * (i + 1)]
            for i in range(int(math.ceil(1.0 * len(text) / width)))
        ]
    )


if __name__ == "__main__":
    labels = ("First Name", "Last Name", "Age", "Position")
    data = """John,Smith,24,Software Engineer
           Mary,Brohowski,23,Sales Manager
           Aristidis,Papageorgopoulos,28,Senior Reseacher"""
    rows = [row.strip().split(",") for row in data.splitlines()]

    print("Without wrapping function\n")
    print(indent([labels] + rows, has_header=True))

    # test indent with different wrapping functions
    width = 10
    for wrapper in (wrap_always, wrap_onspace, wrap_onspace_strict):
        print("Wrapping function: %s(x,width=%d)\n" % (wrapper.__name__, width))

        print(
            indent(
                [labels] + rows,
                has_header=True,
                separate_rows=True,
                prefix="| ",
                postfix=" |",
                wrapfunc=lambda x: wrapper(x, width),
            )
        )

    # output:
    #
    # Without wrapping function
    #
    # First Name | Last Name        | Age | Position
    # -------------------------------------------------------
    # John       | Smith            | 24  | Software Engineer
    # Mary       | Brohowski        | 23  | Sales Manager
    # Aristidis  | Papageorgopoulos | 28  | Senior Reseacher
    #
    # Wrapping function: wrap_always(x,width=10)
    #
    # ----------------------------------------------
    # | First Name | Last Name  | Age | Position   |
    # ----------------------------------------------
    # | John       | Smith      | 24  | Software E |
    # |            |            |     | ngineer    |
    # ----------------------------------------------
    # | Mary       | Brohowski  | 23  | Sales Mana |
    # |            |            |     | ger        |
    # ----------------------------------------------
    # | Aristidis  | Papageorgo | 28  | Senior Res |
    # |            | poulos     |     | eacher     |
    # ----------------------------------------------
    #
    # Wrapping function: wrap_onspace(x,width=10)
    #
    # ---------------------------------------------------
    # | First Name | Last Name        | Age | Position  |
    # ---------------------------------------------------
    # | John       | Smith            | 24  | Software  |
    # |            |                  |     | Engineer  |
    # ---------------------------------------------------
    # | Mary       | Brohowski        | 23  | Sales     |
    # |            |                  |     | Manager   |
    # ---------------------------------------------------
    # | Aristidis  | Papageorgopoulos | 28  | Senior    |
    # |            |                  |     | Reseacher |
    # ---------------------------------------------------
    #
    # Wrapping function: wrap_onspace_strict(x,width=10)
    #
    # ---------------------------------------------
    # | First Name | Last Name  | Age | Position  |
    # ---------------------------------------------
    # | John       | Smith      | 24  | Software  |
    # |            |            |     | Engineer  |
    # ---------------------------------------------
    # | Mary       | Brohowski  | 23  | Sales     |
    # |            |            |     | Manager   |
    # ---------------------------------------------
    # | Aristidis  | Papageorgo | 28  | Senior    |
    # |            | poulos     |     | Reseacher |
    # ---------------------------------------------

回答 15

我知道这个问题很旧,但是我不了解Antak的答案,也不想使用库,所以我推出了自己的解决方案。

解决方案假定记录是2D数组,记录的长度都相同,并且字段都是字符串。

def stringifyRecords(records):
    column_widths = [0] * len(records[0])
    for record in records:
        for i, field in enumerate(record):
            width = len(field)
            if width > column_widths[i]: column_widths[i] = width

    s = ""
    for record in records:
        for column_width, field in zip(column_widths, record):
            s += field.ljust(column_width+1)
        s += "\n"

    return s

I realize this question is old but I didn’t understand Antak’s answer and didn’t want to use a library so I rolled my own solution.

Solution assumes records is a 2D array, records are all the same length, and that fields are all strings.

def stringifyRecords(records):
    column_widths = [0] * len(records[0])
    for record in records:
        for i, field in enumerate(record):
            width = len(field)
            if width > column_widths[i]: column_widths[i] = width

    s = ""
    for record in records:
        for column_width, field in zip(column_widths, record):
            s += field.ljust(column_width+1)
        s += "\n"

    return s

处理CSV数据时如何忽略第一行数据?

问题:处理CSV数据时如何忽略第一行数据?

我要Python从一列CSV数据中打印最少的数字,但是第一行是列号,我不希望Python考虑到第一行。如何确定Python忽略第一行?

到目前为止,这是代码:

import csv

with open('all16.csv', 'rb') as inf:
    incsv = csv.reader(inf)
    column = 1                
    datatype = float          
    data = (datatype(column) for row in incsv)   
    least_value = min(data)

print least_value

您还能说明自己在做什么,而不仅仅是给出代码吗?我对Python非常陌生,并希望确保我了解所有内容。

I am asking Python to print the minimum number from a column of CSV data, but the top row is the column number, and I don’t want Python to take the top row into account. How can I make sure Python ignores the first line?

This is the code so far:

import csv

with open('all16.csv', 'rb') as inf:
    incsv = csv.reader(inf)
    column = 1                
    datatype = float          
    data = (datatype(column) for row in incsv)   
    least_value = min(data)

print least_value

Could you also explain what you are doing, not just give the code? I am very very new to Python and would like to make sure I understand everything.


回答 0

您可以使用csv模块Sniffer类的实例来推断CSV文件的格式,并检测是否存在标头行以及next()仅在必要时才跳过第一行的内置函数:

import csv

with open('all16.csv', 'r', newline='') as file:
    has_header = csv.Sniffer().has_header(file.read(1024))
    file.seek(0)  # Rewind.
    reader = csv.reader(file)
    if has_header:
        next(reader)  # Skip header row.
    column = 1
    datatype = float
    data = (datatype(row[column]) for row in reader)
    least_value = min(data)

print(least_value)

由于在您的示例中datatypecolumn都进行了硬编码,因此row像这样处理起来会更快一些:

    data = (float(row[1]) for row in reader)

注意:以上代码适用于Python3.x。对于Python 2.x,使用以下行来打开文件而不是显示的内容:

with open('all16.csv', 'rb') as file:

You could use an instance of the csv module’s Sniffer class to deduce the format of a CSV file and detect whether a header row is present along with the built-in next() function to skip over the first row only when necessary:

import csv

with open('all16.csv', 'r', newline='') as file:
    has_header = csv.Sniffer().has_header(file.read(1024))
    file.seek(0)  # Rewind.
    reader = csv.reader(file)
    if has_header:
        next(reader)  # Skip header row.
    column = 1
    datatype = float
    data = (datatype(row[column]) for row in reader)
    least_value = min(data)

print(least_value)

Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:

    data = (float(row[1]) for row in reader)

Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:

with open('all16.csv', 'rb') as file:

回答 1

要跳过第一行,只需调用:

next(inf)

Python中的文件是行上的迭代器。

To skip the first line just call:

next(inf)

Files in Python are iterators over lines.


回答 2

在类似的用例中,我不得不在具有实际列名的行之前跳过烦人的行。该解决方案效果很好。首先阅读文件,然后将列表传递给csv.DictReader

with open('all16.csv') as tmp:
    # Skip first line (if any)
    next(tmp, None)

    # {line_num: row}
    data = dict(enumerate(csv.DictReader(tmp)))

In a similar use case I had to skip annoying lines before the line with my actual column names. This solution worked nicely. Read the file first, then pass the list to csv.DictReader.

with open('all16.csv') as tmp:
    # Skip first line (if any)
    next(tmp, None)

    # {line_num: row}
    data = dict(enumerate(csv.DictReader(tmp)))

回答 3

python cookbook借来的,
更简洁的模板代码可能如下所示:

import csv
with open('stocks.csv') as f:
    f_csv = csv.reader(f) 
    headers = next(f_csv) 
    for row in f_csv:
        # Process row ...

Borrowed from python cookbook,
A more concise template code might look like this:

import csv
with open('stocks.csv') as f:
    f_csv = csv.reader(f) 
    headers = next(f_csv) 
    for row in f_csv:
        # Process row ...

回答 4

通常next(incsv),您会使用它使迭代器前进一排,因此跳过标题。另一个(例如,您想跳过30行)将是:

from itertools import islice
for row in islice(incsv, 30, None):
    # process

You would normally use next(incsv) which advances the iterator one row, so you skip the header. The other (say you wanted to skip 30 rows) would be:

from itertools import islice
for row in islice(incsv, 30, None):
    # process

回答 5

使用csv.DictReader而不是csv.Reader。如果省略fieldnames参数,则csvfile第一行中的值将用作字段名称。这样便可以使用row [“ 1”]等访问字段值

use csv.DictReader instead of csv.Reader. If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as field names. you would then be able to access field values using row[“1”] etc


回答 6

新的“ pandas”软件包可能比“ csv”更相关。下面的代码将读取一个CSV文件,默认情况下将第一行解释为列标题,并在各列中查找最小值。

import pandas as pd

data = pd.read_csv('all16.csv')
data.min()

The new ‘pandas’ package might be more relevant than ‘csv’. The code below will read a CSV file, by default interpreting the first line as the column header and find the minimum across columns.

import pandas as pd

data = pd.read_csv('all16.csv')
data.min()

回答 7

好吧,我的迷你包装库也可以完成这项工作。

>>> import pyexcel as pe
>>> data = pe.load('all16.csv', name_columns_by_row=0)
>>> min(data.column[1])

同时,如果您知道什么是标题列索引之一,例如“ Column 1”,则可以执行以下操作:

>>> min(data.column["Column 1"])

Well, my mini wrapper library would do the job as well.

>>> import pyexcel as pe
>>> data = pe.load('all16.csv', name_columns_by_row=0)
>>> min(data.column[1])

Meanwhile, if you know what header column index one is, for example “Column 1”, you can do this instead:

>>> min(data.column["Column 1"])

回答 8

对我来说,最简单的方法就是使用范围。

import csv

with open('files/filename.csv') as I:
    reader = csv.reader(I)
    fulllist = list(reader)

# Starting with data skipping header
for item in range(1, len(fulllist)): 
    # Print each row using "item" as the index value
    print (fulllist[item])  

For me the easiest way to go is to use range.

import csv

with open('files/filename.csv') as I:
    reader = csv.reader(I)
    fulllist = list(reader)

# Starting with data skipping header
for item in range(1, len(fulllist)): 
    # Print each row using "item" as the index value
    print (fulllist[item])  

回答 9

因为这与我正在做的事情有关,所以我在这里分享。

如果我们不确定是否有标题并且您又不想导入嗅探器和其他内容,该怎么办?

如果您的任务是基本任务,例如打印或追加到列表或数组,则可以使用if语句:

# Let's say there's 4 columns
with open('file.csv') as csvfile:
     csvreader = csv.reader(csvfile)
# read first line
     first_line = next(csvreader)
# My headers were just text. You can use any suitable conditional here
     if len(first_line) == 4:
          array.append(first_line)
# Now we'll just iterate over everything else as usual:
     for row in csvreader:
          array.append(row)

Because this is related to something I was doing, I’ll share here.

What if we’re not sure if there’s a header and you also don’t feel like importing sniffer and other things?

If your task is basic, such as printing or appending to a list or array, you could just use an if statement:

# Let's say there's 4 columns
with open('file.csv') as csvfile:
     csvreader = csv.reader(csvfile)
# read first line
     first_line = next(csvreader)
# My headers were just text. You can use any suitable conditional here
     if len(first_line) == 4:
          array.append(first_line)
# Now we'll just iterate over everything else as usual:
     for row in csvreader:
          array.append(row)

回答 10

Python 3 CSV模块文档提供了以下示例:

with open('example.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    # ... process CSV file contents here ...

Sniffer会尝试自动检测有关CSV文件很多东西。您需要显式调用其has_header()方法以确定文件是否具有标题行。如果是这样,则在循环CSV行时跳过第一行。您可以这样做:

if sniffer.has_header():
    for header_row in reader:
        break
for data_row in reader:
    # do something with the row

The documentation for the Python 3 CSV module provides this example:

with open('example.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    # ... process CSV file contents here ...

The Sniffer will try to auto-detect many things about the CSV file. You need to explicitly call its has_header() method to determine whether the file has a header line. If it does, then skip the first row when iterating the CSV rows. You can do it like this:

if sniffer.has_header():
    for header_row in reader:
        break
for data_row in reader:
    # do something with the row

回答 11

我将使用tail摆脱不必要的第一行:

tail -n +2 $INFIL | whatever_script.py 

I would use tail to get rid of the unwanted first line:

tail -n +2 $INFIL | whatever_script.py 

回答 12

只需添加[1:]

下面的例子:

data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)**[1:]**

在iPython中对我有用

just add [1:]

example below:

data = pd.read_csv("/Users/xyz/Desktop/xyxData/xyz.csv", sep=',', header=None)**[1:]**

that works for me in iPython


回答 13

的Python 3.X

处理UTF8 BOM + HEADER

令人沮丧的是,csv模块无法轻松获取标头,UTF-8 BOM(文件中的第一个字符)也存在一个错误。这仅适用于我的csv模块:

import csv

def read_csv(self, csv_path, delimiter):
    with open(csv_path, newline='', encoding='utf-8') as f:
        # https://bugs.python.org/issue7185
        # Remove UTF8 BOM.
        txt = f.read()[1:]

    # Remove header line.
    header = txt.splitlines()[:1]
    lines = txt.splitlines()[1:]

    # Convert to list.
    csv_rows = list(csv.reader(lines, delimiter=delimiter))

    for row in csv_rows:
        value = row[INDEX_HERE]

Python 3.X

Handles UTF8 BOM + HEADER

It was quite frustrating that the csv module could not easily get the header, there is also a bug with the UTF-8 BOM (first char in file). This works for me using only the csv module:

import csv

def read_csv(self, csv_path, delimiter):
    with open(csv_path, newline='', encoding='utf-8') as f:
        # https://bugs.python.org/issue7185
        # Remove UTF8 BOM.
        txt = f.read()[1:]

    # Remove header line.
    header = txt.splitlines()[:1]
    lines = txt.splitlines()[1:]

    # Convert to list.
    csv_rows = list(csv.reader(lines, delimiter=delimiter))

    for row in csv_rows:
        value = row[INDEX_HERE]

回答 14

我将csvreader转换为list,然后弹出第一个元素

import csv        

with open(fileName, 'r') as csvfile:
        csvreader = csv.reader(csvfile)
        data = list(csvreader)               # Convert to list
        data.pop(0)                          # Removes the first row

        for row in data:
            print(row)

I would convert csvreader to list, then pop the first element

import csv        

with open(fileName, 'r') as csvfile:
        csvreader = csv.reader(csvfile)
        data = list(csvreader)               # Convert to list
        data.pop(0)                          # Removes the first row

        for row in data:
            print(row)

回答 15

Python 2.x

csvreader.next()

将读者可迭代对象的下一行作为列表返回,并根据当前方言进行解析。

csv_data = csv.reader(open('sample.csv'))
csv_data.next() # skip first row
for row in csv_data:
    print(row) # should print second row

Python 3.x

csvreader.__next__()

返回读取器的可迭代对象的下一行作为列表(如果该对象是从reader()返回的)或字典(如果它是DictReader实例),则根据当前的方言进行解析。通常,您应该将此称为next(reader)。

csv_data = csv.reader(open('sample.csv'))
csv_data.__next__() # skip first row
for row in csv_data:
    print(row) # should print second row

Python 2.x

csvreader.next()

Return the next row of the reader’s iterable object as a list, parsed according to the current dialect.

csv_data = csv.reader(open('sample.csv'))
csv_data.next() # skip first row
for row in csv_data:
    print(row) # should print second row

Python 3.x

csvreader.__next__()

Return the next row of the reader’s iterable object as a list (if the object was returned from reader()) or a dict (if it is a DictReader instance), parsed according to the current dialect. Usually you should call this as next(reader).

csv_data = csv.reader(open('sample.csv'))
csv_data.__next__() # skip first row
for row in csv_data:
    print(row) # should print second row

给定一个文本文件的URL,最简单的读取文本文件内容的方法是什么?

问题:给定一个文本文件的URL,最简单的读取文本文件内容的方法是什么?

在Python中,当获得文本文件的URL时,最简单的方法是从文本文件访问内容并逐行将文件内容打印出本地而不保存文本文件的本地副本?

TargetURL=http://www.myhost.com/SomeFile.txt
#read the file
#print first line
#print second line
#etc

In Python, when given the URL for a text file, what is the simplest way to access the contents off the text file and print the contents of the file out locally line-by-line without saving a local copy of the text file?

TargetURL=http://www.myhost.com/SomeFile.txt
#read the file
#print first line
#print second line
#etc

回答 0

编辑09/2016:在Python 3及更高版本中,使用urllib.request而不是urllib2

实际上,最简单的方法是:

import urllib2  # the lib that handles the url stuff

data = urllib2.urlopen(target_url) # it's a file like object and works just like a file
for line in data: # files are iterable
    print line

正如Will所建议的,您甚至不需要“ readlines”。您甚至可以将其缩短为: *

import urllib2

for line in urllib2.urlopen(target_url):
    print line

但是请记住,在Python中,可读性很重要。

但是,这是最简单的方法,但不是安全的方法,因为在大多数情况下,使用网络编程时,您不知道预期的数据量是否会得到遵守。因此,通常最好读取固定且合理数量的数据,这足以满足您的期望,但可以防止脚本被淹没:

import urllib2

data = urllib2.urlopen("http://www.google.com").read(20000) # read only 20 000 chars
data = data.split("\n") # then split it into lines

for line in data:
    print line

* Python 3中的第二个示例:

import urllib.request  # the lib that handles the url stuff

for line in urllib.request.urlopen(target_url):
    print(line.decode('utf-8')) #utf-8 or iso8859-1 or whatever the page encoding scheme is

Edit 09/2016: In Python 3 and up use urllib.request instead of urllib2

Actually the simplest way is:

import urllib2  # the lib that handles the url stuff

data = urllib2.urlopen(target_url) # it's a file like object and works just like a file
for line in data: # files are iterable
    print line

You don’t even need “readlines”, as Will suggested. You could even shorten it to: *

import urllib2

for line in urllib2.urlopen(target_url):
    print line

But remember in Python, readability matters.

However, this is the simplest way but not the safe way because most of the time with network programming, you don’t know if the amount of data to expect will be respected. So you’d generally better read a fixed and reasonable amount of data, something you know to be enough for the data you expect but will prevent your script from been flooded:

import urllib2

data = urllib2.urlopen("http://www.google.com").read(20000) # read only 20 000 chars
data = data.split("\n") # then split it into lines

for line in data:
    print line

* Second example in Python 3:

import urllib.request  # the lib that handles the url stuff

for line in urllib.request.urlopen(target_url):
    print(line.decode('utf-8')) #utf-8 or iso8859-1 or whatever the page encoding scheme is

回答 1

我是Python的新手,在公认的解决方案中关于Python 3的副手评论令人困惑。为了后代,在Python 3中执行此操作的代码是

import urllib.request
data = urllib.request.urlopen(target_url)

for line in data:
    ...

或者

from urllib.request import urlopen
data = urlopen(target_url)

请注意,这import urllib是行不通的。

I’m a newbie to Python and the offhand comment about Python 3 in the accepted solution was confusing. For posterity, the code to do this in Python 3 is

import urllib.request
data = urllib.request.urlopen(target_url)

for line in data:
    ...

or alternatively

from urllib.request import urlopen
data = urlopen(target_url)

Note that just import urllib does not work.


回答 2

确实不需要逐行阅读。您可以像这样得到整个东西:

import urllib
txt = urllib.urlopen(target_url).read()

There’s really no need to read line-by-line. You can get the whole thing like this:

import urllib
txt = urllib.urlopen(target_url).read()

回答 3

请求库有一个简单的界面,并与两个Python 2和3的作品。

import requests

response = requests.get(target_url)
data = response.text

The requests library has a simpler interface and works with both Python 2 and 3.

import requests

response = requests.get(target_url)
data = response.text

回答 4

import urllib2
for line in urllib2.urlopen("http://www.myhost.com/SomeFile.txt"):
    print line
import urllib2
for line in urllib2.urlopen("http://www.myhost.com/SomeFile.txt"):
    print line

回答 5

import urllib2

f = urllib2.urlopen(target_url)
for l in f.readlines():
    print l
import urllib2

f = urllib2.urlopen(target_url)
for l in f.readlines():
    print l

回答 6

Python 3中的另一种方法是使用urllib3包

import urllib3

http = urllib3.PoolManager()
response = http.request('GET', target_url)
data = response.data.decode('utf-8')

这可能比urllib更好,因为urllib3拥有

  • 线程安全。
  • 连接池。
  • 客户端SSL / TLS验证。
  • 使用分段编码上传文件。
  • 重试请求和处理HTTP重定向的助手。
  • 支持gzip和deflate编码。
  • HTTP和SOCKS的代理支持。
  • 100%的测试覆盖率。

Another way in Python 3 is to use the urllib3 package.

import urllib3

http = urllib3.PoolManager()
response = http.request('GET', target_url)
data = response.data.decode('utf-8')

This can be a better option than urllib since urllib3 boasts having

  • Thread safety.
  • Connection pooling.
  • Client-side SSL/TLS verification.
  • File uploads with multipart encoding.
  • Helpers for retrying requests and dealing with HTTP redirects.
  • Support for gzip and deflate encoding.
  • Proxy support for HTTP and SOCKS.
  • 100% test coverage.

回答 7

对我来说,上述任何回应都没有直接起作用。相反,我必须执行以下操作(Python 3):

from urllib.request import urlopen

data = urlopen("[your url goes here]").read().decode('utf-8')

# Do what you need to do with the data.

For me, none of the above responses worked straight ahead. Instead, I had to do the following (Python 3):

from urllib.request import urlopen

data = urlopen("[your url goes here]").read().decode('utf-8')

# Do what you need to do with the data.

回答 8

只需更新@ ken-kinder建议的Python 2解决方案即可用于Python 3:

import urllib
urllib.request.urlopen(target_url).read()

Just updating here solution suggested by @ken-kinder for Python 2 to work for Python 3:

import urllib
urllib.request.urlopen(target_url).read()

将字符串插入列表中而不会拆分为字符

问题:将字符串插入列表中而不会拆分为字符

我是Python的新手,如果不将其拆分成单个字符,就找不到一种将字符串插入列表的方法:

>>> list=['hello','world']
>>> list
['hello', 'world']
>>> list[:0]='foo'
>>> list
['f', 'o', 'o', 'hello', 'world']

我应该怎么做:

['foo', 'hello', 'world']

搜索了文档和网络,但这不是我的日子。

I’m new to Python and can’t find a way to insert a string into a list without it getting split into individual characters:

>>> list=['hello','world']
>>> list
['hello', 'world']
>>> list[:0]='foo'
>>> list
['f', 'o', 'o', 'hello', 'world']

What should I do to have:

['foo', 'hello', 'world']

Searched the docs and the Web, but it has not been my day.


回答 0

要添加到列表的末尾:

list.append('foo')

要在开头插入:

list.insert(0, 'foo')

To add to the end of the list:

list.append('foo')

To insert at the beginning:

list.insert(0, 'foo')

回答 1

坚持使用您要插入的方法,使用

list[:0] = ['foo']

http://docs.python.org/release/2.6.6/library/stdtypes.html#mutable-sequence-types

Sticking to the method you are using to insert it, use

list[:0] = ['foo']

http://docs.python.org/release/2.6.6/library/stdtypes.html#mutable-sequence-types


回答 2

另一种选择是使用重载+ operator

>>> l = ['hello','world']
>>> l = ['foo'] + l
>>> l
['foo', 'hello', 'world']

Another option is using the overloaded + operator:

>>> l = ['hello','world']
>>> l = ['foo'] + l
>>> l
['foo', 'hello', 'world']

回答 3

最好将方括号放在foo周围,并使用+ =

list+=['foo']

best put brackets around foo, and use +=

list+=['foo']

回答 4

>>> li = ['aaa', 'bbb']
>>> li.insert(0, 'wow!')
>>> li
['wow!', 'aaa', 'bbb']
>>> li = ['aaa', 'bbb']
>>> li.insert(0, 'wow!')
>>> li
['wow!', 'aaa', 'bbb']

回答 5

不要将list用作变量名。这是您掩盖的内在因素。

要插入,请使用列表的插入功能。

l = ['hello','world']
l.insert(0, 'foo')
print l
['foo', 'hello', 'world']

Don’t use list as a variable name. It’s a built in that you are masking.

To insert, use the insert function of lists.

l = ['hello','world']
l.insert(0, 'foo')
print l
['foo', 'hello', 'world']

回答 6

您必须添加另一个列表:

list[:0]=['foo']

You have to add another list:

list[:0]=['foo']

回答 7

ls=['hello','world']
ls.append('python')
['hello', 'world', 'python']

或(使用insert可以在列表中使用索引位置的功能)

ls.insert(0,'python')
print(ls)
['python', 'hello', 'world']
ls=['hello','world']
ls.append('python')
['hello', 'world', 'python']

or (use insert function where you can use index position in list)

ls.insert(0,'python')
print(ls)
['python', 'hello', 'world']

回答 8

我建议添加“ +”运算符,如下所示:

列表=列表+ [‘foo’]

希望能帮助到你!

I suggest to add the ‘+’ operator as follows:

list = list + [‘foo’]

Hope it helps!


有多行导入的推荐格式吗?

问题:有多行导入的推荐格式吗?

我已经阅读了在python中编码多行导入的三种方法

带斜杠:

from Tkinter import Tk, Frame, Button, Entry, Canvas, Text, \
    LEFT, DISABLED, NORMAL, RIDGE, END

复制语句:

from Tkinter import Tk, Frame, Button, Entry, Canvas, Text
from Tkinter import LEFT, DISABLED, NORMAL, RIDGE, END

带括号:

from Tkinter import (Tk, Frame, Button, Entry, Canvas, Text,
    LEFT, DISABLED, NORMAL, RIDGE, END)

此语句是否有推荐的格式或更简洁的方法?

I have read there are three ways for coding multi-line imports in python

With slashes:

from Tkinter import Tk, Frame, Button, Entry, Canvas, Text, \
    LEFT, DISABLED, NORMAL, RIDGE, END

Duplicating senteces:

from Tkinter import Tk, Frame, Button, Entry, Canvas, Text
from Tkinter import LEFT, DISABLED, NORMAL, RIDGE, END

With parenthesis:

from Tkinter import (Tk, Frame, Button, Entry, Canvas, Text,
    LEFT, DISABLED, NORMAL, RIDGE, END)

Is there a recomended format or a more elegant way for this statements?


回答 0

就个人而言,导入多个组件并按字母顺序对它们进行括号处理。像这样:

from Tkinter import (
    Button,
    Canvas,
    DISABLED,
    END,
    Entry,
    Frame,
    LEFT,
    NORMAL,
    RIDGE,
    Text,
    Tk,
)

这具有额外的优势,即可以轻松查看每个提交或PR中已添加/删除了哪些组件。

总的来说,尽管这是个人喜好,但我建议您选择最适合自己的方式。

Personally I go with parentheses when importing more than one component and sort them alphabetically. Like so:

from Tkinter import (
    Button,
    Canvas,
    DISABLED,
    END,
    Entry,
    Frame,
    LEFT,
    NORMAL,
    RIDGE,
    Text,
    Tk,
)

This has the added advantage of easily seeing what components have been added / removed in each commit or PR.

Overall though it’s a personal preference and I would advise you to go with whatever looks best to you.


回答 1

您的示例似乎源于PEP 328。在那里,正是针对这个问题提出了括号符号,所以我可能会选择这个。

Your examples seem to stem from PEP 328. There, the parenthesis-notation is proposed for exactly this problem, so probably I’d choose this one.


回答 2

我将使用PEP328的括号符号,并在括号之前和之后添加换行符:

from Tkinter import (
    Tk, Frame, Button, Entry, Canvas, Text, 
    LEFT, DISABLED, NORMAL, RIDGE, END
)

这是Django使用的格式:

from django.test.client import Client, RequestFactory
from django.test.testcases import (
    LiveServerTestCase, SimpleTestCase, TestCase, TransactionTestCase,
    skipIfDBFeature, skipUnlessAnyDBFeature, skipUnlessDBFeature,
)
from django.test.utils import (
    ignore_warnings, modify_settings, override_settings,
    override_system_checks, tag,
)

I would go with the parenthesis notation from the PEP328 with newlines added before and after parentheses:

from Tkinter import (
    Tk, Frame, Button, Entry, Canvas, Text, 
    LEFT, DISABLED, NORMAL, RIDGE, END
)

This is the format which Django uses:

from django.test.client import Client, RequestFactory
from django.test.testcases import (
    LiveServerTestCase, SimpleTestCase, TestCase, TransactionTestCase,
    skipIfDBFeature, skipUnlessAnyDBFeature, skipUnlessDBFeature,
)
from django.test.utils import (
    ignore_warnings, modify_settings, override_settings,
    override_system_checks, tag,
)

回答 3

通常对于Tkinter,只使用from Tkinter import *模块就可以了,因为该模块只会导出明显是小部件的名称。

PEP 8没有列出这种情况下的任何约定,因此我想您应该决定什么是最佳选择。一切都与可读性有关,因此请选择任何可以使您清楚地表明要从单个模块导入内容的东西。

由于在您的范围内提供了所有这些名称,因此我个人认为选项2最清晰,因为您可以看到最佳的导入名称。然后,您甚至可以将其拆分成更多的部分,以将彼此属于的那些名称组合在一起。在您的示例中,我可能将和分开放置Tk,因为它们将小部件分组在一起,而将和分开放置则是它们在视图中较小的组件。FrameCanvasButtonText

Usually with Tkinter, it is okay to just use from Tkinter import * as the module will only export names that are clearly widgets.

PEP 8 does not list any conventions for such a case, so I guess it is up to you to decide what is the best option. It is all about readability, so choose whatever makes it clear that you are importing stuff from a single module.

As all those names are made available in your scope, I personally think that options 2 is the most clearest as you can see the imported names the best. You then could even split it up more to maybe group those names together that belong with each other. In your example I might put Tk, Frame and Canvas separately as they group widgets together, while having Button and Text separately as they are smaller components in a view.


评估字符串中的数学表达式

问题:评估字符串中的数学表达式

stringExp = "2^4"
intVal = int(stringExp)      # Expected value: 16

这将返回以下错误:

Traceback (most recent call last):  
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int()
with base 10: '2^4'

我知道eval可以解决此问题,但是难道没有更好,更重要的是更安全的方法来评估存储在字符串中的数学表达式吗?

stringExp = "2^4"
intVal = int(stringExp)      # Expected value: 16

This returns the following error:

Traceback (most recent call last):  
File "<stdin>", line 1, in <module>
ValueError: invalid literal for int()
with base 10: '2^4'

I know that eval can work around this, but isn’t there a better and – more importantly – safer method to evaluate a mathematical expression that is being stored in a string?


回答 0

Pyparsing可用于解析数学表达式。特别是,fourFn.py 显示了如何解析基本算术表达式。在下面,我将fourFn重新包装为一个数值解析器类,以便于重用。

from __future__ import division
from pyparsing import (Literal, CaselessLiteral, Word, Combine, Group, Optional,
                       ZeroOrMore, Forward, nums, alphas, oneOf)
import math
import operator

__author__ = 'Paul McGuire'
__version__ = '$Revision: 0.0 $'
__date__ = '$Date: 2009-03-20 $'
__source__ = '''http://pyparsing.wikispaces.com/file/view/fourFn.py
http://pyparsing.wikispaces.com/message/view/home/15549426
'''
__note__ = '''
All I've done is rewrap Paul McGuire's fourFn.py as a class, so I can use it
more easily in other places.
'''


class NumericStringParser(object):
    '''
    Most of this code comes from the fourFn.py pyparsing example

    '''

    def pushFirst(self, strg, loc, toks):
        self.exprStack.append(toks[0])

    def pushUMinus(self, strg, loc, toks):
        if toks and toks[0] == '-':
            self.exprStack.append('unary -')

    def __init__(self):
        """
        expop   :: '^'
        multop  :: '*' | '/'
        addop   :: '+' | '-'
        integer :: ['+' | '-'] '0'..'9'+
        atom    :: PI | E | real | fn '(' expr ')' | '(' expr ')'
        factor  :: atom [ expop factor ]*
        term    :: factor [ multop factor ]*
        expr    :: term [ addop term ]*
        """
        point = Literal(".")
        e = CaselessLiteral("E")
        fnumber = Combine(Word("+-" + nums, nums) +
                          Optional(point + Optional(Word(nums))) +
                          Optional(e + Word("+-" + nums, nums)))
        ident = Word(alphas, alphas + nums + "_$")
        plus = Literal("+")
        minus = Literal("-")
        mult = Literal("*")
        div = Literal("/")
        lpar = Literal("(").suppress()
        rpar = Literal(")").suppress()
        addop = plus | minus
        multop = mult | div
        expop = Literal("^")
        pi = CaselessLiteral("PI")
        expr = Forward()
        atom = ((Optional(oneOf("- +")) +
                 (ident + lpar + expr + rpar | pi | e | fnumber).setParseAction(self.pushFirst))
                | Optional(oneOf("- +")) + Group(lpar + expr + rpar)
                ).setParseAction(self.pushUMinus)
        # by defining exponentiation as "atom [ ^ factor ]..." instead of
        # "atom [ ^ atom ]...", we get right-to-left exponents, instead of left-to-right
        # that is, 2^3^2 = 2^(3^2), not (2^3)^2.
        factor = Forward()
        factor << atom + \
            ZeroOrMore((expop + factor).setParseAction(self.pushFirst))
        term = factor + \
            ZeroOrMore((multop + factor).setParseAction(self.pushFirst))
        expr << term + \
            ZeroOrMore((addop + term).setParseAction(self.pushFirst))
        # addop_term = ( addop + term ).setParseAction( self.pushFirst )
        # general_term = term + ZeroOrMore( addop_term ) | OneOrMore( addop_term)
        # expr <<  general_term
        self.bnf = expr
        # map operator symbols to corresponding arithmetic operations
        epsilon = 1e-12
        self.opn = {"+": operator.add,
                    "-": operator.sub,
                    "*": operator.mul,
                    "/": operator.truediv,
                    "^": operator.pow}
        self.fn = {"sin": math.sin,
                   "cos": math.cos,
                   "tan": math.tan,
                   "exp": math.exp,
                   "abs": abs,
                   "trunc": lambda a: int(a),
                   "round": round,
                   "sgn": lambda a: abs(a) > epsilon and cmp(a, 0) or 0}

    def evaluateStack(self, s):
        op = s.pop()
        if op == 'unary -':
            return -self.evaluateStack(s)
        if op in "+-*/^":
            op2 = self.evaluateStack(s)
            op1 = self.evaluateStack(s)
            return self.opn[op](op1, op2)
        elif op == "PI":
            return math.pi  # 3.1415926535
        elif op == "E":
            return math.e  # 2.718281828
        elif op in self.fn:
            return self.fn[op](self.evaluateStack(s))
        elif op[0].isalpha():
            return 0
        else:
            return float(op)

    def eval(self, num_string, parseAll=True):
        self.exprStack = []
        results = self.bnf.parseString(num_string, parseAll)
        val = self.evaluateStack(self.exprStack[:])
        return val

你可以这样使用

nsp = NumericStringParser()
result = nsp.eval('2^4')
print(result)
# 16.0

result = nsp.eval('exp(2^4)')
print(result)
# 8886110.520507872

Pyparsing can be used to parse mathematical expressions. In particular, fourFn.py shows how to parse basic arithmetic expressions. Below, I’ve rewrapped fourFn into a numeric parser class for easier reuse.

from __future__ import division
from pyparsing import (Literal, CaselessLiteral, Word, Combine, Group, Optional,
                       ZeroOrMore, Forward, nums, alphas, oneOf)
import math
import operator

__author__ = 'Paul McGuire'
__version__ = '$Revision: 0.0 $'
__date__ = '$Date: 2009-03-20 $'
__source__ = '''http://pyparsing.wikispaces.com/file/view/fourFn.py
http://pyparsing.wikispaces.com/message/view/home/15549426
'''
__note__ = '''
All I've done is rewrap Paul McGuire's fourFn.py as a class, so I can use it
more easily in other places.
'''


class NumericStringParser(object):
    '''
    Most of this code comes from the fourFn.py pyparsing example

    '''

    def pushFirst(self, strg, loc, toks):
        self.exprStack.append(toks[0])

    def pushUMinus(self, strg, loc, toks):
        if toks and toks[0] == '-':
            self.exprStack.append('unary -')

    def __init__(self):
        """
        expop   :: '^'
        multop  :: '*' | '/'
        addop   :: '+' | '-'
        integer :: ['+' | '-'] '0'..'9'+
        atom    :: PI | E | real | fn '(' expr ')' | '(' expr ')'
        factor  :: atom [ expop factor ]*
        term    :: factor [ multop factor ]*
        expr    :: term [ addop term ]*
        """
        point = Literal(".")
        e = CaselessLiteral("E")
        fnumber = Combine(Word("+-" + nums, nums) +
                          Optional(point + Optional(Word(nums))) +
                          Optional(e + Word("+-" + nums, nums)))
        ident = Word(alphas, alphas + nums + "_$")
        plus = Literal("+")
        minus = Literal("-")
        mult = Literal("*")
        div = Literal("/")
        lpar = Literal("(").suppress()
        rpar = Literal(")").suppress()
        addop = plus | minus
        multop = mult | div
        expop = Literal("^")
        pi = CaselessLiteral("PI")
        expr = Forward()
        atom = ((Optional(oneOf("- +")) +
                 (ident + lpar + expr + rpar | pi | e | fnumber).setParseAction(self.pushFirst))
                | Optional(oneOf("- +")) + Group(lpar + expr + rpar)
                ).setParseAction(self.pushUMinus)
        # by defining exponentiation as "atom [ ^ factor ]..." instead of
        # "atom [ ^ atom ]...", we get right-to-left exponents, instead of left-to-right
        # that is, 2^3^2 = 2^(3^2), not (2^3)^2.
        factor = Forward()
        factor << atom + \
            ZeroOrMore((expop + factor).setParseAction(self.pushFirst))
        term = factor + \
            ZeroOrMore((multop + factor).setParseAction(self.pushFirst))
        expr << term + \
            ZeroOrMore((addop + term).setParseAction(self.pushFirst))
        # addop_term = ( addop + term ).setParseAction( self.pushFirst )
        # general_term = term + ZeroOrMore( addop_term ) | OneOrMore( addop_term)
        # expr <<  general_term
        self.bnf = expr
        # map operator symbols to corresponding arithmetic operations
        epsilon = 1e-12
        self.opn = {"+": operator.add,
                    "-": operator.sub,
                    "*": operator.mul,
                    "/": operator.truediv,
                    "^": operator.pow}
        self.fn = {"sin": math.sin,
                   "cos": math.cos,
                   "tan": math.tan,
                   "exp": math.exp,
                   "abs": abs,
                   "trunc": lambda a: int(a),
                   "round": round,
                   "sgn": lambda a: abs(a) > epsilon and cmp(a, 0) or 0}

    def evaluateStack(self, s):
        op = s.pop()
        if op == 'unary -':
            return -self.evaluateStack(s)
        if op in "+-*/^":
            op2 = self.evaluateStack(s)
            op1 = self.evaluateStack(s)
            return self.opn[op](op1, op2)
        elif op == "PI":
            return math.pi  # 3.1415926535
        elif op == "E":
            return math.e  # 2.718281828
        elif op in self.fn:
            return self.fn[op](self.evaluateStack(s))
        elif op[0].isalpha():
            return 0
        else:
            return float(op)

    def eval(self, num_string, parseAll=True):
        self.exprStack = []
        results = self.bnf.parseString(num_string, parseAll)
        val = self.evaluateStack(self.exprStack[:])
        return val

You can use it like this

nsp = NumericStringParser()
result = nsp.eval('2^4')
print(result)
# 16.0

result = nsp.eval('exp(2^4)')
print(result)
# 8886110.520507872

回答 1

eval 是邪恶的

eval("__import__('os').remove('important file')") # arbitrary commands
eval("9**9**9**9**9**9**9**9", {'__builtins__': None}) # CPU, memory

注意:即使您使用set __builtins__None使用内省也可能爆发:

eval('(1).__class__.__bases__[0].__subclasses__()', {'__builtins__': None})

使用以下方法评估算术表达式 ast

import ast
import operator as op

# supported operators
operators = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
             ast.Div: op.truediv, ast.Pow: op.pow, ast.BitXor: op.xor,
             ast.USub: op.neg}

def eval_expr(expr):
    """
    >>> eval_expr('2^6')
    4
    >>> eval_expr('2**6')
    64
    >>> eval_expr('1 + 2*3**(4^5) / (6 + -7)')
    -5.0
    """
    return eval_(ast.parse(expr, mode='eval').body)

def eval_(node):
    if isinstance(node, ast.Num): # <number>
        return node.n
    elif isinstance(node, ast.BinOp): # <left> <operator> <right>
        return operators[type(node.op)](eval_(node.left), eval_(node.right))
    elif isinstance(node, ast.UnaryOp): # <operator> <operand> e.g., -1
        return operators[type(node.op)](eval_(node.operand))
    else:
        raise TypeError(node)

您可以轻松限制每个操作或任何中间结果的允许范围,例如,限制以下项的输入参数a**b

def power(a, b):
    if any(abs(n) > 100 for n in [a, b]):
        raise ValueError((a,b))
    return op.pow(a, b)
operators[ast.Pow] = power

或限制中间结果的大小:

import functools

def limit(max_=None):
    """Return decorator that limits allowed returned values."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            ret = func(*args, **kwargs)
            try:
                mag = abs(ret)
            except TypeError:
                pass # not applicable
            else:
                if mag > max_:
                    raise ValueError(ret)
            return ret
        return wrapper
    return decorator

eval_ = limit(max_=10**100)(eval_)

>>> evil = "__import__('os').remove('important file')"
>>> eval_expr(evil) #doctest:+IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
TypeError:
>>> eval_expr("9**9")
387420489
>>> eval_expr("9**9**9**9**9**9**9**9") #doctest:+IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:

eval is evil

eval("__import__('os').remove('important file')") # arbitrary commands
eval("9**9**9**9**9**9**9**9", {'__builtins__': None}) # CPU, memory

Note: even if you use set __builtins__ to None it still might be possible to break out using introspection:

eval('(1).__class__.__bases__[0].__subclasses__()', {'__builtins__': None})

Evaluate arithmetic expression using ast

import ast
import operator as op

# supported operators
operators = {ast.Add: op.add, ast.Sub: op.sub, ast.Mult: op.mul,
             ast.Div: op.truediv, ast.Pow: op.pow, ast.BitXor: op.xor,
             ast.USub: op.neg}

def eval_expr(expr):
    """
    >>> eval_expr('2^6')
    4
    >>> eval_expr('2**6')
    64
    >>> eval_expr('1 + 2*3**(4^5) / (6 + -7)')
    -5.0
    """
    return eval_(ast.parse(expr, mode='eval').body)

def eval_(node):
    if isinstance(node, ast.Num): # <number>
        return node.n
    elif isinstance(node, ast.BinOp): # <left> <operator> <right>
        return operators[type(node.op)](eval_(node.left), eval_(node.right))
    elif isinstance(node, ast.UnaryOp): # <operator> <operand> e.g., -1
        return operators[type(node.op)](eval_(node.operand))
    else:
        raise TypeError(node)

You can easily limit allowed range for each operation or any intermediate result, e.g., to limit input arguments for a**b:

def power(a, b):
    if any(abs(n) > 100 for n in [a, b]):
        raise ValueError((a,b))
    return op.pow(a, b)
operators[ast.Pow] = power

Or to limit magnitude of intermediate results:

import functools

def limit(max_=None):
    """Return decorator that limits allowed returned values."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            ret = func(*args, **kwargs)
            try:
                mag = abs(ret)
            except TypeError:
                pass # not applicable
            else:
                if mag > max_:
                    raise ValueError(ret)
            return ret
        return wrapper
    return decorator

eval_ = limit(max_=10**100)(eval_)

Example

>>> evil = "__import__('os').remove('important file')"
>>> eval_expr(evil) #doctest:+IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
TypeError:
>>> eval_expr("9**9")
387420489
>>> eval_expr("9**9**9**9**9**9**9**9") #doctest:+IGNORE_EXCEPTION_DETAIL
Traceback (most recent call last):
...
ValueError:

回答 2

eval()*的一些更安全的选择:sympy.sympify().evalf()

*sympify根据文档中的以下警告,SymPy 也不安全。

警告:请注意,此函数使用eval,因此不应该在未过滤的输入上使用。

Some safer alternatives to eval() and sympy.sympify().evalf()*:

*SymPy sympify is also unsafe according to the following warning from the documentation.

Warning: Note that this function uses eval, and thus shouldn’t be used on unsanitized input.


回答 3

好的,所以eval的问题在于,即使您摆脱,它也很容易逃脱其沙箱__builtins__。逃避沙箱的所有方法归结为使用getattrobject.__getattribute__(通过.操作员)通过某个允许的对象(''.__class__.__bases__[0].__subclasses__或类似对象)获得对某个危险对象的引用。 getattr通过设置消除__builtins__Noneobject.__getattribute__之所以困难,是因为它不能简单地删除,这既是因为它object是不可变的,又是因为删除它会破坏一切。但是,__getattribute__只能通过.操作员进行访问,因此清除您的输入内容就足以确保eval无法逃脱其沙箱。
在处理公式中,十进制的唯一有效用法是在十进制之前或之后[0-9],因此我们只删除的所有其他实例.

import re
inp = re.sub(r"\.(?![0-9])","", inp)
val = eval(inp, {'__builtins__':None})

请注意,虽然python通常将1 + 1.视为1 + 1.0,但这会删除尾部.并留下1 + 1。您可以将)和添加EOF到允许遵循的事物列表中.,但是为什么要麻烦呢?

Okay, so the problem with eval is that it can escape its sandbox too easily, even if you get rid of __builtins__. All the methods for escaping the sandbox come down to using getattr or object.__getattribute__ (via the . operator) to obtain a reference to some dangerous object via some allowed object (''.__class__.__bases__[0].__subclasses__ or similar). getattr is eliminated by setting __builtins__ to None. object.__getattribute__ is the difficult one, since it cannot simply be removed, both because object is immutable and because removing it would break everything. However, __getattribute__ is only accessible via the . operator, so purging that from your input is sufficient to ensure eval cannot escape its sandbox.
In processing formulas, the only valid use of a decimal is when it is preceded or followed by [0-9], so we just remove all other instances of ..

import re
inp = re.sub(r"\.(?![0-9])","", inp)
val = eval(inp, {'__builtins__':None})

Note that while python normally treats 1 + 1. as 1 + 1.0, this will remove the trailing . and leave you with 1 + 1. You could add ),, and EOF to the list of things allowed to follow ., but why bother?


回答 4

您可以使用ast模块并编写一个NodeVisitor,以验证每个节点的类型是否属于白名单。

import ast, math

locals =  {key: value for (key,value) in vars(math).items() if key[0] != '_'}
locals.update({"abs": abs, "complex": complex, "min": min, "max": max, "pow": pow, "round": round})

class Visitor(ast.NodeVisitor):
    def visit(self, node):
       if not isinstance(node, self.whitelist):
           raise ValueError(node)
       return super().visit(node)

    whitelist = (ast.Module, ast.Expr, ast.Load, ast.Expression, ast.Add, ast.Sub, ast.UnaryOp, ast.Num, ast.BinOp,
            ast.Mult, ast.Div, ast.Pow, ast.BitOr, ast.BitAnd, ast.BitXor, ast.USub, ast.UAdd, ast.FloorDiv, ast.Mod,
            ast.LShift, ast.RShift, ast.Invert, ast.Call, ast.Name)

def evaluate(expr, locals = {}):
    if any(elem in expr for elem in '\n#') : raise ValueError(expr)
    try:
        node = ast.parse(expr.strip(), mode='eval')
        Visitor().visit(node)
        return eval(compile(node, "<string>", "eval"), {'__builtins__': None}, locals)
    except Exception: raise ValueError(expr)

因为它通过白名单而不是黑名单工作,所以它是安全的。它可以访问的唯一函数和变量是您明确为其提供访问权限的函数和变量。我在字典中填充了与数学相关的函数,因此您可以根据需要轻松访问这些函数,但是必须显式使用它。

如果字符串尝试调用未提供的函数或调用任何方法,则将引发异常,并且该异常将不会执行。

因为它使用Python内置的解析器和评估器,所以它也继承了Python的优先级和升级规则。

>>> evaluate("7 + 9 * (2 << 2)")
79
>>> evaluate("6 // 2 + 0.0")
3.0

上面的代码仅在Python 3上进行了测试。

如果需要,可以在此函数上添加超时装饰器。

You can use the ast module and write a NodeVisitor that verifies that the type of each node is part of a whitelist.

import ast, math

locals =  {key: value for (key,value) in vars(math).items() if key[0] != '_'}
locals.update({"abs": abs, "complex": complex, "min": min, "max": max, "pow": pow, "round": round})

class Visitor(ast.NodeVisitor):
    def visit(self, node):
       if not isinstance(node, self.whitelist):
           raise ValueError(node)
       return super().visit(node)

    whitelist = (ast.Module, ast.Expr, ast.Load, ast.Expression, ast.Add, ast.Sub, ast.UnaryOp, ast.Num, ast.BinOp,
            ast.Mult, ast.Div, ast.Pow, ast.BitOr, ast.BitAnd, ast.BitXor, ast.USub, ast.UAdd, ast.FloorDiv, ast.Mod,
            ast.LShift, ast.RShift, ast.Invert, ast.Call, ast.Name)

def evaluate(expr, locals = {}):
    if any(elem in expr for elem in '\n#') : raise ValueError(expr)
    try:
        node = ast.parse(expr.strip(), mode='eval')
        Visitor().visit(node)
        return eval(compile(node, "<string>", "eval"), {'__builtins__': None}, locals)
    except Exception: raise ValueError(expr)

Because it works via a whitelist rather than a blacklist, it is safe. The only functions and variables it can access are those you explicitly give it access to. I populated a dict with math-related functions so you can easily provide access to those if you want, but you have to explicitly use it.

If the string attempts to call functions that haven’t been provided, or invoke any methods, an exception will be raised, and it will not be executed.

Because this uses Python’s built in parser and evaluator, it also inherits Python’s precedence and promotion rules as well.

>>> evaluate("7 + 9 * (2 << 2)")
79
>>> evaluate("6 // 2 + 0.0")
3.0

The above code has only been tested on Python 3.

If desired, you can add a timeout decorator on this function.


回答 5

究其原因evalexec这么危险的是,默认的compile功能会产生字节码的任何有效的Python表达式,而默认evalexec将执行任何有效的Python字节码。迄今为止,所有答案都集中在限制可以生成的字节码(通过清理输入)或使用AST构建自己的域特定语言。

相反,您可以轻松创建一个简单的eval函数,该函数不能执行任何有害的操作,并且可以轻松地对内存或所用时间进行运行时检查。当然,如果这是简单的数学运算,那将是捷径。

c = compile(stringExp, 'userinput', 'eval')
if c.co_code[0]==b'd' and c.co_code[3]==b'S':
    return c.co_consts[ord(c.co_code[1])+ord(c.co_code[2])*256]

它的工作方式很简单,在编译过程中可以安全地评估任何常数数学表达式并将其存储为常数。由compile返回的代码对象由组成d,它是的字节码LOAD_CONST,其后是要加载的常量的编号(通常是列表中的最后一个),然后是S,它是用于加载的常量的字节码。RETURN_VALUE。如果此快捷方式不起作用,则表示用户输入不是常量表达式(包含变量或函数调用或类似内容)。

这也为一些更复杂的输入格式打开了大门。例如:

stringExp = "1 + cos(2)"

这实际上需要评估字节码,这仍然非常简单。Python字节码是一种面向堆栈的语言,因此所有事情都是简单的事情TOS=stack.pop(); op(TOS); stack.put(TOS)或类似的事情。关键是只能实现安全的操作码(加载/存储值,数学运算,返回值),而不是不安全的操作码(属性查找)。如果您希望用户能够调用函数(不使用上面的快捷方式的全部原因),可以简单地使您的实现CALL_FUNCTION仅允许“安全”列表中的函数。

from dis import opmap
from Queue import LifoQueue
from math import sin,cos
import operator

globs = {'sin':sin, 'cos':cos}
safe = globs.values()

stack = LifoQueue()

class BINARY(object):
    def __init__(self, operator):
        self.op=operator
    def __call__(self, context):
        stack.put(self.op(stack.get(),stack.get()))

class UNARY(object):
    def __init__(self, operator):
        self.op=operator
    def __call__(self, context):
        stack.put(self.op(stack.get()))


def CALL_FUNCTION(context, arg):
    argc = arg[0]+arg[1]*256
    args = [stack.get() for i in range(argc)]
    func = stack.get()
    if func not in safe:
        raise TypeError("Function %r now allowed"%func)
    stack.put(func(*args))

def LOAD_CONST(context, arg):
    cons = arg[0]+arg[1]*256
    stack.put(context['code'].co_consts[cons])

def LOAD_NAME(context, arg):
    name_num = arg[0]+arg[1]*256
    name = context['code'].co_names[name_num]
    if name in context['locals']:
        stack.put(context['locals'][name])
    else:
        stack.put(context['globals'][name])

def RETURN_VALUE(context):
    return stack.get()

opfuncs = {
    opmap['BINARY_ADD']: BINARY(operator.add),
    opmap['UNARY_INVERT']: UNARY(operator.invert),
    opmap['CALL_FUNCTION']: CALL_FUNCTION,
    opmap['LOAD_CONST']: LOAD_CONST,
    opmap['LOAD_NAME']: LOAD_NAME
    opmap['RETURN_VALUE']: RETURN_VALUE,
}

def VMeval(c):
    context = dict(locals={}, globals=globs, code=c)
    bci = iter(c.co_code)
    for bytecode in bci:
        func = opfuncs[ord(bytecode)]
        if func.func_code.co_argcount==1:
            ret = func(context)
        else:
            args = ord(bci.next()), ord(bci.next())
            ret = func(context, args)
        if ret:
            return ret

def evaluate(expr):
    return VMeval(compile(expr, 'userinput', 'eval'))

显然,此版本的实际版本会更长一些(有119个操作码,其中24个与数学相关)。添加STORE_FAST和另外两个将允许'x=5;return x+x轻松地进行类似或类似的输入。它甚至可以用来执行用户创建的函数,只要用户创建的函数本身是通过VMeval执行的(不要使它们可调用!!!否则它们可能被用作某个地方的回调)。处理循环需要对goto字节码的支持,这意味着从for迭代器更改为最明显的)。while并维护指向当前指令的指针,但这并不难。为了抵制DOS,主循环应检查自计算开始以来经过了多少时间,并且某些运算符应拒绝超出合理范围的输入(BINARY_POWER

尽管此方法比简单表达式的简单语法解析器要长一些(请参见上文,仅获取编译的常量),但它可以轻松扩展到更复杂的输入,并且不需要处理语法(compile将任意复杂的事物简化为一系列简单的说明)。

The reason eval and exec are so dangerous is that the default compile function will generate bytecode for any valid python expression, and the default eval or exec will execute any valid python bytecode. All the answers to date have focused on restricting the bytecode that can be generated (by sanitizing input) or building your own domain-specific-language using the AST.

Instead, you can easily create a simple eval function that is incapable of doing anything nefarious and can easily have runtime checks on memory or time used. Of course, if it is simple math, than there is a shortcut.

c = compile(stringExp, 'userinput', 'eval')
if c.co_code[0]==b'd' and c.co_code[3]==b'S':
    return c.co_consts[ord(c.co_code[1])+ord(c.co_code[2])*256]

The way this works is simple, any constant mathematic expression is safely evaluated during compilation and stored as a constant. The code object returned by compile consists of d, which is the bytecode for LOAD_CONST, followed by the number of the constant to load (usually the last one in the list), followed by S, which is the bytecode for RETURN_VALUE. If this shortcut doesn’t work, it means that the user input isn’t a constant expression (contains a variable or function call or similar).

This also opens the door to some more sophisticated input formats. For example:

stringExp = "1 + cos(2)"

This requires actually evaluating the bytecode, which is still quite simple. Python bytecode is a stack oriented language, so everything is a simple matter of TOS=stack.pop(); op(TOS); stack.put(TOS) or similar. The key is to only implement the opcodes that are safe (loading/storing values, math operations, returning values) and not unsafe ones (attribute lookup). If you want the user to be able to call functions (the whole reason not to use the shortcut above), simple make your implementation of CALL_FUNCTION only allow functions in a ‘safe’ list.

from dis import opmap
from Queue import LifoQueue
from math import sin,cos
import operator

globs = {'sin':sin, 'cos':cos}
safe = globs.values()

stack = LifoQueue()

class BINARY(object):
    def __init__(self, operator):
        self.op=operator
    def __call__(self, context):
        stack.put(self.op(stack.get(),stack.get()))

class UNARY(object):
    def __init__(self, operator):
        self.op=operator
    def __call__(self, context):
        stack.put(self.op(stack.get()))


def CALL_FUNCTION(context, arg):
    argc = arg[0]+arg[1]*256
    args = [stack.get() for i in range(argc)]
    func = stack.get()
    if func not in safe:
        raise TypeError("Function %r now allowed"%func)
    stack.put(func(*args))

def LOAD_CONST(context, arg):
    cons = arg[0]+arg[1]*256
    stack.put(context['code'].co_consts[cons])

def LOAD_NAME(context, arg):
    name_num = arg[0]+arg[1]*256
    name = context['code'].co_names[name_num]
    if name in context['locals']:
        stack.put(context['locals'][name])
    else:
        stack.put(context['globals'][name])

def RETURN_VALUE(context):
    return stack.get()

opfuncs = {
    opmap['BINARY_ADD']: BINARY(operator.add),
    opmap['UNARY_INVERT']: UNARY(operator.invert),
    opmap['CALL_FUNCTION']: CALL_FUNCTION,
    opmap['LOAD_CONST']: LOAD_CONST,
    opmap['LOAD_NAME']: LOAD_NAME
    opmap['RETURN_VALUE']: RETURN_VALUE,
}

def VMeval(c):
    context = dict(locals={}, globals=globs, code=c)
    bci = iter(c.co_code)
    for bytecode in bci:
        func = opfuncs[ord(bytecode)]
        if func.func_code.co_argcount==1:
            ret = func(context)
        else:
            args = ord(bci.next()), ord(bci.next())
            ret = func(context, args)
        if ret:
            return ret

def evaluate(expr):
    return VMeval(compile(expr, 'userinput', 'eval'))

Obviously, the real version of this would be a bit longer (there are 119 opcodes, 24 of which are math related). Adding STORE_FAST and a couple others would allow for input like 'x=5;return x+x or similar, trivially easily. It can even be used to execute user-created functions, so long as the user created functions are themselves executed via VMeval (don’t make them callable!!! or they could get used as a callback somewhere). Handling loops requires support for the goto bytecodes, which means changing from a for iterator to while and maintaining a pointer to the current instruction, but isn’t too hard. For resistance to DOS, the main loop should check how much time has passed since the start of the calculation, and certain operators should deny input over some reasonable limit (BINARY_POWER being the most obvious).

While this approach is somewhat longer than a simple grammar parser for simple expressions (see above about just grabbing the compiled constant), it extends easily to more complicated input, and doesn’t require dealing with grammar (compile take anything arbitrarily complicated and reduces it to a sequence of simple instructions).


回答 6

我想我会使用eval(),但首先要检查以确保该字符串是有效的数学表达式,而不是恶意的表达式。您可以使用正则表达式进行验证。

eval() 还采用了其他参数,您可以使用这些参数来限制其操作的命名空间,以提高安全性。

I think I would use eval(), but would first check to make sure the string is a valid mathematical expression, as opposed to something malicious. You could use a regex for the validation.

eval() also takes additional arguments which you can use to restrict the namespace it operates in for greater security.


回答 7

这是一个很晚的答复,但我认为对将来的参考很有用。您可以使用SymPy,而不是编写自己的数学解析器(尽管上面的pyparsing示例很棒)。我没有很多经验,但是它包含的数学引擎比任何人都可能为特定应用程序编写的函数强大得多,并且基本表达式评估非常简单:

>>> import sympy
>>> x, y, z = sympy.symbols('x y z')
>>> sympy.sympify("x**3 + sin(y)").evalf(subs={x:1, y:-3})
0.858879991940133

确实很酷!A from sympy import *带来了更多的功能支持,例如触发功能,特殊功能等,但是我在这里避免了显示来自何处的信息。

This is a massively late reply, but I think useful for future reference. Rather than write your own math parser (although the pyparsing example above is great) you could use SymPy. I don’t have a lot of experience with it, but it contains a much more powerful math engine than anyone is likely to write for a specific application and the basic expression evaluation is very easy:

>>> import sympy
>>> x, y, z = sympy.symbols('x y z')
>>> sympy.sympify("x**3 + sin(y)").evalf(subs={x:1, y:-3})
0.858879991940133

Very cool indeed! A from sympy import * brings in a lot more function support, such as trig functions, special functions, etc., but I’ve avoided that here to show what’s coming from where.


回答 8

[我知道这是一个老问题,但是当它们弹出时,有必要指出新的有用的解决方案]

自python3.6起,此功能现已内置于语言中,即“ f-strings”

请参阅:PEP 498-文字字符串插值

例如(注意f前缀):

f'{2**4}'
=> '16'

[I know this is an old question, but it is worth pointing out new useful solutions as they pop up]

Since python3.6, this capability is now built into the language, coined “f-strings”.

See: PEP 498 — Literal String Interpolation

For example (note the f prefix):

f'{2**4}'
=> '16'

回答 9

使用eval在干净的命名空间:

>>> ns = {'__builtins__': None}
>>> eval('2 ** 4', ns)
16

干净的命名空间应防止注入。例如:

>>> eval('__builtins__.__import__("os").system("echo got through")', ns)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute '__import__'

否则,您将获得:

>>> eval('__builtins__.__import__("os").system("echo got through")')
got through
0

您可能要授予对math模块的访问权限:

>>> import math
>>> ns = vars(math).copy()
>>> ns['__builtins__'] = None
>>> eval('cos(pi/3)', ns)
0.50000000000000011

Use eval in a clean namespace:

>>> ns = {'__builtins__': None}
>>> eval('2 ** 4', ns)
16

The clean namespace should prevent injection. For instance:

>>> eval('__builtins__.__import__("os").system("echo got through")', ns)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute '__import__'

Otherwise you would get:

>>> eval('__builtins__.__import__("os").system("echo got through")')
got through
0

You might want to give access to the math module:

>>> import math
>>> ns = vars(math).copy()
>>> ns['__builtins__'] = None
>>> eval('cos(pi/3)', ns)
0.50000000000000011

回答 10

这是我不使用eval即可解决问题的方法。适用于Python2和Python3。它不适用于负数。

$ python -m pytest test.py

test.py

from solution import Solutions

class SolutionsTestCase(unittest.TestCase):
    def setUp(self):
        self.solutions = Solutions()

    def test_evaluate(self):
        expressions = [
            '2+3=5',
            '6+4/2*2=10',
            '3+2.45/8=3.30625',
            '3**3*3/3+3=30',
            '2^4=6'
        ]
        results = [x.split('=')[1] for x in expressions]
        for e in range(len(expressions)):
            if '.' in results[e]:
                results[e] = float(results[e])
            else:
                results[e] = int(results[e])
            self.assertEqual(
                results[e],
                self.solutions.evaluate(expressions[e])
            )

solution.py

class Solutions(object):
    def evaluate(self, exp):
        def format(res):
            if '.' in res:
                try:
                    res = float(res)
                except ValueError:
                    pass
            else:
                try:
                    res = int(res)
                except ValueError:
                    pass
            return res
        def splitter(item, op):
            mul = item.split(op)
            if len(mul) == 2:
                for x in ['^', '*', '/', '+', '-']:
                    if x in mul[0]:
                        mul = [mul[0].split(x)[1], mul[1]]
                    if x in mul[1]:
                        mul = [mul[0], mul[1].split(x)[0]]
            elif len(mul) > 2:
                pass
            else:
                pass
            for x in range(len(mul)):
                mul[x] = format(mul[x])
            return mul
        exp = exp.replace(' ', '')
        if '=' in exp:
            res = exp.split('=')[1]
            res = format(res)
            exp = exp.replace('=%s' % res, '')
        while '^' in exp:
            if '^' in exp:
                itm = splitter(exp, '^')
                res = itm[0] ^ itm[1]
                exp = exp.replace('%s^%s' % (str(itm[0]), str(itm[1])), str(res))
        while '**' in exp:
            if '**' in exp:
                itm = splitter(exp, '**')
                res = itm[0] ** itm[1]
                exp = exp.replace('%s**%s' % (str(itm[0]), str(itm[1])), str(res))
        while '/' in exp:
            if '/' in exp:
                itm = splitter(exp, '/')
                res = itm[0] / itm[1]
                exp = exp.replace('%s/%s' % (str(itm[0]), str(itm[1])), str(res))
        while '*' in exp:
            if '*' in exp:
                itm = splitter(exp, '*')
                res = itm[0] * itm[1]
                exp = exp.replace('%s*%s' % (str(itm[0]), str(itm[1])), str(res))
        while '+' in exp:
            if '+' in exp:
                itm = splitter(exp, '+')
                res = itm[0] + itm[1]
                exp = exp.replace('%s+%s' % (str(itm[0]), str(itm[1])), str(res))
        while '-' in exp:
            if '-' in exp:
                itm = splitter(exp, '-')
                res = itm[0] - itm[1]
                exp = exp.replace('%s-%s' % (str(itm[0]), str(itm[1])), str(res))

        return format(exp)

Here’s my solution to the problem without using eval. Works with Python2 and Python3. It doesn’t work with negative numbers.

$ python -m pytest test.py

test.py

from solution import Solutions

class SolutionsTestCase(unittest.TestCase):
    def setUp(self):
        self.solutions = Solutions()

    def test_evaluate(self):
        expressions = [
            '2+3=5',
            '6+4/2*2=10',
            '3+2.45/8=3.30625',
            '3**3*3/3+3=30',
            '2^4=6'
        ]
        results = [x.split('=')[1] for x in expressions]
        for e in range(len(expressions)):
            if '.' in results[e]:
                results[e] = float(results[e])
            else:
                results[e] = int(results[e])
            self.assertEqual(
                results[e],
                self.solutions.evaluate(expressions[e])
            )

solution.py

class Solutions(object):
    def evaluate(self, exp):
        def format(res):
            if '.' in res:
                try:
                    res = float(res)
                except ValueError:
                    pass
            else:
                try:
                    res = int(res)
                except ValueError:
                    pass
            return res
        def splitter(item, op):
            mul = item.split(op)
            if len(mul) == 2:
                for x in ['^', '*', '/', '+', '-']:
                    if x in mul[0]:
                        mul = [mul[0].split(x)[1], mul[1]]
                    if x in mul[1]:
                        mul = [mul[0], mul[1].split(x)[0]]
            elif len(mul) > 2:
                pass
            else:
                pass
            for x in range(len(mul)):
                mul[x] = format(mul[x])
            return mul
        exp = exp.replace(' ', '')
        if '=' in exp:
            res = exp.split('=')[1]
            res = format(res)
            exp = exp.replace('=%s' % res, '')
        while '^' in exp:
            if '^' in exp:
                itm = splitter(exp, '^')
                res = itm[0] ^ itm[1]
                exp = exp.replace('%s^%s' % (str(itm[0]), str(itm[1])), str(res))
        while '**' in exp:
            if '**' in exp:
                itm = splitter(exp, '**')
                res = itm[0] ** itm[1]
                exp = exp.replace('%s**%s' % (str(itm[0]), str(itm[1])), str(res))
        while '/' in exp:
            if '/' in exp:
                itm = splitter(exp, '/')
                res = itm[0] / itm[1]
                exp = exp.replace('%s/%s' % (str(itm[0]), str(itm[1])), str(res))
        while '*' in exp:
            if '*' in exp:
                itm = splitter(exp, '*')
                res = itm[0] * itm[1]
                exp = exp.replace('%s*%s' % (str(itm[0]), str(itm[1])), str(res))
        while '+' in exp:
            if '+' in exp:
                itm = splitter(exp, '+')
                res = itm[0] + itm[1]
                exp = exp.replace('%s+%s' % (str(itm[0]), str(itm[1])), str(res))
        while '-' in exp:
            if '-' in exp:
                itm = splitter(exp, '-')
                res = itm[0] - itm[1]
                exp = exp.replace('%s-%s' % (str(itm[0]), str(itm[1])), str(res))

        return format(exp)

有趣好用的Python教程

退出移动版
微信支付
请使用 微信 扫码支付