Operators like <= in Python are generally not overridden to mean something significantly different from “less than or equal to”. It’s unusual for the standard library to do this; it smells like legacy API to me.
Use the equivalent and more clearly-named method, set.issubset. Note that you don’t need to convert the argument to a set; it’ll do that for you if needed.
I would probably use set in the following manner:
set(l).issuperset(set(['a','b']))
or the other way round:
set(['a','b']).issubset(set(l))
I find it a bit more readable, but it may be overkill. Sets are particularly useful for computing unions, intersections, and differences between collections, but they may not be the best option in this situation.
I like these two because they seem the most logical, the latter being shorter and probably fastest (shown here using set literal syntax which has been backported to Python 2.7):
all(x in {'a', 'b', 'c'} for x in ['a', 'b'])
# or
{'a', 'b'}.issubset({'a', 'b', 'c'})
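For comparison, here is a small sketch of the three spellings discussed above side by side. The list names `required` and `available` are made up for illustration; the point is only that the named method accepts any iterable, while the operator form needs a set on both sides.

```python
# Hypothetical sample data, chosen only for illustration.
required = ['a', 'b']
available = ['a', 'b', 'c']

# The operator form needs sets on both sides.
by_operator = set(required) <= set(available)

# The named method accepts any iterable as its argument.
by_method = set(required).issubset(available)

# The generator form stops at the first missing element.
available_set = set(available)
by_all = all(x in available_set for x in required)

print(by_operator, by_method, by_all)
```

All three agree here; which one reads best is mostly a matter of taste.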
Answer 3
What if your lists contain duplicates, like this:
v1 = ['s', 'h', 'e', 'e', 'p']
v2 = ['s', 's', 'h']
Sets do not contain duplicates. So, the following line returns True.
set(v2).issubset(v1)
To account for duplicates, you can use the following code:
v1 = sorted(v1)
v2 = sorted(v2)
def is_subseq(v2, v1):
    """Check whether v2 is a subsequence of v1."""
    it = iter(v1)
    return all(c in it for c in v2)
So, the following line returns False.
is_subseq(v2, v1)
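An alternative that avoids sorting is to compare element counts with collections.Counter. This is a different technique from the subsequence check above, shown only as a sketch; the helper name `is_sub_multiset` is made up for this example.

```python
from collections import Counter

def is_sub_multiset(small, big):
    """Return True if every element of small occurs in big at least as often."""
    # Counter subtraction keeps only positive counts, so an empty result
    # means big covers every occurrence in small.
    return not (Counter(small) - Counter(big))

v1 = ['s', 'h', 'e', 'e', 'p']
v2 = ['s', 's', 'h']
print(is_sub_multiset(v2, v1))          # False: v1 has only one 's'
print(is_sub_multiset(['e', 'e'], v1))  # True
```

Unlike the sorted version, this only needs the elements to be hashable, not orderable.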
Answer 4
This is what I was searching for online; unfortunately I did not find it there, but while experimenting in the Python interpreter.
>>> case = "caseCamel"
>>> label = "Case Camel"
>>> list = ["apple", "banana"]
>>>
>>> (case or label) in list
False
>>> list = ["apple", "caseCamel"]
>>> (case or label) in list
True
>>> (case and label) in list
False
>>> list = ["case", "caseCamel", "Case Camel"]
>>> (case and label) in list
True
>>>
And if you have a long list of variables held in a sublist variable:
>>>
>>> list = ["case", "caseCamel", "Case Camel"]
>>> label = "Case Camel"
>>> case = "caseCamel"
>>>
>>> sublist = ["unique banana", "very unique banana"]
>>>
>>> # example for if any (at least one) item contained in superset (or statement)
...
>>> next((True for item in sublist if next((True for x in list if x == item), False)), False)
False
>>>
>>> sublist[0] = label
>>>
>>> next((True for item in sublist if next((True for x in list if x == item), False)), False)
True
>>>
>>> # example for whether a subset (all items) contained in superset (and statement)
...
>>> # a bit of demorgan's law
...
>>> next((False for item in sublist if item not in list), True)
False
>>>
>>> sublist[1] = case
>>>
>>> next((False for item in sublist if item not in list), True)
True
>>>
>>> next((True for item in sublist if next((True for x in list if x == item), False)), False)
True
>>>
>>>
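One caveat about the session above: `case or label` evaluates to the first truthy operand before `in` is applied, so `(case or label) in list` only ever tests `case`, and `(case and label) in list` only tests `label`. A sketch of the same checks with any() and all(), which test every candidate (the name `items` is used here instead of `list` to avoid shadowing the built-in):

```python
# Sample data mirroring the session above.
case = "caseCamel"
label = "Case Camel"
items = ["apple", "Case Camel"]

# `case or label` is just `case`, so this misses the label that is present.
print((case or label) in items)  # False

# Explicit membership tests for "at least one" and "all".
candidates = [case, label]
print(any(c in items for c in candidates))  # True: "Case Camel" is present
print(all(c in items for c in candidates))  # False: "caseCamel" is missing
```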
Not OP’s case, but – for anyone who wants to assert intersection in dicts and ended up here due to poor googling (e.g. me) – you need to work with dict.items:
>>> a = {'key': 'value'}
>>> b = {'key': 'value', 'extra_key': 'extra_value'}
>>> all(item in a.items() for item in b.items())
False
>>> all(item in b.items() for item in a.items())
True
That’s because dict.items returns key/value pairs as tuples, and tuples, like most objects in Python, compare equal when their contents are equal.
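A shorter spelling of the same check, as a sketch: items views support set-style comparison, so `<=` can be read as "is a subset of". This assumes the dict values are hashable, which is the case the documentation describes as set-like.

```python
a = {'key': 'value'}
b = {'key': 'value', 'extra_key': 'extra_value'}

# items() views behave like sets of (key, value) pairs here,
# so <= asks whether every pair on the left also appears on the right.
print(a.items() <= b.items())  # True: every pair of a is in b
print(b.items() <= a.items())  # False: b has an extra pair
```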
Finally I migrated my development env from runserver to gunicorn/nginx.
It’d be convenient to replicate the autoreload feature of runserver to gunicorn, so the server automatically restarts when source changes. Otherwise I have to restart the server manually with kill -HUP.
One option would be to use --max-requests to limit each spawned process to serving only one request, by adding --max-requests 1 to the startup options. Every newly spawned process should see your code changes, and in a development environment the extra startup time per request should be negligible.
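The same option can also live in a gunicorn configuration file, which is plain Python. The sketch below assumes a file named gunicorn.conf.py passed with -c; the bind address and worker count are placeholder values, not something taken from the setup described here.

```python
# gunicorn.conf.py (development only; file name and values are assumptions)

# Address to listen on; adjust to whatever your nginx proxy expects.
bind = "127.0.0.1:8000"

# A single worker is usually enough on a development machine.
workers = 1

# Recycle each worker after one request, so the next request is served
# by a freshly started process that imports the current code.
max_requests = 1
```

Newer gunicorn releases also have a reload setting that restarts workers when source files change, which may be a better fit if your version supports it.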
Bryan Helmig came up with this and I modified it to use run_gunicorn instead of launching gunicorn directly, to make it possible to just cut and paste these 3 commands into a shell in your django project root folder (with your virtualenv activated):
I use git push to deploy to production and set up git hooks to run a script. The advantage of this approach is you can also do your migration and package installation at the same time. https://mikeeverhart.net/2013/01/using-git-to-deploy-code/
mkdir -p /home/git/project_name.git
cd /home/git/project_name.git
git init --bare
Then create a script /home/git/project_name.git/hooks/post-receive.
From my local / development server, I set up a git remote that allows me to push to the production server:
git remote add production ssh://user_name@production-server/home/git/project_name.git
# initial push
git push production +master:refs/heads/master
# subsequent push
git push production master
As a bonus, you will get to see all the prompts as the script is running. So you will see if there is any issue with the migration/package installation/supervisor restart.
You can use itertools.tee and zip to efficiently build the result:
from itertools import tee
# python2 only:
#from itertools import izip as zip
def differences(seq):
    iterable, copied = tee(seq)
    next(copied)
    for x, y in zip(iterable, copied):
        yield y - x
Or use itertools.islice instead (this variant needs a re-iterable sequence such as a list, not a one-shot iterator):
from itertools import islice
def differences(seq):
    nexts = islice(seq, 1, None)
    for x, y in zip(seq, nexts):
        yield y - x
You can also avoid using the itertools module:
def differences(seq):
    iterable = iter(seq)
    prev = next(iterable)
    for element in iterable:
        yield element - prev
        prev = element
All these solutions work in constant space if you don’t need to store all the results, and the iterator-based ones also support infinite iterables.
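To illustrate the point about infinite iterables, here is a sketch that feeds the plain-iterator version an endless itertools.count stream and takes only the first few results. The empty-input guard is an addition for this example, not part of the version above.

```python
from itertools import count, islice

def differences(seq):
    """Yield successive differences lazily, holding only one previous element."""
    iterable = iter(seq)
    try:
        prev = next(iterable)
    except StopIteration:
        return  # empty input: nothing to yield
    for element in iterable:
        yield element - prev
        prev = element

# count(0, 3) is an endless 0, 3, 6, ... stream; islice takes the first five results.
first_five = list(islice(differences(count(0, 3)), 5))
print(first_five)  # [3, 3, 3, 3, 3]
```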
Here are some micro-benchmarks of the solutions:
In [12]: L = range(10**6)
In [13]: from collections import deque
In [15]: %timeit deque(differences_tee(L), maxlen=0)
10 loops, best of 3: 122 ms per loop
In [16]: %timeit deque(differences_islice(L), maxlen=0)
10 loops, best of 3: 127 ms per loop
In [17]: %timeit deque(differences_no_it(L), maxlen=0)
10 loops, best of 3: 89.9 ms per loop
And the other proposed solutions:
In [18]: %timeit [x[1] - x[0] for x in zip(L[1:], L)]
10 loops, best of 3: 163 ms per loop
In [19]: %timeit [L[i+1]-L[i] for i in range(len(L)-1)]
1 loops, best of 3: 395 ms per loop
In [20]: import numpy as np
In [21]: %timeit np.diff(L)
1 loops, best of 3: 479 ms per loop
In [35]: %%timeit
    ...: res = []
    ...: for i in range(len(L) - 1):
    ...:     res.append(L[i+1] - L[i])
    ...:
1 loops, best of 3: 234 ms per loop
Note that:
zip(L[1:], L) is equivalent to zip(L[1:], L[:-1]) since zip already terminates on the shortest input; however, it avoids a whole copy of L.
Accessing single elements by index is very slow because every index access is a method call in Python.
numpy.diff is slow because it has to first convert the list to a ndarray. Obviously if you start with an ndarray it will be much faster:
In [22]: arr = np.array(L)
In [23]: %timeit np.diff(arr)
100 loops, best of 3: 3.02 ms per loop
Answer 4
Using the := walrus operator available in Python 3.8+:
>>> t = [1, 3, 6]
>>> prev = t[0]; [-prev + (prev := x) for x in t[1:]]
[2, 3]
But if you want v to have the same length as t, then:
v = np.diff([t[0]] + t) # for python 3.x
or
v = np.diff(t + [t[-1]])
FYI: this only works for lists. For numpy arrays, use:
v = np.diff(np.append(t[0], t))
Answer 6
Functional approach:
>>> import operator
>>> a = [1, 3, 5, 7, 11, 13, 17, 21]
>>> list(map(operator.sub, a[1:], a[:-1]))
[2, 2, 2, 4, 2, 4, 4]
Using generators:
>>> import operator, itertools
>>> g1, g2 = itertools.tee((x*x for x in range(5)), 2)
>>> list(map(operator.sub, itertools.islice(g1, 1, None), g2))
[1, 3, 5, 7]
Using indices:
>>> [a[i+1] - a[i] for i in range(len(a) - 1)]
[2, 2, 2, 4, 2, 4, 4]
Sometimes with numerical integration you will want to difference a list with periodic boundary conditions (so the first element calculates the difference to the last). In this case the numpy.roll function is helpful:
v-np.roll(v,1)
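For readers without numpy at hand, the same periodic difference can be sketched in plain Python. This is a stand-in for the numpy.roll expression above, not the numpy implementation; the function name `periodic_differences` is made up for this example.

```python
def periodic_differences(values):
    """Differences with wrap-around: element 0 is compared with the last element."""
    # values[i - 1] with i == 0 indexes the last element,
    # which gives the periodic boundary for free.
    return [values[i] - values[i - 1] for i in range(len(values))]

v = [1, 4, 9, 16]
print(periodic_differences(v))  # [-15, 3, 5, 7]
```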
Solutions with zero prepended
Another numpy solution (just for completeness) is to use
numpy.ediff1d(v)
This works like numpy.diff, but only on a vector (it flattens the input array). It can also prepend or append numbers to the resulting vector. This is useful when handling accumulated fields, which is often the case for fluxes in meteorological variables (e.g. rain, latent heat), as you want a resulting list of the same length as the input variable, with the first entry untouched.
Then you would write
np.ediff1d(v,to_begin=v[0])
Of course, you can also do this with the np.diff command; in this case, though, you need to prepend zero to the series with the prepend keyword:
np.diff(v,prepend=0.0)
All the above solutions return a vector that is the same length as the input.
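To make the "same length, first entry untouched" idea concrete, here is a plain-Python sketch of de-accumulating a running total, with itertools.accumulate as the inverse. It mirrors what the numpy calls above compute but does not use numpy; the rainfall numbers are invented.

```python
from itertools import accumulate

def deaccumulate(totals):
    """Turn running totals into per-step amounts, keeping the first entry as is."""
    if not totals:
        return []
    return [totals[0]] + [b - a for a, b in zip(totals, totals[1:])]

accumulated_rain = [2, 5, 5, 9]    # hypothetical running totals
per_step = deaccumulate(accumulated_rain)
print(per_step)                    # [2, 3, 0, 4]
print(list(accumulate(per_step)))  # [2, 5, 5, 9], the original series
```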
Answer 9
My way:
>>> v = [1, 2, 3, 4, 5]
>>> [v[i] - v[i-1] for i, value in enumerate(v[1:], 1)]
[1, 1, 1, 1]
Using plain tabs won’t do the trick here because I don’t know the longest data in each row.
This is the same behavior as ‘column -t’ in Linux.
$ echo -e "a b c\naaaaaaaaaa b c\na bbbbbbbbbb c"
a b c
aaaaaaaaaa b c
a bbbbbbbbbb c
$ echo -e "a b c\naaaaaaaaaa b c\na bbbbbbbbbb c" | column -t
a           b           c
aaaaaaaaaa  b           c
a           bbbbbbbbbb  c
I have looked around for various python libraries to do this but can’t find anything useful.
Answer 0
data = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
col_width = max(len(word) for row in data for word in row) + 2  # padding
for row in data:
    print("".join(word.ljust(col_width) for word in row))
a           b           c
aaaaaaaaaa  b           c
a           bbbbbbbbbb  c
What this does is calculate the longest data entry to determine the column width, then use .ljust() to add the necessary padding when printing out each column.
I came here with the same requirements, but @lvc’s and @Preet’s answers seem more in line with what column -t produces, in that columns have different widths:
>>> rows = [['a', 'b', 'c', 'd'], ['aaaaaaaaaa', 'b', 'c', 'd'], ['a', 'bbbbbbbbbb', 'c', 'd']]
>>> widths = [max(map(len, col)) for col in zip(*rows)]
>>> for row in rows:
...     print(" ".join(val.ljust(width) for val, width in zip(row, widths)))
...
a          b          c d
aaaaaaaaaa b          c d
a          bbbbbbbbbb c d
from columnar import columnar
headers = ['name', 'id', 'host', 'notes']
data = [
    ['busybox', 'c3c37d5d-38d2-409f-8d02-600fd9d51239', 'linuxnode-1-292735', 'Test server.'],
    ['alpine-python', '6bb77855-0fda-45a9-b553-e19e1a795f1e', 'linuxnode-2-249253', 'The one that runs python.'],
    ['redis', 'afb648ba-ac97-4fb2-8953-9a5b5f39663e', 'linuxnode-3-3416918', 'For queues and stuff.'],
    ['app-server', 'b866cd0f-bf80-40c7-84e3-c40891ec68f9', 'linuxnode-4-295918', 'A popular destination.'],
    ['nginx', '76fea0f0-aa53-4911-b7e4-fae28c2e469b', 'linuxnode-5-292735', 'Traffic Cop'],
]
table = columnar(data, headers, no_borders=True)
print(table)
formatting the columns using our knowledge of max width from the first pass using str.ljust() and str.rjust()
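A minimal sketch of that two-pass idea, assuming numbers should be right-justified and everything else left-justified. The function name `format_columns`, the `gap` parameter, and the sample table are all made up for this example.

```python
def format_columns(rows, gap=2):
    """Two passes: measure each column, then pad every cell to that width."""
    # First pass: the widest cell in each column.
    widths = [max(len(str(cell)) for cell in col) for col in zip(*rows)]
    lines = []
    # Second pass: pad each cell to its column width.
    for row in rows:
        cells = []
        for cell, width in zip(row, widths):
            # Right-justify numbers, left-justify everything else.
            if isinstance(cell, (int, float)):
                cells.append(str(cell).rjust(width))
            else:
                cells.append(str(cell).ljust(width))
        lines.append((" " * gap).join(cells).rstrip())
    return "\n".join(lines)

table = [["name", "qty"], ["apple", 3], ["kiwi", 12]]
print(format_columns(table))
```

Collecting the lines before printing keeps the function easy to test and lets the caller decide where the output goes.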
Answer 5
Transposing the columns like that is a job for zip:
>>> a = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
>>> list(zip(*a))
[('a', 'aaaaaaaaaa', 'a'), ('b', 'b', 'bbbbbbbbbb'), ('c', 'c', 'c')]
To find the required length of each column, you can use max:
>>> trans_a = list(zip(*a))
>>> [max(len(c) for c in b) for b in trans_a]
[10, 10, 1]
Which you can use, with suitable padding, to construct strings to pass to print:
>>> col_lengths = [max(len(c) for c in b) for b in trans_a]
>>> padding = ' ' # You might want more
>>> padding.join(s.ljust(l) for s, l in zip(a[0], col_lengths))
'a          b          c'
'''
From http://code.activestate.com/recipes/267662-table-indentation/
PSF License
'''
# Python 2 only (cStringIO, print statement, map(None, ...), xrange)
import cStringIO,operator
def indent(rows, hasHeader=False, headerChar='-', delim=' | ', justify='left',
           separateRows=False, prefix='', postfix='', wrapfunc=lambda x:x):
    """Indents a table by column.
       - rows: A sequence of sequences of items, one sequence per row.
       - hasHeader: True if the first row consists of the columns' names.
       - headerChar: Character to be used for the row separator line
         (if hasHeader==True or separateRows==True).
       - delim: The column delimiter.
       - justify: Determines how are data justified in their column.
         Valid values are 'left','right' and 'center'.
       - separateRows: True if rows are to be separated by a line
         of 'headerChar's.
       - prefix: A string prepended to each printed row.
       - postfix: A string appended to each printed row.
       - wrapfunc: A function f(text) for wrapping text; each element in
         the table is first wrapped by this function."""
    # closure for breaking logical rows to physical, using wrapfunc
    def rowWrapper(row):
        newRows = [wrapfunc(item).split('\n') for item in row]
        return [[substr or '' for substr in item] for item in map(None,*newRows)]
    # break each logical row into one or more physical ones
    logicalRows = [rowWrapper(row) for row in rows]
    # columns of physical rows
    columns = map(None,*reduce(operator.add,logicalRows))
    # get the maximum of each column by the string length of its items
    maxWidths = [max([len(str(item)) for item in column]) for column in columns]
    rowSeparator = headerChar * (len(prefix) + len(postfix) + sum(maxWidths) + \
                                 len(delim)*(len(maxWidths)-1))
    # select the appropriate justify method
    justify = {'center':str.center, 'right':str.rjust, 'left':str.ljust}[justify.lower()]
    output=cStringIO.StringIO()
    if separateRows: print >> output, rowSeparator
    for physicalRows in logicalRows:
        for row in physicalRows:
            print >> output, \
                prefix \
                + delim.join([justify(str(item),width) for (item,width) in zip(row,maxWidths)]) \
                + postfix
        if separateRows or hasHeader: print >> output, rowSeparator; hasHeader=False
    return output.getvalue()
# written by Mike Brown
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/148061
def wrap_onspace(text, width):
    """
    A word-wrap function that preserves existing line breaks
    and most spaces in the text. Expects that existing line
    breaks are posix newlines (\n).
    """
    return reduce(lambda line, word, width=width: '%s%s%s' %
                  (line,
                   ' \n'[(len(line[line.rfind('\n')+1:])
                         + len(word.split('\n',1)[0]
                              ) >= width)],
                   word),
                  text.split(' ')
                 )
import re
def wrap_onspace_strict(text, width):
    """Similar to wrap_onspace, but enforces the width constraint:
       words longer than width are split."""
    wordRegex = re.compile(r'\S{'+str(width)+r',}')
    return wrap_onspace(wordRegex.sub(lambda m: wrap_always(m.group(),width),text),width)
import math
def wrap_always(text, width):
    """A simple word-wrap function that wraps text on exactly width characters.
       It doesn't split the text in words."""
    return '\n'.join([ text[width*i:width*(i+1)] \
                       for i in xrange(int(math.ceil(1.*len(text)/width))) ])
if __name__ == '__main__':
    labels = ('First Name', 'Last Name', 'Age', 'Position')
    data = \
    '''John,Smith,24,Software Engineer
       Mary,Brohowski,23,Sales Manager
       Aristidis,Papageorgopoulos,28,Senior Reseacher'''
    rows = [row.strip().split(',') for row in data.splitlines()]
    print 'Without wrapping function\n'
    print indent([labels]+rows, hasHeader=True)
    # test indent with different wrapping functions
    width = 10
    for wrapper in (wrap_always,wrap_onspace,wrap_onspace_strict):
        print 'Wrapping function: %s(x,width=%d)\n' % (wrapper.__name__,width)
        print indent([labels]+rows, hasHeader=True, separateRows=True,
                     prefix='| ', postfix=' |',
                     wrapfunc=lambda x: wrapper(x,width))
# output:
#
#Without wrapping function
#
#First Name | Last Name        | Age | Position
#-------------------------------------------------------
#John       | Smith            | 24  | Software Engineer
#Mary       | Brohowski        | 23  | Sales Manager
#Aristidis  | Papageorgopoulos | 28  | Senior Reseacher
#
#Wrapping function: wrap_always(x,width=10)
#
#----------------------------------------------
#| First Name | Last Name  | Age | Position   |
#----------------------------------------------
#| John       | Smith      | 24  | Software E |
#|            |            |     | ngineer    |
#----------------------------------------------
#| Mary       | Brohowski  | 23  | Sales Mana |
#|            |            |     | ger        |
#----------------------------------------------
#| Aristidis  | Papageorgo | 28  | Senior Res |
#|            | poulos     |     | eacher     |
#----------------------------------------------
#
#Wrapping function: wrap_onspace(x,width=10)
#
#---------------------------------------------------
#| First Name | Last Name        | Age | Position  |
#---------------------------------------------------
#| John       | Smith            | 24  | Software  |
#|            |                  |     | Engineer  |
#---------------------------------------------------
#| Mary       | Brohowski        | 23  | Sales     |
#|            |                  |     | Manager   |
#---------------------------------------------------
#| Aristidis  | Papageorgopoulos | 28  | Senior    |
#|            |                  |     | Reseacher |
#---------------------------------------------------
#
#Wrapping function: wrap_onspace_strict(x,width=10)
#
#---------------------------------------------
#| First Name | Last Name  | Age | Position  |
#---------------------------------------------
#| John       | Smith      | 24  | Software  |
#|            |            |     | Engineer  |
#---------------------------------------------
#| Mary       | Brohowski  | 23  | Sales     |
#|            |            |     | Manager   |
#---------------------------------------------
#| Aristidis  | Papageorgo | 28  | Senior    |
#|            | poulos     |     | Reseacher |
#---------------------------------------------
import pandas as pd
l = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
df = pd.DataFrame(l)
print(df)
            0           1  2
0           a           b  c
1  aaaaaaaaaa           b  c
2           a  bbbbbbbbbb  c
To remove the index and header values and get the output you want, you can use the to_string method:
result = df.to_string(index=False, header=False)
print(result)
         a           b  c
aaaaaaaaaa           b  c
         a  bbbbbbbbbb  c
Scolp is a new library that lets you pretty print streaming columnar data easily while auto-adjusting column width.
(Disclaimer: I am the author)
Answer 9
This sets independent, best-fit column widths based on the max-metric used in other answers.
data = [['a', 'b', 'c'], ['aaaaaaaaaa', 'b', 'c'], ['a', 'bbbbbbbbbb', 'c']]
padding = 2
col_widths = [max(len(w) for w in [r[cn] for r in data]) + padding for cn in range(len(data[0]))]
format_string = "{{:{}}}{{:{}}}{{:{}}}".format(*col_widths)
for row in data:
    print(format_string.format(*row))
Meaning of {0:<30} {1:>35} {2:>35} {3:>35} {4:>20} {5:>20}:
0, 1, 2, 3, 4, 5 -> columns; there are 6 in total in this case
30, 35, 20 -> width of the column (note that you’ll have to add the length of \033[96m, since Python counts it as part of the string); just experiment :)
>, < -> justify: right, left (there is also =, which puts the padding between the sign and the digits of a number)
If you want to highlight e.g. the max value, you’ll have to switch to a special Pandas style function, but I suppose this is good enough for presenting data in a terminal window.
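A small sketch of those format fields in action, including the width adjustment for colour codes. The column widths and sample values are arbitrary; the escape sequences are the usual ANSI bright-cyan and reset codes.

```python
# Hypothetical three-column row; the widths 10, 8, and 6 are arbitrary.
template = "{0:<10}{1:>8}{2:>6}"
line = template.format("name", "qty", "cost")
print(repr(line))

# An ANSI colour code is invisible on screen but still counts towards the width.
cyan, reset = "\033[96m", "\033[0m"
coloured = cyan + "name" + reset
visible_width = 10

# Widen the field by the length of the escape sequences to keep columns aligned.
padded = "{0:<{1}}".format(coloured, visible_width + len(cyan) + len(reset))
print(len(padded) - len(cyan) - len(reset))  # 10 visible characters
```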
A slight variation on a previous answer (I don’t have enough rep to comment on it). The format library lets you specify the width and alignment of an element but not where it starts, ie, you can say “be 20 columns wide” but not “start in column 20”. Which leads to this issue:
Here is a variation of Shawn Chin’s answer. The width is fixed per column, not over all columns. There is also a border below the first row and between the columns. (The icontract library is used to enforce the contracts.)
from typing import List
import icontract
@icontract.pre(
    lambda table: not table or all(len(row) == len(table[0]) for row in table))
@icontract.post(lambda table, result: result == "" if not table else True)
@icontract.post(lambda result: not result.endswith("\n"))
def format_table(table: List[List[str]]) -> str:
    """
    Format the table as equal-spaced columns.
    :param table: rows of cells
    :return: table as string
    """
    if not table:
        # an empty table has no first row to measure
        return ""
    cols = len(table[0])
    col_widths = [max(len(row[i]) for row in table) for i in range(cols)]
    lines = []  # type: List[str]
    for i, row in enumerate(table):
        parts = []  # type: List[str]
        for cell, width in zip(row, col_widths):
            parts.append(cell.ljust(width))
        line = " | ".join(parts)
        lines.append(line)
        if i == 0:
            border = []  # type: List[str]
            for width in col_widths:
                border.append("-" * width)
            lines.append("-+-".join(border))
    result = "\n".join(lines)
    return result
Here is an example:
>>> table = [['column 0', 'another column 1'], ['00', '01'], ['10', '11']]
>>> result = packagery._format_table(table=table)
>>> print(result)
column 0 | another column 1
---------+-----------------
00       | 01
10       | 11
import io
import math
import operator
import re
import functools
from itertools import zip_longest
def indent(
    rows,
    has_header=False,
    header_char="-",
    delim=" | ",
    justify="left",
    separate_rows=False,
    prefix="",
    postfix="",
    wrapfunc=lambda x: x,
):
    """Indents a table by column.
    - rows: A sequence of sequences of items, one sequence per row.
    - has_header: True if the first row consists of the columns' names.
    - header_char: Character to be used for the row separator line
      (if has_header==True or separate_rows==True).
    - delim: The column delimiter.
    - justify: Determines how are data justified in their column.
      Valid values are 'left','right' and 'center'.
    - separate_rows: True if rows are to be separated by a line
      of 'header_char's.
    - prefix: A string prepended to each printed row.
    - postfix: A string appended to each printed row.
    - wrapfunc: A function f(text) for wrapping text; each element in
      the table is first wrapped by this function."""
    # closure for breaking logical rows to physical, using wrapfunc
    def row_wrapper(row):
        new_rows = [wrapfunc(item).split("\n") for item in row]
        return [[substr or "" for substr in item] for item in zip_longest(*new_rows)]
    # break each logical row into one or more physical ones
    logical_rows = [row_wrapper(row) for row in rows]
    # columns of physical rows
    columns = zip_longest(*functools.reduce(operator.add, logical_rows))
    # get the maximum of each column by the string length of its items
    max_widths = [max([len(str(item)) for item in column]) for column in columns]
    row_separator = header_char * (
        len(prefix) + len(postfix) + sum(max_widths) + len(delim) * (len(max_widths) - 1)
    )
    # select the appropriate justify method
    justify = {"center": str.center, "right": str.rjust, "left": str.ljust}[
        justify.lower()
    ]
    output = io.StringIO()
    # write into the buffer via file=output rather than printing the buffer object
    if separate_rows:
        print(row_separator, file=output)
    for physical_rows in logical_rows:
        for row in physical_rows:
            print(
                prefix
                + delim.join(
                    [justify(str(item), width) for (item, width) in zip(row, max_widths)]
                )
                + postfix,
                file=output,
            )
        if separate_rows or has_header:
            print(row_separator, file=output)
            has_header = False
    return output.getvalue()
# written by Mike Brown
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/148061
def wrap_onspace(text, width):
    """
    A word-wrap function that preserves existing line breaks
    and most spaces in the text. Expects that existing line
    breaks are posix newlines (\n).
    """
    return functools.reduce(
        lambda line, word, i_width=width: "%s%s%s"
        % (
            line,
            " \n"[
                (
                    len(line[line.rfind("\n") + 1 :]) + len(word.split("\n", 1)[0])
                    >= i_width
                )
            ],
            word,
        ),
        text.split(" "),
    )
def wrap_onspace_strict(text, i_width):
    """Similar to wrap_onspace, but enforces the width constraint:
    words longer than width are split."""
    word_regex = re.compile(r"\S{" + str(i_width) + r",}")
    return wrap_onspace(
        word_regex.sub(lambda m: wrap_always(m.group(), i_width), text), i_width
    )
def wrap_always(text, width):
    """A simple word-wrap function that wraps text on exactly width characters.
    It doesn't split the text in words."""
    return "\n".join(
        [
            text[width * i : width * (i + 1)]
            for i in range(int(math.ceil(1.0 * len(text) / width)))
        ]
    )
if __name__ == "__main__":
    labels = ("First Name", "Last Name", "Age", "Position")
    data = """John,Smith,24,Software Engineer
           Mary,Brohowski,23,Sales Manager
           Aristidis,Papageorgopoulos,28,Senior Reseacher"""
    rows = [row.strip().split(",") for row in data.splitlines()]
    print("Without wrapping function\n")
    print(indent([labels] + rows, has_header=True))
    # test indent with different wrapping functions
    width = 10
    for wrapper in (wrap_always, wrap_onspace, wrap_onspace_strict):
        print("Wrapping function: %s(x,width=%d)\n" % (wrapper.__name__, width))
        print(
            indent(
                [labels] + rows,
                has_header=True,
                separate_rows=True,
                prefix="| ",
                postfix=" |",
                wrapfunc=lambda x: wrapper(x, width),
            )
        )
    # output:
    #
    # Without wrapping function
    #
    # First Name | Last Name        | Age | Position
    # -------------------------------------------------------
    # John       | Smith            | 24  | Software Engineer
    # Mary       | Brohowski        | 23  | Sales Manager
    # Aristidis  | Papageorgopoulos | 28  | Senior Reseacher
    #
    # Wrapping function: wrap_always(x,width=10)
    #
    # ----------------------------------------------
    # | First Name | Last Name  | Age | Position   |
    # ----------------------------------------------
    # | John       | Smith      | 24  | Software E |
    # |            |            |     | ngineer    |
    # ----------------------------------------------
    # | Mary       | Brohowski  | 23  | Sales Mana |
    # |            |            |     | ger        |
    # ----------------------------------------------
    # | Aristidis  | Papageorgo | 28  | Senior Res |
    # |            | poulos     |     | eacher     |
    # ----------------------------------------------
    #
    # Wrapping function: wrap_onspace(x,width=10)
    #
    # ---------------------------------------------------
    # | First Name | Last Name        | Age | Position  |
    # ---------------------------------------------------
    # | John       | Smith            | 24  | Software  |
    # |            |                  |     | Engineer  |
    # ---------------------------------------------------
    # | Mary       | Brohowski        | 23  | Sales     |
    # |            |                  |     | Manager   |
    # ---------------------------------------------------
    # | Aristidis  | Papageorgopoulos | 28  | Senior    |
    # |            |                  |     | Reseacher |
    # ---------------------------------------------------
    #
    # Wrapping function: wrap_onspace_strict(x,width=10)
    #
    # ---------------------------------------------
    # | First Name | Last Name  | Age | Position  |
    # ---------------------------------------------
    # | John       | Smith      | 24  | Software  |
    # |            |            |     | Engineer  |
    # ---------------------------------------------
    # | Mary       | Brohowski  | 23  | Sales     |
    # |            |            |     | Manager   |
    # ---------------------------------------------
    # | Aristidis  | Papageorgo | 28  | Senior    |
    # |            | poulos     |     | Reseacher |
    # ---------------------------------------------
Updated @Franck Dernoncourt's fancy recipe to be Python 3 and PEP 8 compliant:
import io
import math
import operator
import re
import functools
from itertools import zip_longest
def indent(
    rows,
    has_header=False,
    header_char="-",
    delim=" | ",
    justify="left",
    separate_rows=False,
    prefix="",
    postfix="",
    wrapfunc=lambda x: x,
):
    """Indents a table by column.
    - rows: A sequence of sequences of items, one sequence per row.
    - has_header: True if the first row consists of the columns' names.
    - header_char: Character to be used for the row separator line
      (if has_header==True or separate_rows==True).
    - delim: The column delimiter.
    - justify: Determines how the data are justified in their column.
      Valid values are 'left', 'right' and 'center'.
    - separate_rows: True if rows are to be separated by a line
      of 'header_char's.
    - prefix: A string prepended to each printed row.
    - postfix: A string appended to each printed row.
    - wrapfunc: A function f(text) for wrapping text; each element in
      the table is first wrapped by this function."""
    # closure for breaking logical rows to physical, using wrapfunc
    def row_wrapper(row):
        new_rows = [wrapfunc(item).split("\n") for item in row]
        return [[substr or "" for substr in item] for item in zip_longest(*new_rows)]
    # break each logical row into one or more physical ones
    logical_rows = [row_wrapper(row) for row in rows]
    # columns of physical rows
    columns = zip_longest(*functools.reduce(operator.add, logical_rows))
    # get the maximum of each column by the string length of its items
    max_widths = [max([len(str(item)) for item in column]) for column in columns]
    row_separator = header_char * (
        len(prefix) + len(postfix) + sum(max_widths) + len(delim) * (len(max_widths) - 1)
    )
    # select the appropriate justify method
    justify = {"center": str.center, "right": str.rjust, "left": str.ljust}[
        justify.lower()
    ]
    output = io.StringIO()
    # print() takes the target stream as the file= keyword in Python 3
    if separate_rows:
        print(row_separator, file=output)
    for physical_rows in logical_rows:
        for row in physical_rows:
            print(
                prefix
                + delim.join(
                    [justify(str(item), width) for (item, width) in zip(row, max_widths)]
                )
                + postfix,
                file=output,
            )
        if separate_rows or has_header:
            print(row_separator, file=output)
            has_header = False
    return output.getvalue()
# written by Mike Brown
# http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/148061
def wrap_onspace(text, width):
    """
    A word-wrap function that preserves existing line breaks
    and most spaces in the text. Expects that existing line
    breaks are posix newlines (\n).
    """
    return functools.reduce(
        lambda line, word, i_width=width: "%s%s%s"
        % (
            line,
            " \n"[
                (
                    len(line[line.rfind("\n") + 1 :]) + len(word.split("\n", 1)[0])
                    >= i_width
                )
            ],
            word,
        ),
        text.split(" "),
    )
def wrap_onspace_strict(text, i_width):
    """Similar to wrap_onspace, but enforces the width constraint:
    words longer than width are split."""
    word_regex = re.compile(r"\S{" + str(i_width) + r",}")
    return wrap_onspace(
        word_regex.sub(lambda m: wrap_always(m.group(), i_width), text), i_width
    )
def wrap_always(text, width):
    """A simple word-wrap function that wraps text on exactly width characters.
    It doesn't split the text in words."""
    return "\n".join(
        [
            text[width * i : width * (i + 1)]
            for i in range(int(math.ceil(1.0 * len(text) / width)))
        ]
    )
if __name__ == "__main__":
    labels = ("First Name", "Last Name", "Age", "Position")
    data = """John,Smith,24,Software Engineer
    Mary,Brohowski,23,Sales Manager
    Aristidis,Papageorgopoulos,28,Senior Reseacher"""
    rows = [row.strip().split(",") for row in data.splitlines()]
    print("Without wrapping function\n")
    print(indent([labels] + rows, has_header=True))
    # test indent with different wrapping functions
    width = 10
    for wrapper in (wrap_always, wrap_onspace, wrap_onspace_strict):
        print("Wrapping function: %s(x,width=%d)\n" % (wrapper.__name__, width))
        print(
            indent(
                [labels] + rows,
                has_header=True,
                separate_rows=True,
                prefix="| ",
                postfix=" |",
                wrapfunc=lambda x: wrapper(x, width),
            )
        )
# output:
#
# Without wrapping function
#
# First Name | Last Name        | Age | Position
# -------------------------------------------------------
# John       | Smith            | 24  | Software Engineer
# Mary       | Brohowski        | 23  | Sales Manager
# Aristidis  | Papageorgopoulos | 28  | Senior Reseacher
#
# Wrapping function: wrap_always(x,width=10)
#
# ----------------------------------------------
# | First Name | Last Name  | Age | Position   |
# ----------------------------------------------
# | John       | Smith      | 24  | Software E |
# |            |            |     | ngineer    |
# ----------------------------------------------
# | Mary       | Brohowski  | 23  | Sales Mana |
# |            |            |     | ger        |
# ----------------------------------------------
# | Aristidis  | Papageorgo | 28  | Senior Res |
# |            | poulos     |     | eacher     |
# ----------------------------------------------
#
# Wrapping function: wrap_onspace(x,width=10)
#
# ---------------------------------------------------
# | First Name | Last Name        | Age | Position  |
# ---------------------------------------------------
# | John       | Smith            | 24  | Software  |
# |            |                  |     | Engineer  |
# ---------------------------------------------------
# | Mary       | Brohowski        | 23  | Sales     |
# |            |                  |     | Manager   |
# ---------------------------------------------------
# | Aristidis  | Papageorgopoulos | 28  | Senior    |
# |            |                  |     | Reseacher |
# ---------------------------------------------------
#
# Wrapping function: wrap_onspace_strict(x,width=10)
#
# ---------------------------------------------
# | First Name | Last Name  | Age | Position  |
# ---------------------------------------------
# | John       | Smith      | 24  | Software  |
# |            |            |     | Engineer  |
# ---------------------------------------------
# | Mary       | Brohowski  | 23  | Sales     |
# |            |            |     | Manager   |
# ---------------------------------------------
# | Aristidis  | Papageorgo | 28  | Senior    |
# |            | poulos     |     | Reseacher |
# ---------------------------------------------
Answer 15
I realize this question is old but I didn’t understand Antak’s answer and didn’t want to use a library so I rolled my own solution.
Solution assumes records is a 2D array, records are all the same length, and that fields are all strings.
def stringifyRecords(records):
    column_widths = [0] * len(records[0])
    for record in records:
        for i, field in enumerate(record):
            width = len(field)
            if width > column_widths[i]:
                column_widths[i] = width
    s = ""
    for record in records:
        for column_width, field in zip(column_widths, record):
            s += field.ljust(column_width + 1)
        s += "\n"
    return s
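If the fields are not all strings, the same idea should still work after converting each field with str(). A minimal sketch under that assumption (the name stringify_records is hypothetical and not part of the answer above):

```python
def stringify_records(records):
    """Left-justify each column to its widest field, one record per line."""
    # Convert every field to text first so that len() and ljust() always work.
    text_rows = [[str(field) for field in record] for record in records]
    # zip(*rows) transposes rows into columns; take the widest cell per column.
    column_widths = [max(len(cell) for cell in column) for column in zip(*text_rows)]
    lines = []
    for row in text_rows:
        lines.append("".join(cell.ljust(width + 1) for cell, width in zip(row, column_widths)))
    return "\n".join(lines) + "\n"

print(stringify_records([["name", "age"], ["Aristidis", 28]]), end="")
```

Like the original, this still assumes every record has the same number of fields.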
I am asking Python to print the minimum number from a column of CSV data, but the top row is the column number, and I don’t want Python to take the top row into account. How can I make sure Python ignores the first line?
This is the code so far:
import csv
with open('all16.csv', 'rb') as inf:
    incsv = csv.reader(inf)
    column = 1
    datatype = float
    data = (datatype(column) for row in incsv)
    least_value = min(data)
print least_value
Could you also explain what you are doing, not just give the code? I am very very new to Python and would like to make sure I understand everything.
You could use an instance of the csv module’s Sniffer class to deduce the format of a CSV file and detect whether a header row is present along with the built-in next() function to skip over the first row only when necessary:
import csv
with open('all16.csv', 'r', newline='') as file:
    has_header = csv.Sniffer().has_header(file.read(1024))
    file.seek(0)  # Rewind.
    reader = csv.reader(file)
    if has_header:
        next(reader)  # Skip header row.
    column = 1
    datatype = float
    data = (datatype(row[column]) for row in reader)
    least_value = min(data)
print(least_value)
Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:
data = (float(row[1]) for row in reader)
Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:
with open('all16.csv', 'rb') as file:
In a similar use case I had to skip annoying lines before the line with my actual column names. This solution worked nicely: open the file, skip the unwanted line, then pass the file object to csv.DictReader.
import csv
with open('all16.csv') as tmp:
    # Skip first line (if any)
    next(tmp, None)
    # {line_num: row}
    data = dict(enumerate(csv.DictReader(tmp)))
Use csv.DictReader instead of csv.reader.
If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as field names. You would then be able to access field values using row["1"], etc.
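As a rough illustration of the DictReader approach, the sketch below finds the minimum of one column. The in-memory sample and the header names "0" and "1" are assumptions standing in for the real all16.csv:

```python
import csv
import io

# Stand-in for the contents of all16.csv; the header names "0" and "1" are assumed.
sample = io.StringIO("0,1\n3.5,7.25\n1.0,2.5\n")

reader = csv.DictReader(sample)  # the first row is consumed as field names
least_value = min(float(row["1"]) for row in reader)
print(least_value)
```

Because the header row becomes the dictionary keys, it never reaches min(), so no explicit skipping is needed.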
The new ‘pandas’ package might be more relevant than ‘csv’. The code below will read a CSV file, by default interpreting the first line as the column header, and find the minimum of each column.
import pandas as pd
data = pd.read_csv('all16.csv')
data.min()
>>> import pyexcel as pe
>>> data = pe.load('all16.csv', name_columns_by_row=0)
>>> min(data.column[1])
Meanwhile, if you know the header name of column index one, for example “Column 1”, you can do this instead:
>>> min(data.column["Column 1"])
Answer 8
For me, the easiest way is to use range.
import csv
with open('files/filename.csv') as I:
    reader = csv.reader(I)
    fulllist = list(reader)
# Starting with data skipping header
for item in range(1, len(fulllist)):
    # Print each row using "item" as the index value
    print(fulllist[item])
Answer 9
Because this is related to something I was doing, I’ll share here.
What if we’re not sure if there’s a header and you also don’t feel like importing sniffer and other things?
If your task is basic, such as printing or appending to a list or array, you could just use an if statement:
import csv
array = []
# Let's say there's 4 columns
with open('file.csv') as csvfile:
    csvreader = csv.reader(csvfile)
    # read first line
    first_line = next(csvreader)
    # My headers were just text. You can use any suitable conditional here
    if len(first_line) == 4:
        array.append(first_line)
    # Now we'll just iterate over everything else as usual:
    for row in csvreader:
        array.append(row)
with open('example.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    # ... process CSV file contents here ...
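The Sniffer will try to auto-detect many things about the CSV file. You need to explicitly call its has_header() method, passing it a sample of the file, to determine whether the file has a header line. If it does, then skip the first row when iterating the CSV rows. You can do it like this:
import csv
with open('example.csv', newline='') as csvfile:
    sample = csvfile.read(1024)
    csvfile.seek(0)
    sniffer = csv.Sniffer()
    reader = csv.reader(csvfile, sniffer.sniff(sample))
    if sniffer.has_header(sample):
        next(reader)  # skip the header row
    for data_row in reader:
        pass  # do something with the row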
It was quite frustrating that the csv module could not easily get the header. There is also a bug with the UTF-8 BOM (first char in file).
This works for me using only the csv module:
import csv
def read_csv(self, csv_path, delimiter):
    with open(csv_path, newline='', encoding='utf-8') as f:
        # https://bugs.python.org/issue7185
        # Remove UTF8 BOM (only if the file actually starts with one).
        txt = f.read().lstrip('\ufeff')
        # Remove header line.
        header = txt.splitlines()[:1]
        lines = txt.splitlines()[1:]
        # Convert to list.
        csv_rows = list(csv.reader(lines, delimiter=delimiter))
        for row in csv_rows:
            # INDEX_HERE is a placeholder for the column index you need.
            value = row[INDEX_HERE]
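An alternative worth considering is to let the decoder deal with the BOM: Python's 'utf-8-sig' codec drops a leading BOM when one is present and otherwise behaves like 'utf-8'. The sketch below assumes that approach; the function name read_rows and the demo file are hypothetical:

```python
import csv
import os
import tempfile

def read_rows(csv_path, delimiter=","):
    """Return (header, rows) from a CSV file that may start with a UTF-8 BOM."""
    # 'utf-8-sig' decodes UTF-8 and drops a leading BOM if one is present.
    with open(csv_path, newline="", encoding="utf-8-sig") as f:
        reader = csv.reader(f, delimiter=delimiter)
        header = next(reader, None)  # None for an empty file
        return header, list(reader)

# Demo file written with a BOM, as some spreadsheet exports do.
with tempfile.NamedTemporaryFile("w", suffix=".csv", delete=False,
                                 encoding="utf-8-sig", newline="") as tmp:
    tmp.write("id,value\n1,10\n2,20\n")
header, rows = read_rows(tmp.name)
os.remove(tmp.name)
print(header, rows)
```

Reading the header with next(reader, None) also keeps quoted fields that span several lines intact, which splitting the text with splitlines() would not.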
Answer 14
I would convert csvreader to a list, then pop the first element:
import csv
with open(fileName, 'r') as csvfile:
    csvreader = csv.reader(csvfile)
    data = list(csvreader)  # Convert to list
    data.pop(0)  # Removes the first row
    for row in data:
        print(row)
Return the next row of the reader’s iterable object as a list (if the
object was returned from reader()) or a dict (if it is a DictReader
instance), parsed according to the current dialect. Usually you should
call this as next(reader).
import csv
with open('sample.csv', newline='') as f:
    csv_data = csv.reader(f)
    next(csv_data)  # skip first row
    for row in csv_data:
        print(row)  # the first row printed is the second row of the file
In Python, when given the URL for a text file, what is the simplest way to access the contents of the text file and print the contents of the file out locally line-by-line without saving a local copy of the text file?
TargetURL=http://www.myhost.com/SomeFile.txt
#read the file
#print first line
#print second line
#etc
Edit 09/2016: In Python 3 and up use urllib.request instead of urllib2
Actually the simplest way is:
import urllib2  # the lib that handles the url stuff
data = urllib2.urlopen(target_url)  # it's a file like object and works just like a file
for line in data:  # files are iterable
    print line
You don’t even need “readlines”, as Will suggested. You could even shorten it to: *
import urllib2
for line in urllib2.urlopen(target_url):
    print line
But remember in Python, readability matters.
However, this is the simplest way but not the safe way: most of the time with network programming, you don’t know whether the amount of data you expect will be respected. So you’d generally do better to read a fixed and reasonable amount of data, something you know to be enough for the data you expect but that will prevent your script from being flooded:
import urllib2
data = urllib2.urlopen("http://www.google.com").read(20000)  # read only 20 000 chars
data = data.split("\n")  # then split it into lines
for line in data:
    print line
* Second example in Python 3:
import urllib.request  # the lib that handles the url stuff
for line in urllib.request.urlopen(target_url):
    print(line.decode('utf-8'))  # utf-8 or iso8859-1 or whatever the page encoding scheme is
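The bounded read shown earlier for Python 2 could look roughly like this in Python 3. This is only a sketch: the helper name read_lines and the 20000-byte cap are assumptions, and a local file:// URL stands in for the remote file so the example runs without network access:

```python
import pathlib
import tempfile
import urllib.request

def read_lines(url, max_bytes=20000, encoding="utf-8"):
    """Fetch at most max_bytes from url and return the decoded lines."""
    with urllib.request.urlopen(url) as response:
        raw = response.read(max_bytes)  # bytes, capped at max_bytes
    # errors="replace" keeps going if the cap split a multi-byte character.
    return raw.decode(encoding, errors="replace").splitlines()

# Local stand-in for a remote text file; urlopen also accepts file:// URLs.
with tempfile.TemporaryDirectory() as tmp_dir:
    path = pathlib.Path(tmp_dir) / "SomeFile.txt"
    path.write_text("first line\nsecond line\n", encoding="utf-8")
    lines = read_lines(path.as_uri())
for line in lines:
    print(line)
```

Note that in Python 3 the response yields bytes, so the cap is in bytes rather than characters and the data has to be decoded before it is split into lines.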
I’m a newbie to Python and the offhand comment about Python 3 in the accepted solution was confusing. For posterity, the code to do this in Python 3 is
import urllib.request
data = urllib.request.urlopen(target_url)
for line in data:
    ...
or alternatively
from urllib.request import urlopen
data = urlopen(target_url)
Usually with Tkinter, it is okay to just use from Tkinter import * as the module will only export names that are clearly widgets.
PEP 8 does not list any conventions for such a case, so I guess it is up to you to decide what is the best option. It is all about readability, so choose whatever makes it clear that you are importing stuff from a single module.
As all those names are made available in your scope, I personally think that option 2 is the clearest, as you can see the imported names best. You could then even split it up further to group together those names that belong with each other. In your example I might put Tk, Frame and Canvas separately as they group widgets together, while having Button and Text separately as they are smaller components in a view.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: '2^4'
I know that eval can work around this, but isn’t there a better and – more importantly – safer method to evaluate a mathematical expression that is being stored in a string?
Pyparsing can be used to parse mathematical expressions. In particular, fourFn.py shows how to parse basic arithmetic expressions. Below, I’ve rewrapped fourFn into a numeric parser class for easier reuse.
from __future__ import division
from pyparsing import (Literal, CaselessLiteral, Word, Combine, Group, Optional,
ZeroOrMore, Forward, nums, alphas, oneOf)
import math
import operator
__author__ = 'Paul McGuire'
__version__ = '$Revision: 0.0 $'
__date__ = '$Date: 2009-03-20 $'
__source__ = '''http://pyparsing.wikispaces.com/file/view/fourFn.py
http://pyparsing.wikispaces.com/message/view/home/15549426
'''
__note__ = '''
All I've done is rewrap Paul McGuire's fourFn.py as a class, so I can use it
more easily in other places.
'''
class NumericStringParser(object):
    '''
    Most of this code comes from the fourFn.py pyparsing example
    '''
    def pushFirst(self, strg, loc, toks):
        self.exprStack.append(toks[0])
    def pushUMinus(self, strg, loc, toks):
        if toks and toks[0] == '-':
            self.exprStack.append('unary -')
    def __init__(self):
        """
        expop   :: '^'
        multop  :: '*' | '/'
        addop   :: '+' | '-'
        integer :: ['+' | '-'] '0'..'9'+
        atom    :: PI | E | real | fn '(' expr ')' | '(' expr ')'
        factor  :: atom [ expop factor ]*
        term    :: factor [ multop factor ]*
        expr    :: term [ addop term ]*
        """
        point = Literal(".")
        e = CaselessLiteral("E")
        fnumber = Combine(Word("+-" + nums, nums) +
                          Optional(point + Optional(Word(nums))) +
                          Optional(e + Word("+-" + nums, nums)))
        ident = Word(alphas, alphas + nums + "_$")
        plus = Literal("+")
        minus = Literal("-")
        mult = Literal("*")
        div = Literal("/")
        lpar = Literal("(").suppress()
        rpar = Literal(")").suppress()
        addop = plus | minus
        multop = mult | div
        expop = Literal("^")
        pi = CaselessLiteral("PI")
        expr = Forward()
        atom = ((Optional(oneOf("- +")) +
                 (ident + lpar + expr + rpar | pi | e | fnumber).setParseAction(self.pushFirst))
                | Optional(oneOf("- +")) + Group(lpar + expr + rpar)
                ).setParseAction(self.pushUMinus)
        # by defining exponentiation as "atom [ ^ factor ]..." instead of
        # "atom [ ^ atom ]...", we get right-to-left exponents, instead of left-to-right
        # that is, 2^3^2 = 2^(3^2), not (2^3)^2.
        factor = Forward()
        factor << atom + \
            ZeroOrMore((expop + factor).setParseAction(self.pushFirst))
        term = factor + \
            ZeroOrMore((multop + factor).setParseAction(self.pushFirst))
        expr << term + \
            ZeroOrMore((addop + term).setParseAction(self.pushFirst))
        # addop_term = ( addop + term ).setParseAction( self.pushFirst )
        # general_term = term + ZeroOrMore( addop_term ) | OneOrMore( addop_term)
        # expr << general_term
        self.bnf = expr
        # map operator symbols to corresponding arithmetic operations
        epsilon = 1e-12
        self.opn = {"+": operator.add,
                    "-": operator.sub,
                    "*": operator.mul,
                    "/": operator.truediv,
                    "^": operator.pow}
        self.fn = {"sin": math.sin,
                   "cos": math.cos,
                   "tan": math.tan,
                   "exp": math.exp,
                   "abs": abs,
                   "trunc": lambda a: int(a),
                   "round": round,
                   # (a > 0) - (a < 0) replaces cmp(a, 0), which Python 3 removed
                   "sgn": lambda a: abs(a) > epsilon and ((a > 0) - (a < 0)) or 0}
    def evaluateStack(self, s):
        op = s.pop()
        if op == 'unary -':
            return -self.evaluateStack(s)
        if op in "+-*/^":
            op2 = self.evaluateStack(s)
            op1 = self.evaluateStack(s)
            return self.opn[op](op1, op2)
        elif op == "PI":
            return math.pi  # 3.1415926535
        elif op == "E":
            return math.e  # 2.718281828
        elif op in self.fn:
            return self.fn[op](self.evaluateStack(s))
        elif op[0].isalpha():
            return 0
        else:
            return float(op)
    def eval(self, num_string, parseAll=True):
        self.exprStack = []
        results = self.bnf.parseString(num_string, parseAll)
        val = self.evaluateStack(self.exprStack[:])
        return val
You can use it like this
nsp = NumericStringParser()
result = nsp.eval('2^4')
print(result)
# 16.0
result = nsp.eval('exp(2^4)')
print(result)
# 8886110.520507872
Okay, so the problem with eval is that it can escape its sandbox too easily, even if you get rid of __builtins__. All the methods for escaping the sandbox come down to using getattr or object.__getattribute__ (via the . operator) to obtain a reference to some dangerous object via some allowed object (''.__class__.__bases__[0].__subclasses__ or similar). getattr is eliminated by setting __builtins__ to None. object.__getattribute__ is the difficult one, since it cannot simply be removed, both because object is immutable and because removing it would break everything. However, __getattribute__ is only accessible via the . operator, so purging that from your input is sufficient to ensure eval cannot escape its sandbox.
In processing formulas, the only valid use of a decimal is when it is preceded or followed by [0-9], so we just remove all other instances of ..
import re
inp = re.sub(r"\.(?![0-9])","", inp)
val = eval(inp, {'__builtins__':None})
Note that while python normally treats 1 + 1. as 1 + 1.0, this will remove the trailing . and leave you with 1 + 1. You could add ),, and EOF to the list of things allowed to follow ., but why bother?
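To see what that substitution actually does, here is a small demonstration. The helper name strip_stray_dots is made up for illustration. Note that the pattern only checks the character after the dot, not the one before it:

```python
import re

def strip_stray_dots(expr):
    """Remove every '.' that is not immediately followed by a digit."""
    return re.sub(r"\.(?![0-9])", "", expr)

print(strip_stray_dots("2.5 + .5"))       # decimals are left alone
print(strip_stray_dots("1 + 1."))         # the trailing dot is dropped
print(strip_stray_dots("().__class__"))   # attribute access loses its dot
```

The last case becomes ()__class__, which is no longer valid Python, so eval raises a SyntaxError instead of performing the attribute lookup.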
You can use the ast module and write a NodeVisitor that verifies that the type of each node is part of a whitelist.
import ast, math
locals = {key: value for (key, value) in vars(math).items() if key[0] != '_'}
locals.update({"abs": abs, "complex": complex, "min": min, "max": max, "pow": pow, "round": round})
class Visitor(ast.NodeVisitor):
    def visit(self, node):
        if not isinstance(node, self.whitelist):
            raise ValueError(node)
        return super().visit(node)
    # ast.Constant replaces ast.Num, which is deprecated since Python 3.8
    whitelist = (ast.Module, ast.Expr, ast.Load, ast.Expression, ast.Add, ast.Sub, ast.UnaryOp, ast.Constant, ast.BinOp,
                 ast.Mult, ast.Div, ast.Pow, ast.BitOr, ast.BitAnd, ast.BitXor, ast.USub, ast.UAdd, ast.FloorDiv, ast.Mod,
                 ast.LShift, ast.RShift, ast.Invert, ast.Call, ast.Name)
def evaluate(expr, locals = {}):
    if any(elem in expr for elem in '\n#'):
        raise ValueError(expr)
    try:
        node = ast.parse(expr.strip(), mode='eval')
        Visitor().visit(node)
        return eval(compile(node, "<string>", "eval"), {'__builtins__': None}, locals)
    except Exception:
        raise ValueError(expr)
Because it works via a whitelist rather than a blacklist, it is safe. The only functions and variables it can access are those you explicitly give it access to. I populated a dict with math-related functions so you can easily provide access to those if you want, but you have to explicitly use it.
If the string attempts to call functions that haven’t been provided, or invoke any methods, an exception will be raised, and it will not be executed.
Because this uses Python’s built in parser and evaluator, it also inherits Python’s precedence and promotion rules as well.
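A related variant of the whitelist idea is to skip eval altogether and compute the result while walking the tree. The sketch below is not the code from this answer; the names safe_eval, BINARY_OPS and UNARY_OPS are illustrative, and it only handles numeric literals and a handful of operators:

```python
import ast
import operator

# Whitelisted operators; anything not listed here is rejected.
BINARY_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.Mod: operator.mod,
}
UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}

def safe_eval(expr):
    """Evaluate an arithmetic expression by walking its AST (no eval call)."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
            return BINARY_OPS[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
            return UNARY_OPS[type(node.op)](walk(node.operand))
        raise ValueError("unsupported expression: %r" % expr)
    return walk(ast.parse(expr.strip(), mode="eval"))

print(safe_eval("2**4"))
print(safe_eval("-(1 + 2) * 3.5"))
```

Names, calls and attribute access all fall through to the ValueError. The sketch puts no limit on operand size, so an input such as 9**9**9**9 can still consume a lot of time and memory.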
The reason eval and exec are so dangerous is that the default compile function will generate bytecode for any valid python expression, and the default eval or exec will execute any valid python bytecode. All the answers to date have focused on restricting the bytecode that can be generated (by sanitizing input) or building your own domain-specific-language using the AST.
Instead, you can easily create a simple eval function that is incapable of doing anything nefarious and can easily have runtime checks on memory or time used. Of course, if it is simple math, then there is a shortcut.
c = compile(stringExp, 'userinput', 'eval')
if c.co_code[0]==b'd' and c.co_code[3]==b'S':
    return c.co_consts[ord(c.co_code[1])+ord(c.co_code[2])*256]
The way this works is simple, any constant mathematic expression is safely evaluated during compilation and stored as a constant. The code object returned by compile consists of d, which is the bytecode for LOAD_CONST, followed by the number of the constant to load (usually the last one in the list), followed by S, which is the bytecode for RETURN_VALUE. If this shortcut doesn’t work, it means that the user input isn’t a constant expression (contains a variable or function call or similar).
This also opens the door to some more sophisticated input formats. For example:
stringExp = "1 + cos(2)"
This requires actually evaluating the bytecode, which is still quite simple. Python bytecode is a stack-oriented language, so everything is a simple matter of TOS=stack.pop(); op(TOS); stack.put(TOS) or similar. The key is to only implement the opcodes that are safe (loading/storing values, math operations, returning values) and not unsafe ones (attribute lookup). If you want the user to be able to call functions (the whole reason not to use the shortcut above), simply make your implementation of CALL_FUNCTION only allow functions in a ‘safe’ list.
# Python 2 only: relies on the Queue module, func_code and 3-byte bytecode operands
from dis import opmap
from Queue import LifoQueue
from math import sin,cos
import operator
globs = {'sin':sin, 'cos':cos}
safe = globs.values()
stack = LifoQueue()
class BINARY(object):
    def __init__(self, operator):
        self.op=operator
    def __call__(self, context):
        stack.put(self.op(stack.get(),stack.get()))
class UNARY(object):
    def __init__(self, operator):
        self.op=operator
    def __call__(self, context):
        stack.put(self.op(stack.get()))
def CALL_FUNCTION(context, arg):
    argc = arg[0]+arg[1]*256
    args = [stack.get() for i in range(argc)]
    func = stack.get()
    if func not in safe:
        raise TypeError("Function %r not allowed"%func)
    stack.put(func(*args))
def LOAD_CONST(context, arg):
    cons = arg[0]+arg[1]*256
    stack.put(context['code'].co_consts[cons])
def LOAD_NAME(context, arg):
    name_num = arg[0]+arg[1]*256
    name = context['code'].co_names[name_num]
    if name in context['locals']:
        stack.put(context['locals'][name])
    else:
        stack.put(context['globals'][name])
def RETURN_VALUE(context):
    return stack.get()
opfuncs = {
    opmap['BINARY_ADD']: BINARY(operator.add),
    opmap['UNARY_INVERT']: UNARY(operator.invert),
    opmap['CALL_FUNCTION']: CALL_FUNCTION,
    opmap['LOAD_CONST']: LOAD_CONST,
    opmap['LOAD_NAME']: LOAD_NAME,
    opmap['RETURN_VALUE']: RETURN_VALUE,
}
def VMeval(c):
    context = dict(locals={}, globals=globs, code=c)
    bci = iter(c.co_code)
    for bytecode in bci:
        func = opfuncs[ord(bytecode)]
        if func.func_code.co_argcount==1:
            ret = func(context)
        else:
            args = ord(bci.next()), ord(bci.next())
            ret = func(context, args)
        if ret:
            return ret
def evaluate(expr):
    return VMeval(compile(expr, 'userinput', 'eval'))
Obviously, the real version of this would be a bit longer (there are 119 opcodes, 24 of which are math related). Adding STORE_FAST and a couple others would allow for input like 'x=5;return x+x' or similar, trivially easily. It can even be used to execute user-created functions, so long as the user-created functions are themselves executed via VMeval (don’t make them callable!!! or they could get used as a callback somewhere). Handling loops requires support for the goto bytecodes, which means changing from a for iterator to while and maintaining a pointer to the current instruction, but isn’t too hard. For resistance to DoS, the main loop should check how much time has passed since the start of the calculation, and certain operators should deny input over some reasonable limit (BINARY_POWER being the most obvious).
While this approach is somewhat longer than a simple grammar parser for simple expressions (see above about just grabbing the compiled constant), it extends easily to more complicated input, and doesn’t require dealing with grammar (compile takes anything arbitrarily complicated and reduces it to a sequence of simple instructions).
I think I would use eval(), but would first check to make sure the string is a valid mathematical expression, as opposed to something malicious. You could use a regex for the validation.
eval() also takes additional arguments which you can use to restrict the namespace it operates in for greater security.
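A minimal sketch of that combination, assuming a deliberately narrow whitelist of characters (the pattern ALLOWED and the name checked_eval are illustrative, not from any library):

```python
import re

# Assumed whitelist: digits, whitespace, decimal points, parentheses and + - * /
ALLOWED = re.compile(r"[0-9\s.()+\-*/]+")

def checked_eval(expr):
    """Validate expr against the whitelist, then eval it in an empty namespace."""
    if not ALLOWED.fullmatch(expr):
        raise ValueError("not a plain arithmetic expression: %r" % expr)
    # No builtins and no names are visible to the evaluated expression.
    return eval(expr, {"__builtins__": {}}, {})

print(checked_eval("2 * (3 + 4)"))
```

Since letters, quotes and underscores never pass the check, names and attribute access cannot appear in the input. The whitelist does not bound the cost of evaluation, though: a repeated `**` such as 9**9**9**9 still passes the check.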
This is a massively late reply, but I think useful for future reference. Rather than write your own math parser (although the pyparsing example above is great) you could use SymPy. I don’t have a lot of experience with it, but it contains a much more powerful math engine than anyone is likely to write for a specific application and the basic expression evaluation is very easy:
>>> import sympy
>>> x, y, z = sympy.symbols('x y z')
>>> sympy.sympify("x**3 + sin(y)").evalf(subs={x:1, y:-3})
0.858879991940133
Very cool indeed! A from sympy import * brings in a lot more function support, such as trig functions, special functions, etc., but I’ve avoided that here to show what’s coming from where.
You may want to grant access to the math module:
>>> import math
>>> ns = vars(math).copy()
>>> ns['__builtins__'] = None
>>> eval('cos(pi/3)', ns)
0.50000000000000011
The clean namespace should prevent injection. For instance:
>>> eval('__builtins__.__import__("os").system("echo got through")', ns)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "<string>", line 1, in <module>
AttributeError: 'NoneType' object has no attribute '__import__'
Otherwise you would get:
>>> eval('__builtins__.__import__("os").system("echo got through")')
got through
0
Here’s my solution to the problem without using eval. Works with Python2 and Python3. It doesn’t work with negative numbers.
$ python -m pytest test.py
test.py
import unittest

from solution import Solutions


class SolutionsTestCase(unittest.TestCase):
    def setUp(self):
        self.solutions = Solutions()

    def test_evaluate(self):
        expressions = [
            '2+3=5',
            '6+4/2*2=10',
            '3+2.45/8=3.30625',
            '3**3*3/3+3=30',
            '2^4=6'
        ]
        results = [x.split('=')[1] for x in expressions]
        for e in range(len(expressions)):
            if '.' in results[e]:
                results[e] = float(results[e])
            else:
                results[e] = int(results[e])
            self.assertEqual(
                results[e],
                self.solutions.evaluate(expressions[e])
            )
solution.py
class Solutions(object):
    def evaluate(self, exp):
        def format(res):
            if '.' in res:
                try:
                    res = float(res)
                except ValueError:
                    pass
            else:
                try:
                    res = int(res)
                except ValueError:
                    pass
            return res

        def splitter(item, op):
            mul = item.split(op)
            if len(mul) == 2:
                for x in ['^', '*', '/', '+', '-']:
                    if x in mul[0]:
                        mul = [mul[0].split(x)[1], mul[1]]
                    if x in mul[1]:
                        mul = [mul[0], mul[1].split(x)[0]]
            elif len(mul) > 2:
                pass
            else:
                pass
            for x in range(len(mul)):
                mul[x] = format(mul[x])
            return mul

        exp = exp.replace(' ', '')
        if '=' in exp:
            res = exp.split('=')[1]
            res = format(res)
            exp = exp.replace('=%s' % res, '')
        while '^' in exp:
            if '^' in exp:
                itm = splitter(exp, '^')
                res = itm[0] ^ itm[1]
                exp = exp.replace('%s^%s' % (str(itm[0]), str(itm[1])), str(res))
        while '**' in exp:
            if '**' in exp:
                itm = splitter(exp, '**')
                res = itm[0] ** itm[1]
                exp = exp.replace('%s**%s' % (str(itm[0]), str(itm[1])), str(res))
        while '/' in exp:
            if '/' in exp:
                itm = splitter(exp, '/')
                res = itm[0] / itm[1]
                exp = exp.replace('%s/%s' % (str(itm[0]), str(itm[1])), str(res))
        while '*' in exp:
            if '*' in exp:
                itm = splitter(exp, '*')
                res = itm[0] * itm[1]
                exp = exp.replace('%s*%s' % (str(itm[0]), str(itm[1])), str(res))
        while '+' in exp:
            if '+' in exp:
                itm = splitter(exp, '+')
                res = itm[0] + itm[1]
                exp = exp.replace('%s+%s' % (str(itm[0]), str(itm[1])), str(res))
        while '-' in exp:
            if '-' in exp:
                itm = splitter(exp, '-')
                res = itm[0] - itm[1]
                exp = exp.replace('%s-%s' % (str(itm[0]), str(itm[1])), str(res))
        return format(exp)