I am trying to create a dictionary from a csv file. The first column of the csv file contains unique keys and the second column contains values. Each row of the csv file represents a unique key, value pair within the dictionary. I tried to use the csv.DictReader and csv.DictWriter classes, but I could only figure out how to generate a new dictionary for each row. I want one dictionary. Here is the code I am trying to use:
import csv
with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        for rows in reader:
            k = rows[0]
            v = rows[1]
            mydict = {k:v for k, v in rows}
    print(mydict)
When I run the above code I get a ValueError: too many values to unpack (expected 2). How do I create one dictionary from a csv file? Thanks.
Answer 0
I believe the syntax you were looking for is as follows:
import csv
with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        mydict = {rows[0]:rows[1] for rows in reader}
Alternatively, for Python versions before 2.7, which do not support dict comprehensions, you want:
mydict = dict((rows[0],rows[1]) for rows in reader)
I'd suggest adding "if row" to the comprehension in case there is an empty line at the end of the file:
import csv
with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        mydict = dict(row[:2] for row in reader if row)
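The approach in the answers above can be tried without touching the file system. This is a minimal sketch, assuming a two-column CSV; the sample data and the use of io.StringIO as a stand-in for coors.csv are illustrative assumptions, not part of the original question.

```python
import csv
import io

# Hypothetical stand-in for the contents of coors.csv, including a blank line.
sample = io.StringIO("a,1\nb,2\n\nc,3\n")

reader = csv.reader(sample)
# csv.reader yields an empty list for a blank line, so "if row" skips it.
mydict = {row[0]: row[1] for row in reader if row}

print(mydict)
```

Note that the values stay strings; convert them (for example with float) inside the comprehension if numbers are needed.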
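Answer 7
One-line solution:
import pandas as pd
# read_csv treats the first row as a header by default; pass header=None if the file has none.
my_dict = {row.iloc[0]: row.iloc[1] for _, row in pd.read_csv("file.csv").iterrows()}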
If you are OK with using the numpy package, then you can do something like the following:
import numpy as np
lines = np.genfromtxt("coors.csv", delimiter=",", dtype=None)
my_dict = dict()
for i in range(len(lines)):
    my_dict[lines[i][0]] = lines[i][1]
You can convert it to a Python dictionary using only built-ins
with open(csv_file) as f:
    csv_list = [[val.strip() for val in r.split(",")] for r in f.readlines()]
(_, *header), *data = csv_list
csv_dict = {}
for row in data:
    key, *values = row
    csv_dict[key] = {key: value for key, value in zip(header, values)}
Note: Python dictionaries have unique keys, so if your csv file has duplicate ids you should append each row to a list.
for row in data:
    key, *values = row
    if key not in csv_dict:
        csv_dict[key] = []
    csv_dict[key].append({key: value for key, value in zip(header, values)})
Answer 10
You can use this; it is quite neat:
import dataconverters.commas as commas
filename = 'test.csv'
with open(filename) as f:
    records, metadata = commas.parse(f)
    for row in records:
        # Each row is a dictionary, so convert it to a string before concatenating.
        print('this is row in dictionary: ' + str(row))
Many solutions have been posted and I’d like to contribute with mine, which works for a different number of columns in the CSV file.
It creates a dictionary with one key per column, and the value for each key is a list with the elements in such column.
input_file = csv.DictReader(open(path_to_csv_file))
csv_dict = {elem: [] for elem in input_file.fieldnames}
for row in input_file:
    for key in csv_dict.keys():
        csv_dict[key].append(row[key])
With pandas it is much easier.
Assume you have the following data as CSV, in a file called test.txt or test.csv (a CSV file is just a kind of text file):
a,b,c,d
1,2,3,4
5,6,7,8
Now, using pandas:
import pandas as pd
df = pd.read_csv("./test.txt")
df_to_dict = df.to_dict()
To get one dictionary per row instead:
df.to_dict(orient='records')
And that's it.
Answer 13
Try using a defaultdict together with DictReader.
import csv
from collections import defaultdict
my_dict = defaultdict(list)
with open('filename.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        for key, value in line.items():
            my_dict[key].append(value)
l = [['40', '20', '10', '30'], ['20', '20', '20', '20', '20', '30', '20'], ['30', '20', '30', '50', '10', '30', '20', '20', '20'], ['100', '100'], ['100', '100', '100', '100', '100'], ['100', '100', '100', '100']]
Now, what I want to do is convert each element in the list to float. My solution is this:
newList = []
for x in l:
    for y in x:
        newList.append(float(y))
Here is how you would do this with a nested list comprehension:
[[float(y) for y in x] for x in l]
This would give you a list of lists, similar to what you started with except with floats instead of strings. If you want one flat list then you would use [float(y) for x in l for y in x].
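The difference between the nested and the flat form can be seen side by side. This is a small sketch with a shortened, made-up input list; the names nested and flat are only illustrative.

```python
l = [['40', '20'], ['10', '30', '50']]

# Nested comprehension: keeps the list-of-lists shape.
nested = [[float(y) for y in x] for x in l]

# Flat comprehension: the for clauses run in the same order as nested for loops.
flat = [float(y) for x in l for y in x]

print(nested)
print(flat)
```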
Answer 1
Here is how to convert nested for loop to nested list comprehension:
Here is how nested list comprehension works:
            l a b c d e f
            ↓ ↓ ↓ ↓ ↓ ↓ ↓
In [1]: l = [ [ [ [ [ [ 1 ] ] ] ] ] ]
In [2]: for a in l:
   ...:     for b in a:
   ...:         for c in b:
   ...:             for d in c:
   ...:                 for e in d:
   ...:                     for f in e:
   ...:                         print(float(f))
   ...:
1.0
In [3]: [float(f)
   ...:  for a in l
   ...:      for b in a
   ...:          for c in b
   ...:              for d in c
   ...:                  for e in d
   ...:                      for f in e]
Out[3]: [1.0]
For your case, it will be something like this.
In [4]: new_list = [float(y) for x in l for y in x]
Answer 2
>>> l = [['40', '20', '10', '30'], ['20', '20', '20', '20', '20', '30', '20'], ['30', '20', '30', '50', '10', '30', '20', '20', '20'], ['100', '100'], ['100', '100', '100', '100', '100'], ['100', '100', '100', '100']]
>>> new_list = [float(x) for xs in l for x in xs]
>>> new_list
[40.0,20.0,10.0,30.0,20.0,20.0,20.0,20.0,20.0,30.0,20.0,30.0,20.0,30.0,50.0,10.0,30.0,20.0,20.0,20.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0,100.0]
I'm not sure what your desired output is, but in a list comprehension the for clauses follow the order of the equivalent nested loops, and you have them backwards. I got what I think you want with:
[float(y) for x in l for y in x]
The principle is: use the same order you’d use in writing it out as nested for loops.
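Answer 4
I am a bit late here, but I wanted to share how list comprehensions actually work, especially nested ones:
new_list = [float(x) for x in l]
is actually the same as:
new_list = []
for x in l:
    new_list.append(float(x))
And now the nested list comprehension:
[[float(y) for y in x] for x in l]
is the same as:
new_list = []
for x in l:
    sub_list = []
    for y in x:
        sub_list.append(float(y))
    new_list.append(sub_list)
print(new_list)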
I had a similar problem to solve, so I came across this question. I did a performance comparison of Andrew Clark's and narayan's answers, which I would like to share.
Let's run a benchmark to see how the two approaches compare. I used Python 3.5.0 for all these tests. In the first set of tests, I keep the number of elements per list at 10 and vary the number of lists from 10 to 100,000.
>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,10))]*10]"
>>> 100000 loops, best of 3: 15.2 usec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,10))]*10]"
>>> 10000 loops, best of 3: 19.6 usec per loop
>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,10))]*100]"
>>> 100000 loops, best of 3: 15.2 usec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,10))]*100]"
>>> 10000 loops, best of 3: 19.6 usec per loop
>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,10))]*1000]"
>>> 1000 loops, best of 3: 1.43 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,10))]*1000]"
>>> 100 loops, best of 3: 1.91 msec per loop
>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,10))]*10000]"
>>> 100 loops, best of 3: 13.6 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,10))]*10000]"
>>> 10 loops, best of 3: 19.1 msec per loop
>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,10))]*100000]"
>>> 10 loops, best of 3: 164 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,10))]*100000]"
>>> 10 loops, best of 3: 216 msec per loop
In the next set of tests I would like to raise number of elements per lists to 100.
>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,100))]*10]"
>>> 10000 loops, best of 3: 110 usec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,100))]*10]"
>>> 10000 loops, best of 3: 151 usec per loop
>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,100))]*100]"
>>> 1000 loops, best of 3: 1.11 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,100))]*100]"
>>> 1000 loops, best of 3: 1.5 msec per loop
>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,100))]*1000]"
>>> 100 loops, best of 3: 11.2 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,100))]*1000]"
>>> 100 loops, best of 3: 16.7 msec per loop
>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,100))]*10000]"
>>> 10 loops, best of 3: 134 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,100))]*10000]"
>>> 10 loops, best of 3: 171 msec per loop
>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,100))]*100000]"
>>> 10 loops, best of 3: 1.32 sec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,100))]*100000]"
>>> 10 loops, best of 3: 1.7 sec per loop
Lets take a brave step and modify the number of elements in lists to be 1000
>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,1000))]*10]"
>>> 1000 loops, best of 3: 800 usec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,1000))]*10]"
>>> 1000 loops, best of 3: 1.16 msec per loop
>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,1000))]*100]"
>>> 100 loops, best of 3: 8.26 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,1000))]*100]"
>>> 100 loops, best of 3: 11.7 msec per loop
>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,1000))]*1000]"
>>> 10 loops, best of 3: 83.8 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,1000))]*1000]"
>>> 10 loops, best of 3: 118 msec per loop
>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,1000))]*10000]"
>>> 10 loops, best of 3: 868 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,1000))]*10000]"
>>> 10 loops, best of 3: 1.23 sec per loop
>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,1000))]*100000]"
>>> 10 loops, best of 3: 9.2 sec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,1000))]*100000]"
>>> 10 loops, best of 3: 12.7 sec per loop
From these tests we can conclude that map has a performance benefit over the list comprehension in this case. The same applies if you are casting to int or str. For a small number of lists with few elements per list, the difference is negligible. For larger lists with more elements per list, one might prefer map over a list comprehension, but it depends entirely on the needs of the application.
However, I personally find list comprehensions more readable and idiomatic than map. They are a de facto standard in Python, and people (especially beginners) are usually more proficient and comfortable with list comprehensions than with map.
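The comparison can also be reproduced from within a script rather than the shell. This is a rough sketch, assuming the standard timeit module; the helper names with_map and with_comprehension, the input size, and the repeat count are arbitrary choices, and the timings will differ between machines and Python versions.

```python
import timeit

# 100 lists of 10 integers each (the same inner list repeated, as in the tests above).
data = [list(range(10))] * 100

def with_map():
    # Inner conversion via map, wrapped in list() because map is lazy in Python 3.
    return [list(map(float, row)) for row in data]

def with_comprehension():
    # Inner conversion via a nested list comprehension.
    return [[float(y) for y in row] for row in data]

# Both approaches must produce the same result before timing them.
assert with_map() == with_comprehension()

t_map = timeit.timeit(with_map, number=200)
t_comp = timeit.timeit(with_comprehension, number=200)
print(f"map: {t_map:.4f}s, comprehension: {t_comp:.4f}s")
```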
l = [['40', '20', '10', '30'], ['20', '20', '20', '20', '20', '30', '20'], ['30', '20', '30', '50', '10', '30', '20', '20', '20'], ['100', '100'], ['100', '100', '100', '100', '100'], ['100', '100', '100', '100']]
list(map(lambda x: list(map(float, x)), l))
This problem can be solved without a for loop; a single line of code is sufficient. A nested map with a lambda function also works here. In Python 3, map returns a lazy iterator, so each map call is wrapped in list() to get actual lists.
Having an iterator object, is there something faster, better or more correct than a list comprehension to get a list of the objects returned by the iterator?
It’s more about python list comprehension syntax. I’ve got a list comprehension that produces list of odd numbers of a given range:
[x for x in range(1, 10) if x % 2]
This makes a filter – I’ve got a source list, where I remove even numbers (if x % 2). I’d like to use something like if-then-else here. Following code fails:
>>> [x for x in range(1, 10) if x % 2 else x * 100]
File "<stdin>", line 1
[x for x in range(1, 10) if x % 2 else x * 100]
^
SyntaxError: invalid syntax
x if y else z is the syntax for the expression you’re returning for each element. Thus you need:
[ x if x%2 else x*100 for x in range(1, 10) ]
The confusion arises from the fact you’re using a filter in the first example, but not in the second. In the second example you’re only mapping each value to another, using a ternary-operator expression.
With a filter, you need:
[ EXP for x in seq if COND ]
Without a filter you need:
[ EXP for x in seq ]
and in your second example, the expression is a “complex” one, which happens to involve an if-else.
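The two positions of "if" can be compared directly, and they can also be combined. This is a short sketch; the variable names odds, mapped and both are only illustrative.

```python
# Filter: the trailing "if" decides which elements are kept.
odds = [x for x in range(1, 10) if x % 2]

# Conditional expression: every element is kept, but mapped to one of two values.
mapped = [x if x % 2 else x * 100 for x in range(1, 10)]

# Both at once: keep odd numbers only, and multiply the multiples of 3 by 100.
both = [x if x % 3 else x * 100 for x in range(1, 10) if x % 2]

print(odds)
print(mapped)
print(both)
```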
I have a list of variable length and am trying to find a way to test if the list item currently being evaluated is the longest string contained in the list. And I am using Python 2.6.1
For example:
mylist = ['abc','abcdef','abcd']
for each in mylist:
    if condition1:
        do_something()
    elif ___________________: #else if each is the longest string contained in mylist:
        do_something_else()
Surely there’s a simple list comprehension that’s short and elegant that I’m overlooking?
What should happen if there are more than 1 longest string (think ’12’, and ’01’)?
Try this to get the longest element:
max_length, longest_element = max([(len(x), x) for x in ('a', 'b', 'aa')])
Then use a regular loop:
for st in mylist:
    if len(st) == max_length:
        ...
Answer 2
def longestWord(some_list):
    count = 0 #You set the count to 0
    for i in some_list: # Go through the whole list
        if len(i) > count: #Checking for the longest word(string)
            count = len(i)
            word = i
    return ("the longest string is " + word)
Looks like you could use the max function if you map it correctly for strings and use that as the comparison. I would recommend just finding the max once though of course, not for each element in the list.
Answer 4
len(each) == max(len(x) for x in mylist), or simply each == max(mylist, key=len)
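The suggestions above can be put together in a short sketch that also covers the case of several equally long strings. The extra list entry 'fedcba' is an assumption added here to show a tie; the maximum length is computed once, outside the loop.

```python
mylist = ['abc', 'abcdef', 'abcd', 'fedcba']

# Compute the maximum length once instead of once per element.
max_length = max(len(s) for s in mylist)

# All strings that share the maximum length.
longest = [s for s in mylist if len(s) == max_length]

# max() with key=len returns only the first string of maximum length.
first_longest = max(mylist, key=len)

print(longest)
print(first_longest)
```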
I hope this helps someone else since a,b,x,y don’t have much meaning to me! Suppose you have a text full of sentences and you want an array of words.
# Without list comprehension
list_of_words = []
for sentence in text:
    for word in sentence:
        list_of_words.append(word)
return list_of_words
I like to think of list comprehension as stretching code horizontally.
Try breaking it up into:
# List Comprehension
[word for sentence in text for word in sentence]
Example:
>>> text = (("Hi", "Steve!"), ("What's", "up?"))
>>> [word for sentence in text for word in sentence]
['Hi', 'Steve!', "What's", 'up?']
This also works for generators
>>> text = (("Hi", "Steve!"), ("What's", "up?"))
>>> gen = (word for sentence in text for word in sentence)
>>> for word in gen: print(word)
Hi
Steve!
What's
up?
Take for example: [str(x) for i in range(3) for x in foo(i)]
Let’s decompose it:
def foo(i):
    return i, i + 0.5

[str(x)
    for i in range(3)
        for x in foo(i)
]
# is same as
for i in range(3):
    for x in foo(i):
        yield str(x)
Answer 4
ThomasH has already added a good answer, but I want to show what happens:
>>> a = [[1, 2], [3, 4]]
>>> [x for x in b for b in a]
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
NameError: name 'b' is not defined
>>> [x for b in a for x in b]
[1, 2, 3, 4]
>>> [x for x in b for b in a]
[3, 3, 4, 4]
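I guess Python parses the list comprehension from left to right. This means that the first for clause that occurs is the outermost loop and is executed first.
The second "problem" is that, in Python 2, b gets "leaked" out of the list comprehension. After the first successful list comprehension, b == [3, 4], which is why the reversed form then returns [3, 3, 4, 4] instead of raising a NameError. In Python 3, comprehension variables no longer leak, so the reversed form keeps raising a NameError.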
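Answer 5
If you want to keep the multidimensional array, nest the brackets. See the example below, where one is added to each element.
>>> a = [[1, 2], [3, 4]]
>>> [[col + 1 for col in row] for row in a]
[[2, 3], [4, 5]]
>>> [col + 1 for row in a for col in row]
[2, 3, 4, 5]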
You can now think of "returned value first, then the loops from outermost to innermost" as the only correct order.
With that in mind, the order in a list comprehension is easy to follow even for three loops:
c=[111, 222, 333]
b=[11, 22, 33]
a=[1, 2, 3]
print(
    [
        (i, j, k)                            # <RETURNED_VALUE>
        for i in a for j in b for k in c     # in order: loop1, loop2, loop3
        if i < 2 and j < 20 and k < 200      # <OPTIONAL_IF>
    ]
)
[(1, 11, 111)]
because the above is just a:
for i in a:                         # outer loop1 GOES SECOND
    for j in b:                     # inner loop2 GOES THIRD
        for k in c:                 # inner loop3 GOES FOURTH
            if i < 2 and j < 20 and k < 200:
                print((i, j, k))    # returned value GOES FIRST
The technique is the same for iterating over a nested list or structure.
For a from the question:
a = [[1,2],[3,4]]
[i2 for i1 in a for i2 in i1]
which returns [1, 2, 3, 4].
For one more level of nesting:
a = [[[1, 2], [3, 4]], [[5, 6], [7, 8, 9]], [[10]]]
[i3 for i1 in a for i2 in i1 for i3 in i2]
which returns [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
and so on.
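The ordering rule can be checked by writing the explicit loops next to the comprehension and comparing the results. This is a minimal sketch; the names flat_loop and flat_comp are only illustrative.

```python
a = [[1, 2], [3, 4]]

# Explicit nested loops: outer loop first, inner loop second.
flat_loop = []
for row in a:
    for item in row:
        flat_loop.append(item)

# The comprehension lists the for clauses in the same outer-to-inner order;
# only the appended expression moves to the front.
flat_comp = [item for row in a for item in row]

print(flat_loop == flat_comp)
```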
Answer 7
I find this easier to understand:
[row[i] for row in a for i in range(len(row))]
Result: [1, 2, 3, 4]
Additionally, you could use just the same variable for the member of the input list which is currently accessed and for the element inside this member. However, this might even make it more (list) incomprehensible.
input = [[1, 2], [3, 4]]
[x for x in input for x in x]
First for x in input is evaluated, leading to one member list of the input, then, Python walks through the second part for x in x during which the x-value is overwritten by the current element it is accessing, then the first x defines what we want to return.
There are dictionary comprehensions in Python 2.7+, but they don’t work quite the way you’re trying. Like a list comprehension, they create a new dictionary; you can’t use them to add keys to an existing dictionary. Also, you have to specify the keys and values, although of course you can specify a dummy value if you like.
>>> d = {n: n**2 for n in range(5)}
>>> print d
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}
If you want to set them all to True:
>>> d = {n: True for n in range(5)}
>>> print d
{0: True, 1: True, 2: True, 3: True, 4: True}
What you seem to be asking for is a way to set multiple keys at once on an existing dictionary. There’s no direct shortcut for that. You can either loop like you already showed, or you could use a dictionary comprehension to create a new dict with the new values, and then do oldDict.update(newDict) to merge the new values into the old dict.
I really like the @mgilson comment, since if you have two iterables, one that corresponds to the keys and the other to the values, you can also do the following.
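The code that accompanied this remark is not preserved here. One common way to build a dictionary from two parallel iterables, which may or may not be what the comment proposed, is dict(zip(...)). The names keys and values below are made-up sample data.

```python
keys = ['a', 'b', 'c']
values = [1, 2, 3]

# zip pairs the items positionally; dict consumes the (key, value) pairs.
d = dict(zip(keys, values))

# The equivalent dict comprehension, useful when keys or values need transforming.
d2 = {k: v for k, v in zip(keys, values)}

print(d)
```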
The main purpose of a list comprehension is to create a new list based on another one without changing or destroying the original list.
Instead of writing
l = []
for n in range(1, 11):
l.append(n)
or
l = [n for n in range(1, 11)]
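you should write only
l = list(range(1, 11))
In the first two code blocks you are building a new list by iterating over the range and returning each element unchanged, which is just an expensive way of copying a sequence. (In Python 2, range(1, 11) already returns a list, so the list() call is only needed in Python 3.)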
To get a new dictionary with all keys set to the same value based on another dict, do this:
old_dict = {'a': 1, 'c': 3, 'b': 2}
new_dict = { key:'your value here' for key in old_dict.keys()}
You’re receiving a SyntaxError because when you write
d = {}
d[i for i in range(1, 11)] = True
you’re basically saying: “Set my key ‘i for i in range(1, 11)’ to True” and “i for i in range(1, 11)” is not a valid key, it’s just a syntax error. If dicts supported lists as keys, you would do something like
d[[i for i in range(1, 11)]] = True
and not
d[i for i in range(1, 11)] = True
but lists are not hashable, so you can’t use them as dict keys.
Raymond Hettinger (one of the Python core developers) had this to say about tuples in a recent tweet:
#python tip: Generally, lists are for looping; tuples for structs. Lists are homogeneous; tuples heterogeneous. Lists for variable length.
This (to me) supports the idea that if the items in a sequence are related enough to be generated by a, well, generator, then it should be a list. Although a tuple is iterable and seems like simply a immutable list, it’s really the Python equivalent of a C struct:
struct foo {
    int a;
    char b;
    float c;
};
struct foo x = { 3, 'g', 5.9 };
Comprehension works by looping or iterating over items and assigning them into a container, a Tuple is unable to receive assignments.
Once a Tuple is created, it can not be appended to, extended, or assigned to. The only way to modify a Tuple is if one of its objects can itself be assigned to (is a non-tuple container). Because the Tuple is only holding a reference to that kind of object.
Also – a tuple has its own constructor tuple() which you can give any iterator. Which means that to create a tuple, you could do:
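The snippet that followed this sentence is not preserved here. A minimal sketch of the idea, passing a generator expression to the tuple() constructor, could look like this; the name squares is only illustrative.

```python
# tuple() accepts any iterable, including a generator expression.
squares = tuple(n * n for n in range(5))

print(squares)
```

Writing the parentheses alone, as in (n * n for n in range(5)), produces a generator object rather than a tuple.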
I believe it’s simply for the sake of clarity, we do not want to clutter the language with too many different symbols. Also a tuple comprehension is never necessary, a list can just be used instead with negligible speed differences, unlike a dict comprehension as opposed to a list comprehension.
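Answer 9
We can generate tuples from a list comprehension. The following example puts each pair of consecutive numbers from 0 to 9 into a tuple (k is assumed here to be the list of integers from 0 to 99):
>>> k = list(range(100))
>>> print(k)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
>>> r = [tuple(k[i:i+2]) for i in range(10) if not i % 2]
>>> print(r)
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]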
I want to create a series of lists, all of varying lengths. Each list will contain the same element e, repeated n times (where n = length of the list).
How do I create the lists, without using a list comprehension [e for number in xrange(n)] for each list?
You should note that if e is for example an empty list you get a list with n references to the same list, not n independent empty lists.
Performance testing
At first glance it seems that repeat is the fastest way to create a list with n identical elements:
>>> timeit.timeit('itertools.repeat(0, 10)', 'import itertools', number = 1000000)
0.37095273281943264
>>> timeit.timeit('[0] * 10', 'import itertools', number = 1000000)
0.5577236771712819
But wait – it’s not a fair test…
>>> itertools.repeat(0, 10)
repeat(0, 10) # Not a list!!!
The function itertools.repeat doesn’t actually create the list, it just creates an object that can be used to create a list if you wish! Let’s try that again, but converting to a list:
>>> timeit.timeit('list(itertools.repeat(0, 10))', 'import itertools', number = 1000000)
1.7508119747063233
So if you want a list, use [e] * n. If you want to generate the elements lazily, use repeat.
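Answer 1
>>> [5] * 4
[5, 5, 5, 5]
Be careful when the item being repeated is a list. The list will not be cloned: all the elements will refer to the same list!
>>> x = [5]
>>> y = [x] * 4
>>> y
[[5], [5], [5], [5]]
>>> y[0][0] = 6
>>> y
[[6], [6], [6], [6]]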
Mutable objects, however, are no good for this, because in-place operations change the object, not the reference:
>>> l = [set()] * 4
>>> l[0] |= set('abc')
>>> l
[set(['a', 'c', 'b']), set(['a', 'c', 'b']), set(['a', 'c', 'b']), set(['a', 'c', 'b'])]
Create List of Single Item Repeated n Times in Python
Immutable items
For immutable items, like None, bools, ints, floats, strings, tuples, or frozensets, you can do it like this:
[e] * 4
Note that this is best used only with immutable items (strings, tuples, frozensets, and so on) in the list, because all the entries point to the same item in the same place in memory. I use this frequently when I have to build a table with a schema of all strings, so that I don't have to write out a highly redundant one-to-one mapping.
schema = ['string'] * len(columns)
Mutable items
I’ve used Python for a long time now, and I have never seen a use-case where I would do the above with a mutable instance. Instead, to get, say, a mutable empty list, set, or dict, you should do something like this:
list_of_lists = [[] for _ in columns]
The underscore is simply a throwaway variable name in this context.
If you only have the number, that would be:
list_of_lists = [[] for _ in range(4)]
The _ is not really special, but your coding environment style checker will probably complain if you don’t intend to use the variable and use any other name.
Caveats for using the immutable method with mutable items:
Beware doing this with mutable objects, when you change one of them, they all change because they’re all the same object:
foo = [[]] * 4
foo[0].append('x')
foo now returns:
[['x'], ['x'], ['x'], ['x']]
But with immutable objects, you can make it work because you change the reference, not the object:
>>> l = [0] * 4
>>> l[0] += 1
>>> l
[1, 0, 0, 0]
>>> l = [frozenset()] * 4
>>> l[0] |= set('abc')
>>> l
[frozenset(['a', 'c', 'b']), frozenset([]), frozenset([]), frozenset([])]
But again, mutable objects are no good for this, because in-place operations change the object, not the reference:
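The in-place behaviour described above can be checked with a short sketch in Python 3 syntax. The names shared and independent are illustrative; the second list uses the comprehension form recommended earlier for mutable items.

```python
# Four references to one and the same set object.
shared = [set()] * 4
shared[0] |= set('abc')   # in-place union mutates the single shared set

# Four separate set objects, one per iteration of the comprehension.
independent = [set() for _ in range(4)]
independent[0] |= set('abc')

print(shared)
print(independent)
```

In the first list every entry shows the added letters, while in the second only the first entry changes.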
Of course itertools gives you a iterator instead of a list. [e] * n gives you a list, but, depending on what you will do with those sequences, the itertools variant can be much more efficient.
As others have pointed out, using the * operator for a mutable object duplicates references, so if you change one you change them all. If you want to create independent instances of a mutable object, your xrange syntax is the most Pythonic way to do this. If you are bothered by having a named variable that is never used, you can use the anonymous underscore variable.
John’s answer is good (that list comprehensions are better when you want to iterate over something multiple times). However, it’s also worth noting that you should use a list if you want to use any of the list methods. For example, the following code won’t work:
def gen():
    return (something for something in get_some_stuff())
print(gen()[:2])       # TypeError: generators don't support indexing or slicing
print([5, 6] + gen())  # TypeError: generators can't be added to lists
Basically, use a generator expression if all you’re doing is iterating once. If you want to store and use the generated results, then you’re probably better off with a list comprehension.
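If you do need a slice or a concatenation from a generator, one possible workaround is sketched below. The gen function here is a stand-in that yields a few made-up numbers; it is not the get_some_stuff() from the answer.

```python
from itertools import islice

def gen():
    # Hypothetical stand-in generator, used only for illustration.
    return (n * 10 for n in range(5))

# islice takes the first items without building the full list.
first_two = list(islice(gen(), 2))

# Convert to a list explicitly before using list operations.
combined = [5, 6] + list(gen())

print(first_two)
print(combined)
```

islice keeps the laziness for the part you skip, while list() gives up laziness in exchange for the full list API.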
Since performance is the most common reason to choose one over the other, my advice is to not worry about it and just pick one; if you find that your program is running too slowly, then and only then should you go back and worry about tuning your code.
Iterating over the generator expression or the list comprehension will do the same thing. However, the list comprehension will create the entire list in memory first while the generator expression will create the items on the fly, so you are able to use it for very large (and also infinite!) sequences.
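A small sketch of both points, assuming CPython and using sys.getsizeof only as a rough indicator of container size (it does not count the elements themselves):

```python
import sys
from itertools import count, islice

# The list comprehension allocates storage for every element up front.
squares_list = [n * n for n in range(100000)]
# The generator expression only stores its own small state.
squares_gen = (n * n for n in range(100000))

list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(list_bytes, gen_bytes)

# A generator can describe an infinite sequence; take only what you need.
evens = (n for n in count() if n % 2 == 0)
first_five = list(islice(evens, 5))
print(first_five)
```

The equivalent list comprehension over count() would never finish, because it would try to build the whole infinite list first.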
Use list comprehensions when the result needs to be iterated over multiple times, or where speed is paramount. Use generator expressions where the range is large or infinite.
The important point is that the list comprehension creates a new list. The generator creates an iterable object that will “filter” the source material on the fly as you consume the bits.
Imagine you have a 2TB log file called “hugefile.txt”, and you want the content and length for all the lines that start with the word “ENTRY”.
So you try starting out by writing a list comprehension:
logfile = open("hugefile.txt","r")
entry_lines = [(line,len(line)) for line in logfile if line.startswith("ENTRY")]
This slurps up the whole file, processes each line, and stores the matching lines in your list. This list could therefore contain up to 2TB of content. That’s a lot of RAM, and probably not practical for your purposes.
So instead we can use a generator to apply a “filter” to our content. No data is actually read until we start iterating over the result.
logfile = open("hugefile.txt","r")
entry_lines = ((line,len(line)) for line in logfile if line.startswith("ENTRY"))
Not even a single line has been read from our file yet. In fact, say we want to filter our result even further:
long_entries = ((line,length) for (line,length) in entry_lines if length > 80)
Still nothing has been read, but we’ve specified now two generators that will act on our data as we wish.
Let’s write out our filtered lines to another file:
outfile = open("filtered.txt","a")
for entry, length in long_entries:
    outfile.write(entry)
Now we read the input file. As our for loop continues to request additional lines, the long_entries generator demands lines from the entry_lines generator, returning only those whose length is greater than 80 characters. And in turn, the entry_lines generator requests lines (filtered as indicated) from the logfile iterator, which in turn reads the file.
So instead of “pushing” data to your output function in the form of a fully populated list, you’re giving the output function a way to “pull” data only when it’s needed. In our case this is much more efficient, but not quite as flexible. Generators are one way, one pass; the data from the log file we’ve read gets immediately discarded, so we can’t go back to a previous line. On the other hand, we don’t have to worry about keeping data around once we’re done with it.
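The pull behaviour and the one-pass limitation can be seen in a small sketch. It uses a short in-memory list instead of hugefile.txt, and the tracked helper is a hypothetical wrapper added only to record when lines are actually read.

```python
lines = ["ENTRY short\n", "OTHER line\n", "ENTRY " + "x" * 90 + "\n"]
reads = []

def tracked(source):
    # Hypothetical helper: records each line at the moment it is pulled.
    for line in source:
        reads.append(line)
        yield line

entry_lines = ((line, len(line)) for line in tracked(lines) if line.startswith("ENTRY"))
long_entries = ((line, length) for (line, length) in entry_lines if length > 80)

reads_before = len(reads)  # nothing has been pulled yet

first_pass = [length for _, length in long_entries]
second_pass = [length for _, length in long_entries]  # already exhausted

print(reads_before, len(reads), first_pass, second_pass)
```

The second pass comes back empty because the chained generators were consumed by the first one; to iterate again you would have to rebuild the pipeline from the source.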
The benefit of a generator expression is that it uses less memory since it doesn’t build the whole list at once. Generator expressions are best used when the list is an intermediary, such as summing the results, or creating a dict out of the results.
For example:
sum(x*2 for x in xrange(256))
dict( (k, some_func(k)) for k in some_list_of_keys )
The advantage there is that the list isn’t completely generated, and thus little memory is used (and it should also be faster).
You should, though, use list comprehensions when the desired final product is a list. You are not going to save any memory using generator expressions, since you want the generated list. You also get the benefit of being able to use list operations such as indexing, slicing, len, or reversed.
When creating a generator from a mutable object (like a list), be aware that the generator is evaluated against the state of the list at the time the generator is consumed, not at the time it is created:
>>> mylist = ["a", "b", "c"]
>>> gen = (elem + "1" for elem in mylist)
>>> mylist.clear()
>>> for x in gen: print (x)
# nothing
If there is any chance of your list (or a mutable object inside it) being modified, but you need the state as it was when the generator was created, use a list comprehension instead.
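A side-by-side sketch of the two behaviours (Python 3; the names lazy and snapshot are illustrative):

```python
mylist = ["a", "b", "c"]

# Lazy: elements are read from mylist only when the generator is consumed.
lazy = (elem + "1" for elem in mylist)
# Eager: the results are computed and stored right now.
snapshot = [elem + "1" for elem in mylist]

mylist.clear()  # mutate the source after both were created

lazy_result = list(lazy)
print(lazy_result)  # empty, because the source was cleared before consumption
print(snapshot)     # still holds the values computed at creation time
```

Note that this concerns in-place mutation such as clear(); rebinding the name mylist to a new list would not affect a generator that was already created from the old one.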
import mincemeat
def mapfn(k, v):
    for w in v:
        yield 'sum', w
        #yield 'count',1
def reducefn(k, v):
    r1 = sum(v)
    r2 = len(v)
    print r2
    m = r1 / r2
    std = 0
    for i in range(r2):
        std += pow(abs(v[i] - m), 2)
    res = pow((std / r2), 0.5)
    return r1, r2, res
Here the generator gets numbers out of a text file (as big as 15GB) and applies simple math on those numbers using Hadoop’s map-reduce. If I had built a list with a list comprehension instead of using yield, calculating the sums and average would have taken much longer (not to mention the space complexity).
Hadoop is a great example of a setting that uses all the advantages of generators.
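The same streaming idea can be sketched in plain Python, without mincemeat or Hadoop. The function names numbers and mean_and_std and the sample data are assumptions made for illustration; the statistics match what reducefn computes (count, mean, population standard deviation), but in a single pass over a generator.

```python
import io
import math

def numbers(lines):
    # Yield one float at a time, so the full data set never sits in memory.
    for line in lines:
        for token in line.split():
            yield float(token)

def mean_and_std(values):
    # Single pass: keep only running totals, not the values themselves.
    count = 0
    total = 0.0
    total_sq = 0.0
    for v in values:
        count += 1
        total += v
        total_sq += v * v
    mean = total / count
    variance = total_sq / count - mean * mean
    return count, mean, math.sqrt(max(variance, 0.0))

# A tiny in-memory stand-in for the large text file.
sample = io.StringIO("2 4 4 4\n5 5 7 9\n")
count, mean, std = mean_and_std(numbers(sample))
print(count, mean, std)
```

Replacing io.StringIO with an open file object would stream a real file the same way, since file objects also yield one line at a time.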