标签归档:list-comprehension

从CSV文件创建字典?

问题:从CSV文件创建字典?

我正在尝试从csv文件创建字典。csv文件的第一列包含唯一键,第二列包含值。csv文件的每一行代表字典中的唯一键,值对。我尝试使用csv.DictReadercsv.DictWriter类,但是只能弄清楚如何为每一行生成一个新的字典。我要一部字典。这是我尝试使用的代码:

import csv

with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
    writer = csv.writer(outfile)
    for rows in reader:
        k = rows[0]
        v = rows[1]
        mydict = {k:v for k, v in rows}
    print(mydict)

当我运行上面的代码时,我得到一个ValueError: too many values to unpack (expected 2)。如何从csv文件创建一个字典?谢谢。

I am trying to create a dictionary from a csv file. The first column of the csv file contains unique keys and the second column contains values. Each row of the csv file represents a unique key, value pair within the dictionary. I tried to use the csv.DictReader and csv.DictWriter classes, but I could only figure out how to generate a new dictionary for each row. I want one dictionary. Here is the code I am trying to use:

import csv

with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
    writer = csv.writer(outfile)
    for rows in reader:
        k = rows[0]
        v = rows[1]
        mydict = {k:v for k, v in rows}
    print(mydict)

When I run the above code I get a ValueError: too many values to unpack (expected 2). How do I create one dictionary from a csv file? Thanks.


回答 0

我相信您正在寻找的语法如下:

import csv

with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        mydict = {rows[0]:rows[1] for rows in reader}

或者,对于python <= 2.7.1,您需要:

mydict = dict((rows[0],rows[1]) for rows in reader)

I believe the syntax you were looking for is as follows:

import csv

with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        mydict = {rows[0]:rows[1] for rows in reader}

Alternately, for python <= 2.7.1, you want:

mydict = dict((rows[0],rows[1]) for rows in reader)

回答 1

通过依次调用open和打开文件csv.DictReader

input_file = csv.DictReader(open("coors.csv"))

您可以通过遍历input_file遍历csv文件dict阅读器对象的行。

for row in input_file:
    print(row)

或仅访问第一行

dictobj = csv.DictReader(open('coors.csv')).next() 

更新 在python 3+版本中,此代码将有所变化:

reader = csv.DictReader(open('coors.csv'))
dictobj = next(reader) 

Open the file by calling open and then csv.DictReader.

input_file = csv.DictReader(open("coors.csv"))

You may iterate over the rows of the csv file dict reader object by iterating over input_file.

for row in input_file:
    print(row)

OR To access first line only

dictobj = csv.DictReader(open('coors.csv')).next() 

UPDATE In python 3+ versions, this code would change a little:

reader = csv.DictReader(open('coors.csv'))
dictobj = next(reader) 

回答 2

import csv
reader = csv.reader(open('filename.csv', 'r'))
d = {}
for row in reader:
   k, v = row
   d[k] = v
import csv
reader = csv.reader(open('filename.csv', 'r'))
d = {}
for row in reader:
   k, v = row
   d[k] = v

回答 3

这不是很好,但是使用熊猫的一线解决方案。

import pandas as pd
pd.read_csv('coors.csv', header=None, index_col=0, squeeze=True).to_dict()

如果要为索引指定dtype(如果由于bug而使用index_col参数,则无法在read_csv中指定该类型):

import pandas as pd
pd.read_csv('coors.csv', header=None, dtype={0: str}).set_index(0).squeeze().to_dict()

This isn’t elegant but a one line solution using pandas.

import pandas as pd
pd.read_csv('coors.csv', header=None, index_col=0, squeeze=True).to_dict()

If you want to specify dtype for your index (it can’t be specified in read_csv if you use the index_col argument because of a bug):

import pandas as pd
pd.read_csv('coors.csv', header=None, dtype={0: str}).set_index(0).squeeze().to_dict()

回答 4

您只需要将csv.reader转换为dict:

~ >> cat > 1.csv
key1, value1
key2, value2
key2, value22
key3, value3

~ >> cat > d.py
import csv
with open('1.csv') as f:
    d = dict(filter(None, csv.reader(f)))

print(d)

~ >> python d.py
{'key3': ' value3', 'key2': ' value22', 'key1': ' value1'}

You have to just convert csv.reader to dict:

~ >> cat > 1.csv
key1, value1
key2, value2
key2, value22
key3, value3

~ >> cat > d.py
import csv
with open('1.csv') as f:
    d = dict(filter(None, csv.reader(f)))

print(d)

~ >> python d.py
{'key3': ' value3', 'key2': ' value22', 'key1': ' value1'}

回答 5

您也可以为此使用numpy。

from numpy import loadtxt
key_value = loadtxt("filename.csv", delimiter=",")
mydict = { k:v for k,v in key_value }

You can also use numpy for this.

from numpy import loadtxt
key_value = loadtxt("filename.csv", delimiter=",")
mydict = { k:v for k,v in key_value }

回答 6

我建议添加if rows,以防文件末尾有空行

import csv
with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        mydict = dict(row[:2] for row in reader if row)

I’d suggest adding if rows in case there is an empty line at the end of the file

import csv
with open('coors.csv', mode='r') as infile:
    reader = csv.reader(infile)
    with open('coors_new.csv', mode='w') as outfile:
        writer = csv.writer(outfile)
        mydict = dict(row[:2] for row in reader if row)

回答 7

一线解决方案

import pandas as pd

dict = {row[0] : row[1] for _, row in pd.read_csv("file.csv").iterrows()}

One-liner solution

import pandas as pd

dict = {row[0] : row[1] for _, row in pd.read_csv("file.csv").iterrows()}

回答 8

如果可以使用numpy包,则可以执行以下操作:

import numpy as np

lines = np.genfromtxt("coors.csv", delimiter=",", dtype=None)
my_dict = dict()
for i in range(len(lines)):
   my_dict[lines[i][0]] = lines[i][1]

If you are OK with using the numpy package, then you can do something like the following:

import numpy as np

lines = np.genfromtxt("coors.csv", delimiter=",", dtype=None)
my_dict = dict()
for i in range(len(lines)):
   my_dict[lines[i][0]] = lines[i][1]

回答 9

对于简单的csv文件,例如以下内容

id,col1,col2,col3
row1,r1c1,r1c2,r1c3
row2,r2c1,r2c2,r2c3
row3,r3c1,r3c2,r3c3
row4,r4c1,r4c2,r4c3

您可以仅使用内置功能将其转换为Python字典

with open(csv_file) as f:
    csv_list = [[val.strip() for val in r.split(",")] for r in f.readlines()]

(_, *header), *data = csv_list
csv_dict = {}
for row in data:
    key, *values = row   
    csv_dict[key] = {key: value for key, value in zip(header, values)}

这应该产生以下字典

{'row1': {'col1': 'r1c1', 'col2': 'r1c2', 'col3': 'r1c3'},
 'row2': {'col1': 'r2c1', 'col2': 'r2c2', 'col3': 'r2c3'},
 'row3': {'col1': 'r3c1', 'col2': 'r3c2', 'col3': 'r3c3'},
 'row4': {'col1': 'r4c1', 'col2': 'r4c2', 'col3': 'r4c3'}}

注意:Python字典具有唯一键,因此,如果csv文件重复ids,则应将每行追加到列表中。

for row in data:
    key, *values = row

    if key not in csv_dict:
            csv_dict[key] = []

    csv_dict[key].append({key: value for key, value in zip(header, values)})

For simple csv files, such as the following

id,col1,col2,col3
row1,r1c1,r1c2,r1c3
row2,r2c1,r2c2,r2c3
row3,r3c1,r3c2,r3c3
row4,r4c1,r4c2,r4c3

You can convert it to a Python dictionary using only built-ins

with open(csv_file) as f:
    csv_list = [[val.strip() for val in r.split(",")] for r in f.readlines()]

(_, *header), *data = csv_list
csv_dict = {}
for row in data:
    key, *values = row   
    csv_dict[key] = {key: value for key, value in zip(header, values)}

This should yield the following dictionary

{'row1': {'col1': 'r1c1', 'col2': 'r1c2', 'col3': 'r1c3'},
 'row2': {'col1': 'r2c1', 'col2': 'r2c2', 'col3': 'r2c3'},
 'row3': {'col1': 'r3c1', 'col2': 'r3c2', 'col3': 'r3c3'},
 'row4': {'col1': 'r4c1', 'col2': 'r4c2', 'col3': 'r4c3'}}

Note: Python dictionaries have unique keys, so if your csv file has duplicate ids you should append each row to a list.

for row in data:
    key, *values = row

    if key not in csv_dict:
            csv_dict[key] = []

    csv_dict[key].append({key: value for key, value in zip(header, values)})

回答 10

您可以使用它,这非常酷:

import dataconverters.commas as commas
filename = 'test.csv'
with open(filename) as f:
      records, metadata = commas.parse(f)
      for row in records:
            print 'this is row in dictionary:'+rowenter code here

You can use this, it is pretty cool:

import dataconverters.commas as commas
filename = 'test.csv'
with open(filename) as f:
      records, metadata = commas.parse(f)
      for row in records:
            print 'this is row in dictionary:'+rowenter code here

回答 11

已经发布了许多解决方案,我想为我的做出贡献,该解决方案适用于CSV文件中不同数量的列。它创建每列一个键的字典,每个键的值是一个列表,其中包含该列中的元素。

    input_file = csv.DictReader(open(path_to_csv_file))
    csv_dict = {elem: [] for elem in input_file.fieldnames}
    for row in input_file:
        for key in csv_dict.keys():
            csv_dict[key].append(row[key])

Many solutions have been posted and I’d like to contribute with mine, which works for a different number of columns in the CSV file. It creates a dictionary with one key per column, and the value for each key is a list with the elements in such column.

    input_file = csv.DictReader(open(path_to_csv_file))
    csv_dict = {elem: [] for elem in input_file.fieldnames}
    for row in input_file:
        for key in csv_dict.keys():
            csv_dict[key].append(row[key])

回答 12

例如,使用熊猫要容易得多。假设您拥有以下数据作为CSV并将其命名为test.txt/ test.csv(您知道CSV是一种文本文件)

a,b,c,d
1,2,3,4
5,6,7,8

现在正在使用熊猫

import pandas as pd
df = pd.read_csv("./text.txt")
df_to_doct = df.to_dict()

对于每一行,

df.to_dict(orient='records')

就是这样。

with pandas, it is much easier, for example. assuming you have the following data as CSV and let’s call it test.txt / test.csv (you know CSV is a sort of text file )

a,b,c,d
1,2,3,4
5,6,7,8

now using pandas

import pandas as pd
df = pd.read_csv("./text.txt")
df_to_doct = df.to_dict()

for each row, it would be

df.to_dict(orient='records')

and that’s it.


回答 13

尝试使用defaultdictDictReader

import csv
from collections import defaultdict
my_dict = defaultdict(list)

with open('filename.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        for key, value in line.items():
            my_dict[key].append(value)

它返回:

{'key1':[value_1, value_2, value_3], 'key2': [value_a, value_b, value_c], 'Key3':[value_x, Value_y, Value_z]}

Try to use a defaultdict and DictReader.

import csv
from collections import defaultdict
my_dict = defaultdict(list)

with open('filename.csv', 'r') as csv_file:
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        for key, value in line.items():
            my_dict[key].append(value)

It returns:

{'key1':[value_1, value_2, value_3], 'key2': [value_a, value_b, value_c], 'Key3':[value_x, Value_y, Value_z]}

嵌套列表上的列表理解?

问题:嵌套列表上的列表理解?

我有这个嵌套列表:

l = [['40', '20', '10', '30'], ['20', '20', '20', '20', '20', '30', '20'], ['30', '20', '30', '50', '10', '30', '20', '20', '20'], ['100', '100'], ['100', '100', '100', '100', '100'], ['100', '100', '100', '100']]

现在,我要做的是将列表中的每个元素转换为float。我的解决方案是这样的:

newList = []
for x in l:
  for y in x:
    newList.append(float(y))

但这可以使用嵌套列表理解来完成吗?

我所做的是:

[float(y) for y in x for x in l]

但是结果是一堆100的总数为2400。

任何解决方案,解释将不胜感激。谢谢!

I have this nested list:

l = [['40', '20', '10', '30'], ['20', '20', '20', '20', '20', '30', '20'], ['30', '20', '30', '50', '10', '30', '20', '20', '20'], ['100', '100'], ['100', '100', '100', '100', '100'], ['100', '100', '100', '100']]

Now, what I want to do is convert each element in a list to float. My solution is this:

newList = []
for x in l:
  for y in x:
    newList.append(float(y))

But can this be done using nested list comprehension, right?

what I’ve done is:

[float(y) for y in x for x in l]

But then the result is bunch of 100’s with the sum of 2400.

any solution, an explanation would be much appreciated. Thanks!


回答 0

这是使用嵌套列表推导的方法:

[[float(y) for y in x] for x in l]

这将为您提供一个列表列表,类似于您开始时使用的列表,但使用浮点数而不是字符串。如果您想要一个固定列表,则可以使用[float(y) for x in l for y in x]

Here is how you would do this with a nested list comprehension:

[[float(y) for y in x] for x in l]

This would give you a list of lists, similar to what you started with except with floats instead of strings. If you want one flat list then you would use [float(y) for x in l for y in x].


回答 1

以下是将嵌套的for循环转换为嵌套列表理解的方法:

以下是嵌套列表推导的工作方式:

            l a b c d e f
                  
In [1]: l = [ [ [ [ [ [ 1 ] ] ] ] ] ]
In [2]: for a in l:
   ...:     for b in a:
   ...:         for c in b:
   ...:             for d in c:
   ...:                 for e in d:
   ...:                     for f in e:
   ...:                         print(float(f))
   ...:                         
1.0

In [3]: [float(f)
         for a in l
   ...:     for b in a
   ...:         for c in b
   ...:             for d in c
   ...:                 for e in d
   ...:                     for f in e]
Out[3]: [1.0]

对于您的情况,将是这样的。

In [4]: new_list = [float(y) for x in l for y in x]

Here is how to convert nested for loop to nested list comprehension:

Here is how nested list comprehension works:

            l a b c d e f
            ↓ ↓ ↓ ↓ ↓ ↓ ↓
In [1]: l = [ [ [ [ [ [ 1 ] ] ] ] ] ]
In [2]: for a in l:
   ...:     for b in a:
   ...:         for c in b:
   ...:             for d in c:
   ...:                 for e in d:
   ...:                     for f in e:
   ...:                         print(float(f))
   ...:                         
1.0

In [3]: [float(f)
         for a in l
   ...:     for b in a
   ...:         for c in b
   ...:             for d in c
   ...:                 for e in d
   ...:                     for f in e]
Out[3]: [1.0]

For your case, it will be something like this.

In [4]: new_list = [float(y) for x in l for y in x]

回答 2

>>> l = [['40', '20', '10', '30'], ['20', '20', '20', '20', '20', '30', '20'], ['30', '20', '30', '50', '10', '30', '20', '20', '20'], ['100', '100'], ['100', '100', '100', '100', '100'], ['100', '100', '100', '100']]
>>> new_list = [float(x) for xs in l for x in xs]
>>> new_list
[40.0, 20.0, 10.0, 30.0, 20.0, 20.0, 20.0, 20.0, 20.0, 30.0, 20.0, 30.0, 20.0, 30.0, 50.0, 10.0, 30.0, 20.0, 20.0, 20.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]
>>> l = [['40', '20', '10', '30'], ['20', '20', '20', '20', '20', '30', '20'], ['30', '20', '30', '50', '10', '30', '20', '20', '20'], ['100', '100'], ['100', '100', '100', '100', '100'], ['100', '100', '100', '100']]
>>> new_list = [float(x) for xs in l for x in xs]
>>> new_list
[40.0, 20.0, 10.0, 30.0, 20.0, 20.0, 20.0, 20.0, 20.0, 30.0, 20.0, 30.0, 20.0, 30.0, 50.0, 10.0, 30.0, 20.0, 20.0, 20.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0, 100.0]

回答 3

不确定所需的输出是什么,但是如果您使用列表推导,则顺序遵循嵌套循环的顺序,而嵌套循环的顺序是向后的。所以我得到了我想要的东西:

[float(y) for x in l for y in x]

原理是:使用与嵌套循环相同的顺序来写出来。

Not sure what your desired output is, but if you’re using list comprehension, the order follows the order of nested loops, which you have backwards. So I got the what I think you want with:

[float(y) for x in l for y in x]

The principle is: use the same order you’d use in writing it out as nested for loops.


回答 4

由于我来这里不晚,但我想分享列表理解的实际工作原理,尤其是嵌套列表理解:

New_list= [[float(y) for x in l]

实际上与:

New_list=[]
for x in l:
    New_list.append(x)

现在嵌套列表理解:

[[float(y) for y in x] for x in l]

与;

new_list=[]
for x in l:
    sub_list=[]
    for y in x:
        sub_list.append(float(y))

    new_list.append(sub_list)

print(new_list)

输出:

[[40.0, 20.0, 10.0, 30.0], [20.0, 20.0, 20.0, 20.0, 20.0, 30.0, 20.0], [30.0, 20.0, 30.0, 50.0, 10.0, 30.0, 20.0, 20.0, 20.0], [100.0, 100.0], [100.0, 100.0, 100.0, 100.0, 100.0], [100.0, 100.0, 100.0, 100.0]]

Since i am little late here but i wanted to share how actually list comprehension works especially nested list comprehension :

New_list= [[float(y) for x in l]

is actually same as :

New_list=[]
for x in l:
    New_list.append(x)

And now nested list comprehension :

[[float(y) for y in x] for x in l]

is same as ;

new_list=[]
for x in l:
    sub_list=[]
    for y in x:
        sub_list.append(float(y))

    new_list.append(sub_list)

print(new_list)

output:

[[40.0, 20.0, 10.0, 30.0], [20.0, 20.0, 20.0, 20.0, 20.0, 30.0, 20.0], [30.0, 20.0, 30.0, 50.0, 10.0, 30.0, 20.0, 20.0, 20.0], [100.0, 100.0], [100.0, 100.0, 100.0, 100.0, 100.0], [100.0, 100.0, 100.0, 100.0]]

回答 5

如果您不喜欢嵌套列表推导,也可以使用map函数,

>>> from pprint import pprint

>>> l = l = [['40', '20', '10', '30'], ['20', '20', '20', '20', '20', '30', '20'], ['30', '20', '30', '50', '10', '30', '20', '20', '20'], ['100', '100'], ['100', '100', '100', '100', '100'], ['100', '100', '100', '100']] 

>>> pprint(l)
[['40', '20', '10', '30'],
['20', '20', '20', '20', '20', '30', '20'],
['30', '20', '30', '50', '10', '30', '20', '20', '20'],
['100', '100'],
['100', '100', '100', '100', '100'],
['100', '100', '100', '100']]

>>> float_l = [map(float, nested_list) for nested_list in l]

>>> pprint(float_l)
[[40.0, 20.0, 10.0, 30.0],
[20.0, 20.0, 20.0, 20.0, 20.0, 30.0, 20.0],
[30.0, 20.0, 30.0, 50.0, 10.0, 30.0, 20.0, 20.0, 20.0],
[100.0, 100.0],
[100.0, 100.0, 100.0, 100.0, 100.0],
[100.0, 100.0, 100.0, 100.0]]

If you don’t like nested list comprehensions, you can make use of the map function as well,

>>> from pprint import pprint

>>> l = l = [['40', '20', '10', '30'], ['20', '20', '20', '20', '20', '30', '20'], ['30', '20', '30', '50', '10', '30', '20', '20', '20'], ['100', '100'], ['100', '100', '100', '100', '100'], ['100', '100', '100', '100']] 

>>> pprint(l)
[['40', '20', '10', '30'],
['20', '20', '20', '20', '20', '30', '20'],
['30', '20', '30', '50', '10', '30', '20', '20', '20'],
['100', '100'],
['100', '100', '100', '100', '100'],
['100', '100', '100', '100']]

>>> float_l = [map(float, nested_list) for nested_list in l]

>>> pprint(float_l)
[[40.0, 20.0, 10.0, 30.0],
[20.0, 20.0, 20.0, 20.0, 20.0, 30.0, 20.0],
[30.0, 20.0, 30.0, 50.0, 10.0, 30.0, 20.0, 20.0, 20.0],
[100.0, 100.0],
[100.0, 100.0, 100.0, 100.0, 100.0],
[100.0, 100.0, 100.0, 100.0]]

回答 6

我有一个类似的问题要解决,所以遇到了这个问题。我对安德鲁·克拉克(Andrew Clark)和纳拉扬(narayan)的答案进行了性能比较,我想分享一下。

两个答案之间的主要区别是它们如何遍历内部列表。其中一个使用内置地图,而另一个使用列表推导。如果不需要使用lambdas,则Map函数与其等效的列表理解相比在性能上会有一点优势。所以在这个问题的背景下map应比列表理解稍好。

让我们做一个性能基准,看看它是否真的是真的。我使用python 3.5.0版执行所有这些测试。在第一组测试中,我希望每个列表的元素数量保持为10,列表数量从10-100,000不等

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,10))]*10]"
>>> 100000 loops, best of 3: 15.2 usec per loop   
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,10))]*10]"
>>> 10000 loops, best of 3: 19.6 usec per loop 

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,10))]*100]"
>>> 100000 loops, best of 3: 15.2 usec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,10))]*100]"
>>> 10000 loops, best of 3: 19.6 usec per loop 

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,10))]*1000]"
>>> 1000 loops, best of 3: 1.43 msec per loop   
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,10))]*1000]"
>>> 100 loops, best of 3: 1.91 msec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,10))]*10000]"
>>> 100 loops, best of 3: 13.6 msec per loop   
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,10))]*10000]"
>>> 10 loops, best of 3: 19.1 msec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,10))]*100000]"
>>> 10 loops, best of 3: 164 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,10))]*100000]"
>>> 10 loops, best of 3: 216 msec per loop

在下一组测试中,我希望将每个列表的元素数量增加到100个

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,100))]*10]"
>>> 10000 loops, best of 3: 110 usec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,100))]*10]"
>>> 10000 loops, best of 3: 151 usec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,100))]*100]"
>>> 1000 loops, best of 3: 1.11 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,100))]*100]"
>>> 1000 loops, best of 3: 1.5 msec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,100))]*1000]"
>>> 100 loops, best of 3: 11.2 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,100))]*1000]"
>>> 100 loops, best of 3: 16.7 msec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,100))]*10000]"
>>> 10 loops, best of 3: 134 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,100))]*10000]"
>>> 10 loops, best of 3: 171 msec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,100))]*100000]"
>>> 10 loops, best of 3: 1.32 sec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,100))]*100000]"
>>> 10 loops, best of 3: 1.7 sec per loop

让我们采取一个勇敢的步骤并将列表中的元素数修改为1000

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,1000))]*10]"
>>> 1000 loops, best of 3: 800 usec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,1000))]*10]"
>>> 1000 loops, best of 3: 1.16 msec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,1000))]*100]"
>>> 100 loops, best of 3: 8.26 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,1000))]*100]"
>>> 100 loops, best of 3: 11.7 msec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,1000))]*1000]"
>>> 10 loops, best of 3: 83.8 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,1000))]*1000]"
>>> 10 loops, best of 3: 118 msec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,1000))]*10000]"
>>> 10 loops, best of 3: 868 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,1000))]*10000]"
>>> 10 loops, best of 3: 1.23 sec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,1000))]*100000]"
>>> 10 loops, best of 3: 9.2 sec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,1000))]*100000]"
>>> 10 loops, best of 3: 12.7 sec per loop

从这些测试中,我们可以得出结论,map在这种情况下,与列表理解相比,它具有性能优势。如果您要强制转换为int或,这也适用str。对于少量列表且每个列表元素较少的列表,差异可以忽略不计。对于每个列表具有更多元素的较大列表,可能要使用map而不是列表理解,但这完全取决于应用程序的需求。

但是我个人认为列表理解比map。这是python中的事实上的标准。通常,人们比使用列表理解更熟练和更舒适(特别是初学者)map

I had a similar problem to solve so I came across this question. I did a performance comparison of Andrew Clark’s and narayan’s answer which I would like to share.

The primary difference between two answers is how they iterate over inner lists. One of them uses builtin map, while other is using list comprehension. Map function has slight performance advantage to its equivalent list comprehension if it doesn’t require the use lambdas. So in context of this question map should perform slightly better than list comprehension.

Lets do a performance benchmark to see if it is actually true. I used python version 3.5.0 to perform all these tests. In first set of tests I would like to keep elements per list to be 10 and vary number of lists from 10-100,000

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,10))]*10]"
>>> 100000 loops, best of 3: 15.2 usec per loop   
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,10))]*10]"
>>> 10000 loops, best of 3: 19.6 usec per loop 

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,10))]*100]"
>>> 100000 loops, best of 3: 15.2 usec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,10))]*100]"
>>> 10000 loops, best of 3: 19.6 usec per loop 

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,10))]*1000]"
>>> 1000 loops, best of 3: 1.43 msec per loop   
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,10))]*1000]"
>>> 100 loops, best of 3: 1.91 msec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,10))]*10000]"
>>> 100 loops, best of 3: 13.6 msec per loop   
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,10))]*10000]"
>>> 10 loops, best of 3: 19.1 msec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,10))]*100000]"
>>> 10 loops, best of 3: 164 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,10))]*100000]"
>>> 10 loops, best of 3: 216 msec per loop

In the next set of tests I would like to raise number of elements per lists to 100.

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,100))]*10]"
>>> 10000 loops, best of 3: 110 usec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,100))]*10]"
>>> 10000 loops, best of 3: 151 usec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,100))]*100]"
>>> 1000 loops, best of 3: 1.11 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,100))]*100]"
>>> 1000 loops, best of 3: 1.5 msec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,100))]*1000]"
>>> 100 loops, best of 3: 11.2 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,100))]*1000]"
>>> 100 loops, best of 3: 16.7 msec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,100))]*10000]"
>>> 10 loops, best of 3: 134 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,100))]*10000]"
>>> 10 loops, best of 3: 171 msec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,100))]*100000]"
>>> 10 loops, best of 3: 1.32 sec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,100))]*100000]"
>>> 10 loops, best of 3: 1.7 sec per loop

Lets take a brave step and modify the number of elements in lists to be 1000

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,1000))]*10]"
>>> 1000 loops, best of 3: 800 usec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,1000))]*10]"
>>> 1000 loops, best of 3: 1.16 msec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,1000))]*100]"
>>> 100 loops, best of 3: 8.26 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,1000))]*100]"
>>> 100 loops, best of 3: 11.7 msec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,1000))]*1000]"
>>> 10 loops, best of 3: 83.8 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,1000))]*1000]"
>>> 10 loops, best of 3: 118 msec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,1000))]*10000]"
>>> 10 loops, best of 3: 868 msec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,1000))]*10000]"
>>> 10 loops, best of 3: 1.23 sec per loop

>>> python -m timeit "[list(map(float,k)) for k in [list(range(0,1000))]*100000]"
>>> 10 loops, best of 3: 9.2 sec per loop
>>> python -m timeit "[[float(y) for y in x] for x in [list(range(0,1000))]*100000]"
>>> 10 loops, best of 3: 12.7 sec per loop

From these test we can conclude that map has a performance benefit over list comprehension in this case. This is also applicable if you are trying to cast to either int or str. For small number of lists with less elements per list, the difference is negligible. For larger lists with more elements per list one might like to use map instead of list comprehension, but it totally depends on application needs.

However I personally find list comprehension to be more readable and idiomatic than map. It is a de-facto standard in python. Usually people are more proficient and comfortable(specially beginner) in using list comprehension than map.


回答 7

是的,您可以使用以下代码进行操作:

l = [[float(y) for y in x] for x in l]

Yes, you can do it with such a code:

l = [[float(y) for y in x] for x in l]

回答 8

无需使用for循环即可解决此问题,只需单行代码即可。在lambda函数中使用嵌套地图也可以在这里使用。

l = [['40', '20', '10', '30'], ['20', '20', '20', '20', '20', '30', '20'], ['30', '20', '30', '50', '10', '30', '20', '20', '20'], ['100', '100'], ['100', '100', '100', '100', '100'], ['100', '100', '100', '100']]

map(lambda x:map(lambda y:float(y),x),l)

输出列表如下:

[[40.0, 20.0, 10.0, 30.0], [20.0, 20.0, 20.0, 20.0, 20.0, 30.0, 20.0], [30.0, 20.0, 30.0, 50.0, 10.0, 30.0, 20.0, 20.0, 20.0], [100.0, 100.0], [100.0, 100.0, 100.0, 100.0, 100.0], [100.0, 100.0, 100.0, 100.0]]

This Problem can be solved without using for loop.Single line code will be sufficient for this. Using Nested Map with lambda function will also works here.

l = [['40', '20', '10', '30'], ['20', '20', '20', '20', '20', '30', '20'], ['30', '20', '30', '50', '10', '30', '20', '20', '20'], ['100', '100'], ['100', '100', '100', '100', '100'], ['100', '100', '100', '100']]

map(lambda x:map(lambda y:float(y),x),l)

And Output List would be as follows:

[[40.0, 20.0, 10.0, 30.0], [20.0, 20.0, 20.0, 20.0, 20.0, 30.0, 20.0], [30.0, 20.0, 30.0, 50.0, 10.0, 30.0, 20.0, 20.0, 20.0], [100.0, 100.0], [100.0, 100.0, 100.0, 100.0, 100.0], [100.0, 100.0, 100.0, 100.0]]

回答 9

我认为做到这一点的最佳方法是使用python的itertools软件包。

>>>import itertools
>>>l1 = [1,2,3]
>>>l2 = [10,20,30]
>>>[l*2 for l in itertools.chain(*[l1,l2])]
[2, 4, 6, 20, 40, 60]

The best way to do this in my opinion is to use python’s itertools package.

>>>import itertools
>>>l1 = [1,2,3]
>>>l2 = [10,20,30]
>>>[l*2 for l in itertools.chain(*[l1,l2])]
[2, 4, 6, 20, 40, 60]

回答 10

是的,您可以执行以下操作。

[[float(y) for y in x] for x in l]

Yes you can do the following.

[[float(y) for y in x] for x in l]

回答 11

    deck = [] 
    for rank in ranks:
        for suit in suits:
            deck.append(('%s%s')%(rank, suit))

这可以通过列表理解来实现:

[deck.append((rank,suit)) for suit in suits for rank in ranks ]
    deck = [] 
    for rank in ranks:
        for suit in suits:
            deck.append(('%s%s')%(rank, suit))

This can be achieved using list comprehension:

[deck.append((rank,suit)) for suit in suits for rank in ranks ]

将迭代器转换为列表的最快方法

问题:将迭代器转换为列表的最快方法

有一个iterator对象,是否有比列表理解更快,更好或更正确的方法来获取迭代器返回的对象的列表?

user_list = [user for user in user_iterator]

Having an iterator object, is there something faster, better or more correct than a list comprehension to get a list of the objects returned by the iterator?

user_list = [user for user in user_iterator]

回答 0

list(your_iterator)
list(your_iterator)

回答 1

python 3.5开始, 您可以使用*可迭代的拆包运算符:

user_list = [*your_iterator]

pythonic的方法是:

user_list  = list(your_iterator)

since python 3.5 you can use * iterable unpacking operator:

user_list = [*your_iterator]

but the pythonic way to do it is:

user_list  = list(your_iterator)

单行列表理解:if-else变体

问题:单行列表理解:if-else变体

更多有关python列表理解语法的信息。我有一个列表推导,它产生给定范围的奇数列表:

[x for x in range(1, 10) if x % 2]

这将构成一个过滤器-我有一个源列表,其中删除了偶数(if x % 2)。我想在这里使用if-then-else之类的东西。以下代码失败:

>>> [x for x in range(1, 10) if x % 2 else x * 100]
  File "<stdin>", line 1
    [x for x in range(1, 10) if x % 2 else x * 100]
                                         ^
SyntaxError: invalid syntax

有一个类似if-else的python表达式:

1 if 0 is 0 else 3

如何在列表理解中使用它?

It’s more about python list comprehension syntax. I’ve got a list comprehension that produces list of odd numbers of a given range:

[x for x in range(1, 10) if x % 2]

This makes a filter – I’ve got a source list, where I remove even numbers (if x % 2). I’d like to use something like if-then-else here. Following code fails:

>>> [x for x in range(1, 10) if x % 2 else x * 100]
  File "<stdin>", line 1
    [x for x in range(1, 10) if x % 2 else x * 100]
                                         ^
SyntaxError: invalid syntax

There is a python expression like if-else:

1 if 0 is 0 else 3

How to use it inside a list comprehension?


回答 0

x if y else z是您要为每个元素返回的表达式的语法。因此,您需要:

[ x if x%2 else x*100 for x in range(1, 10) ]

混淆是由于您在第一个示例中使用了过滤器而在第二个示例中没有使用过滤器。在第二个示例中,您仅映射使用三元运算符表达式每个值到另一个。

使用过滤器,您需要:

[ EXP for x in seq if COND ]

没有过滤器,您需要:

[ EXP for x in seq ]

在第二个示例中,该表达式是一个“复杂”表达式,其中恰好包含一个if-else

x if y else z is the syntax for the expression you’re returning for each element. Thus you need:

[ x if x%2 else x*100 for x in range(1, 10) ]

The confusion arises from the fact you’re using a filter in the first example, but not in the second. In the second example you’re only mapping each value to another, using a ternary-operator expression.

With a filter, you need:

[ EXP for x in seq if COND ]

Without a filter you need:

[ EXP for x in seq ]

and in your second example, the expression is a “complex” one, which happens to involve an if-else.


回答 1

[x if x % 2 else x * 100 for x in range(1, 10) ]
[x if x % 2 else x * 100 for x in range(1, 10) ]

回答 2

您也可以使用列表理解来做到这一点:

A=[[x*100, x][x % 2 != 0] for x in range(1,11)]
print A

You can do that with list comprehension too:

A=[[x*100, x][x % 2 != 0] for x in range(1,11)]
print A

回答 3

只是另一种解决方案,希望有人会喜欢它:

使用:[假,真] [表达式]

>>> map(lambda x: [x*100, x][x % 2 != 0], range(1,10))
[1, 200, 3, 400, 5, 600, 7, 800, 9]
>>>

Just another solution, hope some one may like it :

Using: [False, True][Expression]

>>> map(lambda x: [x*100, x][x % 2 != 0], range(1,10))
[1, 200, 3, 400, 5, 600, 7, 800, 9]
>>>

回答 4

我能够做到这一点

>>> [x if x % 2 != 0 else x * 100 for x in range(1,10)]
    [1, 200, 3, 400, 5, 600, 7, 800, 9]
>>>

I was able to do this

>>> [x if x % 2 != 0 else x * 100 for x in range(1,10)]
    [1, 200, 3, 400, 5, 600, 7, 800, 9]
>>>

Python选择列表中最长字符串的最有效方法?

问题:Python选择列表中最长字符串的最有效方法?

我有一个可变长度的列表,正在尝试寻找一种方法来测试当前正在评估的列表项是否是列表中包含的最长字符串。我正在使用Python 2.6.1

例如:

mylist = ['abc','abcdef','abcd']

for each in mylist:
    if condition1:
        do_something()
    elif ___________________: #else if each is the longest string contained in mylist:
        do_something_else()

当然,有一个简单的列表理解功能很简短,但我却忽略了它?

I have a list of variable length and am trying to find a way to test if the list item currently being evaluated is the longest string contained in the list. And I am using Python 2.6.1

For example:

mylist = ['abc','abcdef','abcd']

for each in mylist:
    if condition1:
        do_something()
    elif ___________________: #else if each is the longest string contained in mylist:
        do_something_else()

Surely there’s a simple list comprehension that’s short and elegant that I’m overlooking?


回答 0

Python文档本身,您可以使用max

>>> mylist = ['123','123456','1234']
>>> print max(mylist, key=len)
123456

From the Python documentation itself, you can use max:

>>> mylist = ['123','123456','1234']
>>> print max(mylist, key=len)
123456

回答 1

如果最长的字符串超过1个(应该考虑’12’和’01’),该怎么办?

尝试获得最长的元素

max_length,longest_element = max([(len(x),x) for x in ('a','b','aa')])

然后定期进行foreach

for st in mylist:
    if len(st)==max_length:...

What should happen if there are more than 1 longest string (think ’12’, and ’01’)?

Try that to get the longest element

max_length,longest_element = max([(len(x),x) for x in ('a','b','aa')])

And then regular foreach

for st in mylist:
    if len(st)==max_length:...

回答 2

def longestWord(some_list): 
    count = 0    #You set the count to 0
    for i in some_list: # Go through the whole list
        if len(i) > count: #Checking for the longest word(string)
            count = len(i)
            word = i
    return ("the longest string is " + word)

或更容易:

max(some_list , key = len)
def longestWord(some_list): 
    count = 0    #You set the count to 0
    for i in some_list: # Go through the whole list
        if len(i) > count: #Checking for the longest word(string)
            count = len(i)
            word = i
    return ("the longest string is " + word)

or much easier:

max(some_list , key = len)

回答 3

要获取列表中最小或最大的项目,请使用内置的min和max函数:

lo = min(L)
hi = max(L)

与sort一样,您可以传入“ key”参数,该参数用于在比较列表项之前映射它们:

lo = min(L, key=int)
hi = max(L, key=int)

http://effbot.org/zone/python-list.htm

如果您正确地将其映射为字符串并将其用作比较,则看起来可以使用max函数。我建议当然只查找一次最大值,而不是列表中的每个元素。

To get the smallest or largest item in a list, use the built-in min and max functions:

lo = min(L)
hi = max(L)

As with sort, you can pass in a “key” argument that is used to map the list items before they are compared:

lo = min(L, key=int)
hi = max(L, key=int)

http://effbot.org/zone/python-list.htm

Looks like you could use the max function if you map it correctly for strings and use that as the comparison. I would recommend just finding the max once though of course, not for each element in the list.


回答 4

len(each) == max(len(x) for x in myList) 要不就 each == max(myList, key=len)

len(each) == max(len(x) for x in myList) or just each == max(myList, key=len)


回答 5

def LongestEntry(lstName):
  totalEntries = len(lstName)
  currentEntry = 0
  longestLength = 0
  while currentEntry < totalEntries:
    thisEntry = len(str(lstName[currentEntry]))
    if int(thisEntry) > int(longestLength):
      longestLength = thisEntry
      longestEntry = currentEntry
    currentEntry += 1
  return longestLength
def LongestEntry(lstName):
  totalEntries = len(lstName)
  currentEntry = 0
  longestLength = 0
  while currentEntry < totalEntries:
    thisEntry = len(str(lstName[currentEntry]))
    if int(thisEntry) > int(longestLength):
      longestLength = thisEntry
      longestEntry = currentEntry
    currentEntry += 1
  return longestLength

列表理解中的双重迭代

问题:列表理解中的双重迭代

在Python中,列表推导中可以有多个迭代器,例如

[(x,y) for x in a for y in b]

对于一些合适的序列a和b。我知道Python的列表推导的嵌套循环语义。

我的问题是:理解中的一个迭代器可以引用另一个吗?换句话说:我可以有这样的东西:

[x for x in a for a in b]

外循环的当前值在哪里是内循环的迭代器?

例如,如果我有一个嵌套列表:

a=[[1,2],[3,4]]

列表理解表达式将如何获得此结果:

[1,2,3,4]

?? (请只列出理解答案,因为这是我要查找的内容)。

In Python you can have multiple iterators in a list comprehension, like

[(x,y) for x in a for y in b]

for some suitable sequences a and b. I’m aware of the nested loop semantics of Python’s list comprehensions.

My question is: Can one iterator in the comprehension refer to the other? In other words: Could I have something like this:

[x for x in a for a in b]

where the current value of the outer loop is the iterator of the inner?

As an example, if I have a nested list:

a=[[1,2],[3,4]]

what would the list comprehension expression be to achieve this result:

[1,2,3,4]

?? (Please only list comprehension answers, since this is what I want to find out).


回答 0

要根据自己的建议回答您的问题:

>>> [x for b in a for x in b] # Works fine

当您要求提供列表理解答案时,我还要指出出色的itertools.chain():

>>> from itertools import chain
>>> list(chain.from_iterable(a))
>>> list(chain(*a)) # If you're using python < 2.6

To answer your question with your own suggestion:

>>> [x for b in a for x in b] # Works fine

While you asked for list comprehension answers, let me also point out the excellent itertools.chain():

>>> from itertools import chain
>>> list(chain.from_iterable(a))
>>> list(chain(*a)) # If you're using python < 2.6

回答 1

我希望这对别人a,b,x,y有帮助,因为对我没有太大的意义!假设您有一个充满句子的文本,并且想要一个单词数组。

# Without list comprehension
list_of_words = []
for sentence in text:
    for word in sentence:
       list_of_words.append(word)
return list_of_words

我喜欢将列表理解理解为水平扩展代码。

尝试将其分解为:

# List Comprehension 
[word for sentence in text for word in sentence]

例:

>>> text = (("Hi", "Steve!"), ("What's", "up?"))
>>> [word for sentence in text for word in sentence]
['Hi', 'Steve!', "What's", 'up?']

这也适用于生成器

>>> text = (("Hi", "Steve!"), ("What's", "up?"))
>>> gen = (word for sentence in text for word in sentence)
>>> for word in gen: print(word)
Hi
Steve!
What's
up?

I hope this helps someone else since a,b,x,y don’t have much meaning to me! Suppose you have a text full of sentences and you want an array of words.

# Without list comprehension
list_of_words = []
for sentence in text:
    for word in sentence:
       list_of_words.append(word)
return list_of_words

I like to think of list comprehension as stretching code horizontally.

Try breaking it up into:

# List Comprehension 
[word for sentence in text for word in sentence]

Example:

>>> text = (("Hi", "Steve!"), ("What's", "up?"))
>>> [word for sentence in text for word in sentence]
['Hi', 'Steve!', "What's", 'up?']

This also works for generators

>>> text = (("Hi", "Steve!"), ("What's", "up?"))
>>> gen = (word for sentence in text for word in sentence)
>>> for word in gen: print(word)
Hi
Steve!
What's
up?

回答 2

e,我想我找到了答案:我对那个循环是内部的,哪个循环是外部的,没有足够的小心。列表理解应为:

[x for b in a for x in b]

为了获得期望的结果,是的,一个当前值可以作为下一个循环的迭代器。

Gee, I guess I found the anwser: I was not taking care enough about which loop is inner and which is outer. The list comprehension should be like:

[x for b in a for x in b]

to get the desired result, and yes, one current value can be the iterator for the next loop.


回答 3

迭代器的顺序似乎违反直觉。

举个例子: [str(x) for i in range(3) for x in foo(i)]

让我们分解一下:

def foo(i):
    return i, i + 0.5

[str(x)
    for i in range(3)
        for x in foo(i)
]

# is same as
for i in range(3):
    for x in foo(i):
        yield str(x)

Order of iterators may seem counter-intuitive.

Take for example: [str(x) for i in range(3) for x in foo(i)]

Let’s decompose it:

def foo(i):
    return i, i + 0.5

[str(x)
    for i in range(3)
        for x in foo(i)
]

# is same as
for i in range(3):
    for x in foo(i):
        yield str(x)

回答 4

ThomasH已经添加了一个很好的答案,但是我想说明会发生什么:

>>> a = [[1, 2], [3, 4]]
>>> [x for x in b for b in a]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'b' is not defined

>>> [x for b in a for x in b]
[1, 2, 3, 4]
>>> [x for x in b for b in a]
[3, 3, 4, 4]

我猜想Python从左到右解析列表理解。这意味着,将首先for执行发生的第一个循环。

第二个“问题”是b从列表理解中“泄漏”出去。在第一次成功理解清单之后b == [3, 4]

ThomasH has already added a good answer, but I want to show what happens:

>>> a = [[1, 2], [3, 4]]
>>> [x for x in b for b in a]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'b' is not defined

>>> [x for b in a for x in b]
[1, 2, 3, 4]
>>> [x for x in b for b in a]
[3, 3, 4, 4]

I guess Python parses the list comprehension from left to right. This means, the first for loop that occurs will be executed first.

The second “problem” of this is that b gets “leaked” out of the list comprehension. After the first successful list comprehension b == [3, 4].


回答 5

如果要保留多维数组,则应嵌套数组括号。请参阅下面的示例,其中每个元素都添加了一个。

>>> a = [[1, 2], [3, 4]]

>>> [[col +1 for col in row] for row in a]
[[2, 3], [4, 5]]

>>> [col +1 for row in a for col in row]
[2, 3, 4, 5]

If you want to keep the multi dimensional array, one should nest the array brackets. see example below where one is added to every element.

>>> a = [[1, 2], [3, 4]]

>>> [[col +1 for col in row] for row in a]
[[2, 3], [4, 5]]

>>> [col +1 for row in a for col in row]
[2, 3, 4, 5]

回答 6

这种记忆技术对我有很大帮助:

[ <RETURNED_VALUE> <OUTER_LOOP1> <INNER_LOOP2> <INNER_LOOP3> ... <OPTIONAL_IF> ]

现在,你可以想想[R E打开+ Ø uter环作为唯一的[R飞行Ø刻申

综上所述,即使对于3个循环,列表中的顺序也很容易:


c=[111, 222, 333]
b=[11, 22, 33]
a=[1, 2, 3]

print(
  [
    (i, j, k)                            # <RETURNED_VALUE> 
    for i in a for j in b for k in c     # in order: loop1, loop2, loop3
    if i < 2 and j < 20 and k < 200      # <OPTIONAL_IF>
  ]
)
[(1, 11, 111)]

因为上面只是一个:

for i in a:                         # outer loop1 GOES SECOND
  for j in b:                       # inner loop2 GOES THIRD
    for k in c:                     # inner loop3 GOES FOURTH
      if i < 2 and j < 20 and k < 200:
        print((i, j, k))            # returned value GOES FIRST

对于迭代一个嵌套列表/结构,技术是相同的:a从问题出发:

a = [[1,2],[3,4]]
[i2    for i1 in a      for i2 in i1]
which return [1, 2, 3, 4]

互相嵌套的水平

a = [[[1, 2], [3, 4]], [[5, 6], [7, 8, 9]], [[10]]]
[i3    for i1 in a      for i2 in i1     for i3 in i2]
which return [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

等等

This memory technic helps me a lot:

[ <RETURNED_VALUE> <OUTER_LOOP1> <INNER_LOOP2> <INNER_LOOP3> ... <OPTIONAL_IF> ]

And now you can think about Return + Outer-loop as the only Right Order

Knowing above, the order in list comprehensive even for 3 loops seem easy:


c=[111, 222, 333]
b=[11, 22, 33]
a=[1, 2, 3]

print(
  [
    (i, j, k)                            # <RETURNED_VALUE> 
    for i in a for j in b for k in c     # in order: loop1, loop2, loop3
    if i < 2 and j < 20 and k < 200      # <OPTIONAL_IF>
  ]
)
[(1, 11, 111)]

because the above is just a:

for i in a:                         # outer loop1 GOES SECOND
  for j in b:                       # inner loop2 GOES THIRD
    for k in c:                     # inner loop3 GOES FOURTH
      if i < 2 and j < 20 and k < 200:
        print((i, j, k))            # returned value GOES FIRST

for iterating one nested list/structure, technic is the same: for a from the question:

a = [[1,2],[3,4]]
[i2    for i1 in a      for i2 in i1]
which return [1, 2, 3, 4]

for one another nested level

a = [[[1, 2], [3, 4]], [[5, 6], [7, 8, 9]], [[10]]]
[i3    for i1 in a      for i2 in i1     for i3 in i2]
which return [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

and so on


回答 7

我觉得这更容易理解

[row[i] for row in a for i in range(len(a))]

result: [1, 2, 3, 4]

I feel this is easier to understand

[row[i] for row in a for i in range(len(a))]

result: [1, 2, 3, 4]

回答 8

另外,您可以对当前访问的输入列表的成员该成员中的元素使用相同的变量。但是,这甚至可能使它(清单)更加难以理解。

input = [[1, 2], [3, 4]]
[x for x in input for x in x]

首先for x in input求值,得到输入的一个成员列表,然后,Python遍历第二部分,for x in x在此过程中,x值被其访问的当前元素覆盖,然后第一部分x定义了我们要返回的内容。

Additionally, you could use just the same variable for the member of the input list which is currently accessed and for the element inside this member. However, this might even make it more (list) incomprehensible.

input = [[1, 2], [3, 4]]
[x for x in input for x in x]

First for x in input is evaluated, leading to one member list of the input, then, Python walks through the second part for x in x during which the x-value is overwritten by the current element it is accessing, then the first x defines what we want to return.


回答 9

这个flatten_nlevel函数递归调用嵌套的list1以隐蔽到一个级别。试试看

def flatten_nlevel(list1, flat_list):
    for sublist in list1:
        if isinstance(sublist, type(list)):        
            flatten_nlevel(sublist, flat_list)
        else:
            flat_list.append(sublist)

list1 = [1,[1,[2,3,[4,6]],4],5]

items = []
flatten_nlevel(list1,items)
print(items)

输出:

[1, 1, 2, 3, 4, 6, 4, 5]

This flatten_nlevel function calls recursively the nested list1 to covert to one level. Try this out

def flatten_nlevel(list1, flat_list):
    for sublist in list1:
        if isinstance(sublist, type(list)):        
            flatten_nlevel(sublist, flat_list)
        else:
            flat_list.append(sublist)

list1 = [1,[1,[2,3,[4,6]],4],5]

items = []
flatten_nlevel(list1,items)
print(items)

output:

[1, 1, 2, 3, 4, 6, 4, 5]

Python字典理解

问题:Python字典理解

是否可以在Python(用于键)中创建字典理解?

在没有列表理解的情况下,您可以使用以下内容:

l = []
for n in range(1, 11):
    l.append(n)

我们可以将其简化为列表理解:l = [n for n in range(1, 11)]

但是,说我想将字典的键设置为相同的值。我可以:

d = {}
for n in range(1, 11):
     d[n] = True # same value for each

我已经试过了:

d = {}
d[i for i in range(1, 11)] = True

不过,我得到一个SyntaxErrorfor

另外(我不需要这部分,只是想知道),您能否将字典的键设置为一堆不同的值,例如:

d = {}
for n in range(1, 11):
    d[n] = n

字典理解有可能吗?

d = {}
d[i for i in range(1, 11)] = [x for x in range(1, 11)]

这也引发了SyntaxErrorfor

Is it possible to create a dictionary comprehension in Python (for the keys)?

Without list comprehensions, you can use something like this:

l = []
for n in range(1, 11):
    l.append(n)

We can shorten this to a list comprehension: l = [n for n in range(1, 11)].

However, say I want to set a dictionary’s keys to the same value. I can do:

d = {}
for n in range(1, 11):
     d[n] = True # same value for each

I’ve tried this:

d = {}
d[i for i in range(1, 11)] = True

However, I get a SyntaxError on the for.

In addition (I don’t need this part, but just wondering), can you set a dictionary’s keys to a bunch of different values, like this:

d = {}
for n in range(1, 11):
    d[n] = n

Is this possible with a dictionary comprehension?

d = {}
d[i for i in range(1, 11)] = [x for x in range(1, 11)]

This also raises a SyntaxError on the for.


回答 0

Python 2.7+中字典理解功能,但是它们不能完全按照您尝试的方式工作。就像列表理解一样,他们创建了一个字典。您不能使用它们将键添加到现有字典中。此外,您还必须指定键和值,尽管您当然可以根据需要指定一个哑数值。

>>> d = {n: n**2 for n in range(5)}
>>> print d
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

如果要将它们全部设置为True:

>>> d = {n: True for n in range(5)}
>>> print d
{0: True, 1: True, 2: True, 3: True, 4: True}

您似乎想要的是一种在现有字典上一次设置多个键的方法。没有直接的捷径。您可以像已经显示的那样循环,也可以使用字典理解来创建具有新值的新字典,然后oldDict.update(newDict)将新值合并到旧字典中。

There are dictionary comprehensions in Python 2.7+, but they don’t work quite the way you’re trying. Like a list comprehension, they create a new dictionary; you can’t use them to add keys to an existing dictionary. Also, you have to specify the keys and values, although of course you can specify a dummy value if you like.

>>> d = {n: n**2 for n in range(5)}
>>> print d
{0: 0, 1: 1, 2: 4, 3: 9, 4: 16}

If you want to set them all to True:

>>> d = {n: True for n in range(5)}
>>> print d
{0: True, 1: True, 2: True, 3: True, 4: True}

What you seem to be asking for is a way to set multiple keys at once on an existing dictionary. There’s no direct shortcut for that. You can either loop like you already showed, or you could use a dictionary comprehension to create a new dict with the new values, and then do oldDict.update(newDict) to merge the new values into the old dict.


回答 1

您可以使用dict.fromkeys类方法…

>>> dict.fromkeys(range(5), True)
{0: True, 1: True, 2: True, 3: True, 4: True}

这是创建所有键都映射到相同值的字典的最快方法。

但是千万不能用可变对象使用

d = dict.fromkeys(range(5), [])
# {0: [], 1: [], 2: [], 3: [], 4: []}
d[1].append(2)
# {0: [2], 1: [2], 2: [2], 3: [2], 4: [2]} !!!

如果您实际上不需要初始化所有键,那么a defaultdict也可能很有用:

from collections import defaultdict
d = defaultdict(True)

要回答第二部分,您需要的是dict理解:

{k: k for k in range(10)}

您可能不应该这样做,但也可以创建一个子类,如果您重写dict它,该子类的工作原理类似于a :defaultdict__missing__

>>> class KeyDict(dict):
...    def __missing__(self, key):
...       #self[key] = key  # Maybe add this also?
...       return key
... 
>>> d = KeyDict()
>>> d[1]
1
>>> d[2]
2
>>> d[3]
3
>>> print(d)
{}

You can use the dict.fromkeys class method …

>>> dict.fromkeys(range(5), True)
{0: True, 1: True, 2: True, 3: True, 4: True}

This is the fastest way to create a dictionary where all the keys map to the same value.

But do not use this with mutable objects:

d = dict.fromkeys(range(5), [])
# {0: [], 1: [], 2: [], 3: [], 4: []}
d[1].append(2)
# {0: [2], 1: [2], 2: [2], 3: [2], 4: [2]} !!!

If you don’t actually need to initialize all the keys, a defaultdict might be useful as well:

from collections import defaultdict
d = defaultdict(True)

To answer the second part, a dict-comprehension is just what you need:

{k: k for k in range(10)}

You probably shouldn’t do this but you could also create a subclass of dict which works somewhat like a defaultdict if you override __missing__:

>>> class KeyDict(dict):
...    def __missing__(self, key):
...       #self[key] = key  # Maybe add this also?
...       return key
... 
>>> d = KeyDict()
>>> d[1]
1
>>> d[2]
2
>>> d[3]
3
>>> print(d)
{}

回答 2

>>> {i:i for i in range(1, 11)}
{1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10}
>>> {i:i for i in range(1, 11)}
{1: 1, 2: 2, 3: 3, 4: 4, 5: 5, 6: 6, 7: 7, 8: 8, 9: 9, 10: 10}

回答 3

我真的很喜欢@mgilson注释,因为如果您有两个可迭代项,一个与键相对应,另一个与值相对应,则还可以执行以下操作。

keys = ['a', 'b', 'c']
values = [1, 2, 3]
d = dict(zip(keys, values))

给予

d = {‘a’:1,’b’:2,’c’:3}

I really like the @mgilson comment, since if you have a two iterables, one that corresponds to the keys and the other the values, you can also do the following.

keys = ['a', 'b', 'c']
values = [1, 2, 3]
d = dict(zip(keys, values))

giving

d = {‘a’: 1, ‘b’: 2, ‘c’: 3}


回答 4

在元组列表上使用dict(),此解决方案将允许您在每个列表中具有任意值,只要它们的长度相同

i_s = range(1, 11)
x_s = range(1, 11)
# x_s = range(11, 1, -1) # Also works
d = dict([(i_s[index], x_s[index], ) for index in range(len(i_s))])

Use dict() on a list of tuples, this solution will allow you to have arbitrary values in each list, so long as they are the same length

i_s = range(1, 11)
x_s = range(1, 11)
# x_s = range(11, 1, -1) # Also works
d = dict([(i_s[index], x_s[index], ) for index in range(len(i_s))])

回答 5

考虑以下示例,该示例使用字典理解来计算列表中单词的出现

my_list = ['hello', 'hi', 'hello', 'today', 'morning', 'again', 'hello']
my_dict = {k:my_list.count(k) for k in my_list}
print(my_dict)

结果是

{'again': 1, 'hi': 1, 'hello': 3, 'today': 1, 'morning': 1}

Consider this example of counting the occurrence of words in a list using dictionary comprehension

my_list = ['hello', 'hi', 'hello', 'today', 'morning', 'again', 'hello']
my_dict = {k:my_list.count(k) for k in my_list}
print(my_dict)

And the result is

{'again': 1, 'hi': 1, 'hello': 3, 'today': 1, 'morning': 1}

回答 6

列表理解的主要目的是在不更改或破坏原始列表的基础上创建基于另一个列表的新列表。

而不是写作

    l = []
    for n in range(1, 11):
        l.append(n)

要么

    l = [n for n in range(1, 11)]

你应该只写

    l = range(1, 11)

在两个最重要的代码块中,您将创建一个新列表,对其进行迭代并仅返回每个元素。这只是创建列表副本的一种昂贵方法。

要获得一个新的字典,并且根据另一个字典将所有键设置为相同的值,请执行以下操作:

    old_dict = {'a': 1, 'c': 3, 'b': 2}
    new_dict = { key:'your value here' for key in old_dict.keys()}

您收到SyntaxError的原因是,在编写时

    d = {}
    d[i for i in range(1, 11)] = True

您基本上是在说:“将我的密钥’i for range(1,11)’中的i设置为True”和“ i for i in range(1,11)中的”)不是有效的密钥,这只是语法错误。如果将dicts支持的列表作为键,您将执行以下操作

    d[[i for i in range(1, 11)]] = True

并不是

    d[i for i in range(1, 11)] = True

但是列表不可散列,因此您不能将它们用作字典键。

The main purpose of a list comprehension is to create a new list based on another one without changing or destroying the original list.

Instead of writing

    l = []
    for n in range(1, 11):
        l.append(n)

or

    l = [n for n in range(1, 11)]

you should write only

    l = range(1, 11)

In the two top code blocks you’re creating a new list, iterating through it and just returning each element. It’s just an expensive way of creating a list copy.

To get a new dictionary with all keys set to the same value based on another dict, do this:

    old_dict = {'a': 1, 'c': 3, 'b': 2}
    new_dict = { key:'your value here' for key in old_dict.keys()}

You’re receiving a SyntaxError because when you write

    d = {}
    d[i for i in range(1, 11)] = True

you’re basically saying: “Set my key ‘i for i in range(1, 11)’ to True” and “i for i in range(1, 11)” is not a valid key, it’s just a syntax error. If dicts supported lists as keys, you would do something like

    d[[i for i in range(1, 11)]] = True

and not

    d[i for i in range(1, 11)] = True

but lists are not hashable, so you can’t use them as dict keys.


回答 7

您不能像这样散列表。试试这个,它使用元组

d[tuple([i for i in range(1,11)])] = True

you can’t hash a list like that. try this instead, it uses tuples

d[tuple([i for i in range(1,11)])] = True

为什么Python中没有元组理解?

问题:为什么Python中没有元组理解?

众所周知,列表理解

[i for i in [1, 2, 3, 4]]

并且有字典理解,例如

{i:j for i, j in {1: 'a', 2: 'b'}.items()}

(i for i in (1, 2, 3))

最终将成为生成器,而不是tuple理解力。这是为什么?

我的猜测是a tuple是不可变的,但这似乎并不是答案。

As we all know, there’s list comprehension, like

[i for i in [1, 2, 3, 4]]

and there is dictionary comprehension, like

{i:j for i, j in {1: 'a', 2: 'b'}.items()}

but

(i for i in (1, 2, 3))

will end up in a generator, not a tuple comprehension. Why is that?

My guess is that a tuple is immutable, but this does not seem to be the answer.


回答 0

您可以使用生成器表达式:

tuple(i for i in (1, 2, 3))

但是对于…生成器表达式,已经使用了括号。

You can use a generator expression:

tuple(i for i in (1, 2, 3))

but parentheses were already taken for … generator expressions.


回答 1

Raymond Hettinger(Python核心开发人员之一)在最近的一条推文中曾这样说过元组:

#python提示:通常,列表用于循环;结构的元组。列表是同质的;元组异构。列出可变长度。

(对我来说)支持这样的想法:如果序列中的项目相关性足以由生成器生成,那么它应该是一个列表。尽管元组是可迭代的,并且看起来只是一个不可变的列表,但它实际上与C结构的Python等效:

struct {
    int a;
    char b;
    float c;
} foo;

struct foo x = { 3, 'g', 5.9 };

成为Python

x = (3, 'g', 5.9)

Raymond Hettinger (one of the Python core developers) had this to say about tuples in a recent tweet:

#python tip: Generally, lists are for looping; tuples for structs. Lists are homogeneous; tuples heterogeneous. Lists for variable length.

This (to me) supports the idea that if the items in a sequence are related enough to be generated by a, well, generator, then it should be a list. Although a tuple is iterable and seems like simply a immutable list, it’s really the Python equivalent of a C struct:

struct {
    int a;
    char b;
    float c;
} foo;

struct foo x = { 3, 'g', 5.9 };

becomes in Python

x = (3, 'g', 5.9)

回答 2

从Python 3.5开始,您还可以使用splat *解包语法来解压缩生成器表达式:

*(x for x in range(10)),

Since Python 3.5, you can also use splat * unpacking syntax to unpack a generator expresion:

*(x for x in range(10)),

回答 3

正如另一位发帖人macm所述,从生成器创建元组的最快方法是tuple([generator])


性能比较

  • 清单理解:

    $ python3 -m timeit "a = [i for i in range(1000)]"
    10000 loops, best of 3: 27.4 usec per loop
    
  • 来自列表理解的元组:

    $ python3 -m timeit "a = tuple([i for i in range(1000)])"
    10000 loops, best of 3: 30.2 usec per loop
    
  • 生成器中的元组:

    $ python3 -m timeit "a = tuple(i for i in range(1000))"
    10000 loops, best of 3: 50.4 usec per loop
    
  • 打开包装的元组:

    $ python3 -m timeit "a = *(i for i in range(1000)),"
    10000 loops, best of 3: 52.7 usec per loop
    

我的python版本

$ python3 --version
Python 3.6.3

因此,除非性能不是问题,否则应始终从列表理解中创建一个元组。

As another poster macm mentioned, the fastest way to create a tuple from a generator is tuple([generator]).


Performance Comparison

  • List comprehension:

    $ python3 -m timeit "a = [i for i in range(1000)]"
    10000 loops, best of 3: 27.4 usec per loop
    
  • Tuple from list comprehension:

    $ python3 -m timeit "a = tuple([i for i in range(1000)])"
    10000 loops, best of 3: 30.2 usec per loop
    
  • Tuple from generator:

    $ python3 -m timeit "a = tuple(i for i in range(1000))"
    10000 loops, best of 3: 50.4 usec per loop
    
  • Tuple from unpacking:

    $ python3 -m timeit "a = *(i for i in range(1000)),"
    10000 loops, best of 3: 52.7 usec per loop
    

My version of python:

$ python3 --version
Python 3.6.3

So you should always create a tuple from a list comprehension unless performance is not an issue.


回答 4

理解通过循环或迭代项并将它们分配到容器中来工作,元组无法接收分配。

创建元组后,将无法对其进行追加,扩展或分配。修改元组的唯一方法是是否可以将其对象之一本身分配给它(是非元组容器)。因为元组仅持有对此类对象的引用。

另外-元组有自己的构造函数tuple(),您可以提供任何迭代器。这意味着要创建一个元组,您可以执行以下操作:

tuple(i for i in (1,2,3))

Comprehension works by looping or iterating over items and assigning them into a container, a Tuple is unable to receive assignments.

Once a Tuple is created, it can not be appended to, extended, or assigned to. The only way to modify a Tuple is if one of its objects can itself be assigned to (is a non-tuple container). Because the Tuple is only holding a reference to that kind of object.

Also – a tuple has its own constructor tuple() which you can give any iterator. Which means that to create a tuple, you could do:

tuple(i for i in (1,2,3))

回答 5

我最好的猜测是,他们用完了括号,并认为这对警告添加“丑陋”语法没有足够的帮助…

My best guess is that they ran out of brackets and didn’t think it would be useful enough to warrent adding an “ugly” syntax …


回答 6

元组不能像列表一样有效地附加。

因此,元组理解将需要在内部使用列表,然后转换为元组。

那将与您现在所做的相同:tuple([comprehension])

Tuples cannot efficiently be appended like a list.

So a tuple comprehension would need to use a list internally and then convert to a tuple.

That would be the same as what you do now : tuple( [ comprehension ] )


回答 7

括号不会创建元组。又名一个=(两个)不是元组。唯一的解决方法是一个=(两个)或一个=元组(两个)。因此,解决方案是:

tuple(i for i in myothertupleorlistordict) 

Parentheses do not create a tuple. aka one = (two) is not a tuple. The only way around is either one = (two,) or one = tuple(two). So a solution is:

tuple(i for i in myothertupleorlistordict) 

回答 8

我相信这只是为了清楚起见,我们不想用太多不同的符号来使语言混乱。同样tuple也不需要理解,可以使用列表,而速度差异可以忽略不计,而不是像dict理解而不是list理解。

I believe it’s simply for the sake of clarity, we do not want to clutter the language with too many different symbols. Also a tuple comprehension is never necessary, a list can just be used instead with negligible speed differences, unlike a dict comprehension as opposed to a list comprehension.


回答 9

我们可以从列表理解中生成元组。下一个将两个数字顺序加到一个元组中,并给出一个从0-9的列表。

>>> print k
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
>>> r= [tuple(k[i:i+2]) for i in xrange(10) if not i%2]
>>> print r
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]

We can generate tuples from a list comprehension. The following one adds two numbers sequentially into a tuple and gives a list from numbers 0-9.

>>> print k
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99]
>>> r= [tuple(k[i:i+2]) for i in xrange(10) if not i%2]
>>> print r
[(0, 1), (2, 3), (4, 5), (6, 7), (8, 9)]

创建重复N次的单个项目的列表

问题:创建重复N次的单个项目的列表

我想创建一系列长度不一的列表。每个列表将包含相同的元素e,重复n次数(其中n=列表的长度)。

如何创建列表,而不[e for number in xrange(n)]对每个列表使用列表理解?

I want to create a series of lists, all of varying lengths. Each list will contain the same element e, repeated n times (where n = length of the list).

How do I create the lists, without using a list comprehension [e for number in xrange(n)] for each list?


回答 0

您还可以编写:

[e] * n

您应该注意,例如,如果e是一个空列表,您将得到一个具有n个指向同一列表的引用的列表,而不是n个独立的空列表。

性能测试

乍看之下,似乎是重复是创建一个具有n个相同的元素列表的最快方法:

>>> timeit.timeit('itertools.repeat(0, 10)', 'import itertools', number = 1000000)
0.37095273281943264
>>> timeit.timeit('[0] * 10', 'import itertools', number = 1000000)
0.5577236771712819

但是等等-这不是一个公平的测试…

>>> itertools.repeat(0, 10)
repeat(0, 10)  # Not a list!!!

该函数itertools.repeat实际上并没有创建列表,它只是创建一个对象,您可以根据需要使用该对象来创建列表!让我们再试一次,但转换为列表:

>>> timeit.timeit('list(itertools.repeat(0, 10))', 'import itertools', number = 1000000)
1.7508119747063233

因此,如果您想要列表,请使用[e] * n。如果要延迟生成元素,请使用repeat

You can also write:

[e] * n

You should note that if e is for example an empty list you get a list with n references to the same list, not n independent empty lists.

Performance testing

At first glance it seems that repeat is the fastest way to create a list with n identical elements:

>>> timeit.timeit('itertools.repeat(0, 10)', 'import itertools', number = 1000000)
0.37095273281943264
>>> timeit.timeit('[0] * 10', 'import itertools', number = 1000000)
0.5577236771712819

But wait – it’s not a fair test…

>>> itertools.repeat(0, 10)
repeat(0, 10)  # Not a list!!!

The function itertools.repeat doesn’t actually create the list, it just creates an object that can be used to create a list if you wish! Let’s try that again, but converting to a list:

>>> timeit.timeit('list(itertools.repeat(0, 10))', 'import itertools', number = 1000000)
1.7508119747063233

So if you want a list, use [e] * n. If you want to generate the elements lazily, use repeat.


回答 1

>>> [5] * 4
[5, 5, 5, 5]

当重复的项目是列表时,请当心。该列表将不会被克隆:所有元素都将引用同一列表!

>>> x=[5]
>>> y=[x] * 4
>>> y
[[5], [5], [5], [5]]
>>> y[0][0] = 6
>>> y
[[6], [6], [6], [6]]
>>> [5] * 4
[5, 5, 5, 5]

Be careful when the item being repeated is a list. The list will not be cloned: all the elements will refer to the same list!

>>> x=[5]
>>> y=[x] * 4
>>> y
[[5], [5], [5], [5]]
>>> y[0][0] = 6
>>> y
[[6], [6], [6], [6]]

回答 2

在Python中创建重复n次的单项列表

不变物品

对于不可变的项目,例如“无”,布尔值,整数,浮点数,字符串,元组或Frozensets,可以这样进行:

[e] * 4

请注意,这最好仅与列表中的不可变项(字符串,元组,frozensets)一起使用,因为它们都指向内存中同一位置的同一项。当我必须构建一个包含所有字符串的架构的表时,我会经常使用它,这样就不必提供高度冗余的一对一映射。

schema = ['string'] * len(columns)

可变项

我已经使用Python很长时间了,而且从未见过用可变实例执行上述操作的用例。相反,要获取可变的空列表,集合或字典,您应该执行以下操作:

list_of_lists = [[] for _ in columns]

在这种情况下,下划线只是一个简单的变量名。

如果只有号码,那将是:

list_of_lists = [[] for _ in range(4)]

_不是真的很特别,但你的编码环境风格检查可能会抱怨,如果你不打算使用的变量和使用的其他任何名称。


对可变项使用不可变方法的注意事项:

当心使用可变对象,当您更改其中一个对象时,它们都会更改,因为它们都是同一对象:

foo = [[]] * 4
foo[0].append('x')

foo现在返回:

[['x'], ['x'], ['x'], ['x']]

但是对于不可变的对象,您可以使其起作用,因为您可以更改引用,而不是对象:

>>> l = [0] * 4
>>> l[0] += 1
>>> l
[1, 0, 0, 0]

>>> l = [frozenset()] * 4
>>> l[0] |= set('abc')
>>> l
[frozenset(['a', 'c', 'b']), frozenset([]), frozenset([]), frozenset([])]

但同样,可变对象对此没有好处,因为就地操作会更改对象,而不是引用:

l = [set()] * 4
>>> l[0] |= set('abc')    
>>> l
[set(['a', 'c', 'b']), set(['a', 'c', 'b']), set(['a', 'c', 'b']), set(['a', 'c', 'b'])]

Create List of Single Item Repeated n Times in Python

Immutable items

For immutable items, like None, bools, ints, floats, strings, tuples, or frozensets, you can do it like this:

[e] * 4

Note that this is best only used with immutable items (strings, tuples, frozensets, ) in the list, because they all point to the same item in the same place in memory. I use this frequently when I have to build a table with a schema of all strings, so that I don’t have to give a highly redundant one to one mapping.

schema = ['string'] * len(columns)

Mutable items

I’ve used Python for a long time now, and I have never seen a use-case where I would do the above with a mutable instance. Instead, to get, say, a mutable empty list, set, or dict, you should do something like this:

list_of_lists = [[] for _ in columns]

The underscore is simply a throwaway variable name in this context.

If you only have the number, that would be:

list_of_lists = [[] for _ in range(4)]

The _ is not really special, but your coding environment style checker will probably complain if you don’t intend to use the variable and use any other name.


Caveats for using the immutable method with mutable items:

Beware doing this with mutable objects, when you change one of them, they all change because they’re all the same object:

foo = [[]] * 4
foo[0].append('x')

foo now returns:

[['x'], ['x'], ['x'], ['x']]

But with immutable objects, you can make it work because you change the reference, not the object:

>>> l = [0] * 4
>>> l[0] += 1
>>> l
[1, 0, 0, 0]

>>> l = [frozenset()] * 4
>>> l[0] |= set('abc')
>>> l
[frozenset(['a', 'c', 'b']), frozenset([]), frozenset([]), frozenset([])]

But again, mutable objects are no good for this, because in-place operations change the object, not the reference:

l = [set()] * 4
>>> l[0] |= set('abc')    
>>> l
[set(['a', 'c', 'b']), set(['a', 'c', 'b']), set(['a', 'c', 'b']), set(['a', 'c', 'b'])]

回答 3

Itertools具有此功能:

import itertools
it = itertools.repeat(e,n)

当然可以itertools为您提供迭代器,而不是列表。[e] * n为您提供了一个列表,但是根据您对这些序列的处理方式,itertools变体可能会更加有效。

Itertools has a function just for that:

import itertools
it = itertools.repeat(e,n)

Of course itertools gives you a iterator instead of a list. [e] * n gives you a list, but, depending on what you will do with those sequences, the itertools variant can be much more efficient.


回答 4

正如其他人指出的那样,对可变对象使用*运算符会重复引用,因此,如果更改一个,则会全部更改。如果要创建可变对象的独立实例,则xrange语法是执行此操作的最Python方式。如果您有一个从未使用过的命名变量而感到困扰,则可以使用匿名下划线变量。

[e for _ in xrange(n)]

As others have pointed out, using the * operator for a mutable object duplicates references, so if you change one you change them all. If you want to create independent instances of a mutable object, your xrange syntax is the most Pythonic way to do this. If you are bothered by having a named variable that is never used, you can use the anonymous underscore variable.

[e for _ in xrange(n)]

回答 5

[e] * n

应该管用

[e] * n

should work


生成器表达式与列表理解

问题:生成器表达式与列表理解

什么时候应该使用生成器表达式,什么时候应该在Python中使用列表推导?

# Generator expression
(x*2 for x in range(256))

# List comprehension
[x*2 for x in range(256)]

When should you use generator expressions and when should you use list comprehensions in Python?

# Generator expression
(x*2 for x in range(256))

# List comprehension
[x*2 for x in range(256)]

回答 0

John的答案很好(当您要迭代多次时,列表理解会更好)。但是,还应注意,如果要使用任何列表方法,都应使用列表。例如,以下代码将不起作用:

def gen():
    return (something for something in get_some_stuff())

print gen()[:2]     # generators don't support indexing or slicing
print [5,6] + gen() # generators can't be added to lists

基本上,如果您要做的只是迭代一次,则使用生成器表达式。如果要存储和使用生成的结果,则最好使用列表理解功能。

由于性能是选择彼此的最常见原因,所以我的建议是不要担心它,而只选择一个即可。如果您发现程序运行速度太慢,则只有这样,您才应回去担心调整代码。

John’s answer is good (that list comprehensions are better when you want to iterate over something multiple times). However, it’s also worth noting that you should use a list if you want to use any of the list methods. For example, the following code won’t work:

def gen():
    return (something for something in get_some_stuff())

print gen()[:2]     # generators don't support indexing or slicing
print [5,6] + gen() # generators can't be added to lists

Basically, use a generator expression if all you’re doing is iterating once. If you want to store and use the generated results, then you’re probably better off with a list comprehension.

Since performance is the most common reason to choose one over the other, my advice is to not worry about it and just pick one; if you find that your program is running too slowly, then and only then should you go back and worry about tuning your code.


回答 1

遍历生成器表达式列表理解将执行相同的操作。但是,列表理解将首先在内存中创建整个列表,而生成器表达式将在运行中创建项目,因此您可以将其用于非常大的(也可以是无限的!)序列。

Iterating over the generator expression or the list comprehension will do the same thing. However, the list comprehension will create the entire list in memory first while the generator expression will create the items on the fly, so you are able to use it for very large (and also infinite!) sequences.


回答 2

当结果需要多次迭代或速度至关重要时,请使用列表推导。使用范围较大或无限的生成器表达式。

有关更多信息,请参见生成器表达式和列表推导。

Use list comprehensions when the result needs to be iterated over multiple times, or where speed is paramount. Use generator expressions where the range is large or infinite.

See Generator expressions and list comprehensions for more info.


回答 3

重要的是列表理解会创建一个新列表。生成器创建一个可迭代的对象,当您使用这些位时,它将动态“过滤”源材料。

假设您有一个名为“ hugefile.txt”的2TB日志文件,并且想要以单词“ ENTRY”开头的所有行的内容和长度。

因此,您尝试通过编写列表理解来开始:

logfile = open("hugefile.txt","r")
entry_lines = [(line,len(line)) for line in logfile if line.startswith("ENTRY")]

这样会抓取整个文件,处理每一行,并将匹配的行存储在数组中。因此,此阵列最多可以包含2TB的内容。那会占用很多RAM,对于您的目的可能不切实际。

因此,我们可以使用生成器将“过滤器”应用于我们的内容。直到我们开始遍历结果之前,才实际读取任何数据。

logfile = open("hugefile.txt","r")
entry_lines = ((line,len(line)) for line in logfile if line.startswith("ENTRY"))

甚至没有从我们的文件中读取任何一行。实际上,假设我们想进一步过滤结果:

long_entries = ((line,length) for (line,length) in entry_lines if length > 80)

仍未读取任何内容,但是我们现在指定了两个生成器,它们将根据需要对数据起作用。

让我们将过滤后的行写到另一个文件中:

outfile = open("filtered.txt","a")
for entry,length in long_entries:
    outfile.write(entry)

现在我们读取输入文件。随着for循环继续请求其他行,long_entries生成器要求生成器提供行entry_lines,仅返回长度大于80个字符的行。然后,entry_lines生成器从logfile迭代迭代器读取文件。

因此,不是以完全填充列表的形式将数据“推送”到输出函数,而是为输出函数提供了一种仅在需要时才“拉”数据的方法。在我们的情况下,这要高效得多,但不够灵活。生成器是一种方式,一次通过。我们读取的日志文件中的数据会立即被丢弃,因此我们无法返回上一行。另一方面,完成数据后,我们不必担心保留数据。

The important point is that the list comprehension creates a new list. The generator creates a an iterable object that will “filter” the source material on-the-fly as you consume the bits.

Imagine you have a 2TB log file called “hugefile.txt”, and you want the content and length for all the lines that start with the word “ENTRY”.

So you try starting out by writing a list comprehension:

logfile = open("hugefile.txt","r")
entry_lines = [(line,len(line)) for line in logfile if line.startswith("ENTRY")]

This slurps up the whole file, processes each line, and stores the matching lines in your array. This array could therefore contain up to 2TB of content. That’s a lot of RAM, and probably not practical for your purposes.

So instead we can use a generator to apply a “filter” to our content. No data is actually read until we start iterating over the result.

logfile = open("hugefile.txt","r")
entry_lines = ((line,len(line)) for line in logfile if line.startswith("ENTRY"))

Not even a single line has been read from our file yet. In fact, say we want to filter our result even further:

long_entries = ((line,length) for (line,length) in entry_lines if length > 80)

Still nothing has been read, but we’ve specified now two generators that will act on our data as we wish.

Lets write out our filtered lines to another file:

outfile = open("filtered.txt","a")
for entry,length in long_entries:
    outfile.write(entry)

Now we read the input file. As our for loop continues to request additional lines, the long_entries generator demands lines from the entry_lines generator, returning only those whose length is greater than 80 characters. And in turn, the entry_lines generator requests lines (filtered as indicated) from the logfile iterator, which in turn reads the file.

So instead of “pushing” data to your output function in the form of a fully-populated list, you’re giving the output function a way to “pull” data only when its needed. This is in our case much more efficient, but not quite as flexible. Generators are one way, one pass; the data from the log file we’ve read gets immediately discarded, so we can’t go back to a previous line. On the other hand, we don’t have to worry about keeping data around once we’re done with it.


回答 4

生成器表达式的好处是它使用较少的内存,因为它不会立即构建整个列表。当列表是中间变量时,最好使用生成器表达式,例如对结果求和或根据结果创建字典。

例如:

sum(x*2 for x in xrange(256))

dict( (k, some_func(k)) for k in some_list_of_keys )

这样做的好处是列表不会完全生成,因此使用的内存很少(而且应该更快)

但是,当所需的最终产品是列表时,应该使用列表推导。您将不会使用生成器表达式保存任何内存,因为您需要生成的列表。您还可以获得能够使用任何列表功能(如已排序或反转)的好处。

例如:

reversed( [x*2 for x in xrange(256)] )

The benefit of a generator expression is that it uses less memory since it doesn’t build the whole list at once. Generator expressions are best used when the list is an intermediary, such as summing the results, or creating a dict out of the results.

For example:

sum(x*2 for x in xrange(256))

dict( (k, some_func(k)) for k in some_list_of_keys )

The advantage there is that the list isn’t completely generated, and thus little memory is used (and should also be faster)

You should, though, use list comprehensions when the desired final product is a list. You are not going to save any memeory using generator expressions, since you want the generated list. You also get the benefit of being able to use any of the list functions like sorted or reversed.

For example:

reversed( [x*2 for x in xrange(256)] )

回答 5

从可变对象(如列表)创建生成器时,请注意,生成器将在使用生成器时(而不是在创建生成器时)根据列表的状态进行评估:

>>> mylist = ["a", "b", "c"]
>>> gen = (elem + "1" for elem in mylist)
>>> mylist.clear()
>>> for x in gen: print (x)
# nothing

如果您的列表有可能被修改(或列表中的可变对象),但是您需要在生成器创建时的状态,则需要使用列表推导。

When creating a generator from a mutable object (like a list) be aware that the generator will get evaluated on the state of the list at time of using the generator, not at time of the creation of the generator:

>>> mylist = ["a", "b", "c"]
>>> gen = (elem + "1" for elem in mylist)
>>> mylist.clear()
>>> for x in gen: print (x)
# nothing

If there is any chance of your list getting modified (or a mutable object inside that list) but you need the state at creation of the generator you need to use a list comprehension instead.


回答 6

有时,您可以从itertools中使用tee函数,它为同一生成器返回多个迭代器,这些迭代器可以独立使用。

Sometimes you can get away with the tee function from itertools, it returns multiple iterators for the same generator that can be used independently.


回答 7

我正在使用Hadoop Mincemeat模块。我认为这是一个值得注意的好例子:

import mincemeat

def mapfn(k,v):
    for w in v:
        yield 'sum',w
        #yield 'count',1


def reducefn(k,v): 
    r1=sum(v)
    r2=len(v)
    print r2
    m=r1/r2
    std=0
    for i in range(r2):
       std+=pow(abs(v[i]-m),2)  
    res=pow((std/r2),0.5)
    return r1,r2,res

在这里,生成器从文本文件(最大为15GB)中获取数字,并使用Hadoop的map-reduce对这些数字进行简单的数学运算。如果我没有使用yield函数,而是使用列表理解,那么计算总和和平均值将花费更长的时间(更不用说空间复杂性了)。

Hadoop是利用Generators的所有优点的一个很好的例子。

I’m using the Hadoop Mincemeat module. I think this is a great example to take a note of:

import mincemeat

def mapfn(k,v):
    for w in v:
        yield 'sum',w
        #yield 'count',1


def reducefn(k,v): 
    r1=sum(v)
    r2=len(v)
    print r2
    m=r1/r2
    std=0
    for i in range(r2):
       std+=pow(abs(v[i]-m),2)  
    res=pow((std/r2),0.5)
    return r1,r2,res

Here the generator gets numbers out of a text file (as big as 15GB) and applies simple math on those numbers using Hadoop’s map-reduce. If I had not used the yield function, but instead a list comprehension, it would have taken a much longer time calculating the sums and average (not to mention the space complexity).

Hadoop is a great example for using all the advantages of Generators.