标签归档:list

为什么说Python大数据处理一定要用Numpy Array?

Numpy 是Python科学计算的一个核心模块。它提供了非常高效的数组对象,以及用于处理这些数组对象的工具。一个Numpy数组由许多值组成,所有值的类型是相同的。

Python的核心库提供了 List 列表。列表是最常见的Python数据类型之一,它可以调整大小并且包含不同类型的元素,非常方便。

那么List和Numpy Array到底有什么区别?为什么我们需要在大数据处理的时候使用Numpy Array?答案是性能。

Numpy数据结构在以下方面表现更好:

1.内存大小—Numpy数据结构占用的内存更小。

2.性能—Numpy底层是用C语言实现的,比列表更快。

3.运算方法—内置优化了代数运算等方法。

下面分别讲解在大数据处理时,Numpy数组相对于List的优势。

1.Numpy Array内存占用更小

适当地使用Numpy数组替代List,你能让你的内存占用降低20倍。

对于Python原生的List列表,由于每次新增对象,都需要8个字节来引用新对象,新的对象本身占28个字节(以整数为例)。所以,列表 list 的大小可以用以下公式计算:

64 + 8 * len(lst) + len(lst) * 28 字节

而使用Numpy,就能减少非常多的空间占用。比如长度为n的Numpy整形Array,它需要:

96 + len(a) * 8 字节

可见,数组越大,你节省的内存空间越多。假设你的数组有10亿个元素,那么这个内存占用大小的差距会是GB级别的。

2.Numpy Array速度更快、内置计算方法

运行下面这个脚本,同样是生成某个维度的两个数组并相加,你就能看到原生List和Numpy Array的性能差距。

import time
import numpy as np

size_of_vec = 1000

def pure_python_version():
    t1 = time.time()
    X = range(size_of_vec)
    Y = range(size_of_vec)
    Z = [X[i] + Y[i] for i in range(len(X)) ]
    return time.time() - t1

def numpy_version():
    t1 = time.time()
    X = np.arange(size_of_vec)
    Y = np.arange(size_of_vec)
    Z = X + Y
    return time.time() - t1


t1 = pure_python_version()
t2 = numpy_version()
print(t1, t2)
print("Numpy is in this example " + str(t1/t2) + " faster!")

结果如下:

0.00048732757568359375 0.0002491474151611328
Numpy is in this example 1.955980861244019 faster!

可以看到,Numpy比原生数组快1.95倍。

如果你细心的话,还能发现,Numpy array可以直接执行加法操作。而原生的数组是做不到这点的,这就是Numpy 运算方法的优势。

我们再做几次重复试验,以证明这个性能优势是持久性的。

import numpy as np
from timeit import Timer

size_of_vec = 1000
X_list = range(size_of_vec)
Y_list = range(size_of_vec)
X = np.arange(size_of_vec)
Y = np.arange(size_of_vec)

def pure_python_version():
    Z = [X_list[i] + Y_list[i] for i in range(len(X_list)) ]

def numpy_version():
    Z = X + Y

timer_obj1 = Timer("pure_python_version()", 
                   "from __main__ import pure_python_version")
timer_obj2 = Timer("numpy_version()", 
                   "from __main__ import numpy_version")

print(timer_obj1.timeit(10))
print(timer_obj2.timeit(10))  # Runs Faster!

print(timer_obj1.repeat(repeat=3, number=10))
print(timer_obj2.repeat(repeat=3, number=10)) # repeat to prove it!

结果如下:

0.0029753120616078377
0.00014940369874238968
[0.002683573868125677, 0.002754641231149435, 0.002803879790008068]
[6.536301225423813e-05, 2.9387418180704117e-05, 2.9171351343393326e-05]

可以看到,第二个输出的时间总是小得多,这就证明了这个性能优势是具有持久性的。

所以,如果你在做一些大数据研究,比如金融数据、股票数据的研究,使用Numpy能够节省你不少内存空间,并拥有更强大的性能。

参考文献:https://webcourses.ucf.edu/courses/1249560/pages/python-lists-vs-numpy-arrays-what-is-the-difference

我们的文章到此就结束啦,如果你喜欢今天的 Python 教程,请持续关注Python实用宝典。

有任何问题,可以在公众号后台回复:加群,回答相应验证信息,进入互助群询问。

原创不易,希望你能在下面点个赞和在看支持我继续创作,谢谢!

给作者打赏,选择打赏金额
¥1¥5¥10¥20¥50¥100¥200 自定义

​Python实用宝典 ( pythondict.com )
不只是一个宝典
欢迎关注公众号:Python实用宝典

Box 为你的字典添加点符号访问特性

正常情况下,我们想访问字典中的某个值,都是通过中括号访问,比如:

test_dict = {"test": {"imdb stars": 6.7, "length": 104}}

print(test_dict["test"]["imdb stars"])
# 104

而通过Box模块,我们可以扩展字典功能,使用点符号访问元素:

from box import Box

movie_box = Box({ "Robin Hood: Men in Tights": { "imdb stars": 6.7, "length": 104 } })

movie_box.Robin_Hood_Men_in_Tights.imdb_stars

# 6.7

另外,可以看到默认情况下转换后,字典键值中的空格被转化为了下划线。

下面具体介绍 Box 模块的使用方法。

1.准备

开始之前,你要确保Python和pip已经成功安装在电脑上,如果没有,请访问这篇文章:超详细Python安装指南 进行安装。

(可选1) 如果你用Python的目的是数据分析,可以直接安装Anaconda:Python数据分析与挖掘好帮手—Anaconda,它内置了Python和pip.

(可选2) 此外,推荐大家用VSCode编辑器来编写小型Python项目:Python 编程的最好搭档—VSCode 详细指南

Windows环境下打开Cmd(开始—运行—CMD),苹果系统环境下请打开Terminal(command+空格输入Terminal),输入命令安装依赖:

pip install --upgrade python-box[all]

2.基本使用

我们可以像文章开头那样传入一个字典给 Box,生成一个Box对象;也可以直接使用参数赋值的方式生成一个Box对象:

from box import Box

my_box = Box(funny_movie='Hudson Hawk', best_movie='Kung Fu Panda')
my_box.funny_movie
# 'Hudson Hawk'

请记住,任何情况下,你往Box对象里添加字典或是数组,这些字典或数组都会被转变为Box对象:

my_box = Box({"team": {"red": {"leader": "Sarge", "members": []}}})
print(my_box.team.red.leader)
# Sarge

my_box.team.blue = {"leader": "Church", "members": []} 
print(repr(my_box.team.blue))
# <Box: {'leader': 'Church', 'members': []}>

访问列表中的 Box 对象也非常轻松:

my_box.team.red.members = [
    {"name": "Grif", "rank": "Minor Junior Private Negative First Class"},
    {"name": "Dick Simmons", "rank": "Captain"}
]

print(my_box.team.red.members[0].name)
# Grif

局限性

请注意,字典中有些默认方法,如:clear, copy, fromkeys, get, items, keys, pop, popitem, setdefault, to_dict, update, merge_update, values,当你的键值和这些方法名称冲突时,你无法使用点符号访问它们。

不过冲突时,你依然可以使用传统的字典取值访问它们,例如:

my_box['keys']

合并

要合并两个Box对象,你只需要通过 merge_update 方法:

from box import Box

box_1 = Box(val={'important_key': 1}) 
box_2 = Box(val={'less_important_key': 2})

box_1.merge_update(box_2)

print(box_1)
# {'val': {'important_key': 1, 'less_important_key': 2}}

当然,你也可以用传统的 update 方法:

from box import Box

box_1 = Box(val={'important_key': 1}) 
box_2 = Box(val={'less_important_key': 2})

box_1.update(box_2)

print(box_1)
# {'val': {'less_important_key': 2}}

转换为原始列表/字典

如果你需要把一个 Box 对象的字典转化为原始字典,.to_dict() 方法就可以帮你实现:

from box import Box

box_1 = Box(val={'important_key': 1}) 

print(box_1)
# {'val': {'less_important_key': 2}}
print(type(box_1))
# <class 'box.box.Box'>
print(type(box_1.to_dict()))
# <class 'dict'>

如果你需要把一个 Box 对象的列表转化为原始列表,你可以使用 .to_list() 方法:

from box import BoxList

my_boxlist = BoxList({'item': x} for x in range(10))
#  <BoxList: [<Box: {'item': 0}>, <Box: {'item': 1}>, ...

my_boxlist[5].item
# 5

print(type(my_boxlist.to_list()))
# <class 'list'>

3.导入导出功能

Box对象有一个很方便的功能,就是能够轻松地将Box对象导出为Json/yaml/csv/msgpack文件:

from box import BoxList

my_boxlist = BoxList({'item': x} for x in range(10))
#  <BoxList: [<Box: {'item': 0}>, <Box: {'item': 1}>, ...

my_boxlist.to_json(filename="test.json")
# 在当前文件夹下生成一个 test.json 文件

此外,还能接受 Json/yaml/csv/msgpack 文件导入:

new_box = Box.from_json(filename="films.json")

各种类型的文件对应的方法如下:

转换器方法描述
to_dict递归地将所有 Box(和 BoxList)对象转换回字典(和列表)
to_json将 Box 对象另存为 JSON 字符串或使用filename参数写入文件
to_yaml将 Box 对象另存为 YAML 字符串或使用filename参数写入文件
to_msgpack将 Box 对象另存为 msgpack 字节或使用filename参数写入文件
to_toml*将 Box 对象另存为 TOML 字符串或使用filename参数写入文件
to_csv**将 BoxList 对象另存为 CSV 字符串或使用filename参数写入文件
from_jsonClassmethod,从一个 JSON 文件或字符串创建一个 Box 对象(所有 Box 参数都可以传递)
from_yaml类方法,从 YAML 文件或字符串创建一个 Box 对象(所有 Box 参数都可以传递)
from_msgpackClassmethod,从msgpack文件或字节创建一个Box对象(所有Box参数都可以传递)
from_toml*Classmethod,从TOML文件或字符串创建一个Box对象(所有Box参数都可以传递)
from_csv**Classmethod,从一个CSV文件或字符串创建一个BoxList对象(可以传递所有BoxList参数)

* 不适用于 BoxList,仅适用于 Box ** 不适用于 Box,仅适用于 BoxList。

还有更多的特性,大家可以参考 Box 模块官方WIki:

https://github.com/cdgriffith/Box/wiki

我们的文章到此就结束啦,如果你喜欢今天的 Python 教程,请持续关注Python实用宝典。

有任何问题,可以在公众号后台回复:加群,回答相应验证信息,进入互助群询问。

原创不易,希望你能在下面点个赞和在看支持我继续创作,谢谢!

给作者打赏,选择打赏金额
¥1¥5¥10¥20¥50¥100¥200 自定义

​Python实用宝典 ( pythondict.com )
不只是一个宝典
欢迎关注公众号:Python实用宝典

如何使用Python将文本文件读取到列表或数组中

问题:如何使用Python将文本文件读取到列表或数组中

我正在尝试将文本文件的行读入python中的列表或数组中。创建后,我只需要能够单独访问列表或数组中的任何项目。

文本文件的格式如下:

0,0,200,0,53,1,0,255,...,0.

...以上,有实际的文本文件中有数百或数千多个项目。

我正在使用以下代码尝试将文件读入列表:

text_file = open("filename.dat", "r")
lines = text_file.readlines()
print lines
print len(lines)
text_file.close()

我得到的输出是:

['0,0,200,0,53,1,0,255,...,0.']
1

显然,它将整个文件读入一个项目列表,而不是单个项目列表。我究竟做错了什么?

I am trying to read the lines of a text file into a list or array in python. I just need to be able to individually access any item in the list or array after it is created.

The text file is formatted as follows:

0,0,200,0,53,1,0,255,...,0.

Where the ... is above, there actual text file has hundreds or thousands more items.

I’m using the following code to try to read the file into a list:

text_file = open("filename.dat", "r")
lines = text_file.readlines()
print lines
print len(lines)
text_file.close()

The output I get is:

['0,0,200,0,53,1,0,255,...,0.']
1

Apparently it is reading the entire file into a list of just one item, rather than a list of individual items. What am I doing wrong?


回答 0

您将必须使用以下方法将字符串拆分为值列表 split()

所以,

lines = text_file.read().split(',')

You will have to split your string into a list of values using split()

So,

lines = text_file.read().split(',')

EDIT: I didn’t realise there would be so much traction to this. Here’s a more idiomatic approach.

import csv
with open('filename.csv', 'r') as fd:
    reader = csv.reader(fd)
    for row in reader:
        # do something

回答 1

您也可以使用numpy loadtxt

from numpy import loadtxt
lines = loadtxt("filename.dat", comments="#", delimiter=",", unpack=False)

You can also use numpy loadtxt like

from numpy import loadtxt
lines = loadtxt("filename.dat", comments="#", delimiter=",", unpack=False)

回答 2

所以您想创建一个列表列表…我们需要从一个空列表开始

list_of_lists = []

接下来,我们逐行读取文件内容

with open('data') as f:
    for line in f:
        inner_list = [elt.strip() for elt in line.split(',')]
        # in alternative, if you need to use the file content as numbers
        # inner_list = [int(elt.strip()) for elt in line.split(',')]
        list_of_lists.append(inner_list)

一个常见的用例是列式数据,但我们的存储单位是文件的行,我们已逐一读取它,因此您可能需要转置 列表列表。这可以通过以下成语来完成

by_cols = zip(*list_of_lists)

另一个常见的用法是为每列命名

col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')
by_names = {}
for i, col_name in enumerate(col_names):
    by_names[col_name] = by_cols[i]

这样您就可以对同类数据项进行操作

 mean_apple_prices = [money/fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples_sold'])]

我编写的大多数内容都可以使用csv标准库中的模块来加速。另一个第三方模块是pandas,它使您可以自动化典型数据分析的大多数方面(但具有许多依赖性)。


更新虽然在Python 2中zip(*list_of_lists)返回了一个不同的列表(换位后的列表),但在Python 3中情况发生了变化,并zip(*list_of_lists)返回了一个不能下标的zip对象

如果您需要索引访问,则可以使用

by_cols = list(zip(*list_of_lists))

为您提供了两个Python版本中的列表列表。

另一方面,如果您不需要索引访问,而您想要的只是构建一个按列名称索引的字典,那么zip对象就可以了。

file = open('some_data.csv')
names = get_names(next(file))
columns = zip(*((x.strip() for x in line.split(',')) for line in file)))
d = {}
for name, column in zip(names, columns): d[name] = column

So you want to create a list of lists… We need to start with an empty list

list_of_lists = []

next, we read the file content, line by line

with open('data') as f:
    for line in f:
        inner_list = [elt.strip() for elt in line.split(',')]
        # in alternative, if you need to use the file content as numbers
        # inner_list = [int(elt.strip()) for elt in line.split(',')]
        list_of_lists.append(inner_list)

A common use case is that of columnar data, but our units of storage are the rows of the file, that we have read one by one, so you may want to transpose your list of lists. This can be done with the following idiom

by_cols = zip(*list_of_lists)

Another common use is to give a name to each column

col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')
by_names = {}
for i, col_name in enumerate(col_names):
    by_names[col_name] = by_cols[i]

so that you can operate on homogeneous data items

 mean_apple_prices = [money/fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples_sold'])]

Most of what I’ve written can be speeded up using the csv module, from the standard library. Another third party module is pandas, that lets you automate most aspects of a typical data analysis (but has a number of dependencies).


Update While in Python 2 zip(*list_of_lists) returns a different (transposed) list of lists, in Python 3 the situation has changed and zip(*list_of_lists) returns a zip object that is not subscriptable.

If you need indexed access you can use

by_cols = list(zip(*list_of_lists))

that gives you a list of lists in both versions of Python.

On the other hand, if you don’t need indexed access and what you want is just to build a dictionary indexed by column names, a zip object is just fine…

file = open('some_data.csv')
names = get_names(next(file))
columns = zip(*((x.strip() for x in line.split(',')) for line in file)))
d = {}
for name, column in zip(names, columns): d[name] = column

回答 3

这个问题问如何将文件中的逗号分隔值内容读取到可迭代列表中:

0,0,200,0,53,1,0,255,...,0.

最简单的方法是使用以下csv模块:

import csv
with open('filename.dat', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')

现在,您可以spamreader像这样轻松地进行迭代:

for row in spamreader:
    print(', '.join(row))

有关更多示例,请参见文档

This question is asking how to read the comma-separated value contents from a file into an iterable list:

0,0,200,0,53,1,0,255,...,0.

The easiest way to do this is with the csv module as follows:

import csv
with open('filename.dat', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')

Now, you can easily iterate over spamreader like this:

for row in spamreader:
    print(', '.join(row))

See documentation for more examples.


如何在Python中将多个值附加到列表

问题:如何在Python中将多个值附加到列表

我试图弄清楚如何在Python中将多个值附加到列表中。我知道有一些方法来做到这一点,如手动输入值,或在PUR追加操作for循环,或appendextend功能。

但是,我想知道是否还有更整洁的方法?也许某个软件包或功能?

I am trying to figure out how to append multiple values to a list in Python. I know there are few methods to do so, such as manually input the values, or put the append operation in a for loop, or the append and extend functions.

However, I wonder if there is a more neat way to do so? Maybe a certain package or function?


回答 0

您可以使用sequence方法list.extend将列表从任意迭代类型中扩展为多个值,无论是另一个列表还是提供值序列的任何其他事物。

>>> lst = [1, 2]
>>> lst.append(3)
>>> lst.append(4)
>>> lst
[1, 2, 3, 4]

>>> lst.extend([5, 6, 7])
>>> lst.extend((8, 9, 10))
>>> lst
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

>>> lst.extend(range(11, 14))
>>> lst
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

因此,您可以list.append()用来附加单个值,也list.extend()可以附加多个值。

You can use the sequence method list.extend to extend the list by multiple values from any kind of iterable, being it another list or any other thing that provides a sequence of values.

>>> lst = [1, 2]
>>> lst.append(3)
>>> lst.append(4)
>>> lst
[1, 2, 3, 4]

>>> lst.extend([5, 6, 7])
>>> lst.extend((8, 9, 10))
>>> lst
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

>>> lst.extend(range(11, 14))
>>> lst
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

So you can use list.append() to append a single value, and list.extend() to append multiple values.


回答 1

除了append函数以外,如果用“多个值”表示另一个列表,则可以像这样简单地将它们连接起来。

>>> a = [1,2,3]
>>> b = [4,5,6]
>>> a + b
[1, 2, 3, 4, 5, 6]

Other than the append function, if by “multiple values” you mean another list, you can simply concatenate them like so.

>>> a = [1,2,3]
>>> b = [4,5,6]
>>> a + b
[1, 2, 3, 4, 5, 6]

回答 2

如果你看一下在官方的文档,你会看到下方appendextend。这就是您要寻找的。

itertools.chain如果您对高效的迭代感兴趣,而不是最终获得一个完全填充的数据结构,那也是很有用的。

If you take a look at the official docs, you’ll see right below append, extend. That’s what your looking for.

There’s also itertools.chain if you are more interested in efficient iteration than ending up with a fully populated data structure.


用括号括起来的列表和括号在Python中有什么区别?

问题:用括号括起来的列表和括号在Python中有什么区别?

>>> x=[1,2]
>>> x[1]
2
>>> x=(1,2)
>>> x[1]
2

它们都有效吗?是出于某些原因而首选?

>>> x=[1,2]
>>> x[1]
2
>>> x=(1,2)
>>> x[1]
2

Are they both valid? Is one preferred for some reason?


回答 0

方括号是列表,括号是元组

列表是可变的,这意味着您可以更改其内容:

>>> x = [1,2]
>>> x.append(3)
>>> x
[1, 2, 3]

而元组不是:

>>> x = (1,2)
>>> x
(1, 2)
>>> x.append(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'append'

另一个主要区别是元组是可哈希的,这意味着您可以将其用作字典的键。例如:

>>> x = (1,2)
>>> y = [1,2]
>>> z = {}
>>> z[x] = 3
>>> z
{(1, 2): 3}
>>> z[y] = 4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

请注意,正如许多人指出的那样,您可以将元组加在一起。例如:

>>> x = (1,2)
>>> x += (3,)
>>> x
(1, 2, 3)

但是,这并不意味着元组是可变的。在上面的示例中,通过将两个元组作为参数相加来构造新的元组。原始元组未修改。为了证明这一点,请考虑以下因素:

>>> x = (1,2)
>>> y = x
>>> x += (3,)
>>> x
(1, 2, 3)
>>> y
(1, 2)

而如果您要使用列表构造相同的示例,则y也会进行更新:

>>> x = [1, 2]
>>> y = x
>>> x += [3]
>>> x
[1, 2, 3]
>>> y
[1, 2, 3]

Square brackets are lists while parentheses are tuples.

A list is mutable, meaning you can change its contents:

>>> x = [1,2]
>>> x.append(3)
>>> x
[1, 2, 3]

while tuples are not:

>>> x = (1,2)
>>> x
(1, 2)
>>> x.append(3)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'tuple' object has no attribute 'append'

The other main difference is that a tuple is hashable, meaning that you can use it as a key to a dictionary, among other things. For example:

>>> x = (1,2)
>>> y = [1,2]
>>> z = {}
>>> z[x] = 3
>>> z
{(1, 2): 3}
>>> z[y] = 4
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Note that, as many people have pointed out, you can add tuples together. For example:

>>> x = (1,2)
>>> x += (3,)
>>> x
(1, 2, 3)

However, this does not mean tuples are mutable. In the example above, a new tuple is constructed by adding together the two tuples as arguments. The original tuple is not modified. To demonstrate this, consider the following:

>>> x = (1,2)
>>> y = x
>>> x += (3,)
>>> x
(1, 2, 3)
>>> y
(1, 2)

Whereas, if you were to construct this same example with a list, y would also be updated:

>>> x = [1, 2]
>>> y = x
>>> x += [3]
>>> x
[1, 2, 3]
>>> y
[1, 2, 3]

回答 1

一个有趣的区别:

lst=[1]
print lst          // prints [1]
print type(lst)    // prints <type 'list'>

notATuple=(1)
print notATuple        // prints 1
print type(notATuple)  // prints <type 'int'>
                                         ^^ instead of tuple(expected)

即使只包含一个值,逗号也必须包含在元组中。例如(1,)代替(1)

One interesting difference :

lst=[1]
print lst          // prints [1]
print type(lst)    // prints <type 'list'>

notATuple=(1)
print notATuple        // prints 1
print type(notATuple)  // prints <type 'int'>
                                         ^^ instead of tuple(expected)

A comma must be included in a tuple even if it contains only a single value. e.g. (1,) instead of (1).


回答 2

它们不是列表,而是列表和元组。您可以在Python教程中阅读有关元组的信息。尽管您可以对列表进行变异,但是使用元组是不可能的。

In [1]: x = (1, 2)

In [2]: x[0] = 3
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/user/<ipython console> in <module>()

TypeError: 'tuple' object does not support item assignment

They are not lists, they are a list and a tuple. You can read about tuples in the Python tutorial. While you can mutate lists, this is not possible with tuples.

In [1]: x = (1, 2)

In [2]: x[0] = 3
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)

/home/user/<ipython console> in <module>()

TypeError: 'tuple' object does not support item assignment

回答 3

方括号和括号的另一种不同之处是方括号可以描述列表的理解,例如 [x for x in y]

相应的括号语法指定一个元组生成器(x for x in y)

您可以使用以下方法获取元组理解: tuple(x for x in y)

请参阅:为什么Python中没有元组理解?

Another way brackets and parentheses differ is that square brackets can describe a list comprehension, e.g. [x for x in y]

Whereas the corresponding parenthetic syntax specifies a tuple generator: (x for x in y)

You can get a tuple comprehension using: tuple(x for x in y)

See: Why is there no tuple comprehension in Python?


回答 4

第一个是列表,第二个是元组。列表是可变的,元组不是。

查看本教程的“ 数据结构”部分和文档的“ 序列类型”部分。

The first is a list, the second is a tuple. Lists are mutable, tuples are not.

Take a look at the Data Structures section of the tutorial, and the Sequence Types section of the documentation.


回答 5

逗号分隔由包含的项目 ()tupleS,那些由封闭[]list秒。

Comma-separated items enclosed by ( and ) are tuples, those enclosed by [ and ] are lists.


如何执行两个列表的按元素相乘?

问题:如何执行两个列表的按元素相乘?

我想执行元素明智的乘法,将两个列表按值在Python中相乘,就像我们在Matlab中可以做到的那样。

这就是我在Matlab中要做的。

a = [1,2,3,4]
b = [2,3,4,5]
a .* b = [2, 6, 12, 20]

对于from 和from的每个组合x * y,列表理解将给出16个列表条目。不确定如何映射。xayb

如果有人对此感兴趣,我有一个数据集,并想乘以Numpy.linspace(1.0, 0.5, num=len(dataset)) =)

I want to perform an element wise multiplication, to multiply two lists together by value in Python, like we can do it in Matlab.

This is how I would do it in Matlab.

a = [1,2,3,4]
b = [2,3,4,5]
a .* b = [2, 6, 12, 20]

A list comprehension would give 16 list entries, for every combination x * y of x from a and y from b. Unsure of how to map this.

If anyone is interested why, I have a dataset, and want to multiply it by Numpy.linspace(1.0, 0.5, num=len(dataset)) =).


回答 0

使用列表理解与zip():混合。

[a*b for a,b in zip(lista,listb)]

Use a list comprehension mixed with zip():.

[a*b for a,b in zip(lista,listb)]

回答 1

由于您已经在使用numpy,因此将数据存储在numpy数组而不是列表中很有意义。完成此操作后,您将免费获得类似智能元素的产品:

In [1]: import numpy as np

In [2]: a = np.array([1,2,3,4])

In [3]: b = np.array([2,3,4,5])

In [4]: a * b
Out[4]: array([ 2,  6, 12, 20])

Since you’re already using numpy, it makes sense to store your data in a numpy array rather than a list. Once you do this, you get things like element-wise products for free:

In [1]: import numpy as np

In [2]: a = np.array([1,2,3,4])

In [3]: b = np.array([2,3,4,5])

In [4]: a * b
Out[4]: array([ 2,  6, 12, 20])

回答 2

使用np.multiply(a,b):

import numpy as np
a = [1,2,3,4]
b = [2,3,4,5]
np.multiply(a,b)

Use np.multiply(a,b):

import numpy as np
a = [1,2,3,4]
b = [2,3,4,5]
np.multiply(a,b)

回答 3

您可以尝试将每个元素乘以一个循环。这样做的捷径是

ab = [a[i]*b[i] for i in range(len(a))]

You can try multiplying each element in a loop. The short hand for doing that is

ab = [a[i]*b[i] for i in range(len(a))]

回答 4

还有一个答案:

-1…需要导入
+1…非常易读

import operator
a = [1,2,3,4]
b = [10,11,12,13]

list(map(operator.mul, a, b))

输出[10、22、36、52]

Yet another answer:

-1 … requires import
+1 … is very readable

import operator
a = [1,2,3,4]
b = [10,11,12,13]

list(map(operator.mul, a, b))

outputs [10, 22, 36, 52]


回答 5

相当直观的方法:

a = [1,2,3,4]
b = [2,3,4,5]
ab = []                        #Create empty list
for i in range(0, len(a)):
     ab.append(a[i]*b[i])      #Adds each element to the list

Fairly intuitive way of doing this:

a = [1,2,3,4]
b = [2,3,4,5]
ab = []                        #Create empty list
for i in range(0, len(a)):
     ab.append(a[i]*b[i])      #Adds each element to the list

回答 6

您可以使用 lambda

foo=[1,2,3,4]
bar=[1,2,5,55]
l=map(lambda x,y:x*y,foo,bar)

you can multiplication using lambda

foo=[1,2,3,4]
bar=[1,2,5,55]
l=map(lambda x,y:x*y,foo,bar)

回答 7

对于大型列表,我们可以反复进行:

product_iter_object = itertools.imap(operator.mul, [1,2,3,4], [2,3,4,5])

product_iter_object.next() 给出输出列表中的每个元素。

输出将是两个输入列表中较短者的长度。

For large lists, we can do it the iter-way:

product_iter_object = itertools.imap(operator.mul, [1,2,3,4], [2,3,4,5])

product_iter_object.next() gives each of the element in the output list.

The output would be the length of the shorter of the two input lists.


回答 8

创建一个数组;将每个列表乘以数组;将数组转换为列表

import numpy as np

a = [1,2,3,4]
b = [2,3,4,5]

c = (np.ones(len(a))*a*b).tolist()

[2.0, 6.0, 12.0, 20.0]

create an array of ones; multiply each list times the array; convert array to a list

import numpy as np

a = [1,2,3,4]
b = [2,3,4,5]

c = (np.ones(len(a))*a*b).tolist()

[2.0, 6.0, 12.0, 20.0]

回答 9

gahooa的答案对于标题中所述的问题是正确的,但是如果列表已经是numpy格式大于十,它将更快(3个数量级)并且可读性更高,如NPE。我得到这些时间:

0.0049ms -> N = 4, a = [i for i in range(N)], c = [a*b for a,b in zip(a, b)]
0.0075ms -> N = 4, a = [i for i in range(N)], c = a * b
0.0167ms -> N = 4, a = np.arange(N), c = [a*b for a,b in zip(a, b)]
0.0013ms -> N = 4, a = np.arange(N), c = a * b
0.0171ms -> N = 40, a = [i for i in range(N)], c = [a*b for a,b in zip(a, b)]
0.0095ms -> N = 40, a = [i for i in range(N)], c = a * b
0.1077ms -> N = 40, a = np.arange(N), c = [a*b for a,b in zip(a, b)]
0.0013ms -> N = 40, a = np.arange(N), c = a * b
0.1485ms -> N = 400, a = [i for i in range(N)], c = [a*b for a,b in zip(a, b)]
0.0397ms -> N = 400, a = [i for i in range(N)], c = a * b
1.0348ms -> N = 400, a = np.arange(N), c = [a*b for a,b in zip(a, b)]
0.0020ms -> N = 400, a = np.arange(N), c = a * b

即从以下测试程序。

import timeit

init = ['''
import numpy as np
N = {}
a = {}
b = np.linspace(0.0, 0.5, len(a))
'''.format(i, j) for i in [4, 40, 400] 
                  for j in ['[i for i in range(N)]', 'np.arange(N)']]

func = ['''c = [a*b for a,b in zip(a, b)]''',
'''c = a * b''']

for i in init:
  for f in func:
    lines = i.split('\n')
    print('{:6.4f}ms -> {}, {}, {}'.format(
           timeit.timeit(f, setup=i, number=1000), lines[2], lines[3], f))

gahooa’s answer is correct for the question as phrased in the heading, but if the lists are already numpy format or larger than ten it will be MUCH faster (3 orders of magnitude) as well as more readable, to do simple numpy multiplication as suggested by NPE. I get these timings:

0.0049ms -> N = 4, a = [i for i in range(N)], c = [a*b for a,b in zip(a, b)]
0.0075ms -> N = 4, a = [i for i in range(N)], c = a * b
0.0167ms -> N = 4, a = np.arange(N), c = [a*b for a,b in zip(a, b)]
0.0013ms -> N = 4, a = np.arange(N), c = a * b
0.0171ms -> N = 40, a = [i for i in range(N)], c = [a*b for a,b in zip(a, b)]
0.0095ms -> N = 40, a = [i for i in range(N)], c = a * b
0.1077ms -> N = 40, a = np.arange(N), c = [a*b for a,b in zip(a, b)]
0.0013ms -> N = 40, a = np.arange(N), c = a * b
0.1485ms -> N = 400, a = [i for i in range(N)], c = [a*b for a,b in zip(a, b)]
0.0397ms -> N = 400, a = [i for i in range(N)], c = a * b
1.0348ms -> N = 400, a = np.arange(N), c = [a*b for a,b in zip(a, b)]
0.0020ms -> N = 400, a = np.arange(N), c = a * b

i.e. from the following test program.

import timeit

init = ['''
import numpy as np
N = {}
a = {}
b = np.linspace(0.0, 0.5, len(a))
'''.format(i, j) for i in [4, 40, 400] 
                  for j in ['[i for i in range(N)]', 'np.arange(N)']]

func = ['''c = [a*b for a,b in zip(a, b)]''',
'''c = a * b''']

for i in init:
  for f in func:
    lines = i.split('\n')
    print('{:6.4f}ms -> {}, {}, {}'.format(
           timeit.timeit(f, setup=i, number=1000), lines[2], lines[3], f))

回答 10

可以使用枚举。

a = [1, 2, 3, 4]
b = [2, 3, 4, 5]

ab = [val * b[i] for i, val in enumerate(a)]

Can use enumerate.

a = [1, 2, 3, 4]
b = [2, 3, 4, 5]

ab = [val * b[i] for i, val in enumerate(a)]

回答 11

map功能在这里可能非常有用。使用map我们可以将任何函数应用于可迭代对象的每个元素。

Python 3.x

>>> def my_mul(x,y):
...     return x*y
...
>>> a = [1,2,3,4]
>>> b = [2,3,4,5]
>>>
>>> list(map(my_mul,a,b))
[2, 6, 12, 20]
>>>

当然:

map(f, iterable)

相当于

[f(x) for x in iterable]

因此,我们可以通过以下方式获得解决方案:

>>> [my_mul(x,y) for x, y in zip(a,b)]
[2, 6, 12, 20]
>>>

在Python 2.x中map()意味着:将函数应用于可迭代的每个元素并构造一个新列表。在Python 3.x中,map构造迭代器而不是列表。

代替my_mul我们可以使用 mul运算符

Python 2.7

>>>from operator import mul # import mul operator
>>>a = [1,2,3,4]
>>>b = [2,3,4,5]
>>>map(mul,a,b)
[2, 6, 12, 20]
>>>

Python 3.5+

>>> from operator import mul
>>> a = [1,2,3,4]
>>> b = [2,3,4,5]
>>> [*map(mul,a,b)]
[2, 6, 12, 20]
>>>

请注意,由于map()构造了迭代器,因此我们使用*可迭代的拆包运算符来获取列表。解压缩方法比list构造函数要快一些:

>>> list(map(mul,a,b))
[2, 6, 12, 20]
>>>

The map function can be very useful here. Using map we can apply any function to each element of an iterable.

Python 3.x

>>> def my_mul(x,y):
...     return x*y
...
>>> a = [1,2,3,4]
>>> b = [2,3,4,5]
>>>
>>> list(map(my_mul,a,b))
[2, 6, 12, 20]
>>>

Of course:

map(f, iterable)

is equivalent to

[f(x) for x in iterable]

So we can get our solution via:

>>> [my_mul(x,y) for x, y in zip(a,b)]
[2, 6, 12, 20]
>>>

In Python 2.x map() means: apply a function to each element of an iterable and construct a new list. In Python 3.x, map construct iterators instead of lists.

Instead of my_mul we could use mul operator

Python 2.7

>>>from operator import mul # import mul operator
>>>a = [1,2,3,4]
>>>b = [2,3,4,5]
>>>map(mul,a,b)
[2, 6, 12, 20]
>>>

Python 3.5+

>>> from operator import mul
>>> a = [1,2,3,4]
>>> b = [2,3,4,5]
>>> [*map(mul,a,b)]
[2, 6, 12, 20]
>>>

Please note that since map() constructs an iterator we use * iterable unpacking operator to get a list. The unpacking approach is a bit faster then the list constructor:

>>> list(map(mul,a,b))
[2, 6, 12, 20]
>>>

回答 12

要维护列表类型,并在一行中完成(当然,在将numpy导入为np之后):

list(np.array([1,2,3,4]) * np.array([2,3,4,5]))

要么

list(np.array(a) * np.array(b))

To maintain the list type, and do it in one line (after importing numpy as np, of course):

list(np.array([1,2,3,4]) * np.array([2,3,4,5]))

or

list(np.array(a) * np.array(b))

回答 13

您可以将其用于相同长度的列表

def lstsum(a, b):
    c=0
    pos = 0
for element in a:
   c+= element*b[pos]
   pos+=1
return c

you can use this for lists of the same length

def lstsum(a, b):
    c=0
    pos = 0
for element in a:
   c+= element*b[pos]
   pos+=1
return c

是什么导致[* a]总体化?

问题:是什么导致[* a]总体化?

显然list(a)不是总归,[x for x in a]在某些时候[*a]总归,始终都是总归吗?

这是从0到12的大小n,以及三种方法的结果大小(以字节为单位):

0 56 56 56
1 64 88 88
2 72 88 96
3 80 88 104
4 88 88 112
5 96 120 120
6 104 120 128
7 112 120 136
8 120 120 152
9 128 184 184
10 136 184 192
11 144 184 200
12 152 184 208

这样计算,可以使用Python 3 在repl.it中重现。8

from sys import getsizeof

for n in range(13):
    a = [None] * n
    print(n, getsizeof(list(a)),
             getsizeof([x for x in a]),
             getsizeof([*a]))

因此:这是如何工作的?如何整体化[*a]?实际上,它使用什么机制从给定的输入创建结果列表?它是否使用了迭代器a并使用了类似的东西list.append?源代码在哪里?

产生图像的数据和代码协作。)

放大到较小的n:

缩小到较大的n:

Apparently list(a) doesn’t overallocate, [x for x in a] overallocates at some points, and [*a] overallocates all the time?

Here are sizes n from 0 to 12 and the resulting sizes in bytes for the three methods:

0 56 56 56
1 64 88 88
2 72 88 96
3 80 88 104
4 88 88 112
5 96 120 120
6 104 120 128
7 112 120 136
8 120 120 152
9 128 184 184
10 136 184 192
11 144 184 200
12 152 184 208

Computed like this, reproducable at repl.it, using Python 3.8:

from sys import getsizeof

for n in range(13):
    a = [None] * n
    print(n, getsizeof(list(a)),
             getsizeof([x for x in a]),
             getsizeof([*a]))

So: How does this work? How does [*a] overallocate? Actually, what mechanism does it use to create the result list from the given input? Does it use an iterator over a and use something like list.append? Where is the source code?

(Colab with data and code that produced the images.)

Zooming in to smaller n:

Zooming out to larger n:


回答 0

[*a] 在内部执行C等效于

  1. 新建一个空的 list
  2. 呼叫 newlist.extend(a)
  3. 返回list

因此,如果您将测试扩展到:

from sys import getsizeof

for n in range(13):
    a = [None] * n
    l = []
    l.extend(a)
    print(n, getsizeof(list(a)),
             getsizeof([x for x in a]),
             getsizeof([*a]),
             getsizeof(l))

在线尝试!

你会看到的结果getsizeof([*a])l = []; l.extend(a); getsizeof(l)是相同的。

通常这是正确的做法;在extending时,您通常希望以后再添加,对于一般的解压缩,类似的情况是,假设要在一个接一个的基础上添加多个内容。[*a]这不是正常情况;Python假定list[*a, b, c, *d])中添加了多个项目或可迭代项,因此在常见情况下,过度分配可以节省工作。

相比之下,list由单个预先确定的可迭代大小(带有list())构成的结构在使用过程中可能不会增长或收缩,并且积算过早,除非另外证明。Python最近修复了一个错误,该错误使构造函数甚至可以对已知大小的输入进行整体分配

至于list理解,它们实际上等效于重复的appends,因此当您一次添加一个元素时,您会看到正常的过度分配增长模式的最终结果。

需要明确的是,这都不是语言保证。这就是CPython实现它的方式。Python语言规范通常与特定的增长模式无关list(除了保证从末期开始摊销O(1) appends和pops)。如评论中所述,具体实现在3.9中再次更改;尽管这不会影响[*a],但可能会影响其他情况,这些情况曾经是“构建tuple单个项目的临时内容,然后extend使用tuple”,现在变成的多个应用程序LIST_APPEND,当发生超额分配以及要计算的数字是多少时,该应用程序可能会发生变化。

[*a] is internally doing the C equivalent of:

  1. Make a new, empty list
  2. Call newlist.extend(a)
  3. Returns list.

So if you expand your test to:

from sys import getsizeof

for n in range(13):
    a = [None] * n
    l = []
    l.extend(a)
    print(n, getsizeof(list(a)),
             getsizeof([x for x in a]),
             getsizeof([*a]),
             getsizeof(l))

Try it online!

you’ll see the results for getsizeof([*a]) and l = []; l.extend(a); getsizeof(l) are the same.

This is usually the right thing to do; when extending you’re usually expecting to add more later, and similarly for generalized unpacking, it’s assumed that multiple things will be added one after the other. [*a] is not the normal case; Python assumes there are multiple items or iterables being added to the list ([*a, b, c, *d]), so overallocation saves work in the common case.

By contrast, a list constructed from a single, presized iterable (with list()) may not grow or shrink during use, and overallocating is premature until proven otherwise; Python recently fixed a bug that made the constructor overallocate even for inputs with known size.

As for list comprehensions, they’re effectively equivalent to repeated appends, so you’re seeing the final result of the normal overallocation growth pattern when adding an element at a time.

To be clear, none of this is a language guarantee. It’s just how CPython implements it. The Python language spec is generally unconcerned with specific growth patterns in list (aside from guaranteeing amortized O(1) appends and pops from the end). As noted in the comments, the specific implementation changes again in 3.9; while it won’t affect [*a], it could affect other cases where what used to be “build a temporary tuple of individual items and then extend with the tuple” now becomes multiple applications of LIST_APPEND, which can change when the overallocation occurs and what numbers go into the calculation.


回答 1

的全貌是什么情况,建立在其他的答案和评论(尤其是ShadowRanger的答案,这也解释了为什么它这样做)。

拆卸显示已BUILD_LIST_UNPACK被使用:

>>> import dis
>>> dis.dis('[*a]')
  1           0 LOAD_NAME                0 (a)
              2 BUILD_LIST_UNPACK        1
              4 RETURN_VALUE

这是在中ceval.c处理,它会构建一个空列表并将其扩展(带有a):

        case TARGET(BUILD_LIST_UNPACK): {
            ...
            PyObject *sum = PyList_New(0);
              ...
                none_val = _PyList_Extend((PyListObject *)sum, PEEK(i));

_PyList_Extend 用途 list_extend

_PyList_Extend(PyListObject *self, PyObject *iterable)
{
    return list_extend(self, iterable);
}

其中调用list_resize的总和为

list_extend(PyListObject *self, PyObject *iterable)
    ...
        n = PySequence_Fast_GET_SIZE(iterable);
        ...
        m = Py_SIZE(self);
        ...
        if (list_resize(self, m + n) < 0) {

overallocates如下:

list_resize(PyListObject *self, Py_ssize_t newsize)
{
  ...
    new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

让我们检查一下。用上面的公式计算预期的点数,并通过将预期的字节数乘以8(如我在此处使用64位Python)并添加一个空列表的字节数(即列表对象的固定开销)来计算预期的字节数:

from sys import getsizeof
for n in range(13):
    a = [None] * n
    expected_spots = n + (n >> 3) + (3 if n < 9 else 6)
    expected_bytesize = getsizeof([]) + expected_spots * 8
    real_bytesize = getsizeof([*a])
    print(n,
          expected_bytesize,
          real_bytesize,
          real_bytesize == expected_bytesize)

输出:

0 80 56 False
1 88 88 True
2 96 96 True
3 104 104 True
4 112 112 True
5 120 120 True
6 128 128 True
7 136 136 True
8 152 152 True
9 184 184 True
10 192 192 True
11 200 200 True
12 208 208 True

匹配,除了n = 0list_extend实际上是快捷方式,因此实际上也匹配:

        if (n == 0) {
            ...
            Py_RETURN_NONE;
        }
        ...
        if (list_resize(self, m + n) < 0) {

Full picture of what happens, building on the other answers and comments (especially ShadowRanger’s answer, which also explains why it’s done like that).

Disassembling shows that BUILD_LIST_UNPACK gets used:

>>> import dis
>>> dis.dis('[*a]')
  1           0 LOAD_NAME                0 (a)
              2 BUILD_LIST_UNPACK        1
              4 RETURN_VALUE

That’s handled in ceval.c, which builds an empty list and extends it (with a):

        case TARGET(BUILD_LIST_UNPACK): {
            ...
            PyObject *sum = PyList_New(0);
              ...
                none_val = _PyList_Extend((PyListObject *)sum, PEEK(i));

_PyList_Extend uses list_extend:

_PyList_Extend(PyListObject *self, PyObject *iterable)
{
    return list_extend(self, iterable);
}

Which calls list_resize with the sum of the sizes:

list_extend(PyListObject *self, PyObject *iterable)
    ...
        n = PySequence_Fast_GET_SIZE(iterable);
        ...
        m = Py_SIZE(self);
        ...
        if (list_resize(self, m + n) < 0) {

And that overallocates as follows:

list_resize(PyListObject *self, Py_ssize_t newsize)
{
  ...
    new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

Let’s check that. Compute the expected number of spots with the formula above, and compute the expected byte size by multiplying it with 8 (as I’m using 64-bit Python here) and adding an empty list’s byte size (i.e., a list object’s constant overhead):

from sys import getsizeof
for n in range(13):
    a = [None] * n
    expected_spots = n + (n >> 3) + (3 if n < 9 else 6)
    expected_bytesize = getsizeof([]) + expected_spots * 8
    real_bytesize = getsizeof([*a])
    print(n,
          expected_bytesize,
          real_bytesize,
          real_bytesize == expected_bytesize)

Output:

0 80 56 False
1 88 88 True
2 96 96 True
3 104 104 True
4 112 112 True
5 120 120 True
6 128 128 True
7 136 136 True
8 152 152 True
9 184 184 True
10 192 192 True
11 200 200 True
12 208 208 True

Matches except for n = 0, which list_extend actually shortcuts, so actually that matches, too:

        if (n == 0) {
            ...
            Py_RETURN_NONE;
        }
        ...
        if (list_resize(self, m + n) < 0) {

回答 2

这些将是CPython解释器的实现细节,因此在其他解释器之间可能不一致。

也就是说,您可以list(a)在这里看到理解和行为的位置:

https://github.com/python/cpython/blob/master/Objects/listobject.c#L36

专为理解:

 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
...

new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

在这些行的下面,有list_preallocate_exact调用时使用的行list(a)

These are going to be implementation details of the CPython interpreter, and so may not be consistent across other interpreters.

That said, you can see where the comprehension and list(a) behaviors come in here:

https://github.com/python/cpython/blob/master/Objects/listobject.c#L36

Specifically for the comprehension:

 * The growth pattern is:  0, 4, 8, 16, 25, 35, 46, 58, 72, 88, ...
...

new_allocated = (size_t)newsize + (newsize >> 3) + (newsize < 9 ? 3 : 6);

Just below those lines, there is list_preallocate_exact which is used when calling list(a).


为什么列表中允许尾随逗号?

问题:为什么列表中允许尾随逗号?

我很好奇为什么在Python中列表中的尾部逗号是有效的语法,并且似乎Python只是忽略了它:

>>> ['a','b',]
['a', 'b']

自从('a')('a',)是两个元组时,它是一个有意义的元组,但是在列表中呢?

I am curious why in Python a trailing comma in a list is valid syntax, and it seems that Python simply ignores it:

>>> ['a','b',]
['a', 'b']

It makes sense when its a tuple since ('a') and ('a',) are two different things, but in lists?


回答 0

主要优点是,它使多行列表更易于编辑,并减少了差异。

变更:

s = ['manny',
     'mo',
     'jack',
]

至:

s = ['manny',
     'mo',
     'jack',
     'roger',
]

仅涉及差异的单行更改:

  s = ['manny',
       'mo',
       'jack',
+      'roger',
  ]

当省略尾部逗号时,这击败了更令人困惑的多行差异:

  s = ['manny',
       'mo',
-      'jack'
+      'jack',
+      'roger'
  ]

后者的差异使得很难看到仅添加了一行,而另一行没有更改内容。

它还降低了这样做的风险:

s = ['manny',
     'mo',
     'jack'
     'roger'  # Added this line, but forgot to add a comma on the previous line
]

并触发隐式字符串文字串联,产生s = ['manny', 'mo', 'jackroger']而不是预期的结果。

The main advantages are that it makes multi-line lists easier to edit and that it reduces clutter in diffs.

Changing:

s = ['manny',
     'mo',
     'jack',
]

to:

s = ['manny',
     'mo',
     'jack',
     'roger',
]

involves only a one-line change in the diff:

  s = ['manny',
       'mo',
       'jack',
+      'roger',
  ]

This beats the more confusing multi-line diff when the trailing comma was omitted:

  s = ['manny',
       'mo',
-      'jack'
+      'jack',
+      'roger'
  ]

The latter diff makes it harder to see that only one line was added and that the other line didn’t change content.

It also reduces the risk of doing this:

s = ['manny',
     'mo',
     'jack'
     'roger'  # Added this line, but forgot to add a comma on the previous line
]

and triggering implicit string literal concatenation, producing s = ['manny', 'mo', 'jackroger'] instead of the intended result.


回答 1

这是一种常见的语法约定,允许在数组中尾随逗号,C和Java之类的语言都允许,并且Python似乎已对其列表数据结构采用了该约定。在生成用于填充列表的代码时,它特别有用:只需生成一系列元素和逗号,而无需将最后一个元素和逗号视为特殊情况,并且不应该在末尾加逗号。

It’s a common syntactical convention to allow trailing commas in an array, languages like C and Java allow it, and Python seems to have adopted this convention for its list data structure. It’s particularly useful when generating code for populating a list: just generate a sequence of elements and commas, no need to consider the last one as a special case that shouldn’t have a comma at the end.


回答 2

它有助于消除某种错误。有时在多行上写列表会更清晰。但是在以后的维护中,您可能需要重新排列项目。

l1 = [
        1,
        2,
        3,
        4,
        5
]

# Now you want to rearrange

l1 = [
        1,
        2,
        3,
        5
        4,
]

# Now you have an error

但是,如果允许使用尾随逗号,则可以轻松地重新排列行而不会引起错误。

It helps to eliminate a certain kind of bug. It’s sometimes clearer to write lists on multiple lines. But in, later maintenace you may want to rearrange the items.

l1 = [
        1,
        2,
        3,
        4,
        5
]

# Now you want to rearrange

l1 = [
        1,
        2,
        3,
        5
        4,
]

# Now you have an error

But if you allow trailing commas, and use them, you can easily rearrange the lines without introducing an error.


回答 3

元组的不同之处在于,('a')它使用隐式连续和()s作为优先运算符进行扩展,而('a',)引用长度为1的元组。

你原来的例子是 tuple('a')

A tuple is different because ('a') is expanded using implicit continuation and ()s as a precendence operator, whereas ('a',) refers to a length 1 tuple.

Your original example would have been tuple('a')


回答 4

主要原因是使diff变得不那么复杂。例如,您有一个列表:

list = [
    'a',
    'b',
    'c'
]

并且您想要向其中添加另一个元素。然后,您将最终执行此操作:

list = [
    'a',
    'b',
    'c',
    'd'
]

因此,diff将显示出两行已更改,首先在’c’处添加’,’,在最后一行添加’d’。

因此,python允许在列表的最后一个元素中尾部加上’,’,以防止可能引起混淆的额外差异。

The main reason is to make diff less complicated. For example you have a list :

list = [
    'a',
    'b',
    'c'
]

and you want to add another element to it. Then you will be end up doing this:

list = [
    'a',
    'b',
    'c',
    'd'
]

thus, diff will show that two lines have been changed, first adding ‘,’ in line with ‘c’ and adding ‘d’ at last line.

So, python allows trailing ‘,’ in last element of list, to prevent extra diff which can cause confusion.


将两个LISTS的值之和添加到新的LIST中

问题:将两个LISTS的值之和添加到新的LIST中

我有以下两个列表:

first = [1,2,3,4,5]
second = [6,7,8,9,10]

现在,我想将这两个列表中的项目添加到新列表中。

输出应该是

third = [7,9,11,13,15]

I have the following two lists:

first = [1,2,3,4,5]
second = [6,7,8,9,10]

Now I want to add the items from both of these lists into a new list.

output should be

third = [7,9,11,13,15]

回答 0

zip功能在此处有用,可与列表推导一起使用。

[x + y for x, y in zip(first, second)]

如果您有一个列表列表(而不是两个列表):

lists_of_lists = [[1, 2, 3], [4, 5, 6]]
[sum(x) for x in zip(*lists_of_lists)]
# -> [5, 7, 9]

The zip function is useful here, used with a list comprehension.

[x + y for x, y in zip(first, second)]

If you have a list of lists (instead of just two lists):

lists_of_lists = [[1, 2, 3], [4, 5, 6]]
[sum(x) for x in zip(*lists_of_lists)]
# -> [5, 7, 9]

回答 1

来自文档

import operator
list(map(operator.add, first,second))

From docs

import operator
list(map(operator.add, first,second))

回答 2

假设两个列表a,并b具有相同的长度,你不需要压缩,numpy的或其他任何东西。

Python 2.x和3.x:

[a[i]+b[i] for i in range(len(a))]

Assuming both lists a and b have same length, you do not need zip, numpy or anything else.

Python 2.x and 3.x:

[a[i]+b[i] for i in range(len(a))]

回答 3

numpy中的默认行为是逐组件添加

import numpy as np
np.add(first, second)

哪个输出

array([7,9,11,13,15])

Default behavior in numpy is add componentwise

import numpy as np
np.add(first, second)

which outputs

array([7,9,11,13,15])

回答 4

这可以扩展到任意数量的列表:

[sum(sublist) for sublist in itertools.izip(*myListOfLists)]

在你的情况下,myListOfLists[first, second]

This extends itself to any number of lists:

[sum(sublist) for sublist in itertools.izip(*myListOfLists)]

In your case, myListOfLists would be [first, second]


回答 5

尝试以下代码:

first = [1, 2, 3, 4]
second = [2, 3, 4, 5]
third = map(sum, zip(first, second))

Try the following code:

first = [1, 2, 3, 4]
second = [2, 3, 4, 5]
third = map(sum, zip(first, second))

回答 6

执行此操作的简单方法和快速方法是:

three = [sum(i) for i in zip(first,second)] # [7,9,11,13,15]

另外,您可以使用numpy sum:

from numpy import sum
three = sum([first,second], axis=0) # array([7,9,11,13,15])

The easy way and fast way to do this is:

three = [sum(i) for i in zip(first,second)] # [7,9,11,13,15]

Alternatively, you can use numpy sum:

from numpy import sum
three = sum([first,second], axis=0) # array([7,9,11,13,15])

回答 7

first = [1, 2, 3, 4, 5]
second = [6, 7, 8, 9, 10]
three = map(lambda x,y: x+y,first,second)
print three



Output 
[7, 9, 11, 13, 15]
first = [1, 2, 3, 4, 5]
second = [6, 7, 8, 9, 10]
three = map(lambda x,y: x+y,first,second)
print three



Output 
[7, 9, 11, 13, 15]

回答 8

一线解决方案

list(map(lambda x,y: x+y, a,b))

one-liner solution

list(map(lambda x,y: x+y, a,b))

回答 9

Thiru在3月17日9:25回答了我的问题。

这更加简单快捷,这是他的解决方案:

执行此操作的简单方法和快速方法是:

 three = [sum(i) for i in zip(first,second)] # [7,9,11,13,15]

另外,您可以使用numpy sum:

 from numpy import sum
 three = sum([first,second], axis=0) # array([7,9,11,13,15])

您需要numpy!

numpy数组可以做一些像矢量的操作

import numpy as np
a = [1,2,3,4,5]
b = [6,7,8,9,10]
c = list(np.array(a) + np.array(b))
print c
# [7, 9, 11, 13, 15]

My answer is repeated with Thiru’s that answered it in Mar 17 at 9:25.

It was simpler and quicker, here are his solutions:

The easy way and fast way to do this is:

 three = [sum(i) for i in zip(first,second)] # [7,9,11,13,15]

Alternatively, you can use numpy sum:

 from numpy import sum
 three = sum([first,second], axis=0) # array([7,9,11,13,15])

You need numpy!

numpy array could do some operation like vectors
import numpy as np
a = [1,2,3,4,5]
b = [6,7,8,9,10]
c = list(np.array(a) + np.array(b))
print c
# [7, 9, 11, 13, 15]

回答 10

如果未知数量的相同长度的列表,则可以使用以下功能。

在这里,* args接受可变数量的列表参数(但每个参数仅求和相同数量的元素)。再次使用*来解压缩每个列表中的元素。

def sum_lists(*args):
    return list(map(sum, zip(*args)))

a = [1,2,3]
b = [1,2,3]  

sum_lists(a,b)

输出:

[2, 4, 6]

或有3个清单

sum_lists([5,5,5,5,5], [10,10,10,10,10], [4,4,4,4,4])

输出:

[19, 19, 19, 19, 19]

If you have an unknown number of lists of the same length, you can use the below function.

Here the *args accepts a variable number of list arguments (but only sums the same number of elements in each). The * is used again to unpack the elements in each of the lists.

def sum_lists(*args):
    return list(map(sum, zip(*args)))

a = [1,2,3]
b = [1,2,3]  

sum_lists(a,b)

Output:

[2, 4, 6]

Or with 3 lists

sum_lists([5,5,5,5,5], [10,10,10,10,10], [4,4,4,4,4])

Output:

[19, 19, 19, 19, 19]

回答 11

您可以使用zip(),将两个数组“交织”在一起,然后使用map(),它将对一个可迭代对象中的每个元素应用一个函数:

>>> a = [1,2,3,4,5]
>>> b = [6,7,8,9,10]
>>> zip(a, b)
[(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]
>>> map(lambda x: x[0] + x[1], zip(a, b))
[7, 9, 11, 13, 15]

You can use zip(), which will “interleave” the two arrays together, and then map(), which will apply a function to each element in an iterable:

>>> a = [1,2,3,4,5]
>>> b = [6,7,8,9,10]
>>> zip(a, b)
[(1, 6), (2, 7), (3, 8), (4, 9), (5, 10)]
>>> map(lambda x: x[0] + x[1], zip(a, b))
[7, 9, 11, 13, 15]

回答 12

这是另一种方法。我们利用python的内部__add__函数:

class SumList(object):
    def __init__(self, this_list):
        self.mylist = this_list

    def __add__(self, other):
        new_list = []
        zipped_list = zip(self.mylist, other.mylist)
        for item in zipped_list:
            new_list.append(item[0] + item[1])
        return SumList(new_list)

    def __repr__(self):
        return str(self.mylist)

list1 = SumList([1,2,3,4,5])
list2 = SumList([10,20,30,40,50])
sum_list1_list2 = list1 + list2
print(sum_list1_list2)

输出量

[11, 22, 33, 44, 55]

Here is another way to do it. We make use of the internal __add__ function of python:

class SumList(object):
    def __init__(self, this_list):
        self.mylist = this_list

    def __add__(self, other):
        new_list = []
        zipped_list = zip(self.mylist, other.mylist)
        for item in zipped_list:
            new_list.append(item[0] + item[1])
        return SumList(new_list)

    def __repr__(self):
        return str(self.mylist)

list1 = SumList([1,2,3,4,5])
list2 = SumList([10,20,30,40,50])
sum_list1_list2 = list1 + list2
print(sum_list1_list2)

Output

[11, 22, 33, 44, 55]

回答 13

如果您还想添加列表中的其余值,则可以使用它(在Python3.5中有效)

def addVectors(v1, v2):
    sum = [x + y for x, y in zip(v1, v2)]
    if not len(v1) >= len(v2):
        sum += v2[len(v1):]
    else:
        sum += v1[len(v2):]

    return sum


#for testing 
if __name__=='__main__':
    a = [1, 2]
    b = [1, 2, 3, 4]
    print(a)
    print(b)
    print(addVectors(a,b))

If you want to add also the rest of the values in the lists you can use this (this is working in Python3.5)

def addVectors(v1, v2):
    sum = [x + y for x, y in zip(v1, v2)]
    if not len(v1) >= len(v2):
        sum += v2[len(v1):]
    else:
        sum += v1[len(v2):]

    return sum


#for testing 
if __name__=='__main__':
    a = [1, 2]
    b = [1, 2, 3, 4]
    print(a)
    print(b)
    print(addVectors(a,b))

回答 14

    first = [1,2,3,4,5]
    second = [6,7,8,9,10]
    #one way
    third = [x + y for x, y in zip(first, second)]
    print("third" , third) 
    #otherway
    fourth = []
    for i,j in zip(first,second):
        global fourth
        fourth.append(i + j)
    print("fourth" , fourth )
#third [7, 9, 11, 13, 15]
#fourth [7, 9, 11, 13, 15]
    first = [1,2,3,4,5]
    second = [6,7,8,9,10]
    #one way
    third = [x + y for x, y in zip(first, second)]
    print("third" , third) 
    #otherway
    fourth = []
    for i,j in zip(first,second):
        global fourth
        fourth.append(i + j)
    print("fourth" , fourth )
#third [7, 9, 11, 13, 15]
#fourth [7, 9, 11, 13, 15]

回答 15

这是另一种方法。它对我来说很好。

N=int(input())
num1 = list(map(int, input().split()))
num2 = list(map(int, input().split()))
sum=[]

for i in range(0,N):
  sum.append(num1[i]+num2[i])

for element in sum:
  print(element, end=" ")

print("")

Here is another way to do it.It is working fine for me .

N=int(input())
num1 = list(map(int, input().split()))
num2 = list(map(int, input().split()))
sum=[]

for i in range(0,N):
  sum.append(num1[i]+num2[i])

for element in sum:
  print(element, end=" ")

print("")

回答 16

j = min(len(l1), len(l2))
l3 = [l1[i]+l2[i] for i in range(j)]
j = min(len(l1), len(l2))
l3 = [l1[i]+l2[i] for i in range(j)]

回答 17

也许是最简单的方法:

first = [1,2,3,4,5]
second = [6,7,8,9,10]
three=[]

for i in range(0,5):
    three.append(first[i]+second[i])

print(three)

Perhaps the simplest approach:

first = [1,2,3,4,5]
second = [6,7,8,9,10]
three=[]

for i in range(0,5):
    three.append(first[i]+second[i])

print(three)

回答 18

如果您将列表视为numpy数组,则需要轻松对其求和:

import numpy as np

third = np.array(first) + np.array(second)

print third

[7, 9, 11, 13, 15]

If you consider your lists as numpy array, then you need to easily sum them:

import numpy as np

third = np.array(first) + np.array(second)

print third

[7, 9, 11, 13, 15]

回答 19

如果您有不同长度的列表怎么办,那么您可以尝试这样的操作(使用zip_longest

from itertools import zip_longest  # izip_longest for python2.x

l1 = [1, 2, 3]
l2 = [4, 5, 6, 7]

>>> list(map(sum, zip_longest(l1, l2, fillvalue=0)))
[5, 7, 9, 7]

What if you have list with different length, then you can try something like this (using zip_longest)

from itertools import zip_longest  # izip_longest for python2.x

l1 = [1, 2, 3]
l2 = [4, 5, 6, 7]

>>> list(map(sum, zip_longest(l1, l2, fillvalue=0)))
[5, 7, 9, 7]

回答 20

您可以使用此方法,但仅当两个列表的大小相同时,该方法才有效:

first = [1, 2, 3, 4, 5]
second = [6, 7, 8, 9, 10]
third = []

a = len(first)
b = int(0)
while True:
    x = first[b]
    y = second[b]
    ans = x + y
    third.append(ans)
    b = b + 1
    if b == a:
        break

print third

You can use this method but it will work only if both the list are of the same size:

first = [1, 2, 3, 4, 5]
second = [6, 7, 8, 9, 10]
third = []

a = len(first)
b = int(0)
while True:
    x = first[b]
    y = second[b]
    ans = x + y
    third.append(ans)
    b = b + 1
    if b == a:
        break

print third

Python在一个列表中查找不在另一个列表中的元素[重复]

问题:Python在一个列表中查找不在另一个列表中的元素[重复]

我需要比较两个列表,以便创建在一个列表中找到但不在另一个列表中找到的特定元素的新列表。例如:

main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"] 

我想遍历list_1,并将list_2中所有在list_1中找不到的元素附加到main_list。

结果应为:

main_list=["f", "m"]

如何使用python做到这一点?

I need to compare two lists in order to create a new list of specific elements found in one list but not in the other. For example:

main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"] 

I want to loop through list_1 and append to main_list all the elements from list_2 that are not found in list_1.

The result should be:

main_list=["f", "m"]

How can I do it with python?


回答 0

TL; DR:
解决方案(1)

import numpy as np
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`

解决方案(2) 您需要一个排序列表

def setdiff_sorted(array1,array2,assume_unique=False):
    ans = np.setdiff1d(array1,array2,assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans
main_list = setdiff_sorted(list_2,list_1)




说明:
(1)可以使用与NumPy的setdiff1darray1array2assume_unique= False)。

assume_unique询问用户数组是否已经唯一。
如果为False,则首先确定唯一元素。
如果为True,则函数将假定元素已经是唯一的,并且函数将跳过确定唯一元素的操作。

这产生了独特的值array1不是array2assume_uniqueFalse默认。

如果您担心 唯一元素(基于Chinny84响应),则只需使用(其中assume_unique=False=>默认值):

import numpy as np
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"] 
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`


(2) 对于想要对答案进行排序的人,我做了一个自定义函数:

import numpy as np
def setdiff_sorted(array1,array2,assume_unique=False):
    ans = np.setdiff1d(array1,array2,assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans

要获得答案,请运行:

main_list = setdiff_sorted(list_2,list_1)

旁注:
(a)解决方案2(自定义函数setdiff_sorted)返回一个列表(与解决方案1中的数组相比)。

(b)如果不确定这些元素是否唯一,则只需setdiff1d在解决方案A和B中都使用NumPy的默认设置。并发症的例子是什么?见注释(c)。

(c)如果两个列表中的任何一个都不唯一,情况将有所不同。
list_2的不是唯一的:list2 = ["a", "f", "c", "m", "m"]。保持list1原样:list_1 = ["a", "b", "c", "d", "e"]
设置assume_uniqueyields 的默认值["f", "m"](在两种解决方案中)。但是,如果您设置了assume_unique=True,两种解决方案都可以["f", "m", "m"]。为什么?这是因为用户认为元素是唯一的)。因此,最好保持assume_unique为其默认值。请注意,两个答案均已排序。

TL;DR:
SOLUTION (1)

import numpy as np
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`

SOLUTION (2) You want a sorted list

def setdiff_sorted(array1,array2,assume_unique=False):
    ans = np.setdiff1d(array1,array2,assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans
main_list = setdiff_sorted(list_2,list_1)




EXPLANATIONS:
(1) You can use NumPy’s setdiff1d (array1,array2,assume_unique=False).

assume_unique asks the user IF the arrays ARE ALREADY UNIQUE.
If False, then the unique elements are determined first.
If True, the function will assume that the elements are already unique AND function will skip determining the unique elements.

This yields the unique values in array1 that are not in array2. assume_unique is False by default.

If you are concerned with the unique elements (based on the response of Chinny84), then simply use (where assume_unique=False => the default value):

import numpy as np
list_1 = ["a", "b", "c", "d", "e"]
list_2 = ["a", "f", "c", "m"] 
main_list = np.setdiff1d(list_2,list_1)
# yields the elements in `list_2` that are NOT in `list_1`


(2) For those who want answers to be sorted, I’ve made a custom function:

import numpy as np
def setdiff_sorted(array1,array2,assume_unique=False):
    ans = np.setdiff1d(array1,array2,assume_unique).tolist()
    if assume_unique:
        return sorted(ans)
    return ans

To get the answer, run:

main_list = setdiff_sorted(list_2,list_1)

SIDE NOTES:
(a) Solution 2 (custom function setdiff_sorted) returns a list (compared to an array in solution 1).

(b) If you aren’t sure if the elements are unique, just use the default setting of NumPy’s setdiff1d in both solutions A and B. What can be an example of a complication? See note (c).

(c) Things will be different if either of the two lists is not unique.
Say list_2 is not unique: list2 = ["a", "f", "c", "m", "m"]. Keep list1 as is: list_1 = ["a", "b", "c", "d", "e"]
Setting the default value of assume_unique yields ["f", "m"] (in both solutions). HOWEVER, if you set assume_unique=True, both solutions give ["f", "m", "m"]. Why? This is because the user ASSUMED that the elements are unique). Hence, IT IS BETTER TO KEEP assume_unique to its default value. Note that both answers are sorted.


回答 1

您可以使用集:

main_list = list(set(list_2) - set(list_1))

输出:

>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> set(list_2) - set(list_1)
set(['m', 'f'])
>>> list(set(list_2) - set(list_1))
['m', 'f']

根据@JonClements的评论,这是一个更简洁的版本:

>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> list(set(list_2).difference(list_1))
['m', 'f']

You can use sets:

main_list = list(set(list_2) - set(list_1))

Output:

>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> set(list_2) - set(list_1)
set(['m', 'f'])
>>> list(set(list_2) - set(list_1))
['m', 'f']

Per @JonClements’ comment, here is a tidier version:

>>> list_1=["a", "b", "c", "d", "e"]
>>> list_2=["a", "f", "c", "m"]
>>> list(set(list_2).difference(list_1))
['m', 'f']

回答 2

不知道为什么当您拥有本机方法时,上述说明为何如此复杂:

main_list = list(set(list_2)-set(list_1))

Not sure why the above explanations are so complicated when you have native methods available:

main_list = list(set(list_2)-set(list_1))

回答 3

使用这样的列表理解

main_list = [item for item in list_2 if item not in list_1]

输出:

>>> list_1 = ["a", "b", "c", "d", "e"]
>>> list_2 = ["a", "f", "c", "m"] 
>>> 
>>> main_list = [item for item in list_2 if item not in list_1]
>>> main_list
['f', 'm']

编辑:

就像下面的注释中提到的那样,如果列表很大,则以上并不是理想的解决方案。在这种情况下,更好的选择是转换list_1set第一个:

set_1 = set(list_1)  # this reduces the lookup time from O(n) to O(1)
main_list = [item for item in list_2 if item not in set_1]

Use a list comprehension like this:

main_list = [item for item in list_2 if item not in list_1]

Output:

>>> list_1 = ["a", "b", "c", "d", "e"]
>>> list_2 = ["a", "f", "c", "m"] 
>>> 
>>> main_list = [item for item in list_2 if item not in list_1]
>>> main_list
['f', 'm']

Edit:

Like mentioned in the comments below, with large lists, the above is not the ideal solution. When that’s the case, a better option would be converting list_1 to a set first:

set_1 = set(list_1)  # this reduces the lookup time from O(n) to O(1)
main_list = [item for item in list_2 if item not in set_1]

回答 4

如果您想要一种单线解决方案(忽略导入),该解决方案仅需要O(max(n, m))长度n和长度输入工作,m而不需要O(n * m)工作,则可以使用以下itertools模块

from itertools import filterfalse

main_list = list(filterfalse(set(list_1).__contains__, list_2))

这利用了功能函数在构造上采用回调函数的优势,从而允许它创建一次回调并在每个元素中重用它,而无需将其存储在某个位置(因为filterfalse在内部存储);列表推导和生成器表达式可以做到这一点,但这很丑陋。†

在一行中得到与以下结果相同的结果:

main_list = [x for x in list_2 if x not in list_1]

速度:

set_1 = set(list_1)
main_list = [x for x in list_2 if x not in set_1]

当然,如果比较是按位置进行的,则:

list_1 = [1, 2, 3]
list_2 = [2, 3, 4]

应该生成:

main_list = [2, 3, 4]

(因为in list_2中的值与in 中的相同索引相匹配list_1),您绝对应该使用Patrick的答案,该答案不涉及临时lists或sets(即使sets大致相同O(1),它们每张支票的“常数”因数也比简单的等式支票高) )并且涉及O(min(n, m))工作,比其他任何答案都要少,并且如果您的问题对位置敏感,则是唯一正确的答案当匹配元素以不匹配的偏移量出现时解决方案。

†:使用列表理解来做与单行代码相同的方法是滥用嵌套循环以在“最外层”循环中创建和缓存值,例如:

main_list = [x for set_1 in (set(list_1),) for x in list_2 if x not in set_1]

这也给Python 3带来了次要的性能优势(因为现在set_1它在理解代码中处于本地范围内,而不是从每次检查的嵌套范围中查找;在Python 2上则没有关系,因为Python 2并未使用闭包列出理解;它们的作用范围与所使用的作用域相同)。

If you want a one-liner solution (ignoring imports) that only requires O(max(n, m)) work for inputs of length n and m, not O(n * m) work, you can do so with the itertools module:

from itertools import filterfalse

main_list = list(filterfalse(set(list_1).__contains__, list_2))

This takes advantage of the functional functions taking a callback function on construction, allowing it to create the callback once and reuse it for every element without needing to store it somewhere (because filterfalse stores it internally); list comprehensions and generator expressions can do this, but it’s ugly.†

That gets the same results in a single line as:

main_list = [x for x in list_2 if x not in list_1]

with the speed of:

set_1 = set(list_1)
main_list = [x for x in list_2 if x not in set_1]

Of course, if the comparisons are intended to be positional, so:

list_1 = [1, 2, 3]
list_2 = [2, 3, 4]

should produce:

main_list = [2, 3, 4]

(because no value in list_2 has a match at the same index in list_1), you should definitely go with Patrick’s answer, which involves no temporary lists or sets (even with sets being roughly O(1), they have a higher “constant” factor per check than simple equality checks) and involves O(min(n, m)) work, less than any other answer, and if your problem is position sensitive, is the only correct solution when matching elements appear at mismatched offsets.

†: The way to do the same thing with a list comprehension as a one-liner would be to abuse nested looping to create and cache value(s) in the “outermost” loop, e.g.:

main_list = [x for set_1 in (set(list_1),) for x in list_2 if x not in set_1]

which also gives a minor performance benefit on Python 3 (because now set_1 is locally scoped in the comprehension code, rather than looked up from nested scope for each check; on Python 2 that doesn’t matter, because Python 2 doesn’t use closures for list comprehensions; they operate in the same scope they’re used in).


回答 5

main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"]

for i in list_2:
    if i not in list_1:
        main_list.append(i)

print(main_list)

输出:

['f', 'm']
main_list=[]
list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"]

for i in list_2:
    if i not in list_1:
        main_list.append(i)

print(main_list)

output:

['f', 'm']

回答 6

我会将zip这些列表放在一起,以逐个元素比较它们。

main_list = [b for a, b in zip(list1, list2) if a!= b]

I would zip the lists together to compare them element by element.

main_list = [b for a, b in zip(list1, list2) if a!= b]

回答 7

我使用了两种方法,发现一种方法比其他方法有用。这是我的答案:

我的输入数据:

crkmod_mpp = ['M13','M18','M19','M24']
testmod_mpp = ['M13','M14','M15','M16','M17','M18','M19','M20','M21','M22','M23','M24']

方法1:np.setdiff1d我喜欢这种方法,因为它保留了位置

test= list(np.setdiff1d(testmod_mpp,crkmod_mpp))
print(test)
['M15', 'M16', 'M22', 'M23', 'M20', 'M14', 'M17', 'M21']

方法2:尽管给出的答案与方法1相同,但扰乱了顺序

test = list(set(testmod_mpp).difference(set(crkmod_mpp)))
print(test)
['POA23', 'POA15', 'POA17', 'POA16', 'POA22', 'POA18', 'POA24', 'POA21']

方法1完全np.setdiff1d符合我的要求。此答案仅供参考。

I used two methods and I found one method useful over other. Here is my answer:

My input data:

crkmod_mpp = ['M13','M18','M19','M24']
testmod_mpp = ['M13','M14','M15','M16','M17','M18','M19','M20','M21','M22','M23','M24']

Method1: np.setdiff1d I like this approach over other because it preserves the position

test= list(np.setdiff1d(testmod_mpp,crkmod_mpp))
print(test)
['M15', 'M16', 'M22', 'M23', 'M20', 'M14', 'M17', 'M21']

Method2: Though it gives same answer as in Method1 but disturbs the order

test = list(set(testmod_mpp).difference(set(crkmod_mpp)))
print(test)
['POA23', 'POA15', 'POA17', 'POA16', 'POA22', 'POA18', 'POA24', 'POA21']

Method1 np.setdiff1d meets my requirements perfectly. This answer for information.


回答 8

如果应该考虑发生的次数,则可能需要使用类似以下内容的方法collections.Counter

list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"] 
from collections import Counter
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1.get(key, 0) != counts]

>>> final
['f', 'm']

如所承诺的,这也可以将不同的出现次数称为“差异”:

list_1=["a", "b", "c", "d", "e", 'a']
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1.get(key, 0) != counts]

>>> final
['a', 'f', 'm']

If the number of occurences should be taken into account you probably need to use something like collections.Counter:

list_1=["a", "b", "c", "d", "e"]
list_2=["a", "f", "c", "m"] 
from collections import Counter
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1.get(key, 0) != counts]

>>> final
['f', 'm']

As promised this can also handle differing number of occurences as “difference”:

list_1=["a", "b", "c", "d", "e", 'a']
cnt1 = Counter(list_1)
cnt2 = Counter(list_2)
final = [key for key, counts in cnt2.items() if cnt1.get(key, 0) != counts]

>>> final
['a', 'f', 'm']

回答 9

从ser1中删除ser2中存在的项目。

输入项

ser1 = pd.Series([1、2、3、4、5])ser2 = pd.Series([4、5、6、7、8])

ser1 [〜ser1.isin(ser2)]

From ser1 remove items present in ser2.

Input

ser1 = pd.Series([1, 2, 3, 4, 5]) ser2 = pd.Series([4, 5, 6, 7, 8])

Solution

ser1[~ser1.isin(ser2)]