标签归档:list

改组对象列表

问题:改组对象列表

我有一个对象列表,我想对其进行洗牌。我以为可以使用该random.shuffle方法,但是当列表中包含对象时,这似乎失败了。是否有一种用于改组对象的方法或解决此问题的另一种方法?

import random

class A:
    foo = "bar"

a1 = a()
a2 = a()
b = [a1, a2]

print(random.shuffle(b))

这将失败。

I have a list of objects and I want to shuffle them. I thought I could use the random.shuffle method, but this seems to fail when the list is of objects. Is there a method for shuffling objects or another way around this?

import random

class A:
    foo = "bar"

a1 = a()
a2 = a()
b = [a1, a2]

print(random.shuffle(b))

This will fail.


回答 0

random.shuffle应该管用。这是一个示例,其中对象是列表:

from random import shuffle
x = [[i] for i in range(10)]
shuffle(x)

# print(x)  gives  [[9], [2], [7], [0], [4], [5], [3], [1], [8], [6]]
# of course your results will vary

请注意,随机播放在适当的地方起作用,并返回None。

random.shuffle should work. Here’s an example, where the objects are lists:

from random import shuffle
x = [[i] for i in range(10)]
shuffle(x)

# print(x)  gives  [[9], [2], [7], [0], [4], [5], [3], [1], [8], [6]]
# of course your results will vary

Note that shuffle works in place, and returns None.


回答 1

当您了解到就地改组就是问题所在。我也经常遇到问题,而且似乎也常常忘记如何复制列表。使用sample(a, len(a))是解决方案,使用len(a)作为样本量。有关Python文档,请参见https://docs.python.org/3.6/library/random.html#random.sample

这是使用的简单版本random.sample(),它将经过改组的结果作为新列表返回。

import random

a = range(5)
b = random.sample(a, len(a))
print a, b, "two list same:", a == b
# print: [0, 1, 2, 3, 4] [2, 1, 3, 4, 0] two list same: False

# The function sample allows no duplicates.
# Result can be smaller but not larger than the input.
a = range(555)
b = random.sample(a, len(a))
print "no duplicates:", a == list(set(b))

try:
    random.sample(a, len(a) + 1)
except ValueError as e:
    print "Nope!", e

# print: no duplicates: True
# print: Nope! sample larger than population

As you learned the in-place shuffling was the problem. I also have problem frequently, and often seem to forget how to copy a list, too. Using sample(a, len(a)) is the solution, using len(a) as the sample size. See https://docs.python.org/3.6/library/random.html#random.sample for the Python documentation.

Here’s a simple version using random.sample() that returns the shuffled result as a new list.

import random

a = range(5)
b = random.sample(a, len(a))
print a, b, "two list same:", a == b
# print: [0, 1, 2, 3, 4] [2, 1, 3, 4, 0] two list same: False

# The function sample allows no duplicates.
# Result can be smaller but not larger than the input.
a = range(555)
b = random.sample(a, len(a))
print "no duplicates:", a == list(set(b))

try:
    random.sample(a, len(a) + 1)
except ValueError as e:
    print "Nope!", e

# print: no duplicates: True
# print: Nope! sample larger than population

回答 2

我也花了一些时间来做到这一点。但是洗牌的文档非常清楚:

列表随机排列x ; 不返回。

所以你不应该print(random.shuffle(b))。相反random.shuffle(b),然后print(b)

It took me some time to get that too. But the documentation for shuffle is very clear:

shuffle list x in place; return None.

So you shouldn’t print(random.shuffle(b)). Instead do random.shuffle(b) and then print(b).


回答 3

#!/usr/bin/python3

import random

s=list(range(5))
random.shuffle(s) # << shuffle before print or assignment
print(s)

# print: [2, 4, 1, 3, 0]
#!/usr/bin/python3

import random

s=list(range(5))
random.shuffle(s) # << shuffle before print or assignment
print(s)

# print: [2, 4, 1, 3, 0]

回答 4

如果您碰巧已经使用numpy(在科学和金融应用中非常流行),则可以节省导入时间。

import numpy as np    
np.random.shuffle(b)
print(b)

http://docs.scipy.org/doc/numpy/reference/generation/numpy.random.shuffle.html

If you happen to be using numpy already (very popular for scientific and financial applications) you can save yourself an import.

import numpy as np    
np.random.shuffle(b)
print(b)

http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.shuffle.html


回答 5

>>> import random
>>> a = ['hi','world','cat','dog']
>>> random.shuffle(a,random.random)
>>> a
['hi', 'cat', 'dog', 'world']

这对我来说可以。确保设置随机方法。

>>> import random
>>> a = ['hi','world','cat','dog']
>>> random.shuffle(a,random.random)
>>> a
['hi', 'cat', 'dog', 'world']

It works fine for me. Make sure to set the random method.


回答 6

如果您有多个列表,则可能要先定义排列(随机排列列表/重新排列列表中项目的方式),然后将其应用于所有列表:

import random

perm = list(range(len(list_one)))
random.shuffle(perm)
list_one = [list_one[index] for index in perm]
list_two = [list_two[index] for index in perm]

脾气暴躁

如果您的列表是numpy数组,则更为简单:

import numpy as np

perm = np.random.permutation(len(list_one))
list_one = list_one[perm]
list_two = list_two[perm]

处理器

我创建了mpu具有以下consistent_shuffle功能的小型实用程序包:

import mpu

# Necessary if you want consistent results
import random
random.seed(8)

# Define example lists
list_one = [1,2,3]
list_two = ['a', 'b', 'c']

# Call the function
list_one, list_two = mpu.consistent_shuffle(list_one, list_two)

请注意,它mpu.consistent_shuffle接受任意数量的参数。因此,您也可以使用它洗牌三个或更多列表。

If you have multiple lists, you might want to define the permutation (the way you shuffle the list / rearrange the items in the list) first and then apply it to all lists:

import random

perm = list(range(len(list_one)))
random.shuffle(perm)
list_one = [list_one[index] for index in perm]
list_two = [list_two[index] for index in perm]

Numpy / Scipy

If your lists are numpy arrays, it is simpler:

import numpy as np

perm = np.random.permutation(len(list_one))
list_one = list_one[perm]
list_two = list_two[perm]

mpu

I’ve created the small utility package mpu which has the consistent_shuffle function:

import mpu

# Necessary if you want consistent results
import random
random.seed(8)

# Define example lists
list_one = [1,2,3]
list_two = ['a', 'b', 'c']

# Call the function
list_one, list_two = mpu.consistent_shuffle(list_one, list_two)

Note that mpu.consistent_shuffle takes an arbitrary number of arguments. So you can also shuffle three or more lists with it.


回答 7

from random import random
my_list = range(10)
shuffled_list = sorted(my_list, key=lambda x: random())

对于要交换订购功能的某些应用程序,此替代方法可能很有用。

from random import random
my_list = range(10)
shuffled_list = sorted(my_list, key=lambda x: random())

This alternative may be useful for some applications where you want to swap the ordering function.


回答 8

在某些情况下,使用numpy数组时,请random.shuffle在数组中使用创建的重复数据。

另一种方法是使用numpy.random.shuffle。如果您已经在使用numpy,那么这是优于generic的首选方法random.shuffle

numpy.random.shuffle

>>> import numpy as np
>>> import random

使用random.shuffle

>>> foo = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> foo

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])


>>> random.shuffle(foo)
>>> foo

array([[1, 2, 3],
       [1, 2, 3],
       [4, 5, 6]])

使用numpy.random.shuffle

>>> foo = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> foo

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])


>>> np.random.shuffle(foo)
>>> foo

array([[1, 2, 3],
       [7, 8, 9],
       [4, 5, 6]])

In some cases when using numpy arrays, using random.shuffle created duplicate data in the array.

An alternative is to use numpy.random.shuffle. If you’re working with numpy already, this is the preferred method over the generic random.shuffle.

numpy.random.shuffle

Example

>>> import numpy as np
>>> import random

Using random.shuffle:

>>> foo = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> foo

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])


>>> random.shuffle(foo)
>>> foo

array([[1, 2, 3],
       [1, 2, 3],
       [4, 5, 6]])

Using numpy.random.shuffle:

>>> foo = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> foo

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])


>>> np.random.shuffle(foo)
>>> foo

array([[1, 2, 3],
       [7, 8, 9],
       [4, 5, 6]])

回答 9

对于单行代码,请使用random.sample(list_to_be_shuffled, length_of_the_list)示例:

import random
random.sample(list(range(10)), 10)

输出:[2、9、7、8、3、0、4、1、6、5]

For one-liners, userandom.sample(list_to_be_shuffled, length_of_the_list) with an example:

import random
random.sample(list(range(10)), 10)

outputs: [2, 9, 7, 8, 3, 0, 4, 1, 6, 5]


回答 10

当使用’foo’调用时,’print func(foo)’将输出’func’的返回值。但是,’shuffle’的返回类型为None,因为该列表将被修改,因此不打印任何内容。解决方法:

# shuffle the list in place 
random.shuffle(b)

# print it
print(b)

如果您更喜欢函数式编程风格,则可能需要创建以下包装函数:

def myshuffle(ls):
    random.shuffle(ls)
    return ls

‘print func(foo)’ will print the return value of ‘func’ when called with ‘foo’. ‘shuffle’ however has None as its return type, as the list will be modified in place, hence it prints nothing. Workaround:

# shuffle the list in place 
random.shuffle(b)

# print it
print(b)

If you’re more into functional programming style you might want to make the following wrapper function:

def myshuffle(ls):
    random.shuffle(ls)
    return ls

回答 11

可以定义一个函数shuffledsort与vs 相同sorted

def shuffled(x):
    import random
    y = x[:]
    random.shuffle(y)
    return y

x = shuffled([1, 2, 3, 4])
print x

One can define a function called shuffled (in the same sense of sort vs sorted)

def shuffled(x):
    import random
    y = x[:]
    random.shuffle(y)
    return y

x = shuffled([1, 2, 3, 4])
print x

回答 12

import random

class a:
    foo = "bar"

a1 = a()
a2 = a()
a3 = a()
a4 = a()
b = [a1,a2,a3,a4]

random.shuffle(b)
print(b)

shuffle 到位,因此不要打印结果None,而是列表。

import random

class a:
    foo = "bar"

a1 = a()
a2 = a()
a3 = a()
a4 = a()
b = [a1,a2,a3,a4]

random.shuffle(b)
print(b)

shuffle is in place, so do not print result, which is None, but the list.


回答 13

您可以这样做:

>>> A = ['r','a','n','d','o','m']
>>> B = [1,2,3,4,5,6]
>>> import random
>>> random.sample(A+B, len(A+B))
[3, 'r', 4, 'n', 6, 5, 'm', 2, 1, 'a', 'o', 'd']

如果要返回到两个列表,则可以将此长列表分成两部分。

You can go for this:

>>> A = ['r','a','n','d','o','m']
>>> B = [1,2,3,4,5,6]
>>> import random
>>> random.sample(A+B, len(A+B))
[3, 'r', 4, 'n', 6, 5, 'm', 2, 1, 'a', 'o', 'd']

if you want to go back to two lists, you then split this long list into two.


回答 14

您可以构建一个将列表作为参数并返回列表的随机版本的函数:

from random import *

def listshuffler(inputlist):
    for i in range(len(inputlist)):
        swap = randint(0,len(inputlist)-1)
        temp = inputlist[swap]
        inputlist[swap] = inputlist[i]
        inputlist[i] = temp
    return inputlist

you could build a function that takes a list as a parameter and returns a shuffled version of the list:

from random import *

def listshuffler(inputlist):
    for i in range(len(inputlist)):
        swap = randint(0,len(inputlist)-1)
        temp = inputlist[swap]
        inputlist[swap] = inputlist[i]
        inputlist[i] = temp
    return inputlist

回答 15

""" to shuffle random, set random= True """

def shuffle(x,random=False):
     shuffled = []
     ma = x
     if random == True:
         rando = [ma[i] for i in np.random.randint(0,len(ma),len(ma))]
         return rando
     if random == False:
          for i in range(len(ma)):
          ave = len(ma)//3
          if i < ave:
             shuffled.append(ma[i+ave])
          else:
             shuffled.append(ma[i-ave])    
     return shuffled
""" to shuffle random, set random= True """

def shuffle(x,random=False):
     shuffled = []
     ma = x
     if random == True:
         rando = [ma[i] for i in np.random.randint(0,len(ma),len(ma))]
         return rando
     if random == False:
          for i in range(len(ma)):
          ave = len(ma)//3
          if i < ave:
             shuffled.append(ma[i+ave])
          else:
             shuffled.append(ma[i-ave])    
     return shuffled

回答 16

您可以使用随机播放或采样。两者均来自随机模块。

import random
def shuffle(arr1):
    n=len(arr1)
    b=random.sample(arr1,n)
    return b

要么

import random
def shuffle(arr1):
    random.shuffle(arr1)
    return arr1

you can either use shuffle or sample . both of which come from random module.

import random
def shuffle(arr1):
    n=len(arr1)
    b=random.sample(arr1,n)
    return b

OR

import random
def shuffle(arr1):
    random.shuffle(arr1)
    return arr1

回答 17

确保您没有命名源文件random.py,并且工作目录中没有名为random.pyc ..的文件,这可能会导致程序尝试导入本地random.py文件而不是pythons random模块。

Make sure you are not naming your source file random.py, and that there is not a file in your working directory called random.pyc.. either could cause your program to try and import your local random.py file instead of pythons random module.


回答 18

def shuffle(_list):
    if not _list == []:
        import random
        list2 = []
        while _list != []:
            card = random.choice(_list)
            _list.remove(card)
            list2.append(card)
        while list2 != []:
            card1 = list2[0]
            list2.remove(card1)
            _list.append(card1)
        return _list
def shuffle(_list):
    if not _list == []:
        import random
        list2 = []
        while _list != []:
            card = random.choice(_list)
            _list.remove(card)
            list2.append(card)
        while list2 != []:
            card1 = list2[0]
            list2.remove(card1)
            _list.append(card1)
        return _list

回答 19

import random
class a:
    foo = "bar"

a1 = a()
a2 = a()
b = [a1.foo,a2.foo]
random.shuffle(b)
import random
class a:
    foo = "bar"

a1 = a()
a2 = a()
b = [a1.foo,a2.foo]
random.shuffle(b)

回答 20

改组过程是“有替换的”,因此每个项目的出现可能会改变!至少当列表中的项目也同时列出时。

例如,

ml = [[0], [1]] * 10

后,

random.shuffle(ml)

[0]的数目可以是9或8,但不完全是10。

The shuffling process is “with replacement”, so the occurrence of each item may change! At least when when items in your list is also list.

E.g.,

ml = [[0], [1]] * 10

After,

random.shuffle(ml)

The number of [0] may be 9 or 8, but not exactly 10.


回答 21

计划:无需依赖库就可以完成改组工作。示例:从元素0的开头开始浏览列表;找到一个新的随机位置,例如6,将0的值放在6中,将6的值放在0中。移到元素1并重复此过程,以此类推。

import random
iteration = random.randint(2, 100)
temp_var = 0
while iteration > 0:

    for i in range(1, len(my_list)): # have to use range with len()
        for j in range(1, len(my_list) - i):
            # Using temp_var as my place holder so I don't lose values
            temp_var = my_list[i]
            my_list[i] = my_list[j]
            my_list[j] = temp_var

        iteration -= 1

Plan: Write out the shuffle without relying on a library to do the heavy lifting. Example: Go through the list from the beginning starting with element 0; find a new random position for it, say 6, put 0’s value in 6 and 6’s value in 0. Move on to element 1 and repeat this process, and so on through the rest of the list

import random
iteration = random.randint(2, 100)
temp_var = 0
while iteration > 0:

    for i in range(1, len(my_list)): # have to use range with len()
        for j in range(1, len(my_list) - i):
            # Using temp_var as my place holder so I don't lose values
            temp_var = my_list[i]
            my_list[i] = my_list[j]
            my_list[j] = temp_var

        iteration -= 1

回答 22

它工作正常。我在这里尝试使用功能作为列表对象:

    from random import shuffle

    def foo1():
        print "foo1",

    def foo2():
        print "foo2",

    def foo3():
        print "foo3",

    A=[foo1,foo2,foo3]

    for x in A:
        x()

    print "\r"

    shuffle(A)
    for y in A:
        y()

它打印出来:foo1 foo2 foo3 foo2 foo3 foo1(最后一行中的foos具有随机​​顺序)

It works fine. I am trying it here with functions as list objects:

    from random import shuffle

    def foo1():
        print "foo1",

    def foo2():
        print "foo2",

    def foo3():
        print "foo3",

    A=[foo1,foo2,foo3]

    for x in A:
        x()

    print "\r"

    shuffle(A)
    for y in A:
        y()

It prints out: foo1 foo2 foo3 foo2 foo3 foo1 (the foos in the last row have a random order)


如何在保留订单的同时从列表中删除重复项?

问题:如何在保留订单的同时从列表中删除重复项?

是否有内置的程序在保留顺序的同时从Python列表中删除重复项?我知道我可以使用集合来删除重复项,但这会破坏原始顺序。我也知道我可以这样滚动自己:

def uniq(input):
  output = []
  for x in input:
    if x not in output:
      output.append(x)
  return output

(感谢您放松代码示例。)

但是如果可能的话,我想利用一个内置的或更Pythonic的习惯用法。

相关问题:在Python中,从列表中删除重复项以使所有元素在保持顺序唯一的同时最快的算法是什么?

Is there a built-in that removes duplicates from list in Python, whilst preserving order? I know that I can use a set to remove duplicates, but that destroys the original order. I also know that I can roll my own like this:

def uniq(input):
  output = []
  for x in input:
    if x not in output:
      output.append(x)
  return output

(Thanks to unwind for that code sample.)

But I’d like to avail myself of a built-in or a more Pythonic idiom if possible.

Related question: In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?


回答 0

在这里,您有一些选择:http : //www.peterbe.com/plog/uniqifiers-benchmark

最快的:

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

为什么要分配seen.addseen_add而不是仅打电话给seen.add?Python是一种动态语言,与解决seen.add局部变量相比,解决每次迭代的成本更高。seen.add可能在两次迭代之间发生了变化,并且运行时不够智能,无法排除这种情况。为了安全起见,它必须每次检查对象。

如果您打算在同一数据集上大量使用此功能,则最好使用有序集:http : //code.activestate.com/recipes/528878/

O(1)每个操作的插入,删除和成员检查。

(小的附加说明:seen.add()始终返回None,因此or以上内容仅是尝试进行集合更新的一种方法,而不是逻辑测试的组成部分。)

Here you have some alternatives: http://www.peterbe.com/plog/uniqifiers-benchmark

Fastest one:

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

Why assign seen.add to seen_add instead of just calling seen.add? Python is a dynamic language, and resolving seen.add each iteration is more costly than resolving a local variable. seen.add could have changed between iterations, and the runtime isn’t smart enough to rule that out. To play it safe, it has to check the object each time.

If you plan on using this function a lot on the same dataset, perhaps you would be better off with an ordered set: http://code.activestate.com/recipes/528878/

O(1) insertion, deletion and member-check per operation.

(Small additional note: seen.add() always returns None, so the or above is there only as a way to attempt a set update, and not as an integral part of the logical test.)


回答 1

编辑2016

正如Raymond所指出的那样,在OrderedDictC语言中实现的python 3.5+中,列表理解方法将比OrderedDict(除非您实际上需要列表的末尾-甚至在输入非常短的情况下)慢一些。因此,针对3.5+的最佳解决方案是OrderedDict

重要编辑2015

@abarnert所述,more_itertools库(pip install more_itertools)包含一个unique_everseen为解决此问题而构建的函数,列表理解中没有任何不可读的not seen.add突变。这也是最快的解决方案:

>>> from  more_itertools import unique_everseen
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(unique_everseen(items))
[1, 2, 0, 3]

只需导入一个简单的库,就不会有黑客入侵。这来自itertools配方的实现,unique_everseen如下所示:

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

在Python中2.7+这种用法公认的通用用法(它可以工作,但没有针对速度进行优化,我现在将使用unique_everseencollections.OrderedDict

运行时间:O(N)

>>> from collections import OrderedDict
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(OrderedDict.fromkeys(items))
[1, 2, 0, 3]

看起来比:

seen = set()
[x for x in seq if x not in seen and not seen.add(x)]

并且没有利用丑陋的hack

not seen.add(x)

这取决于set.add始终返回的就地方法的事实,None因此not None求值为True

但是请注意,尽管破解解决方案具有相同的运行时复杂度O(N),但其原始速度更快。

Edit 2016

As Raymond pointed out, in python 3.5+ where OrderedDict is implemented in C, the list comprehension approach will be slower than OrderedDict (unless you actually need the list at the end – and even then, only if the input is very short). So the best solution for 3.5+ is OrderedDict.

Important Edit 2015

As @abarnert notes, the more_itertools library (pip install more_itertools) contains a unique_everseen function that is built to solve this problem without any unreadable (not seen.add) mutations in list comprehensions. This is also the fastest solution too:

>>> from  more_itertools import unique_everseen
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(unique_everseen(items))
[1, 2, 0, 3]

Just one simple library import and no hacks. This comes from an implementation of the itertools recipe unique_everseen which looks like:

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

In Python 2.7+ the accepted common idiom (which works but isn’t optimized for speed, I would now use unique_everseen) for this uses collections.OrderedDict:

Runtime: O(N)

>>> from collections import OrderedDict
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(OrderedDict.fromkeys(items))
[1, 2, 0, 3]

This looks much nicer than:

seen = set()
[x for x in seq if x not in seen and not seen.add(x)]

and doesn’t utilize the ugly hack:

not seen.add(x)

which relies on the fact that set.add is an in-place method that always returns None so not None evaluates to True.

Note however that the hack solution is faster in raw speed though it has the same runtime complexity O(N).


回答 2

在Python 2.7中,从迭代器中删除重复项并同时保持其原始顺序的新方法是:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

在Python 3.5中,OrderedDict具有C实现。我的时间表明,这是Python 3.5各种方法中最快也是最短的。

在Python 3.6中,常规字典变得有序且紧凑。(此功能适用于CPython和PyPy,但在其他实现中可能不存在)。这为我们提供了一种在保留订单的同时进行重复数据删除的最快方法:

>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

在Python 3.7中,保证常规dict在所有实现中都排序。 因此,最短,最快的解决方案是:

>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

对@max的响应:一旦移至3.6或3.7并使用常规dict而不是OrderedDict,就无法真正以其他任何方式击败性能。该词典非常密集,几乎可以无开销地转换为列表。目标列表的大小预先设置为len(d),这样可以保存列表推导中发生的所有调整大小。同样,由于内部键列表很密集,因此指针的复制几乎快如列表副本。

In Python 2.7, the new way of removing duplicates from an iterable while keeping it in the original order is:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

In Python 3.5, the OrderedDict has a C implementation. My timings show that this is now both the fastest and shortest of the various approaches for Python 3.5.

In Python 3.6, the regular dict became both ordered and compact. (This feature is holds for CPython and PyPy but may not present in other implementations). That gives us a new fastest way of deduping while retaining order:

>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

In Python 3.7, the regular dict is guaranteed to both ordered across all implementations. So, the shortest and fastest solution is:

>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

Response to @max: Once you move to 3.6 or 3.7 and use the regular dict instead of OrderedDict, you can’t really beat the performance in any other way. The dictionary is dense and readily converts to a list with almost no overhead. The target list is pre-sized to len(d) which saves all the resizes that occur in a list comprehension. Also, since the internal key list is dense, copying the pointers is about almost fast as a list copy.


回答 3

sequence = ['1', '2', '3', '3', '6', '4', '5', '6']
unique = []
[unique.append(item) for item in sequence if item not in unique]

独特→ ['1', '2', '3', '6', '4', '5']

sequence = ['1', '2', '3', '3', '6', '4', '5', '6']
unique = []
[unique.append(item) for item in sequence if item not in unique]

unique → ['1', '2', '3', '6', '4', '5']


回答 4

不要踢死马(这个问题很老了,已经有很多好的答案了),但是这里有一种使用熊猫的解决方案,在很多情况下都非常快,而且使用起来很简单。

import pandas as pd

my_list = [0, 1, 2, 3, 4, 1, 2, 3, 5]

>>> pd.Series(my_list).drop_duplicates().tolist()
# Output:
# [0, 1, 2, 3, 4, 5]

Not to kick a dead horse (this question is very old and already has lots of good answers), but here is a solution using pandas that is quite fast in many circumstances and is dead simple to use.

import pandas as pd

my_list = [0, 1, 2, 3, 4, 1, 2, 3, 5]

>>> pd.Series(my_list).drop_duplicates().tolist()
# Output:
# [0, 1, 2, 3, 4, 5]

回答 5

from itertools import groupby
[ key for key,_ in groupby(sortedList)]

该列表甚至不必排序,充分的条件是将相等的值分组在一起。

编辑:我认为“保留顺序”意味着该列表实际上是有序的。如果不是这种情况,那么MizardX的解决方案就是正确的解决方案。

社区编辑:但是,这是“将重复的连续元素压缩为单个元素”的最优雅的方法。

from itertools import groupby
[ key for key,_ in groupby(sortedList)]

The list doesn’t even have to be sorted, the sufficient condition is that equal values are grouped together.

Edit: I assumed that “preserving order” implies that the list is actually ordered. If this is not the case, then the solution from MizardX is the right one.

Community edit: This is however the most elegant way to “compress duplicate consecutive elements into a single element”.


回答 6

我想如果您想维持订单,

您可以尝试以下方法:

list1 = ['b','c','d','b','c','a','a']    
list2 = list(set(list1))    
list2.sort(key=list1.index)    
print list2

或者类似地,您可以执行以下操作:

list1 = ['b','c','d','b','c','a','a']  
list2 = sorted(set(list1),key=list1.index)  
print list2 

您也可以这样做:

list1 = ['b','c','d','b','c','a','a']    
list2 = []    
for i in list1:    
    if not i in list2:  
        list2.append(i)`    
print list2

也可以这样写:

list1 = ['b','c','d','b','c','a','a']    
list2 = []    
[list2.append(i) for i in list1 if not i in list2]    
print list2 

I think if you wanna maintain the order,

you can try this:

list1 = ['b','c','d','b','c','a','a']    
list2 = list(set(list1))    
list2.sort(key=list1.index)    
print list2

OR similarly you can do this:

list1 = ['b','c','d','b','c','a','a']  
list2 = sorted(set(list1),key=list1.index)  
print list2 

You can also do this:

list1 = ['b','c','d','b','c','a','a']    
list2 = []    
for i in list1:    
    if not i in list2:  
        list2.append(i)`    
print list2

It can also be written as this:

list1 = ['b','c','d','b','c','a','a']    
list2 = []    
[list2.append(i) for i in list1 if not i in list2]    
print list2 

回答 7

Python 3.7及更高版本中,保证字典记住其键插入顺序。这个问题的答案总结了当前的状况。

OrderedDict因此,该解决方案变得过时了,没有任何导入语句,我们可以简单地发出:

>>> lst = [1, 2, 1, 3, 3, 2, 4]
>>> list(dict.fromkeys(lst))
[1, 2, 3, 4]

In Python 3.7 and above, dictionaries are guaranteed to remember their key insertion order. The answer to this question summarizes the current state of affairs.

The OrderedDict solution thus becomes obsolete and without any import statements we can simply issue:

>>> lst = [1, 2, 1, 3, 3, 2, 4]
>>> list(dict.fromkeys(lst))
[1, 2, 3, 4]

回答 8

对于另一个非常老的问题的另一个非常晚的答案:

itertools食谱有做到这一点,利用函数seen集技术,但是:

  • 处理标准key功能。
  • 不使用不雅观的骇客。
  • 通过预绑定seen.add而不是查找N次来优化循环。(f7也可以这样做,但某些版本则不能。)
  • 通过使用来优化循环ifilterfalse,因此您只需要遍历Python中的唯一元素,而不是所有元素。(ifilterfalse当然,您仍然可以在内部遍历所有这些对象,但这是在C中,而且速度更快。)

实际上比它快f7吗?这取决于您的数据,因此您必须对其进行测试并查看。如果您最后想要一个列表,请f7使用listcomp,在这里没有办法做到这一点。(您可以直接append代替yielding,也可以将生成器输入到list函数中,但没有一个可以比listcomp内的LIST_APPEND快。)无论如何,通常挤出几微秒的时间不会像之所以重要,是因为它具有易于理解,可重用,已编写的功能,在您要装饰时不需要DSU。

与所有食谱一样,它也可以在中找到more-iterools

如果只希望没有key条件,可以将其简化为:

def unique(iterable):
    seen = set()
    seen_add = seen.add
    for element in itertools.ifilterfalse(seen.__contains__, iterable):
        seen_add(element)
        yield element

For another very late answer to another very old question:

The itertools recipes have a function that does this, using the seen set technique, but:

  • Handles a standard key function.
  • Uses no unseemly hacks.
  • Optimizes the loop by pre-binding seen.add instead of looking it up N times. (f7 also does this, but some versions don’t.)
  • Optimizes the loop by using ifilterfalse, so you only have to loop over the unique elements in Python, instead of all of them. (You still iterate over all of them inside ifilterfalse, of course, but that’s in C, and much faster.)

Is it actually faster than f7? It depends on your data, so you’ll have to test it and see. If you want a list in the end, f7 uses a listcomp, and there’s no way to do that here. (You can directly append instead of yielding, or you can feed the generator into the list function, but neither one can be as fast as the LIST_APPEND inside a listcomp.) At any rate, usually, squeezing out a few microseconds is not going to be as important as having an easily-understandable, reusable, already-written function that doesn’t require DSU when you want to decorate.

As with all of the recipes, it’s also available in more-iterools.

If you just want the no-key case, you can simplify it as:

def unique(iterable):
    seen = set()
    seen_add = seen.add
    for element in itertools.ifilterfalse(seen.__contains__, iterable):
        seen_add(element)
        yield element

回答 9

刚添加另一个(非常高性能的)实现的这样的功能从外部模块1iteration_utilities.unique_everseen

>>> from iteration_utilities import unique_everseen
>>> lst = [1,1,1,2,3,2,2,2,1,3,4]

>>> list(unique_everseen(lst))
[1, 2, 3, 4]

时机

我做了一些计时(Python的3.6),这些表明,它比我测试的所有其他办法,包括更快OrderedDict.fromkeysf7并且more_itertools.unique_everseen

%matplotlib notebook

from iteration_utilities import unique_everseen
from collections import OrderedDict
from more_itertools import unique_everseen as mi_unique_everseen

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

def iteration_utilities_unique_everseen(seq):
    return list(unique_everseen(seq))

def more_itertools_unique_everseen(seq):
    return list(mi_unique_everseen(seq))

def odict(seq):
    return list(OrderedDict.fromkeys(seq))

from simple_benchmark import benchmark

b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
              {2**i: list(range(2**i)) for i in range(1, 20)},
              'list size (no duplicates)')
b.plot()

在此处输入图片说明

并确保我也进行了更多重复的测试,以检查是否有所不同:

import random

b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
              {2**i: [random.randint(0, 2**(i-1)) for _ in range(2**i)] for i in range(1, 20)},
              'list size (lots of duplicates)')
b.plot()

在此处输入图片说明

还有一个仅包含一个值:

b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
              {2**i: [1]*(2**i) for i in range(1, 20)},
              'list size (only duplicates)')
b.plot()

在此处输入图片说明

在所有这些情况下,该iteration_utilities.unique_everseen功能都是最快的(在我的计算机上)。


iteration_utilities.unique_everseen函数还可以处理输入中不可散列的值(但是使用O(n*n)性能而不是O(n)可散列值时的性能)。

>>> lst = [{1}, {1}, {2}, {1}, {3}]

>>> list(unique_everseen(lst))
[{1}, {2}, {3}]

1免责声明:我是该软件包的作者。

Just to add another (very performant) implementation of such a functionality from an external module1: iteration_utilities.unique_everseen:

>>> from iteration_utilities import unique_everseen
>>> lst = [1,1,1,2,3,2,2,2,1,3,4]

>>> list(unique_everseen(lst))
[1, 2, 3, 4]

Timings

I did some timings (Python 3.6) and these show that it’s faster than all other alternatives I tested, including OrderedDict.fromkeys, f7 and more_itertools.unique_everseen:

%matplotlib notebook

from iteration_utilities import unique_everseen
from collections import OrderedDict
from more_itertools import unique_everseen as mi_unique_everseen

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

def iteration_utilities_unique_everseen(seq):
    return list(unique_everseen(seq))

def more_itertools_unique_everseen(seq):
    return list(mi_unique_everseen(seq))

def odict(seq):
    return list(OrderedDict.fromkeys(seq))

from simple_benchmark import benchmark

b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
              {2**i: list(range(2**i)) for i in range(1, 20)},
              'list size (no duplicates)')
b.plot()

enter image description here

And just to make sure I also did a test with more duplicates just to check if it makes a difference:

import random

b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
              {2**i: [random.randint(0, 2**(i-1)) for _ in range(2**i)] for i in range(1, 20)},
              'list size (lots of duplicates)')
b.plot()

enter image description here

And one containing only one value:

b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
              {2**i: [1]*(2**i) for i in range(1, 20)},
              'list size (only duplicates)')
b.plot()

enter image description here

In all of these cases the iteration_utilities.unique_everseen function is the fastest (on my computer).


This iteration_utilities.unique_everseen function can also handle unhashable values in the input (however with an O(n*n) performance instead of the O(n) performance when the values are hashable).

>>> lst = [{1}, {1}, {2}, {1}, {3}]

>>> list(unique_everseen(lst))
[{1}, {2}, {3}]

1 Disclaimer: I’m the author of that package.


回答 10

对于没有可散列的类型(例如列表列表),基于MizardX:

def f7_noHash(seq)
    seen = set()
    return [ x for x in seq if str( x ) not in seen and not seen.add( str( x ) )]

For no hashable types (e.g. list of lists), based on MizardX’s:

def f7_noHash(seq)
    seen = set()
    return [ x for x in seq if str( x ) not in seen and not seen.add( str( x ) )]

回答 11

借用在定义Haskell nub函数的列表时使用的递归思想,这将是一种递归方法:

def unique(lst):
    return [] if lst==[] else [lst[0]] + unique(filter(lambda x: x!= lst[0], lst[1:]))

例如:

In [118]: unique([1,5,1,1,4,3,4])
Out[118]: [1, 5, 4, 3]

我尝试使用它来增加数据量,并看到了次线性时间复杂度(不确定,但建议对于正常数据应该很好)。

In [122]: %timeit unique(np.random.randint(5, size=(1)))
10000 loops, best of 3: 25.3 us per loop

In [123]: %timeit unique(np.random.randint(5, size=(10)))
10000 loops, best of 3: 42.9 us per loop

In [124]: %timeit unique(np.random.randint(5, size=(100)))
10000 loops, best of 3: 132 us per loop

In [125]: %timeit unique(np.random.randint(5, size=(1000)))
1000 loops, best of 3: 1.05 ms per loop

In [126]: %timeit unique(np.random.randint(5, size=(10000)))
100 loops, best of 3: 11 ms per loop

我也认为很有趣的是,其他操作可以很容易地将其推广到唯一性。像这样:

import operator
def unique(lst, cmp_op=operator.ne):
    return [] if lst==[] else [lst[0]] + unique(filter(lambda x: cmp_op(x, lst[0]), lst[1:]), cmp_op)

例如,您可以传入一个函数,该函数使用舍入的概念将其舍入为相同的整数,就好像它是“相等”是出于唯一性目的一样,如下所示:

def test_round(x,y):
    return round(x) != round(y)

然后unique(some_list,test_round)将提供列表的唯一元素,其中唯一性不再意味着传统的相等性(这是通过使用任何基于集合或基于dict-key的方法来解决的),而是意味着对于每个可能舍入到的整数K,只有第一个舍入到K的元素,例如:

In [6]: unique([1.2, 5, 1.9, 1.1, 4.2, 3, 4.8], test_round)
Out[6]: [1.2, 5, 1.9, 4.2, 3]

Borrowing the recursive idea used in definining Haskell’s nub function for lists, this would be a recursive approach:

def unique(lst):
    return [] if lst==[] else [lst[0]] + unique(filter(lambda x: x!= lst[0], lst[1:]))

e.g.:

In [118]: unique([1,5,1,1,4,3,4])
Out[118]: [1, 5, 4, 3]

I tried it for growing data sizes and saw sub-linear time-complexity (not definitive, but suggests this should be fine for normal data).

In [122]: %timeit unique(np.random.randint(5, size=(1)))
10000 loops, best of 3: 25.3 us per loop

In [123]: %timeit unique(np.random.randint(5, size=(10)))
10000 loops, best of 3: 42.9 us per loop

In [124]: %timeit unique(np.random.randint(5, size=(100)))
10000 loops, best of 3: 132 us per loop

In [125]: %timeit unique(np.random.randint(5, size=(1000)))
1000 loops, best of 3: 1.05 ms per loop

In [126]: %timeit unique(np.random.randint(5, size=(10000)))
100 loops, best of 3: 11 ms per loop

I also think it’s interesting that this could be readily generalized to uniqueness by other operations. Like this:

import operator
def unique(lst, cmp_op=operator.ne):
    return [] if lst==[] else [lst[0]] + unique(filter(lambda x: cmp_op(x, lst[0]), lst[1:]), cmp_op)

For example, you could pass in a function that uses the notion of rounding to the same integer as if it was “equality” for uniqueness purposes, like this:

def test_round(x,y):
    return round(x) != round(y)

then unique(some_list, test_round) would provide the unique elements of the list where uniqueness no longer meant traditional equality (which is implied by using any sort of set-based or dict-key-based approach to this problem) but instead meant to take only the first element that rounds to K for each possible integer K that the elements might round to, e.g.:

In [6]: unique([1.2, 5, 1.9, 1.1, 4.2, 3, 4.8], test_round)
Out[6]: [1.2, 5, 1.9, 4.2, 3]

回答 12

5倍速减少变体但更复杂

>>> l = [5, 6, 6, 1, 1, 2, 2, 3, 4]
>>> reduce(lambda r, v: v in r[1] and r or (r[0].append(v) or r[1].add(v)) or r, l, ([], set()))[0]
[5, 6, 1, 2, 3, 4]

说明:

default = (list(), set())
# use list to keep order
# use set to make lookup faster

def reducer(result, item):
    if item not in result[1]:
        result[0].append(item)
        result[1].add(item)
    return result

>>> reduce(reducer, l, default)[0]
[5, 6, 1, 2, 3, 4]

5 x faster reduce variant but more sophisticated

>>> l = [5, 6, 6, 1, 1, 2, 2, 3, 4]
>>> reduce(lambda r, v: v in r[1] and r or (r[0].append(v) or r[1].add(v)) or r, l, ([], set()))[0]
[5, 6, 1, 2, 3, 4]

Explanation:

default = (list(), set())
# use list to keep order
# use set to make lookup faster

def reducer(result, item):
    if item not in result[1]:
        result[0].append(item)
        result[1].add(item)
    return result

>>> reduce(reducer, l, default)[0]
[5, 6, 1, 2, 3, 4]

回答 13

您可以引用列表理解,因为它是由符号“ _ [1]”构建的。
例如,以下函数通过引用元素列表理解来唯一化元素列表,而不更改其顺序。

def unique(my_list): 
    return [x for x in my_list if x not in locals()['_[1]']]

演示:

l1 = [1, 2, 3, 4, 1, 2, 3, 4, 5]
l2 = [x for x in l1 if x not in locals()['_[1]']]
print l2

输出:

[1, 2, 3, 4, 5]

You can reference a list comprehension as it is being built by the symbol ‘_[1]’.
For example, the following function unique-ifies a list of elements without changing their order by referencing its list comprehension.

def unique(my_list): 
    return [x for x in my_list if x not in locals()['_[1]']]

Demo:

l1 = [1, 2, 3, 4, 1, 2, 3, 4, 5]
l2 = [x for x in l1 if x not in locals()['_[1]']]
print l2

Output:

[1, 2, 3, 4, 5]

回答 14

MizardX的答案很好地总结了多种方法。

这是我在大声思考时想到的:

mylist = [x for i,x in enumerate(mylist) if x not in mylist[i+1:]]

MizardX’s answer gives a good collection of multiple approaches.

This is what I came up with while thinking aloud:

mylist = [x for i,x in enumerate(mylist) if x not in mylist[i+1:]]

回答 15

这是一种简单的方法:

list1 = ["hello", " ", "w", "o", "r", "l", "d"]
sorted(set(list1 ), key=lambda x:list1.index(x))

给出输出:

["hello", " ", "w", "o", "r", "l", "d"]

here is a simple way to do it:

list1 = ["hello", " ", "w", "o", "r", "l", "d"]
sorted(set(list1 ), key=lambda x:list1.index(x))

that gives the output:

["hello", " ", "w", "o", "r", "l", "d"]

回答 16

您可以进行某种丑陋的列表理解技巧。

[l[i] for i in range(len(l)) if l.index(l[i]) == i]

You could do a sort of ugly list comprehension hack.

[l[i] for i in range(len(l)) if l.index(l[i]) == i]

回答 17

相对有效_sorted_numpy数组方法:

b = np.array([1,3,3, 8, 12, 12,12])    
numpy.hstack([b[0], [x[0] for x in zip(b[1:], b[:-1]) if x[0]!=x[1]]])

输出:

array([ 1,  3,  8, 12])

Relatively effective approach with _sorted_ a numpy arrays:

b = np.array([1,3,3, 8, 12, 12,12])    
numpy.hstack([b[0], [x[0] for x in zip(b[1:], b[:-1]) if x[0]!=x[1]]])

Outputs:

array([ 1,  3,  8, 12])

回答 18

l = [1,2,2,3,3,...]
n = []
n.extend(ele for ele in l if ele not in set(n))

生成器表达式使用集合的O(1)查找来确定是否在新列表中包括元素。

l = [1,2,2,3,3,...]
n = []
n.extend(ele for ele in l if ele not in set(n))

A generator expression that uses the O(1) look up of a set to determine whether or not to include an element in the new list.


回答 19

一个简单的递归解决方案:

def uniquefy_list(a):
    return uniquefy_list(a[1:]) if a[0] in a[1:] else [a[0]]+uniquefy_list(a[1:]) if len(a)>1 else [a[0]]

A simple recursive solution:

def uniquefy_list(a):
    return uniquefy_list(a[1:]) if a[0] in a[1:] else [a[0]]+uniquefy_list(a[1:]) if len(a)>1 else [a[0]]

回答 20

消除序列中的重复值,但保留其余项目的顺序。使用通用发生器功能。

# for hashable sequence
def remove_duplicates(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)

a = [1, 5, 2, 1, 9, 1, 5, 10]
list(remove_duplicates(a))
# [1, 5, 2, 9, 10]



# for unhashable sequence
def remove_duplicates(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

a = [ {'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
list(remove_duplicates(a, key=lambda d: (d['x'],d['y'])))
# [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]

Eliminating the duplicate values in a sequence, but preserve the order of the remaining items. Use of general purpose generator function.

# for hashable sequence
def remove_duplicates(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)

a = [1, 5, 2, 1, 9, 1, 5, 10]
list(remove_duplicates(a))
# [1, 5, 2, 9, 10]



# for unhashable sequence
def remove_duplicates(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

a = [ {'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
list(remove_duplicates(a, key=lambda d: (d['x'],d['y'])))
# [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]

回答 21

如果您需要一支班轮,那么这可能会有所帮助:

reduce(lambda x, y: x + y if y[0] not in x else x, map(lambda x: [x],lst))

…应该可以,但是如果我错了,请纠正我

If you need one liner then maybe this would help:

reduce(lambda x, y: x + y if y[0] not in x else x, map(lambda x: [x],lst))

… should work but correct me if i’m wrong


回答 22

如果您经常使用pandas,并且美观优先于性能,那么请考虑内置功能pandas.Series.drop_duplicates

    import pandas as pd
    import numpy as np

    uniquifier = lambda alist: pd.Series(alist).drop_duplicates().tolist()

    # from the chosen answer 
    def f7(seq):
        seen = set()
        seen_add = seen.add
        return [ x for x in seq if not (x in seen or seen_add(x))]

    alist = np.random.randint(low=0, high=1000, size=10000).tolist()

    print uniquifier(alist) == f7(alist)  # True

定时:

    In [104]: %timeit f7(alist)
    1000 loops, best of 3: 1.3 ms per loop
    In [110]: %timeit uniquifier(alist)
    100 loops, best of 3: 4.39 ms per loop

If you routinely use pandas, and aesthetics is preferred over performance, then consider the built-in function pandas.Series.drop_duplicates:

    import pandas as pd
    import numpy as np

    uniquifier = lambda alist: pd.Series(alist).drop_duplicates().tolist()

    # from the chosen answer 
    def f7(seq):
        seen = set()
        seen_add = seen.add
        return [ x for x in seq if not (x in seen or seen_add(x))]

    alist = np.random.randint(low=0, high=1000, size=10000).tolist()

    print uniquifier(alist) == f7(alist)  # True

Timing:

    In [104]: %timeit f7(alist)
    1000 loops, best of 3: 1.3 ms per loop
    In [110]: %timeit uniquifier(alist)
    100 loops, best of 3: 4.39 ms per loop

回答 23

这将保留订单并在O(n)时间运行。基本上,这个想法是在发现重复的地方创建一个孔,并将其下沉到底部。利用读写指针。每当发现重复项时,只有读指针前进,写指针停留在重复项上以覆盖它。

def deduplicate(l):
    count = {}
    (read,write) = (0,0)
    while read < len(l):
        if l[read] in count:
            read += 1
            continue
        count[l[read]] = True
        l[write] = l[read]
        read += 1
        write += 1
    return l[0:write]

this will preserve order and run in O(n) time. basically the idea is to create a hole wherever there is a duplicate found and sink it down to the bottom. makes use of a read and write pointer. whenever a duplicate is found only the read pointer advances and write pointer stays on the duplicate entry to overwrite it.

def deduplicate(l):
    count = {}
    (read,write) = (0,0)
    while read < len(l):
        if l[read] in count:
            read += 1
            continue
        count[l[read]] = True
        l[write] = l[read]
        read += 1
        write += 1
    return l[0:write]

回答 24

不使用导入的模块或集的解决方案:

text = "ask not what your country can do for you ask what you can do for your country"
sentence = text.split(" ")
noduplicates = [(sentence[i]) for i in range (0,len(sentence)) if sentence[i] not in sentence[:i]]
print(noduplicates)

给出输出:

['ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you']

A solution without using imported modules or sets:

text = "ask not what your country can do for you ask what you can do for your country"
sentence = text.split(" ")
noduplicates = [(sentence[i]) for i in range (0,len(sentence)) if sentence[i] not in sentence[:i]]
print(noduplicates)

Gives output:

['ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you']

回答 25

就地方法

此方法是二次方的,因为我们对列表的每个元素都有一个线性查找列表(由于存在以下原因,我们必须增加重新排列列表的成本) del)。

就是说,如果我们从列表的末尾开始,朝着原点前进,删除子列表左侧的每个术语,就有可能进行适当的操作

代码中的这个想法很简单

for i in range(len(l)-1,0,-1): 
    if l[i] in l[:i]: del l[i] 

实施的简单测试

In [91]: from random import randint, seed                                                                                            
In [92]: seed('20080808') ; l = [randint(1,6) for _ in range(12)] # Beijing Olympics                                                                 
In [93]: for i in range(len(l)-1,0,-1): 
    ...:     print(l) 
    ...:     print(i, l[i], l[:i], end='') 
    ...:     if l[i] in l[:i]: 
    ...:          print( ': remove', l[i]) 
    ...:          del l[i] 
    ...:     else: 
    ...:          print() 
    ...: print(l)
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5, 2]
11 2 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]: remove 2
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]
10 5 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4]: remove 5
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4]
9 4 [6, 5, 1, 4, 6, 1, 6, 2, 2]: remove 4
[6, 5, 1, 4, 6, 1, 6, 2, 2]
8 2 [6, 5, 1, 4, 6, 1, 6, 2]: remove 2
[6, 5, 1, 4, 6, 1, 6, 2]
7 2 [6, 5, 1, 4, 6, 1, 6]
[6, 5, 1, 4, 6, 1, 6, 2]
6 6 [6, 5, 1, 4, 6, 1]: remove 6
[6, 5, 1, 4, 6, 1, 2]
5 1 [6, 5, 1, 4, 6]: remove 1
[6, 5, 1, 4, 6, 2]
4 6 [6, 5, 1, 4]: remove 6
[6, 5, 1, 4, 2]
3 4 [6, 5, 1]
[6, 5, 1, 4, 2]
2 1 [6, 5]
[6, 5, 1, 4, 2]
1 5 [6]
[6, 5, 1, 4, 2]

In [94]:                                                                                                                             

An in-place method

This method is quadratic, because we have a linear lookup into the list for every element of the list (to that we have to add the cost of rearranging the list because of the del s).

That said, it is possible to operate in place if we start from the end of the list and proceed toward the origin removing each term that is present in the sub-list at its left

This idea in code is simply

for i in range(len(l)-1,0,-1): 
    if l[i] in l[:i]: del l[i] 

A simple test of the implementation

In [91]: from random import randint, seed                                                                                            
In [92]: seed('20080808') ; l = [randint(1,6) for _ in range(12)] # Beijing Olympics                                                                 
In [93]: for i in range(len(l)-1,0,-1): 
    ...:     print(l) 
    ...:     print(i, l[i], l[:i], end='') 
    ...:     if l[i] in l[:i]: 
    ...:          print( ': remove', l[i]) 
    ...:          del l[i] 
    ...:     else: 
    ...:          print() 
    ...: print(l)
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5, 2]
11 2 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]: remove 2
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]
10 5 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4]: remove 5
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4]
9 4 [6, 5, 1, 4, 6, 1, 6, 2, 2]: remove 4
[6, 5, 1, 4, 6, 1, 6, 2, 2]
8 2 [6, 5, 1, 4, 6, 1, 6, 2]: remove 2
[6, 5, 1, 4, 6, 1, 6, 2]
7 2 [6, 5, 1, 4, 6, 1, 6]
[6, 5, 1, 4, 6, 1, 6, 2]
6 6 [6, 5, 1, 4, 6, 1]: remove 6
[6, 5, 1, 4, 6, 1, 2]
5 1 [6, 5, 1, 4, 6]: remove 1
[6, 5, 1, 4, 6, 2]
4 6 [6, 5, 1, 4]: remove 6
[6, 5, 1, 4, 2]
3 4 [6, 5, 1]
[6, 5, 1, 4, 2]
2 1 [6, 5]
[6, 5, 1, 4, 2]
1 5 [6]
[6, 5, 1, 4, 2]

In [94]:                                                                                                                             

回答 26

zmk的方法使用列表理解,它非常快,但自然保持顺序。对于区分大小写的字符串,可以轻松对其进行修改。这也保留了原始情况。

def DelDupes(aseq) :
    seen = set()
    return [x for x in aseq if (x.lower() not in seen) and (not seen.add(x.lower()))]

紧密相关的功能是:

def HasDupes(aseq) :
    s = set()
    return any(((x.lower() in s) or s.add(x.lower())) for x in aseq)

def GetDupes(aseq) :
    s = set()
    return set(x for x in aseq if ((x.lower() in s) or s.add(x.lower())))

zmk’s approach uses list comprehension which is very fast, yet keeps the order naturally. For applying to case sensitive strings it can be easily modified. This also preserves the original case.

def DelDupes(aseq) :
    seen = set()
    return [x for x in aseq if (x.lower() not in seen) and (not seen.add(x.lower()))]

Closely associated functions are:

def HasDupes(aseq) :
    s = set()
    return any(((x.lower() in s) or s.add(x.lower())) for x in aseq)

def GetDupes(aseq) :
    s = set()
    return set(x for x in aseq if ((x.lower() in s) or s.add(x.lower())))

回答 27

熊猫用户应退房pandas.unique

>>> import pandas as pd
>>> lst = [1, 2, 1, 3, 3, 2, 4]
>>> pd.unique(lst)
array([1, 2, 3, 4])

该函数返回一个NumPy数组。如果需要,可以使用tolist方法将其转换为列表。

pandas users should check out pandas.unique.

>>> import pandas as pd
>>> lst = [1, 2, 1, 3, 3, 2, 4]
>>> pd.unique(lst)
array([1, 2, 3, 4])

The function returns a NumPy array. If needed, you can convert it to a list with the tolist method.


回答 28

一种班轮清单理解:

values_non_duplicated = [value for index, value in enumerate(values) if value not in values[ : index]]

只需添加一个条件以检查该不在先前位置

One liner list comprehension:

values_non_duplicated = [value for index, value in enumerate(values) if value not in values[ : index]]

Simply add a conditional to check that value is not on a previous position


为什么[]比list()快?

问题:为什么[]比list()快?

我最近比较了[]和的处理速度,并list()惊讶地发现它的[]运行速度是的三倍以上list()。我跑了相同的测试与{}dict(),结果几乎相同:[]{}两个花了大约0.128sec /百万次,而list()dict()把每个粗0.428sec /万次。

为什么是这样?不要[]{}(可能()'',太)立即传回了一些空的股票面值的副本,而其明确命名同行(list()dict()tuple()str())完全去创建一个对象,他们是否真的有元素?

我不知道这两种方法有何不同,但我很想找出答案。我在文档中或SO上都找不到答案,而寻找空括号却比我预期的要麻烦得多。

通过分别调用timeit.timeit("[]")timeit.timeit("list()"),和timeit.timeit("{}")timeit.timeit("dict()")来比较列表和字典,以获得计时结果。我正在运行Python 2.7.9。

我最近发现“ 为什么True慢于if? ”比较了if Trueto 的性能,if 1并且似乎触及了类似的文字对全局场景;也许也值得考虑。

I recently compared the processing speeds of [] and list() and was surprised to discover that [] runs more than three times faster than list(). I ran the same test with {} and dict() and the results were practically identical: [] and {} both took around 0.128sec / million cycles, while list() and dict() took roughly 0.428sec / million cycles each.

Why is this? Do [] and {} (and probably () and '', too) immediately pass back a copies of some empty stock literal while their explicitly-named counterparts (list(), dict(), tuple(), str()) fully go about creating an object, whether or not they actually have elements?

I have no idea how these two methods differ but I’d love to find out. I couldn’t find an answer in the docs or on SO, and searching for empty brackets turned out to be more problematic than I’d expected.

I got my timing results by calling timeit.timeit("[]") and timeit.timeit("list()"), and timeit.timeit("{}") and timeit.timeit("dict()"), to compare lists and dictionaries, respectively. I’m running Python 2.7.9.

I recently discovered “Why is if True slower than if 1?” that compares the performance of if True to if 1 and seems to touch on a similar literal-versus-global scenario; perhaps it’s worth considering as well.


回答 0

因为[]{}文字语法。Python可以创建字节码仅用于创建列表或字典对象:

>>> import dis
>>> dis.dis(compile('[]', '', 'eval'))
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
>>> dis.dis(compile('{}', '', 'eval'))
  1           0 BUILD_MAP                0
              3 RETURN_VALUE        

list()dict()是单独的对象。它们的名称需要解析,必须包含堆栈以推入参数,必须存储框架以供以后检索,并且必须进行调用。这都需要更多时间。

对于空的情况,这意味着您至少要有一个LOAD_NAME(必须在全局命名空间以及__builtin__模块中进行搜索),后跟一个CALL_FUNCTION必须保留当前帧的:

>>> dis.dis(compile('list()', '', 'eval'))
  1           0 LOAD_NAME                0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
>>> dis.dis(compile('dict()', '', 'eval'))
  1           0 LOAD_NAME                0 (dict)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        

您可以使用以下命令分别计时名称查找timeit

>>> import timeit
>>> timeit.timeit('list', number=10**7)
0.30749011039733887
>>> timeit.timeit('dict', number=10**7)
0.4215109348297119

时间差异可能是字典哈希冲突。从调用这些对象的时间中减去这些时间,然后将结果与使用文字的时间进行比较:

>>> timeit.timeit('[]', number=10**7)
0.30478692054748535
>>> timeit.timeit('{}', number=10**7)
0.31482696533203125
>>> timeit.timeit('list()', number=10**7)
0.9991960525512695
>>> timeit.timeit('dict()', number=10**7)
1.0200958251953125

因此,1.00 - 0.31 - 0.30 == 0.39每1000万次调用必须调用该对象花费了额外的几秒钟。

您可以通过将全局名称别名为本地名称来避免全局查找成本(使用timeit设置,绑定到名称的所有内容都是本地名称):

>>> timeit.timeit('_list', '_list = list', number=10**7)
0.1866450309753418
>>> timeit.timeit('_dict', '_dict = dict', number=10**7)
0.19016098976135254
>>> timeit.timeit('_list()', '_list = list', number=10**7)
0.841480016708374
>>> timeit.timeit('_dict()', '_dict = dict', number=10**7)
0.7233691215515137

但您永远无法克服这些CALL_FUNCTION成本。

Because [] and {} are literal syntax. Python can create bytecode just to create the list or dictionary objects:

>>> import dis
>>> dis.dis(compile('[]', '', 'eval'))
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
>>> dis.dis(compile('{}', '', 'eval'))
  1           0 BUILD_MAP                0
              3 RETURN_VALUE        

list() and dict() are separate objects. Their names need to be resolved, the stack has to be involved to push the arguments, the frame has to be stored to retrieve later, and a call has to be made. That all takes more time.

For the empty case, that means you have at the very least a LOAD_NAME (which has to search through the global namespace as well as the __builtin__ module) followed by a CALL_FUNCTION, which has to preserve the current frame:

>>> dis.dis(compile('list()', '', 'eval'))
  1           0 LOAD_NAME                0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
>>> dis.dis(compile('dict()', '', 'eval'))
  1           0 LOAD_NAME                0 (dict)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        

You can time the name lookup separately with timeit:

>>> import timeit
>>> timeit.timeit('list', number=10**7)
0.30749011039733887
>>> timeit.timeit('dict', number=10**7)
0.4215109348297119

The time discrepancy there is probably a dictionary hash collision. Subtract those times from the times for calling those objects, and compare the result against the times for using literals:

>>> timeit.timeit('[]', number=10**7)
0.30478692054748535
>>> timeit.timeit('{}', number=10**7)
0.31482696533203125
>>> timeit.timeit('list()', number=10**7)
0.9991960525512695
>>> timeit.timeit('dict()', number=10**7)
1.0200958251953125

So having to call the object takes an additional 1.00 - 0.31 - 0.30 == 0.39 seconds per 10 million calls.

You can avoid the global lookup cost by aliasing the global names as locals (using a timeit setup, everything you bind to a name is a local):

>>> timeit.timeit('_list', '_list = list', number=10**7)
0.1866450309753418
>>> timeit.timeit('_dict', '_dict = dict', number=10**7)
0.19016098976135254
>>> timeit.timeit('_list()', '_list = list', number=10**7)
0.841480016708374
>>> timeit.timeit('_dict()', '_dict = dict', number=10**7)
0.7233691215515137

but you never can overcome that CALL_FUNCTION cost.


回答 1

list()需要全局查找和函数调用,但需要[]编译为一条指令。看到:

Python 2.7.3
>>> import dis
>>> print dis.dis(lambda: list())
  1           0 LOAD_GLOBAL              0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
None
>>> print dis.dis(lambda: [])
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
None

list() requires a global lookup and a function call but [] compiles to a single instruction. See:

Python 2.7.3
>>> import dis
>>> print dis.dis(lambda: list())
  1           0 LOAD_GLOBAL              0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
None
>>> print dis.dis(lambda: [])
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
None

回答 2

因为list是一个功能转化说一个字符串列表对象,而[]用于创建一个列表蝙蝠。尝试一下(可能对您更有意义):

x = "wham bam"
a = list(x)
>>> a
["w", "h", "a", "m", ...]

y = ["wham bam"]
>>> y
["wham bam"]

为您提供包含您所输入内容的实际列表。

Because list is a function to convert say a string to a list object, while [] is used to create a list off the bat. Try this (might make more sense to you):

x = "wham bam"
a = list(x)
>>> a
["w", "h", "a", "m", ...]

While

y = ["wham bam"]
>>> y
["wham bam"]

Gives you a actual list containing whatever you put in it.


回答 3

至此,答案非常好,并完全涵盖了这个问题。对于那些感兴趣的人,我将进一步从字节码中删除。我正在使用CPython的最新仓库;在这方面,旧版本的行为类似,但可能会稍作更改。

这是每个BUILD_LIST针对for []CALL_FUNCTIONfor 的执行情况的细分list()


BUILD_LIST指令:

您应该只查看恐怖:

PyObject *list =  PyList_New(oparg);
if (list == NULL)
    goto error;
while (--oparg >= 0) {
    PyObject *item = POP();
    PyList_SET_ITEM(list, oparg, item);
}
PUSH(list);
DISPATCH();

我知道那令人费解。这是多么简单:

  • 使用创建新列表PyList_New(主要是为新的列表对象分配内存),以oparg信号指示堆栈上的参数数量。开门见山。
  • 检查是否没有问题if (list==NULL)
  • 使用PyList_SET_ITEM(宏)添加位于堆栈上的所有参数(在我们的示例中此参数未执行)。

难怪它很快!它是为创建新列表而定制的,仅此而已:-)

CALL_FUNCTION指令:

窥视代码处理时,这是您看到的第一件事CALL_FUNCTION

PyObject **sp, *res;
sp = stack_pointer;
res = call_function(&sp, oparg, NULL);
stack_pointer = sp;
PUSH(res);
if (res == NULL) {
    goto error;
}
DISPATCH();

看起来很无害吧?好吧,不是,不幸的call_function是,不是一个会立即调用该函数的直截了当的家伙,它不会。相反,它从堆栈中获取对象,获取堆栈中的所有参数,然后根据对象的类型进行切换。它是:

我们正在调用list类型,传入的参数call_functionPyList_Type。CPython现在必须调用一个泛型函数来处理名为的所有可调用对象_PyObject_FastCallKeywords,还有更多函数调用。

该函数再次检查某些函数类型(我不明白为什么),然后在为kwargs创建字典后,如果需要,继续调用_PyObject_FastCallDict

_PyObject_FastCallDict终于把我们带到某个地方!执行后甚至更多的检查抓住了tp_call从插槽type中的type我们在通过了,那就是它抓住type.tp_call。然后,它根据传入的参数来创建元组_PyStack_AsTuple,最后可以最终进行调用

tp_call,它将匹配type.__call__并最终创建列表对象。它调用与之__new__对应的列表PyType_GenericNew并为其分配内存PyType_GenericAlloc这实际上是它与追上的部分PyList_New,最后。所有以前的内容对于以通用方式处理对象都是必需的。

最后,使用任何可用参数type_call调用list.__init__并初始化列表,然后继续返回原来的方式。:-)

最后,记住 LOAD_NAME,这是另一个在这里做出贡献的家伙。


很容易看到,在处理我们的输入时,Python通常必须跳过圈以真正找到合适的C函数来完成工作。它不具有立即调用它的功能,因为它是动态的,有人可能会掩盖list并且男孩会做很多人做的事情),因此必须采取另一条路。

这是哪里 list()损失很多的地方:正在探索的Python需要做以找出它应该做什么。

另一方面,字面语法恰好意味着一回事。它无法更改,并且始终以预定的方式运行。

脚注:所有功能名称均可能从一个版本更改为另一个版本。关键点仍然存在,并且很可能在将来的任何版本中都存在,这是动态查找使事情变慢的原因。

The answers here are great, to the point and fully cover this question. I’ll drop a further step down from byte-code for those interested. I’m using the most recent repo of CPython; older versions behave similar in this regard but slight changes might be in place.

Here’s a break down of the execution for each of these, BUILD_LIST for [] and CALL_FUNCTION for list().


The BUILD_LIST instruction:

You should just view the horror:

PyObject *list =  PyList_New(oparg);
if (list == NULL)
    goto error;
while (--oparg >= 0) {
    PyObject *item = POP();
    PyList_SET_ITEM(list, oparg, item);
}
PUSH(list);
DISPATCH();

Terribly convoluted, I know. This is how simple it is:

  • Create a new list with PyList_New (this mainly allocates the memory for a new list object), oparg signalling the number of arguments on the stack. Straight to the point.
  • Check that nothing went wrong with if (list==NULL).
  • Add any arguments (in our case this isn’t executed) located on the stack with PyList_SET_ITEM (a macro).

No wonder it is fast! It’s custom-made for creating new lists, nothing else :-)

The CALL_FUNCTION instruction:

Here’s the first thing you see when you peek at the code handling CALL_FUNCTION:

PyObject **sp, *res;
sp = stack_pointer;
res = call_function(&sp, oparg, NULL);
stack_pointer = sp;
PUSH(res);
if (res == NULL) {
    goto error;
}
DISPATCH();

Looks pretty harmless, right? Well, no, unfortunately not, call_function is not a straightforward guy that will call the function immediately, it can’t. Instead, it grabs the object from the stack, grabs all arguments of the stack and then switches based on the type of the object; is it a:

We’re calling the list type, the argument passed in to call_function is PyList_Type. CPython now has to call a generic function to handle any callable objects named _PyObject_FastCallKeywords, yay more function calls.

This function again makes some checks for certain function types (which I cannot understand why) and then, after creating a dict for kwargs if required, goes on to call _PyObject_FastCallDict.

_PyObject_FastCallDict finally gets us somewhere! After performing even more checks it grabs the tp_call slot from the type of the type we’ve passed in, that is, it grabs type.tp_call. It then proceeds to create a tuple out of of the arguments passed in with _PyStack_AsTuple and, finally, a call can finally be made!

tp_call, which matches type.__call__ takes over and finally creates the list object. It calls the lists __new__ which corresponds to PyType_GenericNew and allocates memory for it with PyType_GenericAlloc: This is actually the part where it catches up with PyList_New, finally. All the previous are necessary to handle objects in a generic fashion.

In the end, type_call calls list.__init__ and initializes the list with any available arguments, then we go on a returning back the way we came. :-)

Finally, remmeber the LOAD_NAME, that’s another guy that contributes here.


It’s easy to see that, when dealing with our input, Python generally has to jump through hoops in order to actually find out the appropriate C function to do the job. It doesn’t have the curtesy of immediately calling it because it’s dynamic, someone might mask list (and boy do many people do) and another path must be taken.

This is where list() loses much: The exploring Python needs to do to find out what the heck it should do.

Literal syntax, on the other hand, means exactly one thing; it cannot be changed and always behaves in a pre-determined way.

Footnote: All function names are subject to change from one release to the other. The point still stands and most likely will stand in any future versions, it’s the dynamic look-up that slows things down.


回答 4

为什么[]要比list()

最大的原因是Python list()就像用户定义的函数一样对待,这意味着您可以通过别名别名来拦截它list并做一些不同的事情(例如使用您自己的子类列表或双端队列)。

它将立即使用创建新的内置列表实例[]

我的解释旨在为您提供直觉。

说明

[] 通常称为文字语法。

在语法中,这称为“列表显示”。从文档

列表显示是括在方括号中的一系列可能为空的表达式:

list_display ::=  "[" [starred_list | comprehension] "]"

列表显示将产生一个新的列表对象,其内容由表达式列表或理解列表指定。提供逗号分隔的表达式列表时,将按从左到右的顺序评估其元素,并将其按此顺序放入列表对象中。提供理解后,将根据理解产生的元素来构建列表。

简而言之,这意味着将list创建一个内置类型的对象。

不能回避这一点-这意味着Python可以尽快完成它。

另一方面,list()可以list使用内置列表构造函数拦截创建内置对象的过程。

例如,假设我们希望创建噪音较大的列表:

class List(list):
    def __init__(self, iterable=None):
        if iterable is None:
            super().__init__()
        else:
            super().__init__(iterable)
        print('List initialized.')

然后,我们可以list在模块级别的全局范围内截取该名称,然后在创建时list,实际上创建了子类型列表:

>>> list = List
>>> a_list = list()
List initialized.
>>> type(a_list)
<class '__main__.List'>

同样,我们可以将其从全局命名空间中删除

del list

并将其放在内置命名空间中:

import builtins
builtins.list = List

现在:

>>> list_0 = list()
List initialized.
>>> type(list_0)
<class '__main__.List'>

并注意列表显示无条件创建列表:

>>> list_1 = []
>>> type(list_1)
<class 'list'>

我们可能只是暂时执行此操作,所以请撤消更改-首先List从内置文件中删除新对象:

>>> del builtins.list
>>> builtins.list
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'builtins' has no attribute 'list'
>>> list()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'list' is not defined

哦,不,我们失去了原来的踪迹。

不用担心,我们仍然可以得到list-它是列表文字的类型:

>>> builtins.list = type([])
>>> list()
[]

所以…

为什么[]要比list()

如我们所见-我们可以覆盖list-但是我们不能截取文字类型的创建。使用时,list我们必须进行查找以查看是否存在任何内容。

然后,我们必须调用已查找的任何可调用对象。从语法上:

调用使用一系列可能为空的参数来调用可调用对象(例如,函数):

call                 ::=  primary "(" [argument_list [","] | comprehension] ")"

我们可以看到它对任何名称都具有相同的作用,而不仅仅是列表:

>>> import dis
>>> dis.dis('list()')
  1           0 LOAD_NAME                0 (list)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE
>>> dis.dis('doesnotexist()')
  1           0 LOAD_NAME                0 (doesnotexist)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE

因为[]在Python字节码级别没有函数调用:

>>> dis.dis('[]')
  1           0 BUILD_LIST               0
              2 RETURN_VALUE

它只是直接建立列表而无需在字节码级别进行任何查找或调用。

结论

我们已经证明了list可以使用范围规则用用户代码拦截,并且可以list()查找可调用对象然后调用它。

[]列表显示或文字显示则避免了名称查找和函数调用。

Why is [] faster than list()?

The biggest reason is that Python treats list() just like a user-defined function, which means you can intercept it by aliasing something else to list and do something different (like use your own subclassed list or perhaps a deque).

It immediately creates a new instance of a builtin list with [].

My explanation seeks to give you the intuition for this.

Explanation

[] is commonly known as literal syntax.

In the grammar, this is referred to as a “list display”. From the docs:

A list display is a possibly empty series of expressions enclosed in square brackets:

list_display ::=  "[" [starred_list | comprehension] "]"

A list display yields a new list object, the contents being specified by either a list of expressions or a comprehension. When a comma-separated list of expressions is supplied, its elements are evaluated from left to right and placed into the list object in that order. When a comprehension is supplied, the list is constructed from the elements resulting from the comprehension.

In short, this means that a builtin object of type list is created.

There is no circumventing this – which means Python can do it as quickly as it may.

On the other hand, list() can be intercepted from creating a builtin list using the builtin list constructor.

For example, say we want our lists to be created noisily:

class List(list):
    def __init__(self, iterable=None):
        if iterable is None:
            super().__init__()
        else:
            super().__init__(iterable)
        print('List initialized.')

We could then intercept the name list on the module level global scope, and then when we create a list, we actually create our subtyped list:

>>> list = List
>>> a_list = list()
List initialized.
>>> type(a_list)
<class '__main__.List'>

Similarly we could remove it from the global namespace

del list

and put it in the builtin namespace:

import builtins
builtins.list = List

And now:

>>> list_0 = list()
List initialized.
>>> type(list_0)
<class '__main__.List'>

And note that the list display creates a list unconditionally:

>>> list_1 = []
>>> type(list_1)
<class 'list'>

We probably only do this temporarily, so lets undo our changes – first remove the new List object from the builtins:

>>> del builtins.list
>>> builtins.list
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'builtins' has no attribute 'list'
>>> list()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'list' is not defined

Oh, no, we lost track of the original.

Not to worry, we can still get list – it’s the type of a list literal:

>>> builtins.list = type([])
>>> list()
[]

So…

Why is [] faster than list()?

As we’ve seen – we can overwrite list – but we can’t intercept the creation of the literal type. When we use list we have to do the lookups to see if anything is there.

Then we have to call whatever callable we have looked up. From the grammar:

A call calls a callable object (e.g., a function) with a possibly empty series of arguments:

call                 ::=  primary "(" [argument_list [","] | comprehension] ")"

We can see that it does the same thing for any name, not just list:

>>> import dis
>>> dis.dis('list()')
  1           0 LOAD_NAME                0 (list)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE
>>> dis.dis('doesnotexist()')
  1           0 LOAD_NAME                0 (doesnotexist)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE

For [] there is no function call at the Python bytecode level:

>>> dis.dis('[]')
  1           0 BUILD_LIST               0
              2 RETURN_VALUE

It simply goes straight to building the list without any lookups or calls at the bytecode level.

Conclusion

We have demonstrated that list can be intercepted with user code using the scoping rules, and that list() looks for a callable and then calls it.

Whereas [] is a list display, or a literal, and thus avoids the name lookup and function call.


从字符串列表中删除空字符串

问题:从字符串列表中删除空字符串

我想从python中的字符串列表中删除所有空字符串。

我的想法如下:

while '' in str_list:
    str_list.remove('')

还有其他pythonic方式可以做到这一点吗?

I want to remove all empty strings from a list of strings in python.

My idea looks like this:

while '' in str_list:
    str_list.remove('')

Is there any more pythonic way to do this?


回答 0

我会使用filter

str_list = filter(None, str_list)
str_list = filter(bool, str_list)
str_list = filter(len, str_list)
str_list = filter(lambda item: item, str_list)

Python 3从返回一个迭代器filter,因此应包装在对的调用中list()

str_list = list(filter(None, str_list))

I would use filter:

str_list = filter(None, str_list)
str_list = filter(bool, str_list)
str_list = filter(len, str_list)
str_list = filter(lambda item: item, str_list)

Python 3 returns an iterator from filter, so should be wrapped in a call to list()

str_list = list(filter(None, str_list))

回答 1

使用列表理解是最Python的方式:

>>> strings = ["first", "", "second"]
>>> [x for x in strings if x]
['first', 'second']

如果必须就地修改列表,因为还有其他引用必须看到更新的数据,则使用分片分配:

strings[:] = [x for x in strings if x]

Using a list comprehension is the most Pythonic way:

>>> strings = ["first", "", "second"]
>>> [x for x in strings if x]
['first', 'second']

If the list must be modified in-place, because there are other references which must see the updated data, then use a slice assignment:

strings[:] = [x for x in strings if x]

回答 2

过滤器实际上对此有一个特殊的选择:

filter(None, sequence)

它将滤除所有评估为False的元素。此处无需使用实际的可调用对象,例如bool,len等。

和map(bool,…)一样快

filter actually has a special option for this:

filter(None, sequence)

It will filter out all elements that evaluate to False. No need to use an actual callable here such as bool, len and so on.

It’s equally fast as map(bool, …)


回答 3

>>> lstr = ['hello', '', ' ', 'world', ' ']
>>> lstr
['hello', '', ' ', 'world', ' ']

>>> ' '.join(lstr).split()
['hello', 'world']

>>> filter(None, lstr)
['hello', ' ', 'world', ' ']

比较时间

>>> from timeit import timeit
>>> timeit('" ".join(lstr).split()', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
4.226747989654541
>>> timeit('filter(None, lstr)', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
3.0278358459472656

请注意,filter(None, lstr)它不会删除带有空格的空字符串' ',只会修剪掉''而同时' '.join(lstr).split()删除它们。

要使用filter()删除的空格字符串,需要花费更多时间:

>>> timeit('filter(None, [l.replace(" ", "") for l in lstr])', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
18.101892948150635
>>> lstr = ['hello', '', ' ', 'world', ' ']
>>> lstr
['hello', '', ' ', 'world', ' ']

>>> ' '.join(lstr).split()
['hello', 'world']

>>> filter(None, lstr)
['hello', ' ', 'world', ' ']

Compare time

>>> from timeit import timeit
>>> timeit('" ".join(lstr).split()', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
4.226747989654541
>>> timeit('filter(None, lstr)', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
3.0278358459472656

Notice that filter(None, lstr) does not remove empty strings with a space ' ', it only prunes away '' while ' '.join(lstr).split() removes both.

To use filter() with white space strings removed, it takes a lot more time:

>>> timeit('filter(None, [l.replace(" ", "") for l in lstr])', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
18.101892948150635

回答 4

@ Ib33X的回复很棒。如果要删除每个空字符串,请剥离后。您也需要使用strip方法。否则,如果有空格,它将也返回空字符串。如,“”对于该答案也将有效。这样,就可以实现。

strings = ["first", "", "second ", " "]
[x.strip() for x in strings if x.strip()]

答案是["first", "second"]
如果要改用filtermethod,可以执行like
list(filter(lambda item: item.strip(), strings))。这给出了相同的结果。

Reply from @Ib33X is awesome. If you want to remove every empty string, after stripped. you need to use the strip method too. Otherwise, it will return the empty string too if it has white spaces. Like, ” ” will be valid too for that answer. So, can be achieved by.

strings = ["first", "", "second ", " "]
[x.strip() for x in strings if x.strip()]

The answer for this will be ["first", "second"].
If you want to use filter method instead, you can do like
list(filter(lambda item: item.strip(), strings)). This is give the same result.


回答 5

代替if x,我将使用if X!=”来消除空字符串。像这样:

str_list = [x for x in str_list if x != '']

这将在列表中保留“无”数据类型。此外,如果您的列表中有整数,并且0是其中的一个,它也将被保留。

例如,

str_list = [None, '', 0, "Hi", '', "Hello"]
[x for x in str_list if x != '']
[None, 0, "Hi", "Hello"]

Instead of if x, I would use if X != ” in order to just eliminate empty strings. Like this:

str_list = [x for x in str_list if x != '']

This will preserve None data type within your list. Also, in case your list has integers and 0 is one among them, it will also be preserved.

For example,

str_list = [None, '', 0, "Hi", '', "Hello"]
[x for x in str_list if x != '']
[None, 0, "Hi", "Hello"]

回答 6

根据列表的大小,如果您使用list.remove()而不是创建新列表,则可能是最有效的:

l = ["1", "", "3", ""]

while True:
  try:
    l.remove("")
  except ValueError:
    break

这具有不创建新列表的优点,但是具有每次都必须从头开始搜索的缺点,尽管与while '' in l上面建议的用法不同,它每次出现时仅需要搜索一次''(当然,有一种方法可以保持最佳状态)两种方法,但更为复杂)。

Depending on the size of your list, it may be most efficient if you use list.remove() rather than create a new list:

l = ["1", "", "3", ""]

while True:
  try:
    l.remove("")
  except ValueError:
    break

This has the advantage of not creating a new list, but the disadvantage of having to search from the beginning each time, although unlike using while '' in l as proposed above, it only requires searching once per occurrence of '' (there is certainly a way to keep the best of both methods, but it is more complicated).


回答 7

请记住,如果要将空格保留在字符串中,则可以使用某些方法无意中将其删除。如果你有这个清单

[‘hello world’,”,’,’hello’]您可能想要的内容[‘hello world’,’hello’]

首先修剪列表以将任何类型的空格转换为空字符串:

space_to_empty = [x.strip() for x in _text_list]

然后从列表中删除空字符串

space_clean_list = [x for x in space_to_empty if x]

Keep in mind that if you want to keep the white spaces within a string, you may remove them unintentionally using some approaches. If you have this list

[‘hello world’, ‘ ‘, ”, ‘hello’] what you may want [‘hello world’,’hello’]

first trim the list to convert any type of white space to empty string:

space_to_empty = [x.strip() for x in _text_list]

then remove empty string from them list

space_clean_list = [x for x in space_to_empty if x]

回答 8

用途filter

newlist=filter(lambda x: len(x)>0, oldlist) 

如所指出的,使用过滤器的缺点是它比替代方法慢。而且,lambda通常很昂贵。

或者,您可以选择最简单,最迭代的方法:

# I am assuming listtext is the original list containing (possibly) empty items
for item in listtext:
    if item:
        newlist.append(str(item))
# You can remove str() based on the content of your original list

这是最直观的方法,并且可以在适当的时间内完成。

Use filter:

newlist=filter(lambda x: len(x)>0, oldlist) 

The drawbacks of using filter as pointed out is that it is slower than alternatives; also, lambda is usually costly.

Or you can go for the simplest and the most iterative of all:

# I am assuming listtext is the original list containing (possibly) empty items
for item in listtext:
    if item:
        newlist.append(str(item))
# You can remove str() based on the content of your original list

this is the most intuitive of the methods and does it in decent time.


回答 9

正如Aziz Alto 所报告的filter(None, lstr)那样,不会删除带有空格的空字符串,' '但是如果您确定lstr仅包含字符串,则可以使用filter(str.strip, lstr)

>>> lstr = ['hello', '', ' ', 'world', ' ']
>>> lstr
['hello', '', ' ', 'world', ' ']
>>> ' '.join(lstr).split()
['hello', 'world']
>>> filter(str.strip, lstr)
['hello', 'world']

比较我的电脑上的时间

>>> from timeit import timeit
>>> timeit('" ".join(lstr).split()', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
3.356455087661743
>>> timeit('filter(str.strip, lstr)', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
5.276503801345825

删除''和清空带有空格的字符串的最快解决方案' '仍然是' '.join(lstr).split()

如评论中所述,如果您的字符串包含空格,则情况会有所不同。

>>> lstr = ['hello', '', ' ', 'world', '    ', 'see you']
>>> lstr
['hello', '', ' ', 'world', '    ', 'see you']
>>> ' '.join(lstr).split()
['hello', 'world', 'see', 'you']
>>> filter(str.strip, lstr)
['hello', 'world', 'see you']

您会看到filter(str.strip, lstr)保留带空格的字符串,但' '.join(lstr).split()会拆分这些字符串。

As reported by Aziz Alto filter(None, lstr) does not remove empty strings with a space ' ' but if you are sure lstr contains only string you can use filter(str.strip, lstr)

>>> lstr = ['hello', '', ' ', 'world', ' ']
>>> lstr
['hello', '', ' ', 'world', ' ']
>>> ' '.join(lstr).split()
['hello', 'world']
>>> filter(str.strip, lstr)
['hello', 'world']

Compare time on my pc

>>> from timeit import timeit
>>> timeit('" ".join(lstr).split()', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
3.356455087661743
>>> timeit('filter(str.strip, lstr)', "lstr=['hello', '', ' ', 'world', ' ']", number=10000000)
5.276503801345825

The fastest solution to remove '' and empty strings with a space ' ' remains ' '.join(lstr).split().

As reported in a comment the situation is different if your strings contain spaces.

>>> lstr = ['hello', '', ' ', 'world', '    ', 'see you']
>>> lstr
['hello', '', ' ', 'world', '    ', 'see you']
>>> ' '.join(lstr).split()
['hello', 'world', 'see', 'you']
>>> filter(str.strip, lstr)
['hello', 'world', 'see you']

You can see that filter(str.strip, lstr) preserve strings with spaces on it but ' '.join(lstr).split() will split this strings.


回答 10

总结最佳答案:

1.消除空洞而无需剥离:

也就是说,保留所有空格字符串:

slist = list(filter(None, slist))

优点:

  • 最简单
  • 最快(请参见下面的基准)。

2.去除剥离后的空容器…

2.a …当字符串在单词之间不包含空格时:

slist = ' '.join(slist).split()

优点:

  • 小代码
  • 快速(但由于内存原因,对于大型数据集而言并非最快,这与@ paolo-melchiorre结果相反)

2.b …字符串在单词之间包含空格吗?

slist = list(filter(str.strip, slist))

优点:

  • 最快的;
  • 代码的可理解性。

2018年机器上的基准测试:

## Build test-data
#
import random, string
nwords = 10000
maxlen = 30
null_ratio = 0.1
rnd = random.Random(0)                  # deterministic results
words = [' ' * rnd.randint(0, maxlen)
         if rnd.random() > (1 - null_ratio)
         else
         ''.join(random.choices(string.ascii_letters, k=rnd.randint(0, maxlen)))
         for _i in range(nwords)
        ]

## Test functions
#
def nostrip_filter(slist):
    return list(filter(None, slist))

def nostrip_comprehension(slist):
    return [s for s in slist if s]

def strip_filter(slist):
    return list(filter(str.strip, slist))

def strip_filter_map(slist): 
    return list(filter(None, map(str.strip, slist))) 

def strip_filter_comprehension(slist):  # waste memory
    return list(filter(None, [s.strip() for s in slist]))

def strip_filter_generator(slist):
    return list(filter(None, (s.strip() for s in slist)))

def strip_join_split(slist):  # words without(!) spaces
    return ' '.join(slist).split()

## Benchmarks
#
%timeit nostrip_filter(words)
142 µs ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit nostrip_comprehension(words)
263 µs ± 19.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_filter(words)
653 µs ± 37.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_filter_map(words)
642 µs ± 36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_filter_comprehension(words)
693 µs ± 42.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_filter_generator(words)
750 µs ± 28.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_join_split(words)
796 µs ± 103 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

Sum up best answers:

1. Eliminate emtpties WITHOUT stripping:

That is, all-space strings are retained:

slist = list(filter(None, slist))

PROs:

  • simplest;
  • fastest (see benchmarks below).

2. To eliminate empties after stripping …

2.a … when strings do NOT contain spaces between words:

slist = ' '.join(slist).split()

PROs:

  • small code
  • fast (BUT not fastest with big datasets due to memory, contrary to what @paolo-melchiorre results)

2.b … when strings contain spaces between words?

slist = list(filter(str.strip, slist))

PROs:

  • fastest;
  • understandability of the code.

Benchmarks on a 2018 machine:

## Build test-data
#
import random, string
nwords = 10000
maxlen = 30
null_ratio = 0.1
rnd = random.Random(0)                  # deterministic results
words = [' ' * rnd.randint(0, maxlen)
         if rnd.random() > (1 - null_ratio)
         else
         ''.join(random.choices(string.ascii_letters, k=rnd.randint(0, maxlen)))
         for _i in range(nwords)
        ]

## Test functions
#
def nostrip_filter(slist):
    return list(filter(None, slist))

def nostrip_comprehension(slist):
    return [s for s in slist if s]

def strip_filter(slist):
    return list(filter(str.strip, slist))

def strip_filter_map(slist): 
    return list(filter(None, map(str.strip, slist))) 

def strip_filter_comprehension(slist):  # waste memory
    return list(filter(None, [s.strip() for s in slist]))

def strip_filter_generator(slist):
    return list(filter(None, (s.strip() for s in slist)))

def strip_join_split(slist):  # words without(!) spaces
    return ' '.join(slist).split()

## Benchmarks
#
%timeit nostrip_filter(words)
142 µs ± 16.8 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit nostrip_comprehension(words)
263 µs ± 19.1 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_filter(words)
653 µs ± 37.5 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_filter_map(words)
642 µs ± 36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_filter_comprehension(words)
693 µs ± 42.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_filter_generator(words)
750 µs ± 28.6 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%timeit strip_join_split(words)
796 µs ± 103 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

回答 11

对于包含空格和空值的列表,请使用简单的列表理解-

>>> s = ['I', 'am', 'a', '', 'great', ' ', '', '  ', 'person', '!!', 'Do', 'you', 'think', 'its', 'a', '', 'a', '', 'joke', '', ' ', '', '?', '', '', '', '?']

因此,您可以看到,此列表包含空格和null元素的组合。使用摘要-

>>> d = [x for x in s if x.strip()]
>>> d
>>> d = ['I', 'am', 'a', 'great', 'person', '!!', 'Do', 'you', 'think', 'its', 'a', 'a', 'joke', '?', '?']

For a list with a combination of spaces and empty values, use simple list comprehension –

>>> s = ['I', 'am', 'a', '', 'great', ' ', '', '  ', 'person', '!!', 'Do', 'you', 'think', 'its', 'a', '', 'a', '', 'joke', '', ' ', '', '?', '', '', '', '?']

So, you can see, this list has a combination of spaces and null elements. Using the snippet –

>>> d = [x for x in s if x.strip()]
>>> d
>>> d = ['I', 'am', 'a', 'great', 'person', '!!', 'Do', 'you', 'think', 'its', 'a', 'a', 'joke', '?', '?']

使用Python将列表写入文件

问题:使用Python将列表写入文件

因为writelines()不插入换行符,这是将列表写入文件的最干净的方法吗?

file.writelines(["%s\n" % item  for item in list])

似乎会有一种标准的方法…

Is this the cleanest way to write a list to a file, since writelines() doesn’t insert newline characters?

file.writelines(["%s\n" % item  for item in list])

It seems like there would be a standard way…


回答 0

您可以使用循环:

with open('your_file.txt', 'w') as f:
    for item in my_list:
        f.write("%s\n" % item)

在Python 2中,您也可以使用

with open('your_file.txt', 'w') as f:
    for item in my_list:
        print >> f, item

如果您热衷于单个函数调用,请至少移除方括号[],以使要打印的字符串一次生成一个(一个genexp而不是一个listcomp)-没有理由占用所有需要的内存具体化整个字符串列表。

You can use a loop:

with open('your_file.txt', 'w') as f:
    for item in my_list:
        f.write("%s\n" % item)

In Python 2, you can also use

with open('your_file.txt', 'w') as f:
    for item in my_list:
        print >> f, item

If you’re keen on a single function call, at least remove the square brackets [], so that the strings to be printed get made one at a time (a genexp rather than a listcomp) — no reason to take up all the memory required to materialize the whole list of strings.


回答 1

您将如何处理该文件?该文件是否存在于人类或具有明确互操作性要求的其他程序中?

如果您只是尝试将列表序列化到磁盘以供同一python应用程序稍后使用,则应该对列表进行腌制

import pickle

with open('outfile', 'wb') as fp:
    pickle.dump(itemlist, fp)

读回:

with open ('outfile', 'rb') as fp:
    itemlist = pickle.load(fp)

What are you going to do with the file? Does this file exist for humans, or other programs with clear interoperability requirements?

If you are just trying to serialize a list to disk for later use by the same python app, you should be pickleing the list.

import pickle

with open('outfile', 'wb') as fp:
    pickle.dump(itemlist, fp)

To read it back:

with open ('outfile', 'rb') as fp:
    itemlist = pickle.load(fp)

回答 2

比较简单的方法是:

with open("outfile", "w") as outfile:
    outfile.write("\n".join(itemlist))

您可以使用生成器表达式来确保项目列表中的所有项目都是字符串:

with open("outfile", "w") as outfile:
    outfile.write("\n".join(str(item) for item in itemlist))

请记住,所有itemlist列表都必须在内存中,因此,请注意内存消耗。

The simpler way is:

with open("outfile", "w") as outfile:
    outfile.write("\n".join(itemlist))

You could ensure that all items in item list are strings using a generator expression:

with open("outfile", "w") as outfile:
    outfile.write("\n".join(str(item) for item in itemlist))

Remember that all itemlist list need to be in memory, so, take care about the memory consumption.


回答 3

使用Python 3Python 2.6+语法:

with open(filepath, 'w') as file_handler:
    for item in the_list:
        file_handler.write("{}\n".format(item))

这是与平台无关的。它还以换行符结束最后一行,这是UNIX的最佳实践

从Python 3.6开始,"{}\n".format(item)可以用f字符串替换:f"{item}\n"

Using Python 3 and Python 2.6+ syntax:

with open(filepath, 'w') as file_handler:
    for item in the_list:
        file_handler.write("{}\n".format(item))

This is platform-independent. It also terminates the final line with a newline character, which is a UNIX best practice.

Starting with Python 3.6, "{}\n".format(item) can be replaced with an f-string: f"{item}\n".


回答 4

还有另一种方式。使用simplejson(在python 2.6中包含为json)序列化为json

>>> import simplejson
>>> f = open('output.txt', 'w')
>>> simplejson.dump([1,2,3,4], f)
>>> f.close()

如果您检查output.txt:

[1、2、3、4]

这很有用,因为语法是pythonic的,它是人类可读的,并且可以由其他语言的其他程序读取。

Yet another way. Serialize to json using simplejson (included as json in python 2.6):

>>> import simplejson
>>> f = open('output.txt', 'w')
>>> simplejson.dump([1,2,3,4], f)
>>> f.close()

If you examine output.txt:

[1, 2, 3, 4]

This is useful because the syntax is pythonic, it’s human readable, and it can be read by other programs in other languages.


回答 5

我认为探索使用genexp的好处会很有趣,所以这是我的看法。

问题中的示例使用方括号创建临时列表,因此等效于:

file.writelines( list( "%s\n" % item for item in list ) )

它不必要地构造了将要写出的所有行的临时列表,这可能会消耗大量内存,具体取决于列表的大小以及输出的详细str(item)程度。

放下方括号(相当于删除list()上面的包装调用)将改为将临时生成器传递给file.writelines()

file.writelines( "%s\n" % item for item in list )

该生成器将item按需创建换行终止的对象表示形式(即,当对象被写出时)。这样做有几个方面的好处:

  • 内存开销很小,即使列表很大
  • 如果str(item)速度较慢,则在处理每个项目时文件中都有可见的进度

这样可以避免出现内存问题,例如:

In [1]: import os

In [2]: f = file(os.devnull, "w")

In [3]: %timeit f.writelines( "%s\n" % item for item in xrange(2**20) )
1 loops, best of 3: 385 ms per loop

In [4]: %timeit f.writelines( ["%s\n" % item for item in xrange(2**20)] )
ERROR: Internal Python error in the inspect module.
Below is the traceback from this internal error.

Traceback (most recent call last):
...
MemoryError

(通过,我通过将Python的最大虚拟内存限制为〜100MB触发了此错误ulimit -v 102400)。

一方面,此方法实际上并没有比原始方法快:

In [4]: %timeit f.writelines( "%s\n" % item for item in xrange(2**20) )
1 loops, best of 3: 370 ms per loop

In [5]: %timeit f.writelines( ["%s\n" % item for item in xrange(2**20)] )
1 loops, best of 3: 360 ms per loop

(Linux上的Python 2.6.2)

I thought it would be interesting to explore the benefits of using a genexp, so here’s my take.

The example in the question uses square brackets to create a temporary list, and so is equivalent to:

file.writelines( list( "%s\n" % item for item in list ) )

Which needlessly constructs a temporary list of all the lines that will be written out, this may consume significant amounts of memory depending on the size of your list and how verbose the output of str(item) is.

Drop the square brackets (equivalent to removing the wrapping list() call above) will instead pass a temporary generator to file.writelines():

file.writelines( "%s\n" % item for item in list )

This generator will create newline-terminated representation of your item objects on-demand (i.e. as they are written out). This is nice for a couple of reasons:

  • Memory overheads are small, even for very large lists
  • If str(item) is slow there’s visible progress in the file as each item is processed

This avoids memory issues, such as:

In [1]: import os

In [2]: f = file(os.devnull, "w")

In [3]: %timeit f.writelines( "%s\n" % item for item in xrange(2**20) )
1 loops, best of 3: 385 ms per loop

In [4]: %timeit f.writelines( ["%s\n" % item for item in xrange(2**20)] )
ERROR: Internal Python error in the inspect module.
Below is the traceback from this internal error.

Traceback (most recent call last):
...
MemoryError

(I triggered this error by limiting Python’s max. virtual memory to ~100MB with ulimit -v 102400).

Putting memory usage to one side, this method isn’t actually any faster than the original:

In [4]: %timeit f.writelines( "%s\n" % item for item in xrange(2**20) )
1 loops, best of 3: 370 ms per loop

In [5]: %timeit f.writelines( ["%s\n" % item for item in xrange(2**20)] )
1 loops, best of 3: 360 ms per loop

(Python 2.6.2 on Linux)


回答 6

因为我很懒…

import json
a = [1,2,3]
with open('test.txt', 'w') as f:
    f.write(json.dumps(a))

#Now read the file back into a Python list object
with open('test.txt', 'r') as f:
    a = json.loads(f.read())

Because i’m lazy….

import json
a = [1,2,3]
with open('test.txt', 'w') as f:
    f.write(json.dumps(a))

#Now read the file back into a Python list object
with open('test.txt', 'r') as f:
    a = json.loads(f.read())

回答 7

将列表序列化为带有逗号分隔值的文本文件

mylist = dir()
with open('filename.txt','w') as f:
    f.write( ','.join( mylist ) )

Serialize list into text file with comma sepparated value

mylist = dir()
with open('filename.txt','w') as f:
    f.write( ','.join( mylist ) )

回答 8

一般来说

以下是writelines()方法的语法

fileObject.writelines( sequence )

#!/usr/bin/python

# Open a file
fo = open("foo.txt", "rw+")
seq = ["This is 6th line\n", "This is 7th line"]

# Write sequence of lines at the end of the file.
line = fo.writelines( seq )

# Close opend file
fo.close()

参考

http://www.tutorialspoint.com/python/file_writelines.htm

In General

Following is the syntax for writelines() method

fileObject.writelines( sequence )

Example

#!/usr/bin/python

# Open a file
fo = open("foo.txt", "rw+")
seq = ["This is 6th line\n", "This is 7th line"]

# Write sequence of lines at the end of the file.
line = fo.writelines( seq )

# Close opend file
fo.close()

Reference

http://www.tutorialspoint.com/python/file_writelines.htm


回答 9

file.write('\n'.join(list))
file.write('\n'.join(list))

回答 10

with open ("test.txt","w")as fp:
   for line in list12:
       fp.write(line+"\n")
with open ("test.txt","w")as fp:
   for line in list12:
       fp.write(line+"\n")

回答 11

如果您使用的是python3,则还可以使用print函数,如下所示。

f = open("myfile.txt","wb")
print(mylist, file=f)

You can also use the print function if you’re on python3 as follows.

f = open("myfile.txt","wb")
print(mylist, file=f)

回答 12

你为什么不尝试

file.write(str(list))

Why don’t you try

file.write(str(list))

回答 13

此逻辑将首先将list中的项目转换为string(str)。有时列表中包含一个元组,例如

alist = [(i12,tiger), 
(113,lion)]

此逻辑将在新行中写入文件每个元组。我们稍后可以eval在读取文件时加载每个元组时使用:

outfile = open('outfile.txt', 'w') # open a file in write mode
for item in list_to_persistence:    # iterate over the list items
   outfile.write(str(item) + '\n') # write to the file
outfile.close()   # close the file 

This logic will first convert the items in list to string(str). Sometimes the list contains a tuple like

alist = [(i12,tiger), 
(113,lion)]

This logic will write to file each tuple in a new line. We can later use eval while loading each tuple when reading the file:

outfile = open('outfile.txt', 'w') # open a file in write mode
for item in list_to_persistence:    # iterate over the list items
   outfile.write(str(item) + '\n') # write to the file
outfile.close()   # close the file 

回答 14

迭代和添加换行符的另一种方法:

for item in items:
    filewriter.write(f"{item}" + "\n")

Another way of iterating and adding newline:

for item in items:
    filewriter.write(f"{item}" + "\n")

回答 15

在python> 3中,您可以将print*用于参数解包:

with open("fout.txt", "w") as fout:
    print(*my_list, sep="\n", file=fout)

In python>3 you can use print and * for argument unpacking:

with open("fout.txt", "w") as fout:
    print(*my_list, sep="\n", file=fout)

回答 16

Python3中,您可以使用此循环

with open('your_file.txt', 'w') as f:
    for item in list:
        f.print("", item)

In Python3 You Can use this loop

with open('your_file.txt', 'w') as f:
    for item in list:
        f.print("", item)

回答 17

让avg作为列表,然后:

In [29]: a = n.array((avg))
In [31]: a.tofile('avgpoints.dat',sep='\n',dtype = '%f')

您可以根据需要使用%e%s

Let avg be the list, then:

In [29]: a = n.array((avg))
In [31]: a.tofile('avgpoints.dat',sep='\n',dtype = '%f')

You can use %e or %s depending on your requirement.


回答 18

poem = '''\
Programming is fun
When the work is done
if you wanna make your work also fun:
use Python!
'''
f = open('poem.txt', 'w') # open for 'w'riting
f.write(poem) # write text to file
f.close() # close the file

工作原理:首先,使用内置的打开功能打开文件,并指定文件名称和我们要打开文件的方式。该模式可以是读取模式(’r’),写入模式(’w’)或追加模式(’a’)。我们还可以指定是以文本模式(’t’)还是二进制模式(’b’)阅读,书写或追加内容。实际上,还有更多可用的模式,help(open)将为您提供有关它们的更多详细信息。默认情况下,open()将文件视为“ t”扩展文件,并以“ r’ead”模式将其打开。在我们的示例中,我们首先以写文本模式打开文件,然后使用文件对象的write方法写入文件,然后最终关闭文件。

上面的示例来自Swaroop C H. swaroopch.com 的书“ A Byte of Python”。

poem = '''\
Programming is fun
When the work is done
if you wanna make your work also fun:
use Python!
'''
f = open('poem.txt', 'w') # open for 'w'riting
f.write(poem) # write text to file
f.close() # close the file

How It Works: First, open a file by using the built-in open function and specifying the name of the file and the mode in which we want to open the file. The mode can be a read mode (’r’), write mode (’w’) or append mode (’a’). We can also specify whether we are reading, writing, or appending in text mode (’t’) or binary mode (’b’). There are actually many more modes available and help(open) will give you more details about them. By default, open() considers the file to be a ’t’ext file and opens it in ’r’ead mode. In our example, we first open the file in write text mode and use the write method of the file object to write to the file and then we finally close the file.

The above example is from the book “A Byte of Python” by Swaroop C H. swaroopch.com


如何按给定索引处的元素对列表/元组的列表/元组进行排序?

问题:如何按给定索引处的元素对列表/元组的列表/元组进行排序?

我在列表列表或元组列表中都有一些数据,如下所示:

data = [[1,2,3], [4,5,6], [7,8,9]]
data = [(1,2,3), (4,5,6), (7,8,9)]

我想按子集中的第二个元素排序。这意味着,由2,5,8,其中排序2(1,2,3)5是从(4,5,6)。常见的做法是什么?我应该将元组或列表存储在列表中吗?

I have some data either in a list of lists or a list of tuples, like this:

data = [[1,2,3], [4,5,6], [7,8,9]]
data = [(1,2,3), (4,5,6), (7,8,9)]

And I want to sort by the 2nd element in the subset. Meaning, sorting by 2,5,8 where 2 is from (1,2,3), 5 is from (4,5,6). What is the common way to do this? Should I store tuples or lists in my list?


回答 0

sorted_by_second = sorted(data, key=lambda tup: tup[1])

要么:

data.sort(key=lambda tup: tup[1])  # sorts in place
sorted_by_second = sorted(data, key=lambda tup: tup[1])

or:

data.sort(key=lambda tup: tup[1])  # sorts in place

回答 1

from operator import itemgetter
data.sort(key=itemgetter(1))
from operator import itemgetter
data.sort(key=itemgetter(1))

回答 2

如果您想将数组从高到低排序,我只想添加到Stephen的答案中,除了上面的注释中的另一种方法就是将其添加到行中:

reverse = True

结果将如下所示:

data.sort(key=lambda tup: tup[1], reverse=True)

I just want to add to Stephen’s answer if you want to sort the array from high to low, another way other than in the comments above is just to add this to the line:

reverse = True

and the result will be as follows:

data.sort(key=lambda tup: tup[1], reverse=True)

回答 3

为了按照多个条件进行排序,例如按元组中的第二个和第三个元素进行排序,

data = [(1,2,3),(1,2,1),(1,1,4)]

并定义一个lambda来返回描述优先级的元组,例如

sorted(data, key=lambda tup: (tup[1],tup[2]) )
[(1, 1, 4), (1, 2, 1), (1, 2, 3)]

For sorting by multiple criteria, namely for instance by the second and third elements in a tuple, let

data = [(1,2,3),(1,2,1),(1,1,4)]

and so define a lambda that returns a tuple that describes priority, for instance

sorted(data, key=lambda tup: (tup[1],tup[2]) )
[(1, 1, 4), (1, 2, 1), (1, 2, 3)]

回答 4

斯蒂芬的答案就是我会用的答案。为了完整起见,这是带有列表推导的DSU(装饰-排序-取消装饰)模式:

decorated = [(tup[1], tup) for tup in data]
decorated.sort()
undecorated = [tup for second, tup in decorated]

或者,更简洁地说:

[b for a,b in sorted((tup[1], tup) for tup in data)]

Python Sorting HowTo中所述,自Python 2.4启用关键功能以来,就没有必要这样做

Stephen’s answer is the one I’d use. For completeness, here’s the DSU (decorate-sort-undecorate) pattern with list comprehensions:

decorated = [(tup[1], tup) for tup in data]
decorated.sort()
undecorated = [tup for second, tup in decorated]

Or, more tersely:

[b for a,b in sorted((tup[1], tup) for tup in data)]

As noted in the Python Sorting HowTo, this has been unnecessary since Python 2.4, when key functions became available.


回答 5

为了对元组列表进行排序(<word>, <count>),以count降序和word字母顺序:

data = [
('betty', 1),
('bought', 1),
('a', 1),
('bit', 1),
('of', 1),
('butter', 2),
('but', 1),
('the', 1),
('was', 1),
('bitter', 1)]

我使用这种方法:

sorted(data, key=lambda tup:(-tup[1], tup[0]))

它给了我结果:

[('butter', 2),
('a', 1),
('betty', 1),
('bit', 1),
('bitter', 1),
('bought', 1),
('but', 1),
('of', 1),
('the', 1),
('was', 1)]

In order to sort a list of tuples (<word>, <count>), for count in descending order and word in alphabetical order:

data = [
('betty', 1),
('bought', 1),
('a', 1),
('bit', 1),
('of', 1),
('butter', 2),
('but', 1),
('the', 1),
('was', 1),
('bitter', 1)]

I use this method:

sorted(data, key=lambda tup:(-tup[1], tup[0]))

and it gives me the result:

[('butter', 2),
('a', 1),
('betty', 1),
('bit', 1),
('bitter', 1),
('bought', 1),
('but', 1),
('of', 1),
('the', 1),
('was', 1)]

回答 6

没有lambda:

def sec_elem(s):
    return s[1]

sorted(data, key=sec_elem)

Without lambda:

def sec_elem(s):
    return s[1]

sorted(data, key=sec_elem)

回答 7

itemgetter() 比…快一点 lambda tup: tup[1],但增长幅度相对较小(大约10%到25%)。

(IPython会话)

>>> from operator import itemgetter
>>> from numpy.random import randint
>>> values = randint(0, 9, 30000).reshape((10000,3))
>>> tpls = [tuple(values[i,:]) for i in range(len(values))]

>>> tpls[:5]    # display sample from list
[(1, 0, 0), 
 (8, 5, 5), 
 (5, 4, 0), 
 (5, 7, 7), 
 (4, 2, 1)]

>>> sorted(tpls[:5], key=itemgetter(1))    # example sort
[(1, 0, 0), 
 (4, 2, 1), 
 (5, 4, 0), 
 (8, 5, 5), 
 (5, 7, 7)]

>>> %timeit sorted(tpls, key=itemgetter(1))
100 loops, best of 3: 4.89 ms per loop

>>> %timeit sorted(tpls, key=lambda tup: tup[1])
100 loops, best of 3: 6.39 ms per loop

>>> %timeit sorted(tpls, key=(itemgetter(1,0)))
100 loops, best of 3: 16.1 ms per loop

>>> %timeit sorted(tpls, key=lambda tup: (tup[1], tup[0]))
100 loops, best of 3: 17.1 ms per loop

itemgetter() is somewhat faster than lambda tup: tup[1], but the increase is relatively modest (around 10 to 25 percent).

(IPython session)

>>> from operator import itemgetter
>>> from numpy.random import randint
>>> values = randint(0, 9, 30000).reshape((10000,3))
>>> tpls = [tuple(values[i,:]) for i in range(len(values))]

>>> tpls[:5]    # display sample from list
[(1, 0, 0), 
 (8, 5, 5), 
 (5, 4, 0), 
 (5, 7, 7), 
 (4, 2, 1)]

>>> sorted(tpls[:5], key=itemgetter(1))    # example sort
[(1, 0, 0), 
 (4, 2, 1), 
 (5, 4, 0), 
 (8, 5, 5), 
 (5, 7, 7)]

>>> %timeit sorted(tpls, key=itemgetter(1))
100 loops, best of 3: 4.89 ms per loop

>>> %timeit sorted(tpls, key=lambda tup: tup[1])
100 loops, best of 3: 6.39 ms per loop

>>> %timeit sorted(tpls, key=(itemgetter(1,0)))
100 loops, best of 3: 16.1 ms per loop

>>> %timeit sorted(tpls, key=lambda tup: (tup[1], tup[0]))
100 loops, best of 3: 17.1 ms per loop

回答 8

@Stephen的答案很关键!这是一个更好的可视化示例,

为Ready Player One粉丝大喊大叫!=)

>>> gunters = [('2044-04-05', 'parzival'), ('2044-04-07', 'aech'), ('2044-04-06', 'art3mis')]
>>> gunters.sort(key=lambda tup: tup[0])
>>> print gunters
[('2044-04-05', 'parzival'), ('2044-04-06', 'art3mis'), ('2044-04-07', 'aech')]

key是一个函数,将调用该函数来转换集合的项目以进行比较compareTo

传递给key的参数必须是可调用的。在这里,使用lambdacreate创建一个匿名函数(可调用)。
lambda的语法是单词lambda,后跟一个可迭代的名称,然后是单个代码块。

在下面的示例中,我们正在对元组列表进行排序,该元组列表包含某些事件和演员名称的信息记录时间。

我们按照事件发生的时间对该列表进行排序-这是元组的第0个元素。

注意- s.sort([cmp[, key[, reverse]]]) 将s的项目排序到位

@Stephen ‘s answer is to the point! Here is an example for better visualization,

Shout out for the Ready Player One fans! =)

>>> gunters = [('2044-04-05', 'parzival'), ('2044-04-07', 'aech'), ('2044-04-06', 'art3mis')]
>>> gunters.sort(key=lambda tup: tup[0])
>>> print gunters
[('2044-04-05', 'parzival'), ('2044-04-06', 'art3mis'), ('2044-04-07', 'aech')]

key is a function that will be called to transform the collection’s items for comparison.. like compareTo method in Java.

The parameter passed to key must be something that is callable. Here, the use of lambda creates an anonymous function (which is a callable).
The syntax of lambda is the word lambda followed by a iterable name then a single block of code.

Below example, we are sorting a list of tuple that holds the info abt time of certain event and actor name.

We are sorting this list by time of event occurrence – which is the 0th element of a tuple.

Note – s.sort([cmp[, key[, reverse]]]) sorts the items of s in place


回答 9

对元组进行排序非常简单:

tuple(sorted(t))

Sorting a tuple is quite simple:

tuple(sorted(t))

列表更改列表意外地反映在子列表中

问题:列表更改列表意外地反映在子列表中

我需要在Python中创建列表列表,因此输入了以下内容:

myList = [[1] * 4] * 3

该列表如下所示:

[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]  

然后,我更改了最内在的值之一:

myList[0][0] = 5

现在我的列表如下所示:

[[5, 1, 1, 1], [5, 1, 1, 1], [5, 1, 1, 1]]  

这不是我想要或期望的。有人可以解释发生了什么,以及如何解决吗?

I needed to create a list of lists in Python, so I typed the following:

myList = [[1] * 4] * 3

The list looked like this:

[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]  

Then I changed one of the innermost values:

myList[0][0] = 5

Now my list looks like this:

[[5, 1, 1, 1], [5, 1, 1, 1], [5, 1, 1, 1]]  

which is not what I wanted or expected. Can someone please explain what’s going on, and how to get around it?


回答 0

当您编写时,[x]*3您基本上得到了list [x, x, x]。也就是说,具有3个对same的引用的列表x。然后,当您修改此单曲时x,通过对它的所有三个引用可以看到它:

x = [1] * 4
l = [x] * 3
print(f"id(x): {id(x)}")
# id(x): 140560897920048
print(
    f"id(l[0]): {id(l[0])}\n"
    f"id(l[1]): {id(l[1])}\n"
    f"id(l[2]): {id(l[2])}"
)
# id(l[0]): 140560897920048
# id(l[1]): 140560897920048
# id(l[2]): 140560897920048

x[0] = 42
print(f"x: {x}")
# x: [42, 1, 1, 1]
print(f"l: {l}")
# l: [[42, 1, 1, 1], [42, 1, 1, 1], [42, 1, 1, 1]]

要解决此问题,您需要确保在每个位置都创建一个新列表。一种方法是

[[1]*4 for _ in range(3)]

它将重新评估[1]*4每次而不是一次评估并对1个列表进行3次引用。


您可能想知道为什么*不能像列表理解那样创建独立的对象。这是因为乘法运算符*对对象进行操作,而没有看到表达式。当您使用*乘以[[1] * 4]3时,*只会看到1元素列表的[[1] * 4]计算结果,而不是[[1] * 4表达式文本。*不知道如何制作该元素的副本,不知道如何重新评估[[1] * 4],甚至不想要复制,而且一般来说,甚至没有办法复制该元素。

唯一的选择*是对现有子列表进行新引用,而不是尝试创建新子列表。其他所有内容将不一致或需要对基础语言设计决策进行重大重新设计。

相反,列表推导会在每次迭代时重新评估元素表达式。[[1] * 4 for n in range(3)]重新评估[1] * 4出于同样的原因,每次[x**2 for x in range(3)]重新评估x**2每一次。的每次评估都会[1] * 4生成一个新列表,因此列表理解功能可以满足您的需求。

顺便说一句,[1] * 4也不会复制的元素[1],但这并不重要,因为整数是不可变的。您不能做类似的事情1.value = 2,将1变成2。

When you write [x]*3 you get, essentially, the list [x, x, x]. That is, a list with 3 references to the same x. When you then modify this single x it is visible via all three references to it:

x = [1] * 4
l = [x] * 3
print(f"id(x): {id(x)}")
# id(x): 140560897920048
print(
    f"id(l[0]): {id(l[0])}\n"
    f"id(l[1]): {id(l[1])}\n"
    f"id(l[2]): {id(l[2])}"
)
# id(l[0]): 140560897920048
# id(l[1]): 140560897920048
# id(l[2]): 140560897920048

x[0] = 42
print(f"x: {x}")
# x: [42, 1, 1, 1]
print(f"l: {l}")
# l: [[42, 1, 1, 1], [42, 1, 1, 1], [42, 1, 1, 1]]

To fix it, you need to make sure that you create a new list at each position. One way to do it is

[[1]*4 for _ in range(3)]

which will reevaluate [1]*4 each time instead of evaluating it once and making 3 references to 1 list.


You might wonder why * can’t make independent objects the way the list comprehension does. That’s because the multiplication operator * operates on objects, without seeing expressions. When you use * to multiply [[1] * 4] by 3, * only sees the 1-element list [[1] * 4] evaluates to, not the [[1] * 4 expression text. * has no idea how to make copies of that element, no idea how to reevaluate [[1] * 4], and no idea you even want copies, and in general, there might not even be a way to copy the element.

The only option * has is to make new references to the existing sublist instead of trying to make new sublists. Anything else would be inconsistent or require major redesigning of fundamental language design decisions.

In contrast, a list comprehension reevaluates the element expression on every iteration. [[1] * 4 for n in range(3)] reevaluates [1] * 4 every time for the same reason [x**2 for x in range(3)] reevaluates x**2 every time. Every evaluation of [1] * 4 generates a new list, so the list comprehension does what you wanted.

Incidentally, [1] * 4 also doesn’t copy the elements of [1], but that doesn’t matter, since integers are immutable. You can’t do something like 1.value = 2 and turn a 1 into a 2.


回答 1

size = 3
matrix_surprise = [[0] * size] * size
matrix = [[0]*size for i in range(size)]

框架和物体

实时Python导师可视化

size = 3
matrix_surprise = [[0] * size] * size
matrix = [[0]*size for i in range(size)]

Frames and Objects

Live Python Tutor Visualize


回答 2

实际上,这正是您所期望的。让我们分解一下这里发生的事情:

你写

lst = [[1] * 4] * 3

这等效于:

lst1 = [1]*4
lst = [lst1]*3

这意味着lst一个包含3个元素的列表lst1。这意味着以下两行是等效的:

lst[0][0] = 5
lst1[0] = 5

至于lst[0]是什么,但lst1

要获得所需的行为,可以使用列表理解:

lst = [ [1]*4 for n in range(3) ] #python 3
lst = [ [1]*4 for n in xrange(3) ] #python 2

在这种情况下,将对每个n重新计算表达式,从而得出不同的列表。

Actually, this is exactly what you would expect. Let’s decompose what is happening here:

You write

lst = [[1] * 4] * 3

This is equivalent to:

lst1 = [1]*4
lst = [lst1]*3

This means lst is a list with 3 elements all pointing to lst1. This means the two following lines are equivalent:

lst[0][0] = 5
lst1[0] = 5

As lst[0] is nothing but lst1.

To obtain the desired behavior, you can use list comprehension:

lst = [ [1]*4 for n in range(3) ] #python 3
lst = [ [1]*4 for n in xrange(3) ] #python 2

In this case, the expression is re-evaluated for each n, leading to a different list.


回答 3

[[1] * 4] * 3

甚至:

[[1, 1, 1, 1]] * 3

创建一个引用内部[1,1,1,1]3次的列表-而不是内部列表的3个副本,因此,每次修改列表(在任何位置)时,都会看到3次更改。

与此示例相同:

>>> inner = [1,1,1,1]
>>> outer = [inner]*3
>>> outer
[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
>>> inner[0] = 5
>>> outer
[[5, 1, 1, 1], [5, 1, 1, 1], [5, 1, 1, 1]]

可能不那么令人惊讶。

[[1] * 4] * 3

or even:

[[1, 1, 1, 1]] * 3

Creates a list that references the internal [1,1,1,1] 3 times – not three copies of the inner list, so any time you modify the list (in any position), you’ll see the change three times.

It’s the same as this example:

>>> inner = [1,1,1,1]
>>> outer = [inner]*3
>>> outer
[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
>>> inner[0] = 5
>>> outer
[[5, 1, 1, 1], [5, 1, 1, 1], [5, 1, 1, 1]]

where it’s probably a little less surprising.


回答 4

除了可以正确解释问题的公认答案之外,在列表理解范围内,如果您使用的是python-2.x,则使用return xrange()可以返回更高效的生成器(range()在python 3中执行相同的工作),_而不是throwaway变量n

[[1]*4 for _ in xrange(3)]      # and in python3 [[1]*4 for _ in range(3)]

另外,作为一种Python方式,您可以itertools.repeat()用来创建重复元素的迭代器对象:

>>> a=list(repeat(1,4))
[1, 1, 1, 1]
>>> a[0]=5
>>> a
[5, 1, 1, 1]

PS使用numpy的,如果你只是想创建1或0,你可以使用数组np.onesnp.zeros和/或其他使用次数np.repeat()

In [1]: import numpy as np

In [2]: 

In [2]: np.ones(4)
Out[2]: array([ 1.,  1.,  1.,  1.])

In [3]: np.ones((4, 2))
Out[3]: 
array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

In [4]: np.zeros((4, 2))
Out[4]: 
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])

In [5]: np.repeat([7], 10)
Out[5]: array([7, 7, 7, 7, 7, 7, 7, 7, 7, 7])

Alongside the accepted answer that explained the problem correctly, within your list comprehension, if You are using python-2.x use xrange() that returns a generator which is more efficient (range() in python 3 does the same job) _ instead of the throwaway variable n:

[[1]*4 for _ in xrange(3)]      # and in python3 [[1]*4 for _ in range(3)]

Also, as a much more Pythonic way you can use itertools.repeat() to create an iterator object of repeated elements :

>>> a=list(repeat(1,4))
[1, 1, 1, 1]
>>> a[0]=5
>>> a
[5, 1, 1, 1]

P.S. Using numpy, if you only want to create an array of ones or zeroes you can use np.ones and np.zeros and/or for other number use np.repeat():

In [1]: import numpy as np

In [2]: 

In [2]: np.ones(4)
Out[2]: array([ 1.,  1.,  1.,  1.])

In [3]: np.ones((4, 2))
Out[3]: 
array([[ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.],
       [ 1.,  1.]])

In [4]: np.zeros((4, 2))
Out[4]: 
array([[ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.],
       [ 0.,  0.]])

In [5]: np.repeat([7], 10)
Out[5]: array([7, 7, 7, 7, 7, 7, 7, 7, 7, 7])

回答 5

Python容器包含对其他对象的引用。请参阅以下示例:

>>> a = []
>>> b = [a]
>>> b
[[]]
>>> a.append(1)
>>> b
[[1]]

在此b列表中包含一个项目,该项目是对list的引用a。该列表a是可变的。

将列表乘以整数等于将列表多次添加到自身(请参阅常见序列操作)。因此,继续下面的示例:

>>> c = b + b
>>> c
[[1], [1]]
>>>
>>> a[0] = 2
>>> c
[[2], [2]]

我们可以看到列表c现在包含两个对list的引用,a它们等效于c = b * 2

Python FAQ也包含此行为的解释:如何创建多维列表?

Python containers contain references to other objects. See this example:

>>> a = []
>>> b = [a]
>>> b
[[]]
>>> a.append(1)
>>> b
[[1]]

In this b is a list that contains one item that is a reference to list a. The list a is mutable.

The multiplication of a list by an integer is equivalent to adding the list to itself multiple times (see common sequence operations). So continuing with the example:

>>> c = b + b
>>> c
[[1], [1]]
>>>
>>> a[0] = 2
>>> c
[[2], [2]]

We can see that the list c now contains two references to list a which is equivalent to c = b * 2.

Python FAQ also contains explanation of this behavior: How do I create a multidimensional list?


回答 6

myList = [[1]*4] * 3[1,1,1,1]在内存中创建一个列表对象,然后将其引用复制3次。这等效于obj = [1,1,1,1]; myList = [obj]*3。列表中引用的obj任何地方,对的任何修改都会在三个地方反映出来obj。正确的声明是:

myList = [[1]*4 for _ in range(3)]

要么

myList = [[1 for __ in range(4)] for _ in range(3)]

这里要注意的重要一点是,*运算符通常用于创建文字列表。虽然1是一成不变的,obj =[1]*4但仍将创建1重复4遍以上的列表[1,1,1,1]。但是,如果对不可变对象进行了任何引用,则该对象将被新对象覆盖。

这意味着,如果我们这样做obj[1]=42,那么obj它将变得[1,42,1,1] 不像 [42,42,42,42]某些人想象的那样。这也可以验证:

>>> myList = [1]*4
>>> myList
[1, 1, 1, 1]

>>> id(myList[0])
4522139440
>>> id(myList[1]) # Same as myList[0]
4522139440

>>> myList[1] = 42 # Since myList[1] is immutable, this operation overwrites myList[1] with a new object changing its id.
>>> myList
[1, 42, 1, 1]

>>> id(myList[0])
4522139440
>>> id(myList[1]) # id changed
4522140752
>>> id(myList[2]) # id still same as myList[0], still referring to value `1`.
4522139440

myList = [[1]*4] * 3 creates one list object [1,1,1,1] in memory and copies its reference 3 times over. This is equivalent to obj = [1,1,1,1]; myList = [obj]*3. Any modification to obj will be reflected at three places, wherever obj is referenced in the list. The right statement would be:

myList = [[1]*4 for _ in range(3)]

or

myList = [[1 for __ in range(4)] for _ in range(3)]

Important thing to note here is that * operator is mostly used to create a list of literals. Although 1 is immutable, obj =[1]*4 will still create a list of 1 repeated 4 times over to form [1,1,1,1]. But if any reference to an immutable object is made, the object is overwritten with a new one.

This means if we do obj[1]=42, then obj will become [1,42,1,1] not [42,42,42,42] as some may assume. This can also be verified:

>>> myList = [1]*4
>>> myList
[1, 1, 1, 1]

>>> id(myList[0])
4522139440
>>> id(myList[1]) # Same as myList[0]
4522139440

>>> myList[1] = 42 # Since myList[1] is immutable, this operation overwrites myList[1] with a new object changing its id.
>>> myList
[1, 42, 1, 1]

>>> id(myList[0])
4522139440
>>> id(myList[1]) # id changed
4522140752
>>> id(myList[2]) # id still same as myList[0], still referring to value `1`.
4522139440

回答 7

简而言之,这是因为在python中,所有内容都可以通过引用来工作,因此,当您以这种方式创建列表时,基本上就可以解决此类问题。

要解决您的问题,您可以执行以下任一操作:1.将numpy数组文档用于numpy.empty。2 .将列表追加到列表中。3.您也可以使用字典

In simple words this is happening because in python everything works by reference, so when you create a list of list that way you basically end up with such problems.

To solve your issue you can do either one of them: 1. Use numpy array documentation for numpy.empty 2. Append the list as you get to a list. 3. You can also use dictionary if you want


回答 8

让我们以以下方式重写您的代码:

x = 1
y = [x]
z = y * 4

myList = [z] * 3

然后,运行以下代码使所有内容更加清晰。该代码所做的基本上是打印id获得的对象的,

返回对象的“身份”

并将帮助我们识别它们并分析发生的情况:

print("myList:")
for i, subList in enumerate(myList):
    print("\t[{}]: {}".format(i, id(subList)))
    for j, elem in enumerate(subList):
        print("\t\t[{}]: {}".format(j, id(elem)))

您将获得以下输出:

x: 1
y: [1]
z: [1, 1, 1, 1]
myList:
    [0]: 4300763792
        [0]: 4298171528
        [1]: 4298171528
        [2]: 4298171528
        [3]: 4298171528
    [1]: 4300763792
        [0]: 4298171528
        [1]: 4298171528
        [2]: 4298171528
        [3]: 4298171528
    [2]: 4300763792
        [0]: 4298171528
        [1]: 4298171528
        [2]: 4298171528
        [3]: 4298171528

现在让我们逐步进行。你有x哪些是1和一个单一的元素列表y包含x。第一步是y * 4将获得一个z基本上为的新列表,[x, x, x, x]即创建一个包含4个元素的新列表,这些元素是对初始x对象的引用。净步骤非常相似。基本上z * 3,您要做的是[[x, x, x, x]] * 3和返回[[x, x, x, x], [x, x, x, x], [x, x, x, x]],其原因与第一步相同。

Let us rewrite your code in the following way:

x = 1
y = [x]
z = y * 4

myList = [z] * 3

Then having this, run the following code to make everything more clear. What the code does is basically print the ids of the obtained objects, which

Return the “identity” of an object

and will help us identify them and analyse what happens:

print("myList:")
for i, subList in enumerate(myList):
    print("\t[{}]: {}".format(i, id(subList)))
    for j, elem in enumerate(subList):
        print("\t\t[{}]: {}".format(j, id(elem)))

And you will get the following output:

x: 1
y: [1]
z: [1, 1, 1, 1]
myList:
    [0]: 4300763792
        [0]: 4298171528
        [1]: 4298171528
        [2]: 4298171528
        [3]: 4298171528
    [1]: 4300763792
        [0]: 4298171528
        [1]: 4298171528
        [2]: 4298171528
        [3]: 4298171528
    [2]: 4300763792
        [0]: 4298171528
        [1]: 4298171528
        [2]: 4298171528
        [3]: 4298171528

So now let us go step-by-step. You have x which is 1, and a single element list y containing x. Your first step is y * 4 which will get you a new list z, which is basically [x, x, x, x], i.e. it creates a new list which will have 4 elements, which are references to the initial x object. The net step is pretty similar. You basically do z * 3, which is [[x, x, x, x]] * 3 and returns [[x, x, x, x], [x, x, x, x], [x, x, x, x]], for the same reason as for the first step.


回答 9

我想每个人都解释发生了什么。我建议一种解决方法:

myList = [[1 for i in range(4)] for j in range(3)]

myList[0][0] = 5

print myList

然后您有:

[[5, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]

I guess everybody explain what is happening. I suggest one way to solve it:

myList = [[1 for i in range(4)] for j in range(3)]

myList[0][0] = 5

print myList

And then you have:

[[5, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]

回答 10

试图以更具描述性的方式进行解释,

操作一:

x = [[0, 0], [0, 0]]
print(type(x)) # <class 'list'>
print(x) # [[0, 0], [0, 0]]

x[0][0] = 1
print(x) # [[1, 0], [0, 0]]

操作2:

y = [[0] * 2] * 2
print(type(y)) # <class 'list'>
print(y) # [[0, 0], [0, 0]]

y[0][0] = 1
print(y) # [[1, 0], [1, 0]]

注意为什么不修改第一个列表的第一个元素而不修改每个列表的第二个元素?那是因为[0] * 2实际上是两个数字的列表,并且不能修改对0的引用。

如果要创建克隆副本,请尝试操作3:

import copy
y = [0] * 2   
print(y)   # [0, 0]

y = [y, copy.deepcopy(y)]  
print(y) # [[0, 0], [0, 0]]

y[0][0] = 1
print(y) # [[1, 0], [0, 0]]

创建克隆副本的另一种有趣方式是操作4:

import copy
y = [0] * 2
print(y) # [0, 0]

y = [copy.deepcopy(y) for num in range(1,5)]
print(y) # [[0, 0], [0, 0], [0, 0], [0, 0]]

y[0][0] = 5
print(y) # [[5, 0], [0, 0], [0, 0], [0, 0]]

Trying to explain it more descriptively,

Operation 1:

x = [[0, 0], [0, 0]]
print(type(x)) # <class 'list'>
print(x) # [[0, 0], [0, 0]]

x[0][0] = 1
print(x) # [[1, 0], [0, 0]]

Operation 2:

y = [[0] * 2] * 2
print(type(y)) # <class 'list'>
print(y) # [[0, 0], [0, 0]]

y[0][0] = 1
print(y) # [[1, 0], [1, 0]]

Noticed why doesn’t modifying the first element of the first list didn’t modify the second element of each list? That’s because [0] * 2 really is a list of two numbers, and a reference to 0 cannot be modified.

If you want to create clone copies, try Operation 3:

import copy
y = [0] * 2   
print(y)   # [0, 0]

y = [y, copy.deepcopy(y)]  
print(y) # [[0, 0], [0, 0]]

y[0][0] = 1
print(y) # [[1, 0], [0, 0]]

another interesting way to create clone copies, Operation 4:

import copy
y = [0] * 2
print(y) # [0, 0]

y = [copy.deepcopy(y) for num in range(1,5)]
print(y) # [[0, 0], [0, 0], [0, 0], [0, 0]]

y[0][0] = 5
print(y) # [[5, 0], [0, 0], [0, 0], [0, 0]]

回答 11

@spelchekr来自Python的列表乘法:[[…]] * 3使得3个列表在修改后会相互镜像,我也有一个相同的问题:“为什么只有外部* 3创建更多引用,而内部* 3却没有创建更多引用为什么不是全1?”

li = [0] * 3
print([id(v) for v in li]) # [140724141863728, 140724141863728, 140724141863728]
li[0] = 1
print([id(v) for v in li]) # [140724141863760, 140724141863728, 140724141863728]
print(id(0)) # 140724141863728
print(id(1)) # 140724141863760
print(li) # [1, 0, 0]

ma = [[0]*3] * 3 # mainly discuss inner & outer *3 here
print([id(li) for li in ma]) # [1987013355080, 1987013355080, 1987013355080]
ma[0][0] = 1
print([id(li) for li in ma]) # [1987013355080, 1987013355080, 1987013355080]
print(ma) # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

尝试上面的代码后,这是我的解释:

  • 内部*3也创建引用,但是它的引用是不可变的,例如[&0, &0, &0],然后,当要更改时li[0],您不能更改const int的任何基础引用0,因此您只需将引用地址更改为新的引用地址即可&1
  • while ma=[&li, &li, &li]limutable是可变的,因此当您调用时ma[0][0]=1,ma [0] [0]等于to &li[0],因此所有&li实例都将其第一个地址更改为&1

@spelchekr from Python list multiplication: [[…]]*3 makes 3 lists which mirror each other when modified and I had the same question about “Why does only the outer *3 create more references while the inner one doesn’t? Why isn’t it all 1s?”

li = [0] * 3
print([id(v) for v in li]) # [140724141863728, 140724141863728, 140724141863728]
li[0] = 1
print([id(v) for v in li]) # [140724141863760, 140724141863728, 140724141863728]
print(id(0)) # 140724141863728
print(id(1)) # 140724141863760
print(li) # [1, 0, 0]

ma = [[0]*3] * 3 # mainly discuss inner & outer *3 here
print([id(li) for li in ma]) # [1987013355080, 1987013355080, 1987013355080]
ma[0][0] = 1
print([id(li) for li in ma]) # [1987013355080, 1987013355080, 1987013355080]
print(ma) # [[1, 0, 0], [1, 0, 0], [1, 0, 0]]

Here is my explanation after trying the code above:

  • The inner *3 also creates references, but it’s references are immutable, something like [&0, &0, &0], then when to change li[0], you can’t change any underlying reference of const int 0, so you can just change the reference address into the new one &1;
  • while ma=[&li, &li, &li] and li is mutable, so when you call ma[0][0]=1, ma[0][0] is equally to &li[0], so all the &li instances will change its 1st address into &1.

回答 12

通过使用内置列表功能,您可以像这样

a
out:[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
#Displaying the list

a.remove(a[0])
out:[[1, 1, 1, 1], [1, 1, 1, 1]]
# Removed the first element of the list in which you want altered number

a.append([5,1,1,1])
out:[[1, 1, 1, 1], [1, 1, 1, 1], [5, 1, 1, 1]]
# append the element in the list but the appended element as you can see is appended in last but you want that in starting

a.reverse()
out:[[5, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
#So at last reverse the whole list to get the desired list

By using the inbuilt list function you can do like this

a
out:[[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
#Displaying the list

a.remove(a[0])
out:[[1, 1, 1, 1], [1, 1, 1, 1]]
# Removed the first element of the list in which you want altered number

a.append([5,1,1,1])
out:[[1, 1, 1, 1], [1, 1, 1, 1], [5, 1, 1, 1]]
# append the element in the list but the appended element as you can see is appended in last but you want that in starting

a.reverse()
out:[[5, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]
#So at last reverse the whole list to get the desired list

如何在Python中串联两个列表?

问题:如何在Python中串联两个列表?

如何在Python中串联两个列表?

例:

listone = [1, 2, 3]
listtwo = [4, 5, 6]

预期结果:

>>> joinedlist
[1, 2, 3, 4, 5, 6]

How do I concatenate two lists in Python?

Example:

listone = [1, 2, 3]
listtwo = [4, 5, 6]

Expected outcome:

>>> joinedlist
[1, 2, 3, 4, 5, 6]

回答 0

您可以使用+运算符来组合它们:

listone = [1,2,3]
listtwo = [4,5,6]

joinedlist = listone + listtwo

输出:

>>> joinedlist
[1,2,3,4,5,6]

You can use the + operator to combine them:

listone = [1,2,3]
listtwo = [4,5,6]

joinedlist = listone + listtwo

Output:

>>> joinedlist
[1,2,3,4,5,6]

回答 1

也可以创建一个生成器,使用来简单地遍历两个列表中的项目itertools.chain()。这使您可以将列表(或任何可迭代的)链接在一起进行处理,而无需将项目复制到新列表中:

import itertools
for item in itertools.chain(listone, listtwo):
    # Do something with each list item

It’s also possible to create a generator that simply iterates over the items in both lists using itertools.chain(). This allows you to chain lists (or any iterable) together for processing without copying the items to a new list:

import itertools
for item in itertools.chain(listone, listtwo):
    # Do something with each list item

回答 2

Python >= 3.5替代品:[*l1, *l2]

通过接受已经引入了另一种选择PEP 448,值得一提。

当在Python中使用加星标的表达式时,PEP的标题为Additional Unpacking Generalizations,通常会减少一些语法限制*。使用它,现在还可以使用以下方法来加入两个列表(适用于任何可迭代对象):

>>> l1 = [1, 2, 3]
>>> l2 = [4, 5, 6]
>>> joined_list = [*l1, *l2]  # unpack both iterables in a list literal
>>> print(joined_list)
[1, 2, 3, 4, 5, 6]

此功能是为Python定义的,3.5尚未回迁到该3.x系列的先前版本中。在不受支持的版本SyntaxError中,将被提出。

与其他方法一样,这也会在相应列表中创建元素的浅表副本


这种方法的好处是,您实际上不需要列表即可执行它,任何可迭代的操作都可以。如PEP中所述:

这对于将可迭代项求和成一个列表(如my_list + list(my_tuple) + list(my_range)现在等同于just)的可读性更高,也很有用[*my_list, *my_tuple, *my_range]

因此,虽然加上+与会TypeError由于类型不匹配而引发:

l = [1, 2, 3]
r = range(4, 7)
res = l + r

以下内容不会:

res = [*l, *r]

因为它首先将可迭代对象的内容解包,然后list仅从内容中创建一个即可。

Python >= 3.5 alternative: [*l1, *l2]

Another alternative has been introduced via the acceptance of PEP 448 which deserves mentioning.

The PEP, titled Additional Unpacking Generalizations, generally reduced some syntactic restrictions when using the starred * expression in Python; with it, joining two lists (applies to any iterable) can now also be done with:

>>> l1 = [1, 2, 3]
>>> l2 = [4, 5, 6]
>>> joined_list = [*l1, *l2]  # unpack both iterables in a list literal
>>> print(joined_list)
[1, 2, 3, 4, 5, 6]

This functionality was defined for Python 3.5 it hasn’t been backported to previous versions in the 3.x family. In unsupported versions a SyntaxError is going to be raised.

As with the other approaches, this too creates as shallow copy of the elements in the corresponding lists.


The upside to this approach is that you really don’t need lists in order to perform it, anything that is iterable will do. As stated in the PEP:

This is also useful as a more readable way of summing iterables into a list, such as my_list + list(my_tuple) + list(my_range) which is now equivalent to just [*my_list, *my_tuple, *my_range].

So while addition with + would raise a TypeError due to type mismatch:

l = [1, 2, 3]
r = range(4, 7)
res = l + r

The following won’t:

res = [*l, *r]

because it will first unpack the contents of the iterables and then simply create a list from the contents.


回答 3

您可以使用集合来获取唯一值的合并列表

mergedlist = list(set(listone + listtwo))

You can use sets to obtain merged list of unique values

mergedlist = list(set(listone + listtwo))

回答 4

您也可以使用list.extend()方法将a添加list到另一个的结尾:

listone = [1,2,3]
listtwo = [4,5,6]

listone.extend(listtwo)

如果要保持原始列表完整无缺,则可以创建一个新list对象,并且extend两个列表都指向该对象:

mergedlist = []
mergedlist.extend(listone)
mergedlist.extend(listtwo)

You could also use the list.extend() method in order to add a list to the end of another one:

listone = [1,2,3]
listtwo = [4,5,6]

listone.extend(listtwo)

If you want to keep the original list intact, you can create a new list object, and extend both lists to it:

mergedlist = []
mergedlist.extend(listone)
mergedlist.extend(listtwo)

回答 5

如何在Python中串联两个列表?

从3.7开始,这些是在python中串联两个(或多个)列表的最受欢迎的stdlib方法。

在此处输入图片说明

脚注

  1. 由于它的简洁性,这是一个不错的解决方案。但是sum以成对方式执行串联操作,这意味着这是二次操作,因为必须为每个步骤分配内存。如果您的列表很大,请不要使用。

  2. 查看chain 和阅读 chain.from_iterable 文档。您需要import itertools先。串联在内存中是线性的,因此这在性能和版本兼容性方面是最佳的。chain.from_iterable在2.6中引入。

  3. 此方法使用“ 其他解包概述”(PEP 448),但除非您手动手动解压缩每个列表,否则无法将其归纳为N个列表。

  4. a += ba.extend(b)在所有实际用途上大致相同。+=当在列表上调用时list.__iadd__,将内部调用 ,从而将第一个列表扩展到第二个列表。


性能

2列表串联1

在此处输入图片说明

这些方法之间没有太大区别,但是鉴于它们都具有相同的复杂度顺序(线性),这是有道理的。除了风格之外,没有特别的理由比一个更喜欢一个。

N列表串联

在此处输入图片说明

使用perfplot模块已生成图。代码,供您参考。

1. iadd+=)和extend方法就地操作,因此每次测试前都必须生成一个副本。为了公平起见,所有方法在左侧列表中都有一个预复制步骤,可以忽略。


对其他解决方案的评论

  • 请勿list.__add__以任何方式,形状或形式直接使用DUNDER方法。实际上,请避免使用dunder方法,并使用operator设计用于它们的运算符和函数。Python具有仔细的语义,这些语义比直接调用dunder更复杂。这是一个例子。因此,总而言之,a.__add__(b)=>差;a + b=>好。

  • 此处提供reduce(operator.add, [a, b])了成对连接的一些答案-这与sum([a, b], [])更多单词一样。

  • 使用的任何方法都set将删除重复项并失去顺序。请谨慎使用。

  • for i in b: a.append(i)a.extend(b)单一功能调用和惯用语言更加罗word,并且速度更慢。append之所以变慢,是因为为列表分配和增长了内存的语义。参见此处进行类似的讨论。

  • heapq.merge可以使用,但是它的用例是在线性时间内合并排序列表。在任何其他情况下使用它都是一种反模式。

  • yield从函数中列出列表元素是一种可以接受的方法,但是chain这样做更快,更好(它在C中具有代码路径,因此速度很快)。

  • operator.add(a, b)是可以接受的等效功能a + b。它的用例主要用于动态方法分派。否则,在我看来,最好选择更a + b短,更易读的格式。YMMV。

How do I concatenate two lists in Python?

As of 3.7, these are the most popular stdlib methods for concatenating two (or more) lists in python.

enter image description here

Footnotes

  1. This is a slick solution because of its succinctness. But sum performs concatenation in a pairwise fashion, which means this is a quadratic operation as memory has to be allocated for each step. DO NOT USE if your lists are large.

  2. See chain and chain.from_iterable from the docs. You will need to import itertools first. Concatenation is linear in memory, so this is the best in terms of performance and version compatibility. chain.from_iterable was introduced in 2.6.

  3. This method uses Additional Unpacking Generalizations (PEP 448), but cannot generalize to N lists unless you manually unpack each one yourself.

  4. a += b and a.extend(b) are more or less equivalent for all practical purposes. += when called on a list will internally call list.__iadd__, which extends the first list by the second.


Performance

2-List Concatenation1

enter image description here

There’s not much difference between these methods but that makes sense given they all have the same order of complexity (linear). There’s no particular reason to prefer one over the other except as a matter of style.

N-List Concatenation

enter image description here

Plots have been generated using the perfplot module. Code, for your reference.

1. The iadd (+=) and extend methods operate in-place, so a copy has to be generated each time before testing. To keep things fair, all methods have a pre-copy step for the left-hand list which can be ignored.


Comments on Other Solutions

  • DO NOT USE THE DUNDER METHOD list.__add__ directly in any way, shape or form. In fact, stay clear of dunder methods, and use the operators and operator functions like they were designed for. Python has careful semantics baked into these which are more complicated than just calling the dunder directly. Here is an example. So, to summarise, a.__add__(b) => BAD; a + b => GOOD.

  • Some answers here offer reduce(operator.add, [a, b]) for pairwise concatenation — this is the same as sum([a, b], []) only more wordy.

  • Any method that uses set will drop duplicates and lose ordering. Use with caution.

  • for i in b: a.append(i) is more wordy, and slower than a.extend(b), which is single function call and more idiomatic. append is slower because of the semantics with which memory is allocated and grown for lists. See here for a similar discussion.

  • heapq.merge will work, but its use case is for merging sorted lists in linear time. Using it in any other situation is an anti-pattern.

  • yielding list elements from a function is an acceptable method, but chain does this faster and better (it has a code path in C, so it is fast).

  • operator.add(a, b) is an acceptable functional equivalent to a + b. It’s use cases are mainly for dynamic method dispatch. Otherwise, prefer a + b which is shorter and more readable, in my opinion. YMMV.


回答 6

这很简单,我认为它甚至在本教程中已显示:

>>> listone = [1,2,3]
>>> listtwo = [4,5,6]
>>>
>>> listone + listtwo
[1, 2, 3, 4, 5, 6]

This is quite simple, and I think it was even shown in the tutorial:

>>> listone = [1,2,3]
>>> listtwo = [4,5,6]
>>>
>>> listone + listtwo
[1, 2, 3, 4, 5, 6]

回答 7

这个问题直接询问有关加入两个列表的问题。但是,即使您正在寻找加入许多列表的方式(包括加入零列表的情况),其搜索量也很高。

我认为最好的选择是使用列表推导:

>>> a = [[1,2,3], [4,5,6], [7,8,9]]
>>> [x for xs in a for x in xs]
[1, 2, 3, 4, 5, 6, 7, 8, 9]

您还可以创建生成器:

>>> map(str, (x for xs in a for x in xs))
['1', '2', '3', '4', '5', '6', '7', '8', '9']

旧答案

考虑这种更通用的方法:

a = [[1,2,3], [4,5,6], [7,8,9]]
reduce(lambda c, x: c + x, a, [])

将输出:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

请注意,这在ais []或时也可以正常使用[[1,2,3]]

但是,可以使用以下命令更有效地完成此操作itertools

a = [[1,2,3], [4,5,6], [7,8,9]]
list(itertools.chain(*a))

如果不需要a list,而只是需要迭代,请省略list()

更新资料

Patrick Collins在评论中建议的替代方法也可能对您有用:

sum(a, [])

This question directly asks about joining two lists. However it’s pretty high in search even when you are looking for a way of joining many lists (including the case when you joining zero lists).

I think the best option is to use list comprehensions:

>>> a = [[1,2,3], [4,5,6], [7,8,9]]
>>> [x for xs in a for x in xs]
[1, 2, 3, 4, 5, 6, 7, 8, 9]

You can create generators as well:

>>> map(str, (x for xs in a for x in xs))
['1', '2', '3', '4', '5', '6', '7', '8', '9']

Old Answer

Consider this more generic approach:

a = [[1,2,3], [4,5,6], [7,8,9]]
reduce(lambda c, x: c + x, a, [])

Will output:

[1, 2, 3, 4, 5, 6, 7, 8, 9]

Note, this also works correctly when a is [] or [[1,2,3]].

However, this can be done more efficiently with itertools:

a = [[1,2,3], [4,5,6], [7,8,9]]
list(itertools.chain(*a))

If you don’t need a list, but just an iterable, omit list().

Update

Alternative suggested by Patrick Collins in the comments could also work for you:

sum(a, [])

回答 8

您可以简单地使用+or +=运算符,如下所示:

a = [1, 2, 3]
b = [4, 5, 6]

c = a + b

要么:

c = []
a = [1, 2, 3]
b = [4, 5, 6]

c += (a + b)

另外,如果您希望合并列表中的值唯一,则可以执行以下操作:

c = list(set(a + b))

You could simply use the + or += operator as follows:

a = [1, 2, 3]
b = [4, 5, 6]

c = a + b

Or:

c = []
a = [1, 2, 3]
b = [4, 5, 6]

c += (a + b)

Also, if you want the values in the merged list to be unique you can do:

c = list(set(a + b))

回答 9

值得注意的是,该itertools.chain函数接受可变数量的参数:

>>> l1 = ['a']; l2 = ['b', 'c']; l3 = ['d', 'e', 'f']
>>> [i for i in itertools.chain(l1, l2)]
['a', 'b', 'c']
>>> [i for i in itertools.chain(l1, l2, l3)]
['a', 'b', 'c', 'd', 'e', 'f']

如果输入一个可迭代的(元组,列表,生成器等),from_iterable则可以使用class方法:

>>> il = [['a'], ['b', 'c'], ['d', 'e', 'f']]
>>> [i for i in itertools.chain.from_iterable(il)]
['a', 'b', 'c', 'd', 'e', 'f']

It’s worth noting that the itertools.chain function accepts variable number of arguments:

>>> l1 = ['a']; l2 = ['b', 'c']; l3 = ['d', 'e', 'f']
>>> [i for i in itertools.chain(l1, l2)]
['a', 'b', 'c']
>>> [i for i in itertools.chain(l1, l2, l3)]
['a', 'b', 'c', 'd', 'e', 'f']

If an iterable (tuple, list, generator, etc.) is the input, the from_iterable class method may be used:

>>> il = [['a'], ['b', 'c'], ['d', 'e', 'f']]
>>> [i for i in itertools.chain.from_iterable(il)]
['a', 'b', 'c', 'd', 'e', 'f']

回答 10

使用Python 3.3+,您可以使用yield from

listone = [1,2,3]
listtwo = [4,5,6]

def merge(l1, l2):
    yield from l1
    yield from l2

>>> list(merge(listone, listtwo))
[1, 2, 3, 4, 5, 6]

或者,如果您想支持任意数量的迭代器:

def merge(*iters):
    for it in iters:
        yield from it

>>> list(merge(listone, listtwo, 'abcd', [20, 21, 22]))
[1, 2, 3, 4, 5, 6, 'a', 'b', 'c', 'd', 20, 21, 22]

With Python 3.3+ you can use yield from:

listone = [1,2,3]
listtwo = [4,5,6]

def merge(l1, l2):
    yield from l1
    yield from l2

>>> list(merge(listone, listtwo))
[1, 2, 3, 4, 5, 6]

Or, if you want to support an arbitrary number of iterators:

def merge(*iters):
    for it in iters:
        yield from it

>>> list(merge(listone, listtwo, 'abcd', [20, 21, 22]))
[1, 2, 3, 4, 5, 6, 'a', 'b', 'c', 'd', 20, 21, 22]

回答 11

如果你想在排序的形式两个列表合并,您可以使用merge从函数heapq库。

from heapq import merge

a = [1, 2, 4]
b = [2, 4, 6, 7]

print list(merge(a, b))

If you want to merge the two lists in sorted form, you can use the merge function from the heapq library.

from heapq import merge

a = [1, 2, 4]
b = [2, 4, 6, 7]

print list(merge(a, b))

回答 12

如果您不能使用加号(+),则可以使用operator导入:

import operator

listone = [1,2,3]
listtwo = [4,5,6]

result = operator.add(listone, listtwo)
print(result)

>>> [1, 2, 3, 4, 5, 6]

另外,您也可以使用__add__ dunder函数:

listone = [1,2,3]
listtwo = [4,5,6]

result = list.__add__(listone, listtwo)
print(result)

>>> [1, 2, 3, 4, 5, 6]

If you can’t use the plus operator (+), you can use the operator import:

import operator

listone = [1,2,3]
listtwo = [4,5,6]

result = operator.add(listone, listtwo)
print(result)

>>> [1, 2, 3, 4, 5, 6]

Alternatively, you could also use the __add__ dunder function:

listone = [1,2,3]
listtwo = [4,5,6]

result = list.__add__(listone, listtwo)
print(result)

>>> [1, 2, 3, 4, 5, 6]

回答 13

作为更多列表的更通用方法,您可以将它们放在列表中并使用itertools.chain.from_iterable()1函数,该函数基于答案是扁平化嵌套列表的最佳方法:

>>> l=[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> import itertools
>>> list(itertools.chain.from_iterable(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

1.请注意,这chain.from_iterable()在Python 2.6和更高版本中可用。在其他版本中,请使用chain(*l)

As a more general way for more lists you can put them within a list and use the itertools.chain.from_iterable()1 function which based on this answer is the best way for flatting a nested list:

>>> l=[[1, 2, 3], [4, 5, 6], [7, 8, 9]]
>>> import itertools
>>> list(itertools.chain.from_iterable(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

1. Note that chain.from_iterable() is available in Python 2.6 and later. In other versions, use chain(*l).


回答 14

如果您需要使用复杂的排序规则合并两个有序列表,则可能需要像下面的代码一样滚动它(使用简单的排序规则以提高可读性:-))。

list1 = [1,2,5]
list2 = [2,3,4]
newlist = []

while list1 and list2:
    if list1[0] == list2[0]:
        newlist.append(list1.pop(0))
        list2.pop(0)
    elif list1[0] < list2[0]:
        newlist.append(list1.pop(0))
    else:
        newlist.append(list2.pop(0))

if list1:
    newlist.extend(list1)
if list2:
    newlist.extend(list2)

assert(newlist == [1, 2, 3, 4, 5])

If you need to merge two ordered lists with complicated sorting rules, you might have to roll it yourself like in the following code (using a simple sorting rule for readability :-) ).

list1 = [1,2,5]
list2 = [2,3,4]
newlist = []

while list1 and list2:
    if list1[0] == list2[0]:
        newlist.append(list1.pop(0))
        list2.pop(0)
    elif list1[0] < list2[0]:
        newlist.append(list1.pop(0))
    else:
        newlist.append(list2.pop(0))

if list1:
    newlist.extend(list1)
if list2:
    newlist.extend(list2)

assert(newlist == [1, 2, 3, 4, 5])

回答 15

您可以使用append()list对象上定义的方法:

mergedlist =[]
for elem in listone:
    mergedlist.append(elem)
for elem in listtwo:
    mergedlist.append(elem)

You could use the append() method defined on list objects:

mergedlist =[]
for elem in listone:
    mergedlist.append(elem)
for elem in listtwo:
    mergedlist.append(elem)

回答 16

list(set(listone) | set(listtwo))

上面的代码不保留顺序,而是从每个列表中删除重复项(但不从串联列表中删除)

list(set(listone) | set(listtwo))

The above code, does not preserve order, removes duplicate from each list (but not from the concatenated list)


回答 17

正如许多人已经指出itertools.chain()的那样,如果一个人需要对两个列表应用完全相同的处理方法,那该走的路。就我而言,我有一个标签和一个标志,它们与一个列表彼此不同,因此我需要稍微复杂一些的东西。事实证明,在幕后itertools.chain()只需执行以下操作即可:

for it in iterables:
    for element in it:
        yield element

(请参阅https://docs.python.org/2/library/itertools.html),所以我从这里汲取了灵感,并根据以下内容写了一些东西:

for iterable, header, flag in ( (newList, 'New', ''), (modList, 'Modified', '-f')):
    print header + ':'
    for path in iterable:
        [...]
        command = 'cp -r' if os.path.isdir(srcPath) else 'cp'
        print >> SCRIPT , command, flag, srcPath, mergedDirPath
        [...]

这里要理解的要点是,列表只是可迭代的特例,它们是与其他对象一样的对象。并且for ... inpython 中的循环可以使用元组变量,因此同时循环多个变量很简单。

As already pointed out by many, itertools.chain() is the way to go if one needs to apply exactly the same treatment to both lists. In my case, I had a label and a flag which were different from one list to the other, so I needed something slightly more complex. As it turns out, behind the scenes itertools.chain() simply does the following:

for it in iterables:
    for element in it:
        yield element

(see https://docs.python.org/2/library/itertools.html), so I took inspiration from here and wrote something along these lines:

for iterable, header, flag in ( (newList, 'New', ''), (modList, 'Modified', '-f')):
    print header + ':'
    for path in iterable:
        [...]
        command = 'cp -r' if os.path.isdir(srcPath) else 'cp'
        print >> SCRIPT , command, flag, srcPath, mergedDirPath
        [...]

The main points to understand here are that lists are just a special case of iterable, which are objects like any other; and that for ... in loops in python can work with tuple variables, so it is simple to loop on multiple variables at the same time.


回答 18

使用简单的列表理解:

joined_list = [item for list_ in [list_one, list_two] for item in list_]

它具有使用附加解包概括的最新方法的所有优点-即您可以以这种方式连接任意数量的不同可迭代对象(例如,列表,元组,范围和生成器)-而且不限于Python 3.5或更高版本。

Use a simple list comprehension:

joined_list = [item for list_ in [list_one, list_two] for item in list_]

It has all the advantages of the newest approach of using Additional Unpacking Generalizations – i.e. you can concatenate an arbitrary number of different iterables (for example, lists, tuples, ranges, and generators) that way – and it’s not limited to Python 3.5 or later.


回答 19

合并列表列表的一种非常简洁的方法是

list_of_lists = [[1,2,3], [4,5,6], [7,8,9]]
reduce(list.__add__, list_of_lists)

这给了我们

[1, 2, 3, 4, 5, 6, 7, 8, 9]

A really concise way to combine a list of lists is

list_of_lists = [[1,2,3], [4,5,6], [7,8,9]]
reduce(list.__add__, list_of_lists)

which gives us

[1, 2, 3, 4, 5, 6, 7, 8, 9]

回答 20

在Python中,您可以使用此命令来连接两个兼容维度的数组

numpy.concatenate([a,b])

In Python you can concatenate two arrays of compatible dimensions with this command

numpy.concatenate([a,b])

回答 21

因此,有两种简单的方法。

  1. 使用+:它从提供的列表中创建一个新列表

例:

In [1]: a = [1, 2, 3]

In [2]: b = [4, 5, 6]

In [3]: a + b
Out[3]: [1, 2, 3, 4, 5, 6]

In [4]: %timeit a + b
10000000 loops, best of 3: 126 ns per loop
  1. 使用extend:将新列表追加到现有列表。这意味着它不会创建单独的列表。

例:

In [1]: a = [1, 2, 3]

In [2]: b = [4, 5, 6]

In [3]: %timeit a.extend(b)
10000000 loops, best of 3: 91.1 ns per loop

因此,我们看到在两种最流行的方法中,它extend是有效的。

So there are two easy ways.

  1. Using +: It creates a new list from provided lists

Example:

In [1]: a = [1, 2, 3]

In [2]: b = [4, 5, 6]

In [3]: a + b
Out[3]: [1, 2, 3, 4, 5, 6]

In [4]: %timeit a + b
10000000 loops, best of 3: 126 ns per loop
  1. Using extend: It appends new list to existing list. That means it does not create a separate list.

Example:

In [1]: a = [1, 2, 3]

In [2]: b = [4, 5, 6]

In [3]: %timeit a.extend(b)
10000000 loops, best of 3: 91.1 ns per loop

Thus we see that out of two of most popular methods, extend is efficient.


回答 22

有多种方法可以在python中串联列表。

l1 = [1,2,3,4]
l2 = [3,4,5,6]

1. new_list = l1.extend(l2)
2. new_list = l1 + l2
3. new_list = [*l1, *l2]

There are multiple ways to concatenete lists in python.

l1 = [1,2,3,4]
l2 = [3,4,5,6]

1. new_list = l1.extend(l2)
2. new_list = l1 + l2
3. new_list = [*l1, *l2]

回答 23

import itertools

A = list(zip([1,3,5,7,9],[2,4,6,8,10]))
B = [1,3,5,7,9]+[2,4,6,8,10]
C = list(set([1,3,5,7,9] + [2,4,6,8,10]))

D = [1,3,5,7,9]
D.append([2,4,6,8,10])

E = [1,3,5,7,9]
E.extend([2,4,6,8,10])

F = []
for a in itertools.chain([1,3,5,7,9], [2,4,6,8,10]):
    F.append(a)


print ("A: " + str(A))
print ("B: " + str(B))
print ("C: " + str(C))
print ("D: " + str(D))
print ("E: " + str(E))
print ("F: " + str(F))

输出:

A: [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
B: [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
C: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
D: [1, 3, 5, 7, 9, [2, 4, 6, 8, 10]]
E: [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
F: [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
import itertools

A = list(zip([1,3,5,7,9],[2,4,6,8,10]))
B = [1,3,5,7,9]+[2,4,6,8,10]
C = list(set([1,3,5,7,9] + [2,4,6,8,10]))

D = [1,3,5,7,9]
D.append([2,4,6,8,10])

E = [1,3,5,7,9]
E.extend([2,4,6,8,10])

F = []
for a in itertools.chain([1,3,5,7,9], [2,4,6,8,10]):
    F.append(a)


print ("A: " + str(A))
print ("B: " + str(B))
print ("C: " + str(C))
print ("D: " + str(D))
print ("E: " + str(E))
print ("F: " + str(F))

Output:

A: [(1, 2), (3, 4), (5, 6), (7, 8), (9, 10)]
B: [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
C: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
D: [1, 3, 5, 7, 9, [2, 4, 6, 8, 10]]
E: [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]
F: [1, 3, 5, 7, 9, 2, 4, 6, 8, 10]

回答 24

如果您想要一个新列表,同时保留两个旧列表:

def concatenate_list(listOne, listTwo):
    joinedList = []
    for i in listOne:
        joinedList.append(i)
    for j in listTwo:
        joinedList.append(j)

    sorted(joinedList)

    return joinedList

If you wanted a new list whilst keeping the two old lists:

def concatenate_list(listOne, listTwo):
    joinedList = []
    for i in listOne:
        joinedList.append(i)
    for j in listTwo:
        joinedList.append(j)

    sorted(joinedList)

    return joinedList

回答 25

lst1 = [1,2]

lst2 = [3,4]

def list_combinationer(Bushisms, are_funny):

    for item in lst1:
        lst2.append(item)
        lst1n2 = sorted(lst2)
        print lst1n2

list_combinationer(lst1, lst2)

[1,2,3,4]
lst1 = [1,2]

lst2 = [3,4]

def list_combinationer(Bushisms, are_funny):

    for item in lst1:
        lst2.append(item)
        lst1n2 = sorted(lst2)
        print lst1n2

list_combinationer(lst1, lst2)

[1,2,3,4]

回答 26

您可以按照代码进行操作

listone = [1, 2, 3]
listtwo = [4, 5, 6]

for i in listone:
    listtwo.append(i)
print(listtwo)

[1,2,3,4,5,6]

You may follow the code

listone = [1, 2, 3]
listtwo = [4, 5, 6]

for i in listone:
    listtwo.append(i)
print(listtwo)

[1,2,3,4,5,6]

如何将列表分成大小均匀的块?

问题:如何将列表分成大小均匀的块?

我有一个任意长度的列表,我需要将其分成相等大小的块并对其进行操作。有一些明显的方法可以做到这一点,例如保留一个计数器和两个列表,当第二个列表填满时,将其添加到第一个列表中,并为第二轮数据清空第二个列表,但这可能会非常昂贵。

我想知道是否有人对任何长度的列表都有很好的解决方案,例如使用生成器。

我一直在寻找有用的东西,itertools但找不到任何明显有用的东西。可能已经错过了。

相关问题:遍历大块列表的最“ pythonic”方法是什么?

I have a list of arbitrary length, and I need to split it up into equal size chunks and operate on it. There are some obvious ways to do this, like keeping a counter and two lists, and when the second list fills up, add it to the first list and empty the second list for the next round of data, but this is potentially extremely expensive.

I was wondering if anyone had a good solution to this for lists of any length, e.g. using generators.

I was looking for something useful in itertools but I couldn’t find anything obviously useful. Might’ve missed it, though.

Related question: What is the most “pythonic” way to iterate over a list in chunks?


回答 0

这是一个生成所需块的生成器:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

如果您使用的是Python 2,则应使用xrange()而不是range()

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in xrange(0, len(lst), n):
        yield lst[i:i + n]

同样,您可以简单地使用列表理解而不是编写函数,尽管将这样的操作封装在命名函数中是个好主意,这样您的代码更易于理解。Python 3:

[lst[i:i + n] for i in range(0, len(lst), n)]

Python 2版本:

[lst[i:i + n] for i in xrange(0, len(lst), n)]

Here’s a generator that yields the chunks you want:

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]

import pprint
pprint.pprint(list(chunks(range(10, 75), 10)))
[[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

If you’re using Python 2, you should use xrange() instead of range():

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in xrange(0, len(lst), n):
        yield lst[i:i + n]

Also you can simply use list comprehension instead of writing a function, though it’s a good idea to encapsulate operations like this in named functions so that your code is easier to understand. Python 3:

[lst[i:i + n] for i in range(0, len(lst), n)]

Python 2 version:

[lst[i:i + n] for i in xrange(0, len(lst), n)]

回答 1

如果您想要超级简单的东西:

def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in range(0, len(l), n))

在Python 2.x中使用xrange()代替range()

If you want something super simple:

def chunks(l, n):
    n = max(1, n)
    return (l[i:i+n] for i in range(0, len(l), n))

Use xrange() instead of range() in the case of Python 2.x


回答 2

直接来自(旧的)Python文档(itertools的注意事项):

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

JFSebastian建议的当前版本:

#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

我猜想Guido的时间机器可以工作了,可以工作了,可以工作了,可以再次工作。

这些解决方案之所以有效,是因为[iter(iterable)]*n(或早期版本中的等效项)创建了一个迭代器,n并在列表中重复了几次。izip_longest然后有效地执行“每个”迭代器的循环;因为这是相同的迭代器,所以每次此类调用都会对其进行高级处理,从而使每个此类zip-roundrobin生成一个元组n项。

Directly from the (old) Python documentation (recipes for itertools):

from itertools import izip, chain, repeat

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return izip(*[chain(iterable, repeat(padvalue, n-1))]*n)

The current version, as suggested by J.F.Sebastian:

#from itertools import izip_longest as zip_longest # for Python 2.x
from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(n, iterable, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

I guess Guido’s time machine works—worked—will work—will have worked—was working again.

These solutions work because [iter(iterable)]*n (or the equivalent in the earlier version) creates one iterator, repeated n times in the list. izip_longest then effectively performs a round-robin of “each” iterator; because this is the same iterator, it is advanced by each such call, resulting in each such zip-roundrobin generating one tuple of n items.


回答 3

我知道这有点陈旧,但没有人提到numpy.array_split

import numpy as np

lst = range(50)
np.array_split(lst, 5)
# [array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
#  array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
#  array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
#  array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
#  array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]

I know this is kind of old but nobody yet mentioned numpy.array_split:

import numpy as np

lst = range(50)
np.array_split(lst, 5)
# [array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]),
#  array([10, 11, 12, 13, 14, 15, 16, 17, 18, 19]),
#  array([20, 21, 22, 23, 24, 25, 26, 27, 28, 29]),
#  array([30, 31, 32, 33, 34, 35, 36, 37, 38, 39]),
#  array([40, 41, 42, 43, 44, 45, 46, 47, 48, 49])]

回答 4

令我惊讶的是,没有人想到使用iter二元形式

from itertools import islice

def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

演示:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]

这可以与任何迭代一起工作,并产生延迟输出。它返回元组而不是迭代器,但是我认为它仍然具有一定的优雅。它也不会填充;如果您想进行填充,则只需对上述内容进行简单的修改即可:

from itertools import islice, chain, repeat

def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)

演示:

>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

izip_longest基于解决方案的解决方案一样,以上内容总是可以解决的。据我所知,没有可选的填充函数的单行或两行itertools配方。通过结合以上两种方法,这一方法非常接近:

_no_padding = object()

def chunk(it, size, padval=_no_padding):
    if padval == _no_padding:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(padval))
        sentinel = (padval,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)

演示:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

我认为这是最短的分块器,建议提供可选的填充。

饰演Tomasz Gandor 观察到的,如果两个填充分块器遇到一长串填充值,它们将意外停止。这是一个可以合理解决该问题的最终变体:

_no_padding = object()
def chunk(it, size, padval=_no_padding):
    it = iter(it)
    chunker = iter(lambda: tuple(islice(it, size)), ())
    if padval == _no_padding:
        yield from chunker
    else:
        for ch in chunker:
            yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))

演示:

>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]

I’m surprised nobody has thought of using iter‘s two-argument form:

from itertools import islice

def chunk(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]

This works with any iterable and produces output lazily. It returns tuples rather than iterators, but I think it has a certain elegance nonetheless. It also doesn’t pad; if you want padding, a simple variation on the above will suffice:

from itertools import islice, chain, repeat

def chunk_pad(it, size, padval=None):
    it = chain(iter(it), repeat(padval))
    return iter(lambda: tuple(islice(it, size)), (padval,) * size)

Demo:

>>> list(chunk_pad(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk_pad(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

Like the izip_longest-based solutions, the above always pads. As far as I know, there’s no one- or two-line itertools recipe for a function that optionally pads. By combining the above two approaches, this one comes pretty close:

_no_padding = object()

def chunk(it, size, padval=_no_padding):
    if padval == _no_padding:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(padval))
        sentinel = (padval,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)

Demo:

>>> list(chunk(range(14), 3))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13)]
>>> list(chunk(range(14), 3, None))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, None)]
>>> list(chunk(range(14), 3, 'a'))
[(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, 11), (12, 13, 'a')]

I believe this is the shortest chunker proposed that offers optional padding.

As Tomasz Gandor observed, the two padding chunkers will stop unexpectedly if they encounter a long sequence of pad values. Here’s a final variation that works around that problem in a reasonable way:

_no_padding = object()
def chunk(it, size, padval=_no_padding):
    it = iter(it)
    chunker = iter(lambda: tuple(islice(it, size)), ())
    if padval == _no_padding:
        yield from chunker
    else:
        for ch in chunker:
            yield ch if len(ch) == size else ch + (padval,) * (size - len(ch))

Demo:

>>> list(chunk([1, 2, (), (), 5], 2))
[(1, 2), ((), ()), (5,)]
>>> list(chunk([1, 2, None, None, 5], 2, None))
[(1, 2), (None, None), (5, None)]

回答 5

这是处理任意可迭代对象的生成器:

def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))

例:

>>> import pprint
>>> pprint.pprint(list(split_seq(xrange(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

Here is a generator that work on arbitrary iterables:

def split_seq(iterable, size):
    it = iter(iterable)
    item = list(itertools.islice(it, size))
    while item:
        yield item
        item = list(itertools.islice(it, size))

Example:

>>> import pprint
>>> pprint.pprint(list(split_seq(xrange(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

回答 6

def chunk(input, size):
    return map(None, *([iter(input)] * size))
def chunk(input, size):
    return map(None, *([iter(input)] * size))

回答 7

简单而优雅

l = range(1, 1000)
print [l[x:x+10] for x in xrange(0, len(l), 10)]

或者,如果您喜欢:

def chunks(l, n): return [l[x: x+n] for x in xrange(0, len(l), n)]
chunks(l, 10)

Simple yet elegant

l = range(1, 1000)
print [l[x:x+10] for x in xrange(0, len(l), 10)]

or if you prefer:

def chunks(l, n): return [l[x: x+n] for x in xrange(0, len(l), n)]
chunks(l, 10)

回答 8

批判其他答案在这里:

这些答案都不是均匀大小的块,它们都在末尾留下欠缺的块,因此它们并不完全平衡。如果您使用这些功能来分配工作,那么您就建立了一个前景可能比其他事情早完成的前景,因此在其他人继续努力的同时,它什么也没做。

例如,当前的最佳答案以:

[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]

我只是讨厌最后那个矮子!

其他人,例如list(grouper(3, xrange(7)))和,chunk(xrange(7), 3)都返回:[(0, 1, 2), (3, 4, 5), (6, None, None)]。的None只是填充,在我看来相当不雅。他们没有将可迭代对象均匀地分块。

为什么我们不能更好地划分这些?

我的解决方案

这是一个平衡的解决方案,它是根据我在生产环境中使用过的函数改编而成的(Python 3中的Note替换xrangerange):

def baskets_from(items, maxbaskets=25):
    baskets = [[] for _ in xrange(maxbaskets)] # in Python 3 use range
    for i, item in enumerate(items):
        baskets[i % maxbaskets].append(item)
    return filter(None, baskets) 

我创建了一个生成器,如果将其放入列表中,它的功能也相同:

def iter_baskets_from(items, maxbaskets=3):
    '''generates evenly balanced baskets from indexable iterable'''
    item_count = len(items)
    baskets = min(item_count, maxbaskets)
    for x_i in xrange(baskets):
        yield [items[y_i] for y_i in xrange(x_i, item_count, baskets)]

最后,由于我看到上述所有函数均按连续顺序返回元素(如给出的那样):

def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
    '''
    generates balanced baskets from iterable, contiguous contents
    provide item_count if providing a iterator that doesn't support len()
    '''
    item_count = item_count or len(items)
    baskets = min(item_count, maxbaskets)
    items = iter(items)
    floor = item_count // baskets 
    ceiling = floor + 1
    stepdown = item_count % baskets
    for x_i in xrange(baskets):
        length = ceiling if x_i < stepdown else floor
        yield [items.next() for _ in xrange(length)]

输出量

要测试它们:

print(baskets_from(xrange(6), 8))
print(list(iter_baskets_from(xrange(6), 8)))
print(list(iter_baskets_contiguous(xrange(6), 8)))
print(baskets_from(xrange(22), 8))
print(list(iter_baskets_from(xrange(22), 8)))
print(list(iter_baskets_contiguous(xrange(22), 8)))
print(baskets_from('ABCDEFG', 3))
print(list(iter_baskets_from('ABCDEFG', 3)))
print(list(iter_baskets_contiguous('ABCDEFG', 3)))
print(baskets_from(xrange(26), 5))
print(list(iter_baskets_from(xrange(26), 5)))
print(list(iter_baskets_contiguous(xrange(26), 5)))

打印出:

[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]

请注意,连续生成器以与其他两个相同的长度模式提供块,但是所有项都是有序的,并且它们被均匀地划分为一个可以划分一列离散元素的形式。

Critique of other answers here:

None of these answers are evenly sized chunks, they all leave a runt chunk at the end, so they’re not completely balanced. If you were using these functions to distribute work, you’ve built-in the prospect of one likely finishing well before the others, so it would sit around doing nothing while the others continued working hard.

For example, the current top answer ends with:

[60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
[70, 71, 72, 73, 74]]

I just hate that runt at the end!

Others, like list(grouper(3, xrange(7))), and chunk(xrange(7), 3) both return: [(0, 1, 2), (3, 4, 5), (6, None, None)]. The None‘s are just padding, and rather inelegant in my opinion. They are NOT evenly chunking the iterables.

Why can’t we divide these better?

My Solution(s)

Here’s a balanced solution, adapted from a function I’ve used in production (Note in Python 3 to replace xrange with range):

def baskets_from(items, maxbaskets=25):
    baskets = [[] for _ in xrange(maxbaskets)] # in Python 3 use range
    for i, item in enumerate(items):
        baskets[i % maxbaskets].append(item)
    return filter(None, baskets) 

And I created a generator that does the same if you put it into a list:

def iter_baskets_from(items, maxbaskets=3):
    '''generates evenly balanced baskets from indexable iterable'''
    item_count = len(items)
    baskets = min(item_count, maxbaskets)
    for x_i in xrange(baskets):
        yield [items[y_i] for y_i in xrange(x_i, item_count, baskets)]

And finally, since I see that all of the above functions return elements in a contiguous order (as they were given):

def iter_baskets_contiguous(items, maxbaskets=3, item_count=None):
    '''
    generates balanced baskets from iterable, contiguous contents
    provide item_count if providing a iterator that doesn't support len()
    '''
    item_count = item_count or len(items)
    baskets = min(item_count, maxbaskets)
    items = iter(items)
    floor = item_count // baskets 
    ceiling = floor + 1
    stepdown = item_count % baskets
    for x_i in xrange(baskets):
        length = ceiling if x_i < stepdown else floor
        yield [items.next() for _ in xrange(length)]

Output

To test them out:

print(baskets_from(xrange(6), 8))
print(list(iter_baskets_from(xrange(6), 8)))
print(list(iter_baskets_contiguous(xrange(6), 8)))
print(baskets_from(xrange(22), 8))
print(list(iter_baskets_from(xrange(22), 8)))
print(list(iter_baskets_contiguous(xrange(22), 8)))
print(baskets_from('ABCDEFG', 3))
print(list(iter_baskets_from('ABCDEFG', 3)))
print(list(iter_baskets_contiguous('ABCDEFG', 3)))
print(baskets_from(xrange(26), 5))
print(list(iter_baskets_from(xrange(26), 5)))
print(list(iter_baskets_contiguous(xrange(26), 5)))

Which prints out:

[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0], [1], [2], [3], [4], [5]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 8, 16], [1, 9, 17], [2, 10, 18], [3, 11, 19], [4, 12, 20], [5, 13, 21], [6, 14], [7, 15]]
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11], [12, 13, 14], [15, 16, 17], [18, 19], [20, 21]]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'D', 'G'], ['B', 'E'], ['C', 'F']]
[['A', 'B', 'C'], ['D', 'E'], ['F', 'G']]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 5, 10, 15, 20, 25], [1, 6, 11, 16, 21], [2, 7, 12, 17, 22], [3, 8, 13, 18, 23], [4, 9, 14, 19, 24]]
[[0, 1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15], [16, 17, 18, 19, 20], [21, 22, 23, 24, 25]]

Notice that the contiguous generator provide chunks in the same length patterns as the other two, but the items are all in order, and they are as evenly divided as one may divide a list of discrete elements.


回答 9

我在其中看到了最棒的Python式答案 这个问题重复部分中,:

from itertools import zip_longest

a = range(1, 16)
i = iter(a)
r = list(zip_longest(i, i, i))
>>> print(r)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]

您可以为任何n个创建n个元组。如果为a = range(1, 15),则结果将为:

[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]

如果列表平均分配,则可以替换zip_longestzip,否则三元组(13, 14, None)将丢失。上面使用了Python 3。对于Python 2,请使用izip_longest

I saw the most awesome Python-ish answer in a duplicate of this question:

from itertools import zip_longest

a = range(1, 16)
i = iter(a)
r = list(zip_longest(i, i, i))
>>> print(r)
[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, 15)]

You can create n-tuple for any n. If a = range(1, 15), then the result will be:

[(1, 2, 3), (4, 5, 6), (7, 8, 9), (10, 11, 12), (13, 14, None)]

If the list is divided evenly, then you can replace zip_longest with zip, otherwise the triplet (13, 14, None) would be lost. Python 3 is used above. For Python 2, use izip_longest.


回答 10

如果您知道列表大小:

def SplitList(mylist, chunk_size):
    return [mylist[offs:offs+chunk_size] for offs in range(0, len(mylist), chunk_size)]

如果不这样做(一个迭代器):

def IterChunks(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res  # yield the last, incomplete, portion

在后一种情况下,如果可以确定序列始终包含给定大小的所有块(即不存在不完整的最后一个块),则可以用更漂亮的方式来重新措词。

If you know list size:

def SplitList(mylist, chunk_size):
    return [mylist[offs:offs+chunk_size] for offs in range(0, len(mylist), chunk_size)]

If you don’t (an iterator):

def IterChunks(sequence, chunk_size):
    res = []
    for item in sequence:
        res.append(item)
        if len(res) >= chunk_size:
            yield res
            res = []
    if res:
        yield res  # yield the last, incomplete, portion

In the latter case, it can be rephrased in a more beautiful way if you can be sure that the sequence always contains a whole number of chunks of given size (i.e. there is no incomplete last chunk).


回答 11

图尔茨库具有partition此功能:

from toolz.itertoolz.core import partition

list(partition(2, [1, 2, 3, 4]))
[(1, 2), (3, 4)]

The toolz library has the partition function for this:

from toolz.itertoolz.core import partition

list(partition(2, [1, 2, 3, 4]))
[(1, 2), (3, 4)]

回答 12

例如,如果块大小为3,则可以执行以下操作:

zip(*[iterable[i::3] for i in range(3)]) 

来源:http//code.activestate.com/recipes/303060-group-a-list-into-sequential-n-tuples/

当我可以输入的块大小为固定数字(例如“ 3”)并且永远不会更改时,我将使用此选项。

If you had a chunk size of 3 for example, you could do:

zip(*[iterable[i::3] for i in range(3)]) 

source: http://code.activestate.com/recipes/303060-group-a-list-into-sequential-n-tuples/

I would use this when my chunk size is fixed number I can type, e.g. ‘3’, and would never change.


回答 13

我非常喜欢tzot和JFSebastian提出的Python文档版本,但是它有两个缺点:

  • 这不是很明确
  • 我通常不希望在最后一块填充值

我在代码中经常使用此代码:

from itertools import islice

def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        yield tuple(islice(iterable, n)) or iterable.next()

更新:惰性块版本:

from itertools import chain, islice

def chunks(n, iterable):
   iterable = iter(iterable)
   while True:
       yield chain([next(iterable)], islice(iterable, n-1))

I like the Python doc’s version proposed by tzot and J.F.Sebastian a lot, but it has two shortcomings:

  • it is not very explicit
  • I usually don’t want a fill value in the last chunk

I’m using this one a lot in my code:

from itertools import islice

def chunks(n, iterable):
    iterable = iter(iterable)
    while True:
        yield tuple(islice(iterable, n)) or iterable.next()

UPDATE: A lazy chunks version:

from itertools import chain, islice

def chunks(n, iterable):
   iterable = iter(iterable)
   while True:
       yield chain([next(iterable)], islice(iterable, n-1))

回答 14

[AA[i:i+SS] for i in range(len(AA))[::SS]]

其中AA是数组,SS是块大小。例如:

>>> AA=range(10,21);SS=3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3
[AA[i:i+SS] for i in range(len(AA))[::SS]]

Where AA is array, SS is chunk size. For example:

>>> AA=range(10,21);SS=3
>>> [AA[i:i+SS] for i in range(len(AA))[::SS]]
[[10, 11, 12], [13, 14, 15], [16, 17, 18], [19, 20]]
# or [range(10, 13), range(13, 16), range(16, 19), range(19, 21)] in py3

回答 15

我很好奇不同方法的性能,这里是:

在Python 3.5.1上测试

import time
batch_size = 7
arr_len = 298937

#---------slice-------------

print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break

    tmp = arr[0:batch_size]
    arr = arr[batch_size:-1]
print(time.time() - start)

#-----------index-----------

print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)

#----------batches 1------------

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#----------batches 2------------

from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        yield chain([next(batchiter)], batchiter)


print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
    tmp = x
print(time.time() - start)

#-----------grouper-----------

from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(iterable, n, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
    tmp = x
print(time.time() - start)

结果:

slice
31.18285083770752

index
0.02184295654296875

batches 1
0.03503894805908203

batches 2
0.22681021690368652

chunks
0.019841909408569336

grouper
0.006506919860839844

I was curious about the performance of different approaches and here it is:

Tested on Python 3.5.1

import time
batch_size = 7
arr_len = 298937

#---------slice-------------

print("\r\nslice")
start = time.time()
arr = [i for i in range(0, arr_len)]
while True:
    if not arr:
        break

    tmp = arr[0:batch_size]
    arr = arr[batch_size:-1]
print(time.time() - start)

#-----------index-----------

print("\r\nindex")
arr = [i for i in range(0, arr_len)]
start = time.time()
for i in range(0, round(len(arr) / batch_size + 1)):
    tmp = arr[batch_size * i : batch_size * (i + 1)]
print(time.time() - start)

#----------batches 1------------

def batch(iterable, n=1):
    l = len(iterable)
    for ndx in range(0, l, n):
        yield iterable[ndx:min(ndx + n, l)]

print("\r\nbatches 1")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#----------batches 2------------

from itertools import islice, chain

def batch(iterable, size):
    sourceiter = iter(iterable)
    while True:
        batchiter = islice(sourceiter, size)
        yield chain([next(batchiter)], batchiter)


print("\r\nbatches 2")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in batch(arr, batch_size):
    tmp = x
print(time.time() - start)

#---------chunks-------------
def chunks(l, n):
    """Yield successive n-sized chunks from l."""
    for i in range(0, len(l), n):
        yield l[i:i + n]
print("\r\nchunks")
arr = [i for i in range(0, arr_len)]
start = time.time()
for x in chunks(arr, batch_size):
    tmp = x
print(time.time() - start)

#-----------grouper-----------

from itertools import zip_longest # for Python 3.x
#from six.moves import zip_longest # for both (uses the six compat library)

def grouper(iterable, n, padvalue=None):
    "grouper(3, 'abcdefg', 'x') --> ('a','b','c'), ('d','e','f'), ('g','x','x')"
    return zip_longest(*[iter(iterable)]*n, fillvalue=padvalue)

arr = [i for i in range(0, arr_len)]
print("\r\ngrouper")
start = time.time()
for x in grouper(arr, batch_size):
    tmp = x
print(time.time() - start)

Results:

slice
31.18285083770752

index
0.02184295654296875

batches 1
0.03503894805908203

batches 2
0.22681021690368652

chunks
0.019841909408569336

grouper
0.006506919860839844

回答 16

码:

def split_list(the_list, chunk_size):
    result_list = []
    while the_list:
        result_list.append(the_list[:chunk_size])
        the_list = the_list[chunk_size:]
    return result_list

a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print split_list(a_list, 3)

结果:

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

code:

def split_list(the_list, chunk_size):
    result_list = []
    while the_list:
        result_list.append(the_list[:chunk_size])
        the_list = the_list[chunk_size:]
    return result_list

a_list = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

print split_list(a_list, 3)

result:

[[1, 2, 3], [4, 5, 6], [7, 8, 9], [10]]

回答 17

您还可以get_chunksutilspie库函数用作:

>>> from utilspie import iterutils
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> list(iterutils.get_chunks(a, 5))
[[1, 2, 3, 4, 5], [6, 7, 8, 9]]

您可以utilspie通过pip 安装:

sudo pip install utilspie

免责声明:我是utilspie library 的创建者

You may also use get_chunks function of utilspie library as:

>>> from utilspie import iterutils
>>> a = [1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> list(iterutils.get_chunks(a, 5))
[[1, 2, 3, 4, 5], [6, 7, 8, 9]]

You can install utilspie via pip:

sudo pip install utilspie

Disclaimer: I am the creator of utilspie library.


回答 18

在这一点上,我认为我们需要一个递归生成器,以防万一…

在python 2:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e

在python 3:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    yield from chunks(li[n:], n)

同样,在外星人大规模入侵的情况下,经过修饰的递归生成器可能会派上用场:

def dec(gen):
    def new_gen(li, n):
        for e in gen(li, n):
            if e == []:
                return
            yield e
    return new_gen

@dec
def chunks(li, n):
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e

At this point, I think we need a recursive generator, just in case…

In python 2:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e

In python 3:

def chunks(li, n):
    if li == []:
        return
    yield li[:n]
    yield from chunks(li[n:], n)

Also, in case of massive Alien invasion, a decorated recursive generator might become handy:

def dec(gen):
    def new_gen(li, n):
        for e in gen(li, n):
            if e == []:
                return
            yield e
    return new_gen

@dec
def chunks(li, n):
    yield li[:n]
    for e in chunks(li[n:], n):
        yield e

回答 19

使用Python 3.8中的赋值表达式,它变得非常不错:

import itertools

def batch(iterable, size):
    it = iter(iterable)
    while item := list(itertools.islice(it, size)):
        yield item

这适用于任意迭代,而不仅仅是列表。

>>> import pprint
>>> pprint.pprint(list(batch(range(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

With Assignment Expressions in Python 3.8 it becomes quite nice:

import itertools

def batch(iterable, size):
    it = iter(iterable)
    while item := list(itertools.islice(it, size)):
        yield item

This works on an arbitrary iterable, not just a list.

>>> import pprint
>>> pprint.pprint(list(batch(range(75), 10)))
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
 [30, 31, 32, 33, 34, 35, 36, 37, 38, 39],
 [40, 41, 42, 43, 44, 45, 46, 47, 48, 49],
 [50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
 [60, 61, 62, 63, 64, 65, 66, 67, 68, 69],
 [70, 71, 72, 73, 74]]

回答 20

呵呵,单行版

In [48]: chunk = lambda ulist, step:  map(lambda i: ulist[i:i+step],  xrange(0, len(ulist), step))

In [49]: chunk(range(1,100), 10)
Out[49]: 
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
 [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
 [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
 [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
 [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]

heh, one line version

In [48]: chunk = lambda ulist, step:  map(lambda i: ulist[i:i+step],  xrange(0, len(ulist), step))

In [49]: chunk(range(1,100), 10)
Out[49]: 
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
 [21, 22, 23, 24, 25, 26, 27, 28, 29, 30],
 [31, 32, 33, 34, 35, 36, 37, 38, 39, 40],
 [41, 42, 43, 44, 45, 46, 47, 48, 49, 50],
 [51, 52, 53, 54, 55, 56, 57, 58, 59, 60],
 [61, 62, 63, 64, 65, 66, 67, 68, 69, 70],
 [71, 72, 73, 74, 75, 76, 77, 78, 79, 80],
 [81, 82, 83, 84, 85, 86, 87, 88, 89, 90],
 [91, 92, 93, 94, 95, 96, 97, 98, 99]]

回答 21

def split_seq(seq, num_pieces):
    start = 0
    for i in xrange(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

用法:

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for seq in split_seq(seq, 3):
    print seq
def split_seq(seq, num_pieces):
    start = 0
    for i in xrange(num_pieces):
        stop = start + len(seq[i::num_pieces])
        yield seq[start:stop]
        start = stop

usage:

seq = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

for seq in split_seq(seq, 3):
    print seq

回答 22

另一个更明确的版本。

def chunkList(initialList, chunkSize):
    """
    This function chunks a list into sub lists 
    that have a length equals to chunkSize.

    Example:
    lst = [3, 4, 9, 7, 1, 1, 2, 3]
    print(chunkList(lst, 3)) 
    returns
    [[3, 4, 9], [7, 1, 1], [2, 3]]
    """
    finalList = []
    for i in range(0, len(initialList), chunkSize):
        finalList.append(initialList[i:i+chunkSize])
    return finalList

Another more explicit version.

def chunkList(initialList, chunkSize):
    """
    This function chunks a list into sub lists 
    that have a length equals to chunkSize.

    Example:
    lst = [3, 4, 9, 7, 1, 1, 2, 3]
    print(chunkList(lst, 3)) 
    returns
    [[3, 4, 9], [7, 1, 1], [2, 3]]
    """
    finalList = []
    for i in range(0, len(initialList), chunkSize):
        finalList.append(initialList[i:i+chunkSize])
    return finalList

回答 23

在不调用len()的情况下,该方法非常适合大型列表:

def splitter(l, n):
    i = 0
    chunk = l[:n]
    while chunk:
        yield chunk
        i += n
        chunk = l[i:i+n]

这是针对可迭代对象的:

def isplitter(l, n):
    l = iter(l)
    chunk = list(islice(l, n))
    while chunk:
        yield chunk
        chunk = list(islice(l, n))

以上功能的味道:

def isplitter2(l, n):
    return takewhile(bool,
                     (tuple(islice(start, n))
                            for start in repeat(iter(l))))

要么:

def chunks_gen_sentinel(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return iter(imap(tuple, continuous_slices).next,())

要么:

def chunks_gen_filter(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return takewhile(bool,imap(tuple, continuous_slices))

Without calling len() which is good for large lists:

def splitter(l, n):
    i = 0
    chunk = l[:n]
    while chunk:
        yield chunk
        i += n
        chunk = l[i:i+n]

And this is for iterables:

def isplitter(l, n):
    l = iter(l)
    chunk = list(islice(l, n))
    while chunk:
        yield chunk
        chunk = list(islice(l, n))

The functional flavour of the above:

def isplitter2(l, n):
    return takewhile(bool,
                     (tuple(islice(start, n))
                            for start in repeat(iter(l))))

OR:

def chunks_gen_sentinel(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return iter(imap(tuple, continuous_slices).next,())

OR:

def chunks_gen_filter(n, seq):
    continuous_slices = imap(islice, repeat(iter(seq)), repeat(0), repeat(n))
    return takewhile(bool,imap(tuple, continuous_slices))

回答 24

以下是其他方法的列表:

给定

import itertools as it
import collections as ct

import more_itertools as mit


iterable = range(11)
n = 3

标准图书馆

list(it.zip_longest(*[iter(iterable)] * n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

d = {}
for i, x in enumerate(iterable):
    d.setdefault(i//n, []).append(x)

list(d.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

dd = ct.defaultdict(list)
for i, x in enumerate(iterable):
    dd[i//n].append(x)

list(dd.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

more_itertools+

list(mit.chunked(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

list(mit.sliced(iterable, n))
# [range(0, 3), range(3, 6), range(6, 9), range(9, 11)]

list(mit.grouper(n, iterable))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

list(mit.windowed(iterable, len(iterable)//n, step=n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

参考文献

+一个实现itertools配方等的第三方库。> pip install more_itertools

Here is a list of additional approaches:

Given

import itertools as it
import collections as ct

import more_itertools as mit


iterable = range(11)
n = 3

Code

The Standard Library

list(it.zip_longest(*[iter(iterable)] * n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

d = {}
for i, x in enumerate(iterable):
    d.setdefault(i//n, []).append(x)

list(d.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

dd = ct.defaultdict(list)
for i, x in enumerate(iterable):
    dd[i//n].append(x)

list(dd.values())
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

more_itertools+

list(mit.chunked(iterable, n))
# [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10]]

list(mit.sliced(iterable, n))
# [range(0, 3), range(3, 6), range(6, 9), range(9, 11)]

list(mit.grouper(n, iterable))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

list(mit.windowed(iterable, len(iterable)//n, step=n))
# [(0, 1, 2), (3, 4, 5), (6, 7, 8), (9, 10, None)]

References

+ A third-party library that implements itertools recipes and more. > pip install more_itertools


回答 25

看到这个参考

>>> orange = range(1, 1001)
>>> otuples = list( zip(*[iter(orange)]*10))
>>> print(otuples)
[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ... (991, 992, 993, 994, 995, 996, 997, 998, 999, 1000)]
>>> olist = [list(i) for i in otuples]
>>> print(olist)
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ..., [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]
>>> 

Python3

See this reference

>>> orange = range(1, 1001)
>>> otuples = list( zip(*[iter(orange)]*10))
>>> print(otuples)
[(1, 2, 3, 4, 5, 6, 7, 8, 9, 10), ... (991, 992, 993, 994, 995, 996, 997, 998, 999, 1000)]
>>> olist = [list(i) for i in otuples]
>>> print(olist)
[[1, 2, 3, 4, 5, 6, 7, 8, 9, 10], ..., [991, 992, 993, 994, 995, 996, 997, 998, 999, 1000]]
>>> 

Python3


回答 26

由于这里的每个人都在谈论迭代器。boltons为此有一个完美的方法,称为iterutils.chunked_iter

from boltons import iterutils

list(iterutils.chunked_iter(list(range(50)), 11))

输出:

[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49]]

但是,如果您不想对内存存留怜悯,可以使用old-way并通过将数据存储list在第一位iterutils.chunked

Since everybody here talking about iterators. boltons has perfect method for that, called iterutils.chunked_iter.

from boltons import iterutils

list(iterutils.chunked_iter(list(range(50)), 11))

Output:

[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
 [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
 [22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
 [33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43],
 [44, 45, 46, 47, 48, 49]]

But if you don’t want to be mercy on memory, you can use old-way and store the full list in the first place with iterutils.chunked.


回答 27

另一种解决方案

def make_chunks(data, chunk_size): 
    while data:
        chunk, data = data[:chunk_size], data[chunk_size:]
        yield chunk

>>> for chunk in make_chunks([1, 2, 3, 4, 5, 6, 7], 2):
...     print chunk
... 
[1, 2]
[3, 4]
[5, 6]
[7]
>>> 

One more solution

def make_chunks(data, chunk_size): 
    while data:
        chunk, data = data[:chunk_size], data[chunk_size:]
        yield chunk

>>> for chunk in make_chunks([1, 2, 3, 4, 5, 6, 7], 2):
...     print chunk
... 
[1, 2]
[3, 4]
[5, 6]
[7]
>>> 

回答 28

def chunks(iterable,n):
    """assumes n is an integer>0
    """
    iterable=iter(iterable)
    while True:
        result=[]
        for i in range(n):
            try:
                a=next(iterable)
            except StopIteration:
                break
            else:
                result.append(a)
        if result:
            yield result
        else:
            break

g1=(i*i for i in range(10))
g2=chunks(g1,3)
print g2
'<generator object chunks at 0x0337B9B8>'
print list(g2)
'[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]'
def chunks(iterable,n):
    """assumes n is an integer>0
    """
    iterable=iter(iterable)
    while True:
        result=[]
        for i in range(n):
            try:
                a=next(iterable)
            except StopIteration:
                break
            else:
                result.append(a)
        if result:
            yield result
        else:
            break

g1=(i*i for i in range(10))
g2=chunks(g1,3)
print g2
'<generator object chunks at 0x0337B9B8>'
print list(g2)
'[[0, 1, 4], [9, 16, 25], [36, 49, 64], [81]]'

回答 29

考虑使用matplotlib.cbook片段

例如:

import matplotlib.cbook as cbook
segments = cbook.pieces(np.arange(20), 3)
for s in segments:
     print s

Consider using matplotlib.cbook pieces

for example:

import matplotlib.cbook as cbook
segments = cbook.pieces(np.arange(20), 3)
for s in segments:
     print s

获取列表的最后一个元素

问题:获取列表的最后一个元素

在Python中,如何获取列表的最后一个元素?

In Python, how do you get the last element of a list?


回答 0

some_list[-1] 是最短和最Pythonic的。

实际上,您可以使用此语法做更多的事情。该some_list[-n]语法获取第n到最后一个元素。因此some_list[-1]获取最后一个元素,some_list[-2]获取倒数第二个,依此类推,一直向下到some_list[-len(some_list)],这将为您提供第一个元素。

您也可以通过这种方式设置列表元素。例如:

>>> some_list = [1, 2, 3]
>>> some_list[-1] = 5 # Set the last element
>>> some_list[-2] = 3 # Set the second to last element
>>> some_list
[1, 3, 5]

请注意,IndexError如果期望的项目不存在,则按索引获取列表项将引发。这意味着some_list[-1]如果some_list为空将引发异常,因为空列表不能有最后一个元素。

some_list[-1] is the shortest and most Pythonic.

In fact, you can do much more with this syntax. The some_list[-n] syntax gets the nth-to-last element. So some_list[-1] gets the last element, some_list[-2] gets the second to last, etc, all the way down to some_list[-len(some_list)], which gives you the first element.

You can also set list elements in this way. For instance:

>>> some_list = [1, 2, 3]
>>> some_list[-1] = 5 # Set the last element
>>> some_list[-2] = 3 # Set the second to last element
>>> some_list
[1, 3, 5]

Note that getting a list item by index will raise an IndexError if the expected item doesn’t exist. This means that some_list[-1] will raise an exception if some_list is empty, because an empty list can’t have a last element.


回答 1

如果您的str()list()对象最终可能是空的:astr = ''alist = [],那么您可能要使用alist[-1:]而不是alist[-1]对象“ sameness”。

其意义是:

alist = []
alist[-1]   # will generate an IndexError exception whereas 
alist[-1:]  # will return an empty list
astr = ''
astr[-1]    # will generate an IndexError exception whereas
astr[-1:]   # will return an empty str

区别在于返回空列表对象或空str对象更像是“异常元素”,而不是异常对象。

If your str() or list() objects might end up being empty as so: astr = '' or alist = [], then you might want to use alist[-1:] instead of alist[-1] for object “sameness”.

The significance of this is:

alist = []
alist[-1]   # will generate an IndexError exception whereas 
alist[-1:]  # will return an empty list
astr = ''
astr[-1]    # will generate an IndexError exception whereas
astr[-1:]   # will return an empty str

Where the distinction being made is that returning an empty list object or empty str object is more “last element”-like then an exception object.


回答 2

您也可以这样做:

alist.pop()

这取决于您要对列表执行的操作,因为该pop()方法将删除最后一个元素。

You can also do:

alist.pop()

It depends on what you want to do with your list because the pop() method will delete the last element.


回答 3

在python中显示最后一个元素的最简单方法是

>>> list[-1:] # returns indexed value
    [3]
>>> list[-1]  # returns value
    3

还有许多其他方法可以实现这一目标,但是它们简短易用。

The simplest way to display last element in python is

>>> list[-1:] # returns indexed value
    [3]
>>> list[-1]  # returns value
    3

there are many other method to achieve such a goal but these are short and sweet to use.


回答 4

在Python中,如何获取列表的最后一个元素?

为了获得最后一个元素,

  • 而不修改列表,以及
  • 假设您知道列表中最后一个元素(即非空)

传递-1给下标符号:

>>> a_list = ['zero', 'one', 'two', 'three']
>>> a_list[-1]
'three'

说明

索引和切片可以采用负整数作为参数。

我已经从文档中修改了一个示例以指示每个索引引用序列中的哪个项目,在这种情况下,在string中"Python"-1引用最后一个元素(字符)'n'

 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
   0   1   2   3   4   5 
  -6  -5  -4  -3  -2  -1

>>> p = 'Python'
>>> p[-1]
'n'

通过迭代拆包分配

为了获取最后一个元素,但出于完整性的考虑,此方法可能不必要地实现第二个列表(并且由于它支持任何可迭代的对象-不只是列表):

>>> *head, last = a_list
>>> last
'three'

变量名head绑定到不必要的新创建的列表:

>>> head
['zero', 'one', 'two']

如果您不打算对该列表进行任何操作,则可能会更合适:

*_, last = a_list

或者,实际上,如果您知道它是一个列表(或至少接受下标符号):

last = a_list[-1]

在功能上

评论者说:

我希望Python像Lisp一样具有first()和last()函数…它将摆脱很多不必要的lambda函数。

这些定义起来非常简单:

def last(a_list):
    return a_list[-1]

def first(a_list):
    return a_list[0]

或使用operator.itemgetter

>>> import operator
>>> last = operator.itemgetter(-1)
>>> first = operator.itemgetter(0)

在任一情况下:

>>> last(a_list)
'three'
>>> first(a_list)
'zero'

特别案例

如果您正在做更复杂的事情,您可能会发现以略微不同的方式获取最后一个元素的性能更高。

如果您是编程的新手,则应避免使用本节,因为它会将原本在语义上不同的算法部分结合在一起。如果在某个地方更改算法,则可能会对另一行代码产生意外影响。

我尽力提供所有警告和条件,但我可能错过了一些东西。如果您认为我没有提出警告,请发表评论。

切片

列表的一部分将返回一个新列表-因此,如果要在新列表中使用该元素,我们可以从-1到末尾进行切片:

>>> a_slice = a_list[-1:]
>>> a_slice
['three']

如果列表为空,这样做的好处是不会失败:

>>> empty_list = []
>>> tail = empty_list[-1:]
>>> if tail:
...     do_something(tail)

尝试通过索引访问会引发一个IndexError需要处理的问题:

>>> empty_list[-1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

但是同样,仅在需要时才可以切片:

  • 创建一个新列表
  • 如果先前列表为空,则新列表为空。

for 循环

作为Python的功能,for循环中没有内部作用域。

如果您已经对列表执行了完整的迭代,则最后一个元素仍将由循环中分配的变量名称引用:

>>> def do_something(arg): pass
>>> for item in a_list:
...     do_something(item)
...     
>>> item
'three'

从语义上讲,这并不是列表中的最后一件事。从语义上讲,这是名称item绑定到的最后一件事。

>>> def do_something(arg): raise Exception
>>> for item in a_list:
...     do_something(item)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 1, in do_something
Exception
>>> item
'zero'

因此,只有当您

  • 已经在循环,并且
  • 您知道循环将结束(不会由于错误而中断或退出),否则它将指向循环引用的最后一个元素。

获取和删除它

我们还可以通过删除并返回最后一个元素来更改原始列表:

>>> a_list.pop(-1)
'three'
>>> a_list
['zero', 'one', 'two']

但是现在原始列表已修改。

-1实际上是默认参数,因此list.pop可以在没有索引参数的情况下使用):

>>> a_list.pop()
'two'

仅在以下情况下这样做

  • 您知道列表中有元素,或者准备为空时处理异常,并且
  • 您确实打算从列表中删除最后一个元素,将其视为堆栈。

这些是有效的用例,但不是很常见。

保存相反的其余部分以供以后使用:

我不知道为什么要这么做,但出于完整性考虑,由于reversed返回了迭代器(支持迭代器协议),您可以将其结果传递给next

>>> next(reversed([1,2,3]))
3

所以就像做相反的事情:

>>> next(iter([1,2,3]))
1

但是我想不出这样做的充分理由,除非稍后需要其余的反向迭代器,这可能看起来像这样:

reverse_iterator = reversed([1,2,3])
last_element = next(reverse_iterator)

use_later = list(reverse_iterator)

现在:

>>> use_later
[2, 1]
>>> last_element
3

In Python, how do you get the last element of a list?

To just get the last element,

  • without modifying the list, and
  • assuming you know the list has a last element (i.e. it is nonempty)

pass -1 to the subscript notation:

>>> a_list = ['zero', 'one', 'two', 'three']
>>> a_list[-1]
'three'

Explanation

Indexes and slices can take negative integers as arguments.

I have modified an example from the documentation to indicate which item in a sequence each index references, in this case, in the string "Python", -1 references the last element, the character, 'n':

 +---+---+---+---+---+---+
 | P | y | t | h | o | n |
 +---+---+---+---+---+---+
   0   1   2   3   4   5 
  -6  -5  -4  -3  -2  -1

>>> p = 'Python'
>>> p[-1]
'n'

Assignment via iterable unpacking

This method may unnecessarily materialize a second list for the purposes of just getting the last element, but for the sake of completeness (and since it supports any iterable – not just lists):

>>> *head, last = a_list
>>> last
'three'

The variable name, head is bound to the unnecessary newly created list:

>>> head
['zero', 'one', 'two']

If you intend to do nothing with that list, this would be more apropos:

*_, last = a_list

Or, really, if you know it’s a list (or at least accepts subscript notation):

last = a_list[-1]

In a function

A commenter said:

I wish Python had a function for first() and last() like Lisp does… it would get rid of a lot of unnecessary lambda functions.

These would be quite simple to define:

def last(a_list):
    return a_list[-1]

def first(a_list):
    return a_list[0]

Or use operator.itemgetter:

>>> import operator
>>> last = operator.itemgetter(-1)
>>> first = operator.itemgetter(0)

In either case:

>>> last(a_list)
'three'
>>> first(a_list)
'zero'

Special cases

If you’re doing something more complicated, you may find it more performant to get the last element in slightly different ways.

If you’re new to programming, you should avoid this section, because it couples otherwise semantically different parts of algorithms together. If you change your algorithm in one place, it may have an unintended impact on another line of code.

I try to provide caveats and conditions as completely as I can, but I may have missed something. Please comment if you think I’m leaving a caveat out.

Slicing

A slice of a list returns a new list – so we can slice from -1 to the end if we are going to want the element in a new list:

>>> a_slice = a_list[-1:]
>>> a_slice
['three']

This has the upside of not failing if the list is empty:

>>> empty_list = []
>>> tail = empty_list[-1:]
>>> if tail:
...     do_something(tail)

Whereas attempting to access by index raises an IndexError which would need to be handled:

>>> empty_list[-1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
IndexError: list index out of range

But again, slicing for this purpose should only be done if you need:

  • a new list created
  • and the new list to be empty if the prior list was empty.

for loops

As a feature of Python, there is no inner scoping in a for loop.

If you’re performing a complete iteration over the list already, the last element will still be referenced by the variable name assigned in the loop:

>>> def do_something(arg): pass
>>> for item in a_list:
...     do_something(item)
...     
>>> item
'three'

This is not semantically the last thing in the list. This is semantically the last thing that the name, item, was bound to.

>>> def do_something(arg): raise Exception
>>> for item in a_list:
...     do_something(item)
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
  File "<stdin>", line 1, in do_something
Exception
>>> item
'zero'

Thus this should only be used to get the last element if you

  • are already looping, and
  • you know the loop will finish (not break or exit due to errors), otherwise it will point to the last element referenced by the loop.

Getting and removing it

We can also mutate our original list by removing and returning the last element:

>>> a_list.pop(-1)
'three'
>>> a_list
['zero', 'one', 'two']

But now the original list is modified.

(-1 is actually the default argument, so list.pop can be used without an index argument):

>>> a_list.pop()
'two'

Only do this if

  • you know the list has elements in it, or are prepared to handle the exception if it is empty, and
  • you do intend to remove the last element from the list, treating it like a stack.

These are valid use-cases, but not very common.

Saving the rest of the reverse for later:

I don’t know why you’d do it, but for completeness, since reversed returns an iterator (which supports the iterator protocol) you can pass its result to next:

>>> next(reversed([1,2,3]))
3

So it’s like doing the reverse of this:

>>> next(iter([1,2,3]))
1

But I can’t think of a good reason to do this, unless you’ll need the rest of the reverse iterator later, which would probably look more like this:

reverse_iterator = reversed([1,2,3])
last_element = next(reverse_iterator)

use_later = list(reverse_iterator)

and now:

>>> use_later
[2, 1]
>>> last_element
3

回答 5

为了防止IndexError: list index out of range,请使用以下语法:

mylist = [1, 2, 3, 4]

# With None as default value:
value = mylist and mylist[-1]

# With specified default value (option 1):
value = mylist and mylist[-1] or 'default'

# With specified default value (option 2):
value = mylist[-1] if mylist else 'default'

To prevent IndexError: list index out of range, use this syntax:

mylist = [1, 2, 3, 4]

# With None as default value:
value = mylist and mylist[-1]

# With specified default value (option 1):
value = mylist and mylist[-1] or 'default'

# With specified default value (option 2):
value = mylist[-1] if mylist else 'default'

回答 6

另一种方法:

some_list.reverse() 
some_list[0]

Another method:

some_list.reverse() 
some_list[0]

回答 7

lst[-1]是最好的方法,但是对于一般的可迭代对象,请考虑more_itertools.last

import more_itertools as mit


mit.last([0, 1, 2, 3])
# 3

mit.last(iter([1, 2, 3]))
# 3

mit.last([], "some default")
# 'some default'

lst[-1] is the best approach, but with general iterables, consider more_itertools.last:

Code

import more_itertools as mit


mit.last([0, 1, 2, 3])
# 3

mit.last(iter([1, 2, 3]))
# 3

mit.last([], "some default")
# 'some default'

回答 8

list[-1]将检索列表的最后一个元素而不更改列表。 list.pop()将检索列表的最后一个元素,但它将更改/更改原始列表。通常,不建议更改原始列表。

另外,如果出于某种原因,您正在寻找一些不符合pythonic的工具,则可以使用list[len(list)-1],假设列表不为空。

list[-1] will retrieve the last element of the list without changing the list. list.pop() will retrieve the last element of the list, but it will mutate/change the original list. Usually, mutating the original list is not recommended.

Alternatively, if, for some reason, you’re looking for something less pythonic, you could use list[len(list)-1], assuming the list is not empty.


回答 9

如果您不想在列表为空时获取IndexError,则也可以使用下面的代码。

next(reversed(some_list), None)

You can also use the code below, if you do not want to get IndexError when the list is empty.

next(reversed(some_list), None)

回答 10

好的,但是几乎每种语言都常见items[len(items) - 1]吗?这是IMO获得最后一个元素的最简单方法,因为它不需要任何Python知识。

Ok, but what about common in almost every language way items[len(items) - 1]? This is IMO the easiest way to get last element, because it does not require anything pythonic knowledge.