标签归档:list

列表是否包含简短的包含功能?

问题:列表是否包含简短的包含功能?

我看到人们正在使用any另一个列表来查看列表中是否存在某项,但是有一种快速的方法吗?

if list.contains(myItem):
    # do something

I see people are using any to gather another list to see if an item exists in a list, but is there a quick way to just do?:

if list.contains(myItem):
    # do something

回答 0

您可以使用以下语法:

if myItem in list:
    # do something

同样,逆运算符:

if myItem not in list:
    # do something

它适用于列表,元组,集合和字典(检查键)。

请注意,这是列表和元组中的O(n)操作,而集合和字典中是O(1)操作。

You can use this syntax:

if myItem in list:
    # do something

Also, inverse operator:

if myItem not in list:
    # do something

It’s work fine for lists, tuples, sets and dicts (check keys).

Note that this is an O(n) operation in lists and tuples, but an O(1) operation in sets and dicts.


回答 1

除了别人说过的话,您可能还想知道什么in是调用list.__contains__方法,您可以在编写的任何类上定义该方法,并且可以非常方便地全面使用python。  

愚蠢的用途可能是:

>>> class ContainsEverything:
    def __init__(self):
        return None
    def __contains__(self, *elem, **k):
        return True


>>> a = ContainsEverything()
>>> 3 in a
True
>>> a in a
True
>>> False in a
True
>>> False not in a
False
>>>         

In addition to what other have said, you may also be interested to know that what in does is to call the list.__contains__ method, that you can define on any class you write and can get extremely handy to use python at his full extent.  

A dumb use may be:

>>> class ContainsEverything:
    def __init__(self):
        return None
    def __contains__(self, *elem, **k):
        return True


>>> a = ContainsEverything()
>>> 3 in a
True
>>> a in a
True
>>> False in a
True
>>> False not in a
False
>>>         

回答 2

我最近想出了这条衬垫,用于获取True列表中是否包含任何数量的项目,或者该列表中不包含任何项目或False根本不包含任何项目。使用next(...)会给它提供默认的返回值(False),这意味着它的运行速度应比运行整个列表理解的速度快得多。

list_does_contain = next((True for item in list_to_test if item == test_item), False)

I came up with this one liner recently for getting True if a list contains any number of occurrences of an item, or False if it contains no occurrences or nothing at all. Using next(...) gives this a default return value (False) and means it should run significantly faster than running the whole list comprehension.

list_does_contain = next((True for item in list_to_test if item == test_item), False)


回答 3

如果该项目不存在,则list方法index将返回,-1如果该项目存在,则将返回该项目在列表中的索引。或者,if您可以在语句中执行以下操作:

if myItem in list:
    #do things

您还可以使用以下if语句检查元素是否不在列表中:

if myItem not in list:
    #do things

The list method index will return -1 if the item is not present, and will return the index of the item in the list if it is present. Alternatively in an if statement you can do the following:

if myItem in list:
    #do things

You can also check if an element is not in a list with the following if statement:

if myItem not in list:
    #do things

转置/解压缩功能(zip的反函数)?

问题:转置/解压缩功能(zip的反函数)?

我有一个2项元组的列表,我想将它们转换为2个列表,其中第一个包含每个元组中的第一项,第二个包含第二项。

例如:

original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
# and I want to become...
result = (['a', 'b', 'c', 'd'], [1, 2, 3, 4])

有内置的功能吗?

I have a list of 2-item tuples and I’d like to convert them to 2 lists where the first contains the first item in each tuple and the second list holds the second item.

For example:

original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
# and I want to become...
result = (['a', 'b', 'c', 'd'], [1, 2, 3, 4])

Is there a builtin function that does that?


回答 0

zip是它自己的逆!前提是您使用特殊的*运算符。

>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]

它的工作方式是通过调用zip参数:

zip(('a', 1), ('b', 2), ('c', 3), ('d', 4))

…除了参数zip直接传递(在转换为元组之后)之外,因此不必担心参数数量太大。

zip is its own inverse! Provided you use the special * operator.

>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]

The way this works is by calling zip with the arguments:

zip(('a', 1), ('b', 2), ('c', 3), ('d', 4))

… except the arguments are passed to zip directly (after being converted to a tuple), so there’s no need to worry about the number of arguments getting too big.


回答 1

你也可以

result = ([ a for a,b in original ], [ b for a,b in original ])

应该更好地扩展。特别是如果Python除非需要,否则最好不要扩展列表推导。

(顺便说一句,它会组成一个2元组(一对)的列表,而不是一个元组列表,例如 zip。)

如果可以使用生成器而不是实际列表,则可以这样做:

result = (( a for a,b in original ), ( b for a,b in original ))

生成器在您请求每个元素之前不会仔细检查列表,但是另一方面,它们会保留对原始列表的引用。

You could also do

result = ([ a for a,b in original ], [ b for a,b in original ])

It should scale better. Especially if Python makes good on not expanding the list comprehensions unless needed.

(Incidentally, it makes a 2-tuple (pair) of lists, rather than a list of tuples, like zip does.)

If generators instead of actual lists are ok, this would do that:

result = (( a for a,b in original ), ( b for a,b in original ))

The generators don’t munch through the list until you ask for each element, but on the other hand, they do keep references to the original list.


回答 2

如果列表的长度不同,则可能不希望按照Patricks的答案使用zip。这有效:

>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]

但是使用不同的长度列表,zip会将每个项目截断为最短列表的长度:

>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e')]

您可以使用不带功能的map来用None填充空白结果:

>>> map(None, *[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e'), (1, 2, 3, 4, None)]

zip()稍快一些。

If you have lists that are not the same length, you may not want to use zip as per Patricks answer. This works:

>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4)])
[('a', 'b', 'c', 'd'), (1, 2, 3, 4)]

But with different length lists, zip truncates each item to the length of the shortest list:

>>> zip(*[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e')]

You can use map with no function to fill empty results with None:

>>> map(None, *[('a', 1), ('b', 2), ('c', 3), ('d', 4), ('e', )])
[('a', 'b', 'c', 'd', 'e'), (1, 2, 3, 4, None)]

zip() is marginally faster though.


回答 3

我喜欢在程序中使用zip(*iterable)(这是您要查找的代码):

def unzip(iterable):
    return zip(*iterable)

我发现unzip更具可读性。

I like to use zip(*iterable) (which is the piece of code you’re looking for) in my programs as so:

def unzip(iterable):
    return zip(*iterable)

I find unzip more readable.


回答 4

>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple([list(tup) for tup in zip(*original)])
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])

给出问题中的列表元组。

list1, list2 = [list(tup) for tup in zip(*original)]

解压缩两个列表。

>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple([list(tup) for tup in zip(*original)])
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])

Gives a tuple of lists as in the question.

list1, list2 = [list(tup) for tup in zip(*original)]

Unpacks the two lists.


回答 5

天真的方法

def transpose_finite_iterable(iterable):
    return zip(*iterable)  # `itertools.izip` for Python 2 users

对于(潜在无限)可迭代的有限可迭代(例如list/ tuple/的序列str),效果很好

| |a_00| |a_10| ... |a_n0| |
| |a_01| |a_11| ... |a_n1| |
| |... | |... | ... |... | |
| |a_0i| |a_1i| ... |a_ni| |
| |... | |... | ... |... | |

哪里

  • n in ℕ
  • a_ij对应于-th可迭代的j-th元素i

申请后transpose_finite_iterable我们得到

| |a_00| |a_01| ... |a_0i| ... |
| |a_10| |a_11| ... |a_1i| ... |
| |... | |... | ... |... | ... |
| |a_n0| |a_n1| ... |a_ni| ... |

这种情况的Python示例,其中a_ij == jn == 2

>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterable(iterable)
>>> next(result)
(0, 0)
>>> next(result)
(1, 1)

但是我们不能transpose_finite_iterable再次使用它来返回原始的结构,iterable因为它result是有限迭代的无限迭代(tuple在我们的例子中是s):

>>> transpose_finite_iterable(result)
... hangs ...
Traceback (most recent call last):
  File "...", line 1, in ...
  File "...", line 2, in transpose_finite_iterable
MemoryError

那么我们该如何处理呢?

…这是 deque

看完itertools.teefunction文档后,有一些Python配方可以通过一些修改来帮助解决我们的问题

def transpose_finite_iterables(iterable):
    iterator = iter(iterable)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]

    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()

    return tuple(map(coordinate, queues))

让我们检查

>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterables(transpose_finite_iterable(iterable))
>>> result
(<generator object transpose_finite_iterables.<locals>.coordinate at ...>, <generator object transpose_finite_iterables.<locals>.coordinate at ...>)
>>> next(result[0])
0
>>> next(result[0])
1

合成

现在我们可以定义通用函数来处理可迭代的可迭代对象,其中一些是有限的,而另一个则可以使用functools.singledispatch装饰器(例如)

from collections import (abc,
                         deque)
from functools import singledispatch


@singledispatch
def transpose(object_):
    """
    Transposes given object.
    """
    raise TypeError('Unsupported object type: {type}.'
                    .format(type=type))


@transpose.register(abc.Iterable)
def transpose_finite_iterables(object_):
    """
    Transposes given iterable of finite iterables.
    """
    iterator = iter(object_)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]

    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()

    return tuple(map(coordinate, queues))


def transpose_finite_iterable(object_):
    """
    Transposes given finite iterable of iterables.
    """
    yield from zip(*object_)

try:
    transpose.register(abc.Collection, transpose_finite_iterable)
except AttributeError:
    # Python3.5-
    transpose.register(abc.Mapping, transpose_finite_iterable)
    transpose.register(abc.Sequence, transpose_finite_iterable)
    transpose.register(abc.Set, transpose_finite_iterable)

在有限非空可迭代项上的二元运算符类中,可以将其视为自身的逆(数学家称这种函数为“对合”)。


作为singledispatching 的奖励,我们可以处理numpy类似

import numpy as np
...
transpose.register(np.ndarray, np.transpose)

然后像

>>> array = np.arange(4).reshape((2,2))
>>> array
array([[0, 1],
       [2, 3]])
>>> transpose(array)
array([[0, 2],
       [1, 3]])

注意

由于transpose返回迭代器,并且如果有人希望在OP中具有的tuplelist则可以通过map内置函数(例如

>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple(map(list, transpose(original)))
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])

广告

我已经添加推广解决方案lz0.5.0版本,可以像使用

>>> from lz.transposition import transpose
>>> list(map(tuple, transpose(zip(range(10), range(10, 20)))))
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)]

聚苯乙烯

没有用于处理潜在无限迭代的潜在无限迭代的解决方案(至少很明显),但是这种情况并不常见。

Naive approach

def transpose_finite_iterable(iterable):
    return zip(*iterable)  # `itertools.izip` for Python 2 users

works fine for finite iterable (e.g. sequences like list/tuple/str) of (potentially infinite) iterables which can be illustrated like

| |a_00| |a_10| ... |a_n0| |
| |a_01| |a_11| ... |a_n1| |
| |... | |... | ... |... | |
| |a_0i| |a_1i| ... |a_ni| |
| |... | |... | ... |... | |

where

  • n in ℕ,
  • a_ij corresponds to j-th element of i-th iterable,

and after applying transpose_finite_iterable we get

| |a_00| |a_01| ... |a_0i| ... |
| |a_10| |a_11| ... |a_1i| ... |
| |... | |... | ... |... | ... |
| |a_n0| |a_n1| ... |a_ni| ... |

Python example of such case where a_ij == j, n == 2

>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterable(iterable)
>>> next(result)
(0, 0)
>>> next(result)
(1, 1)

But we can’t use transpose_finite_iterable again to return to structure of original iterable because result is an infinite iterable of finite iterables (tuples in our case):

>>> transpose_finite_iterable(result)
... hangs ...
Traceback (most recent call last):
  File "...", line 1, in ...
  File "...", line 2, in transpose_finite_iterable
MemoryError

So how can we deal with this case?

… and here comes the deque

After we take a look at docs of itertools.tee function, there is Python recipe that with some modification can help in our case

def transpose_finite_iterables(iterable):
    iterator = iter(iterable)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]

    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()

    return tuple(map(coordinate, queues))

let’s check

>>> from itertools import count
>>> iterable = [count(), count()]
>>> result = transpose_finite_iterables(transpose_finite_iterable(iterable))
>>> result
(<generator object transpose_finite_iterables.<locals>.coordinate at ...>, <generator object transpose_finite_iterables.<locals>.coordinate at ...>)
>>> next(result[0])
0
>>> next(result[0])
1

Synthesis

Now we can define general function for working with iterables of iterables ones of which are finite and another ones are potentially infinite using functools.singledispatch decorator like

from collections import (abc,
                         deque)
from functools import singledispatch


@singledispatch
def transpose(object_):
    """
    Transposes given object.
    """
    raise TypeError('Unsupported object type: {type}.'
                    .format(type=type))


@transpose.register(abc.Iterable)
def transpose_finite_iterables(object_):
    """
    Transposes given iterable of finite iterables.
    """
    iterator = iter(object_)
    try:
        first_elements = next(iterator)
    except StopIteration:
        return ()
    queues = [deque([element])
              for element in first_elements]

    def coordinate(queue):
        while True:
            if not queue:
                try:
                    elements = next(iterator)
                except StopIteration:
                    return
                for sub_queue, element in zip(queues, elements):
                    sub_queue.append(element)
            yield queue.popleft()

    return tuple(map(coordinate, queues))


def transpose_finite_iterable(object_):
    """
    Transposes given finite iterable of iterables.
    """
    yield from zip(*object_)

try:
    transpose.register(abc.Collection, transpose_finite_iterable)
except AttributeError:
    # Python3.5-
    transpose.register(abc.Mapping, transpose_finite_iterable)
    transpose.register(abc.Sequence, transpose_finite_iterable)
    transpose.register(abc.Set, transpose_finite_iterable)

which can be considered as its own inverse (mathematicians call this kind of functions “involutions”) in class of binary operators over finite non-empty iterables.


As a bonus of singledispatching we can handle numpy arrays like

import numpy as np
...
transpose.register(np.ndarray, np.transpose)

and then use it like

>>> array = np.arange(4).reshape((2,2))
>>> array
array([[0, 1],
       [2, 3]])
>>> transpose(array)
array([[0, 2],
       [1, 3]])

Note

Since transpose returns iterators and if someone wants to have a tuple of lists like in OP — this can be made additionally with map built-in function like

>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> tuple(map(list, transpose(original)))
(['a', 'b', 'c', 'd'], [1, 2, 3, 4])

Advertisement

I’ve added generalized solution to lz package from 0.5.0 version which can be used like

>>> from lz.transposition import transpose
>>> list(map(tuple, transpose(zip(range(10), range(10, 20)))))
[(0, 1, 2, 3, 4, 5, 6, 7, 8, 9), (10, 11, 12, 13, 14, 15, 16, 17, 18, 19)]

P.S.

There is no solution (at least obvious) for handling potentially infinite iterable of potentially infinite iterables, but this case is less common though.


回答 6

这只是另一种实现方式,但是它对我有很大帮助,所以我在这里写下来:

具有以下数据结构:

X=[1,2,3,4]
Y=['a','b','c','d']
XY=zip(X,Y)

导致:

In: XY
Out: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

我认为,将其解压缩并返回原始格式的更Python方式是:

x,y=zip(*XY)

但这返回一个元组,因此如果您需要一个列表,则可以使用:

x,y=(list(x),list(y))

It’s only another way to do it but it helped me a lot so I write it here:

Having this data structure:

X=[1,2,3,4]
Y=['a','b','c','d']
XY=zip(X,Y)

Resulting in:

In: XY
Out: [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')]

The more pythonic way to unzip it and go back to the original is this one in my opinion:

x,y=zip(*XY)

But this return a tuple so if you need a list you can use:

x,y=(list(x),list(y))

回答 7

考虑使用more_itertools.unzip

>>> from more_itertools import unzip
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> [list(x) for x in unzip(original)]
[['a', 'b', 'c', 'd'], [1, 2, 3, 4]]     

Consider using more_itertools.unzip:

>>> from more_itertools import unzip
>>> original = [('a', 1), ('b', 2), ('c', 3), ('d', 4)]
>>> [list(x) for x in unzip(original)]
[['a', 'b', 'c', 'd'], [1, 2, 3, 4]]     

回答 8

因为它返回元组(并且可以使用大量内存),所以zip(*zipped)对我来说,这个技巧似乎比有用的还要聪明。

这是一个实际上将为您提供zip反函数的函数。

def unzip(zipped):
    """Inverse of built-in zip function.
    Args:
        zipped: a list of tuples

    Returns:
        a tuple of lists

    Example:
        a = [1, 2, 3]
        b = [4, 5, 6]
        zipped = list(zip(a, b))

        assert zipped == [(1, 4), (2, 5), (3, 6)]

        unzipped = unzip(zipped)

        assert unzipped == ([1, 2, 3], [4, 5, 6])

    """

    unzipped = ()
    if len(zipped) == 0:
        return unzipped

    dim = len(zipped[0])

    for i in range(dim):
        unzipped = unzipped + ([tup[i] for tup in zipped], )

    return unzipped

Since it returns tuples (and can use tons of memory), the zip(*zipped) trick seems more clever than useful, to me.

Here’s a function that will actually give you the inverse of zip.

def unzip(zipped):
    """Inverse of built-in zip function.
    Args:
        zipped: a list of tuples

    Returns:
        a tuple of lists

    Example:
        a = [1, 2, 3]
        b = [4, 5, 6]
        zipped = list(zip(a, b))

        assert zipped == [(1, 4), (2, 5), (3, 6)]

        unzipped = unzip(zipped)

        assert unzipped == ([1, 2, 3], [4, 5, 6])

    """

    unzipped = ()
    if len(zipped) == 0:
        return unzipped

    dim = len(zipped[0])

    for i in range(dim):
        unzipped = unzipped + ([tup[i] for tup in zipped], )

    return unzipped

回答 9

先前的答案都没有有效地提供所需的输出,即列表的元组,而不是元组的列表。对于前者,你可以使用与。区别在于:tuplemap

res1 = list(zip(*original))              # [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
res2 = tuple(map(list, zip(*original)))  # (['a', 'b', 'c', 'd'], [1, 2, 3, 4])

此外,大多数以前的解决方案都假定使用Python 2.7,在Python 2.7中zip返回列表而不是迭代器。

对于Python 3.x,您需要将结果传递给诸如listtuple耗尽迭代器的函数。对于内存高效的迭代器,您可以省略外部list和外部tuple调用各自的解决方案。

None of the previous answers efficiently provide the required output, which is a tuple of lists, rather than a list of tuples. For the former, you can use tuple with map. Here’s the difference:

res1 = list(zip(*original))              # [('a', 'b', 'c', 'd'), (1, 2, 3, 4)]
res2 = tuple(map(list, zip(*original)))  # (['a', 'b', 'c', 'd'], [1, 2, 3, 4])

In addition, most of the previous solutions assume Python 2.7, where zip returns a list rather than an iterator.

For Python 3.x, you will need to pass the result to a function such as list or tuple to exhaust the iterator. For memory-efficient iterators, you can omit the outer list and tuple calls for the respective solutions.


回答 10

虽然zip(*seq)非常有用,但可能不适用于很长的序列,因为它将创建要传递的值的元组。例如,我一直在使用具有超过一百万个条目的坐标系,并且发现创建它的速度明显更快序列直接。

通用方法如下所示:

from collections import deque
seq = ((a1, b1, …), (a2, b2, …), …)
width = len(seq[0])
output = [deque(len(seq))] * width # preallocate memory
for element in seq:
    for s, item in zip(output, element):
        s.append(item)

但是,根据您要对结果执行的操作,收集的选择可能会产生很大的不同。在我的实际用例中,使用集而不使用内部循环比所有其他方法明显更快。

而且,正如其他人指出的那样,如果您要对数据集执行此操作,则可以改用Numpy或Pandas集合。

While zip(*seq) is very useful, it may be unsuitable for very long sequences as it will create a tuple of values to be passed in. For example, I’ve been working with a coordinate system with over a million entries and find it signifcantly faster to create the sequences directly.

A generic approach would be something like this:

from collections import deque
seq = ((a1, b1, …), (a2, b2, …), …)
width = len(seq[0])
output = [deque(len(seq))] * width # preallocate memory
for element in seq:
    for s, item in zip(output, element):
        s.append(item)

But, depending on what you want to do with the result, the choice of collection can make a big difference. In my actual use case, using sets and no internal loop, is noticeably faster than all other approaches.

And, as others have noted, if you are doing this with datasets, it might make sense to use Numpy or Pandas collections instead.


回答 11

虽然numpy数组和熊猫可能是更可取的,但此函数模仿zip(*args)as时的行为unzip(args)

允许在args迭代值时传递生成器。装饰cls和/或main_cls微管理容器初始化。

def unzip(items, cls=list, main_cls=tuple):
    """Zip function in reverse.

    :param items: Zipped-like iterable.
    :type  items: iterable

    :param cls: Callable that returns iterable with callable append attribute.
        Defaults to `list`.
    :type  cls: callable, optional

    :param main_cls: Callable that returns iterable with callable append
        attribute. Defaults to `tuple`.
    :type  main_cls: callable, optional

    :returns: Unzipped items in instances returned from `cls`, in an instance
        returned from `main_cls`.

    :Example:

        assert unzip(zip(["a","b","c"],[1,2,3])) == (["a","b",c"],[1,2,3])
        assert unzip([("a",1),("b",2),("c",3)]) == (["a","b","c"],[1,2,3])
        assert unzip([("a",1)], deque, list) == [deque(["a"]),deque([1])]
        assert unzip((["a"],["b"]), lambda i: deque(i,1)) == (deque(["b"]),)
    """
    items = iter(items)

    try:
        i = next(items)
    except StopIteration:
        return main_cls()

    unzipped = main_cls(cls([v]) for v in i)

    for i in items:
        for c,v in zip(unzipped,i):
            c.append(v)

    return unzipped

While numpy arrays and pandas may be preferrable, this function imitates the behavior of zip(*args) when called as unzip(args).

Allows for generators to be passed as args as it iterates through values. Decorate cls and/or main_cls to micro manage container initialization.

def unzip(items, cls=list, main_cls=tuple):
    """Zip function in reverse.

    :param items: Zipped-like iterable.
    :type  items: iterable

    :param cls: Callable that returns iterable with callable append attribute.
        Defaults to `list`.
    :type  cls: callable, optional

    :param main_cls: Callable that returns iterable with callable append
        attribute. Defaults to `tuple`.
    :type  main_cls: callable, optional

    :returns: Unzipped items in instances returned from `cls`, in an instance
        returned from `main_cls`.

    :Example:

        assert unzip(zip(["a","b","c"],[1,2,3])) == (["a","b",c"],[1,2,3])
        assert unzip([("a",1),("b",2),("c",3)]) == (["a","b","c"],[1,2,3])
        assert unzip([("a",1)], deque, list) == [deque(["a"]),deque([1])]
        assert unzip((["a"],["b"]), lambda i: deque(i,1)) == (deque(["b"]),)
    """
    items = iter(items)

    try:
        i = next(items)
    except StopIteration:
        return main_cls()

    unzipped = main_cls(cls([v]) for v in i)

    for i in items:
        for c,v in zip(unzipped,i):
            c.append(v)

    return unzipped

迭代访问列表的最“ pythonic”方法是什么?

问题:迭代访问列表的最“ pythonic”方法是什么?

我有一个Python脚本,它将一个整数列表作为输入,我需要一次处理四个整数。不幸的是,我无法控制输入,或者将其作为四元素元组的列表传递。目前,我正在以这种方式对其进行迭代:

for i in xrange(0, len(ints), 4):
    # dummy op for example code
    foo += ints[i] * ints[i + 1] + ints[i + 2] * ints[i + 3]

不过,它看起来很像“ C思维”,这使我怀疑还有一种处理这种情况的更Python的方法。该列表在迭代后被丢弃,因此不需要保留。也许这样的事情会更好?

while ints:
    foo += ints[0] * ints[1] + ints[2] * ints[3]
    ints[0:4] = []

不过,还是不太“正确”。:-/

相关问题:如何在Python中将列表分成均匀大小的块?

I have a Python script which takes as input a list of integers, which I need to work with four integers at a time. Unfortunately, I don’t have control of the input, or I’d have it passed in as a list of four-element tuples. Currently, I’m iterating over it this way:

for i in xrange(0, len(ints), 4):
    # dummy op for example code
    foo += ints[i] * ints[i + 1] + ints[i + 2] * ints[i + 3]

It looks a lot like “C-think”, though, which makes me suspect there’s a more pythonic way of dealing with this situation. The list is discarded after iterating, so it needn’t be preserved. Perhaps something like this would be better?

while ints:
    foo += ints[0] * ints[1] + ints[2] * ints[3]
    ints[0:4] = []

Still doesn’t quite “feel” right, though. :-/

Related question: How do you split a list into evenly sized chunks in Python?


回答 0

从Python的itertools文档的食谱部分进行了修改:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

示例
用伪代码保持示例简洁。

grouper('ABCDEFG', 3, 'x') --> 'ABC' 'DEF' 'Gxx'

注意:在Python 2上,请使用izip_longest代替zip_longest

Modified from the recipes section of Python’s itertools docs:

from itertools import zip_longest

def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

Example
In pseudocode to keep the example terse.

grouper('ABCDEFG', 3, 'x') --> 'ABC' 'DEF' 'Gxx'

Note: on Python 2 use izip_longest instead of zip_longest.


回答 1

def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))
# (in python 2 use xrange() instead of range() to avoid allocating a list)

简单。简单。快速。适用于任何序列:

text = "I am a very, very helpful text"

for group in chunker(text, 7):
   print repr(group),
# 'I am a ' 'very, v' 'ery hel' 'pful te' 'xt'

print '|'.join(chunker(text, 10))
# I am a ver|y, very he|lpful text

animals = ['cat', 'dog', 'rabbit', 'duck', 'bird', 'cow', 'gnu', 'fish']

for group in chunker(animals, 3):
    print group
# ['cat', 'dog', 'rabbit']
# ['duck', 'bird', 'cow']
# ['gnu', 'fish']
def chunker(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))
# (in python 2 use xrange() instead of range() to avoid allocating a list)

Simple. Easy. Fast. Works with any sequence:

text = "I am a very, very helpful text"

for group in chunker(text, 7):
   print repr(group),
# 'I am a ' 'very, v' 'ery hel' 'pful te' 'xt'

print '|'.join(chunker(text, 10))
# I am a ver|y, very he|lpful text

animals = ['cat', 'dog', 'rabbit', 'duck', 'bird', 'cow', 'gnu', 'fish']

for group in chunker(animals, 3):
    print group
# ['cat', 'dog', 'rabbit']
# ['duck', 'bird', 'cow']
# ['gnu', 'fish']

回答 2

我是的粉丝

chunk_size= 4
for i in range(0, len(ints), chunk_size):
    chunk = ints[i:i+chunk_size]
    # process chunk of size <= chunk_size

I’m a fan of

chunk_size= 4
for i in range(0, len(ints), chunk_size):
    chunk = ints[i:i+chunk_size]
    # process chunk of size <= chunk_size

回答 3

import itertools
def chunks(iterable,size):
    it = iter(iterable)
    chunk = tuple(itertools.islice(it,size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it,size))

# though this will throw ValueError if the length of ints
# isn't a multiple of four:
for x1,x2,x3,x4 in chunks(ints,4):
    foo += x1 + x2 + x3 + x4

for chunk in chunks(ints,4):
    foo += sum(chunk)

另一种方式:

import itertools
def chunks2(iterable,size,filler=None):
    it = itertools.chain(iterable,itertools.repeat(filler,size-1))
    chunk = tuple(itertools.islice(it,size))
    while len(chunk) == size:
        yield chunk
        chunk = tuple(itertools.islice(it,size))

# x2, x3 and x4 could get the value 0 if the length is not
# a multiple of 4.
for x1,x2,x3,x4 in chunks2(ints,4,0):
    foo += x1 + x2 + x3 + x4
import itertools
def chunks(iterable,size):
    it = iter(iterable)
    chunk = tuple(itertools.islice(it,size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it,size))

# though this will throw ValueError if the length of ints
# isn't a multiple of four:
for x1,x2,x3,x4 in chunks(ints,4):
    foo += x1 + x2 + x3 + x4

for chunk in chunks(ints,4):
    foo += sum(chunk)

Another way:

import itertools
def chunks2(iterable,size,filler=None):
    it = itertools.chain(iterable,itertools.repeat(filler,size-1))
    chunk = tuple(itertools.islice(it,size))
    while len(chunk) == size:
        yield chunk
        chunk = tuple(itertools.islice(it,size))

# x2, x3 and x4 could get the value 0 if the length is not
# a multiple of 4.
for x1,x2,x3,x4 in chunks2(ints,4,0):
    foo += x1 + x2 + x3 + x4

回答 4

from itertools import izip_longest

def chunker(iterable, chunksize, filler):
    return izip_longest(*[iter(iterable)]*chunksize, fillvalue=filler)
from itertools import izip_longest

def chunker(iterable, chunksize, filler):
    return izip_longest(*[iter(iterable)]*chunksize, fillvalue=filler)

回答 5

此问题的理想解决方案适用于迭代器(而不仅仅是序列)。它也应该很快。

这是itertools文档提供的解决方案:

def grouper(n, iterable, fillvalue=None):
    #"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)

%timeit在我的Macbook Air 上使用ipython ,每个循环可获得47.5美元。

但是,这对我来说真的不起作用,因为结果被填充为甚至大小的组。没有填充的解决方案稍微复杂一些。最幼稚的解决方案可能是:

def grouper(size, iterable):
    i = iter(iterable)
    while True:
        out = []
        try:
            for _ in range(size):
                out.append(i.next())
        except StopIteration:
            yield out
            break

        yield out

简单但很慢:每个循环693 us

我可以想出的最佳解决方案islice用于内部循环:

def grouper(size, iterable):
    it = iter(iterable)
    while True:
        group = tuple(itertools.islice(it, None, size))
        if not group:
            break
        yield group

使用相同的数据集,每个循环可获得305 us。

无法以比这更快的速度获得纯解决方案,我为以下解决方案提供了一个重要的警告:如果输入数据中包含实例,filldata则可能会得到错误的答案。

def grouper(n, iterable, fillvalue=None):
    #"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    for i in itertools.izip_longest(fillvalue=fillvalue, *args):
        if tuple(i)[-1] == fillvalue:
            yield tuple(v for v in i if v != fillvalue)
        else:
            yield i

我真的不喜欢这个答案,但是速度更快。每个循环124 us

The ideal solution for this problem works with iterators (not just sequences). It should also be fast.

This is the solution provided by the documentation for itertools:

def grouper(n, iterable, fillvalue=None):
    #"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    return itertools.izip_longest(fillvalue=fillvalue, *args)

Using ipython’s %timeit on my mac book air, I get 47.5 us per loop.

However, this really doesn’t work for me since the results are padded to be even sized groups. A solution without the padding is slightly more complicated. The most naive solution might be:

def grouper(size, iterable):
    i = iter(iterable)
    while True:
        out = []
        try:
            for _ in range(size):
                out.append(i.next())
        except StopIteration:
            yield out
            break

        yield out

Simple, but pretty slow: 693 us per loop

The best solution I could come up with uses islice for the inner loop:

def grouper(size, iterable):
    it = iter(iterable)
    while True:
        group = tuple(itertools.islice(it, None, size))
        if not group:
            break
        yield group

With the same dataset, I get 305 us per loop.

Unable to get a pure solution any faster than that, I provide the following solution with an important caveat: If your input data has instances of filldata in it, you could get wrong answer.

def grouper(n, iterable, fillvalue=None):
    #"grouper(3, 'ABCDEFG', 'x') --> ABC DEF Gxx"
    args = [iter(iterable)] * n
    for i in itertools.izip_longest(fillvalue=fillvalue, *args):
        if tuple(i)[-1] == fillvalue:
            yield tuple(v for v in i if v != fillvalue)
        else:
            yield i

I really don’t like this answer, but it is significantly faster. 124 us per loop


回答 6

我需要一个可以与集合和生成器一起使用的解决方案。我无法提出任何简短而又漂亮的内容,但至少可以理解。

def chunker(seq, size):
    res = []
    for el in seq:
        res.append(el)
        if len(res) == size:
            yield res
            res = []
    if res:
        yield res

清单:

>>> list(chunker([i for i in range(10)], 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

组:

>>> list(chunker(set([i for i in range(10)]), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

生成器:

>>> list(chunker((i for i in range(10)), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

I needed a solution that would also work with sets and generators. I couldn’t come up with anything very short and pretty, but it’s quite readable at least.

def chunker(seq, size):
    res = []
    for el in seq:
        res.append(el)
        if len(res) == size:
            yield res
            res = []
    if res:
        yield res

List:

>>> list(chunker([i for i in range(10)], 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Set:

>>> list(chunker(set([i for i in range(10)]), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

Generator:

>>> list(chunker((i for i in range(10)), 3))
[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]

回答 7

与其他建议类似,但不完全相同,我喜欢这样做,因为它简单易读:

it = iter([1, 2, 3, 4, 5, 6, 7, 8, 9])
for chunk in zip(it, it, it, it):
    print chunk

>>> (1, 2, 3, 4)
>>> (5, 6, 7, 8)

这样,您将不会得到最后的部分块。如果要获取(9, None, None, None)最后一块,只需使用izip_longestfrom即可itertools

Similar to other proposals, but not exactly identical, I like doing it this way, because it’s simple and easy to read:

it = iter([1, 2, 3, 4, 5, 6, 7, 8, 9])
for chunk in zip(it, it, it, it):
    print chunk

>>> (1, 2, 3, 4)
>>> (5, 6, 7, 8)

This way you won’t get the last partial chunk. If you want to get (9, None, None, None) as last chunk, just use izip_longest from itertools.


回答 8

如果您不介意使用外部软件包,则可以iteration_utilities.grouper1开始使用。它支持所有可迭代项(不仅限于序列):iteration_utilties

from iteration_utilities import grouper
seq = list(range(20))
for group in grouper(seq, 4):
    print(group)

打印:

(0, 1, 2, 3)
(4, 5, 6, 7)
(8, 9, 10, 11)
(12, 13, 14, 15)
(16, 17, 18, 19)

如果长度不是分组大小的倍数,则它还支持填充(不完整的最后一组)或截断(丢弃不完整的最后一组)最后一个:

from iteration_utilities import grouper
seq = list(range(17))
for group in grouper(seq, 4):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)
# (16,)

for group in grouper(seq, 4, fillvalue=None):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)
# (16, None, None, None)

for group in grouper(seq, 4, truncate=True):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)

基准测试

我还决定比较上述几种方法的运行时间。这是一个对数-对数图,根据大小不同的列表分为“ 10”个元素组。对于定性结果:较低意味着更快:

在此处输入图片说明

至少在此基准测试中,iteration_utilities.grouper效果最佳。其次是疯狂的方法。

基准是使用1创建的。用于运行该基准测试的代码为:simple_benchmark

import iteration_utilities
import itertools
from itertools import zip_longest

def consume_all(it):
    return iteration_utilities.consume(it, None)

import simple_benchmark
b = simple_benchmark.BenchmarkBuilder()

@b.add_function()
def grouper(l, n):
    return consume_all(iteration_utilities.grouper(l, n))

def Craz_inner(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

@b.add_function()
def Craz(iterable, n, fillvalue=None):
    return consume_all(Craz_inner(iterable, n, fillvalue))

def nosklo_inner(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

@b.add_function()
def nosklo(seq, size):
    return consume_all(nosklo_inner(seq, size))

def SLott_inner(ints, chunk_size):
    for i in range(0, len(ints), chunk_size):
        yield ints[i:i+chunk_size]

@b.add_function()
def SLott(ints, chunk_size):
    return consume_all(SLott_inner(ints, chunk_size))

def MarkusJarderot1_inner(iterable,size):
    it = iter(iterable)
    chunk = tuple(itertools.islice(it,size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it,size))

@b.add_function()
def MarkusJarderot1(iterable,size):
    return consume_all(MarkusJarderot1_inner(iterable,size))

def MarkusJarderot2_inner(iterable,size,filler=None):
    it = itertools.chain(iterable,itertools.repeat(filler,size-1))
    chunk = tuple(itertools.islice(it,size))
    while len(chunk) == size:
        yield chunk
        chunk = tuple(itertools.islice(it,size))

@b.add_function()
def MarkusJarderot2(iterable,size):
    return consume_all(MarkusJarderot2_inner(iterable,size))

@b.add_arguments()
def argument_provider():
    for exp in range(2, 20):
        size = 2**exp
        yield size, simple_benchmark.MultiArgument([[0] * size, 10])

r = b.run()

1免责声明:我是图书馆的作者iteration_utilitiessimple_benchmark

If you don’t mind using an external package you could use iteration_utilities.grouper from iteration_utilties 1. It supports all iterables (not just sequences):

from iteration_utilities import grouper
seq = list(range(20))
for group in grouper(seq, 4):
    print(group)

which prints:

(0, 1, 2, 3)
(4, 5, 6, 7)
(8, 9, 10, 11)
(12, 13, 14, 15)
(16, 17, 18, 19)

In case the length isn’t a multiple of the groupsize it also supports filling (the incomplete last group) or truncating (discarding the incomplete last group) the last one:

from iteration_utilities import grouper
seq = list(range(17))
for group in grouper(seq, 4):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)
# (16,)

for group in grouper(seq, 4, fillvalue=None):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)
# (16, None, None, None)

for group in grouper(seq, 4, truncate=True):
    print(group)
# (0, 1, 2, 3)
# (4, 5, 6, 7)
# (8, 9, 10, 11)
# (12, 13, 14, 15)

Benchmarks

I also decided to compare the run-time of a few of the mentioned approaches. It’s a log-log plot grouping into groups of “10” elements based on a list of varying size. For qualitative results: Lower means faster:

enter image description here

At least in this benchmark the iteration_utilities.grouper performs best. Followed by the approach of Craz.

The benchmark was created with simple_benchmark1. The code used to run this benchmark was:

import iteration_utilities
import itertools
from itertools import zip_longest

def consume_all(it):
    return iteration_utilities.consume(it, None)

import simple_benchmark
b = simple_benchmark.BenchmarkBuilder()

@b.add_function()
def grouper(l, n):
    return consume_all(iteration_utilities.grouper(l, n))

def Craz_inner(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(*args, fillvalue=fillvalue)

@b.add_function()
def Craz(iterable, n, fillvalue=None):
    return consume_all(Craz_inner(iterable, n, fillvalue))

def nosklo_inner(seq, size):
    return (seq[pos:pos + size] for pos in range(0, len(seq), size))

@b.add_function()
def nosklo(seq, size):
    return consume_all(nosklo_inner(seq, size))

def SLott_inner(ints, chunk_size):
    for i in range(0, len(ints), chunk_size):
        yield ints[i:i+chunk_size]

@b.add_function()
def SLott(ints, chunk_size):
    return consume_all(SLott_inner(ints, chunk_size))

def MarkusJarderot1_inner(iterable,size):
    it = iter(iterable)
    chunk = tuple(itertools.islice(it,size))
    while chunk:
        yield chunk
        chunk = tuple(itertools.islice(it,size))

@b.add_function()
def MarkusJarderot1(iterable,size):
    return consume_all(MarkusJarderot1_inner(iterable,size))

def MarkusJarderot2_inner(iterable,size,filler=None):
    it = itertools.chain(iterable,itertools.repeat(filler,size-1))
    chunk = tuple(itertools.islice(it,size))
    while len(chunk) == size:
        yield chunk
        chunk = tuple(itertools.islice(it,size))

@b.add_function()
def MarkusJarderot2(iterable,size):
    return consume_all(MarkusJarderot2_inner(iterable,size))

@b.add_arguments()
def argument_provider():
    for exp in range(2, 20):
        size = 2**exp
        yield size, simple_benchmark.MultiArgument([[0] * size, 10])

r = b.run()

1 Disclaimer: I’m the author of the libraries iteration_utilities and simple_benchmark.


回答 9

由于没有人提到它,所以这里有一个zip()解决方案:

>>> def chunker(iterable, chunksize):
...     return zip(*[iter(iterable)]*chunksize)

仅当序列的长度始终可被块大小整除或不关心尾随的块时,它才起作用。

例:

>>> s = '1234567890'
>>> chunker(s, 3)
[('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9')]
>>> chunker(s, 4)
[('1', '2', '3', '4'), ('5', '6', '7', '8')]
>>> chunker(s, 5)
[('1', '2', '3', '4', '5'), ('6', '7', '8', '9', '0')]

或使用itertools.izip返回迭代器而不是列表:

>>> from itertools import izip
>>> def chunker(iterable, chunksize):
...     return izip(*[iter(iterable)]*chunksize)

可以使用@▼ZΩΤZΙΟΥ的答案来固定填充:

>>> from itertools import chain, izip, repeat
>>> def chunker(iterable, chunksize, fillvalue=None):
...     it   = chain(iterable, repeat(fillvalue, chunksize-1))
...     args = [it] * chunksize
...     return izip(*args)

Since nobody’s mentioned it yet here’s a zip() solution:

>>> def chunker(iterable, chunksize):
...     return zip(*[iter(iterable)]*chunksize)

It works only if your sequence’s length is always divisible by the chunk size or you don’t care about a trailing chunk if it isn’t.

Example:

>>> s = '1234567890'
>>> chunker(s, 3)
[('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9')]
>>> chunker(s, 4)
[('1', '2', '3', '4'), ('5', '6', '7', '8')]
>>> chunker(s, 5)
[('1', '2', '3', '4', '5'), ('6', '7', '8', '9', '0')]

Or using itertools.izip to return an iterator instead of a list:

>>> from itertools import izip
>>> def chunker(iterable, chunksize):
...     return izip(*[iter(iterable)]*chunksize)

Padding can be fixed using @ΤΖΩΤΖΙΟΥ’s answer:

>>> from itertools import chain, izip, repeat
>>> def chunker(iterable, chunksize, fillvalue=None):
...     it   = chain(iterable, repeat(fillvalue, chunksize-1))
...     args = [it] * chunksize
...     return izip(*args)

回答 10

使用map()而不是zip()可解决JF Sebastian的答案中的填充问题:

>>> def chunker(iterable, chunksize):
...   return map(None,*[iter(iterable)]*chunksize)

例:

>>> s = '1234567890'
>>> chunker(s, 3)
[('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9'), ('0', None, None)]
>>> chunker(s, 4)
[('1', '2', '3', '4'), ('5', '6', '7', '8'), ('9', '0', None, None)]
>>> chunker(s, 5)
[('1', '2', '3', '4', '5'), ('6', '7', '8', '9', '0')]

Using map() instead of zip() fixes the padding issue in J.F. Sebastian’s answer:

>>> def chunker(iterable, chunksize):
...   return map(None,*[iter(iterable)]*chunksize)

Example:

>>> s = '1234567890'
>>> chunker(s, 3)
[('1', '2', '3'), ('4', '5', '6'), ('7', '8', '9'), ('0', None, None)]
>>> chunker(s, 4)
[('1', '2', '3', '4'), ('5', '6', '7', '8'), ('9', '0', None, None)]
>>> chunker(s, 5)
[('1', '2', '3', '4', '5'), ('6', '7', '8', '9', '0')]

回答 11

另一种方法是使用以下两个参数的形式iter

from itertools import islice

def group(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

可以很容易地调整它以使用填充(这类似于Markus Jarderot的回答):

from itertools import islice, chain, repeat

def group_pad(it, size, pad=None):
    it = chain(iter(it), repeat(pad))
    return iter(lambda: tuple(islice(it, size)), (pad,) * size)

这些甚至可以结合起来用于可选的填充:

_no_pad = object()
def group(it, size, pad=_no_pad):
    if pad == _no_pad:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(pad))
        sentinel = (pad,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)

Another approach would be to use the two-argument form of iter:

from itertools import islice

def group(it, size):
    it = iter(it)
    return iter(lambda: tuple(islice(it, size)), ())

This can be adapted easily to use padding (this is similar to Markus Jarderot’s answer):

from itertools import islice, chain, repeat

def group_pad(it, size, pad=None):
    it = chain(iter(it), repeat(pad))
    return iter(lambda: tuple(islice(it, size)), (pad,) * size)

These can even be combined for optional padding:

_no_pad = object()
def group(it, size, pad=_no_pad):
    if pad == _no_pad:
        it = iter(it)
        sentinel = ()
    else:
        it = chain(iter(it), repeat(pad))
        sentinel = (pad,) * size
    return iter(lambda: tuple(islice(it, size)), sentinel)

回答 12

如果列表很大,执行此操作的最高性能方法是使用生成器:

def get_chunk(iterable, chunk_size):
    result = []
    for item in iterable:
        result.append(item)
        if len(result) == chunk_size:
            yield tuple(result)
            result = []
    if len(result) > 0:
        yield tuple(result)

for x in get_chunk([1,2,3,4,5,6,7,8,9,10], 3):
    print x

(1, 2, 3)
(4, 5, 6)
(7, 8, 9)
(10,)

If the list is large, the highest-performing way to do this will be to use a generator:

def get_chunk(iterable, chunk_size):
    result = []
    for item in iterable:
        result.append(item)
        if len(result) == chunk_size:
            yield tuple(result)
            result = []
    if len(result) > 0:
        yield tuple(result)

for x in get_chunk([1,2,3,4,5,6,7,8,9,10], 3):
    print x

(1, 2, 3)
(4, 5, 6)
(7, 8, 9)
(10,)

回答 13

使用小功能和事情确实对我没有吸引力。我更喜欢只使用切片:

data = [...]
chunk_size = 10000 # or whatever
chunks = [data[i:i+chunk_size] for i in xrange(0,len(data),chunk_size)]
for chunk in chunks:
    ...

Using little functions and things really doesn’t appeal to me; I prefer to just use slices:

data = [...]
chunk_size = 10000 # or whatever
chunks = [data[i:i+chunk_size] for i in xrange(0,len(data),chunk_size)]
for chunk in chunks:
    ...

回答 14

为了避免所有转换为列表,import itertools并且:

>>> for k, g in itertools.groupby(xrange(35), lambda x: x/10):
...     list(g)

生成:

... 
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
2 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
3 [30, 31, 32, 33, 34]
>>> 

我检查了一下groupby,它没有转换为列表或使用len所以我(认为)这将延迟每个值的解析,直到实际使用它为止。可悲的是,目前没有可用的答案似乎提供这种变化。

显然,如果您需要依次处理每个项目,请在g上嵌套for循环:

for k,g in itertools.groupby(xrange(35), lambda x: x/10):
    for i in g:
       # do what you need to do with individual items
    # now do what you need to do with the whole group

我对此的特别兴趣是需要消耗一个生成器才能将最多1000个更改批量提交给gmail API:

    messages = a_generator_which_would_not_be_smart_as_a_list
    for idx, batch in groupby(messages, lambda x: x/1000):
        batch_request = BatchHttpRequest()
        for message in batch:
            batch_request.add(self.service.users().messages().modify(userId='me', id=message['id'], body=msg_labels))
        http = httplib2.Http()
        self.credentials.authorize(http)
        batch_request.execute(http=http)

To avoid all conversions to a list import itertools and:

>>> for k, g in itertools.groupby(xrange(35), lambda x: x/10):
...     list(g)

Produces:

... 
0 [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
1 [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
2 [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
3 [30, 31, 32, 33, 34]
>>> 

I checked groupby and it doesn’t convert to list or use len so I (think) this will delay resolution of each value until it is actually used. Sadly none of the available answers (at this time) seemed to offer this variation.

Obviously if you need to handle each item in turn nest a for loop over g:

for k,g in itertools.groupby(xrange(35), lambda x: x/10):
    for i in g:
       # do what you need to do with individual items
    # now do what you need to do with the whole group

My specific interest in this was the need to consume a generator to submit changes in batches of up to 1000 to the gmail API:

    messages = a_generator_which_would_not_be_smart_as_a_list
    for idx, batch in groupby(messages, lambda x: x/1000):
        batch_request = BatchHttpRequest()
        for message in batch:
            batch_request.add(self.service.users().messages().modify(userId='me', id=message['id'], body=msg_labels))
        http = httplib2.Http()
        self.credentials.authorize(http)
        batch_request.execute(http=http)

回答 15

使用NumPy很简单:

ints = array([1, 2, 3, 4, 5, 6, 7, 8])
for int1, int2 in ints.reshape(-1, 2):
    print(int1, int2)

输出:

1 2
3 4
5 6
7 8

With NumPy it’s simple:

ints = array([1, 2, 3, 4, 5, 6, 7, 8])
for int1, int2 in ints.reshape(-1, 2):
    print(int1, int2)

output:

1 2
3 4
5 6
7 8

回答 16

def chunker(iterable, n):
    """Yield iterable in chunk sizes.

    >>> chunks = chunker('ABCDEF', n=4)
    >>> chunks.next()
    ['A', 'B', 'C', 'D']
    >>> chunks.next()
    ['E', 'F']
    """
    it = iter(iterable)
    while True:
        chunk = []
        for i in range(n):
            try:
                chunk.append(next(it))
            except StopIteration:
                yield chunk
                raise StopIteration
        yield chunk

if __name__ == '__main__':
    import doctest

    doctest.testmod()
def chunker(iterable, n):
    """Yield iterable in chunk sizes.

    >>> chunks = chunker('ABCDEF', n=4)
    >>> chunks.next()
    ['A', 'B', 'C', 'D']
    >>> chunks.next()
    ['E', 'F']
    """
    it = iter(iterable)
    while True:
        chunk = []
        for i in range(n):
            try:
                chunk.append(next(it))
            except StopIteration:
                yield chunk
                raise StopIteration
        yield chunk

if __name__ == '__main__':
    import doctest

    doctest.testmod()

回答 17

除非我错过任何事情,否则不会提及以下带有生成器表达式的简单解决方案。它假定块的大小和数量都是已知的(通常是这种情况),并且不需要填充:

def chunks(it, n, m):
    """Make an iterator over m first chunks of size n.
    """
    it = iter(it)
    # Chunks are presented as tuples.
    return (tuple(next(it) for _ in range(n)) for _ in range(m))

Unless I misses something, the following simple solution with generator expressions has not been mentioned. It assumes that both the size and the number of chunks are known (which is often the case), and that no padding is required:

def chunks(it, n, m):
    """Make an iterator over m first chunks of size n.
    """
    it = iter(it)
    # Chunks are presented as tuples.
    return (tuple(next(it) for _ in range(n)) for _ in range(m))

回答 18

在您的第二种方法中,我将通过执行以下操作进入下一组4:

ints = ints[4:]

但是,我还没有进行任何性能评估,所以我不知道哪个效率更高。

话虽如此,我通常会选择第一种方法。这不是很漂亮,但这通常是与外界交互的结果。

In your second method, I would advance to the next group of 4 by doing this:

ints = ints[4:]

However, I haven’t done any performance measurement so I don’t know which one might be more efficient.

Having said that, I would usually choose the first method. It’s not pretty, but that’s often a consequence of interfacing with the outside world.


回答 19

另一个答案,其优点是:

1)易于理解
2)可以处理任何可迭代的对象,而不仅是序列(上面的一些回答会在文件句柄上阻塞)
3)不会一次将块全部加载到内存中
4)不会对引用进行大块的长列表内存中的相同迭代器
5)列表末尾没有填充值

话虽这么说,我还没有计时,所以它可能比一些更聪明的方法要慢,并且某些优点可能与用例无关。

def chunkiter(iterable, size):
  def inneriter(first, iterator, size):
    yield first
    for _ in xrange(size - 1): 
      yield iterator.next()
  it = iter(iterable)
  while True:
    yield inneriter(it.next(), it, size)

In [2]: i = chunkiter('abcdefgh', 3)
In [3]: for ii in i:                                                
          for c in ii:
            print c,
          print ''
        ...:     
        a b c 
        d e f 
        g h 

更新:
由于内循环和外循环从同一个迭代器中提取值而造成的一些弊端:
1)继续无法在外循环中按预期方式工作-继续执行下一个项目而不是跳过一个块。但是,这似乎不是问题,因为在外循环中没有要测试的东西。
2)break不能在内部循环中按预期方式工作-控件将在迭代器中的下一个项目中再次进入内部循环。要跳过整个块,可以将内部迭代器(上面的ii)包装在一个元组中,例如for c in tuple(ii),或者设置一个标志并耗尽迭代器。

Yet another answer, the advantages of which are:

1) Easily understandable
2) Works on any iterable, not just sequences (some of the above answers will choke on filehandles)
3) Does not load the chunk into memory all at once
4) Does not make a chunk-long list of references to the same iterator in memory
5) No padding of fill values at the end of the list

That being said, I haven’t timed it so it might be slower than some of the more clever methods, and some of the advantages may be irrelevant given the use case.

def chunkiter(iterable, size):
  def inneriter(first, iterator, size):
    yield first
    for _ in xrange(size - 1): 
      yield iterator.next()
  it = iter(iterable)
  while True:
    yield inneriter(it.next(), it, size)

In [2]: i = chunkiter('abcdefgh', 3)
In [3]: for ii in i:                                                
          for c in ii:
            print c,
          print ''
        ...:     
        a b c 
        d e f 
        g h 

Update:
A couple of drawbacks due to the fact the inner and outer loops are pulling values from the same iterator:
1) continue doesn’t work as expected in the outer loop – it just continues on to the next item rather than skipping a chunk. However, this doesn’t seem like a problem as there’s nothing to test in the outer loop.
2) break doesn’t work as expected in the inner loop – control will wind up in the inner loop again with the next item in the iterator. To skip whole chunks, either wrap the inner iterator (ii above) in a tuple, e.g. for c in tuple(ii), or set a flag and exhaust the iterator.


回答 20

def group_by(iterable, size):
    """Group an iterable into lists that don't exceed the size given.

    >>> group_by([1,2,3,4,5], 2)
    [[1, 2], [3, 4], [5]]

    """
    sublist = []

    for index, item in enumerate(iterable):
        if index > 0 and index % size == 0:
            yield sublist
            sublist = []

        sublist.append(item)

    if sublist:
        yield sublist
def group_by(iterable, size):
    """Group an iterable into lists that don't exceed the size given.

    >>> group_by([1,2,3,4,5], 2)
    [[1, 2], [3, 4], [5]]

    """
    sublist = []

    for index, item in enumerate(iterable):
        if index > 0 and index % size == 0:
            yield sublist
            sublist = []

        sublist.append(item)

    if sublist:
        yield sublist

回答 21

您可以从funcy库中使用分区功能:

from funcy import partition

for a, b, c, d in partition(4, ints):
    foo += a * b * c * d

这些函数还具有迭代器版本ipartitionichunks,在这种情况下将更加高效。

您也可以查看它们的实现

You can use partition or chunks function from funcy library:

from funcy import partition

for a, b, c, d in partition(4, ints):
    foo += a * b * c * d

These functions also has iterator versions ipartition and ichunks, which will be more efficient in this case.

You can also peek at their implementation.


回答 22

关于J.F. Sebastian 这里给的解决方案:

def chunker(iterable, chunksize):
    return zip(*[iter(iterable)]*chunksize)

它很聪明,但是有一个缺点-总是返回元组。如何获取字符串呢?
当然您可以编写''.join(chunker(...)),但是无论如何都构造了临时元组。

您可以通过编写own来摆脱临时元组zip,如下所示:

class IteratorExhausted(Exception):
    pass

def translate_StopIteration(iterable, to=IteratorExhausted):
    for i in iterable:
        yield i
    raise to # StopIteration would get ignored because this is generator,
             # but custom exception can leave the generator.

def custom_zip(*iterables, reductor=tuple):
    iterators = tuple(map(translate_StopIteration, iterables))
    while True:
        try:
            yield reductor(next(i) for i in iterators)
        except IteratorExhausted: # when any of iterators get exhausted.
            break

然后

def chunker(data, size, reductor=tuple):
    return custom_zip(*[iter(data)]*size, reductor=reductor)

用法示例:

>>> for i in chunker('12345', 2):
...     print(repr(i))
...
('1', '2')
('3', '4')
>>> for i in chunker('12345', 2, ''.join):
...     print(repr(i))
...
'12'
'34'

About solution gave by J.F. Sebastian here:

def chunker(iterable, chunksize):
    return zip(*[iter(iterable)]*chunksize)

It’s clever, but has one disadvantage – always return tuple. How to get string instead?
Of course you can write ''.join(chunker(...)), but the temporary tuple is constructed anyway.

You can get rid of the temporary tuple by writing own zip, like this:

class IteratorExhausted(Exception):
    pass

def translate_StopIteration(iterable, to=IteratorExhausted):
    for i in iterable:
        yield i
    raise to # StopIteration would get ignored because this is generator,
             # but custom exception can leave the generator.

def custom_zip(*iterables, reductor=tuple):
    iterators = tuple(map(translate_StopIteration, iterables))
    while True:
        try:
            yield reductor(next(i) for i in iterators)
        except IteratorExhausted: # when any of iterators get exhausted.
            break

Then

def chunker(data, size, reductor=tuple):
    return custom_zip(*[iter(data)]*size, reductor=reductor)

Example usage:

>>> for i in chunker('12345', 2):
...     print(repr(i))
...
('1', '2')
('3', '4')
>>> for i in chunker('12345', 2, ''.join):
...     print(repr(i))
...
'12'
'34'

回答 23

我喜欢这种方法。它感觉简单而不是魔术,并且支持所有可迭代的类型,并且不需要导入。

def chunk_iter(iterable, chunk_size):
it = iter(iterable)
while True:
    chunk = tuple(next(it) for _ in range(chunk_size))
    if not chunk:
        break
    yield chunk

I like this approach. It feels simple and not magical and supports all iterable types and doesn’t require imports.

def chunk_iter(iterable, chunk_size):
it = iter(iterable)
while True:
    chunk = tuple(next(it) for _ in range(chunk_size))
    if not chunk:
        break
    yield chunk

回答 24

我从不希望填充我的数据块,因此这一要求至关重要。我发现还需要具有处理任何迭代的能力。鉴于此,我决定扩展接受的答案,https://stackoverflow.com/a/434411/1074659

如果由于需要比较和过滤填充值而不需要填充,则此方法的性能会受到轻微影响。但是,对于大块数据,此实用程序非常有效。

#!/usr/bin/env python3
from itertools import zip_longest


_UNDEFINED = object()


def chunker(iterable, chunksize, fillvalue=_UNDEFINED):
    """
    Collect data into chunks and optionally pad it.

    Performance worsens as `chunksize` approaches 1.

    Inspired by:
        https://docs.python.org/3/library/itertools.html#itertools-recipes

    """
    args = [iter(iterable)] * chunksize
    chunks = zip_longest(*args, fillvalue=fillvalue)
    yield from (
        filter(lambda val: val is not _UNDEFINED, chunk)
        if chunk[-1] is _UNDEFINED
        else chunk
        for chunk in chunks
    ) if fillvalue is _UNDEFINED else chunks

I never want my chunks padded, so that requirement is essential. I find that the ability to work on any iterable is also requirement. Given that, I decided to extend on the accepted answer, https://stackoverflow.com/a/434411/1074659.

Performance takes a slight hit in this approach if padding is not wanted due to the need to compare and filter the padded values. However, for large chunk sizes, this utility is very performant.

#!/usr/bin/env python3
from itertools import zip_longest


_UNDEFINED = object()


def chunker(iterable, chunksize, fillvalue=_UNDEFINED):
    """
    Collect data into chunks and optionally pad it.

    Performance worsens as `chunksize` approaches 1.

    Inspired by:
        https://docs.python.org/3/library/itertools.html#itertools-recipes

    """
    args = [iter(iterable)] * chunksize
    chunks = zip_longest(*args, fillvalue=fillvalue)
    yield from (
        filter(lambda val: val is not _UNDEFINED, chunk)
        if chunk[-1] is _UNDEFINED
        else chunk
        for chunk in chunks
    ) if fillvalue is _UNDEFINED else chunks

回答 25

这是一个没有导入功能的分块器,它支持生成器:

def chunks(seq, size):
    it = iter(seq)
    while True:
        ret = tuple(next(it) for _ in range(size))
        if len(ret) == size:
            yield ret
        else:
            raise StopIteration()

使用示例:

>>> def foo():
...     i = 0
...     while True:
...         i += 1
...         yield i
...
>>> c = chunks(foo(), 3)
>>> c.next()
(1, 2, 3)
>>> c.next()
(4, 5, 6)
>>> list(chunks('abcdefg', 2))
[('a', 'b'), ('c', 'd'), ('e', 'f')]

Here is a chunker without imports that supports generators:

def chunks(seq, size):
    it = iter(seq)
    while True:
        ret = tuple(next(it) for _ in range(size))
        if len(ret) == size:
            yield ret
        else:
            raise StopIteration()

Example of use:

>>> def foo():
...     i = 0
...     while True:
...         i += 1
...         yield i
...
>>> c = chunks(foo(), 3)
>>> c.next()
(1, 2, 3)
>>> c.next()
(4, 5, 6)
>>> list(chunks('abcdefg', 2))
[('a', 'b'), ('c', 'd'), ('e', 'f')]

回答 26

在Python 3.8中,您可以使用walrus运算符和itertools.islice

from itertools import islice

list_ = [i for i in range(10, 100)]

def chunker(it, size):
    iterator = iter(it)
    while chunk := list(islice(iterator, size)):
        print(chunk)
In [2]: chunker(list_, 10)                                                         
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69]
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89]
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]

With Python 3.8 you can use the walrus operator and itertools.islice.

from itertools import islice

list_ = [i for i in range(10, 100)]

def chunker(it, size):
    iterator = iter(it)
    while chunk := list(islice(iterator, size)):
        print(chunk)
In [2]: chunker(list_, 10)                                                         
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
[20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
[30, 31, 32, 33, 34, 35, 36, 37, 38, 39]
[40, 41, 42, 43, 44, 45, 46, 47, 48, 49]
[50, 51, 52, 53, 54, 55, 56, 57, 58, 59]
[60, 61, 62, 63, 64, 65, 66, 67, 68, 69]
[70, 71, 72, 73, 74, 75, 76, 77, 78, 79]
[80, 81, 82, 83, 84, 85, 86, 87, 88, 89]
[90, 91, 92, 93, 94, 95, 96, 97, 98, 99]


回答 27

似乎没有做到这一点的漂亮方法。 是一个包含许多方法的页面,包括:

def split_seq(seq, size):
    newseq = []
    splitsize = 1.0/size*len(seq)
    for i in range(size):
        newseq.append(seq[int(round(i*splitsize)):int(round((i+1)*splitsize))])
    return newseq

There doesn’t seem to be a pretty way to do this. Here is a page that has a number of methods, including:

def split_seq(seq, size):
    newseq = []
    splitsize = 1.0/size*len(seq)
    for i in range(size):
        newseq.append(seq[int(round(i*splitsize)):int(round((i+1)*splitsize))])
    return newseq

回答 28

如果列表大小相同,则可以将它们组合成4元组的列表zip()。例如:

# Four lists of four elements each.

l1 = range(0, 4)
l2 = range(4, 8)
l3 = range(8, 12)
l4 = range(12, 16)

for i1, i2, i3, i4 in zip(l1, l2, l3, l4):
    ...

下面是什么zip()函数生成:

>>> print l1
[0, 1, 2, 3]
>>> print l2
[4, 5, 6, 7]
>>> print l3
[8, 9, 10, 11]
>>> print l4
[12, 13, 14, 15]
>>> print zip(l1, l2, l3, l4)
[(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]

如果列表很大,并且您不想将它们组合成更大的列表,请使用itertools.izip(),它会生成一个迭代器,而不是列表。

from itertools import izip

for i1, i2, i3, i4 in izip(l1, l2, l3, l4):
    ...

If the lists are the same size, you can combine them into lists of 4-tuples with zip(). For example:

# Four lists of four elements each.

l1 = range(0, 4)
l2 = range(4, 8)
l3 = range(8, 12)
l4 = range(12, 16)

for i1, i2, i3, i4 in zip(l1, l2, l3, l4):
    ...

Here’s what the zip() function produces:

>>> print l1
[0, 1, 2, 3]
>>> print l2
[4, 5, 6, 7]
>>> print l3
[8, 9, 10, 11]
>>> print l4
[12, 13, 14, 15]
>>> print zip(l1, l2, l3, l4)
[(0, 4, 8, 12), (1, 5, 9, 13), (2, 6, 10, 14), (3, 7, 11, 15)]

If the lists are large, and you don’t want to combine them into a bigger list, use itertools.izip(), which produces an iterator, rather than a list.

from itertools import izip

for i1, i2, i3, i4 in izip(l1, l2, l3, l4):
    ...

回答 29

一种单行的即席解决方案,可x对大小成块的列表进行迭代4

for a, b, c, d in zip(x[0::4], x[1::4], x[2::4], x[3::4]):
    ... do something with a, b, c and d ...

One-liner, adhoc solution to iterate over a list x in chunks of size 4

for a, b, c, d in zip(x[0::4], x[1::4], x[2::4], x[3::4]):
    ... do something with a, b, c and d ...

查找列表的平均值

问题:查找列表的平均值

我必须在Python中找到列表的平均值。到目前为止,这是我的代码

l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
print reduce(lambda x, y: x + y, l)

我已经知道了,所以它可以将列表中的值加在一起,但是我不知道如何将其划分为它们?

I have to find the average of a list in Python. This is my code so far

l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
print reduce(lambda x, y: x + y, l)

I’ve got it so it adds together the values in the list, but I don’t know how to make it divide into them?


回答 0

在Python 3.4+上,您可以使用 statistics.mean()

l = [15, 18, 2, 36, 12, 78, 5, 6, 9]

import statistics
statistics.mean(l)  # 20.11111111111111

在旧版本的Python上,您可以执行

sum(l) / len(l)

在Python 2上,您需要转换len为浮点数才能进行浮点数除法

sum(l) / float(len(l))

无需使用reduce。它慢得多,并在Python 3 中删除

On Python 3.4+ you can use statistics.mean()

l = [15, 18, 2, 36, 12, 78, 5, 6, 9]

import statistics
statistics.mean(l)  # 20.11111111111111

On older versions of Python you can do

sum(l) / len(l)

On Python 2 you need to convert len to a float to get float division

sum(l) / float(len(l))

There is no need to use reduce. It is much slower and was removed in Python 3.


回答 1

l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
sum(l) / len(l)
l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
sum(l) / len(l)

回答 2

您可以使用numpy.mean

l = [15, 18, 2, 36, 12, 78, 5, 6, 9]

import numpy as np
print(np.mean(l))

You can use numpy.mean:

l = [15, 18, 2, 36, 12, 78, 5, 6, 9]

import numpy as np
print(np.mean(l))

回答 3

一个统计模块已经加入到了Python 3.4。它具有计算平均值的功能,称为均值。您提供的列表的示例为:

from statistics import mean
l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
mean(l)

A statistics module has been added to python 3.4. It has a function to calculate the average called mean. An example with the list you provided would be:

from statistics import mean
l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
mean(l)

回答 4

reduce()当Python具有完美的sum()功能时,为什么要使用此功能?

print sum(l) / float(len(l))

float()必须强制Python执行浮点除法。)

Why would you use reduce() for this when Python has a perfectly cromulent sum() function?

print sum(l) / float(len(l))

(The float() is necessary to force Python to do a floating-point division.)


回答 5

如果您使用的是python> = 3.4,则有一个统计资料库

https://docs.python.org/3/library/statistics.html

您可以使用这种卑鄙的方法。假设您有一个要查找均值的数字列表:-

list = [11, 13, 12, 15, 17]
import statistics as s
s.mean(list)

它还有其他方法,如stdev,方差,众数,谐波均值,中位数等,这些方法也非常有用。

There is a statistics library if you are using python >= 3.4

https://docs.python.org/3/library/statistics.html

You may use it’s mean method like this. Let’s say you have a list of numbers of which you want to find mean:-

list = [11, 13, 12, 15, 17]
import statistics as s
s.mean(list)

It has other methods too like stdev, variance, mode, harmonic mean, median etc which are too useful.


回答 6

除了将其强制转换为浮点数外,还可以在总和上加上0.0:

def avg(l):
    return sum(l, 0.0) / len(l)

Instead of casting to float, you can add 0.0 to the sum:

def avg(l):
    return sum(l, 0.0) / len(l)

回答 7

sum(l) / float(len(l)) 是正确的答案,但仅出于完整性考虑,您可以通过一次减少来计算平均值:

>>> reduce(lambda x, y: x + y / float(len(l)), l, 0)
20.111111111111114

请注意,这可能会导致轻微的舍入错误:

>>> sum(l) / float(len(l))
20.111111111111111

sum(l) / float(len(l)) is the right answer, but just for completeness you can compute an average with a single reduce:

>>> reduce(lambda x, y: x + y / float(len(l)), l, 0)
20.111111111111114

Note that this can result in a slight rounding error:

>>> sum(l) / float(len(l))
20.111111111111111

回答 8

我尝试使用上面的选项,但是没有用。尝试这个:

from statistics import mean

n = [11, 13, 15, 17, 19]

print(n)
print(mean(n))

在python 3.5上工作

I tried using the options above but didn’t work. Try this:

from statistics import mean

n = [11, 13, 15, 17, 19]

print(n)
print(mean(n))

worked on python 3.5


回答 9

或使用pandasSeries.mean方法:

pd.Series(sequence).mean()

演示:

>>> import pandas as pd
>>> l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
>>> pd.Series(l).mean()
20.11111111111111
>>> 

从文档:

Series.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

这是文档:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mean.html

以及整个文档:

https://pandas.pydata.org/pandas-docs/stable/10min.html

Or use pandas‘s Series.mean method:

pd.Series(sequence).mean()

Demo:

>>> import pandas as pd
>>> l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
>>> pd.Series(l).mean()
20.11111111111111
>>> 

From the docs:

Series.mean(axis=None, skipna=None, level=None, numeric_only=None, **kwargs)

And here is the docs for this:

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.mean.html

And the whole documentation:

https://pandas.pydata.org/pandas-docs/stable/10min.html


回答 10

在Udacity的问题中,我也有类似的问题要解决。我编码的不是内置函数:

def list_mean(n):

    summing = float(sum(n))
    count = float(len(n))
    if n == []:
        return False
    return float(summing/count)

比平常更长的时间,但是对于初学者来说,这是一个很大的挑战。

I had a similar question to solve in a Udacity´s problems. Instead of a built-in function i coded:

def list_mean(n):

    summing = float(sum(n))
    count = float(len(n))
    if n == []:
        return False
    return float(summing/count)

Much more longer than usual but for a beginner its quite challenging.


回答 11

作为一个初学者,我只是编写了这样的代码:

L = [15, 18, 2, 36, 12, 78, 5, 6, 9]

total = 0

def average(numbers):
    total = sum(numbers)
    total = float(total)
    return total / len(numbers)

print average(L)

as a beginner, I just coded this:

L = [15, 18, 2, 36, 12, 78, 5, 6, 9]

total = 0

def average(numbers):
    total = sum(numbers)
    total = float(total)
    return total / len(numbers)

print average(L)

回答 12

如果您想获得的不仅仅是平均值(也就是平均值),您可以查看scipy统计信息

from scipy import stats
l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
print(stats.describe(l))

# DescribeResult(nobs=9, minmax=(2, 78), mean=20.11111111111111, 
# variance=572.3611111111111, skewness=1.7791785448425341, 
# kurtosis=1.9422716419666397)

If you wanted to get more than just the mean (aka average) you might check out scipy stats

from scipy import stats
l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
print(stats.describe(l))

# DescribeResult(nobs=9, minmax=(2, 78), mean=20.11111111111111, 
# variance=572.3611111111111, skewness=1.7791785448425341, 
# kurtosis=1.9422716419666397)

回答 13

为了reduce用于获得运行平均值,您需要跟踪总数,但也要跟踪到目前为止看到的元素总数。由于这不是列表中的琐碎元素,因此您还必须传递reduce一个额外的参数以使其折叠。

>>> l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
>>> running_average = reduce(lambda aggr, elem: (aggr[0] + elem, aggr[1]+1), l, (0.0,0))
>>> running_average[0]
(181.0, 9)
>>> running_average[0]/running_average[1]
20.111111111111111

In order to use reduce for taking a running average, you’ll need to track the total but also the total number of elements seen so far. since that’s not a trivial element in the list, you’ll also have to pass reduce an extra argument to fold into.

>>> l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
>>> running_average = reduce(lambda aggr, elem: (aggr[0] + elem, aggr[1]+1), l, (0.0,0))
>>> running_average[0]
(181.0, 9)
>>> running_average[0]/running_average[1]
20.111111111111111

回答 14

两者都可以为您提供接近整数或至少10个十进制值的相似值。但是,如果您真正考虑的是长浮点值,则两者可能会有所不同。方法可能因您要实现的目标而异。

>>> l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
>>> print reduce(lambda x, y: x + y, l) / len(l)
20
>>> sum(l)/len(l)
20

浮动值

>>> print reduce(lambda x, y: x + y, l) / float(len(l))
20.1111111111
>>> print sum(l)/float(len(l))
20.1111111111

@安德鲁·克拉克(Andrew Clark)的发言是正确的。

Both can give you close to similar values on an integer or at least 10 decimal values. But if you are really considering long floating values both can be different. Approach can vary on what you want to achieve.

>>> l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
>>> print reduce(lambda x, y: x + y, l) / len(l)
20
>>> sum(l)/len(l)
20

Floating values

>>> print reduce(lambda x, y: x + y, l) / float(len(l))
20.1111111111
>>> print sum(l)/float(len(l))
20.1111111111

@Andrew Clark was correct on his statement.


回答 15

假设

x = [[-5.01,-5.43,1.08,0.86,-2.67,4.94,-2.51,-2.25,5.56,1.03], [-8.12,-3.48,-5.52,-3.78,0.63,3.29,2.09,-2.13,2.86,-3.33], [-3.68,-3.54,1.66,-4.11,7.39,2.08,-2.59,-6.94,-2.26,4.33]]

您会注意到它的x尺寸为3 * 10,如果您需要mean进入每一行,则可以键入

theMean = np.mean(x1,axis=1)

别忘了 import numpy as np

suppose that

x = [[-5.01,-5.43,1.08,0.86,-2.67,4.94,-2.51,-2.25,5.56,1.03], [-8.12,-3.48,-5.52,-3.78,0.63,3.29,2.09,-2.13,2.86,-3.33], [-3.68,-3.54,1.66,-4.11,7.39,2.08,-2.59,-6.94,-2.26,4.33]]

you can notice that x has dimension 3*10 if you need to get the mean to each row you can type this

theMean = np.mean(x1,axis=1)

don’t forget to import numpy as np


回答 16

l = [15, 18, 2, 36, 12, 78, 5, 6, 9]

l = map(float,l)
print '%.2f' %(sum(l)/len(l))
l = [15, 18, 2, 36, 12, 78, 5, 6, 9]

l = map(float,l)
print '%.2f' %(sum(l)/len(l))

回答 17

使用以下PYTHON代码在列表中查找平均值:

l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
print(sum(l)//len(l))

试试这个很容易。

Find the average in list By using the following PYTHON code:

l = [15, 18, 2, 36, 12, 78, 5, 6, 9]
print(sum(l)//len(l))

try this it easy.


回答 18

print reduce(lambda x, y: x + y, l)/(len(l)*1.0)

或喜欢以前发布的

sum(l)/(len(l)*1.0)

1.0是确保获得浮点除法

print reduce(lambda x, y: x + y, l)/(len(l)*1.0)

or like posted previously

sum(l)/(len(l)*1.0)

The 1.0 is to make sure you get a floating point division


回答 19

结合以上几个答案,我提出了以下与reduce一起使用的方法,并且不假定您在reduce L函数中可用:

from operator import truediv

L = [15, 18, 2, 36, 12, 78, 5, 6, 9]

def sum_and_count(x, y):
    try:
        return (x[0] + y, x[1] + 1)
    except TypeError:
        return (x + y, 2)

truediv(*reduce(sum_and_count, L))

# prints 
20.11111111111111

Combining a couple of the above answers, I’ve come up with the following which works with reduce and doesn’t assume you have L available inside the reducing function:

from operator import truediv

L = [15, 18, 2, 36, 12, 78, 5, 6, 9]

def sum_and_count(x, y):
    try:
        return (x[0] + y, x[1] + 1)
    except TypeError:
        return (x + y, 2)

truediv(*reduce(sum_and_count, L))

# prints 
20.11111111111111

回答 20

我想添加另一种方法

import itertools,operator
list(itertools.accumulate(l,operator.add)).pop(-1) / len(l)

I want to add just another approach

import itertools,operator
list(itertools.accumulate(l,operator.add)).pop(-1) / len(l)

回答 21

numbers = [0,1,2,3]

numbers[0] = input("Please enter a number")

numbers[1] = input("Please enter a second number")

numbers[2] = input("Please enter a third number")

numbers[3] = input("Please enter a fourth number")

print (numbers)

print ("Finding the Avarage")

avarage = int(numbers[0]) + int(numbers[1]) + int(numbers[2]) + int(numbers [3]) / 4

print (avarage)
numbers = [0,1,2,3]

numbers[0] = input("Please enter a number")

numbers[1] = input("Please enter a second number")

numbers[2] = input("Please enter a third number")

numbers[3] = input("Please enter a fourth number")

print (numbers)

print ("Finding the Avarage")

avarage = int(numbers[0]) + int(numbers[1]) + int(numbers[2]) + int(numbers [3]) / 4

print (avarage)

查找两个嵌套列表的交集?

问题:查找两个嵌套列表的交集?

我知道如何得到两个平面列表的交集:

b1 = [1,2,3,4,5,9,11,15]
b2 = [4,5,6,7,8]
b3 = [val for val in b1 if val in b2]

要么

def intersect(a, b):
    return list(set(a) & set(b))

print intersect(b1, b2)

但是当我必须找到嵌套列表的交集时,我的问题就开始了:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

最后,我希望收到:

c3 = [[13,32],[7,13,28],[1,6]]

你们能帮我这个忙吗?

有关

I know how to get an intersection of two flat lists:

b1 = [1,2,3,4,5,9,11,15]
b2 = [4,5,6,7,8]
b3 = [val for val in b1 if val in b2]

or

def intersect(a, b):
    return list(set(a) & set(b))

print intersect(b1, b2)

But when I have to find intersection for nested lists then my problems starts:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

In the end I would like to receive:

c3 = [[13,32],[7,13,28],[1,6]]

Can you guys give me a hand with this?

Related


回答 0

如果你想:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
c3 = [[13, 32], [7, 13, 28], [1,6]]

然后这是您的Python 2解决方案:

c3 = [filter(lambda x: x in c1, sublist) for sublist in c2]

在Python 3中,filter返回的是一个Iterable而不是list,因此您需要使用以下命令包装filter调用list()

c3 = [list(filter(lambda x: x in c1, sublist)) for sublist in c2]

说明:

过滤器部分接受每个子列表的项目,并检查它是否在源列表c1中。对c2中的每个子列表执行列表推导。

If you want:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
c3 = [[13, 32], [7, 13, 28], [1,6]]

Then here is your solution for Python 2:

c3 = [filter(lambda x: x in c1, sublist) for sublist in c2]

In Python 3 filter returns an iterable instead of list, so you need to wrap filter calls with list():

c3 = [list(filter(lambda x: x in c1, sublist)) for sublist in c2]

Explanation:

The filter part takes each sublist’s item and checks to see if it is in the source list c1. The list comprehension is executed for each sublist in c2.


回答 1

您无需定义交集。它已经是场景的一流组成部分。

>>> b1 = [1,2,3,4,5,9,11,15]
>>> b2 = [4,5,6,7,8]
>>> set(b1).intersection(b2)
set([4, 5])

You don’t need to define intersection. It’s already a first-class part of set.

>>> b1 = [1,2,3,4,5,9,11,15]
>>> b2 = [4,5,6,7,8]
>>> set(b1).intersection(b2)
set([4, 5])

回答 2

对于只想查找两个列表的交集的人们,Asker提供了两种方法:

b1 = [1,2,3,4,5,9,11,15]
b2 = [4,5,6,7,8]
b3 = [val for val in b1 if val in b2]

def intersect(a, b):
     return list(set(a) & set(b))

print intersect(b1, b2)

但是有一种更有效的混合方法,因为您只需要在列表/集合之间进行一次转换,而不是三种:

b1 = [1,2,3,4,5]
b2 = [3,4,5,6]
s2 = set(b2)
b3 = [val for val in b1 if val in s2]

这将在O(n)中运行,而他涉及列表理解的原始方法将在O(n ^ 2)中运行

For people just looking to find the intersection of two lists, the Asker provided two methods:

b1 = [1,2,3,4,5,9,11,15]
b2 = [4,5,6,7,8]
b3 = [val for val in b1 if val in b2]

and

def intersect(a, b):
     return list(set(a) & set(b))

print intersect(b1, b2)

But there is a hybrid method that is more efficient, because you only have to do one conversion between list/set, as opposed to three:

b1 = [1,2,3,4,5]
b2 = [3,4,5,6]
s2 = set(b2)
b3 = [val for val in b1 if val in s2]

This will run in O(n), whereas his original method involving list comprehension will run in O(n^2)


回答 3

功能方法:

input_list = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7]]

result = reduce(set.intersection, map(set, input_list))

它可以应用于1+列表的更一般情况

The functional approach:

input_list = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [3, 4, 5, 6, 7]]

result = reduce(set.intersection, map(set, input_list))

and it can be applied to the more general case of 1+ lists


回答 4

纯列表理解版本

>>> c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
>>> c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
>>> c1set = frozenset(c1)

展平变体:

>>> [n for lst in c2 for n in lst if n in c1set]
[13, 32, 7, 13, 28, 1, 6]

嵌套变体:

>>> [[n for n in lst if n in c1set] for lst in c2]
[[13, 32], [7, 13, 28], [1, 6]]

Pure list comprehension version

>>> c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
>>> c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
>>> c1set = frozenset(c1)

Flatten variant:

>>> [n for lst in c2 for n in lst if n in c1set]
[13, 32, 7, 13, 28, 1, 6]

Nested variant:

>>> [[n for n in lst if n in c1set] for lst in c2]
[[13, 32], [7, 13, 28], [1, 6]]

回答 5

&运算符采用两组的交集。

{1, 2, 3} & {2, 3, 4}
Out[1]: {2, 3}

The & operator takes the intersection of two sets.

{1, 2, 3} & {2, 3, 4}
Out[1]: {2, 3}

回答 6

采取2个列表的交集的pythonic方法是:

[x for x in list1 if x in list2]

A pythonic way of taking the intersection of 2 lists is:

[x for x in list1 if x in list2]

回答 7

您应该使用此代码(取自http://kogs-www.informatik.uni-hamburg.de/~meine/python_tricks)进行拼合,该代码未经测试,但我确定它可以正常工作:


def flatten(x):
    """flatten(sequence) -> list

    Returns a single, flat list which contains all elements retrieved
    from the sequence and all recursively contained sub-sequences
    (iterables).

    Examples:
    >>> [1, 2, [3,4], (5,6)]
    [1, 2, [3, 4], (5, 6)]
    >>> flatten([[[1,2,3], (42,None)], [4,5], [6], 7, MyVector(8,9,10)])
    [1, 2, 3, 42, None, 4, 5, 6, 7, 8, 9, 10]"""

    result = []
    for el in x:
        #if isinstance(el, (list, tuple)):
        if hasattr(el, "__iter__") and not isinstance(el, basestring):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

展平列表后,可以按通常方式执行交集:


c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

def intersect(a, b):
     return list(set(a) & set(b))

print intersect(flatten(c1), flatten(c2))

You should flatten using this code ( taken from http://kogs-www.informatik.uni-hamburg.de/~meine/python_tricks ), the code is untested, but I’m pretty sure it works:


def flatten(x):
    """flatten(sequence) -> list

    Returns a single, flat list which contains all elements retrieved
    from the sequence and all recursively contained sub-sequences
    (iterables).

    Examples:
    >>> [1, 2, [3,4], (5,6)]
    [1, 2, [3, 4], (5, 6)]
    >>> flatten([[[1,2,3], (42,None)], [4,5], [6], 7, MyVector(8,9,10)])
    [1, 2, 3, 42, None, 4, 5, 6, 7, 8, 9, 10]"""

    result = []
    for el in x:
        #if isinstance(el, (list, tuple)):
        if hasattr(el, "__iter__") and not isinstance(el, basestring):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

After you had flattened the list, you perform the intersection in the usual way:


c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

def intersect(a, b):
     return list(set(a) & set(b))

print intersect(flatten(c1), flatten(c2))


回答 8

intersect定义以来,基本的列表理解就足够了:

>>> c3 = [intersect(c1, i) for i in c2]
>>> c3
[[32, 13], [28, 13, 7], [1, 6]]

得益于S. Lott的评论和TM。的相关评论:

>>> c3 = [list(set(c1).intersection(i)) for i in c2]
>>> c3
[[32, 13], [28, 13, 7], [1, 6]]

Since intersect was defined, a basic list comprehension is enough:

>>> c3 = [intersect(c1, i) for i in c2]
>>> c3
[[32, 13], [28, 13, 7], [1, 6]]

Improvement thanks to S. Lott’s remark and TM.’s associated remark:

>>> c3 = [list(set(c1).intersection(i)) for i in c2]
>>> c3
[[32, 13], [28, 13, 7], [1, 6]]

回答 9

鉴于:

> c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]

> c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

我发现以下代码运行良好,并且如果使用set操作,则可能更简洁:

> c3 = [list(set(f)&set(c1)) for f in c2] 

它得到:

> [[32, 13], [28, 13, 7], [1, 6]]

如果需要订购:

> c3 = [sorted(list(set(f)&set(c1))) for f in c2] 

我们有:

> [[13, 32], [7, 13, 28], [1, 6]]

顺便说一句,对于更多的python样式,这个也很好:

> c3 = [ [i for i in set(f) if i in c1] for f in c2]

Given:

> c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]

> c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

I find the following code works well and maybe more concise if using set operation:

> c3 = [list(set(f)&set(c1)) for f in c2] 

It got:

> [[32, 13], [28, 13, 7], [1, 6]]

If order needed:

> c3 = [sorted(list(set(f)&set(c1))) for f in c2] 

we got:

> [[13, 32], [7, 13, 28], [1, 6]]

By the way, for a more python style, this one is fine too:

> c3 = [ [i for i in set(f) if i in c1] for f in c2]

回答 10

我不知道我是否迟于回答你的问题。阅读完您的问题后,我想到了一个可在列表和嵌套列表上使用的函数intersect()。我使用递归来定义此功能,这非常直观。希望它是您要寻找的:

def intersect(a, b):
    result=[]
    for i in b:
        if isinstance(i,list):
            result.append(intersect(a,i))
        else:
            if i in a:
                 result.append(i)
    return result

例:

>>> c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
>>> c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
>>> print intersect(c1,c2)
[[13, 32], [7, 13, 28], [1, 6]]

>>> b1 = [1,2,3,4,5,9,11,15]
>>> b2 = [4,5,6,7,8]
>>> print intersect(b1,b2)
[4, 5]

I don’t know if I am late in answering your question. After reading your question I came up with a function intersect() that can work on both list and nested list. I used recursion to define this function, it is very intuitive. Hope it is what you are looking for:

def intersect(a, b):
    result=[]
    for i in b:
        if isinstance(i,list):
            result.append(intersect(a,i))
        else:
            if i in a:
                 result.append(i)
    return result

Example:

>>> c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
>>> c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
>>> print intersect(c1,c2)
[[13, 32], [7, 13, 28], [1, 6]]

>>> b1 = [1,2,3,4,5,9,11,15]
>>> b2 = [4,5,6,7,8]
>>> print intersect(b1,b2)
[4, 5]

回答 11

你考虑[1,2]相交[1, [2]]吗?也就是说,仅仅是您关心的数字还是列表结构?

如果只是数字,请研究如何“拉平”列表,然后使用该set()方法。

Do you consider [1,2] to intersect with [1, [2]]? That is, is it only the numbers you care about, or the list structure as well?

If only the numbers, investigate how to “flatten” the lists, then use the set() method.


回答 12

我也在寻找一种实现方法,最终它最终像这样:

def compareLists(a,b):
    removed = [x for x in a if x not in b]
    added = [x for x in b if x not in a]
    overlap = [x for x in a if x in b]
    return [removed,added,overlap]

I was also looking for a way to do it, and eventually it ended up like this:

def compareLists(a,b):
    removed = [x for x in a if x not in b]
    added = [x for x in b if x not in a]
    overlap = [x for x in a if x in b]
    return [removed,added,overlap]

回答 13

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]

c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

c3 = [list(set(c2[i]).intersection(set(c1))) for i in xrange(len(c2))]

c3
->[[32, 13], [28, 13, 7], [1, 6]]
c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]

c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

c3 = [list(set(c2[i]).intersection(set(c1))) for i in xrange(len(c2))]

c3
->[[32, 13], [28, 13, 7], [1, 6]]

回答 14

我们可以为此使用set方法:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

   result = [] 
   for li in c2:
       res = set(li) & set(c1)
       result.append(list(res))

   print result

We can use set methods for this:

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]

   result = [] 
   for li in c2:
       res = set(li) & set(c1)
       result.append(list(res))

   print result

回答 15

要定义正确考虑元素基数的交集,请使用Counter

from collections import Counter

>>> c1 = [1, 2, 2, 3, 4, 4, 4]
>>> c2 = [1, 2, 4, 4, 4, 4, 5]
>>> list((Counter(c1) & Counter(c2)).elements())
[1, 2, 4, 4, 4]

To define intersection that correctly takes into account the cardinality of the elements use Counter:

from collections import Counter

>>> c1 = [1, 2, 2, 3, 4, 4, 4]
>>> c2 = [1, 2, 4, 4, 4, 4, 5]
>>> list((Counter(c1) & Counter(c2)).elements())
[1, 2, 4, 4, 4]

回答 16

# Problem:  Given c1 and c2:
c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
# how do you get c3 to be [[13, 32], [7, 13, 28], [1, 6]] ?

这是一种c3不涉及集合的设置方法:

c3 = []
for sublist in c2:
    c3.append([val for val in c1 if val in sublist])

但是,如果您只想使用一行,则可以执行以下操作:

c3 = [[val for val in c1 if val in sublist]  for sublist in c2]

这是列表理解中的列表理解,这有点不寻常,但是我认为您在遵循它时应该不会有太多麻烦。

# Problem:  Given c1 and c2:
c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
# how do you get c3 to be [[13, 32], [7, 13, 28], [1, 6]] ?

Here’s one way to set c3 that doesn’t involve sets:

c3 = []
for sublist in c2:
    c3.append([val for val in c1 if val in sublist])

But if you prefer to use just one line, you can do this:

c3 = [[val for val in c1 if val in sublist]  for sublist in c2]

It’s a list comprehension inside a list comprehension, which is a little unusual, but I think you shouldn’t have too much trouble following it.


回答 17

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
c3 = [list(set(i) & set(c1)) for i in c2]
c3
[[32, 13], [28, 13, 7], [1, 6]]

对我来说,这是一种非常优雅,快捷的方法:)

c1 = [1, 6, 7, 10, 13, 28, 32, 41, 58, 63]
c2 = [[13, 17, 18, 21, 32], [7, 11, 13, 14, 28], [1, 5, 6, 8, 15, 16]]
c3 = [list(set(i) & set(c1)) for i in c2]
c3
[[32, 13], [28, 13, 7], [1, 6]]

For me this is very elegant and quick way to to it :)


回答 18

平面清单可以reduce很容易地通过。

您需要使用初始化程序 -函数中的第三个参数reduce

reduce(
   lambda result, _list: result.append(
       list(set(_list)&set(c1)) 
     ) or result, 
   c2, 
   [])

上面的代码适用于python2和python3,但是您需要将reduce模块导入为from functools import reduce。有关详细信息,请参见下面的链接。

flat list can be made through reduce easily.

All you need to use initializer – third argument in the reduce function.

reduce(
   lambda result, _list: result.append(
       list(set(_list)&set(c1)) 
     ) or result, 
   c2, 
   [])

Above code works for both python2 and python3, but you need to import reduce module as from functools import reduce. Refer below link for details.


回答 19

查找可迭代对象之间的差异和交集的简单方法

如果重复很重要,请使用此方法

from collections import Counter

def intersection(a, b):
    """
    Find the intersection of two iterables

    >>> intersection((1,2,3), (2,3,4))
    (2, 3)

    >>> intersection((1,2,3,3), (2,3,3,4))
    (2, 3, 3)

    >>> intersection((1,2,3,3), (2,3,4,4))
    (2, 3)

    >>> intersection((1,2,3,3), (2,3,4,4))
    (2, 3)
    """
    return tuple(n for n, count in (Counter(a) & Counter(b)).items() for _ in range(count))

def difference(a, b):
    """
    Find the symmetric difference of two iterables

    >>> difference((1,2,3), (2,3,4))
    (1, 4)

    >>> difference((1,2,3,3), (2,3,4))
    (1, 3, 4)

    >>> difference((1,2,3,3), (2,3,4,4))
    (1, 3, 4, 4)
    """
    diff = lambda x, y: tuple(n for n, count in (Counter(x) - Counter(y)).items() for _ in range(count))
    return diff(a, b) + diff(b, a)

Simple way to find difference and intersection between iterables

Use this method if repetition matters

from collections import Counter

def intersection(a, b):
    """
    Find the intersection of two iterables

    >>> intersection((1,2,3), (2,3,4))
    (2, 3)

    >>> intersection((1,2,3,3), (2,3,3,4))
    (2, 3, 3)

    >>> intersection((1,2,3,3), (2,3,4,4))
    (2, 3)

    >>> intersection((1,2,3,3), (2,3,4,4))
    (2, 3)
    """
    return tuple(n for n, count in (Counter(a) & Counter(b)).items() for _ in range(count))

def difference(a, b):
    """
    Find the symmetric difference of two iterables

    >>> difference((1,2,3), (2,3,4))
    (1, 4)

    >>> difference((1,2,3,3), (2,3,4))
    (1, 3, 4)

    >>> difference((1,2,3,3), (2,3,4,4))
    (1, 3, 4, 4)
    """
    diff = lambda x, y: tuple(n for n, count in (Counter(x) - Counter(y)).items() for _ in range(count))
    return diff(a, b) + diff(b, a)

与常规Python列表相比,NumPy有什么优势?

问题:与常规Python列表相比,NumPy有什么优势?

与常规Python列表相比,NumPy有什么优势?

我大约有100个金融市场系列,我将创建一个100x100x100 = 1百万个单元的多维数据集数组。我将每个x与y和z回归(3变量),以用标准误差填充数组。

我听说对于“大型矩阵”,出于性能和可伸缩性的原因,我应该使用NumPy而不是Python列表。事实是,我知道Python列表,它们似乎对我有用。

如果我转到NumPy,会有什么好处?

如果我有1000个序列(即立方体中有10亿个浮点单元)怎么办?

What are the advantages of NumPy over regular Python lists?

I have approximately 100 financial markets series, and I am going to create a cube array of 100x100x100 = 1 million cells. I will be regressing (3-variable) each x with each y and z, to fill the array with standard errors.

I have heard that for “large matrices” I should use NumPy as opposed to Python lists, for performance and scalability reasons. Thing is, I know Python lists and they seem to work for me.

What will the benefits be if I move to NumPy?

What if I had 1000 series (that is, 1 billion floating point cells in the cube)?


回答 0

NumPy的数组比Python列表更紧凑-您在Python中描述的列表列表至少需要20 MB左右,而单元格中具有单精度浮点数的NumPy 3D数组则需要4 MB。使用NumPy可以更快地读取和写入项目。

也许您只关心一百万个单元就不会那么在意,但是您肯定会关心十亿个单元-两种方法都不适合32位体系结构,但是使用64位版本,NumPy可以节省约4 GB ,仅Python一项就至少需要12 GB(很多指针的大小加倍),这是一个昂贵得多的硬件!

差异主要是由于“间接性”造成的-Python列表是指向Python对象的指针的数组,每个指针至少4个字节,对于最小的Python对象也至少包含16个字节(类型指针为4,引用计数为4,类型为4值-内存分配器舍入为16)。NumPy数组是统一值的数组-单精度数字每个占用4个字节,双精度数字每个占用8个字节。灵活性较差,但您要为标准Python列表的灵活性付出高昂的代价!

NumPy’s arrays are more compact than Python lists — a list of lists as you describe, in Python, would take at least 20 MB or so, while a NumPy 3D array with single-precision floats in the cells would fit in 4 MB. Access in reading and writing items is also faster with NumPy.

Maybe you don’t care that much for just a million cells, but you definitely would for a billion cells — neither approach would fit in a 32-bit architecture, but with 64-bit builds NumPy would get away with 4 GB or so, Python alone would need at least about 12 GB (lots of pointers which double in size) — a much costlier piece of hardware!

The difference is mostly due to “indirectness” — a Python list is an array of pointers to Python objects, at least 4 bytes per pointer plus 16 bytes for even the smallest Python object (4 for type pointer, 4 for reference count, 4 for value — and the memory allocators rounds up to 16). A NumPy array is an array of uniform values — single-precision numbers takes 4 bytes each, double-precision ones, 8 bytes. Less flexible, but you pay substantially for the flexibility of standard Python lists!


回答 1

NumPy不仅效率更高;这也更加方便。您可以免费获得许多矢量和矩阵运算,有时这可以避免不必要的工作。而且它们也得到有效实施。

例如,您可以将多维数据集直接从文件读取到数组中:

x = numpy.fromfile(file=open("data"), dtype=float).reshape((100, 100, 100))

沿第二维求和:

s = x.sum(axis=1)

查找哪些单元格高于阈值:

(x > 0.5).nonzero()

删除沿第三维的每个偶数索引切片:

x[:, :, ::2]

同样,许多有用的库都可以与NumPy数组一起使用。例如,统计分析和可视化库。

即使您没有性能问题,学习NumPy也是值得的。

NumPy is not just more efficient; it is also more convenient. You get a lot of vector and matrix operations for free, which sometimes allow one to avoid unnecessary work. And they are also efficiently implemented.

For example, you could read your cube directly from a file into an array:

x = numpy.fromfile(file=open("data"), dtype=float).reshape((100, 100, 100))

Sum along the second dimension:

s = x.sum(axis=1)

Find which cells are above a threshold:

(x > 0.5).nonzero()

Remove every even-indexed slice along the third dimension:

x[:, :, ::2]

Also, many useful libraries work with NumPy arrays. For example, statistical analysis and visualization libraries.

Even if you don’t have performance problems, learning NumPy is worth the effort.


回答 2

Alex提到了内存效率,Roberto提到了便利性,这些都是不错的地方。对于其他一些想法,我将提到速度功能

功能性:NumPy,FFT,卷积,快速搜索,基本统计信息,线性代数,直方图等都内置了很多功能。实际上,没有FFT谁能活下去?

速度:这是一项对列表和NumPy数组求和的测试,表明NumPy数组的求和速度快10倍(在此测试中,里程可能会有所不同)。

from numpy import arange
from timeit import Timer

Nelements = 10000
Ntimeits = 10000

x = arange(Nelements)
y = range(Nelements)

t_numpy = Timer("x.sum()", "from __main__ import x")
t_list = Timer("sum(y)", "from __main__ import y")
print("numpy: %.3e" % (t_numpy.timeit(Ntimeits)/Ntimeits,))
print("list:  %.3e" % (t_list.timeit(Ntimeits)/Ntimeits,))

在我的系统上(运行备份时),它会给出:

numpy: 3.004e-05
list:  5.363e-04

Alex mentioned memory efficiency, and Roberto mentions convenience, and these are both good points. For a few more ideas, I’ll mention speed and functionality.

Functionality: You get a lot built in with NumPy, FFTs, convolutions, fast searching, basic statistics, linear algebra, histograms, etc. And really, who can live without FFTs?

Speed: Here’s a test on doing a sum over a list and a NumPy array, showing that the sum on the NumPy array is 10x faster (in this test — mileage may vary).

from numpy import arange
from timeit import Timer

Nelements = 10000
Ntimeits = 10000

x = arange(Nelements)
y = range(Nelements)

t_numpy = Timer("x.sum()", "from __main__ import x")
t_list = Timer("sum(y)", "from __main__ import y")
print("numpy: %.3e" % (t_numpy.timeit(Ntimeits)/Ntimeits,))
print("list:  %.3e" % (t_list.timeit(Ntimeits)/Ntimeits,))

which on my systems (while I’m running a backup) gives:

numpy: 3.004e-05
list:  5.363e-04

回答 3

这是scipy.org网站上的常见问题解答中的一个很好的答案:

与(嵌套)Python列表相比,NumPy数组有什么优势?

Python的列表是有效的通用容器。它们支持(相当)高效的插入,删除,附加和连接,并且Python的列表理解使它们易于构造和操作。但是,它们有一定的局限性:它们不支持“向量化”操作,例如逐元素加法和乘法,并且它们可以包含不同类型的对象这一事实意味着Python必须为每个元素存储类型信息,并且必须执行类型分派代码在每个元素上操作时。这也意味着有效的C循环几乎无法执行列表操作-每次迭代都需要类型检查和其他Python API簿记。

Here’s a nice answer from the FAQ on the scipy.org website:

What advantages do NumPy arrays offer over (nested) Python lists?

Python’s lists are efficient general-purpose containers. They support (fairly) efficient insertion, deletion, appending, and concatenation, and Python’s list comprehensions make them easy to construct and manipulate. However, they have certain limitations: they don’t support “vectorized” operations like elementwise addition and multiplication, and the fact that they can contain objects of differing types mean that Python must store type information for every element, and must execute type dispatching code when operating on each element. This also means that very few list operations can be carried out by efficient C loops – each iteration would require type checks and other Python API bookkeeping.


回答 4

所有人都强调了numpy数组和python列表之间的几乎所有主要区别,在这里我将向大家简单介绍一下:

  1. Numpy数组在创建时具有固定的大小,这与python列表(可以动态增长)不同。更改ndarray的大小将创建一个新数组并删除原始数组。

  2. Numpy数组中的所有元素都必须具有相同的数据类型(我们也可以具有异构类型,但这将不允许您进行数学运算),因此在内存中的大小将相同

  3. Numpy数组有助于对大量数据进行数学运算和其他类型的运算。通常,与使用python顺序构建相比,此类操作执行效率更高且代码更少

All have highlighted almost all major differences between numpy array and python list, I will just brief them out here:

  1. Numpy arrays have a fixed size at creation, unlike python lists (which can grow dynamically). Changing the size of ndarray will create a new array and delete the original.

  2. The elements in a Numpy array are all required to be of the same data type (we can have the heterogeneous type as well but that will not gonna permit you mathematical operations) and thus will be the same size in memory

  3. Numpy arrays are facilitated advances mathematical and other types of operations on large numbers of data. Typically such operations are executed more efficiently and with less code than is possible using pythons build in sequences


使用列表上的max()/ min()获取返回的最大或最小项目的索引

问题:使用列表上的max()/ min()获取返回的最大或最小项目的索引

我正在使用列表中的Python maxmin函数来执行minimax算法,并且需要由max()或返回的值的索引min()。换句话说,我需要知道哪个移动产生了最大(第一玩家回合)或最小(第二玩家)值。

for i in range(9):
    newBoard = currentBoard.newBoardWithMove([i / 3, i % 3], player)

    if newBoard:
        temp = minMax(newBoard, depth + 1, not isMinLevel)  
        values.append(temp)

if isMinLevel:
    return min(values)
else:
    return max(values)

我需要能够返回最小值或最大值的实际索引,而不仅仅是返回值。

I’m using Python’s max and min functions on lists for a minimax algorithm, and I need the index of the value returned by max() or min(). In other words, I need to know which move produced the max (at a first player’s turn) or min (second player) value.

for i in range(9):
    newBoard = currentBoard.newBoardWithMove([i / 3, i % 3], player)

    if newBoard:
        temp = minMax(newBoard, depth + 1, not isMinLevel)  
        values.append(temp)

if isMinLevel:
    return min(values)
else:
    return max(values)

I need to be able to return the actual index of the min or max value, not just the value.


回答 0

如果isMinLevel:
    返回values.index(min(values))
其他:
    返回values.index(max(values))
if isMinLevel:
    return values.index(min(values))
else:
    return values.index(max(values))

回答 1

假设您有一个list values = [3,6,1,5],并且需要最小元素的索引,即index_min = 2在这种情况下。

避免itemgetter()使用其他答案中提出的解决方案,而改用

index_min = min(range(len(values)), key=values.__getitem__)

因为它不需要import operator使用enumerate,也总是比使用解决方案更快(下面的基准)itemgetter()

如果您正在处理numpy数组或可以numpy作为依赖提供,请考虑同时使用

import numpy as np
index_min = np.argmin(values)

即使在以下情况下将其应用于纯Python列表,也将比第一个解决方案更快。

  • 它大于几个元素(我的机器上大约2 ** 4个元素)
  • 您可以提供从纯列表到numpy数组的内存副本

正如该基准所指出的: 在此处输入图片说明

我已经在我的机器上使用python 2.7运行了基准测试,用于上述两个解决方案(蓝色:纯python,第一个解决方案)(红色,numpy解决方案)以及基于的标准解决方案itemgetter()(黑色,参考解决方案)。与python 3.5相同的基准测试表明,这些方法与上述python 2.7情况完全相同

Say that you have a list values = [3,6,1,5], and need the index of the smallest element, i.e. index_min = 2 in this case.

Avoid the solution with itemgetter() presented in the other answers, and use instead

index_min = min(range(len(values)), key=values.__getitem__)

because it doesn’t require to import operator nor to use enumerate, and it is always faster(benchmark below) than a solution using itemgetter().

If you are dealing with numpy arrays or can afford numpy as a dependency, consider also using

import numpy as np
index_min = np.argmin(values)

This will be faster than the first solution even if you apply it to a pure Python list if:

  • it is larger than a few elements (about 2**4 elements on my machine)
  • you can afford the memory copy from a pure list to a numpy array

as this benchmark points out: enter image description here

I have run the benchmark on my machine with python 2.7 for the two solutions above (blue: pure python, first solution) (red, numpy solution) and for the standard solution based on itemgetter() (black, reference solution). The same benchmark with python 3.5 showed that the methods compare exactly the same of the python 2.7 case presented above


回答 2

如果您枚举列表中的项目,则可以同时找到min / max索引和值,但是要对列表的原始值执行min / max。像这样:

import operator
min_index, min_value = min(enumerate(values), key=operator.itemgetter(1))
max_index, max_value = max(enumerate(values), key=operator.itemgetter(1))

这样,列表将只遍历一次最小值(或最大值)。

You can find the min/max index and value at the same time if you enumerate the items in the list, but perform min/max on the original values of the list. Like so:

import operator
min_index, min_value = min(enumerate(values), key=operator.itemgetter(1))
max_index, max_value = max(enumerate(values), key=operator.itemgetter(1))

This way the list will only be traversed once for min (or max).


回答 3

如果要在数字列表中查找max的索引(这似乎是您的情况),那么建议您使用numpy:

import numpy as np
ind = np.argmax(mylist)

If you want to find the index of max within a list of numbers (which seems your case), then I suggest you use numpy:

import numpy as np
ind = np.argmax(mylist)

回答 4

可能更简单的解决方案是将值的数组转换为值,索引对的数组,并取其最大值/最小值。这将给出具有最大值/最小值的最大/最小索引(即,通过首先比较第一个元素,然后比较第二个元素(如果第一个元素相同,则比较第二对元素))。注意,实际上不需要创建数组,因为最小/最大允许生成器作为输入。

values = [3,4,5]
(m,i) = max((v,i) for i,v in enumerate(values))
print (m,i) #(5, 2)

Possibly a simpler solution would be to turn the array of values into an array of value,index-pairs, and take the max/min of that. This would give the largest/smallest index that has the max/min (i.e. pairs are compared by first comparing the first element, and then comparing the second element if the first ones are the same). Note that it’s not necessary to actually create the array, because min/max allow generators as input.

values = [3,4,5]
(m,i) = max((v,i) for i,v in enumerate(values))
print (m,i) #(5, 2)

回答 5

list=[1.1412, 4.3453, 5.8709, 0.1314]
list.index(min(list))

将给您第一个最小值的索引。

list=[1.1412, 4.3453, 5.8709, 0.1314]
list.index(min(list))

Will give you first index of minimum.


回答 6

我认为最好的办法是将列表转换为a numpy array并使用以下功能:

a = np.array(list)
idx = np.argmax(a)

I think the best thing to do is convert the list to a numpy array and use this function :

a = np.array(list)
idx = np.argmax(a)

回答 7

我对此也很感兴趣,并使用perfplot比较了一些建议的解决方案(我的一个宠物项目)。

原来那是numpy的argmin

numpy.argmin(x)

即使足够大的列表(从输入list到a 的隐式转换),它也是最快的方法numpy.array

在此处输入图片说明


生成绘图的代码:

import numpy
import operator
import perfplot


def min_enumerate(a):
    return min(enumerate(a), key=lambda x: x[1])[0]


def min_enumerate_itemgetter(a):
    min_index, min_value = min(enumerate(a), key=operator.itemgetter(1))
    return min_index


def getitem(a):
    return min(range(len(a)), key=a.__getitem__)


def np_argmin(a):
    return numpy.argmin(a)


perfplot.show(
    setup=lambda n: numpy.random.rand(n).tolist(),
    kernels=[
        min_enumerate,
        min_enumerate_itemgetter,
        getitem,
        np_argmin,
        ],
    n_range=[2**k for k in range(15)],
    logx=True,
    logy=True,
    )

I was also interested in this and compared some of the suggested solutions using perfplot (a pet project of mine).

Turns out that numpy’s argmin,

numpy.argmin(x)

is the fastest method for large enough lists, even with the implicit conversion from the input list to a numpy.array.

enter image description here


Code for generating the plot:

import numpy
import operator
import perfplot


def min_enumerate(a):
    return min(enumerate(a), key=lambda x: x[1])[0]


def min_enumerate_itemgetter(a):
    min_index, min_value = min(enumerate(a), key=operator.itemgetter(1))
    return min_index


def getitem(a):
    return min(range(len(a)), key=a.__getitem__)


def np_argmin(a):
    return numpy.argmin(a)


perfplot.show(
    setup=lambda n: numpy.random.rand(n).tolist(),
    kernels=[
        min_enumerate,
        min_enumerate_itemgetter,
        getitem,
        np_argmin,
        ],
    n_range=[2**k for k in range(15)],
    logx=True,
    logy=True,
    )

回答 8

使用一个numpy数组和argmax()函数

 a=np.array([1,2,3])
 b=np.argmax(a)
 print(b) #2

Use a numpy array and the argmax() function

 a=np.array([1,2,3])
 b=np.argmax(a)
 print(b) #2

回答 9

获得最大值后,请尝试以下操作:

max_val = max(list)
index_max = list.index(max_val)

比很多选择要简单得多。

After you get the maximum values, try this:

max_val = max(list)
index_max = list.index(max_val)

Much simpler than a lot of options.


回答 10

我认为以上答案解决了您的问题,但我想我会分享一种方法,该方法可以为您提供最小值以及所有出现在其中的索引。

minval = min(mylist)
ind = [i for i, v in enumerate(mylist) if v == minval]

这两次通过了列表,但仍然非常快。但是,这比找到最小值的第一次遇到的索引要慢一些。因此,如果您仅需要一个最小值,请使用Matt Anderson的解决方案,如果您需要全部解决方案,请使用此解决方案。

I think the answer above solves your problem but I thought I’d share a method that gives you the minimum and all the indices the minimum appears in.

minval = min(mylist)
ind = [i for i, v in enumerate(mylist) if v == minval]

This passes the list twice but is still quite fast. It is however slightly slower than finding the index of the first encounter of the minimum. So if you need just one of the minima, use Matt Anderson‘s solution, if you need them all, use this.


回答 11

使用numpy模块的函数numpy.where

import numpy as n
x = n.array((3,3,4,7,4,56,65,1))

对于最小值索引:

idx = n.where(x==x.min())[0]

对于最大值索引:

idx = n.where(x==x.max())[0]

实际上,此功能更强大。您可以构成各种布尔运算对于3到60之间的值的索引:

idx = n.where((x>3)&(x<60))[0]
idx
array([2, 3, 4, 5])
x[idx]
array([ 4,  7,  4, 56])

Use numpy module’s function numpy.where

import numpy as n
x = n.array((3,3,4,7,4,56,65,1))

For index of minimum value:

idx = n.where(x==x.min())[0]

For index of maximum value:

idx = n.where(x==x.max())[0]

In fact, this function is much more powerful. You can pose all kinds of boolean operations For index of value between 3 and 60:

idx = n.where((x>3)&(x<60))[0]
idx
array([2, 3, 4, 5])
x[idx]
array([ 4,  7,  4, 56])

回答 12

使用内置enumerate()max()函数以及函数的可选key参数max()和简单的lambda表达式,就可以轻松实现:

theList = [1, 5, 10]
maxIndex, maxValue = max(enumerate(theList), key=lambda v: v[1])
# => (2, 10)

在文档中max()说,该key参数需要一个类似于函数中的list.sort()函数。另请参阅“ 排序方法”

的工作原理相同min()。顺便说一句,它返回第一个最大值/最小值。

This is simply possible using the built-in enumerate() and max() function and the optional key argument of the max() function and a simple lambda expression:

theList = [1, 5, 10]
maxIndex, maxValue = max(enumerate(theList), key=lambda v: v[1])
# => (2, 10)

In the docs for max() it says that the key argument expects a function like in the list.sort() function. Also see the Sorting How To.

It works the same for min(). Btw it returns the first max/min value.


回答 13

假设您有一个清单,例如:

a = [9,8,7]

以下两种方法是使用最小元素及其索引获取元组的非常紧凑的方法。两者都相似时间来处理。我更喜欢zip方法,但这就是我的口味。

拉链方式

element, index = min(list(zip(a, range(len(a)))))

min(list(zip(a, range(len(a)))))
(7, 2)

timeit min(list(zip(a, range(len(a)))))
1.36 µs ± 107 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

列举方法

index, element = min(list(enumerate(a)), key=lambda x:x[1])

min(list(enumerate(a)), key=lambda x:x[1])
(2, 7)

timeit min(list(enumerate(a)), key=lambda x:x[1])
1.45 µs ± 78.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Say you have a list such as:

a = [9,8,7]

The following two methods are pretty compact ways to get a tuple with the minimum element and its index. Both take a similar time to process. I better like the zip method, but that is my taste.

zip method

element, index = min(list(zip(a, range(len(a)))))

min(list(zip(a, range(len(a)))))
(7, 2)

timeit min(list(zip(a, range(len(a)))))
1.36 µs ± 107 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

enumerate method

index, element = min(list(enumerate(a)), key=lambda x:x[1])

min(list(enumerate(a)), key=lambda x:x[1])
(2, 7)

timeit min(list(enumerate(a)), key=lambda x:x[1])
1.45 µs ± 78.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

回答 14

只要您知道如何使用lambda和“ key”参数,一个简单的解决方案就是:

max_index = max( range( len(my_list) ), key = lambda index : my_list[ index ] )

As long as you know how to use lambda and the “key” argument, a simple solution is:

max_index = max( range( len(my_list) ), key = lambda index : my_list[ index ] )

回答 15

就那么简单 :

stuff = [2, 4, 8, 15, 11]

index = stuff.index(max(stuff))

Simple as that :

stuff = [2, 4, 8, 15, 11]

index = stuff.index(max(stuff))

回答 16

为什么要先添加索引然后再颠倒索引呢?Enumerate()函数只是zip()函数用法的一种特殊情况。让我们以适当的方式使用它:

my_indexed_list = zip(my_list, range(len(my_list)))

min_value, min_index = min(my_indexed_list)
max_value, max_index = max(my_indexed_list)

Why bother to add indices first and then reverse them? Enumerate() function is just a special case of zip() function usage. Let’s use it in appropiate way:

my_indexed_list = zip(my_list, range(len(my_list)))

min_value, min_index = min(my_indexed_list)
max_value, max_index = max(my_indexed_list)

回答 17

只是已经说过的一小部分。 values.index(min(values))似乎返回的最小值的最小值。以下是最大的索引:

    values.reverse()
    (values.index(min(values)) + len(values) - 1) % len(values)
    values.reverse()

如果原地反转的副作用无关紧要,则可以省略最后一行。

遍历所有事件

    indices = []
    i = -1
    for _ in range(values.count(min(values))):
      i = values[i + 1:].index(min(values)) + i + 1
      indices.append(i)

为了简洁起见。将其缓存min(values), values.count(min)在循环外可能是一个更好的主意。

Just a minor addition to what has already been said. values.index(min(values)) seems to return the smallest index of min. The following gets the largest index:

    values.reverse()
    (values.index(min(values)) + len(values) - 1) % len(values)
    values.reverse()

The last line can be left out if the side effect of reversing in place does not matter.

To iterate through all occurrences

    indices = []
    i = -1
    for _ in range(values.count(min(values))):
      i = values[i + 1:].index(min(values)) + i + 1
      indices.append(i)

For the sake of brevity. It is probably a better idea to cache min(values), values.count(min) outside the loop.


回答 18

如果您不想导入其他模块,则可以使用一种简单的方法在列表中查找值最小的索引:

min_value = min(values)
indexes_with_min_value = [i for i in range(0,len(values)) if values[i] == min_value]

然后选择第一个:

choosen = indexes_with_min_value[0]

A simple way for finding the indexes with minimal value in a list if you don’t want to import additional modules:

min_value = min(values)
indexes_with_min_value = [i for i in range(0,len(values)) if values[i] == min_value]

Then choose for example the first one:

choosen = indexes_with_min_value[0]

回答 19

没有足够高的代表评论现有答案。

但对于https://stackoverflow.com/a/11825864/3920439回答

这适用于整数,但不适用于浮点数数组(至少在python 3.6中),它将引发 TypeError: list indices must be integers or slices, not float

Dont have high enough rep to comment on existing answer.

But for https://stackoverflow.com/a/11825864/3920439 answer

This works for integers, but does not work for array of floats (at least in python 3.6) It will raise TypeError: list indices must be integers or slices, not float


回答 20

https://docs.python.org/3/library/functions.html#max

如果有多个最大项,则该函数返回遇到的第一个项。这与其他排序稳定性保持工具(例如sorted(iterable, key=keyfunc, reverse=True)[0]

要获得的不仅仅是第一个,请使用sort方法。

import operator

x = [2, 5, 7, 4, 8, 2, 6, 1, 7, 1, 8, 3, 4, 9, 3, 6, 5, 0, 9, 0]

min = False
max = True

min_val_index = sorted( list(zip(x, range(len(x)))), key = operator.itemgetter(0), reverse = min )

max_val_index = sorted( list(zip(x, range(len(x)))), key = operator.itemgetter(0), reverse = max )


min_val_index[0]
>(0, 17)

max_val_index[0]
>(9, 13)

import ittertools

max_val = max_val_index[0][0]

maxes = [n for n in itertools.takewhile(lambda x: x[0] == max_val, max_val_index)]

https://docs.python.org/3/library/functions.html#max

If multiple items are maximal, the function returns the first one encountered. This is consistent with other sort-stability preserving tools such as sorted(iterable, key=keyfunc, reverse=True)[0]

To get more than just the first use the sort method.

import operator

x = [2, 5, 7, 4, 8, 2, 6, 1, 7, 1, 8, 3, 4, 9, 3, 6, 5, 0, 9, 0]

min = False
max = True

min_val_index = sorted( list(zip(x, range(len(x)))), key = operator.itemgetter(0), reverse = min )

max_val_index = sorted( list(zip(x, range(len(x)))), key = operator.itemgetter(0), reverse = max )


min_val_index[0]
>(0, 17)

max_val_index[0]
>(9, 13)

import ittertools

max_val = max_val_index[0][0]

maxes = [n for n in itertools.takewhile(lambda x: x[0] == max_val, max_val_index)]

回答 21

那这个呢:

a=[1,55,2,36,35,34,98,0]
max_index=dict(zip(a,range(len(a))))[max(a)]

它从in中的项a作为键创建其字典,并将其索引作为值创建索引,从而dict(zip(a,range(len(a))))[max(a)]返回与键对应的值,即max(a)a中最大值的索引。我是python的初学者,所以我不知道该解决方案的计算复杂性。

What about this:

a=[1,55,2,36,35,34,98,0]
max_index=dict(zip(a,range(len(a))))[max(a)]

It creates a dictionary from the items in a as keys and their indexes as values, thus dict(zip(a,range(len(a))))[max(a)] returns the value that corresponds to the key max(a) which is the index of the maximum in a. I’m a beginner in python so I don’t know about the computational complexity of this solution.


如何检查对象是列表还是元组(而不是字符串)?

问题:如何检查对象是列表还是元组(而不是字符串)?

这就是我通常做,以确定输入是一个list/ tuple-但不是str。因为很多时候我偶然发现了一个错误,即一个函数str错误地传递了一个对象,而目标函数确实for x in lst假定这lst实际上是一个listor tuple

assert isinstance(lst, (list, tuple))

我的问题是:是否有更好的方法来实现这一目标?

This is what I normally do in order to ascertain that the input is a list/tuple – but not a str. Because many times I stumbled upon bugs where a function passes a str object by mistake, and the target function does for x in lst assuming that lst is actually a list or tuple.

assert isinstance(lst, (list, tuple))

My question is: is there a better way of achieving this?


回答 0

仅在python 2中(不是python 3):

assert not isinstance(lst, basestring)

实际上就是您想要的,否则您会错过很多像列表一样的东西,但它们不是listor的子类tuple

In python 2 only (not python 3):

assert not isinstance(lst, basestring)

Is actually what you want, otherwise you’ll miss out on a lot of things which act like lists, but aren’t subclasses of list or tuple.


回答 1

请记住,在Python中,我们要使用“鸭子类型”。因此,任何类似列表的行为都可以视为列表。因此,不要检查列表的类型,只看它是否像列表一样。

但是字符串也像列表一样,通常这不是我们想要的。有时甚至是一个问题!因此,显式检查字符串,然后使用鸭子类型。

这是我写的一个有趣的函数。这是它的特殊版本,repr()可以在尖括号('<‘,’>’)中打印任何序列。

def srepr(arg):
    if isinstance(arg, basestring): # Python 3: isinstance(arg, str)
        return repr(arg)
    try:
        return '<' + ", ".join(srepr(x) for x in arg) + '>'
    except TypeError: # catch when for loop fails
        return repr(arg) # not a sequence so just return repr

总体而言,这是干净优雅的。但是那张isinstance()支票在那里做什么?这是一种hack。但这是必不可少的。

该函数以递归方式调用类似于列表的任何对象。如果我们不专门处理字符串,则将其视为列表,并一次拆分一个字符。但是,然后递归调用将尝试将每个字符视为一个列表-它将起作用!即使是一个字符的字符串也可以作为列表!该函数将继续递归调用自身,直到堆栈溢出为止。

像这样的函数,依赖于每个递归调用来分解要完成的工作,必须使用特殊情况的字符串-因为您不能将字符串分解为一个字符以下的字符串,甚至不能分解为一个以下的字符串-字符字符串的作用类似于列表。

注意:try/ except是表达我们意图的最干净的方法。但是,如果这段代码在某种程度上对时间很紧迫,我们可能要用某种测试来替换它,看看是否arg是一个序列。除了测试类型,我们可能应该测试行为。如果它有一个.strip()方法,它是一个字符串,所以不要认为它是一个序列。否则,如果它是可索引的或可迭代的,则它是一个序列:

def is_sequence(arg):
    return (not hasattr(arg, "strip") and
            hasattr(arg, "__getitem__") or
            hasattr(arg, "__iter__"))

def srepr(arg):
    if is_sequence(arg):
        return '<' + ", ".join(srepr(x) for x in arg) + '>'
    return repr(arg)

编辑:我最初写上面检查,__getslice__()但我注意到在collections模块文档中,有趣的方法是__getitem__(); 这很有意义,这就是您索引对象的方式。这似乎比根本,__getslice__()因此我更改了上面的内容。

Remember that in Python we want to use “duck typing”. So, anything that acts like a list can be treated as a list. So, don’t check for the type of a list, just see if it acts like a list.

But strings act like a list too, and often that is not what we want. There are times when it is even a problem! So, check explicitly for a string, but then use duck typing.

Here is a function I wrote for fun. It is a special version of repr() that prints any sequence in angle brackets (‘<‘, ‘>’).

def srepr(arg):
    if isinstance(arg, basestring): # Python 3: isinstance(arg, str)
        return repr(arg)
    try:
        return '<' + ", ".join(srepr(x) for x in arg) + '>'
    except TypeError: # catch when for loop fails
        return repr(arg) # not a sequence so just return repr

This is clean and elegant, overall. But what’s that isinstance() check doing there? That’s kind of a hack. But it is essential.

This function calls itself recursively on anything that acts like a list. If we didn’t handle the string specially, then it would be treated like a list, and split up one character at a time. But then the recursive call would try to treat each character as a list — and it would work! Even a one-character string works as a list! The function would keep on calling itself recursively until stack overflow.

Functions like this one, that depend on each recursive call breaking down the work to be done, have to special-case strings–because you can’t break down a string below the level of a one-character string, and even a one-character string acts like a list.

Note: the try/except is the cleanest way to express our intentions. But if this code were somehow time-critical, we might want to replace it with some sort of test to see if arg is a sequence. Rather than testing the type, we should probably test behaviors. If it has a .strip() method, it’s a string, so don’t consider it a sequence; otherwise, if it is indexable or iterable, it’s a sequence:

def is_sequence(arg):
    return (not hasattr(arg, "strip") and
            hasattr(arg, "__getitem__") or
            hasattr(arg, "__iter__"))

def srepr(arg):
    if is_sequence(arg):
        return '<' + ", ".join(srepr(x) for x in arg) + '>'
    return repr(arg)

EDIT: I originally wrote the above with a check for __getslice__() but I noticed that in the collections module documentation, the interesting method is __getitem__(); this makes sense, that’s how you index an object. That seems more fundamental than __getslice__() so I changed the above.


回答 2

H = "Hello"

if type(H) is list or type(H) is tuple:
    ## Do Something.
else
    ## Do Something.
H = "Hello"

if type(H) is list or type(H) is tuple:
    ## Do Something.
else
    ## Do Something.

回答 3

对于Python 3:

import collections.abc

if isinstance(obj, collections.abc.Sequence) and not isinstance(obj, str):
    print("obj is a sequence (list, tuple, etc) but not a string")

在版本3.3中进行了更改:将集合抽象基类移至collections.abc模块。为了向后兼容,它们在此模块中也将继续可见,直到3.8版将停止工作为止。

对于Python 2:

import collections

if isinstance(obj, collections.Sequence) and not isinstance(obj, basestring):
    print "obj is a sequence (list, tuple, etc) but not a string or unicode"

For Python 3:

import collections.abc

if isinstance(obj, collections.abc.Sequence) and not isinstance(obj, str):
    print("obj is a sequence (list, tuple, etc) but not a string")

Changed in version 3.3: Moved Collections Abstract Base Classes to the collections.abc module. For backwards compatibility, they will continue to be visible in this module as well until version 3.8 where it will stop working.

For Python 2:

import collections

if isinstance(obj, collections.Sequence) and not isinstance(obj, basestring):
    print "obj is a sequence (list, tuple, etc) but not a string or unicode"

回答 4

具有PHP风格的Python:

def is_array(var):
    return isinstance(var, (list, tuple))

Python with PHP flavor:

def is_array(var):
    return isinstance(var, (list, tuple))

回答 5

一般来说,在对象上进行迭代的函数不仅可以处理错误,还可以处理字符串,元组和列表。您当然可以使用isinstance或鸭式输入来检查参数,但是为什么要这么做呢?

这听起来像是个反问,但事实并非如此。答案为“为什么我应该检查参数的类型?” 可能会建议解决实际问题,而不是感知到的问题。将字符串传递给函数时,为什么会出错?另外:如果将字符串传递给此函数是一个错误,是否将其他非列表/元组可迭代传递给它也是一个错误吗?为什么或者为什么不?

我认为这个问题的最常见答案可能是 f("abc")期望该函数的行为就像编写的一样f(["abc"])。在某些情况下,保护开发人员免受自身侵害比支持对字符串中的字符进行迭代的用例更有意义。但是我首先会考虑很长时间。

Generally speaking, the fact that a function which iterates over an object works on strings as well as tuples and lists is more feature than bug. You certainly can use isinstance or duck typing to check an argument, but why should you?

That sounds like a rhetorical question, but it isn’t. The answer to “why should I check the argument’s type?” is probably going to suggest a solution to the real problem, not the perceived problem. Why is it a bug when a string is passed to the function? Also: if it’s a bug when a string is passed to this function, is it also a bug if some other non-list/tuple iterable is passed to it? Why, or why not?

I think that the most common answer to the question is likely to be that developers who write f("abc") are expecting the function to behave as though they’d written f(["abc"]). There are probably circumstances where it makes more sense to protect developers from themselves than it does to support the use case of iterating across the characters in a string. But I’d think long and hard about it first.


回答 6

尝试此操作以提高可读性和最佳做法:

Python2

import types
if isinstance(lst, types.ListType) or isinstance(lst, types.TupleType):
    # Do something

Python3

import typing
if isinstance(lst, typing.List) or isinstance(lst, typing.Tuple):
    # Do something

希望能帮助到你。

Try this for readability and best practices:

Python2

import types
if isinstance(lst, types.ListType) or isinstance(lst, types.TupleType):
    # Do something

Python3

import typing
if isinstance(lst, typing.List) or isinstance(lst, typing.Tuple):
    # Do something

Hope it helps.


回答 7

str对象没有__iter__属性

>>> hasattr('', '__iter__')
False 

所以你可以检查一下

assert hasattr(x, '__iter__')

这也AssertionError将为其他任何不可迭代的对象带来好处。

编辑: 正如蒂姆在评论中提到的那样,这仅适用于python 2.x,而不是3.x

The str object doesn’t have an __iter__ attribute

>>> hasattr('', '__iter__')
False 

so you can do a check

assert hasattr(x, '__iter__')

and this will also raise a nice AssertionError for any other non-iterable object too.

Edit: As Tim mentions in the comments, this will only work in python 2.x, not 3.x


回答 8

这并不是要直接回答OP,而是要分享一些相关想法。

我对上面的@steveha回答非常感兴趣,这似乎举了一个鸭子输入似乎中断的示例。换个角度说,他的例子表明鸭子的分类很难遵循,但是并不能说明str值得任何特殊处理。

毕竟,非str类型(例如,维护一些复杂的递归结构的用户定义类型)可能导致@steveha srepr函数引起无限递归。尽管这确实不太可能,但我们不能忽略这种可能性。因此,与其特殊外壳strsrepr,我们应该明确,我们想要什么srepr在无限递归产生时的事情情况。

似乎一种合理的方法是srepr暂时中断当前递归list(arg) == [arg]。这,其实,彻底解决这个问题str,没有任何isinstance

但是,真正复杂的递归结构可能会导致无限循环,list(arg) == [arg]永远不会发生。因此,尽管上面的检查很有用,但还不够。我们需要对递归深度进行严格限制。

我的观点是,如果您打算处理任意参数类型,则str通过鸭子类型进行处理要比处理(理论上)遇到的更通用类型容易得多。因此,如果您需要排除str实例,则应该要求该参数是您明确指定的几种类型之一的实例。

This is not intended to directly answer the OP, but I wanted to share some related ideas.

I was very interested in @steveha answer above, which seemed to give an example where duck typing seems to break. On second thought, however, his example suggests that duck typing is hard to conform to, but it does not suggest that str deserves any special handling.

After all, a non-str type (e.g., a user-defined type that maintains some complicated recursive structures) may cause @steveha srepr function to cause an infinite recursion. While this is admittedly rather unlikely, we can’t ignore this possibility. Therefore, rather than special-casing str in srepr, we should clarify what we want srepr to do when an infinite recursion results.

It may seem that one reasonable approach is to simply break the recursion in srepr the moment list(arg) == [arg]. This would, in fact, completely solve the problem with str, without any isinstance.

However, a really complicated recursive structure may cause an infinite loop where list(arg) == [arg] never happens. Therefore, while the above check is useful, it’s not sufficient. We need something like a hard limit on the recursion depth.

My point is that if you plan to handle arbitrary argument types, handling str via duck typing is far, far easier than handling the more general types you may (theoretically) encounter. So if you feel the need to exclude str instances, you should instead demand that the argument is an instance of one of the few types that you explicitly specify.


回答 9

在tensorflow中找到了一个名为is_sequence的函数

def is_sequence(seq):
  """Returns a true if its input is a collections.Sequence (except strings).
  Args:
    seq: an input sequence.
  Returns:
    True if the sequence is a not a string and is a collections.Sequence.
  """
  return (isinstance(seq, collections.Sequence)
and not isinstance(seq, six.string_types))

而且我已经证实它可以满足您的需求。

I find such a function named is_sequence in tensorflow.

def is_sequence(seq):
  """Returns a true if its input is a collections.Sequence (except strings).
  Args:
    seq: an input sequence.
  Returns:
    True if the sequence is a not a string and is a collections.Sequence.
  """
  return (isinstance(seq, collections.Sequence)
and not isinstance(seq, six.string_types))

And I have verified that it meets your needs.


回答 10

我在测试用例中执行此操作。

def assertIsIterable(self, item):
    #add types here you don't want to mistake as iterables
    if isinstance(item, basestring): 
        raise AssertionError("type %s is not iterable" % type(item))

    #Fake an iteration.
    try:
        for x in item:
            break;
    except TypeError:
        raise AssertionError("type %s is not iterable" % type(item))

未经生成器测试,我认为如果通过生成器,您将处于下一个“收益”状态,这可能会使下游情况恶化。但是再说一次,这是一个“单元测试”

I do this in my testcases.

def assertIsIterable(self, item):
    #add types here you don't want to mistake as iterables
    if isinstance(item, basestring): 
        raise AssertionError("type %s is not iterable" % type(item))

    #Fake an iteration.
    try:
        for x in item:
            break;
    except TypeError:
        raise AssertionError("type %s is not iterable" % type(item))

Untested on generators, I think you are left at the next ‘yield’ if passed in a generator, which may screw things up downstream. But then again, this is a ‘unittest’


回答 11

以“鸭子打字”的方式

try:
    lst = lst + []
except TypeError:
    #it's not a list

要么

try:
    lst = lst + ()
except TypeError:
    #it's not a tuple

分别。这避免了isinstance/ hasattr内省的东西。

您也可以反之亦然:

try:
    lst = lst + ''
except TypeError:
    #it's not (base)string

所有变体实际上都不会更改变量的内容,而是暗示了重新分配。我不确定这在某些情况下是否不受欢迎。

有趣的是,在任何情况下,如果是列表(不是元组),则在“就地”赋值时都不会引发+=no 。这就是为什么以这种方式完成分配的原因。也许有人可以阐明原因。TypeErrorlst

In “duck typing” manner, how about

try:
    lst = lst + []
except TypeError:
    #it's not a list

or

try:
    lst = lst + ()
except TypeError:
    #it's not a tuple

respectively. This avoids the isinstance / hasattr introspection stuff.

You could also check vice versa:

try:
    lst = lst + ''
except TypeError:
    #it's not (base)string

All variants do not actually change the content of the variable, but imply a reassignment. I’m unsure whether this might be undesirable under some circumstances.

Interestingly, with the “in place” assignment += no TypeError would be raised in any case if lst is a list (not a tuple). That’s why the assignment is done this way. Maybe someone can shed light on why that is.


回答 12

最简单的方法…使用anyisinstance

>>> console_routers = 'x'
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
False
>>>
>>> console_routers = ('x',)
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
True
>>> console_routers = list('x',)
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
True

simplest way… using any and isinstance

>>> console_routers = 'x'
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
False
>>>
>>> console_routers = ('x',)
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
True
>>> console_routers = list('x',)
>>> any([isinstance(console_routers, list), isinstance(console_routers, tuple)])
True

回答 13

鸭式打字的另一种形式,可以帮助区分类似字符串的对象和其他类似序列的对象。

类字符串对象的字符串表示形式是字符串本身,因此您可以检查是否从str构造函数中返回了相等的对象:

# If a string was passed, convert it to a single-element sequence
if var == str(var):
    my_list = [var]

# All other iterables
else: 
    my_list = list(var)

这应该适用于与str所有可迭代对象兼容的所有对象。

Another version of duck-typing to help distinguish string-like objects from other sequence-like objects.

The string representation of string-like objects is the string itself, so you can check if you get an equal object back from the str constructor:

# If a string was passed, convert it to a single-element sequence
if var == str(var):
    my_list = [var]

# All other iterables
else: 
    my_list = list(var)

This should work for all objects compatible with str and for all kinds of iterable objects.


回答 14

Python 3具有以下功能:

from typing import List

def isit(value):
    return isinstance(value, List)

isit([1, 2, 3])  # True
isit("test")  # False
isit({"Hello": "Mars"})  # False
isit((1, 2))  # False

因此,要同时检查列表和元组,将是:

from typing import List, Tuple

def isit(value):
    return isinstance(value, List) or isinstance(value, Tuple)

Python 3 has this:

from typing import List

def isit(value):
    return isinstance(value, List)

isit([1, 2, 3])  # True
isit("test")  # False
isit({"Hello": "Mars"})  # False
isit((1, 2))  # False

So to check for both Lists and Tuples, it would be:

from typing import List, Tuple

def isit(value):
    return isinstance(value, List) or isinstance(value, Tuple)

回答 15

assert (type(lst) == list) | (type(lst) == tuple), "Not a valid lst type, cannot be string"
assert (type(lst) == list) | (type(lst) == tuple), "Not a valid lst type, cannot be string"

回答 16

做这个

if type(lst) in (list, tuple):
    # Do stuff

Just do this

if type(lst) in (list, tuple):
    # Do stuff

回答 17

在python> 3.6中

import collections
isinstance(set(),collections.abc.Container)
True
isinstance([],collections.abc.Container)
True
isinstance({},collections.abc.Container)
True
isinstance((),collections.abc.Container)
True
isinstance(str,collections.abc.Container)
False

in python >3.6

import collections
isinstance(set(),collections.abc.Container)
True
isinstance([],collections.abc.Container)
True
isinstance({},collections.abc.Container)
True
isinstance((),collections.abc.Container)
True
isinstance(str,collections.abc.Container)
False

回答 18

我倾向于这样做(如果真的必须这样做的话):

for i in some_var:
   if type(i) == type(list()):
       #do something with a list
   elif type(i) == type(tuple()):
       #do something with a tuple
   elif type(i) == type(str()):
       #here's your string

I tend to do this (if I really, really had to):

for i in some_var:
   if type(i) == type(list()):
       #do something with a list
   elif type(i) == type(tuple()):
       #do something with a tuple
   elif type(i) == type(str()):
       #here's your string

拼合不规则的列表

问题:拼合不规则的列表

是的,我知道以前已经讨论过这个主题(这里这里这里这里),但是据我所知,除一个解决方案外,所有解决方案在这样的列表上都失败了:

L = [[[1, 2, 3], [4, 5]], 6]

所需的输出是

[1, 2, 3, 4, 5, 6]

甚至更好的迭代器。这个问题是我看到的唯一适用于任意嵌套的解决方案:

def flatten(x):
    result = []
    for el in x:
        if hasattr(el, "__iter__") and not isinstance(el, basestring):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

flatten(L)

这是最好的模型吗?我有事吗 任何问题?

Yes, I know this subject has been covered before (here, here, here, here), but as far as I know, all solutions, except for one, fail on a list like this:

L = [[[1, 2, 3], [4, 5]], 6]

Where the desired output is

[1, 2, 3, 4, 5, 6]

Or perhaps even better, an iterator. The only solution I saw that works for an arbitrary nesting is found in this question:

def flatten(x):
    result = []
    for el in x:
        if hasattr(el, "__iter__") and not isinstance(el, basestring):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

flatten(L)

Is this the best model? Did I overlook something? Any problems?


回答 0

使用生成器函数可以使您的示例更易于阅读,并可能提高性能。

Python 2

def flatten(l):
    for el in l:
        if isinstance(el, collections.Iterable) and not isinstance(el, basestring):
            for sub in flatten(el):
                yield sub
        else:
            yield el

我使用了2.6中添加的Iterable ABC

Python 3

在Python 3中,basestring是没有更多的,但你可以使用一个元组str,并bytes得到同样的效果存在。

yield from运营商从一时间产生一个返回的项目。这句法委派到子发生器在3.3加入

def flatten(l):
    for el in l:
        if isinstance(el, collections.Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el

Using generator functions can make your example a little easier to read and probably boost the performance.

Python 2

def flatten(l):
    for el in l:
        if isinstance(el, collections.Iterable) and not isinstance(el, basestring):
            for sub in flatten(el):
                yield sub
        else:
            yield el

I used the Iterable ABC added in 2.6.

Python 3

In Python 3, the basestring is no more, but you can use a tuple of str and bytes to get the same effect there.

The yield from operator returns an item from a generator one at a time. This syntax for delegating to a subgenerator was added in 3.3

def flatten(l):
    for el in l:
        if isinstance(el, collections.Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el

回答 1

我的解决方案:

import collections


def flatten(x):
    if isinstance(x, collections.Iterable):
        return [a for i in x for a in flatten(i)]
    else:
        return [x]

更加简洁,但几乎相同。

My solution:

import collections


def flatten(x):
    if isinstance(x, collections.Iterable):
        return [a for i in x for a in flatten(i)]
    else:
        return [x]

A little more concise, but pretty much the same.


回答 2

使用递归和鸭子类型生成器(针对Python 3更新):

def flatten(L):
    for item in L:
        try:
            yield from flatten(item)
        except TypeError:
            yield item

list(flatten([[[1, 2, 3], [4, 5]], 6]))
>>>[1, 2, 3, 4, 5, 6]

Generator using recursion and duck typing (updated for Python 3):

def flatten(L):
    for item in L:
        try:
            yield from flatten(item)
        except TypeError:
            yield item

list(flatten([[[1, 2, 3], [4, 5]], 6]))
>>>[1, 2, 3, 4, 5, 6]

回答 3

@unutbu的非递归解决方案的生成器版本,由@Andrew在注释中要求:

def genflat(l, ltypes=collections.Sequence):
    l = list(l)
    i = 0
    while i < len(l):
        while isinstance(l[i], ltypes):
            if not l[i]:
                l.pop(i)
                i -= 1
                break
            else:
                l[i:i + 1] = l[i]
        yield l[i]
        i += 1

此生成器的简化版本:

def genflat(l, ltypes=collections.Sequence):
    l = list(l)
    while l:
        while l and isinstance(l[0], ltypes):
            l[0:1] = l[0]
        if l: yield l.pop(0)

Generator version of @unutbu’s non-recursive solution, as requested by @Andrew in a comment:

def genflat(l, ltypes=collections.Sequence):
    l = list(l)
    i = 0
    while i < len(l):
        while isinstance(l[i], ltypes):
            if not l[i]:
                l.pop(i)
                i -= 1
                break
            else:
                l[i:i + 1] = l[i]
        yield l[i]
        i += 1

Slightly simplified version of this generator:

def genflat(l, ltypes=collections.Sequence):
    l = list(l)
    while l:
        while l and isinstance(l[0], ltypes):
            l[0:1] = l[0]
        if l: yield l.pop(0)

回答 4

这是我的功能性版本的递归展平,它既处理元组又处理列表,并允许您引入位置参数的任何组合。返回一个生成器,该生成器按arg由arg的顺序生成整个序列:

flatten = lambda *n: (e for a in n
    for e in (flatten(*a) if isinstance(a, (tuple, list)) else (a,)))

用法:

l1 = ['a', ['b', ('c', 'd')]]
l2 = [0, 1, (2, 3), [[4, 5, (6, 7, (8,), [9]), 10]], (11,)]
print list(flatten(l1, -2, -1, l2))
['a', 'b', 'c', 'd', -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

Here is my functional version of recursive flatten which handles both tuples and lists, and lets you throw in any mix of positional arguments. Returns a generator which produces the entire sequence in order, arg by arg:

flatten = lambda *n: (e for a in n
    for e in (flatten(*a) if isinstance(a, (tuple, list)) else (a,)))

Usage:

l1 = ['a', ['b', ('c', 'd')]]
l2 = [0, 1, (2, 3), [[4, 5, (6, 7, (8,), [9]), 10]], (11,)]
print list(flatten(l1, -2, -1, l2))
['a', 'b', 'c', 'd', -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

回答 5

此版本的版本flatten避免了python的递归限制(因此可用于任意深度的嵌套可迭代对象)。它是一个生成器,可以处理字符串和任意可迭代(甚至是无限的)。

import itertools as IT
import collections

def flatten(iterable, ltypes=collections.Iterable):
    remainder = iter(iterable)
    while True:
        first = next(remainder)
        if isinstance(first, ltypes) and not isinstance(first, (str, bytes)):
            remainder = IT.chain(first, remainder)
        else:
            yield first

以下是一些示例说明其用法:

print(list(IT.islice(flatten(IT.repeat(1)),10)))
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

print(list(IT.islice(flatten(IT.chain(IT.repeat(2,3),
                                       {10,20,30},
                                       'foo bar'.split(),
                                       IT.repeat(1),)),10)))
# [2, 2, 2, 10, 20, 30, 'foo', 'bar', 1, 1]

print(list(flatten([[1,2,[3,4]]])))
# [1, 2, 3, 4]

seq = ([[chr(i),chr(i-32)] for i in range(ord('a'), ord('z')+1)] + list(range(0,9)))
print(list(flatten(seq)))
# ['a', 'A', 'b', 'B', 'c', 'C', 'd', 'D', 'e', 'E', 'f', 'F', 'g', 'G', 'h', 'H',
# 'i', 'I', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'N', 'o', 'O', 'p', 'P',
# 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'v', 'V', 'w', 'W', 'x', 'X',
# 'y', 'Y', 'z', 'Z', 0, 1, 2, 3, 4, 5, 6, 7, 8]

尽管flatten可以处理无限生成器,但不能处理无限嵌套:

def infinitely_nested():
    while True:
        yield IT.chain(infinitely_nested(), IT.repeat(1))

print(list(IT.islice(flatten(infinitely_nested()), 10)))
# hangs

This version of flatten avoids python’s recursion limit (and thus works with arbitrarily deep, nested iterables). It is a generator which can handle strings and arbitrary iterables (even infinite ones).

import itertools as IT
import collections

def flatten(iterable, ltypes=collections.Iterable):
    remainder = iter(iterable)
    while True:
        first = next(remainder)
        if isinstance(first, ltypes) and not isinstance(first, (str, bytes)):
            remainder = IT.chain(first, remainder)
        else:
            yield first

Here are some examples demonstrating its use:

print(list(IT.islice(flatten(IT.repeat(1)),10)))
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

print(list(IT.islice(flatten(IT.chain(IT.repeat(2,3),
                                       {10,20,30},
                                       'foo bar'.split(),
                                       IT.repeat(1),)),10)))
# [2, 2, 2, 10, 20, 30, 'foo', 'bar', 1, 1]

print(list(flatten([[1,2,[3,4]]])))
# [1, 2, 3, 4]

seq = ([[chr(i),chr(i-32)] for i in range(ord('a'), ord('z')+1)] + list(range(0,9)))
print(list(flatten(seq)))
# ['a', 'A', 'b', 'B', 'c', 'C', 'd', 'D', 'e', 'E', 'f', 'F', 'g', 'G', 'h', 'H',
# 'i', 'I', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'N', 'o', 'O', 'p', 'P',
# 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'v', 'V', 'w', 'W', 'x', 'X',
# 'y', 'Y', 'z', 'Z', 0, 1, 2, 3, 4, 5, 6, 7, 8]

Although flatten can handle infinite generators, it can not handle infinite nesting:

def infinitely_nested():
    while True:
        yield IT.chain(infinitely_nested(), IT.repeat(1))

print(list(IT.islice(flatten(infinitely_nested()), 10)))
# hangs

回答 6

这是另一个更有趣的答案…

import re

def Flatten(TheList):
    a = str(TheList)
    b,crap = re.subn(r'[\[,\]]', ' ', a)
    c = b.split()
    d = [int(x) for x in c]

    return(d)

基本上,它将嵌套列表转换为字符串,使用正则表达式去除嵌套语法,然后将结果转换回(扁平化的)列表。

Here’s another answer that is even more interesting…

import re

def Flatten(TheList):
    a = str(TheList)
    b,crap = re.subn(r'[\[,\]]', ' ', a)
    c = b.split()
    d = [int(x) for x in c]

    return(d)

Basically, it converts the nested list to a string, uses a regex to strip out the nested syntax, and then converts the result back to a (flattened) list.


回答 7

def flatten(xs):
    res = []
    def loop(ys):
        for i in ys:
            if isinstance(i, list):
                loop(i)
            else:
                res.append(i)
    loop(xs)
    return res
def flatten(xs):
    res = []
    def loop(ys):
        for i in ys:
            if isinstance(i, list):
                loop(i)
            else:
                res.append(i)
    loop(xs)
    return res

回答 8

您可以deepflatten在第三方套餐中使用iteration_utilities

>>> from iteration_utilities import deepflatten
>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> list(deepflatten(L))
[1, 2, 3, 4, 5, 6]

>>> list(deepflatten(L, types=list))  # only flatten "inner" lists
[1, 2, 3, 4, 5, 6]

这是一个迭代器,因此您需要对其进行迭代(例如,通过将其包装list或在循环中使用)。在内部,它使用迭代方法而不是递归方法,并且将其编写为C扩展,因此它可以比纯python方法更快:

>>> %timeit list(deepflatten(L))
12.6 µs ± 298 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit list(deepflatten(L, types=list))
8.7 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> %timeit list(flatten(L))   # Cristian - Python 3.x approach from https://stackoverflow.com/a/2158532/5393381
86.4 µs ± 4.42 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> %timeit list(flatten(L))   # Josh Lee - https://stackoverflow.com/a/2158522/5393381
107 µs ± 2.99 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> %timeit list(genflat(L, list))  # Alex Martelli - https://stackoverflow.com/a/2159079/5393381
23.1 µs ± 710 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

我是iteration_utilities图书馆的作者。

You could use deepflatten from the 3rd party package iteration_utilities:

>>> from iteration_utilities import deepflatten
>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> list(deepflatten(L))
[1, 2, 3, 4, 5, 6]

>>> list(deepflatten(L, types=list))  # only flatten "inner" lists
[1, 2, 3, 4, 5, 6]

It’s an iterator so you need to iterate it (for example by wrapping it with list or using it in a loop). Internally it uses an iterative approach instead of an recursive approach and it’s written as C extension so it can be faster than pure python approaches:

>>> %timeit list(deepflatten(L))
12.6 µs ± 298 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit list(deepflatten(L, types=list))
8.7 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> %timeit list(flatten(L))   # Cristian - Python 3.x approach from https://stackoverflow.com/a/2158532/5393381
86.4 µs ± 4.42 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> %timeit list(flatten(L))   # Josh Lee - https://stackoverflow.com/a/2158522/5393381
107 µs ± 2.99 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> %timeit list(genflat(L, list))  # Alex Martelli - https://stackoverflow.com/a/2159079/5393381
23.1 µs ± 710 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

I’m the author of the iteration_utilities library.


回答 9

尝试创建一个可以平化Python中不规则列表的函数很有趣,但是当然这就是Python的目的(使编程变得有趣)。以下生成器在某些警告方面工作得很好:

def flatten(iterable):
    try:
        for item in iterable:
            yield from flatten(item)
    except TypeError:
        yield iterable

这将压扁的数据类型,你可能想独自离开(比如bytearraybytesstr对象)。此外,代码还依赖于以下事实:从不可迭代的对象请求迭代器会引发TypeError

>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> def flatten(iterable):
    try:
        for item in iterable:
            yield from flatten(item)
    except TypeError:
        yield iterable


>>> list(flatten(L))
[1, 2, 3, 4, 5, 6]
>>>

编辑:

我不同意以前的实现。问题在于您不应该将无法迭代的东西弄平。这令人困惑,并给人以错误的印象。

>>> list(flatten(123))
[123]
>>>

下面的生成器与第一个生成器几乎相同,但是不存在试图展平不可迭代对象的问题。当给它一个不适当的参数时,它会像人们期望的那样失败。

def flatten(iterable):
    for item in iterable:
        try:
            yield from flatten(item)
        except TypeError:
            yield item

使用提供的列表对生成器进行测试可以正常工作。但是,TypeError当给它一个不可迭代的对象时,新代码将引发一个。下面显示了新行为的示例。

>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> list(flatten(L))
[1, 2, 3, 4, 5, 6]
>>> list(flatten(123))
Traceback (most recent call last):
  File "<pyshell#32>", line 1, in <module>
    list(flatten(123))
  File "<pyshell#27>", line 2, in flatten
    for item in iterable:
TypeError: 'int' object is not iterable
>>>

It was fun trying to create a function that could flatten irregular list in Python, but of course that is what Python is for (to make programming fun). The following generator works fairly well with some caveats:

def flatten(iterable):
    try:
        for item in iterable:
            yield from flatten(item)
    except TypeError:
        yield iterable

It will flatten datatypes that you might want left alone (like bytearray, bytes, and str objects). Also, the code relies on the fact that requesting an iterator from a non-iterable raises a TypeError.

>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> def flatten(iterable):
    try:
        for item in iterable:
            yield from flatten(item)
    except TypeError:
        yield iterable


>>> list(flatten(L))
[1, 2, 3, 4, 5, 6]
>>>

Edit:

I disagree with the previous implementation. The problem is that you should not be able to flatten something that is not an iterable. It is confusing and gives the wrong impression of the argument.

>>> list(flatten(123))
[123]
>>>

The following generator is almost the same as the first but does not have the problem of trying to flatten a non-iterable object. It fails as one would expect when an inappropriate argument is given to it.

def flatten(iterable):
    for item in iterable:
        try:
            yield from flatten(item)
        except TypeError:
            yield item

Testing the generator works fine with the list that was provided. However, the new code will raise a TypeError when a non-iterable object is given to it. Example are shown below of the new behavior.

>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> list(flatten(L))
[1, 2, 3, 4, 5, 6]
>>> list(flatten(123))
Traceback (most recent call last):
  File "<pyshell#32>", line 1, in <module>
    list(flatten(123))
  File "<pyshell#27>", line 2, in flatten
    for item in iterable:
TypeError: 'int' object is not iterable
>>>

回答 10

尽管选择了一个优雅且非常Python化的答案,但我仅出于审查目的而提出我的解决方案:

def flat(l):
    ret = []
    for i in l:
        if isinstance(i, list) or isinstance(i, tuple):
            ret.extend(flat(i))
        else:
            ret.append(i)
    return ret

请告诉我们这段代码的好坏?

Although an elegant and very pythonic answer has been selected I would present my solution just for the review:

def flat(l):
    ret = []
    for i in l:
        if isinstance(i, list) or isinstance(i, tuple):
            ret.extend(flat(i))
        else:
            ret.append(i)
    return ret

Please tell how good or bad this code is?


回答 11

我喜欢简单的答案。没有生成器。没有递归或递归限制。只是迭代:

def flatten(TheList):
    listIsNested = True

    while listIsNested:                 #outer loop
        keepChecking = False
        Temp = []

        for element in TheList:         #inner loop
            if isinstance(element,list):
                Temp.extend(element)
                keepChecking = True
            else:
                Temp.append(element)

        listIsNested = keepChecking     #determine if outer loop exits
        TheList = Temp[:]

    return TheList

这适用于两个列表:内部for循环和外部while循环。

内部的for循环遍历列表。如果找到列表元素,则(1)使用list.extend()展平该部分嵌套的层次,并且(2)将keepChecking切换为True。keepchecking用于控制外部while循环。如果将外部循环设置为true,则会触发内部循环进行另一遍处理。

这些通行证一直发生,直到找不到更多的嵌套列表。当最后一次通过但找不到任何地方的传递时,keepChecking永远不会变为true,这意味着listIsNested保持为false,而外部while循环退出。

然后返回扁平化列表。

测试运行

flatten([1,2,3,4,[100,200,300,[1000,2000,3000]]])

[1, 2, 3, 4, 100, 200, 300, 1000, 2000, 3000]

I prefer simple answers. No generators. No recursion or recursion limits. Just iteration:

def flatten(TheList):
    listIsNested = True

    while listIsNested:                 #outer loop
        keepChecking = False
        Temp = []

        for element in TheList:         #inner loop
            if isinstance(element,list):
                Temp.extend(element)
                keepChecking = True
            else:
                Temp.append(element)

        listIsNested = keepChecking     #determine if outer loop exits
        TheList = Temp[:]

    return TheList

This works with two lists: an inner for loop and an outer while loop.

The inner for loop iterates through the list. If it finds a list element, it (1) uses list.extend() to flatten that part one level of nesting and (2) switches keepChecking to True. keepchecking is used to control the outer while loop. If the outer loop gets set to true, it triggers the inner loop for another pass.

Those passes keep happening until no more nested lists are found. When a pass finally occurs where none are found, keepChecking never gets tripped to true, which means listIsNested stays false and the outer while loop exits.

The flattened list is then returned.

Test-run

flatten([1,2,3,4,[100,200,300,[1000,2000,3000]]])

[1, 2, 3, 4, 100, 200, 300, 1000, 2000, 3000]


回答 12

这是一个简单的函数,可以平铺任意深度的列表。没有递归,以避免堆栈溢出。

from copy import deepcopy

def flatten_list(nested_list):
    """Flatten an arbitrarily nested list, without recursion (to avoid
    stack overflows). Returns a new list, the original list is unchanged.

    >> list(flatten_list([1, 2, 3, [4], [], [[[[[[[[[5]]]]]]]]]]))
    [1, 2, 3, 4, 5]
    >> list(flatten_list([[1, 2], 3]))
    [1, 2, 3]

    """
    nested_list = deepcopy(nested_list)

    while nested_list:
        sublist = nested_list.pop(0)

        if isinstance(sublist, list):
            nested_list = sublist + nested_list
        else:
            yield sublist

Here’s a simple function that flattens lists of arbitrary depth. No recursion, to avoid stack overflow.

from copy import deepcopy

def flatten_list(nested_list):
    """Flatten an arbitrarily nested list, without recursion (to avoid
    stack overflows). Returns a new list, the original list is unchanged.

    >> list(flatten_list([1, 2, 3, [4], [], [[[[[[[[[5]]]]]]]]]]))
    [1, 2, 3, 4, 5]
    >> list(flatten_list([[1, 2], 3]))
    [1, 2, 3]

    """
    nested_list = deepcopy(nested_list)

    while nested_list:
        sublist = nested_list.pop(0)

        if isinstance(sublist, list):
            nested_list = sublist + nested_list
        else:
            yield sublist

回答 13

我很惊讶没有人想到这一点。该死的递归我没有这里的高级人员做出的递归答案。无论如何,这是我的尝试。请注意,这是非常特定于OP的用例的

import re

L = [[[1, 2, 3], [4, 5]], 6]
flattened_list = re.sub("[\[\]]", "", str(L)).replace(" ", "").split(",")
new_list = list(map(int, flattened_list))
print(new_list)

输出:

[1, 2, 3, 4, 5, 6]

I’m surprised no one has thought of this. Damn recursion I don’t get the recursive answers that the advanced people here made. anyway here is my attempt on this. caveat is it’s very specific to the OP’s use case

import re

L = [[[1, 2, 3], [4, 5]], 6]
flattened_list = re.sub("[\[\]]", "", str(L)).replace(" ", "").split(",")
new_list = list(map(int, flattened_list))
print(new_list)

output:

[1, 2, 3, 4, 5, 6]

回答 14

我没有在这里浏览所有已经可用的答案,但这是我想到的一个衬里,它借鉴了Lisp的第一张清单和其余清单的处理方式

def flatten(l): return flatten(l[0]) + (flatten(l[1:]) if len(l) > 1 else []) if type(l) is list else [l]

这是一种简单而又不太简单的情况-

>>> flatten([1,[2,3],4])
[1, 2, 3, 4]

>>> flatten([1, [2, 3], 4, [5, [6, {'name': 'some_name', 'age':30}, 7]], [8, 9, [10, [11, [12, [13, {'some', 'set'}, 14, [15, 'some_string'], 16], 17, 18], 19], 20], 21, 22, [23, 24], 25], 26, 27, 28, 29, 30])
[1, 2, 3, 4, 5, 6, {'age': 30, 'name': 'some_name'}, 7, 8, 9, 10, 11, 12, 13, set(['set', 'some']), 14, 15, 'some_string', 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
>>> 

I didn’t go through all the already available answers here, but here is a one liner I came up with, borrowing from lisp’s way of first and rest list processing

def flatten(l): return flatten(l[0]) + (flatten(l[1:]) if len(l) > 1 else []) if type(l) is list else [l]

here is one simple and one not-so-simple case –

>>> flatten([1,[2,3],4])
[1, 2, 3, 4]

>>> flatten([1, [2, 3], 4, [5, [6, {'name': 'some_name', 'age':30}, 7]], [8, 9, [10, [11, [12, [13, {'some', 'set'}, 14, [15, 'some_string'], 16], 17, 18], 19], 20], 21, 22, [23, 24], 25], 26, 27, 28, 29, 30])
[1, 2, 3, 4, 5, 6, {'age': 30, 'name': 'some_name'}, 7, 8, 9, 10, 11, 12, 13, set(['set', 'some']), 14, 15, 'some_string', 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
>>> 

回答 15

当试图回答这样的问题时,您确实需要给出您提议作为解决方案的代码的限制。如果只考虑性能,我不会太在意,但是提议作为解决方案的大多数代码(包括可接受的答案)都无法使深度大于1000的列表变平。

当我说大多数代码我指的是所有使用任何形式的递归的代码(或调用递归的标准库函数)。所有这些代码都会失败,因为对于每个递归调用,(调用)堆栈都增加一个单位,而(默认)python调用堆栈的大小为1000。

如果您不太熟悉调用堆栈,那么以下内容可能会有所帮助(否则,您可以滚动到Implementation)。

调用堆栈大小和递归编程(类似于地下城)

寻找宝藏并退出

想象一下,您进入一个带编号房间的巨大地牢,寻找宝藏。您不知道这个地方,但是对于如何找到宝藏有一些指示。每个指示都是一个谜(难度各不相同,但是您无法预测它们的难易程度)。您决定对节省时间的策略进行一点思考,然后进行两个观察:

  1. 很难(很长)找到宝藏,因为您必须解决(可能很难)谜团才能到达那里。
  2. 找到宝藏后,返回入口可能很容易,您只需要在另一个方向上使用相同的路径即可(尽管这需要一点记忆才能调用您的路径)。

进入地牢时,您会在这里注意到一个小笔记本。您决定使用它来写下谜题(当进入新房间时)之后退出的每个房间,这样您就可以返回到入口。那是个天才的主意,您甚至都不会花一分钱实施自己的策略。

您进入了地牢,成功地解决了前1001个难题,但是这是您未曾计划的事情,您借用的笔记本中没有剩余空间。您决定放弃自己的任务,因为您更喜欢没有宝物,而不是永远迷失在地牢中(确实看起来很聪明)。

执行递归程序

基本上,这与寻找宝藏完全相同。地牢是计算机的内存,您现在的目标不是找到宝藏,而是要计算某些函数(对于给定x找到f(x))。这些指示只是子例程,可以帮助您解决f(x)。您的策略与调用堆栈策略相同,笔记本是堆栈,房间是函数的返回地址:

x = ["over here", "am", "I"]
y = sorted(x) # You're about to enter a room named `sorted`, note down the current room address here so you can return back: 0x4004f4 (that room address looks weird)
# Seems like you went back from your quest using the return address 0x4004f4
# Let's see what you've collected 
print(' '.join(y))

您在地牢中遇到的问题在这里将是相同的,调用堆栈的大小是有限的(此处为1000),因此,如果您输入了太多函数而没有返回,则您将填充调用堆栈并出现错误就像一次调用自己-一遍又一遍-),您将一遍又一遍地输入,直到计算完成(直到找到宝藏为止),然后返回,直到返回到调用的位置为止 “亲爱的冒险家,很抱歉,您的笔记本已经满了”:最初的地方。直到最后一次将调用栈从所有返回地址中释放出来之前,调用栈将永远不会被释放。RecursionError: maximum recursion depth exceeded。请注意,您不需要递归即可填充调用堆栈,但是非递归程序调用1000函数而永远不会返回的可能性很小。同样重要的是要了解,从函数返回后,调用栈将从使用的地址中释放出来(因此,名称“栈”,返回地址在进入函数之前就被压入,并在返回时被拉出)。在简单递归的特殊情况下(一个函数ffff

如何避免这个问题?

这实际上很简单:“如果您不知道递归的深度,请不要使用递归”。并非总是如此,因为在某些情况下,可以优化尾调用递归(TCO)。但是在python中,情况并非如此,即使“写得很好”的递归函数也无法优化堆栈的使用。Guido有一个有趣的帖子,关于这个问题:尾递归消除

您可以使用一种技术来迭代任何递归函数,我们可以称之为自带笔记本。例如,在我们的特定情况下,我们只是在探索一个列表,进入一个房间等同于进入一个子列表,您应该问自己的问题是如何从列表返回其父列表?答案并不那么复杂,请重复以下操作,直到stack为空:

  1. 推送当前列表,addressindexstack进入新的子列表时将其推入(请注意,列表地址+索引也是地址,因此我们只使用调用堆栈使用的完全相同的技术);
  2. 每次找到一个项目yield(或将它们添加到列表中);
  3. 完全浏览列表后,请使用stack return address(和index)返回父列表。

还要注意,这等效于树中的DFS,其中某些节点是子列表,A = [1, 2]而有些则是简单项:(0, 1, 2, 3, 4用于L = [0, [1,2], 3, 4])。树看起来像这样:

                    L
                    |
           -------------------
           |     |     |     |
           0   --A--   3     4
               |   |
               1   2

DFS遍历的顺序为:L,0,A,1、2、3、4。请记住,要实现迭代DFS,您还需要“堆栈”。我之前提出的实现导致具有以下状态(针对stackflat_list):

init.:  stack=[(L, 0)]
**0**:  stack=[(L, 0)],         flat_list=[0]
**A**:  stack=[(L, 1), (A, 0)], flat_list=[0]
**1**:  stack=[(L, 1), (A, 0)], flat_list=[0, 1]
**2**:  stack=[(L, 1), (A, 1)], flat_list=[0, 1, 2]
**3**:  stack=[(L, 2)],         flat_list=[0, 1, 2, 3]
**3**:  stack=[(L, 3)],         flat_list=[0, 1, 2, 3, 4]
return: stack=[],               flat_list=[0, 1, 2, 3, 4]

在此示例中,堆栈最大大小为2,因为输入列表(因此树)的深度为2。

实作

对于实现,在python中,您可以使用迭代器而不是简单的列表来简化一点。对(子)迭代器的引用将用于存储子列表的返回地址(而不是同时具有列表地址和索引)。这不是什么大的区别,但是我觉得这更具可读性(并且速度更快):

def flatten(iterable):
    return list(items_from(iterable))

def items_from(iterable):
    cursor_stack = [iter(iterable)]
    while cursor_stack:
        sub_iterable = cursor_stack[-1]
        try:
            item = next(sub_iterable)
        except StopIteration:   # post-order
            cursor_stack.pop()
            continue
        if is_list_like(item):  # pre-order
            cursor_stack.append(iter(item))
        elif item is not None:
            yield item          # in-order

def is_list_like(item):
    return isinstance(item, list)

另外,请注意,在is_list_likeI have中isinstance(item, list),可以将其更改为处理更多输入类型,在这里,我只想拥有最简单的版本,其中(可迭代)只是一个列表。但是您也可以这样做:

def is_list_like(item):
    try:
        iter(item)
        return not isinstance(item, str)  # strings are not lists (hmm...) 
    except TypeError:
        return False

flatten_iter([["test", "a"], "b])会将字符串视为“简单项目”,因此将返回["test", "a", "b"]而不是["t", "e", "s", "t", "a", "b"]。请注意,在这种情况下,iter(item)每个项目都会被调用两次,让我们假设这是读者练习此清洁器的一种练习。

测试和评论其他实现

最后,请记住,您不能使用来打印无限嵌套的列表Lprint(L)因为它在内部将使用对__repr__RecursionError: maximum recursion depth exceeded while getting the repr of an object)的递归调用。出于相同的原因,flatten涉及解决方案str将失败,并显示相同的错误消息。

如果您需要测试解决方案,则可以使用此函数生成一个简单的嵌套列表:

def build_deep_list(depth):
    """Returns a list of the form $l_{depth} = [depth-1, l_{depth-1}]$
    with $depth > 1$ and $l_0 = [0]$.
    """
    sub_list = [0]
    for d in range(1, depth):
        sub_list = [d, sub_list]
    return sub_list

给出:build_deep_list(5)>>> [4, [3, [2, [1, [0]]]]]

When trying to answer such a question you really need to give the limitations of the code you propose as a solution. If it was only about performances I wouldn’t mind too much, but most of the codes proposed as solution (including the accepted answer) fail to flatten any list that has a depth greater than 1000.

When I say most of the codes I mean all codes that use any form of recursion (or call a standard library function that is recursive). All these codes fail because for every of the recursive call made, the (call) stack grow by one unit, and the (default) python call stack has a size of 1000.

If you’re not too familiar with the call stack, then maybe the following will help (otherwise you can just scroll to the Implementation).

Call stack size and recursive programming (dungeon analogy)

Finding the treasure and exit

Imagine you enter a huge dungeon with numbered rooms, looking for a treasure. You don’t know the place but you have some indications on how to find the treasure. Each indication is a riddle (difficulty varies, but you can’t predict how hard they will be). You decide to think a little bit about a strategy to save time, you make two observations:

  1. It’s hard (long) to find the treasure as you’ll have to solve (potentially hard) riddles to get there.
  2. Once the treasure found, returning to the entrance may be easy, you just have to use the same path in the other direction (though this needs a bit of memory to recall your path).

When entering the dungeon, you notice a small notebook here. You decide to use it to write down every room you exit after solving a riddle (when entering a new room), this way you’ll be able to return back to the entrance. That’s a genius idea, you won’t even spend a cent implementing your strategy.

You enter the dungeon, solving with great success the first 1001 riddles, but here comes something you hadn’t planed, you have no space left in the notebook you borrowed. You decide to abandon your quest as you prefer not having the treasure than being lost forever inside the dungeon (that looks smart indeed).

Executing a recursive program

Basically, it’s the exact same thing as finding the treasure. The dungeon is the computer’s memory, your goal now is not to find a treasure but to compute some function (find f(x) for a given x). The indications simply are sub-routines that will help you solving f(x). Your strategy is the same as the call stack strategy, the notebook is the stack, the rooms are the functions’ return addresses:

x = ["over here", "am", "I"]
y = sorted(x) # You're about to enter a room named `sorted`, note down the current room address here so you can return back: 0x4004f4 (that room address looks weird)
# Seems like you went back from your quest using the return address 0x4004f4
# Let's see what you've collected 
print(' '.join(y))

The problem you encountered in the dungeon will be the same here, the call stack has a finite size (here 1000) and therefore, if you enter too many functions without returning back then you’ll fill the call stack and have an error that look like “Dear adventurer, I’m very sorry but your notebook is full”: RecursionError: maximum recursion depth exceeded. Note that you don’t need recursion to fill the call stack, but it’s very unlikely that a non-recursive program call 1000 functions without ever returning. It’s important to also understand that once you returned from a function, the call stack is freed from the address used (hence the name “stack”, return address are pushed in before entering a function and pulled out when returning). In the special case of a simple recursion (a function f that call itself once — over and over –) you will enter f over and over until the computation is finished (until the treasure is found) and return from f until you go back to the place where you called f in the first place. The call stack will never be freed from anything until the end where it will be freed from all return addresses one after the other.

How to avoid this issue?

That’s actually pretty simple: “don’t use recursion if you don’t know how deep it can go”. That’s not always true as in some cases, Tail Call recursion can be Optimized (TCO). But in python, this is not the case, and even “well written” recursive function will not optimize stack use. There is an interesting post from Guido about this question: Tail Recursion Elimination.

There is a technique that you can use to make any recursive function iterative, this technique we could call bring your own notebook. For example, in our particular case we simply are exploring a list, entering a room is equivalent to entering a sublist, the question you should ask yourself is how can I get back from a list to its parent list? The answer is not that complex, repeat the following until the stack is empty:

  1. push the current list address and index in a stack when entering a new sublist (note that a list address+index is also an address, therefore we just use the exact same technique used by the call stack);
  2. every time an item is found, yield it (or add them in a list);
  3. once a list is fully explored, go back to the parent list using the stack return address (and index).

Also note that this is equivalent to a DFS in a tree where some nodes are sublists A = [1, 2] and some are simple items: 0, 1, 2, 3, 4 (for L = [0, [1,2], 3, 4]). The tree looks like this:

                    L
                    |
           -------------------
           |     |     |     |
           0   --A--   3     4
               |   |
               1   2

The DFS traversal pre-order is: L, 0, A, 1, 2, 3, 4. Remember, in order to implement an iterative DFS you also “need” a stack. The implementation I proposed before result in having the following states (for the stack and the flat_list):

init.:  stack=[(L, 0)]
**0**:  stack=[(L, 0)],         flat_list=[0]
**A**:  stack=[(L, 1), (A, 0)], flat_list=[0]
**1**:  stack=[(L, 1), (A, 0)], flat_list=[0, 1]
**2**:  stack=[(L, 1), (A, 1)], flat_list=[0, 1, 2]
**3**:  stack=[(L, 2)],         flat_list=[0, 1, 2, 3]
**3**:  stack=[(L, 3)],         flat_list=[0, 1, 2, 3, 4]
return: stack=[],               flat_list=[0, 1, 2, 3, 4]

In this example, the stack maximum size is 2, because the input list (and therefore the tree) have depth 2.

Implementation

For the implementation, in python you can simplify a little bit by using iterators instead of simple lists. References to the (sub)iterators will be used to store sublists return addresses (instead of having both the list address and the index). This is not a big difference but I feel this is more readable (and also a bit faster):

def flatten(iterable):
    return list(items_from(iterable))

def items_from(iterable):
    cursor_stack = [iter(iterable)]
    while cursor_stack:
        sub_iterable = cursor_stack[-1]
        try:
            item = next(sub_iterable)
        except StopIteration:   # post-order
            cursor_stack.pop()
            continue
        if is_list_like(item):  # pre-order
            cursor_stack.append(iter(item))
        elif item is not None:
            yield item          # in-order

def is_list_like(item):
    return isinstance(item, list)

Also, notice that in is_list_like I have isinstance(item, list), which could be changed to handle more input types, here I just wanted to have the simplest version where (iterable) is just a list. But you could also do that:

def is_list_like(item):
    try:
        iter(item)
        return not isinstance(item, str)  # strings are not lists (hmm...) 
    except TypeError:
        return False

This considers strings as “simple items” and therefore flatten_iter([["test", "a"], "b]) will return ["test", "a", "b"] and not ["t", "e", "s", "t", "a", "b"]. Remark that in that case, iter(item) is called twice on each item, let’s pretend it’s an exercise for the reader to make this cleaner.

Testing and remarks on other implementations

In the end, remember that you can’t print a infinitely nested list L using print(L) because internally it will use recursive calls to __repr__ (RecursionError: maximum recursion depth exceeded while getting the repr of an object). For the same reason, solutions to flatten involving str will fail with the same error message.

If you need to test your solution, you can use this function to generate a simple nested list:

def build_deep_list(depth):
    """Returns a list of the form $l_{depth} = [depth-1, l_{depth-1}]$
    with $depth > 1$ and $l_0 = [0]$.
    """
    sub_list = [0]
    for d in range(1, depth):
        sub_list = [d, sub_list]
    return sub_list

Which gives: build_deep_list(5) >>> [4, [3, [2, [1, [0]]]]].


回答 16

这是compiler.ast.flatten2.7.5中的实现:

def flatten(seq):
    l = []
    for elt in seq:
        t = type(elt)
        if t is tuple or t is list:
            for elt2 in flatten(elt):
                l.append(elt2)
        else:
            l.append(elt)
    return l

有更好,更快的方法(如果您已经到达这里,您已经看到了它们)

另请注意:

自2.6版起弃用:编译器软件包已在Python 3中删除。

Here’s the compiler.ast.flatten implementation in 2.7.5:

def flatten(seq):
    l = []
    for elt in seq:
        t = type(elt)
        if t is tuple or t is list:
            for elt2 in flatten(elt):
                l.append(elt2)
        else:
            l.append(elt)
    return l

There are better, faster methods (If you’ve reached here, you have seen them already)

Also note:

Deprecated since version 2.6: The compiler package has been removed in Python 3.


回答 17

完全hacky,但我认为它可以工作(取决于您的data_type)

flat_list = ast.literal_eval("[%s]"%re.sub("[\[\]]","",str(the_list)))

totally hacky but I think it would work (depending on your data_type)

flat_list = ast.literal_eval("[%s]"%re.sub("[\[\]]","",str(the_list)))

回答 18

只需使用一个funcy库: pip install funcy

import funcy


funcy.flatten([[[[1, 1], 1], 2], 3]) # returns generator
funcy.lflatten([[[[1, 1], 1], 2], 3]) # returns list

Just use a funcy library: pip install funcy

import funcy


funcy.flatten([[[[1, 1], 1], 2], 3]) # returns generator
funcy.lflatten([[[[1, 1], 1], 2], 3]) # returns list

回答 19

这是另一种py2方法,我不确定它是最快还是最优雅也不最安全…

from collections import Iterable
from itertools import imap, repeat, chain


def flat(seqs, ignore=(int, long, float, basestring)):
    return repeat(seqs, 1) if any(imap(isinstance, repeat(seqs), ignore)) or not isinstance(seqs, Iterable) else chain.from_iterable(imap(flat, seqs))

它可以忽略您想要的任何特定(或派生)类型,它返回一个迭代器,因此您可以将其转换为任何特定的容器(例如list,tuple,dict或仅使用它)以减少内存占用,无论是好是坏它可以处理初始的不可迭代对象,例如int …

请注意,大多数繁重的工作都是在C中完成的,因为据我所知,这是itertools的实现方式,因此尽管是递归的,但AFAIK并不受python递归深度的限制,因为函数调用发生在C中,尽管这样做并不意味着您会受到内存的限制,特别是在OS X中,从今天开始,它的堆栈大小有了硬限制(OS X Mavericks)…

有一种稍微快一点的方法,但可移植性较低的方法,只有在可以假定可以明确确定输入的基本元素的情况下,才使用它,否则,将获得无限递归,并且具有有限堆栈大小的OS X将很快地引发细分错误…

def flat(seqs, ignore={int, long, float, str, unicode}):
    return repeat(seqs, 1) if type(seqs) in ignore or not isinstance(seqs, Iterable) else chain.from_iterable(imap(flat, seqs))

在这里,我们使用集合来检查类型,因此需要O(1)与O(类型数)来检查是否应忽略某个元素,尽管任何具有声明的被忽略类型的派生类型的值都将失败,这就是为什么要使用它strunicode因此请谨慎使用…

测试:

import random

def test_flat(test_size=2000):
    def increase_depth(value, depth=1):
        for func in xrange(depth):
            value = repeat(value, 1)
        return value

    def random_sub_chaining(nested_values):
        for values in nested_values:
            yield chain((values,), chain.from_iterable(imap(next, repeat(nested_values, random.randint(1, 10)))))

    expected_values = zip(xrange(test_size), imap(str, xrange(test_size)))
    nested_values = random_sub_chaining((increase_depth(value, depth) for depth, value in enumerate(expected_values)))
    assert not any(imap(cmp, chain.from_iterable(expected_values), flat(chain(((),), nested_values, ((),)))))

>>> test_flat()
>>> list(flat([[[1, 2, 3], [4, 5]], 6]))
[1, 2, 3, 4, 5, 6]
>>>  

$ uname -a
Darwin Samys-MacBook-Pro.local 13.3.0 Darwin Kernel Version 13.3.0: Tue Jun  3 21:27:35 PDT 2014; root:xnu-2422.110.17~1/RELEASE_X86_64 x86_64
$ python --version
Python 2.7.5

Here is another py2 approach, Im not sure if its the fastest or the most elegant nor safest …

from collections import Iterable
from itertools import imap, repeat, chain


def flat(seqs, ignore=(int, long, float, basestring)):
    return repeat(seqs, 1) if any(imap(isinstance, repeat(seqs), ignore)) or not isinstance(seqs, Iterable) else chain.from_iterable(imap(flat, seqs))

It can ignore any specific (or derived) type you would like, it returns an iterator, so you can convert it to any specific container such as list, tuple, dict or simply consume it in order to reduce memory footprint, for better or worse it can handle initial non-iterable objects such as int …

Note most of the heavy lifting is done in C, since as far as I know thats how itertools are implemented, so while it is recursive, AFAIK it isn’t bounded by python recursion depth since the function calls are happening in C, though this doesn’t mean you are bounded by memory, specially in OS X where its stack size has a hard limit as of today (OS X Mavericks) …

there is a slightly faster approach, but less portable method, only use it if you can assume that the base elements of the input can be explicitly determined otherwise, you’ll get an infinite recursion, and OS X with its limited stack size, will throw a segmentation fault fairly quickly …

def flat(seqs, ignore={int, long, float, str, unicode}):
    return repeat(seqs, 1) if type(seqs) in ignore or not isinstance(seqs, Iterable) else chain.from_iterable(imap(flat, seqs))

here we are using sets to check for the type so it takes O(1) vs O(number of types) to check whether or not an element should be ignored, though of course any value with derived type of the stated ignored types will fail, this is why its using str, unicode so use it with caution …

tests:

import random

def test_flat(test_size=2000):
    def increase_depth(value, depth=1):
        for func in xrange(depth):
            value = repeat(value, 1)
        return value

    def random_sub_chaining(nested_values):
        for values in nested_values:
            yield chain((values,), chain.from_iterable(imap(next, repeat(nested_values, random.randint(1, 10)))))

    expected_values = zip(xrange(test_size), imap(str, xrange(test_size)))
    nested_values = random_sub_chaining((increase_depth(value, depth) for depth, value in enumerate(expected_values)))
    assert not any(imap(cmp, chain.from_iterable(expected_values), flat(chain(((),), nested_values, ((),)))))

>>> test_flat()
>>> list(flat([[[1, 2, 3], [4, 5]], 6]))
[1, 2, 3, 4, 5, 6]
>>>  

$ uname -a
Darwin Samys-MacBook-Pro.local 13.3.0 Darwin Kernel Version 13.3.0: Tue Jun  3 21:27:35 PDT 2014; root:xnu-2422.110.17~1/RELEASE_X86_64 x86_64
$ python --version
Python 2.7.5

回答 20

不使用任何库:

def flat(l):
    def _flat(l, r):    
        if type(l) is not list:
            r.append(l)
        else:
            for i in l:
                r = r + flat(i)
        return r
    return _flat(l, [])



# example
test = [[1], [[2]], [3], [['a','b','c'] , [['z','x','y']], ['d','f','g']], 4]    
print flat(test) # prints [1, 2, 3, 'a', 'b', 'c', 'z', 'x', 'y', 'd', 'f', 'g', 4]

Without using any library:

def flat(l):
    def _flat(l, r):    
        if type(l) is not list:
            r.append(l)
        else:
            for i in l:
                r = r + flat(i)
        return r
    return _flat(l, [])



# example
test = [[1], [[2]], [3], [['a','b','c'] , [['z','x','y']], ['d','f','g']], 4]    
print flat(test) # prints [1, 2, 3, 'a', 'b', 'c', 'z', 'x', 'y', 'd', 'f', 'g', 4]

回答 21

使用itertools.chain

import itertools
from collections import Iterable

def list_flatten(lst):
    flat_lst = []
    for item in itertools.chain(lst):
        if isinstance(item, Iterable):
            item = list_flatten(item)
            flat_lst.extend(item)
        else:
            flat_lst.append(item)
    return flat_lst

或不链接:

def flatten(q, final):
    if not q:
        return
    if isinstance(q, list):
        if not isinstance(q[0], list):
            final.append(q[0])
        else:
            flatten(q[0], final)
        flatten(q[1:], final)
    else:
        final.append(q)

Using itertools.chain:

import itertools
from collections import Iterable

def list_flatten(lst):
    flat_lst = []
    for item in itertools.chain(lst):
        if isinstance(item, Iterable):
            item = list_flatten(item)
            flat_lst.extend(item)
        else:
            flat_lst.append(item)
    return flat_lst

Or without chaining:

def flatten(q, final):
    if not q:
        return
    if isinstance(q, list):
        if not isinstance(q[0], list):
            final.append(q[0])
        else:
            flatten(q[0], final)
        flatten(q[1:], final)
    else:
        final.append(q)

回答 22

我使用递归来解决任何深度的嵌套列表

def combine_nlist(nlist,init=0,combiner=lambda x,y: x+y):
    '''
    apply function: combiner to a nested list element by element(treated as flatten list)
    '''
    current_value=init
    for each_item in nlist:
        if isinstance(each_item,list):
            current_value =combine_nlist(each_item,current_value,combiner)
        else:
            current_value = combiner(current_value,each_item)
    return current_value

因此,在定义函数combin_nlist之后,就很容易使用此函数进行展平。或者,您可以将其组合为一个功能。我喜欢我的解决方案,因为它可以应用于任何嵌套列表。

def flatten_nlist(nlist):
    return combine_nlist(nlist,[],lambda x,y:x+[y])

结果

In [379]: flatten_nlist([1,2,3,[4,5],[6],[[[7],8],9],10])
Out[379]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

I used recursive to solve nested list with any depth

def combine_nlist(nlist,init=0,combiner=lambda x,y: x+y):
    '''
    apply function: combiner to a nested list element by element(treated as flatten list)
    '''
    current_value=init
    for each_item in nlist:
        if isinstance(each_item,list):
            current_value =combine_nlist(each_item,current_value,combiner)
        else:
            current_value = combiner(current_value,each_item)
    return current_value

So after i define function combine_nlist, it is easy to use this function do flatting. Or you can combine it into one function. I like my solution because it can be applied to any nested list.

def flatten_nlist(nlist):
    return combine_nlist(nlist,[],lambda x,y:x+[y])

result

In [379]: flatten_nlist([1,2,3,[4,5],[6],[[[7],8],9],10])
Out[379]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

回答 23

最简单的方法是使用变身利用图书馆pip install morph

代码是:

import morph

list = [[[1, 2, 3], [4, 5]], 6]
flattened_list = morph.flatten(list)  # returns [1, 2, 3, 4, 5, 6]

The easiest way is to use the morph library using pip install morph.

The code is:

import morph

list = [[[1, 2, 3], [4, 5]], 6]
flattened_list = morph.flatten(list)  # returns [1, 2, 3, 4, 5, 6]

回答 24

我知道已经有很多很棒的答案,但是我想添加一个使用功能性编程方法解决问题的答案。在这个答案中,我使用了双重递归:

def flatten_list(seq):
    if not seq:
        return []
    elif isinstance(seq[0],list):
        return (flatten_list(seq[0])+flatten_list(seq[1:]))
    else:
        return [seq[0]]+flatten_list(seq[1:])

print(flatten_list([1,2,[3,[4],5],[6,7]]))

输出:

[1, 2, 3, 4, 5, 6, 7]

I am aware that there are already many awesome answers but i wanted to add an answer that uses the functional programming method of solving the question. In this answer i make use of double recursion :

def flatten_list(seq):
    if not seq:
        return []
    elif isinstance(seq[0],list):
        return (flatten_list(seq[0])+flatten_list(seq[1:]))
    else:
        return [seq[0]]+flatten_list(seq[1:])

print(flatten_list([1,2,[3,[4],5],[6,7]]))

output:

[1, 2, 3, 4, 5, 6, 7]

回答 25

我不确定这是否一定更快或更有效,但这是我要做的:

def flatten(lst):
    return eval('[' + str(lst).replace('[', '').replace(']', '') + ']')

L = [[[1, 2, 3], [4, 5]], 6]
print(flatten(L))

flatten这里的函数将列表转换为字符串,取出所有方括号,将方括号附加到两端,然后将其重新转换为列表。

虽然,如果您知道列表中的方括号中包含字符串,例如[[1, 2], "[3, 4] and [5]"],则您需要做其他事情。

I’m not sure if this is necessarily quicker or more effective, but this is what I do:

def flatten(lst):
    return eval('[' + str(lst).replace('[', '').replace(']', '') + ']')

L = [[[1, 2, 3], [4, 5]], 6]
print(flatten(L))

The flatten function here turns the list into a string, takes out all of the square brackets, attaches square brackets back onto the ends, and turns it back into a list.

Although, if you knew you would have square brackets in your list in strings, like [[1, 2], "[3, 4] and [5]"], you would have to do something else.


回答 26

这是在python2上进行flatten的简单实现

flatten=lambda l: reduce(lambda x,y:x+y,map(flatten,l),[]) if isinstance(l,list) else [l]

test=[[1,2,3,[3,4,5],[6,7,[8,9,[10,[11,[12,13,14]]]]]],]
print flatten(test)

#output [1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

This is a simple implement of flatten on python2

flatten=lambda l: reduce(lambda x,y:x+y,map(flatten,l),[]) if isinstance(l,list) else [l]

test=[[1,2,3,[3,4,5],[6,7,[8,9,[10,[11,[12,13,14]]]]]],]
print flatten(test)

#output [1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

回答 27

这将使列表或字典(或列表列表或字典的字典等)变平。它假定值是字符串,并创建一个字符串,该字符串将每个项目与分隔符参数连接在一起。如果需要,可以使用分隔符随后将结果拆分为列表对象。如果下一个值是列表或字符串,则使用递归。使用key参数来告诉您要使用字典对象的键还是值(将key设置为false)。

def flatten_obj(n_obj, key=True, my_sep=''):
    my_string = ''
    if type(n_obj) == list:
        for val in n_obj:
            my_sep_setter = my_sep if my_string != '' else ''
            if type(val) == list or type(val) == dict:
                my_string += my_sep_setter + flatten_obj(val, key, my_sep)
            else:
                my_string += my_sep_setter + val
    elif type(n_obj) == dict:
        for k, v in n_obj.items():
            my_sep_setter = my_sep if my_string != '' else ''
            d_val = k if key else v
            if type(v) == list or type(v) == dict:
                my_string += my_sep_setter + flatten_obj(v, key, my_sep)
            else:
                my_string += my_sep_setter + d_val
    elif type(n_obj) == str:
        my_sep_setter = my_sep if my_string != '' else ''
        my_string += my_sep_setter + n_obj
        return my_string
    return my_string

print(flatten_obj(['just', 'a', ['test', 'to', 'try'], 'right', 'now', ['or', 'later', 'today'],
                [{'dictionary_test': 'test'}, {'dictionary_test_two': 'later_today'}, 'my power is 9000']], my_sep=', ')

Yield:

just, a, test, to, try, right, now, or, later, today, dictionary_test, dictionary_test_two, my power is 9000

This will flatten a list or dictionary (or list of lists or dictionaries of dictionaries etc). It assumes that the values are strings and it creates a string that concatenates each item with a separator argument. If you wanted you could use the separator to split the result into a list object afterward. It uses recursion if the next value is a list or a string. Use the key argument to tell whether you want the keys or the values (set key to false) from the dictionary object.

def flatten_obj(n_obj, key=True, my_sep=''):
    my_string = ''
    if type(n_obj) == list:
        for val in n_obj:
            my_sep_setter = my_sep if my_string != '' else ''
            if type(val) == list or type(val) == dict:
                my_string += my_sep_setter + flatten_obj(val, key, my_sep)
            else:
                my_string += my_sep_setter + val
    elif type(n_obj) == dict:
        for k, v in n_obj.items():
            my_sep_setter = my_sep if my_string != '' else ''
            d_val = k if key else v
            if type(v) == list or type(v) == dict:
                my_string += my_sep_setter + flatten_obj(v, key, my_sep)
            else:
                my_string += my_sep_setter + d_val
    elif type(n_obj) == str:
        my_sep_setter = my_sep if my_string != '' else ''
        my_string += my_sep_setter + n_obj
        return my_string
    return my_string

print(flatten_obj(['just', 'a', ['test', 'to', 'try'], 'right', 'now', ['or', 'later', 'today'],
                [{'dictionary_test': 'test'}, {'dictionary_test_two': 'later_today'}, 'my power is 9000']], my_sep=', ')

yields:

just, a, test, to, try, right, now, or, later, today, dictionary_test, dictionary_test_two, my power is 9000

回答 28

如果您喜欢递归,这可能是您感兴趣的解决方案:

def f(E):
    if E==[]: 
        return []
    elif type(E) != list: 
        return [E]
    else:
        a = f(E[0])
        b = f(E[1:])
        a.extend(b)
        return a

我实际上是从前一段时间写的一些练习Scheme代码中改编而成的。

请享用!

If you like recursion, this might be a solution of interest to you:

def f(E):
    if E==[]: 
        return []
    elif type(E) != list: 
        return [E]
    else:
        a = f(E[0])
        b = f(E[1:])
        a.extend(b)
        return a

I actually adapted this from some practice Scheme code that I had written a while back.

Enjoy!


回答 29

我是python的新手,来自Lisp背景。这是我想出的(查看lulz的var名称):

def flatten(lst):
    if lst:
        car,*cdr=lst
        if isinstance(car,(list,tuple)):
            if cdr: return flatten(car) + flatten(cdr)
            return flatten(car)
        if cdr: return [car] + flatten(cdr)
        return [car]

似乎可以工作。测试:

flatten((1,2,3,(4,5,6,(7,8,(((1,2)))))))

返回:

[1, 2, 3, 4, 5, 6, 7, 8, 1, 2]

I’m new to python and come from a lisp background. This is what I came up with (check out the var names for lulz):

def flatten(lst):
    if lst:
        car,*cdr=lst
        if isinstance(car,(list,tuple)):
            if cdr: return flatten(car) + flatten(cdr)
            return flatten(car)
        if cdr: return [car] + flatten(cdr)
        return [car]

Seems to work. Test:

flatten((1,2,3,(4,5,6,(7,8,(((1,2)))))))

returns:

[1, 2, 3, 4, 5, 6, 7, 8, 1, 2]

如何在列表中找到重复项并使用它们创建另一个列表?

问题:如何在列表中找到重复项并使用它们创建另一个列表?

如何在Python列表中找到重复项并创建另一个重复项列表?该列表仅包含整数。

How can I find the duplicates in a Python list and create another list of the duplicates? The list only contains integers.


回答 0

要删除重复项,请使用set(a)。要打印副本,类似:

a = [1,2,3,2,1,5,6,5,5,5]

import collections
print([item for item, count in collections.Counter(a).items() if count > 1])

## [1, 2, 5]

请注意,这Counter并不是特别有效(计时),并且在这里可能会过大。set会表现更好。此代码按源顺序计算唯一元素的列表:

seen = set()
uniq = []
for x in a:
    if x not in seen:
        uniq.append(x)
        seen.add(x)

或者,更简洁地说:

seen = set()
uniq = [x for x in a if x not in seen and not seen.add(x)]    

我不建议您使用后者,因为not seen.add(x)这样做并不明显(set add()方法总是返回None,因此需要not)。

要计算没有库的重复元素列表:

seen = {}
dupes = []

for x in a:
    if x not in seen:
        seen[x] = 1
    else:
        if seen[x] == 1:
            dupes.append(x)
        seen[x] += 1

如果列表元素不可散列,则不能使用集合/字典,而必须求助于二​​次时间解(将每个解比较)。例如:

a = [[1], [2], [3], [1], [5], [3]]

no_dupes = [x for n, x in enumerate(a) if x not in a[:n]]
print no_dupes # [[1], [2], [3], [5]]

dupes = [x for n, x in enumerate(a) if x in a[:n]]
print dupes # [[1], [3]]

To remove duplicates use set(a). To print duplicates, something like:

a = [1,2,3,2,1,5,6,5,5,5]

import collections
print([item for item, count in collections.Counter(a).items() if count > 1])

## [1, 2, 5]

Note that Counter is not particularly efficient (timings) and probably overkill here. set will perform better. This code computes a list of unique elements in the source order:

seen = set()
uniq = []
for x in a:
    if x not in seen:
        uniq.append(x)
        seen.add(x)

or, more concisely:

seen = set()
uniq = [x for x in a if x not in seen and not seen.add(x)]    

I don’t recommend the latter style, because it is not obvious what not seen.add(x) is doing (the set add() method always returns None, hence the need for not).

To compute the list of duplicated elements without libraries:

seen = {}
dupes = []

for x in a:
    if x not in seen:
        seen[x] = 1
    else:
        if seen[x] == 1:
            dupes.append(x)
        seen[x] += 1

If list elements are not hashable, you cannot use sets/dicts and have to resort to a quadratic time solution (compare each with each). For example:

a = [[1], [2], [3], [1], [5], [3]]

no_dupes = [x for n, x in enumerate(a) if x not in a[:n]]
print no_dupes # [[1], [2], [3], [5]]

dupes = [x for n, x in enumerate(a) if x in a[:n]]
print dupes # [[1], [3]]

回答 1

>>> l = [1,2,3,4,4,5,5,6,1]
>>> set([x for x in l if l.count(x) > 1])
set([1, 4, 5])
>>> l = [1,2,3,4,4,5,5,6,1]
>>> set([x for x in l if l.count(x) > 1])
set([1, 4, 5])

回答 2

您不需要计数,只需要查看之前是否曾查看过该项目即可。改编这个问题的答案,这一问题:

def list_duplicates(seq):
  seen = set()
  seen_add = seen.add
  # adds all elements it doesn't know yet to seen and all other to seen_twice
  seen_twice = set( x for x in seq if x in seen or seen_add(x) )
  # turn the set into a list (as requested)
  return list( seen_twice )

a = [1,2,3,2,1,5,6,5,5,5]
list_duplicates(a) # yields [1, 2, 5]

以防万一速度很重要,以下是一些时间安排:

# file: test.py
import collections

def thg435(l):
    return [x for x, y in collections.Counter(l).items() if y > 1]

def moooeeeep(l):
    seen = set()
    seen_add = seen.add
    # adds all elements it doesn't know yet to seen and all other to seen_twice
    seen_twice = set( x for x in l if x in seen or seen_add(x) )
    # turn the set into a list (as requested)
    return list( seen_twice )

def RiteshKumar(l):
    return list(set([x for x in l if l.count(x) > 1]))

def JohnLaRooy(L):
    seen = set()
    seen2 = set()
    seen_add = seen.add
    seen2_add = seen2.add
    for item in L:
        if item in seen:
            seen2_add(item)
        else:
            seen_add(item)
    return list(seen2)

l = [1,2,3,2,1,5,6,5,5,5]*100

结果如下:(做得很好@JohnLaRooy!)

$ python -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
10000 loops, best of 3: 74.6 usec per loop
$ python -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
10000 loops, best of 3: 91.3 usec per loop
$ python -mtimeit -s 'import test' 'test.thg435(test.l)'
1000 loops, best of 3: 266 usec per loop
$ python -mtimeit -s 'import test' 'test.RiteshKumar(test.l)'
100 loops, best of 3: 8.35 msec per loop

有趣的是,除了计时本身之外,使用pypy时排名也略有变化。最有趣的是,基于计数器的方法从pypy的优化中受益匪浅,而我建议的方法缓存方法似乎几乎没有效果。

$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
100000 loops, best of 3: 17.8 usec per loop
$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'
10000 loops, best of 3: 23 usec per loop
$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
10000 loops, best of 3: 39.3 usec per loop

显然,这种影响与输入数据的“重复性”有关。我已经设置l = [random.randrange(1000000) for i in xrange(10000)]并得到以下结果:

$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
1000 loops, best of 3: 495 usec per loop
$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
1000 loops, best of 3: 499 usec per loop
$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'
1000 loops, best of 3: 1.68 msec per loop

You don’t need the count, just whether or not the item was seen before. Adapted that answer to this problem:

def list_duplicates(seq):
  seen = set()
  seen_add = seen.add
  # adds all elements it doesn't know yet to seen and all other to seen_twice
  seen_twice = set( x for x in seq if x in seen or seen_add(x) )
  # turn the set into a list (as requested)
  return list( seen_twice )

a = [1,2,3,2,1,5,6,5,5,5]
list_duplicates(a) # yields [1, 2, 5]

Just in case speed matters, here are some timings:

# file: test.py
import collections

def thg435(l):
    return [x for x, y in collections.Counter(l).items() if y > 1]

def moooeeeep(l):
    seen = set()
    seen_add = seen.add
    # adds all elements it doesn't know yet to seen and all other to seen_twice
    seen_twice = set( x for x in l if x in seen or seen_add(x) )
    # turn the set into a list (as requested)
    return list( seen_twice )

def RiteshKumar(l):
    return list(set([x for x in l if l.count(x) > 1]))

def JohnLaRooy(L):
    seen = set()
    seen2 = set()
    seen_add = seen.add
    seen2_add = seen2.add
    for item in L:
        if item in seen:
            seen2_add(item)
        else:
            seen_add(item)
    return list(seen2)

l = [1,2,3,2,1,5,6,5,5,5]*100

Here are the results: (well done @JohnLaRooy!)

$ python -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
10000 loops, best of 3: 74.6 usec per loop
$ python -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
10000 loops, best of 3: 91.3 usec per loop
$ python -mtimeit -s 'import test' 'test.thg435(test.l)'
1000 loops, best of 3: 266 usec per loop
$ python -mtimeit -s 'import test' 'test.RiteshKumar(test.l)'
100 loops, best of 3: 8.35 msec per loop

Interestingly, besides the timings itself, also the ranking slightly changes when pypy is used. Most interestingly, the Counter-based approach benefits hugely from pypy’s optimizations, whereas the method caching approach I have suggested seems to have almost no effect.

$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
100000 loops, best of 3: 17.8 usec per loop
$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'
10000 loops, best of 3: 23 usec per loop
$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
10000 loops, best of 3: 39.3 usec per loop

Apparantly this effect is related to the “duplicatedness” of the input data. I have set l = [random.randrange(1000000) for i in xrange(10000)] and got these results:

$ pypy -mtimeit -s 'import test' 'test.moooeeeep(test.l)'
1000 loops, best of 3: 495 usec per loop
$ pypy -mtimeit -s 'import test' 'test.JohnLaRooy(test.l)'
1000 loops, best of 3: 499 usec per loop
$ pypy -mtimeit -s 'import test' 'test.thg435(test.l)'
1000 loops, best of 3: 1.68 msec per loop

回答 3

您可以使用iteration_utilities.duplicates

>>> from iteration_utilities import duplicates

>>> list(duplicates([1,1,2,1,2,3,4,2]))
[1, 1, 2, 2]

或者,如果您只希望每个重复项之一,可以将其与iteration_utilities.unique_everseen

>>> from iteration_utilities import unique_everseen

>>> list(unique_everseen(duplicates([1,1,2,1,2,3,4,2])))
[1, 2]

它还可以处理不可散列的元素(但是以性能为代价):

>>> list(duplicates([[1], [2], [1], [3], [1]]))
[[1], [1]]

>>> list(unique_everseen(duplicates([[1], [2], [1], [3], [1]])))
[[1]]

这是这里仅有的其他几种方法可以处理的。

基准测试

我做了一个快速基准测试,其中包含这里提到的大多数(但不是全部)方法。

第一个基准测试仅包含一小部分列表长度,因为某些方法具有 O(n**2)行为。

在曲线图中,y轴表示时间,因此值越低越好。它还绘制了对数-对数,因此可以更好地可视化各种值:

在此处输入图片说明

除去这些O(n**2)方法,我做了另一个基准测试,列表中最多有500万个元素:

在此处输入图片说明

如您所见 iteration_utilities.duplicates方法比其他任何方法甚至链式方法都快unique_everseen(duplicates(...))都快也比其他方法快或同样快。

这里要注意的另一件有趣的事情是,对于小名单,熊猫方法非常慢,但可以轻松竞争更长的名单。

但是,由于这些基准测试表明大多数方法的性能大致相同,因此使用哪种方法都无关紧要(除了具有O(n**2)运行时的三种方法外)。

from iteration_utilities import duplicates, unique_everseen
from collections import Counter
import pandas as pd
import itertools

def georg_counter(it):
    return [item for item, count in Counter(it).items() if count > 1]

def georg_set(it):
    seen = set()
    uniq = []
    for x in it:
        if x not in seen:
            uniq.append(x)
            seen.add(x)

def georg_set2(it):
    seen = set()
    return [x for x in it if x not in seen and not seen.add(x)]   

def georg_set3(it):
    seen = {}
    dupes = []

    for x in it:
        if x not in seen:
            seen[x] = 1
        else:
            if seen[x] == 1:
                dupes.append(x)
            seen[x] += 1

def RiteshKumar_count(l):
    return set([x for x in l if l.count(x) > 1])

def moooeeeep(seq):
    seen = set()
    seen_add = seen.add
    # adds all elements it doesn't know yet to seen and all other to seen_twice
    seen_twice = set( x for x in seq if x in seen or seen_add(x) )
    # turn the set into a list (as requested)
    return list( seen_twice )

def F1Rumors_implementation(c):
    a, b = itertools.tee(sorted(c))
    next(b, None)
    r = None
    for k, g in zip(a, b):
        if k != g: continue
        if k != r:
            yield k
            r = k

def F1Rumors(c):
    return list(F1Rumors_implementation(c))

def Edward(a):
    d = {}
    for elem in a:
        if elem in d:
            d[elem] += 1
        else:
            d[elem] = 1
    return [x for x, y in d.items() if y > 1]

def wordsmith(a):
    return pd.Series(a)[pd.Series(a).duplicated()].values

def NikhilPrabhu(li):
    li = li.copy()
    for x in set(li):
        li.remove(x)

    return list(set(li))

def firelynx(a):
    vc = pd.Series(a).value_counts()
    return vc[vc > 1].index.tolist()

def HenryDev(myList):
    newList = set()

    for i in myList:
        if myList.count(i) >= 2:
            newList.add(i)

    return list(newList)

def yota(number_lst):
    seen_set = set()
    duplicate_set = set(x for x in number_lst if x in seen_set or seen_set.add(x))
    return seen_set - duplicate_set

def IgorVishnevskiy(l):
    s=set(l)
    d=[]
    for x in l:
        if x in s:
            s.remove(x)
        else:
            d.append(x)
    return d

def it_duplicates(l):
    return list(duplicates(l))

def it_unique_duplicates(l):
    return list(unique_everseen(duplicates(l)))

基准1

from simple_benchmark import benchmark
import random

funcs = [
    georg_counter, georg_set, georg_set2, georg_set3, RiteshKumar_count, moooeeeep, 
    F1Rumors, Edward, wordsmith, NikhilPrabhu, firelynx,
    HenryDev, yota, IgorVishnevskiy, it_duplicates, it_unique_duplicates
]

args = {2**i: [random.randint(0, 2**(i-1)) for _ in range(2**i)] for i in range(2, 12)}

b = benchmark(funcs, args, 'list size')

b.plot()

基准2

funcs = [
    georg_counter, georg_set, georg_set2, georg_set3, moooeeeep, 
    F1Rumors, Edward, wordsmith, firelynx,
    yota, IgorVishnevskiy, it_duplicates, it_unique_duplicates
]

args = {2**i: [random.randint(0, 2**(i-1)) for _ in range(2**i)] for i in range(2, 20)}

b = benchmark(funcs, args, 'list size')
b.plot()

免责声明

1这来自我编写的第三方库iteration_utilities

You can use iteration_utilities.duplicates:

>>> from iteration_utilities import duplicates

>>> list(duplicates([1,1,2,1,2,3,4,2]))
[1, 1, 2, 2]

or if you only want one of each duplicate this can be combined with iteration_utilities.unique_everseen:

>>> from iteration_utilities import unique_everseen

>>> list(unique_everseen(duplicates([1,1,2,1,2,3,4,2])))
[1, 2]

It can also handle unhashable elements (however at the cost of performance):

>>> list(duplicates([[1], [2], [1], [3], [1]]))
[[1], [1]]

>>> list(unique_everseen(duplicates([[1], [2], [1], [3], [1]])))
[[1]]

That’s something that only a few of the other approaches here can handle.

Benchmarks

I did a quick benchmark containing most (but not all) of the approaches mentioned here.

The first benchmark included only a small range of list-lengths because some approaches have O(n**2) behavior.

In the graphs the y-axis represents the time, so a lower value means better. It’s also plotted log-log so the wide range of values can be visualized better:

enter image description here

Removing the O(n**2) approaches I did another benchmark up to half a million elements in a list:

enter image description here

As you can see the iteration_utilities.duplicates approach is faster than any of the other approaches and even chaining unique_everseen(duplicates(...)) was faster or equally fast than the other approaches.

One additional interesting thing to note here is that the pandas approaches are very slow for small lists but can easily compete for longer lists.

However as these benchmarks show most of the approaches perform roughly equally, so it doesn’t matter much which one is used (except for the 3 that had O(n**2) runtime).

from iteration_utilities import duplicates, unique_everseen
from collections import Counter
import pandas as pd
import itertools

def georg_counter(it):
    return [item for item, count in Counter(it).items() if count > 1]

def georg_set(it):
    seen = set()
    uniq = []
    for x in it:
        if x not in seen:
            uniq.append(x)
            seen.add(x)

def georg_set2(it):
    seen = set()
    return [x for x in it if x not in seen and not seen.add(x)]   

def georg_set3(it):
    seen = {}
    dupes = []

    for x in it:
        if x not in seen:
            seen[x] = 1
        else:
            if seen[x] == 1:
                dupes.append(x)
            seen[x] += 1

def RiteshKumar_count(l):
    return set([x for x in l if l.count(x) > 1])

def moooeeeep(seq):
    seen = set()
    seen_add = seen.add
    # adds all elements it doesn't know yet to seen and all other to seen_twice
    seen_twice = set( x for x in seq if x in seen or seen_add(x) )
    # turn the set into a list (as requested)
    return list( seen_twice )

def F1Rumors_implementation(c):
    a, b = itertools.tee(sorted(c))
    next(b, None)
    r = None
    for k, g in zip(a, b):
        if k != g: continue
        if k != r:
            yield k
            r = k

def F1Rumors(c):
    return list(F1Rumors_implementation(c))

def Edward(a):
    d = {}
    for elem in a:
        if elem in d:
            d[elem] += 1
        else:
            d[elem] = 1
    return [x for x, y in d.items() if y > 1]

def wordsmith(a):
    return pd.Series(a)[pd.Series(a).duplicated()].values

def NikhilPrabhu(li):
    li = li.copy()
    for x in set(li):
        li.remove(x)

    return list(set(li))

def firelynx(a):
    vc = pd.Series(a).value_counts()
    return vc[vc > 1].index.tolist()

def HenryDev(myList):
    newList = set()

    for i in myList:
        if myList.count(i) >= 2:
            newList.add(i)

    return list(newList)

def yota(number_lst):
    seen_set = set()
    duplicate_set = set(x for x in number_lst if x in seen_set or seen_set.add(x))
    return seen_set - duplicate_set

def IgorVishnevskiy(l):
    s=set(l)
    d=[]
    for x in l:
        if x in s:
            s.remove(x)
        else:
            d.append(x)
    return d

def it_duplicates(l):
    return list(duplicates(l))

def it_unique_duplicates(l):
    return list(unique_everseen(duplicates(l)))

Benchmark 1

from simple_benchmark import benchmark
import random

funcs = [
    georg_counter, georg_set, georg_set2, georg_set3, RiteshKumar_count, moooeeeep, 
    F1Rumors, Edward, wordsmith, NikhilPrabhu, firelynx,
    HenryDev, yota, IgorVishnevskiy, it_duplicates, it_unique_duplicates
]

args = {2**i: [random.randint(0, 2**(i-1)) for _ in range(2**i)] for i in range(2, 12)}

b = benchmark(funcs, args, 'list size')

b.plot()

Benchmark 2

funcs = [
    georg_counter, georg_set, georg_set2, georg_set3, moooeeeep, 
    F1Rumors, Edward, wordsmith, firelynx,
    yota, IgorVishnevskiy, it_duplicates, it_unique_duplicates
]

args = {2**i: [random.randint(0, 2**(i-1)) for _ in range(2**i)] for i in range(2, 20)}

b = benchmark(funcs, args, 'list size')
b.plot()

Disclaimer

1 This is from a third-party library I have written: iteration_utilities.


回答 4

我在寻找相关问题时遇到了这个问题-想知道为什么没人提供基于生成器的解决方案?解决此问题的方法是:

>>> print list(getDupes_9([1,2,3,2,1,5,6,5,5,5]))
[1, 2, 5]

我担心可伸缩性,因此测试了几种方法,包括幼稚的项目,这些项目在小列表上都可以很好地工作,但是随着列表变大就可怕地扩展(注意-使用timeit会更好,但这只是说明性的)。

我加入了@moooeeeep进行比较(这是非常快的:如果输入列表是完全随机的,则是最快的),而itertools方法对于大多数已排序的列表来说甚至更快…现在包括@firelynx的pandas方法-慢,但没有很可怕,而且很简单。注意-对于大型的有序列表,sort / tee / zip方法在我的机器上始终是最快的,对于混洗的列表,moooeeeep最快,但是您的里程可能会有所不同。

优点

  • 使用相同的代码非常容易地测试“任何”重复项

假设条件

  • 重复应仅报告一次
  • 重复的订单不需要保留
  • 重复项可能在列表中的任何位置

最快的解决方案,一百万个条目:

def getDupes(c):
        '''sort/tee/izip'''
        a, b = itertools.tee(sorted(c))
        next(b, None)
        r = None
        for k, g in itertools.izip(a, b):
            if k != g: continue
            if k != r:
                yield k
                r = k

经过测试的方法

import itertools
import time
import random

def getDupes_1(c):
    '''naive'''
    for i in xrange(0, len(c)):
        if c[i] in c[:i]:
            yield c[i]

def getDupes_2(c):
    '''set len change'''
    s = set()
    for i in c:
        l = len(s)
        s.add(i)
        if len(s) == l:
            yield i

def getDupes_3(c):
    '''in dict'''
    d = {}
    for i in c:
        if i in d:
            if d[i]:
                yield i
                d[i] = False
        else:
            d[i] = True

def getDupes_4(c):
    '''in set'''
    s,r = set(),set()
    for i in c:
        if i not in s:
            s.add(i)
        elif i not in r:
            r.add(i)
            yield i

def getDupes_5(c):
    '''sort/adjacent'''
    c = sorted(c)
    r = None
    for i in xrange(1, len(c)):
        if c[i] == c[i - 1]:
            if c[i] != r:
                yield c[i]
                r = c[i]

def getDupes_6(c):
    '''sort/groupby'''
    def multiple(x):
        try:
            x.next()
            x.next()
            return True
        except:
            return False
    for k, g in itertools.ifilter(lambda x: multiple(x[1]), itertools.groupby(sorted(c))):
        yield k

def getDupes_7(c):
    '''sort/zip'''
    c = sorted(c)
    r = None
    for k, g in zip(c[:-1],c[1:]):
        if k == g:
            if k != r:
                yield k
                r = k

def getDupes_8(c):
    '''sort/izip'''
    c = sorted(c)
    r = None
    for k, g in itertools.izip(c[:-1],c[1:]):
        if k == g:
            if k != r:
                yield k
                r = k

def getDupes_9(c):
    '''sort/tee/izip'''
    a, b = itertools.tee(sorted(c))
    next(b, None)
    r = None
    for k, g in itertools.izip(a, b):
        if k != g: continue
        if k != r:
            yield k
            r = k

def getDupes_a(l):
    '''moooeeeep'''
    seen = set()
    seen_add = seen.add
    # adds all elements it doesn't know yet to seen and all other to seen_twice
    for x in l:
        if x in seen or seen_add(x):
            yield x

def getDupes_b(x):
    '''iter*/sorted'''
    x = sorted(x)
    def _matches():
        for k,g in itertools.izip(x[:-1],x[1:]):
            if k == g:
                yield k
    for k, n in itertools.groupby(_matches()):
        yield k

def getDupes_c(a):
    '''pandas'''
    import pandas as pd
    vc = pd.Series(a).value_counts()
    i = vc[vc > 1].index
    for _ in i:
        yield _

def hasDupes(fn,c):
    try:
        if fn(c).next(): return True    # Found a dupe
    except StopIteration:
        pass
    return False

def getDupes(fn,c):
    return list(fn(c))

STABLE = True
if STABLE:
    print 'Finding FIRST then ALL duplicates, single dupe of "nth" placed element in 1m element array'
else:
    print 'Finding FIRST then ALL duplicates, single dupe of "n" included in randomised 1m element array'
for location in (50,250000,500000,750000,999999):
    for test in (getDupes_2, getDupes_3, getDupes_4, getDupes_5, getDupes_6,
                 getDupes_8, getDupes_9, getDupes_a, getDupes_b, getDupes_c):
        print 'Test %-15s:%10d - '%(test.__doc__ or test.__name__,location),
        deltas = []
        for FIRST in (True,False):
            for i in xrange(0, 5):
                c = range(0,1000000)
                if STABLE:
                    c[0] = location
                else:
                    c.append(location)
                    random.shuffle(c)
                start = time.time()
                if FIRST:
                    print '.' if location == test(c).next() else '!',
                else:
                    print '.' if [location] == list(test(c)) else '!',
                deltas.append(time.time()-start)
            print ' -- %0.3f  '%(sum(deltas)/len(deltas)),
        print
    print

“所有重复”测试的结果是一致的,在此数组中找到“第一个”重复项,然后是“所有”重复项:

Finding FIRST then ALL duplicates, single dupe of "nth" placed element in 1m element array
Test set len change :    500000 -  . . . . .  -- 0.264   . . . . .  -- 0.402  
Test in dict        :    500000 -  . . . . .  -- 0.163   . . . . .  -- 0.250  
Test in set         :    500000 -  . . . . .  -- 0.163   . . . . .  -- 0.249  
Test sort/adjacent  :    500000 -  . . . . .  -- 0.159   . . . . .  -- 0.229  
Test sort/groupby   :    500000 -  . . . . .  -- 0.860   . . . . .  -- 1.286  
Test sort/izip      :    500000 -  . . . . .  -- 0.165   . . . . .  -- 0.229  
Test sort/tee/izip  :    500000 -  . . . . .  -- 0.145   . . . . .  -- 0.206  *
Test moooeeeep      :    500000 -  . . . . .  -- 0.149   . . . . .  -- 0.232  
Test iter*/sorted   :    500000 -  . . . . .  -- 0.160   . . . . .  -- 0.221  
Test pandas         :    500000 -  . . . . .  -- 0.493   . . . . .  -- 0.499  

当列表首先被洗牌时,排序的价格显而易见-效率显着下降,@ moooeeeep方法占主导地位,set和dict方法相似,但表现较差:

Finding FIRST then ALL duplicates, single dupe of "n" included in randomised 1m element array
Test set len change :    500000 -  . . . . .  -- 0.321   . . . . .  -- 0.473  
Test in dict        :    500000 -  . . . . .  -- 0.285   . . . . .  -- 0.360  
Test in set         :    500000 -  . . . . .  -- 0.309   . . . . .  -- 0.365  
Test sort/adjacent  :    500000 -  . . . . .  -- 0.756   . . . . .  -- 0.823  
Test sort/groupby   :    500000 -  . . . . .  -- 1.459   . . . . .  -- 1.896  
Test sort/izip      :    500000 -  . . . . .  -- 0.786   . . . . .  -- 0.845  
Test sort/tee/izip  :    500000 -  . . . . .  -- 0.743   . . . . .  -- 0.804  
Test moooeeeep      :    500000 -  . . . . .  -- 0.234   . . . . .  -- 0.311  *
Test iter*/sorted   :    500000 -  . . . . .  -- 0.776   . . . . .  -- 0.840  
Test pandas         :    500000 -  . . . . .  -- 0.539   . . . . .  -- 0.540  

I came across this question whilst looking in to something related – and wonder why no-one offered a generator based solution? Solving this problem would be:

>>> print list(getDupes_9([1,2,3,2,1,5,6,5,5,5]))
[1, 2, 5]

I was concerned with scalability, so tested several approaches, including naive items that work well on small lists, but scale horribly as lists get larger (note- would have been better to use timeit, but this is illustrative).

I included @moooeeeep for comparison (it is impressively fast: fastest if the input list is completely random) and an itertools approach that is even faster again for mostly sorted lists… Now includes pandas approach from @firelynx — slow, but not horribly so, and simple. Note – sort/tee/zip approach is consistently fastest on my machine for large mostly ordered lists, moooeeeep is fastest for shuffled lists, but your mileage may vary.

Advantages

  • very quick simple to test for ‘any’ duplicates using the same code

Assumptions

  • Duplicates should be reported once only
  • Duplicate order does not need to be preserved
  • Duplicate might be anywhere in the list

Fastest solution, 1m entries:

def getDupes(c):
        '''sort/tee/izip'''
        a, b = itertools.tee(sorted(c))
        next(b, None)
        r = None
        for k, g in itertools.izip(a, b):
            if k != g: continue
            if k != r:
                yield k
                r = k

Approaches tested

import itertools
import time
import random

def getDupes_1(c):
    '''naive'''
    for i in xrange(0, len(c)):
        if c[i] in c[:i]:
            yield c[i]

def getDupes_2(c):
    '''set len change'''
    s = set()
    for i in c:
        l = len(s)
        s.add(i)
        if len(s) == l:
            yield i

def getDupes_3(c):
    '''in dict'''
    d = {}
    for i in c:
        if i in d:
            if d[i]:
                yield i
                d[i] = False
        else:
            d[i] = True

def getDupes_4(c):
    '''in set'''
    s,r = set(),set()
    for i in c:
        if i not in s:
            s.add(i)
        elif i not in r:
            r.add(i)
            yield i

def getDupes_5(c):
    '''sort/adjacent'''
    c = sorted(c)
    r = None
    for i in xrange(1, len(c)):
        if c[i] == c[i - 1]:
            if c[i] != r:
                yield c[i]
                r = c[i]

def getDupes_6(c):
    '''sort/groupby'''
    def multiple(x):
        try:
            x.next()
            x.next()
            return True
        except:
            return False
    for k, g in itertools.ifilter(lambda x: multiple(x[1]), itertools.groupby(sorted(c))):
        yield k

def getDupes_7(c):
    '''sort/zip'''
    c = sorted(c)
    r = None
    for k, g in zip(c[:-1],c[1:]):
        if k == g:
            if k != r:
                yield k
                r = k

def getDupes_8(c):
    '''sort/izip'''
    c = sorted(c)
    r = None
    for k, g in itertools.izip(c[:-1],c[1:]):
        if k == g:
            if k != r:
                yield k
                r = k

def getDupes_9(c):
    '''sort/tee/izip'''
    a, b = itertools.tee(sorted(c))
    next(b, None)
    r = None
    for k, g in itertools.izip(a, b):
        if k != g: continue
        if k != r:
            yield k
            r = k

def getDupes_a(l):
    '''moooeeeep'''
    seen = set()
    seen_add = seen.add
    # adds all elements it doesn't know yet to seen and all other to seen_twice
    for x in l:
        if x in seen or seen_add(x):
            yield x

def getDupes_b(x):
    '''iter*/sorted'''
    x = sorted(x)
    def _matches():
        for k,g in itertools.izip(x[:-1],x[1:]):
            if k == g:
                yield k
    for k, n in itertools.groupby(_matches()):
        yield k

def getDupes_c(a):
    '''pandas'''
    import pandas as pd
    vc = pd.Series(a).value_counts()
    i = vc[vc > 1].index
    for _ in i:
        yield _

def hasDupes(fn,c):
    try:
        if fn(c).next(): return True    # Found a dupe
    except StopIteration:
        pass
    return False

def getDupes(fn,c):
    return list(fn(c))

STABLE = True
if STABLE:
    print 'Finding FIRST then ALL duplicates, single dupe of "nth" placed element in 1m element array'
else:
    print 'Finding FIRST then ALL duplicates, single dupe of "n" included in randomised 1m element array'
for location in (50,250000,500000,750000,999999):
    for test in (getDupes_2, getDupes_3, getDupes_4, getDupes_5, getDupes_6,
                 getDupes_8, getDupes_9, getDupes_a, getDupes_b, getDupes_c):
        print 'Test %-15s:%10d - '%(test.__doc__ or test.__name__,location),
        deltas = []
        for FIRST in (True,False):
            for i in xrange(0, 5):
                c = range(0,1000000)
                if STABLE:
                    c[0] = location
                else:
                    c.append(location)
                    random.shuffle(c)
                start = time.time()
                if FIRST:
                    print '.' if location == test(c).next() else '!',
                else:
                    print '.' if [location] == list(test(c)) else '!',
                deltas.append(time.time()-start)
            print ' -- %0.3f  '%(sum(deltas)/len(deltas)),
        print
    print

The results for the ‘all dupes’ test were consistent, finding “first” duplicate then “all” duplicates in this array:

Finding FIRST then ALL duplicates, single dupe of "nth" placed element in 1m element array
Test set len change :    500000 -  . . . . .  -- 0.264   . . . . .  -- 0.402  
Test in dict        :    500000 -  . . . . .  -- 0.163   . . . . .  -- 0.250  
Test in set         :    500000 -  . . . . .  -- 0.163   . . . . .  -- 0.249  
Test sort/adjacent  :    500000 -  . . . . .  -- 0.159   . . . . .  -- 0.229  
Test sort/groupby   :    500000 -  . . . . .  -- 0.860   . . . . .  -- 1.286  
Test sort/izip      :    500000 -  . . . . .  -- 0.165   . . . . .  -- 0.229  
Test sort/tee/izip  :    500000 -  . . . . .  -- 0.145   . . . . .  -- 0.206  *
Test moooeeeep      :    500000 -  . . . . .  -- 0.149   . . . . .  -- 0.232  
Test iter*/sorted   :    500000 -  . . . . .  -- 0.160   . . . . .  -- 0.221  
Test pandas         :    500000 -  . . . . .  -- 0.493   . . . . .  -- 0.499  

When the lists are shuffled first, the price of the sort becomes apparent – the efficiency drops noticeably and the @moooeeeep approach dominates, with set & dict approaches being similar but lessor performers:

Finding FIRST then ALL duplicates, single dupe of "n" included in randomised 1m element array
Test set len change :    500000 -  . . . . .  -- 0.321   . . . . .  -- 0.473  
Test in dict        :    500000 -  . . . . .  -- 0.285   . . . . .  -- 0.360  
Test in set         :    500000 -  . . . . .  -- 0.309   . . . . .  -- 0.365  
Test sort/adjacent  :    500000 -  . . . . .  -- 0.756   . . . . .  -- 0.823  
Test sort/groupby   :    500000 -  . . . . .  -- 1.459   . . . . .  -- 1.896  
Test sort/izip      :    500000 -  . . . . .  -- 0.786   . . . . .  -- 0.845  
Test sort/tee/izip  :    500000 -  . . . . .  -- 0.743   . . . . .  -- 0.804  
Test moooeeeep      :    500000 -  . . . . .  -- 0.234   . . . . .  -- 0.311  *
Test iter*/sorted   :    500000 -  . . . . .  -- 0.776   . . . . .  -- 0.840  
Test pandas         :    500000 -  . . . . .  -- 0.539   . . . . .  -- 0.540  

回答 5

使用熊猫:

>>> import pandas as pd
>>> a = [1, 2, 1, 3, 3, 3, 0]
>>> pd.Series(a)[pd.Series(a).duplicated()].values
array([1, 3, 3])

Using pandas:

>>> import pandas as pd
>>> a = [1, 2, 1, 3, 3, 3, 0]
>>> pd.Series(a)[pd.Series(a).duplicated()].values
array([1, 3, 3])

回答 6

collections.Counter是python 2.7中的新功能:


Python 2.5.4 (r254:67916, May 31 2010, 15:03:39) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
a = [1,2,3,2,1,5,6,5,5,5]
import collections
print [x for x, y in collections.Counter(a).items() if y > 1]
Type "help", "copyright", "credits" or "license" for more information.
  File "", line 1, in 
AttributeError: 'module' object has no attribute 'Counter'
>>> 

在早期版本中,您可以改用常规dict:

a = [1,2,3,2,1,5,6,5,5,5]
d = {}
for elem in a:
    if elem in d:
        d[elem] += 1
    else:
        d[elem] = 1

print [x for x, y in d.items() if y > 1]

collections.Counter is new in python 2.7:


Python 2.5.4 (r254:67916, May 31 2010, 15:03:39) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-46)] on linux2
a = [1,2,3,2,1,5,6,5,5,5]
import collections
print [x for x, y in collections.Counter(a).items() if y > 1]
Type "help", "copyright", "credits" or "license" for more information.
  File "", line 1, in 
AttributeError: 'module' object has no attribute 'Counter'
>>> 

In an earlier version you can use a conventional dict instead:

a = [1,2,3,2,1,5,6,5,5,5]
d = {}
for elem in a:
    if elem in d:
        d[elem] += 1
    else:
        d[elem] = 1

print [x for x, y in d.items() if y > 1]

回答 7

这是一个简洁明了的解决方案-

for x in set(li):
    li.remove(x)

li = list(set(li))

Here’s a neat and concise solution –

for x in set(li):
    li.remove(x)

li = list(set(li))

回答 8

如果不转换为列表,可能最简单的方法如下所示。 当他们要求不使用布景时,这可能在面试中很有用

a=[1,2,3,3,3]
dup=[]
for each in a:
  if each not in dup:
    dup.append(each)
print(dup)

=======否则将获得2个单独的唯一值和重复值列表

a=[1,2,3,3,3]
uniques=[]
dups=[]

for each in a:
  if each not in uniques:
    uniques.append(each)
  else:
    dups.append(each)
print("Unique values are below:")
print(uniques)
print("Duplicate values are below:")
print(dups)

Without converting to list and probably the simplest way would be something like below. This may be useful during a interview when they ask not to use sets

a=[1,2,3,3,3]
dup=[]
for each in a:
  if each not in dup:
    dup.append(each)
print(dup)

======= else to get 2 separate lists of unique values and duplicate values

a=[1,2,3,3,3]
uniques=[]
dups=[]

for each in a:
  if each not in uniques:
    uniques.append(each)
  else:
    dups.append(each)
print("Unique values are below:")
print(uniques)
print("Duplicate values are below:")
print(dups)

回答 9

我会用熊猫做这件事,因为我经常使用熊猫

import pandas as pd
a = [1,2,3,3,3,4,5,6,6,7]
vc = pd.Series(a).value_counts()
vc[vc > 1].index.tolist()

[3,6]

可能不是很有效,但是肯定比许多其他答案要少的代码,所以我想我会做出贡献

I would do this with pandas, because I use pandas a lot

import pandas as pd
a = [1,2,3,3,3,4,5,6,6,7]
vc = pd.Series(a).value_counts()
vc[vc > 1].index.tolist()

Gives

[3,6]

Probably isn’t very efficient, but it sure is less code than a lot of the other answers, so I thought I would contribute


回答 10

接受的答案的第三个示例给出了错误的答案,并且没有尝试重复。这是正确的版本:

number_lst = [1, 1, 2, 3, 5, ...]

seen_set = set()
duplicate_set = set(x for x in number_lst if x in seen_set or seen_set.add(x))
unique_set = seen_set - duplicate_set

the third example of the accepted answer give an erroneous answer and does not attempt to give duplicates. Here is the correct version :

number_lst = [1, 1, 2, 3, 5, ...]

seen_set = set()
duplicate_set = set(x for x in number_lst if x in seen_set or seen_set.add(x))
unique_set = seen_set - duplicate_set

回答 11

如何通过检查出现次数来简单地循环遍历列表中的每个元素,然后将它们添加到集合中,然后将其打印出来。希望这可以帮助某人。

myList  = [2 ,4 , 6, 8, 4, 6, 12];
newList = set()

for i in myList:
    if myList.count(i) >= 2:
        newList.add(i)

print(list(newList))
## [4 , 6]

How about simply loop through each element in the list by checking the number of occurrences, then adding them to a set which will then print the duplicates. Hope this helps someone out there.

myList  = [2 ,4 , 6, 8, 4, 6, 12];
newList = set()

for i in myList:
    if myList.count(i) >= 2:
        newList.add(i)

print(list(newList))
## [4 , 6]

回答 12

我们可以使用itertools.groupby来查找所有具有重复项的项目:

from itertools import groupby

myList  = [2, 4, 6, 8, 4, 6, 12]
# when the list is sorted, groupby groups by consecutive elements which are similar
for x, y in groupby(sorted(myList)):
    #  list(y) returns all the occurences of item x
    if len(list(y)) > 1:
        print x  

输出将是:

4
6

We can use itertools.groupby in order to find all the items that have dups:

from itertools import groupby

myList  = [2, 4, 6, 8, 4, 6, 12]
# when the list is sorted, groupby groups by consecutive elements which are similar
for x, y in groupby(sorted(myList)):
    #  list(y) returns all the occurences of item x
    if len(list(y)) > 1:
        print x  

The output will be:

4
6

回答 13

我想在列表中查找重复项的最有效方法是:

from collections import Counter

def duplicates(values):
    dups = Counter(values) - Counter(set(values))
    return list(dups.keys())

print(duplicates([1,2,3,6,5,2]))

它使用Counter所有元素和所有唯一元素。将第一个与第二个相减将只保留重复项。

I guess the most effective way to find duplicates in a list is:

from collections import Counter

def duplicates(values):
    dups = Counter(values) - Counter(set(values))
    return list(dups.keys())

print(duplicates([1,2,3,6,5,2]))

It uses the Counter the all the elements and all unique elements. Subtracting the first one with the second will leave out only the duplicates.


回答 14

有点晚了,但可能对某些人有所帮助。对于一个庞大的清单,我发现这对我有用。

l=[1,2,3,5,4,1,3,1]
s=set(l)
d=[]
for x in l:
    if x in s:
        s.remove(x)
    else:
        d.append(x)
d
[1,3,1]

只显示所有重复项并保留顺序。

A bit late, but maybe helpful for some. For a largish list, I found this worked for me.

l=[1,2,3,5,4,1,3,1]
s=set(l)
d=[]
for x in l:
    if x in s:
        s.remove(x)
    else:
        d.append(x)
d
[1,3,1]

Shows just and all duplicates and preserves order.


回答 15

在Python中一次迭代即可找到重复对象的非常简单快捷的方法是:

testList = ['red', 'blue', 'red', 'green', 'blue', 'blue']

testListDict = {}

for item in testList:
  try:
    testListDict[item] += 1
  except:
    testListDict[item] = 1

print testListDict

输出如下:

>>> print testListDict
{'blue': 3, 'green': 1, 'red': 2}

这以及我博客中的更多内容http://www.howtoprogramwithpython.com

Very simple and quick way of finding dupes with one iteration in Python is:

testList = ['red', 'blue', 'red', 'green', 'blue', 'blue']

testListDict = {}

for item in testList:
  try:
    testListDict[item] += 1
  except:
    testListDict[item] = 1

print testListDict

Output will be as follows:

>>> print testListDict
{'blue': 3, 'green': 1, 'red': 2}

This and more in my blog http://www.howtoprogramwithpython.com


回答 16

我要进入这个讨论的很晚了。即使,我也想用一个衬纸来解决这个问题。因为那是Python的魅力。如果我们只想将重复项放入单独的列表(或任何集合)中,我建议按照以下步骤进行操作。说我们有一个重复的列表,我们可以将其称为“目标”

    target=[1,2,3,4,4,4,3,5,6,8,4,3]

现在,如果要获取重复项,可以使用一种衬纸,如下所示:

    duplicates=dict(set((x,target.count(x)) for x in filter(lambda rec : target.count(rec)>1,target)))

这段代码会将重复的记录作为键并作为值存入字典’duplicates’中。’duplicate’字典如下所示:

    {3: 3, 4: 4} #it saying 3 is repeated 3 times and 4 is 4 times

如果只想将所有重复的记录都放在一个列表中,那么它的代码又要短得多:

    duplicates=filter(lambda rec : target.count(rec)>1,target)

输出将是:

    [3, 4, 4, 4, 3, 4, 3]

这在python 2.7.x +版本中完美工作

I am entering much much late in to this discussion. Even though, I would like to deal with this problem with one liners . Because that’s the charm of Python. if we just want to get the duplicates in to a separate list (or any collection),I would suggest to do as below.Say we have a duplicated list which we can call as ‘target’

    target=[1,2,3,4,4,4,3,5,6,8,4,3]

Now if we want to get the duplicates,we can use the one liner as below:

    duplicates=dict(set((x,target.count(x)) for x in filter(lambda rec : target.count(rec)>1,target)))

This code will put the duplicated records as key and count as value in to the dictionary ‘duplicates’.’duplicate’ dictionary will look like as below:

    {3: 3, 4: 4} #it saying 3 is repeated 3 times and 4 is 4 times

If you just want all the records with duplicates alone in a list, its again much shorter code:

    duplicates=filter(lambda rec : target.count(rec)>1,target)

Output will be:

    [3, 4, 4, 4, 3, 4, 3]

This works perfectly in python 2.7.x + versions


回答 17

如果您不愿意编写自己的算法或使用库,则使用Python 3.8一线式:

l = [1,2,3,2,1,5,6,5,5,5]

res = [(x, count) for x, g in groupby(sorted(l)) if (count := len(list(g))) > 1]

print(res)

打印项目和计数:

[(1, 2), (2, 2), (5, 4)]

groupby具有分组功能,因此您可以以不同的方式定义分组,并Tuple根据需要返回其他字段。

groupby 很懒,所以不要太慢。

Python 3.8 one-liner if you don’t care to write your own algorithm or use libraries:

l = [1,2,3,2,1,5,6,5,5,5]

res = [(x, count) for x, g in groupby(sorted(l)) if (count := len(list(g))) > 1]

print(res)

Prints item and count:

[(1, 2), (2, 2), (5, 4)]

groupby takes a grouping function so you can define your groupings in different ways and return additional Tuple fields as needed.

groupby is lazy so it shouldn’t be too slow.


回答 18

其他一些测试。当然可以…

set([x for x in l if l.count(x) > 1])

…太贵了 使用下一个最终方法大约要快500倍(更长的数组会带来更好的结果):

def dups_count_dict(l):
    d = {}

    for item in l:
        if item not in d:
            d[item] = 0

        d[item] += 1

    result_d = {key: val for key, val in d.iteritems() if val > 1}

    return result_d.keys()

仅2个循环,没有非常昂贵的l.count()操作。

例如,下面是比较这些方法的代码。代码如下,这是输出:

dups_count: 13.368s # this is a function which uses l.count()
dups_count_dict: 0.014s # this is a final best function (of the 3 functions)
dups_count_counter: 0.024s # collections.Counter

测试代码:

import numpy as np
from time import time
from collections import Counter

class TimerCounter(object):
    def __init__(self):
        self._time_sum = 0

    def start(self):
        self.time = time()

    def stop(self):
        self._time_sum += time() - self.time

    def get_time_sum(self):
        return self._time_sum


def dups_count(l):
    return set([x for x in l if l.count(x) > 1])


def dups_count_dict(l):
    d = {}

    for item in l:
        if item not in d:
            d[item] = 0

        d[item] += 1

    result_d = {key: val for key, val in d.iteritems() if val > 1}

    return result_d.keys()


def dups_counter(l):
    counter = Counter(l)    

    result_d = {key: val for key, val in counter.iteritems() if val > 1}

    return result_d.keys()



def gen_array():
    np.random.seed(17)
    return list(np.random.randint(0, 5000, 10000))


def assert_equal_results(*results):
    primary_result = results[0]
    other_results = results[1:]

    for other_result in other_results:
        assert set(primary_result) == set(other_result) and len(primary_result) == len(other_result)


if __name__ == '__main__':
    dups_count_time = TimerCounter()
    dups_count_dict_time = TimerCounter()
    dups_count_counter = TimerCounter()

    l = gen_array()

    for i in range(3):
        dups_count_time.start()
        result1 = dups_count(l)
        dups_count_time.stop()

        dups_count_dict_time.start()
        result2 = dups_count_dict(l)
        dups_count_dict_time.stop()

        dups_count_counter.start()
        result3 = dups_counter(l)
        dups_count_counter.stop()

        assert_equal_results(result1, result2, result3)

    print 'dups_count: %.3f' % dups_count_time.get_time_sum()
    print 'dups_count_dict: %.3f' % dups_count_dict_time.get_time_sum()
    print 'dups_count_counter: %.3f' % dups_count_counter.get_time_sum()

Some other tests. Of course to do…

set([x for x in l if l.count(x) > 1])

…is too costly. It’s about 500 times faster (the more long array gives better results) to use the next final method:

def dups_count_dict(l):
    d = {}

    for item in l:
        if item not in d:
            d[item] = 0

        d[item] += 1

    result_d = {key: val for key, val in d.iteritems() if val > 1}

    return result_d.keys()

Only 2 loops, no very costly l.count() operations.

Here is a code to compare the methods for example. The code is below, here is the output:

dups_count: 13.368s # this is a function which uses l.count()
dups_count_dict: 0.014s # this is a final best function (of the 3 functions)
dups_count_counter: 0.024s # collections.Counter

The testing code:

import numpy as np
from time import time
from collections import Counter

class TimerCounter(object):
    def __init__(self):
        self._time_sum = 0

    def start(self):
        self.time = time()

    def stop(self):
        self._time_sum += time() - self.time

    def get_time_sum(self):
        return self._time_sum


def dups_count(l):
    return set([x for x in l if l.count(x) > 1])


def dups_count_dict(l):
    d = {}

    for item in l:
        if item not in d:
            d[item] = 0

        d[item] += 1

    result_d = {key: val for key, val in d.iteritems() if val > 1}

    return result_d.keys()


def dups_counter(l):
    counter = Counter(l)    

    result_d = {key: val for key, val in counter.iteritems() if val > 1}

    return result_d.keys()



def gen_array():
    np.random.seed(17)
    return list(np.random.randint(0, 5000, 10000))


def assert_equal_results(*results):
    primary_result = results[0]
    other_results = results[1:]

    for other_result in other_results:
        assert set(primary_result) == set(other_result) and len(primary_result) == len(other_result)


if __name__ == '__main__':
    dups_count_time = TimerCounter()
    dups_count_dict_time = TimerCounter()
    dups_count_counter = TimerCounter()

    l = gen_array()

    for i in range(3):
        dups_count_time.start()
        result1 = dups_count(l)
        dups_count_time.stop()

        dups_count_dict_time.start()
        result2 = dups_count_dict(l)
        dups_count_dict_time.stop()

        dups_count_counter.start()
        result3 = dups_counter(l)
        dups_count_counter.stop()

        assert_equal_results(result1, result2, result3)

    print 'dups_count: %.3f' % dups_count_time.get_time_sum()
    print 'dups_count_dict: %.3f' % dups_count_dict_time.get_time_sum()
    print 'dups_count_counter: %.3f' % dups_count_counter.get_time_sum()

回答 19

方法1:

list(set([val for idx, val in enumerate(input_list) if val in input_list[idx+1:]]))

说明: [idx的val,如果input_list [idx + 1:]中的val,则enumerate(input_list)中的val]是一个列表推导,如果从当前位置的列表中存在相同的元素,则返回一个元素。

例如:input_list = [42,31,42,31,3,31,31,5,6,6,6,6,6,7,42]

从列表42中的第一个元素开始,索引为0,它会检查input_list [1:]中是否存在元素42(即,从索引1到列表末尾),因为input_list [1:]中存在42 ,它将返回42。

然后转到具有索引1的下一个元素31,并检查input_list [2:]中是否存在元素31(即,从索引2到列表末尾),因为input_list [2:]中存在31,它将返回31。

类似地,它遍历列表中的所有元素,并且仅将重复/重复的元素返回到列表中。

然后,因为我们有重复项,所以在列表中,我们需要从每个重复项中选择一个,即删除重复项中的重复项,然后调用python内置的名为set()的python,它会删除重复项,

然后我们剩下一个集合,而不是列表,因此要从一个集合转换为列表,我们使用typecasting,list(),并将元素集转换为一个列表。

方法2:

def dupes(ilist):
    temp_list = [] # initially, empty temporary list
    dupe_list = [] # initially, empty duplicate list
    for each in ilist:
        if each in temp_list: # Found a Duplicate element
            if not each in dupe_list: # Avoid duplicate elements in dupe_list
                dupe_list.append(each) # Add duplicate element to dupe_list
        else: 
            temp_list.append(each) # Add a new (non-duplicate) to temp_list

    return dupe_list

说明: 在这里,我们首先创建两个空列表。然后继续遍历列表的所有元素,以查看它是否存在于temp_list中(最初为空)。如果temp_list中没有它,则使用append方法将其添加到temp_list中。

如果它已经存在于temp_list中,则意味着该列表的当前元素是重复的,因此我们需要使用append方法将其添加到dupe_list中。

Method 1:

list(set([val for idx, val in enumerate(input_list) if val in input_list[idx+1:]]))

Explanation: [val for idx, val in enumerate(input_list) if val in input_list[idx+1:]] is a list comprehension, that returns an element, if the same element is present from it’s current position, in list, the index.

Example: input_list = [42,31,42,31,3,31,31,5,6,6,6,6,6,7,42]

starting with the first element in list, 42, with index 0, it checks if the element 42, is present in input_list[1:] (i.e., from index 1 till end of list) Because 42 is present in input_list[1:], it will return 42.

Then it goes to the next element 31, with index 1, and checks if element 31 is present in the input_list[2:] (i.e., from index 2 till end of list), Because 31 is present in input_list[2:], it will return 31.

similarly it goes through all the elements in the list, and will return only the repeated/duplicate elements into a list.

Then because we have duplicates, in a list, we need to pick one of each duplicate, i.e. remove duplicate among duplicates, and to do so, we do call a python built-in named set(), and it removes the duplicates,

Then we are left with a set, but not a list, and hence to convert from a set to list, we use, typecasting, list(), and that converts the set of elements to a list.

Method 2:

def dupes(ilist):
    temp_list = [] # initially, empty temporary list
    dupe_list = [] # initially, empty duplicate list
    for each in ilist:
        if each in temp_list: # Found a Duplicate element
            if not each in dupe_list: # Avoid duplicate elements in dupe_list
                dupe_list.append(each) # Add duplicate element to dupe_list
        else: 
            temp_list.append(each) # Add a new (non-duplicate) to temp_list

    return dupe_list

Explanation: Here We create two empty lists, to start with. Then keep traversing through all the elements of the list, to see if it exists in temp_list (initially empty). If it is not there in the temp_list, then we add it to the temp_list, using append method.

If it already exists in temp_list, it means, that the current element of the list is a duplicate, and hence we need to add it to dupe_list using append method.


回答 20

raw_list = [1,2,3,3,4,5,6,6,7,2,3,4,2,3,4,1,3,4,]

clean_list = list(set(raw_list))
duplicated_items = []

for item in raw_list:
    try:
        clean_list.remove(item)
    except ValueError:
        duplicated_items.append(item)


print(duplicated_items)
# [3, 6, 2, 3, 4, 2, 3, 4, 1, 3, 4]

您基本上可以通过转换为set(clean_list)来删除重复项,然后对其进行迭代raw_list,同时从item清除列表中删除每个重复项以在中出现raw_list。如果item未找到,ValueError则捕获引发的Exception并将其item添加到duplicated_items列表中。

如果需要重复项的索引,则只需enumerate列出并使用索引即可。(for index, item in enumerate(raw_list):),速度更快,并针对大型列表(如数千个元素以上)进行了优化

raw_list = [1,2,3,3,4,5,6,6,7,2,3,4,2,3,4,1,3,4,]

clean_list = list(set(raw_list))
duplicated_items = []

for item in raw_list:
    try:
        clean_list.remove(item)
    except ValueError:
        duplicated_items.append(item)


print(duplicated_items)
# [3, 6, 2, 3, 4, 2, 3, 4, 1, 3, 4]

You basically remove duplicates by converting to set (clean_list), then iterate the raw_list, while removing each item in the clean list for occurrence in raw_list. If item is not found, the raised ValueError Exception is caught and the item is added to duplicated_items list.

If the index of duplicated items is needed, just enumerate the list and play around with the index. (for index, item in enumerate(raw_list):) which is faster and optimised for large lists (like thousands+ of elements)


回答 21

使用list.count()列表中的方法找出给定列表中的重复元素

arr=[]
dup =[]
for i in range(int(input("Enter range of list: "))):
    arr.append(int(input("Enter Element in a list: ")))
for i in arr:
    if arr.count(i)>1 and i not in dup:
        dup.append(i)
print(dup)

use of list.count() method in the list to find out the duplicate elements of a given list

arr=[]
dup =[]
for i in range(int(input("Enter range of list: "))):
    arr.append(int(input("Enter Element in a list: ")))
for i in arr:
    if arr.count(i)>1 and i not in dup:
        dup.append(i)
print(dup)

回答 22

单线,乐趣无穷,并且需要一个声明。

(lambda iterable: reduce(lambda (uniq, dup), item: (uniq, dup | {item}) if item in uniq else (uniq | {item}, dup), iterable, (set(), set())))(some_iterable)

one-liner, for fun, and where a single statement is required.

(lambda iterable: reduce(lambda (uniq, dup), item: (uniq, dup | {item}) if item in uniq else (uniq | {item}, dup), iterable, (set(), set())))(some_iterable)

回答 23

list2 = [1, 2, 3, 4, 1, 2, 3]
lset = set()
[(lset.add(item), list2.append(item))
 for item in list2 if item not in lset]
print list(lset)
list2 = [1, 2, 3, 4, 1, 2, 3]
lset = set()
[(lset.add(item), list2.append(item))
 for item in list2 if item not in lset]
print list(lset)

回答 24

一线解决方案:

set([i for i in list if sum([1 for a in list if a == i]) > 1])

One line solution:

set([i for i in list if sum([1 for a in list if a == i]) > 1])

回答 25

这里有很多答案,但是我认为这是一种相对易读且易于理解的方法:

def get_duplicates(sorted_list):
    duplicates = []
    last = sorted_list[0]
    for x in sorted_list[1:]:
        if x == last:
            duplicates.append(x)
        last = x
    return set(duplicates)

笔记:

  • 如果您希望保留重复计数,请取消强制转换为底部的“ set”以获取完整列表
  • 如果您更喜欢使用生成器,请用yield x和底部的return语句替换duplicates.append(x)(可以稍后进行设置)

There are a lot of answers up here, but I think this is relatively a very readable and easy to understand approach:

def get_duplicates(sorted_list):
    duplicates = []
    last = sorted_list[0]
    for x in sorted_list[1:]:
        if x == last:
            duplicates.append(x)
        last = x
    return set(duplicates)

Notes:

  • If you wish to preserve duplication count, get rid of the cast to ‘set’ at the bottom to get the full list
  • If you prefer to use generators, replace duplicates.append(x) with yield x and the return statement at the bottom (you can cast to set later)

回答 26

这是一个快速生成器,它使用字典将每个元素存储为具有布尔值的键,以检查是否已产生重复项。

对于具有所有元素都是可哈希类型的列表:

def gen_dupes(array):
    unique = {}
    for value in array:
        if value in unique and unique[value]:
            unique[value] = False
            yield value
        else:
            unique[value] = True

array = [1, 2, 2, 3, 4, 1, 5, 2, 6, 6]
print(list(gen_dupes(array)))
# => [2, 1, 6]

对于可能包含列表的列表:

def gen_dupes(array):
    unique = {}
    for value in array:
        is_list = False
        if type(value) is list:
            value = tuple(value)
            is_list = True

        if value in unique and unique[value]:
            unique[value] = False
            if is_list:
                value = list(value)

            yield value
        else:
            unique[value] = True

array = [1, 2, 2, [1, 2], 3, 4, [1, 2], 5, 2, 6, 6]
print(list(gen_dupes(array)))
# => [2, [1, 2], 6]

Here’s a fast generator that uses a dict to store each element as a key with a boolean value for checking if the duplicate item has already been yielded.

For lists with all elements that are hashable types:

def gen_dupes(array):
    unique = {}
    for value in array:
        if value in unique and unique[value]:
            unique[value] = False
            yield value
        else:
            unique[value] = True

array = [1, 2, 2, 3, 4, 1, 5, 2, 6, 6]
print(list(gen_dupes(array)))
# => [2, 1, 6]

For lists that might contain lists:

def gen_dupes(array):
    unique = {}
    for value in array:
        is_list = False
        if type(value) is list:
            value = tuple(value)
            is_list = True

        if value in unique and unique[value]:
            unique[value] = False
            if is_list:
                value = list(value)

            yield value
        else:
            unique[value] = True

array = [1, 2, 2, [1, 2], 3, 4, [1, 2], 5, 2, 6, 6]
print(list(gen_dupes(array)))
# => [2, [1, 2], 6]

回答 27

def removeduplicates(a):
  seen = set()

  for i in a:
    if i not in seen:
      seen.add(i)
  return seen 

print(removeduplicates([1,1,2,2]))
def removeduplicates(a):
  seen = set()

  for i in a:
    if i not in seen:
      seen.add(i)
  return seen 

print(removeduplicates([1,1,2,2]))

回答 28

使用toolz时

from toolz import frequencies, valfilter

a = [1,2,2,3,4,5,4]
>>> list(valfilter(lambda count: count > 1, frequencies(a)).keys())
[2,4] 

When using toolz:

from toolz import frequencies, valfilter

a = [1,2,2,3,4,5,4]
>>> list(valfilter(lambda count: count > 1, frequencies(a)).keys())
[2,4] 

回答 29

这是我必须这样做的方法,因为我向自己提出挑战,不要使用其他方法:

def dupList(oldlist):
    if type(oldlist)==type((2,2)):
        oldlist=[x for x in oldlist]
    newList=[]
    newList=newList+oldlist
    oldlist=oldlist
    forbidden=[]
    checkPoint=0
    for i in range(len(oldlist)):
        #print 'start i', i
        if i in forbidden:
            continue
        else:
            for j in range(len(oldlist)):
                #print 'start j', j
                if j in forbidden:
                    continue
                else:
                    #print 'after Else'
                    if i!=j: 
                        #print 'i,j', i,j
                        #print oldlist
                        #print newList
                        if oldlist[j]==oldlist[i]:
                            #print 'oldlist[i],oldlist[j]', oldlist[i],oldlist[j]
                            forbidden.append(j)
                            #print 'forbidden', forbidden
                            del newList[j-checkPoint]
                            #print newList
                            checkPoint=checkPoint+1
    return newList

因此您的示例工作方式为:

>>>a = [1,2,3,3,3,4,5,6,6,7]
>>>dupList(a)
[1, 2, 3, 4, 5, 6, 7]

this is the way I had to do it because I challenged myself not to use other methods:

def dupList(oldlist):
    if type(oldlist)==type((2,2)):
        oldlist=[x for x in oldlist]
    newList=[]
    newList=newList+oldlist
    oldlist=oldlist
    forbidden=[]
    checkPoint=0
    for i in range(len(oldlist)):
        #print 'start i', i
        if i in forbidden:
            continue
        else:
            for j in range(len(oldlist)):
                #print 'start j', j
                if j in forbidden:
                    continue
                else:
                    #print 'after Else'
                    if i!=j: 
                        #print 'i,j', i,j
                        #print oldlist
                        #print newList
                        if oldlist[j]==oldlist[i]:
                            #print 'oldlist[i],oldlist[j]', oldlist[i],oldlist[j]
                            forbidden.append(j)
                            #print 'forbidden', forbidden
                            del newList[j-checkPoint]
                            #print newList
                            checkPoint=checkPoint+1
    return newList

so your sample works as:

>>>a = [1,2,3,3,3,4,5,6,6,7]
>>>dupList(a)
[1, 2, 3, 4, 5, 6, 7]