标签归档:flatten

如何仅展平numpy数组的某些尺寸

问题:如何仅展平numpy数组的某些尺寸

是否有一种快速的方法来“子展平”或仅展平numpy数组中的某些第一维?

例如,给定一维的numpy数组(50,100,25),结果尺寸为(5000,25)

Is there a quick way to “sub-flatten” or flatten only some of the first dimensions in a numpy array?

For example, given a numpy array of dimensions (50,100,25), the resultant dimensions would be (5000,25)


回答 0

看看numpy.reshape

>>> arr = numpy.zeros((50,100,25))
>>> arr.shape
# (50, 100, 25)

>>> new_arr = arr.reshape(5000,25)
>>> new_arr.shape   
# (5000, 25)

# One shape dimension can be -1. 
# In this case, the value is inferred from 
# the length of the array and remaining dimensions.
>>> another_arr = arr.reshape(-1, arr.shape[-1])
>>> another_arr.shape
# (5000, 25)

Take a look at numpy.reshape .

>>> arr = numpy.zeros((50,100,25))
>>> arr.shape
# (50, 100, 25)

>>> new_arr = arr.reshape(5000,25)
>>> new_arr.shape   
# (5000, 25)

# One shape dimension can be -1. 
# In this case, the value is inferred from 
# the length of the array and remaining dimensions.
>>> another_arr = arr.reshape(-1, arr.shape[-1])
>>> another_arr.shape
# (5000, 25)

回答 1

亚历山大答案的一个概括-np.reshape可以采用-1作为参数,表示“总数组大小除以所有其他列出的维数的乘积”:

例如,将除最后一个尺寸以外的所有尺寸展平:

>>> arr = numpy.zeros((50,100,25))
>>> new_arr = arr.reshape(-1, arr.shape[-1])
>>> new_arr.shape
# (5000, 25)

A slight generalization to Alexander’s answer – np.reshape can take -1 as an argument, meaning “total array size divided by product of all other listed dimensions”:

e.g. to flatten all but the last dimension:

>>> arr = numpy.zeros((50,100,25))
>>> new_arr = arr.reshape(-1, arr.shape[-1])
>>> new_arr.shape
# (5000, 25)

回答 2

彼得的答案略有概括-如果您想超越三维阵列,则可以在原始阵列的形状上指定一个范围。

例如,将除最后两个维度之外的所有维度展平:

arr = numpy.zeros((3, 4, 5, 6))
new_arr = arr.reshape(-1, *arr.shape[-2:])
new_arr.shape
# (12, 5, 6)

编辑:对我之前的回答有一个概括-当然,您也可以在重塑的开始处指定一个范围:

arr = numpy.zeros((3, 4, 5, 6, 7, 8))
new_arr = arr.reshape(*arr.shape[:2], -1, *arr.shape[-2:])
new_arr.shape
# (3, 4, 30, 7, 8)

A slight generalization to Peter’s answer — you can specify a range over the original array’s shape if you want to go beyond three dimensional arrays.

e.g. to flatten all but the last two dimensions:

arr = numpy.zeros((3, 4, 5, 6))
new_arr = arr.reshape(-1, *arr.shape[-2:])
new_arr.shape
# (12, 5, 6)

EDIT: A slight generalization to my earlier answer — you can, of course, also specify a range at the beginning of the of the reshape too:

arr = numpy.zeros((3, 4, 5, 6, 7, 8))
new_arr = arr.reshape(*arr.shape[:2], -1, *arr.shape[-2:])
new_arr.shape
# (3, 4, 30, 7, 8)

回答 3

一种替代方法是按numpy.resize()如下方式使用:

In [37]: shp = (50,100,25)
In [38]: arr = np.random.random_sample(shp)
In [45]: resized_arr = np.resize(arr, (np.prod(shp[:2]), shp[-1]))
In [46]: resized_arr.shape
Out[46]: (5000, 25)

# sanity check with other solutions
In [47]: resized = np.reshape(arr, (-1, shp[-1]))
In [48]: np.allclose(resized_arr, resized)
Out[48]: True

An alternative approach is to use numpy.resize() as in:

In [37]: shp = (50,100,25)
In [38]: arr = np.random.random_sample(shp)
In [45]: resized_arr = np.resize(arr, (np.prod(shp[:2]), shp[-1]))
In [46]: resized_arr.shape
Out[46]: (5000, 25)

# sanity check with other solutions
In [47]: resized = np.reshape(arr, (-1, shp[-1]))
In [48]: np.allclose(resized_arr, resized)
Out[48]: True

将Pandas Multi-Index转换为专栏

问题:将Pandas Multi-Index转换为专栏

我有一个具有2个索引级别的数据框:

                         value
Trial    measurement
    1              0        13
                   1         3
                   2         4
    2              0       NaN
                   1        12
    3              0        34 

我想变成这样:

Trial    measurement       value

    1              0        13
    1              1         3
    1              2         4
    2              0       NaN
    2              1        12
    3              0        34 

我怎样才能最好地做到这一点?

我需要这样做是因为我想按照此处的指示汇总数据,但是如果将它们用作索引,则无法选择这样的列。

I have a dataframe with 2 index levels:

                         value
Trial    measurement
    1              0        13
                   1         3
                   2         4
    2              0       NaN
                   1        12
    3              0        34 

Which I want to turn into this:

Trial    measurement       value

    1              0        13
    1              1         3
    1              2         4
    2              0       NaN
    2              1        12
    3              0        34 

How can I best do this?

I need this because I want to aggregate the data as instructed here, but I can’t select my columns like that if they are in use as indices.


回答 0

所述reset_index()是一个数据帧熊猫方法,将索引值转移到数据帧为列。参数的默认设置为drop = False(将索引值保留为列)。

您只需.reset_index(inplace=True)在DataFrame名称后添加:

df.reset_index(inplace=True)  

The reset_index() is a pandas DataFrame method that will transfer index values into the DataFrame as columns. The default setting for the parameter is drop=False (which will keep the index values as columns).

All you have to do add .reset_index(inplace=True) after the name of the DataFrame:

df.reset_index(inplace=True)  

回答 1

这并不是真的适用于您的情况,但可能有助于其他人(例如5分钟前的我自己)知道。如果一个人的多重数具有如下相同的名称:

                         value
Trial        Trial
    1              0        13
                   1         3
                   2         4
    2              0       NaN
                   1        12
    3              0        34 

df.reset_index(inplace=True) 将会失败,因为创建的列不能具有相同的名称。

因此,您需要将multindex重命名为df.index = df.index.set_names(['Trial', 'measurement'])

                           value
Trial    measurement       

    1              0        13
    1              1         3
    1              2         4
    2              0       NaN
    2              1        12
    3              0        34 

然后df.reset_index(inplace=True)将像魅力一样工作。

在按年和月对名为datetime的列(不是索引)进行分组之后,我遇到了这个问题live_date,这意味着年和月都被命名了live_date

This doesn’t really apply to your case but could be helpful for others (like myself 5 minutes ago) to know. If one’s multindex have the same name like this:

                         value
Trial        Trial
    1              0        13
                   1         3
                   2         4
    2              0       NaN
                   1        12
    3              0        34 

df.reset_index(inplace=True) will fail, cause the columns that are created cannot have the same names.

So then you need to rename the multindex with df.index = df.index.set_names(['Trial', 'measurement']) to get:

                           value
Trial    measurement       

    1              0        13
    1              1         3
    1              2         4
    2              0       NaN
    2              1        12
    3              0        34 

And then df.reset_index(inplace=True) will work like a charm.

I encountered this problem after grouping by year and month on a datetime-column(not index) called live_date, which meant that both year and month were named live_date.


回答 2

正如@ cs95在评论中提到的,要仅降低一个级别,请使用:

df.reset_index(level=[...])

这样可以避免在重置后必须重新定义所需的索引。

As @cs95 mentioned in a comment, to drop only one level, use:

df.reset_index(level=[...])

This avoids having to redefine your desired index after reset.


numpy中的flatten和ravel函数有什么区别?

问题:numpy中的flatten和ravel函数有什么区别?

import numpy as np
y = np.array(((1,2,3),(4,5,6),(7,8,9)))
OUTPUT:
print(y.flatten())
[1   2   3   4   5   6   7   8   9]
print(y.ravel())
[1   2   3   4   5   6   7   8   9]

这两个函数返回相同的列表。那么需要两个不同的功能来执行相同的工作。

import numpy as np
y = np.array(((1,2,3),(4,5,6),(7,8,9)))
OUTPUT:
print(y.flatten())
[1   2   3   4   5   6   7   8   9]
print(y.ravel())
[1   2   3   4   5   6   7   8   9]

Both function return the same list. Then what is the need of two different functions performing same job.


回答 0

当前的API是:

  • flatten 总是返回一个副本。
  • ravel尽可能返回原始数组的视图。这在打印输出中不可见,但是如果您修改ravel返回的数组,则可能会修改原始数组中的条目。如果您修改从flatten返回的数组中的条目,则将永远不会发生。ravel通常会更快,因为没有内存被复制,但是您在修改返回的数组时要格外小心。
  • reshape((-1,)) 只要数组的步幅允许,就可以得到一个视图,即使这意味着您并不总是可以获得连续的数组。

The current API is that:

  • flatten always returns a copy.
  • ravel returns a view of the original array whenever possible. This isn’t visible in the printed output, but if you modify the array returned by ravel, it may modify the entries in the original array. If you modify the entries in an array returned from flatten this will never happen. ravel will often be faster since no memory is copied, but you have to be more careful about modifying the array it returns.
  • reshape((-1,)) gets a view whenever the strides of the array allow it even if that means you don’t always get a contiguous array.

回答 1

如此所述,关键区别在于:

  • flatten 是ndarray对象的方法,因此只能用于真正的numpy数组。

  • ravel 是库级别的函数,因此可以在任何可以成功解析的对象上调用。

例如,ravel将对ndarray列表起作用,flatten而不适用于该类型的对象。

@IanH还在回答中指出了与内存处理的重要区别。

As explained here a key difference is that:

  • flatten is a method of an ndarray object and hence can only be called for true numpy arrays.

  • ravel is a library-level function and hence can be called on any object that can successfully be parsed.

For example ravel will work on a list of ndarrays, while flatten is not available for that type of object.

@IanH also points out important differences with memory handling in his answer.


回答 2

这是函数的正确命名空间:

这两个函数均返回指向新存储器结构的展平一维数组。

import numpy
a = numpy.array([[1,2],[3,4]])

r = numpy.ravel(a)
f = numpy.ndarray.flatten(a)  

print(id(a))
print(id(r))
print(id(f))

print(r)
print(f)

print("\nbase r:", r.base)
print("\nbase f:", f.base)

---returns---
140541099429760
140541099471056
140541099473216

[1 2 3 4]
[1 2 3 4]

base r: [[1 2]
 [3 4]]

base f: None

在上例中:

  • 结果的存储位置不同,
  • 结果看起来一样
  • 展平将返回副本
  • ravel将返回一个视图。

我们如何检查某物是否是副本?使用的.base属性ndarray。如果是视图,则基础将是原始数组;如果是副本,则基数为None

Here is the correct namespace for the functions:

Both functions return flattened 1D arrays pointing to the new memory structures.

import numpy
a = numpy.array([[1,2],[3,4]])

r = numpy.ravel(a)
f = numpy.ndarray.flatten(a)  

print(id(a))
print(id(r))
print(id(f))

print(r)
print(f)

print("\nbase r:", r.base)
print("\nbase f:", f.base)

---returns---
140541099429760
140541099471056
140541099473216

[1 2 3 4]
[1 2 3 4]

base r: [[1 2]
 [3 4]]

base f: None

In the upper example:

  • the memory locations of the results are different,
  • the results look the same
  • flatten would return a copy
  • ravel would return a view.

How we check if something is a copy? Using the .base attribute of the ndarray. If it’s a view, the base will be the original array; if it is a copy, the base will be None.


拼合不规则的列表

问题:拼合不规则的列表

是的,我知道以前已经讨论过这个主题(这里这里这里这里),但是据我所知,除一个解决方案外,所有解决方案在这样的列表上都失败了:

L = [[[1, 2, 3], [4, 5]], 6]

所需的输出是

[1, 2, 3, 4, 5, 6]

甚至更好的迭代器。这个问题是我看到的唯一适用于任意嵌套的解决方案:

def flatten(x):
    result = []
    for el in x:
        if hasattr(el, "__iter__") and not isinstance(el, basestring):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

flatten(L)

这是最好的模型吗?我有事吗 任何问题?

Yes, I know this subject has been covered before (here, here, here, here), but as far as I know, all solutions, except for one, fail on a list like this:

L = [[[1, 2, 3], [4, 5]], 6]

Where the desired output is

[1, 2, 3, 4, 5, 6]

Or perhaps even better, an iterator. The only solution I saw that works for an arbitrary nesting is found in this question:

def flatten(x):
    result = []
    for el in x:
        if hasattr(el, "__iter__") and not isinstance(el, basestring):
            result.extend(flatten(el))
        else:
            result.append(el)
    return result

flatten(L)

Is this the best model? Did I overlook something? Any problems?


回答 0

使用生成器函数可以使您的示例更易于阅读,并可能提高性能。

Python 2

def flatten(l):
    for el in l:
        if isinstance(el, collections.Iterable) and not isinstance(el, basestring):
            for sub in flatten(el):
                yield sub
        else:
            yield el

我使用了2.6中添加的Iterable ABC

Python 3

在Python 3中,basestring是没有更多的,但你可以使用一个元组str,并bytes得到同样的效果存在。

yield from运营商从一时间产生一个返回的项目。这句法委派到子发生器在3.3加入

def flatten(l):
    for el in l:
        if isinstance(el, collections.Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el

Using generator functions can make your example a little easier to read and probably boost the performance.

Python 2

def flatten(l):
    for el in l:
        if isinstance(el, collections.Iterable) and not isinstance(el, basestring):
            for sub in flatten(el):
                yield sub
        else:
            yield el

I used the Iterable ABC added in 2.6.

Python 3

In Python 3, the basestring is no more, but you can use a tuple of str and bytes to get the same effect there.

The yield from operator returns an item from a generator one at a time. This syntax for delegating to a subgenerator was added in 3.3

def flatten(l):
    for el in l:
        if isinstance(el, collections.Iterable) and not isinstance(el, (str, bytes)):
            yield from flatten(el)
        else:
            yield el

回答 1

我的解决方案:

import collections


def flatten(x):
    if isinstance(x, collections.Iterable):
        return [a for i in x for a in flatten(i)]
    else:
        return [x]

更加简洁,但几乎相同。

My solution:

import collections


def flatten(x):
    if isinstance(x, collections.Iterable):
        return [a for i in x for a in flatten(i)]
    else:
        return [x]

A little more concise, but pretty much the same.


回答 2

使用递归和鸭子类型生成器(针对Python 3更新):

def flatten(L):
    for item in L:
        try:
            yield from flatten(item)
        except TypeError:
            yield item

list(flatten([[[1, 2, 3], [4, 5]], 6]))
>>>[1, 2, 3, 4, 5, 6]

Generator using recursion and duck typing (updated for Python 3):

def flatten(L):
    for item in L:
        try:
            yield from flatten(item)
        except TypeError:
            yield item

list(flatten([[[1, 2, 3], [4, 5]], 6]))
>>>[1, 2, 3, 4, 5, 6]

回答 3

@unutbu的非递归解决方案的生成器版本,由@Andrew在注释中要求:

def genflat(l, ltypes=collections.Sequence):
    l = list(l)
    i = 0
    while i < len(l):
        while isinstance(l[i], ltypes):
            if not l[i]:
                l.pop(i)
                i -= 1
                break
            else:
                l[i:i + 1] = l[i]
        yield l[i]
        i += 1

此生成器的简化版本:

def genflat(l, ltypes=collections.Sequence):
    l = list(l)
    while l:
        while l and isinstance(l[0], ltypes):
            l[0:1] = l[0]
        if l: yield l.pop(0)

Generator version of @unutbu’s non-recursive solution, as requested by @Andrew in a comment:

def genflat(l, ltypes=collections.Sequence):
    l = list(l)
    i = 0
    while i < len(l):
        while isinstance(l[i], ltypes):
            if not l[i]:
                l.pop(i)
                i -= 1
                break
            else:
                l[i:i + 1] = l[i]
        yield l[i]
        i += 1

Slightly simplified version of this generator:

def genflat(l, ltypes=collections.Sequence):
    l = list(l)
    while l:
        while l and isinstance(l[0], ltypes):
            l[0:1] = l[0]
        if l: yield l.pop(0)

回答 4

这是我的功能性版本的递归展平,它既处理元组又处理列表,并允许您引入位置参数的任何组合。返回一个生成器,该生成器按arg由arg的顺序生成整个序列:

flatten = lambda *n: (e for a in n
    for e in (flatten(*a) if isinstance(a, (tuple, list)) else (a,)))

用法:

l1 = ['a', ['b', ('c', 'd')]]
l2 = [0, 1, (2, 3), [[4, 5, (6, 7, (8,), [9]), 10]], (11,)]
print list(flatten(l1, -2, -1, l2))
['a', 'b', 'c', 'd', -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

Here is my functional version of recursive flatten which handles both tuples and lists, and lets you throw in any mix of positional arguments. Returns a generator which produces the entire sequence in order, arg by arg:

flatten = lambda *n: (e for a in n
    for e in (flatten(*a) if isinstance(a, (tuple, list)) else (a,)))

Usage:

l1 = ['a', ['b', ('c', 'd')]]
l2 = [0, 1, (2, 3), [[4, 5, (6, 7, (8,), [9]), 10]], (11,)]
print list(flatten(l1, -2, -1, l2))
['a', 'b', 'c', 'd', -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

回答 5

此版本的版本flatten避免了python的递归限制(因此可用于任意深度的嵌套可迭代对象)。它是一个生成器,可以处理字符串和任意可迭代(甚至是无限的)。

import itertools as IT
import collections

def flatten(iterable, ltypes=collections.Iterable):
    remainder = iter(iterable)
    while True:
        first = next(remainder)
        if isinstance(first, ltypes) and not isinstance(first, (str, bytes)):
            remainder = IT.chain(first, remainder)
        else:
            yield first

以下是一些示例说明其用法:

print(list(IT.islice(flatten(IT.repeat(1)),10)))
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

print(list(IT.islice(flatten(IT.chain(IT.repeat(2,3),
                                       {10,20,30},
                                       'foo bar'.split(),
                                       IT.repeat(1),)),10)))
# [2, 2, 2, 10, 20, 30, 'foo', 'bar', 1, 1]

print(list(flatten([[1,2,[3,4]]])))
# [1, 2, 3, 4]

seq = ([[chr(i),chr(i-32)] for i in range(ord('a'), ord('z')+1)] + list(range(0,9)))
print(list(flatten(seq)))
# ['a', 'A', 'b', 'B', 'c', 'C', 'd', 'D', 'e', 'E', 'f', 'F', 'g', 'G', 'h', 'H',
# 'i', 'I', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'N', 'o', 'O', 'p', 'P',
# 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'v', 'V', 'w', 'W', 'x', 'X',
# 'y', 'Y', 'z', 'Z', 0, 1, 2, 3, 4, 5, 6, 7, 8]

尽管flatten可以处理无限生成器,但不能处理无限嵌套:

def infinitely_nested():
    while True:
        yield IT.chain(infinitely_nested(), IT.repeat(1))

print(list(IT.islice(flatten(infinitely_nested()), 10)))
# hangs

This version of flatten avoids python’s recursion limit (and thus works with arbitrarily deep, nested iterables). It is a generator which can handle strings and arbitrary iterables (even infinite ones).

import itertools as IT
import collections

def flatten(iterable, ltypes=collections.Iterable):
    remainder = iter(iterable)
    while True:
        first = next(remainder)
        if isinstance(first, ltypes) and not isinstance(first, (str, bytes)):
            remainder = IT.chain(first, remainder)
        else:
            yield first

Here are some examples demonstrating its use:

print(list(IT.islice(flatten(IT.repeat(1)),10)))
# [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]

print(list(IT.islice(flatten(IT.chain(IT.repeat(2,3),
                                       {10,20,30},
                                       'foo bar'.split(),
                                       IT.repeat(1),)),10)))
# [2, 2, 2, 10, 20, 30, 'foo', 'bar', 1, 1]

print(list(flatten([[1,2,[3,4]]])))
# [1, 2, 3, 4]

seq = ([[chr(i),chr(i-32)] for i in range(ord('a'), ord('z')+1)] + list(range(0,9)))
print(list(flatten(seq)))
# ['a', 'A', 'b', 'B', 'c', 'C', 'd', 'D', 'e', 'E', 'f', 'F', 'g', 'G', 'h', 'H',
# 'i', 'I', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'N', 'o', 'O', 'p', 'P',
# 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'v', 'V', 'w', 'W', 'x', 'X',
# 'y', 'Y', 'z', 'Z', 0, 1, 2, 3, 4, 5, 6, 7, 8]

Although flatten can handle infinite generators, it can not handle infinite nesting:

def infinitely_nested():
    while True:
        yield IT.chain(infinitely_nested(), IT.repeat(1))

print(list(IT.islice(flatten(infinitely_nested()), 10)))
# hangs

回答 6

这是另一个更有趣的答案…

import re

def Flatten(TheList):
    a = str(TheList)
    b,crap = re.subn(r'[\[,\]]', ' ', a)
    c = b.split()
    d = [int(x) for x in c]

    return(d)

基本上,它将嵌套列表转换为字符串,使用正则表达式去除嵌套语法,然后将结果转换回(扁平化的)列表。

Here’s another answer that is even more interesting…

import re

def Flatten(TheList):
    a = str(TheList)
    b,crap = re.subn(r'[\[,\]]', ' ', a)
    c = b.split()
    d = [int(x) for x in c]

    return(d)

Basically, it converts the nested list to a string, uses a regex to strip out the nested syntax, and then converts the result back to a (flattened) list.


回答 7

def flatten(xs):
    res = []
    def loop(ys):
        for i in ys:
            if isinstance(i, list):
                loop(i)
            else:
                res.append(i)
    loop(xs)
    return res
def flatten(xs):
    res = []
    def loop(ys):
        for i in ys:
            if isinstance(i, list):
                loop(i)
            else:
                res.append(i)
    loop(xs)
    return res

回答 8

您可以deepflatten在第三方套餐中使用iteration_utilities

>>> from iteration_utilities import deepflatten
>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> list(deepflatten(L))
[1, 2, 3, 4, 5, 6]

>>> list(deepflatten(L, types=list))  # only flatten "inner" lists
[1, 2, 3, 4, 5, 6]

这是一个迭代器,因此您需要对其进行迭代(例如,通过将其包装list或在循环中使用)。在内部,它使用迭代方法而不是递归方法,并且将其编写为C扩展,因此它可以比纯python方法更快:

>>> %timeit list(deepflatten(L))
12.6 µs ± 298 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit list(deepflatten(L, types=list))
8.7 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> %timeit list(flatten(L))   # Cristian - Python 3.x approach from https://stackoverflow.com/a/2158532/5393381
86.4 µs ± 4.42 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> %timeit list(flatten(L))   # Josh Lee - https://stackoverflow.com/a/2158522/5393381
107 µs ± 2.99 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> %timeit list(genflat(L, list))  # Alex Martelli - https://stackoverflow.com/a/2159079/5393381
23.1 µs ± 710 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

我是iteration_utilities图书馆的作者。

You could use deepflatten from the 3rd party package iteration_utilities:

>>> from iteration_utilities import deepflatten
>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> list(deepflatten(L))
[1, 2, 3, 4, 5, 6]

>>> list(deepflatten(L, types=list))  # only flatten "inner" lists
[1, 2, 3, 4, 5, 6]

It’s an iterator so you need to iterate it (for example by wrapping it with list or using it in a loop). Internally it uses an iterative approach instead of an recursive approach and it’s written as C extension so it can be faster than pure python approaches:

>>> %timeit list(deepflatten(L))
12.6 µs ± 298 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
>>> %timeit list(deepflatten(L, types=list))
8.7 µs ± 139 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

>>> %timeit list(flatten(L))   # Cristian - Python 3.x approach from https://stackoverflow.com/a/2158532/5393381
86.4 µs ± 4.42 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> %timeit list(flatten(L))   # Josh Lee - https://stackoverflow.com/a/2158522/5393381
107 µs ± 2.99 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

>>> %timeit list(genflat(L, list))  # Alex Martelli - https://stackoverflow.com/a/2159079/5393381
23.1 µs ± 710 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

I’m the author of the iteration_utilities library.


回答 9

尝试创建一个可以平化Python中不规则列表的函数很有趣,但是当然这就是Python的目的(使编程变得有趣)。以下生成器在某些警告方面工作得很好:

def flatten(iterable):
    try:
        for item in iterable:
            yield from flatten(item)
    except TypeError:
        yield iterable

这将压扁的数据类型,你可能想独自离开(比如bytearraybytesstr对象)。此外,代码还依赖于以下事实:从不可迭代的对象请求迭代器会引发TypeError

>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> def flatten(iterable):
    try:
        for item in iterable:
            yield from flatten(item)
    except TypeError:
        yield iterable


>>> list(flatten(L))
[1, 2, 3, 4, 5, 6]
>>>

编辑:

我不同意以前的实现。问题在于您不应该将无法迭代的东西弄平。这令人困惑,并给人以错误的印象。

>>> list(flatten(123))
[123]
>>>

下面的生成器与第一个生成器几乎相同,但是不存在试图展平不可迭代对象的问题。当给它一个不适当的参数时,它会像人们期望的那样失败。

def flatten(iterable):
    for item in iterable:
        try:
            yield from flatten(item)
        except TypeError:
            yield item

使用提供的列表对生成器进行测试可以正常工作。但是,TypeError当给它一个不可迭代的对象时,新代码将引发一个。下面显示了新行为的示例。

>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> list(flatten(L))
[1, 2, 3, 4, 5, 6]
>>> list(flatten(123))
Traceback (most recent call last):
  File "<pyshell#32>", line 1, in <module>
    list(flatten(123))
  File "<pyshell#27>", line 2, in flatten
    for item in iterable:
TypeError: 'int' object is not iterable
>>>

It was fun trying to create a function that could flatten irregular list in Python, but of course that is what Python is for (to make programming fun). The following generator works fairly well with some caveats:

def flatten(iterable):
    try:
        for item in iterable:
            yield from flatten(item)
    except TypeError:
        yield iterable

It will flatten datatypes that you might want left alone (like bytearray, bytes, and str objects). Also, the code relies on the fact that requesting an iterator from a non-iterable raises a TypeError.

>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> def flatten(iterable):
    try:
        for item in iterable:
            yield from flatten(item)
    except TypeError:
        yield iterable


>>> list(flatten(L))
[1, 2, 3, 4, 5, 6]
>>>

Edit:

I disagree with the previous implementation. The problem is that you should not be able to flatten something that is not an iterable. It is confusing and gives the wrong impression of the argument.

>>> list(flatten(123))
[123]
>>>

The following generator is almost the same as the first but does not have the problem of trying to flatten a non-iterable object. It fails as one would expect when an inappropriate argument is given to it.

def flatten(iterable):
    for item in iterable:
        try:
            yield from flatten(item)
        except TypeError:
            yield item

Testing the generator works fine with the list that was provided. However, the new code will raise a TypeError when a non-iterable object is given to it. Example are shown below of the new behavior.

>>> L = [[[1, 2, 3], [4, 5]], 6]
>>> list(flatten(L))
[1, 2, 3, 4, 5, 6]
>>> list(flatten(123))
Traceback (most recent call last):
  File "<pyshell#32>", line 1, in <module>
    list(flatten(123))
  File "<pyshell#27>", line 2, in flatten
    for item in iterable:
TypeError: 'int' object is not iterable
>>>

回答 10

尽管选择了一个优雅且非常Python化的答案,但我仅出于审查目的而提出我的解决方案:

def flat(l):
    ret = []
    for i in l:
        if isinstance(i, list) or isinstance(i, tuple):
            ret.extend(flat(i))
        else:
            ret.append(i)
    return ret

请告诉我们这段代码的好坏?

Although an elegant and very pythonic answer has been selected I would present my solution just for the review:

def flat(l):
    ret = []
    for i in l:
        if isinstance(i, list) or isinstance(i, tuple):
            ret.extend(flat(i))
        else:
            ret.append(i)
    return ret

Please tell how good or bad this code is?


回答 11

我喜欢简单的答案。没有生成器。没有递归或递归限制。只是迭代:

def flatten(TheList):
    listIsNested = True

    while listIsNested:                 #outer loop
        keepChecking = False
        Temp = []

        for element in TheList:         #inner loop
            if isinstance(element,list):
                Temp.extend(element)
                keepChecking = True
            else:
                Temp.append(element)

        listIsNested = keepChecking     #determine if outer loop exits
        TheList = Temp[:]

    return TheList

这适用于两个列表:内部for循环和外部while循环。

内部的for循环遍历列表。如果找到列表元素,则(1)使用list.extend()展平该部分嵌套的层次,并且(2)将keepChecking切换为True。keepchecking用于控制外部while循环。如果将外部循环设置为true,则会触发内部循环进行另一遍处理。

这些通行证一直发生,直到找不到更多的嵌套列表。当最后一次通过但找不到任何地方的传递时,keepChecking永远不会变为true,这意味着listIsNested保持为false,而外部while循环退出。

然后返回扁平化列表。

测试运行

flatten([1,2,3,4,[100,200,300,[1000,2000,3000]]])

[1, 2, 3, 4, 100, 200, 300, 1000, 2000, 3000]

I prefer simple answers. No generators. No recursion or recursion limits. Just iteration:

def flatten(TheList):
    listIsNested = True

    while listIsNested:                 #outer loop
        keepChecking = False
        Temp = []

        for element in TheList:         #inner loop
            if isinstance(element,list):
                Temp.extend(element)
                keepChecking = True
            else:
                Temp.append(element)

        listIsNested = keepChecking     #determine if outer loop exits
        TheList = Temp[:]

    return TheList

This works with two lists: an inner for loop and an outer while loop.

The inner for loop iterates through the list. If it finds a list element, it (1) uses list.extend() to flatten that part one level of nesting and (2) switches keepChecking to True. keepchecking is used to control the outer while loop. If the outer loop gets set to true, it triggers the inner loop for another pass.

Those passes keep happening until no more nested lists are found. When a pass finally occurs where none are found, keepChecking never gets tripped to true, which means listIsNested stays false and the outer while loop exits.

The flattened list is then returned.

Test-run

flatten([1,2,3,4,[100,200,300,[1000,2000,3000]]])

[1, 2, 3, 4, 100, 200, 300, 1000, 2000, 3000]


回答 12

这是一个简单的函数,可以平铺任意深度的列表。没有递归,以避免堆栈溢出。

from copy import deepcopy

def flatten_list(nested_list):
    """Flatten an arbitrarily nested list, without recursion (to avoid
    stack overflows). Returns a new list, the original list is unchanged.

    >> list(flatten_list([1, 2, 3, [4], [], [[[[[[[[[5]]]]]]]]]]))
    [1, 2, 3, 4, 5]
    >> list(flatten_list([[1, 2], 3]))
    [1, 2, 3]

    """
    nested_list = deepcopy(nested_list)

    while nested_list:
        sublist = nested_list.pop(0)

        if isinstance(sublist, list):
            nested_list = sublist + nested_list
        else:
            yield sublist

Here’s a simple function that flattens lists of arbitrary depth. No recursion, to avoid stack overflow.

from copy import deepcopy

def flatten_list(nested_list):
    """Flatten an arbitrarily nested list, without recursion (to avoid
    stack overflows). Returns a new list, the original list is unchanged.

    >> list(flatten_list([1, 2, 3, [4], [], [[[[[[[[[5]]]]]]]]]]))
    [1, 2, 3, 4, 5]
    >> list(flatten_list([[1, 2], 3]))
    [1, 2, 3]

    """
    nested_list = deepcopy(nested_list)

    while nested_list:
        sublist = nested_list.pop(0)

        if isinstance(sublist, list):
            nested_list = sublist + nested_list
        else:
            yield sublist

回答 13

我很惊讶没有人想到这一点。该死的递归我没有这里的高级人员做出的递归答案。无论如何,这是我的尝试。请注意,这是非常特定于OP的用例的

import re

L = [[[1, 2, 3], [4, 5]], 6]
flattened_list = re.sub("[\[\]]", "", str(L)).replace(" ", "").split(",")
new_list = list(map(int, flattened_list))
print(new_list)

输出:

[1, 2, 3, 4, 5, 6]

I’m surprised no one has thought of this. Damn recursion I don’t get the recursive answers that the advanced people here made. anyway here is my attempt on this. caveat is it’s very specific to the OP’s use case

import re

L = [[[1, 2, 3], [4, 5]], 6]
flattened_list = re.sub("[\[\]]", "", str(L)).replace(" ", "").split(",")
new_list = list(map(int, flattened_list))
print(new_list)

output:

[1, 2, 3, 4, 5, 6]

回答 14

我没有在这里浏览所有已经可用的答案,但这是我想到的一个衬里,它借鉴了Lisp的第一张清单和其余清单的处理方式

def flatten(l): return flatten(l[0]) + (flatten(l[1:]) if len(l) > 1 else []) if type(l) is list else [l]

这是一种简单而又不太简单的情况-

>>> flatten([1,[2,3],4])
[1, 2, 3, 4]

>>> flatten([1, [2, 3], 4, [5, [6, {'name': 'some_name', 'age':30}, 7]], [8, 9, [10, [11, [12, [13, {'some', 'set'}, 14, [15, 'some_string'], 16], 17, 18], 19], 20], 21, 22, [23, 24], 25], 26, 27, 28, 29, 30])
[1, 2, 3, 4, 5, 6, {'age': 30, 'name': 'some_name'}, 7, 8, 9, 10, 11, 12, 13, set(['set', 'some']), 14, 15, 'some_string', 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
>>> 

I didn’t go through all the already available answers here, but here is a one liner I came up with, borrowing from lisp’s way of first and rest list processing

def flatten(l): return flatten(l[0]) + (flatten(l[1:]) if len(l) > 1 else []) if type(l) is list else [l]

here is one simple and one not-so-simple case –

>>> flatten([1,[2,3],4])
[1, 2, 3, 4]

>>> flatten([1, [2, 3], 4, [5, [6, {'name': 'some_name', 'age':30}, 7]], [8, 9, [10, [11, [12, [13, {'some', 'set'}, 14, [15, 'some_string'], 16], 17, 18], 19], 20], 21, 22, [23, 24], 25], 26, 27, 28, 29, 30])
[1, 2, 3, 4, 5, 6, {'age': 30, 'name': 'some_name'}, 7, 8, 9, 10, 11, 12, 13, set(['set', 'some']), 14, 15, 'some_string', 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30]
>>> 

回答 15

当试图回答这样的问题时,您确实需要给出您提议作为解决方案的代码的限制。如果只考虑性能,我不会太在意,但是提议作为解决方案的大多数代码(包括可接受的答案)都无法使深度大于1000的列表变平。

当我说大多数代码我指的是所有使用任何形式的递归的代码(或调用递归的标准库函数)。所有这些代码都会失败,因为对于每个递归调用,(调用)堆栈都增加一个单位,而(默认)python调用堆栈的大小为1000。

如果您不太熟悉调用堆栈,那么以下内容可能会有所帮助(否则,您可以滚动到Implementation)。

调用堆栈大小和递归编程(类似于地下城)

寻找宝藏并退出

想象一下,您进入一个带编号房间的巨大地牢,寻找宝藏。您不知道这个地方,但是对于如何找到宝藏有一些指示。每个指示都是一个谜(难度各不相同,但是您无法预测它们的难易程度)。您决定对节省时间的策略进行一点思考,然后进行两个观察:

  1. 很难(很长)找到宝藏,因为您必须解决(可能很难)谜团才能到达那里。
  2. 找到宝藏后,返回入口可能很容易,您只需要在另一个方向上使用相同的路径即可(尽管这需要一点记忆才能调用您的路径)。

进入地牢时,您会在这里注意到一个小笔记本。您决定使用它来写下谜题(当进入新房间时)之后退出的每个房间,这样您就可以返回到入口。那是个天才的主意,您甚至都不会花一分钱实施自己的策略。

您进入了地牢,成功地解决了前1001个难题,但是这是您未曾计划的事情,您借用的笔记本中没有剩余空间。您决定放弃自己的任务,因为您更喜欢没有宝物,而不是永远迷失在地牢中(确实看起来很聪明)。

执行递归程序

基本上,这与寻找宝藏完全相同。地牢是计算机的内存,您现在的目标不是找到宝藏,而是要计算某些函数(对于给定x找到f(x))。这些指示只是子例程,可以帮助您解决f(x)。您的策略与调用堆栈策略相同,笔记本是堆栈,房间是函数的返回地址:

x = ["over here", "am", "I"]
y = sorted(x) # You're about to enter a room named `sorted`, note down the current room address here so you can return back: 0x4004f4 (that room address looks weird)
# Seems like you went back from your quest using the return address 0x4004f4
# Let's see what you've collected 
print(' '.join(y))

您在地牢中遇到的问题在这里将是相同的,调用堆栈的大小是有限的(此处为1000),因此,如果您输入了太多函数而没有返回,则您将填充调用堆栈并出现错误就像一次调用自己-一遍又一遍-),您将一遍又一遍地输入,直到计算完成(直到找到宝藏为止),然后返回,直到返回到调用的位置为止 “亲爱的冒险家,很抱歉,您的笔记本已经满了”:最初的地方。直到最后一次将调用栈从所有返回地址中释放出来之前,调用栈将永远不会被释放。RecursionError: maximum recursion depth exceeded。请注意,您不需要递归即可填充调用堆栈,但是非递归程序调用1000函数而永远不会返回的可能性很小。同样重要的是要了解,从函数返回后,调用栈将从使用的地址中释放出来(因此,名称“栈”,返回地址在进入函数之前就被压入,并在返回时被拉出)。在简单递归的特殊情况下(一个函数ffff

如何避免这个问题?

这实际上很简单:“如果您不知道递归的深度,请不要使用递归”。并非总是如此,因为在某些情况下,可以优化尾调用递归(TCO)。但是在python中,情况并非如此,即使“写得很好”的递归函数也无法优化堆栈的使用。Guido有一个有趣的帖子,关于这个问题:尾递归消除

您可以使用一种技术来迭代任何递归函数,我们可以称之为自带笔记本。例如,在我们的特定情况下,我们只是在探索一个列表,进入一个房间等同于进入一个子列表,您应该问自己的问题是如何从列表返回其父列表?答案并不那么复杂,请重复以下操作,直到stack为空:

  1. 推送当前列表,addressindexstack进入新的子列表时将其推入(请注意,列表地址+索引也是地址,因此我们只使用调用堆栈使用的完全相同的技术);
  2. 每次找到一个项目yield(或将它们添加到列表中);
  3. 完全浏览列表后,请使用stack return address(和index)返回父列表。

还要注意,这等效于树中的DFS,其中某些节点是子列表,A = [1, 2]而有些则是简单项:(0, 1, 2, 3, 4用于L = [0, [1,2], 3, 4])。树看起来像这样:

                    L
                    |
           -------------------
           |     |     |     |
           0   --A--   3     4
               |   |
               1   2

DFS遍历的顺序为:L,0,A,1、2、3、4。请记住,要实现迭代DFS,您还需要“堆栈”。我之前提出的实现导致具有以下状态(针对stackflat_list):

init.:  stack=[(L, 0)]
**0**:  stack=[(L, 0)],         flat_list=[0]
**A**:  stack=[(L, 1), (A, 0)], flat_list=[0]
**1**:  stack=[(L, 1), (A, 0)], flat_list=[0, 1]
**2**:  stack=[(L, 1), (A, 1)], flat_list=[0, 1, 2]
**3**:  stack=[(L, 2)],         flat_list=[0, 1, 2, 3]
**3**:  stack=[(L, 3)],         flat_list=[0, 1, 2, 3, 4]
return: stack=[],               flat_list=[0, 1, 2, 3, 4]

在此示例中,堆栈最大大小为2,因为输入列表(因此树)的深度为2。

实作

对于实现,在python中,您可以使用迭代器而不是简单的列表来简化一点。对(子)迭代器的引用将用于存储子列表的返回地址(而不是同时具有列表地址和索引)。这不是什么大的区别,但是我觉得这更具可读性(并且速度更快):

def flatten(iterable):
    return list(items_from(iterable))

def items_from(iterable):
    cursor_stack = [iter(iterable)]
    while cursor_stack:
        sub_iterable = cursor_stack[-1]
        try:
            item = next(sub_iterable)
        except StopIteration:   # post-order
            cursor_stack.pop()
            continue
        if is_list_like(item):  # pre-order
            cursor_stack.append(iter(item))
        elif item is not None:
            yield item          # in-order

def is_list_like(item):
    return isinstance(item, list)

另外,请注意,在is_list_likeI have中isinstance(item, list),可以将其更改为处理更多输入类型,在这里,我只想拥有最简单的版本,其中(可迭代)只是一个列表。但是您也可以这样做:

def is_list_like(item):
    try:
        iter(item)
        return not isinstance(item, str)  # strings are not lists (hmm...) 
    except TypeError:
        return False

flatten_iter([["test", "a"], "b])会将字符串视为“简单项目”,因此将返回["test", "a", "b"]而不是["t", "e", "s", "t", "a", "b"]。请注意,在这种情况下,iter(item)每个项目都会被调用两次,让我们假设这是读者练习此清洁器的一种练习。

测试和评论其他实现

最后,请记住,您不能使用来打印无限嵌套的列表Lprint(L)因为它在内部将使用对__repr__RecursionError: maximum recursion depth exceeded while getting the repr of an object)的递归调用。出于相同的原因,flatten涉及解决方案str将失败,并显示相同的错误消息。

如果您需要测试解决方案,则可以使用此函数生成一个简单的嵌套列表:

def build_deep_list(depth):
    """Returns a list of the form $l_{depth} = [depth-1, l_{depth-1}]$
    with $depth > 1$ and $l_0 = [0]$.
    """
    sub_list = [0]
    for d in range(1, depth):
        sub_list = [d, sub_list]
    return sub_list

给出:build_deep_list(5)>>> [4, [3, [2, [1, [0]]]]]

When trying to answer such a question you really need to give the limitations of the code you propose as a solution. If it was only about performances I wouldn’t mind too much, but most of the codes proposed as solution (including the accepted answer) fail to flatten any list that has a depth greater than 1000.

When I say most of the codes I mean all codes that use any form of recursion (or call a standard library function that is recursive). All these codes fail because for every of the recursive call made, the (call) stack grow by one unit, and the (default) python call stack has a size of 1000.

If you’re not too familiar with the call stack, then maybe the following will help (otherwise you can just scroll to the Implementation).

Call stack size and recursive programming (dungeon analogy)

Finding the treasure and exit

Imagine you enter a huge dungeon with numbered rooms, looking for a treasure. You don’t know the place but you have some indications on how to find the treasure. Each indication is a riddle (difficulty varies, but you can’t predict how hard they will be). You decide to think a little bit about a strategy to save time, you make two observations:

  1. It’s hard (long) to find the treasure as you’ll have to solve (potentially hard) riddles to get there.
  2. Once the treasure found, returning to the entrance may be easy, you just have to use the same path in the other direction (though this needs a bit of memory to recall your path).

When entering the dungeon, you notice a small notebook here. You decide to use it to write down every room you exit after solving a riddle (when entering a new room), this way you’ll be able to return back to the entrance. That’s a genius idea, you won’t even spend a cent implementing your strategy.

You enter the dungeon, solving with great success the first 1001 riddles, but here comes something you hadn’t planed, you have no space left in the notebook you borrowed. You decide to abandon your quest as you prefer not having the treasure than being lost forever inside the dungeon (that looks smart indeed).

Executing a recursive program

Basically, it’s the exact same thing as finding the treasure. The dungeon is the computer’s memory, your goal now is not to find a treasure but to compute some function (find f(x) for a given x). The indications simply are sub-routines that will help you solving f(x). Your strategy is the same as the call stack strategy, the notebook is the stack, the rooms are the functions’ return addresses:

x = ["over here", "am", "I"]
y = sorted(x) # You're about to enter a room named `sorted`, note down the current room address here so you can return back: 0x4004f4 (that room address looks weird)
# Seems like you went back from your quest using the return address 0x4004f4
# Let's see what you've collected 
print(' '.join(y))

The problem you encountered in the dungeon will be the same here, the call stack has a finite size (here 1000) and therefore, if you enter too many functions without returning back then you’ll fill the call stack and have an error that look like “Dear adventurer, I’m very sorry but your notebook is full”: RecursionError: maximum recursion depth exceeded. Note that you don’t need recursion to fill the call stack, but it’s very unlikely that a non-recursive program call 1000 functions without ever returning. It’s important to also understand that once you returned from a function, the call stack is freed from the address used (hence the name “stack”, return address are pushed in before entering a function and pulled out when returning). In the special case of a simple recursion (a function f that call itself once — over and over –) you will enter f over and over until the computation is finished (until the treasure is found) and return from f until you go back to the place where you called f in the first place. The call stack will never be freed from anything until the end where it will be freed from all return addresses one after the other.

How to avoid this issue?

That’s actually pretty simple: “don’t use recursion if you don’t know how deep it can go”. That’s not always true as in some cases, Tail Call recursion can be Optimized (TCO). But in python, this is not the case, and even “well written” recursive function will not optimize stack use. There is an interesting post from Guido about this question: Tail Recursion Elimination.

There is a technique that you can use to make any recursive function iterative, this technique we could call bring your own notebook. For example, in our particular case we simply are exploring a list, entering a room is equivalent to entering a sublist, the question you should ask yourself is how can I get back from a list to its parent list? The answer is not that complex, repeat the following until the stack is empty:

  1. push the current list address and index in a stack when entering a new sublist (note that a list address+index is also an address, therefore we just use the exact same technique used by the call stack);
  2. every time an item is found, yield it (or add them in a list);
  3. once a list is fully explored, go back to the parent list using the stack return address (and index).

Also note that this is equivalent to a DFS in a tree where some nodes are sublists A = [1, 2] and some are simple items: 0, 1, 2, 3, 4 (for L = [0, [1,2], 3, 4]). The tree looks like this:

                    L
                    |
           -------------------
           |     |     |     |
           0   --A--   3     4
               |   |
               1   2

The DFS traversal pre-order is: L, 0, A, 1, 2, 3, 4. Remember, in order to implement an iterative DFS you also “need” a stack. The implementation I proposed before result in having the following states (for the stack and the flat_list):

init.:  stack=[(L, 0)]
**0**:  stack=[(L, 0)],         flat_list=[0]
**A**:  stack=[(L, 1), (A, 0)], flat_list=[0]
**1**:  stack=[(L, 1), (A, 0)], flat_list=[0, 1]
**2**:  stack=[(L, 1), (A, 1)], flat_list=[0, 1, 2]
**3**:  stack=[(L, 2)],         flat_list=[0, 1, 2, 3]
**3**:  stack=[(L, 3)],         flat_list=[0, 1, 2, 3, 4]
return: stack=[],               flat_list=[0, 1, 2, 3, 4]

In this example, the stack maximum size is 2, because the input list (and therefore the tree) have depth 2.

Implementation

For the implementation, in python you can simplify a little bit by using iterators instead of simple lists. References to the (sub)iterators will be used to store sublists return addresses (instead of having both the list address and the index). This is not a big difference but I feel this is more readable (and also a bit faster):

def flatten(iterable):
    return list(items_from(iterable))

def items_from(iterable):
    cursor_stack = [iter(iterable)]
    while cursor_stack:
        sub_iterable = cursor_stack[-1]
        try:
            item = next(sub_iterable)
        except StopIteration:   # post-order
            cursor_stack.pop()
            continue
        if is_list_like(item):  # pre-order
            cursor_stack.append(iter(item))
        elif item is not None:
            yield item          # in-order

def is_list_like(item):
    return isinstance(item, list)

Also, notice that in is_list_like I have isinstance(item, list), which could be changed to handle more input types, here I just wanted to have the simplest version where (iterable) is just a list. But you could also do that:

def is_list_like(item):
    try:
        iter(item)
        return not isinstance(item, str)  # strings are not lists (hmm...) 
    except TypeError:
        return False

This considers strings as “simple items” and therefore flatten_iter([["test", "a"], "b]) will return ["test", "a", "b"] and not ["t", "e", "s", "t", "a", "b"]. Remark that in that case, iter(item) is called twice on each item, let’s pretend it’s an exercise for the reader to make this cleaner.

Testing and remarks on other implementations

In the end, remember that you can’t print a infinitely nested list L using print(L) because internally it will use recursive calls to __repr__ (RecursionError: maximum recursion depth exceeded while getting the repr of an object). For the same reason, solutions to flatten involving str will fail with the same error message.

If you need to test your solution, you can use this function to generate a simple nested list:

def build_deep_list(depth):
    """Returns a list of the form $l_{depth} = [depth-1, l_{depth-1}]$
    with $depth > 1$ and $l_0 = [0]$.
    """
    sub_list = [0]
    for d in range(1, depth):
        sub_list = [d, sub_list]
    return sub_list

Which gives: build_deep_list(5) >>> [4, [3, [2, [1, [0]]]]].


回答 16

这是compiler.ast.flatten2.7.5中的实现:

def flatten(seq):
    l = []
    for elt in seq:
        t = type(elt)
        if t is tuple or t is list:
            for elt2 in flatten(elt):
                l.append(elt2)
        else:
            l.append(elt)
    return l

有更好,更快的方法(如果您已经到达这里,您已经看到了它们)

另请注意:

自2.6版起弃用:编译器软件包已在Python 3中删除。

Here’s the compiler.ast.flatten implementation in 2.7.5:

def flatten(seq):
    l = []
    for elt in seq:
        t = type(elt)
        if t is tuple or t is list:
            for elt2 in flatten(elt):
                l.append(elt2)
        else:
            l.append(elt)
    return l

There are better, faster methods (If you’ve reached here, you have seen them already)

Also note:

Deprecated since version 2.6: The compiler package has been removed in Python 3.


回答 17

完全hacky,但我认为它可以工作(取决于您的data_type)

flat_list = ast.literal_eval("[%s]"%re.sub("[\[\]]","",str(the_list)))

totally hacky but I think it would work (depending on your data_type)

flat_list = ast.literal_eval("[%s]"%re.sub("[\[\]]","",str(the_list)))

回答 18

只需使用一个funcy库: pip install funcy

import funcy


funcy.flatten([[[[1, 1], 1], 2], 3]) # returns generator
funcy.lflatten([[[[1, 1], 1], 2], 3]) # returns list

Just use a funcy library: pip install funcy

import funcy


funcy.flatten([[[[1, 1], 1], 2], 3]) # returns generator
funcy.lflatten([[[[1, 1], 1], 2], 3]) # returns list

回答 19

这是另一种py2方法,我不确定它是最快还是最优雅也不最安全…

from collections import Iterable
from itertools import imap, repeat, chain


def flat(seqs, ignore=(int, long, float, basestring)):
    return repeat(seqs, 1) if any(imap(isinstance, repeat(seqs), ignore)) or not isinstance(seqs, Iterable) else chain.from_iterable(imap(flat, seqs))

它可以忽略您想要的任何特定(或派生)类型,它返回一个迭代器,因此您可以将其转换为任何特定的容器(例如list,tuple,dict或仅使用它)以减少内存占用,无论是好是坏它可以处理初始的不可迭代对象,例如int …

请注意,大多数繁重的工作都是在C中完成的,因为据我所知,这是itertools的实现方式,因此尽管是递归的,但AFAIK并不受python递归深度的限制,因为函数调用发生在C中,尽管这样做并不意味着您会受到内存的限制,特别是在OS X中,从今天开始,它的堆栈大小有了硬限制(OS X Mavericks)…

有一种稍微快一点的方法,但可移植性较低的方法,只有在可以假定可以明确确定输入的基本元素的情况下,才使用它,否则,将获得无限递归,并且具有有限堆栈大小的OS X将很快地引发细分错误…

def flat(seqs, ignore={int, long, float, str, unicode}):
    return repeat(seqs, 1) if type(seqs) in ignore or not isinstance(seqs, Iterable) else chain.from_iterable(imap(flat, seqs))

在这里,我们使用集合来检查类型,因此需要O(1)与O(类型数)来检查是否应忽略某个元素,尽管任何具有声明的被忽略类型的派生类型的值都将失败,这就是为什么要使用它strunicode因此请谨慎使用…

测试:

import random

def test_flat(test_size=2000):
    def increase_depth(value, depth=1):
        for func in xrange(depth):
            value = repeat(value, 1)
        return value

    def random_sub_chaining(nested_values):
        for values in nested_values:
            yield chain((values,), chain.from_iterable(imap(next, repeat(nested_values, random.randint(1, 10)))))

    expected_values = zip(xrange(test_size), imap(str, xrange(test_size)))
    nested_values = random_sub_chaining((increase_depth(value, depth) for depth, value in enumerate(expected_values)))
    assert not any(imap(cmp, chain.from_iterable(expected_values), flat(chain(((),), nested_values, ((),)))))

>>> test_flat()
>>> list(flat([[[1, 2, 3], [4, 5]], 6]))
[1, 2, 3, 4, 5, 6]
>>>  

$ uname -a
Darwin Samys-MacBook-Pro.local 13.3.0 Darwin Kernel Version 13.3.0: Tue Jun  3 21:27:35 PDT 2014; root:xnu-2422.110.17~1/RELEASE_X86_64 x86_64
$ python --version
Python 2.7.5

Here is another py2 approach, Im not sure if its the fastest or the most elegant nor safest …

from collections import Iterable
from itertools import imap, repeat, chain


def flat(seqs, ignore=(int, long, float, basestring)):
    return repeat(seqs, 1) if any(imap(isinstance, repeat(seqs), ignore)) or not isinstance(seqs, Iterable) else chain.from_iterable(imap(flat, seqs))

It can ignore any specific (or derived) type you would like, it returns an iterator, so you can convert it to any specific container such as list, tuple, dict or simply consume it in order to reduce memory footprint, for better or worse it can handle initial non-iterable objects such as int …

Note most of the heavy lifting is done in C, since as far as I know thats how itertools are implemented, so while it is recursive, AFAIK it isn’t bounded by python recursion depth since the function calls are happening in C, though this doesn’t mean you are bounded by memory, specially in OS X where its stack size has a hard limit as of today (OS X Mavericks) …

there is a slightly faster approach, but less portable method, only use it if you can assume that the base elements of the input can be explicitly determined otherwise, you’ll get an infinite recursion, and OS X with its limited stack size, will throw a segmentation fault fairly quickly …

def flat(seqs, ignore={int, long, float, str, unicode}):
    return repeat(seqs, 1) if type(seqs) in ignore or not isinstance(seqs, Iterable) else chain.from_iterable(imap(flat, seqs))

here we are using sets to check for the type so it takes O(1) vs O(number of types) to check whether or not an element should be ignored, though of course any value with derived type of the stated ignored types will fail, this is why its using str, unicode so use it with caution …

tests:

import random

def test_flat(test_size=2000):
    def increase_depth(value, depth=1):
        for func in xrange(depth):
            value = repeat(value, 1)
        return value

    def random_sub_chaining(nested_values):
        for values in nested_values:
            yield chain((values,), chain.from_iterable(imap(next, repeat(nested_values, random.randint(1, 10)))))

    expected_values = zip(xrange(test_size), imap(str, xrange(test_size)))
    nested_values = random_sub_chaining((increase_depth(value, depth) for depth, value in enumerate(expected_values)))
    assert not any(imap(cmp, chain.from_iterable(expected_values), flat(chain(((),), nested_values, ((),)))))

>>> test_flat()
>>> list(flat([[[1, 2, 3], [4, 5]], 6]))
[1, 2, 3, 4, 5, 6]
>>>  

$ uname -a
Darwin Samys-MacBook-Pro.local 13.3.0 Darwin Kernel Version 13.3.0: Tue Jun  3 21:27:35 PDT 2014; root:xnu-2422.110.17~1/RELEASE_X86_64 x86_64
$ python --version
Python 2.7.5

回答 20

不使用任何库:

def flat(l):
    def _flat(l, r):    
        if type(l) is not list:
            r.append(l)
        else:
            for i in l:
                r = r + flat(i)
        return r
    return _flat(l, [])



# example
test = [[1], [[2]], [3], [['a','b','c'] , [['z','x','y']], ['d','f','g']], 4]    
print flat(test) # prints [1, 2, 3, 'a', 'b', 'c', 'z', 'x', 'y', 'd', 'f', 'g', 4]

Without using any library:

def flat(l):
    def _flat(l, r):    
        if type(l) is not list:
            r.append(l)
        else:
            for i in l:
                r = r + flat(i)
        return r
    return _flat(l, [])



# example
test = [[1], [[2]], [3], [['a','b','c'] , [['z','x','y']], ['d','f','g']], 4]    
print flat(test) # prints [1, 2, 3, 'a', 'b', 'c', 'z', 'x', 'y', 'd', 'f', 'g', 4]

回答 21

使用itertools.chain

import itertools
from collections import Iterable

def list_flatten(lst):
    flat_lst = []
    for item in itertools.chain(lst):
        if isinstance(item, Iterable):
            item = list_flatten(item)
            flat_lst.extend(item)
        else:
            flat_lst.append(item)
    return flat_lst

或不链接:

def flatten(q, final):
    if not q:
        return
    if isinstance(q, list):
        if not isinstance(q[0], list):
            final.append(q[0])
        else:
            flatten(q[0], final)
        flatten(q[1:], final)
    else:
        final.append(q)

Using itertools.chain:

import itertools
from collections import Iterable

def list_flatten(lst):
    flat_lst = []
    for item in itertools.chain(lst):
        if isinstance(item, Iterable):
            item = list_flatten(item)
            flat_lst.extend(item)
        else:
            flat_lst.append(item)
    return flat_lst

Or without chaining:

def flatten(q, final):
    if not q:
        return
    if isinstance(q, list):
        if not isinstance(q[0], list):
            final.append(q[0])
        else:
            flatten(q[0], final)
        flatten(q[1:], final)
    else:
        final.append(q)

回答 22

我使用递归来解决任何深度的嵌套列表

def combine_nlist(nlist,init=0,combiner=lambda x,y: x+y):
    '''
    apply function: combiner to a nested list element by element(treated as flatten list)
    '''
    current_value=init
    for each_item in nlist:
        if isinstance(each_item,list):
            current_value =combine_nlist(each_item,current_value,combiner)
        else:
            current_value = combiner(current_value,each_item)
    return current_value

因此,在定义函数combin_nlist之后,就很容易使用此函数进行展平。或者,您可以将其组合为一个功能。我喜欢我的解决方案,因为它可以应用于任何嵌套列表。

def flatten_nlist(nlist):
    return combine_nlist(nlist,[],lambda x,y:x+[y])

结果

In [379]: flatten_nlist([1,2,3,[4,5],[6],[[[7],8],9],10])
Out[379]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

I used recursive to solve nested list with any depth

def combine_nlist(nlist,init=0,combiner=lambda x,y: x+y):
    '''
    apply function: combiner to a nested list element by element(treated as flatten list)
    '''
    current_value=init
    for each_item in nlist:
        if isinstance(each_item,list):
            current_value =combine_nlist(each_item,current_value,combiner)
        else:
            current_value = combiner(current_value,each_item)
    return current_value

So after i define function combine_nlist, it is easy to use this function do flatting. Or you can combine it into one function. I like my solution because it can be applied to any nested list.

def flatten_nlist(nlist):
    return combine_nlist(nlist,[],lambda x,y:x+[y])

result

In [379]: flatten_nlist([1,2,3,[4,5],[6],[[[7],8],9],10])
Out[379]: [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

回答 23

最简单的方法是使用变身利用图书馆pip install morph

代码是:

import morph

list = [[[1, 2, 3], [4, 5]], 6]
flattened_list = morph.flatten(list)  # returns [1, 2, 3, 4, 5, 6]

The easiest way is to use the morph library using pip install morph.

The code is:

import morph

list = [[[1, 2, 3], [4, 5]], 6]
flattened_list = morph.flatten(list)  # returns [1, 2, 3, 4, 5, 6]

回答 24

我知道已经有很多很棒的答案,但是我想添加一个使用功能性编程方法解决问题的答案。在这个答案中,我使用了双重递归:

def flatten_list(seq):
    if not seq:
        return []
    elif isinstance(seq[0],list):
        return (flatten_list(seq[0])+flatten_list(seq[1:]))
    else:
        return [seq[0]]+flatten_list(seq[1:])

print(flatten_list([1,2,[3,[4],5],[6,7]]))

输出:

[1, 2, 3, 4, 5, 6, 7]

I am aware that there are already many awesome answers but i wanted to add an answer that uses the functional programming method of solving the question. In this answer i make use of double recursion :

def flatten_list(seq):
    if not seq:
        return []
    elif isinstance(seq[0],list):
        return (flatten_list(seq[0])+flatten_list(seq[1:]))
    else:
        return [seq[0]]+flatten_list(seq[1:])

print(flatten_list([1,2,[3,[4],5],[6,7]]))

output:

[1, 2, 3, 4, 5, 6, 7]

回答 25

我不确定这是否一定更快或更有效,但这是我要做的:

def flatten(lst):
    return eval('[' + str(lst).replace('[', '').replace(']', '') + ']')

L = [[[1, 2, 3], [4, 5]], 6]
print(flatten(L))

flatten这里的函数将列表转换为字符串,取出所有方括号,将方括号附加到两端,然后将其重新转换为列表。

虽然,如果您知道列表中的方括号中包含字符串,例如[[1, 2], "[3, 4] and [5]"],则您需要做其他事情。

I’m not sure if this is necessarily quicker or more effective, but this is what I do:

def flatten(lst):
    return eval('[' + str(lst).replace('[', '').replace(']', '') + ']')

L = [[[1, 2, 3], [4, 5]], 6]
print(flatten(L))

The flatten function here turns the list into a string, takes out all of the square brackets, attaches square brackets back onto the ends, and turns it back into a list.

Although, if you knew you would have square brackets in your list in strings, like [[1, 2], "[3, 4] and [5]"], you would have to do something else.


回答 26

这是在python2上进行flatten的简单实现

flatten=lambda l: reduce(lambda x,y:x+y,map(flatten,l),[]) if isinstance(l,list) else [l]

test=[[1,2,3,[3,4,5],[6,7,[8,9,[10,[11,[12,13,14]]]]]],]
print flatten(test)

#output [1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

This is a simple implement of flatten on python2

flatten=lambda l: reduce(lambda x,y:x+y,map(flatten,l),[]) if isinstance(l,list) else [l]

test=[[1,2,3,[3,4,5],[6,7,[8,9,[10,[11,[12,13,14]]]]]],]
print flatten(test)

#output [1, 2, 3, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14]

回答 27

这将使列表或字典(或列表列表或字典的字典等)变平。它假定值是字符串,并创建一个字符串,该字符串将每个项目与分隔符参数连接在一起。如果需要,可以使用分隔符随后将结果拆分为列表对象。如果下一个值是列表或字符串,则使用递归。使用key参数来告诉您要使用字典对象的键还是值(将key设置为false)。

def flatten_obj(n_obj, key=True, my_sep=''):
    my_string = ''
    if type(n_obj) == list:
        for val in n_obj:
            my_sep_setter = my_sep if my_string != '' else ''
            if type(val) == list or type(val) == dict:
                my_string += my_sep_setter + flatten_obj(val, key, my_sep)
            else:
                my_string += my_sep_setter + val
    elif type(n_obj) == dict:
        for k, v in n_obj.items():
            my_sep_setter = my_sep if my_string != '' else ''
            d_val = k if key else v
            if type(v) == list or type(v) == dict:
                my_string += my_sep_setter + flatten_obj(v, key, my_sep)
            else:
                my_string += my_sep_setter + d_val
    elif type(n_obj) == str:
        my_sep_setter = my_sep if my_string != '' else ''
        my_string += my_sep_setter + n_obj
        return my_string
    return my_string

print(flatten_obj(['just', 'a', ['test', 'to', 'try'], 'right', 'now', ['or', 'later', 'today'],
                [{'dictionary_test': 'test'}, {'dictionary_test_two': 'later_today'}, 'my power is 9000']], my_sep=', ')

Yield:

just, a, test, to, try, right, now, or, later, today, dictionary_test, dictionary_test_two, my power is 9000

This will flatten a list or dictionary (or list of lists or dictionaries of dictionaries etc). It assumes that the values are strings and it creates a string that concatenates each item with a separator argument. If you wanted you could use the separator to split the result into a list object afterward. It uses recursion if the next value is a list or a string. Use the key argument to tell whether you want the keys or the values (set key to false) from the dictionary object.

def flatten_obj(n_obj, key=True, my_sep=''):
    my_string = ''
    if type(n_obj) == list:
        for val in n_obj:
            my_sep_setter = my_sep if my_string != '' else ''
            if type(val) == list or type(val) == dict:
                my_string += my_sep_setter + flatten_obj(val, key, my_sep)
            else:
                my_string += my_sep_setter + val
    elif type(n_obj) == dict:
        for k, v in n_obj.items():
            my_sep_setter = my_sep if my_string != '' else ''
            d_val = k if key else v
            if type(v) == list or type(v) == dict:
                my_string += my_sep_setter + flatten_obj(v, key, my_sep)
            else:
                my_string += my_sep_setter + d_val
    elif type(n_obj) == str:
        my_sep_setter = my_sep if my_string != '' else ''
        my_string += my_sep_setter + n_obj
        return my_string
    return my_string

print(flatten_obj(['just', 'a', ['test', 'to', 'try'], 'right', 'now', ['or', 'later', 'today'],
                [{'dictionary_test': 'test'}, {'dictionary_test_two': 'later_today'}, 'my power is 9000']], my_sep=', ')

yields:

just, a, test, to, try, right, now, or, later, today, dictionary_test, dictionary_test_two, my power is 9000

回答 28

如果您喜欢递归,这可能是您感兴趣的解决方案:

def f(E):
    if E==[]: 
        return []
    elif type(E) != list: 
        return [E]
    else:
        a = f(E[0])
        b = f(E[1:])
        a.extend(b)
        return a

我实际上是从前一段时间写的一些练习Scheme代码中改编而成的。

请享用!

If you like recursion, this might be a solution of interest to you:

def f(E):
    if E==[]: 
        return []
    elif type(E) != list: 
        return [E]
    else:
        a = f(E[0])
        b = f(E[1:])
        a.extend(b)
        return a

I actually adapted this from some practice Scheme code that I had written a while back.

Enjoy!


回答 29

我是python的新手,来自Lisp背景。这是我想出的(查看lulz的var名称):

def flatten(lst):
    if lst:
        car,*cdr=lst
        if isinstance(car,(list,tuple)):
            if cdr: return flatten(car) + flatten(cdr)
            return flatten(car)
        if cdr: return [car] + flatten(cdr)
        return [car]

似乎可以工作。测试:

flatten((1,2,3,(4,5,6,(7,8,(((1,2)))))))

返回:

[1, 2, 3, 4, 5, 6, 7, 8, 1, 2]

I’m new to python and come from a lisp background. This is what I came up with (check out the var names for lulz):

def flatten(lst):
    if lst:
        car,*cdr=lst
        if isinstance(car,(list,tuple)):
            if cdr: return flatten(car) + flatten(cdr)
            return flatten(car)
        if cdr: return [car] + flatten(cdr)
        return [car]

Seems to work. Test:

flatten((1,2,3,(4,5,6,(7,8,(((1,2)))))))

returns:

[1, 2, 3, 4, 5, 6, 7, 8, 1, 2]

如何从列表列表中制作平面列表?

问题:如何从列表列表中制作平面列表?

我想知道是否有捷径可以从Python的列表清单中做出一个简单的清单。

我可以for循环执行此操作,但是也许有一些很酷的“单线”功能?我尝试使用reduce(),但出现错误。

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
reduce(lambda x, y: x.extend(y), l)

错误信息

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'extend'

I wonder whether there is a shortcut to make a simple list out of list of lists in Python.

I can do that in a for loop, but maybe there is some cool “one-liner”? I tried it with reduce(), but I get an error.

Code

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
reduce(lambda x, y: x.extend(y), l)

Error message

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'extend'

回答 0

给定一个列表列表l

flat_list = [item for sublist in l for item in sublist]

意思是:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

比到目前为止发布的快捷方式快。(l是要展平的列表。)

这是相应的功能:

flatten = lambda l: [item for sublist in l for item in sublist]

作为证据,您可以使用timeit标准库中的模块:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

说明:基于快捷方式+(包括中的隐含使用sum)的必要性是O(L**2)当有L个子列表时-随着中间结果列表的长度越来越长,每一步都会分配一个新的中间结果列表对象,并且所有项目必须复制前一个中间结果中的结果(以及最后添加的一些新结果)。因此,为简单起见,而又不失去一般性,请说您有I个项目的L个子列表:第一个I项目来回复制L-1次,第二个I项目L-2次,依此类推;等等。总份数是I乘以x从1到L的x的总和,即I * (L**2)/2

列表理解只生成一次列表,然后将每个项目(从其原始居住地复制到结果列表)也恰好复制一次。

Given a list of lists l,

flat_list = [item for sublist in l for item in sublist]

which means:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

is faster than the shortcuts posted so far. (l is the list to flatten.)

Here is the corresponding function:

flatten = lambda l: [item for sublist in l for item in sublist]

As evidence, you can use the timeit module in the standard library:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the shortcuts based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists — as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.


回答 1

您可以使用itertools.chain()

import itertools
list2d = [[1,2,3], [4,5,6], [7], [8,9]]
merged = list(itertools.chain(*list2d))

或者,您可以使用itertools.chain.from_iterable()不需要使用*运算符解压缩列表的方法:

import itertools
list2d = [[1,2,3], [4,5,6], [7], [8,9]]
merged = list(itertools.chain.from_iterable(list2d))

You can use itertools.chain():

import itertools
list2d = [[1,2,3], [4,5,6], [7], [8,9]]
merged = list(itertools.chain(*list2d))

Or you can use itertools.chain.from_iterable() which doesn’t require unpacking the list with the * operator:

import itertools
list2d = [[1,2,3], [4,5,6], [7], [8,9]]
merged = list(itertools.chain.from_iterable(list2d))

回答 2

作者注意:这是低效的。但是很有趣,因为类人猿很棒。它不适用于生产Python代码。

>>> sum(l, [])
[1, 2, 3, 4, 5, 6, 7, 8, 9]

这只是对在第一个参数中传递的iterable元素求和,将第二个参数视为和的初始值(如果未给出,0则改为使用和,这种情况下会给您带来错误)。

由于您是对嵌套列表求和,因此实际上得到[1,3]+[2,4]的结果sum([[1,3],[2,4]],[])等于[1,3,2,4]

请注意,仅适用于列表列表。对于列表列表列表,您将需要其他解决方案。

Note from the author: This is inefficient. But fun, because monoids are awesome. It’s not appropriate for production Python code.

>>> sum(l, [])
[1, 2, 3, 4, 5, 6, 7, 8, 9]

This just sums the elements of iterable passed in the first argument, treating second argument as the initial value of the sum (if not given, 0 is used instead and this case will give you an error).

Because you are summing nested lists, you actually get [1,3]+[2,4] as a result of sum([[1,3],[2,4]],[]), which is equal to [1,3,2,4].

Note that only works on lists of lists. For lists of lists of lists, you’ll need another solution.


回答 3

我使用perfplot(我的一个宠物项目,本质上是一个包装纸timeit)测试了大多数建议的解决方案,然后发现

functools.reduce(operator.iconcat, a, [])

串联多个小列表和几个长列表时,这是最快的解决方案。(operator.iadd同样快。)


复制剧情的代码:

import functools
import itertools
import numpy
import operator
import perfplot


def forfor(a):
    return [item for sublist in a for item in sublist]


def sum_brackets(a):
    return sum(a, [])


def functools_reduce(a):
    return functools.reduce(operator.concat, a)


def functools_reduce_iconcat(a):
    return functools.reduce(operator.iconcat, a, [])


def itertools_chain(a):
    return list(itertools.chain.from_iterable(a))


def numpy_flat(a):
    return list(numpy.array(a).flat)


def numpy_concatenate(a):
    return list(numpy.concatenate(a))


perfplot.show(
    setup=lambda n: [list(range(10))] * n,
    # setup=lambda n: [list(range(n))] * 10,
    kernels=[
        forfor,
        sum_brackets,
        functools_reduce,
        functools_reduce_iconcat,
        itertools_chain,
        numpy_flat,
        numpy_concatenate,
    ],
    n_range=[2 ** k for k in range(16)],
    xlabel="num lists (of length 10)",
    # xlabel="len lists (10 lists total)"
)

I tested most suggested solutions with perfplot (a pet project of mine, essentially a wrapper around timeit), and found

functools.reduce(operator.iconcat, a, [])

to be the fastest solution, both when many small lists and few long lists are concatenated. (operator.iadd is equally fast.)


Code to reproduce the plot:

import functools
import itertools
import numpy
import operator
import perfplot


def forfor(a):
    return [item for sublist in a for item in sublist]


def sum_brackets(a):
    return sum(a, [])


def functools_reduce(a):
    return functools.reduce(operator.concat, a)


def functools_reduce_iconcat(a):
    return functools.reduce(operator.iconcat, a, [])


def itertools_chain(a):
    return list(itertools.chain.from_iterable(a))


def numpy_flat(a):
    return list(numpy.array(a).flat)


def numpy_concatenate(a):
    return list(numpy.concatenate(a))


perfplot.show(
    setup=lambda n: [list(range(10))] * n,
    # setup=lambda n: [list(range(n))] * 10,
    kernels=[
        forfor,
        sum_brackets,
        functools_reduce,
        functools_reduce_iconcat,
        itertools_chain,
        numpy_flat,
        numpy_concatenate,
    ],
    n_range=[2 ** k for k in range(16)],
    xlabel="num lists (of length 10)",
    # xlabel="len lists (10 lists total)"
)

回答 4

from functools import reduce #python 3

>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(lambda x,y: x+y,l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

extend()您的示例中的方法将修改x而不是返回有用的值(期望值reduce())。

reduce版本的更快方法是

>>> import operator
>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(operator.concat, l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
from functools import reduce #python 3

>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(lambda x,y: x+y,l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

The extend() method in your example modifies x instead of returning a useful value (which reduce() expects).

A faster way to do the reduce version would be

>>> import operator
>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(operator.concat, l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

回答 5

如果您使用Django,请不要重新发明轮子:

>>> from django.contrib.admin.utils import flatten
>>> l = [[1,2,3], [4,5], [6]]
>>> flatten(l)
>>> [1, 2, 3, 4, 5, 6]

熊猫

>>> from pandas.core.common import flatten
>>> list(flatten(l))

Itertools

>>> import itertools
>>> flatten = itertools.chain.from_iterable
>>> list(flatten(l))

Matplotlib

>>> from matplotlib.cbook import flatten
>>> list(flatten(l))

Unipath

>>> from unipath.path import flatten
>>> list(flatten(l))

Setuptools

>>> from setuptools.namespaces import flatten
>>> list(flatten(l))

Don’t reinvent the wheel if you use Django:

>>> from django.contrib.admin.utils import flatten
>>> l = [[1,2,3], [4,5], [6]]
>>> flatten(l)
>>> [1, 2, 3, 4, 5, 6]

Pandas:

>>> from pandas.core.common import flatten
>>> list(flatten(l))

Itertools:

>>> import itertools
>>> flatten = itertools.chain.from_iterable
>>> list(flatten(l))

Matplotlib

>>> from matplotlib.cbook import flatten
>>> list(flatten(l))

Unipath:

>>> from unipath.path import flatten
>>> list(flatten(l))

Setuptools:

>>> from setuptools.namespaces import flatten
>>> list(flatten(l))

回答 6

这是适用于数字字符串嵌套列表和混合容器的通用方法。

#from typing import Iterable 
from collections import Iterable                            # < py38


def flatten(items):
    """Yield items from any nested iterable; see Reference."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            for sub_x in flatten(x):
                yield sub_x
        else:
            yield x

注意事项

  • 在Python 3中,yield from flatten(x)可以替换for sub_x in flatten(x): yield sub_x
  • 在Python 3.8,抽象基类移动collection.abc所述typing模块。

演示版

lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(flatten(lst))                                         # nested lists
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

mixed = [[1, [2]], (3, 4, {5, 6}, 7), 8, "9"]              # numbers, strs, nested & mixed
list(flatten(mixed))
# [1, 2, 3, 4, 5, 6, 7, 8, '9']

参考

  • 此解决方案是根据Beazley,D.和B. Jones的食谱修改的。食谱4.14,Python Cookbook第三版,O’Reilly Media Inc.,塞巴斯托波尔,加利福尼亚:2013年。
  • 找到了较早的SO帖子,可能是原始的演示。

Here is a general approach that applies to numbers, strings, nested lists and mixed containers.

Code

#from typing import Iterable 
from collections import Iterable                            # < py38


def flatten(items):
    """Yield items from any nested iterable; see Reference."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            for sub_x in flatten(x):
                yield sub_x
        else:
            yield x

Notes:

  • In Python 3, yield from flatten(x) can replace for sub_x in flatten(x): yield sub_x
  • In Python 3.8, abstract base classes are moved from collection.abc to the typing module.

Demo

lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(flatten(lst))                                         # nested lists
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

mixed = [[1, [2]], (3, 4, {5, 6}, 7), 8, "9"]              # numbers, strs, nested & mixed
list(flatten(mixed))
# [1, 2, 3, 4, 5, 6, 7, 8, '9']

Reference

  • This solution is modified from a recipe in Beazley, D. and B. Jones. Recipe 4.14, Python Cookbook 3rd Ed., O’Reilly Media Inc. Sebastopol, CA: 2013.
  • Found an earlier SO post, possibly the original demonstration.

回答 7

如果要展平不知道嵌套深度的数据结构,可以使用1iteration_utilities.deepflatten

>>> from iteration_utilities import deepflatten

>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(deepflatten(l, depth=1))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> l = [[1, 2, 3], [4, [5, 6]], 7, [8, 9]]
>>> list(deepflatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

它是一个生成器,因此您需要将结果list强制转换为或对其进行显式迭代。


如果只展平一个级别,并且每个项目本身都是可迭代的,则还可以使用iteration_utilities.flatten它本身只是一个薄包装itertools.chain.from_iterable

>>> from iteration_utilities import flatten
>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(flatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

只是添加一些时间(基于NicoSchlömer的答案,其中不包括此答案中提供的功能):

这是一个对数对数图,可以容纳跨度很大的值。对于定性推理:越低越好。

研究结果表明,如果迭代只包含几个内部iterables然后sum将最快,但长期iterables只itertools.chain.from_iterableiteration_utilities.deepflatten或嵌套的理解与合理的性能itertools.chain.from_iterable是最快的(如已被尼科Schlömer注意到)。

from itertools import chain
from functools import reduce
from collections import Iterable  # or from collections.abc import Iterable
import operator
from iteration_utilities import deepflatten

def nested_list_comprehension(lsts):
    return [item for sublist in lsts for item in sublist]

def itertools_chain_from_iterable(lsts):
    return list(chain.from_iterable(lsts))

def pythons_sum(lsts):
    return sum(lsts, [])

def reduce_add(lsts):
    return reduce(lambda x, y: x + y, lsts)

def pylangs_flatten(lsts):
    return list(flatten(lsts))

def flatten(items):
    """Yield items from any nested iterable; see REF."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

def reduce_concat(lsts):
    return reduce(operator.concat, lsts)

def iteration_utilities_deepflatten(lsts):
    return list(deepflatten(lsts, depth=1))


from simple_benchmark import benchmark

b = benchmark(
    [nested_list_comprehension, itertools_chain_from_iterable, pythons_sum, reduce_add,
     pylangs_flatten, reduce_concat, iteration_utilities_deepflatten],
    arguments={2**i: [[0]*5]*(2**i) for i in range(1, 13)},
    argument_name='number of inner lists'
)

b.plot()

1免责声明:我是该图书馆的作者

If you want to flatten a data-structure where you don’t know how deep it’s nested you could use iteration_utilities.deepflatten1

>>> from iteration_utilities import deepflatten

>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(deepflatten(l, depth=1))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> l = [[1, 2, 3], [4, [5, 6]], 7, [8, 9]]
>>> list(deepflatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

It’s a generator so you need to cast the result to a list or explicitly iterate over it.


To flatten only one level and if each of the items is itself iterable you can also use iteration_utilities.flatten which itself is just a thin wrapper around itertools.chain.from_iterable:

>>> from iteration_utilities import flatten
>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(flatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Just to add some timings (based on Nico Schlömer answer that didn’t include the function presented in this answer):

It’s a log-log plot to accommodate for the huge range of values spanned. For qualitative reasoning: Lower is better.

The results show that if the iterable contains only a few inner iterables then sum will be fastest, however for long iterables only the itertools.chain.from_iterable, iteration_utilities.deepflatten or the nested comprehension have reasonable performance with itertools.chain.from_iterable being the fastest (as already noticed by Nico Schlömer).

from itertools import chain
from functools import reduce
from collections import Iterable  # or from collections.abc import Iterable
import operator
from iteration_utilities import deepflatten

def nested_list_comprehension(lsts):
    return [item for sublist in lsts for item in sublist]

def itertools_chain_from_iterable(lsts):
    return list(chain.from_iterable(lsts))

def pythons_sum(lsts):
    return sum(lsts, [])

def reduce_add(lsts):
    return reduce(lambda x, y: x + y, lsts)

def pylangs_flatten(lsts):
    return list(flatten(lsts))

def flatten(items):
    """Yield items from any nested iterable; see REF."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

def reduce_concat(lsts):
    return reduce(operator.concat, lsts)

def iteration_utilities_deepflatten(lsts):
    return list(deepflatten(lsts, depth=1))


from simple_benchmark import benchmark

b = benchmark(
    [nested_list_comprehension, itertools_chain_from_iterable, pythons_sum, reduce_add,
     pylangs_flatten, reduce_concat, iteration_utilities_deepflatten],
    arguments={2**i: [[0]*5]*(2**i) for i in range(1, 13)},
    argument_name='number of inner lists'
)

b.plot()

1 Disclaimer: I’m the author of that library


回答 8

我收回我的声明。总和不是赢家。尽管列表较小时速度更快。但是,列表较大时,性能会大大降低。

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10000'
    ).timeit(100)
2.0440959930419922

sum版本仍在运行一分钟以上,尚未处理!

对于中型列表:

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
20.126545906066895
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
22.242258071899414
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
16.449732065200806

使用小清单和时间:number = 1000000

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
2.4598159790039062
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.5289170742034912
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.0598428249359131

I take my statement back. sum is not the winner. Although it is faster when the list is small. But the performance degrades significantly with larger lists.

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10000'
    ).timeit(100)
2.0440959930419922

The sum version is still running for more than a minute and it hasn’t done processing yet!

For medium lists:

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
20.126545906066895
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
22.242258071899414
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
16.449732065200806

Using small lists and timeit: number=1000000

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
2.4598159790039062
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.5289170742034912
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.0598428249359131

回答 9

似乎与operator.add!当您将两个列表加在一起时,正确的术语是concat,而不是添加。operator.concat是您需要使用的。

如果您认为功能正常,那么就这么简单:

>>> from functools import reduce
>>> list2d = ((1, 2, 3), (4, 5, 6), (7,), (8, 9))
>>> reduce(operator.concat, list2d)
(1, 2, 3, 4, 5, 6, 7, 8, 9)

您会看到reduce尊重序列类型,因此在提供元组时,您会得到一个元组。让我们尝试一个列表:

>>> list2d = [[1, 2, 3],[4, 5, 6], [7], [8, 9]]
>>> reduce(operator.concat, list2d)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

啊哈,您会得到一个清单。

性能如何:

>>> list2d = [[1, 2, 3],[4, 5, 6], [7], [8, 9]]
>>> %timeit list(itertools.chain.from_iterable(list2d))
1000000 loops, best of 3: 1.36 µs per loop

from_iterable相当快!但这是无法与相比的concat

>>> list2d = ((1, 2, 3),(4, 5, 6), (7,), (8, 9))
>>> %timeit reduce(operator.concat, list2d)
1000000 loops, best of 3: 492 ns per loop

There seems to be a confusion with operator.add! When you add two lists together, the correct term for that is concat, not add. operator.concat is what you need to use.

If you’re thinking functional, it is as easy as this::

>>> from functools import reduce
>>> list2d = ((1, 2, 3), (4, 5, 6), (7,), (8, 9))
>>> reduce(operator.concat, list2d)
(1, 2, 3, 4, 5, 6, 7, 8, 9)

You see reduce respects the sequence type, so when you supply a tuple, you get back a tuple. Let’s try with a list::

>>> list2d = [[1, 2, 3],[4, 5, 6], [7], [8, 9]]
>>> reduce(operator.concat, list2d)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Aha, you get back a list.

How about performance::

>>> list2d = [[1, 2, 3],[4, 5, 6], [7], [8, 9]]
>>> %timeit list(itertools.chain.from_iterable(list2d))
1000000 loops, best of 3: 1.36 µs per loop

from_iterable is pretty fast! But it’s no comparison to reduce with concat.

>>> list2d = ((1, 2, 3),(4, 5, 6), (7,), (8, 9))
>>> %timeit reduce(operator.concat, list2d)
1000000 loops, best of 3: 492 ns per loop

回答 10

为什么使用扩展?

reduce(lambda x, y: x+y, l)

这应该工作正常。

Why do you use extend?

reduce(lambda x, y: x+y, l)

This should work fine.


回答 11

考虑安装more_itertools软件包。

> pip install more_itertools

它附带了一个实现flattensource,来自itertools配方):

import more_itertools


lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(more_itertools.flatten(lst))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

从2.4版开始,您可以使用more_itertools.collapsesource,由abarnet提供)来展平更复杂,嵌套的可迭代对象。

lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(more_itertools.collapse(lst)) 
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

lst = [[1, 2, 3], [[4, 5, 6]], [[[7]]], 8, 9]              # complex nesting
list(more_itertools.collapse(lst))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

Consider installing the more_itertools package.

> pip install more_itertools

It ships with an implementation for flatten (source, from the itertools recipes):

import more_itertools


lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(more_itertools.flatten(lst))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

As of version 2.4, you can flatten more complicated, nested iterables with more_itertools.collapse (source, contributed by abarnet).

lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(more_itertools.collapse(lst)) 
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

lst = [[1, 2, 3], [[4, 5, 6]], [[[7]]], 8, 9]              # complex nesting
list(more_itertools.collapse(lst))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

回答 12

您的函数不起作用的原因是因为扩展名扩展了数组就位并且不返回它。您仍然可以使用以下方法从lambda返回x:

reduce(lambda x,y: x.extend(y) or x, l)

注意:扩展比列表上的+更有效。

The reason your function didn’t work is because the extend extends an array in-place and doesn’t return it. You can still return x from lambda, using something like this:

reduce(lambda x,y: x.extend(y) or x, l)

Note: extend is more efficient than + on lists.


回答 13

def flatten(l, a):
    for i in l:
        if isinstance(i, list):
            flatten(i, a)
        else:
            a.append(i)
    return a

print(flatten([[[1, [1,1, [3, [4,5,]]]], 2, 3], [4, 5],6], []))

# [1, 1, 1, 3, 4, 5, 2, 3, 4, 5, 6]
def flatten(l, a):
    for i in l:
        if isinstance(i, list):
            flatten(i, a)
        else:
            a.append(i)
    return a

print(flatten([[[1, [1,1, [3, [4,5,]]]], 2, 3], [4, 5],6], []))

# [1, 1, 1, 3, 4, 5, 2, 3, 4, 5, 6]

回答 14

递归版本

x = [1,2,[3,4],[5,[6,[7]]],8,9,[10]]

def flatten_list(k):
    result = list()
    for i in k:
        if isinstance(i,list):

            #The isinstance() function checks if the object (first argument) is an 
            #instance or subclass of classinfo class (second argument)

            result.extend(flatten_list(i)) #Recursive call
        else:
            result.append(i)
    return result

flatten_list(x)
#result = [1,2,3,4,5,6,7,8,9,10]

Recursive version

x = [1,2,[3,4],[5,[6,[7]]],8,9,[10]]

def flatten_list(k):
    result = list()
    for i in k:
        if isinstance(i,list):

            #The isinstance() function checks if the object (first argument) is an 
            #instance or subclass of classinfo class (second argument)

            result.extend(flatten_list(i)) #Recursive call
        else:
            result.append(i)
    return result

flatten_list(x)
#result = [1,2,3,4,5,6,7,8,9,10]

回答 15

matplotlib.cbook.flatten() 即使嵌套列表比示例嵌套更深,它也适用于嵌套列表。

import matplotlib
l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
print(list(matplotlib.cbook.flatten(l)))
l2 = [[1, 2, 3], [4, 5, 6], [7], [8, [9, 10, [11, 12, [13]]]]]
print list(matplotlib.cbook.flatten(l2))

结果:

[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

这比下划线._。flatten快18倍:

Average time over 1000 trials of matplotlib.cbook.flatten: 2.55e-05 sec
Average time over 1000 trials of underscore._.flatten: 4.63e-04 sec
(time for underscore._)/(time for matplotlib.cbook) = 18.1233394636

matplotlib.cbook.flatten() will work for nested lists even if they nest more deeply than the example.

import matplotlib
l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
print(list(matplotlib.cbook.flatten(l)))
l2 = [[1, 2, 3], [4, 5, 6], [7], [8, [9, 10, [11, 12, [13]]]]]
print list(matplotlib.cbook.flatten(l2))

Result:

[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

This is 18x faster than underscore._.flatten:

Average time over 1000 trials of matplotlib.cbook.flatten: 2.55e-05 sec
Average time over 1000 trials of underscore._.flatten: 4.63e-04 sec
(time for underscore._)/(time for matplotlib.cbook) = 18.1233394636

回答 16

在处理基于文本的可变长度列表时,可接受的答案对我不起作用。这是对我有用的另一种方法。

l = ['aaa', 'bb', 'cccccc', ['xx', 'yyyyyyy']]

接受的答案无效

flat_list = [item for sublist in l for item in sublist]
print(flat_list)
['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c', 'xx', 'yyyyyyy']

新提出的解决方案,没有工作对我来说:

flat_list = []
_ = [flat_list.extend(item) if isinstance(item, list) else flat_list.append(item) for item in l if item]
print(flat_list)
['aaa', 'bb', 'cccccc', 'xx', 'yyyyyyy']

The accepted answer did not work for me when dealing with text-based lists of variable lengths. Here is an alternate approach that did work for me.

l = ['aaa', 'bb', 'cccccc', ['xx', 'yyyyyyy']]

Accepted answer that did not work:

flat_list = [item for sublist in l for item in sublist]
print(flat_list)
['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c', 'xx', 'yyyyyyy']

New proposed solution that did work for me:

flat_list = []
_ = [flat_list.extend(item) if isinstance(item, list) else flat_list.append(item) for item in l if item]
print(flat_list)
['aaa', 'bb', 'cccccc', 'xx', 'yyyyyyy']

回答 17

上面的Anil函数的一个坏功能是,它要求用户始终手动将第二个参数指定为空列表[]。相反,这应该是默认设置。由于Python对象的工作方式,这些对象应在函数内部而不是参数中设置。

这是一个工作功能:

def list_flatten(l, a=None):
    #check a
    if a is None:
        #initialize with empty list
        a = []

    for i in l:
        if isinstance(i, list):
            list_flatten(i, a)
        else:
            a.append(i)
    return a

测试:

In [2]: lst = [1, 2, [3], [[4]],[5,[6]]]

In [3]: lst
Out[3]: [1, 2, [3], [[4]], [5, [6]]]

In [11]: list_flatten(lst)
Out[11]: [1, 2, 3, 4, 5, 6]

An bad feature of Anil’s function above is that it requires the user to always manually specify the second argument to be an empty list []. This should instead be a default. Due to the way Python objects work, these should be set inside the function, not in the arguments.

Here’s a working function:

def list_flatten(l, a=None):
    #check a
    if a is None:
        #initialize with empty list
        a = []

    for i in l:
        if isinstance(i, list):
            list_flatten(i, a)
        else:
            a.append(i)
    return a

Testing:

In [2]: lst = [1, 2, [3], [[4]],[5,[6]]]

In [3]: lst
Out[3]: [1, 2, [3], [[4]], [5, [6]]]

In [11]: list_flatten(lst)
Out[11]: [1, 2, 3, 4, 5, 6]

回答 18

以下对我来说似乎最简单:

>>> import numpy as np
>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> print (np.concatenate(l))
[1 2 3 4 5 6 7 8 9]

Following seem simplest to me:

>>> import numpy as np
>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> print (np.concatenate(l))
[1 2 3 4 5 6 7 8 9]

回答 19

也可以使用NumPy的flat

import numpy as np
list(np.array(l).flat)

编辑11/02/2016:仅当子列表具有相同尺寸时才可用。

One can also use NumPy’s flat:

import numpy as np
list(np.array(l).flat)

Edit 11/02/2016: Only works when sublists have identical dimensions.


回答 20

您可以使用numpy:
flat_list = list(np.concatenate(list_of_list))

You can use numpy :
flat_list = list(np.concatenate(list_of_list))


回答 21

如果您愿意放弃一点速度以获得更干净的外观,则可以使用numpy.concatenate().tolist()numpy.concatenate().ravel().tolist()

import numpy

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] * 99

%timeit numpy.concatenate(l).ravel().tolist()
1000 loops, best of 3: 313 µs per loop

%timeit numpy.concatenate(l).tolist()
1000 loops, best of 3: 312 µs per loop

%timeit [item for sublist in l for item in sublist]
1000 loops, best of 3: 31.5 µs per loop

您可以在docs numpy.concatenatenumpy.ravel中找到更多信息

If you are willing to give up a tiny amount of speed for a cleaner look, then you could use numpy.concatenate().tolist() or numpy.concatenate().ravel().tolist():

import numpy

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] * 99

%timeit numpy.concatenate(l).ravel().tolist()
1000 loops, best of 3: 313 µs per loop

%timeit numpy.concatenate(l).tolist()
1000 loops, best of 3: 312 µs per loop

%timeit [item for sublist in l for item in sublist]
1000 loops, best of 3: 31.5 µs per loop

You can find out more here in the docs numpy.concatenate and numpy.ravel


回答 22

我找到的最快解决方案(无论如何都是大型列表):

import numpy as np
#turn list into an array and flatten()
np.array(l).flatten()

做完了!您当然可以通过执行list(l)将其转换为列表

Fastest solution I have found (for large list anyway):

import numpy as np
#turn list into an array and flatten()
np.array(l).flatten()

Done! You can of course turn it back into a list by executing list(l)


回答 23

underscore.py包装风扇的简单代码

from underscore import _
_.flatten([[1, 2, 3], [4, 5, 6], [7], [8, 9]])
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

它解决了所有扁平化问题(无列表项或复杂的嵌套)

from underscore import _
# 1 is none list item
# [2, [3]] is complex nesting
_.flatten([1, [2, [3]], [4, 5, 6], [7], [8, 9]])
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

您可以underscore.py使用pip 安装

pip install underscore.py

Simple code for underscore.py package fan

from underscore import _
_.flatten([[1, 2, 3], [4, 5, 6], [7], [8, 9]])
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

It solves all flatten problems (none list item or complex nesting)

from underscore import _
# 1 is none list item
# [2, [3]] is complex nesting
_.flatten([1, [2, [3]], [4, 5, 6], [7], [8, 9]])
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

You can install underscore.py with pip

pip install underscore.py

回答 24

def flatten(alist):
    if alist == []:
        return []
    elif type(alist) is not list:
        return [alist]
    else:
        return flatten(alist[0]) + flatten(alist[1:])
def flatten(alist):
    if alist == []:
        return []
    elif type(alist) is not list:
        return [alist]
    else:
        return flatten(alist[0]) + flatten(alist[1:])

回答 25

注意:以下内容适用于Python 3.3+,因为它使用yield_fromsix也是第三方软件包,尽管它很稳定。或者,您可以使用sys.version


在的情况下obj = [[1, 2,], [3, 4], [5, 6]],这里的所有解决方案都不错,包括列表理解和itertools.chain.from_iterable

但是,请考虑以下稍微复杂的情况:

>>> obj = [[1, 2, 3], [4, 5], 6, 'abc', [7], [8, [9, 10]]]

这里有几个问题:

  • 一个元素6只是一个标量。它是不可迭代的,因此上述路由将在此处失败。
  • 其中一个要素,'abc'技术上可迭代(所有str s为)。但是,在行与行之间进行一点阅读时,您并不想这样处理-您希望将其视为单个元素。
  • 最后一个元素[8, [9, 10]]本身就是嵌套的可迭代对象。基本列表理解,chain.from_iterable仅提取“下一级”。

您可以通过以下方法对此进行补救:

>>> from collections import Iterable
>>> from six import string_types

>>> def flatten(obj):
...     for i in obj:
...         if isinstance(i, Iterable) and not isinstance(i, string_types):
...             yield from flatten(i)
...         else:
...             yield i


>>> list(flatten(obj))
[1, 2, 3, 4, 5, 6, 'abc', 7, 8, 9, 10]

在这里,您检查子元素(1)是否可通过Iterable,ABC从进行迭代itertools,但还要确保(2)元素不是 “字符串状”的。

Note: Below applies to Python 3.3+ because it uses yield_from. six is also a third-party package, though it is stable. Alternately, you could use sys.version.


In the case of obj = [[1, 2,], [3, 4], [5, 6]], all of the solutions here are good, including list comprehension and itertools.chain.from_iterable.

However, consider this slightly more complex case:

>>> obj = [[1, 2, 3], [4, 5], 6, 'abc', [7], [8, [9, 10]]]

There are several problems here:

  • One element, 6, is just a scalar; it’s not iterable, so the above routes will fail here.
  • One element, 'abc', is technically iterable (all strs are). However, reading between the lines a bit, you don’t want to treat it as such–you want to treat it as a single element.
  • The final element, [8, [9, 10]] is itself a nested iterable. Basic list comprehension and chain.from_iterable only extract “1 level down.”

You can remedy this as follows:

>>> from collections import Iterable
>>> from six import string_types

>>> def flatten(obj):
...     for i in obj:
...         if isinstance(i, Iterable) and not isinstance(i, string_types):
...             yield from flatten(i)
...         else:
...             yield i


>>> list(flatten(obj))
[1, 2, 3, 4, 5, 6, 'abc', 7, 8, 9, 10]

Here, you check that the sub-element (1) is iterable with Iterable, an ABC from itertools, but also want to ensure that (2) the element is not “string-like.”


回答 26

flat_list = []
for i in list_of_list:
    flat_list+=i

该代码也可以很好地工作,因为它会一直扩展列表。虽然非常相似,但是只有一个for循环。因此,它比添加2 for循环具有更少的复杂性。

flat_list = []
for i in list_of_list:
    flat_list+=i

This Code also works fine as it just extend the list all the way. Although it is much similar but only have one for loop. So It have less complexity than adding 2 for loops.


回答 27

from nltk import flatten

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
flatten(l)

这种解决方案相对于大多数其他解决方案的优势在于,如果您有类似以下的列表:

l = [1, [2, 3], [4, 5, 6], [7], [8, 9]]

虽然其他大多数解决方案都会引发错误,但此解决方案可以解决这些问题。

from nltk import flatten

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
flatten(l)

The advantage of this solution over most others here is that if you have a list like:

l = [1, [2, 3], [4, 5, 6], [7], [8, 9]]

while most other solutions throw an error this solution handles them.


回答 28

这可能不是最有效的方法,但我认为应该放一个衬里(实际上是两个衬里)。两种版本均可在任意层次的嵌套列表上使用,并利用语言功能(Python3.5)和递归。

def make_list_flat (l):
    flist = []
    flist.extend ([l]) if (type (l) is not list) else [flist.extend (make_list_flat (e)) for e in l]
    return flist

a = [[1, 2], [[[[3, 4, 5], 6]]], 7, [8, [9, [10, 11], 12, [13, 14, [15, [[16, 17], 18]]]]]]
flist = make_list_flat(a)
print (flist)

输出是

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

这以深度优先的方式工作。递归向下进行,直到找到一个非列表元素,然后扩展局部变量flist,然后将其回滚到父变量。每当flist返回时,它就会扩展到flist列表理解中的父级。因此,从根本上返回一个平面列表。

上面的代码创建了几个本地列表并返回它们,用于扩展父级列表。我认为解决此问题的方法可能是创建gloabl flist,如下所示。

a = [[1, 2], [[[[3, 4, 5], 6]]], 7, [8, [9, [10, 11], 12, [13, 14, [15, [[16, 17], 18]]]]]]
flist = []
def make_list_flat (l):
    flist.extend ([l]) if (type (l) is not list) else [make_list_flat (e) for e in l]

make_list_flat(a)
print (flist)

输出再次

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

尽管目前我不确定效率。

This may not be the most efficient way but I thought to put a one-liner (actually a two-liner). Both versions will work on arbitrary hierarchy nested lists, and exploits language features (Python3.5) and recursion.

def make_list_flat (l):
    flist = []
    flist.extend ([l]) if (type (l) is not list) else [flist.extend (make_list_flat (e)) for e in l]
    return flist

a = [[1, 2], [[[[3, 4, 5], 6]]], 7, [8, [9, [10, 11], 12, [13, 14, [15, [[16, 17], 18]]]]]]
flist = make_list_flat(a)
print (flist)

The output is

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

This works in a depth first manner. The recursion goes down until it finds a non-list element, then extends the local variable flist and then rolls back it to the parent. Whenever flist is returned, it is extended to the parent’s flist in the list comprehension. Therefore, at the root, a flat list is returned.

The above one creates several local lists and returns them which are used to extend the parent’s list. I think the way around for this may be creating a gloabl flist, like below.

a = [[1, 2], [[[[3, 4, 5], 6]]], 7, [8, [9, [10, 11], 12, [13, 14, [15, [[16, 17], 18]]]]]]
flist = []
def make_list_flat (l):
    flist.extend ([l]) if (type (l) is not list) else [make_list_flat (e) for e in l]

make_list_flat(a)
print (flist)

The output is again

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

Although I am not sure at this time about the efficiency.


回答 29

适用于整数的异质和均质列表的另一种异常方法:

from typing import List


def flatten(l: list) -> List[int]:
    """Flatten an arbitrary deep nested list of lists of integers.

    Examples:
        >>> flatten([1, 2, [1, [10]]])
        [1, 2, 1, 10]

    Args:
        l: Union[l, Union[int, List[int]]

    Returns:
        Flatted list of integer
    """
    return [int(i.strip('[ ]')) for i in str(l).split(',')]

Another unusual approach that works for hetero- and homogeneous lists of integers:

from typing import List


def flatten(l: list) -> List[int]:
    """Flatten an arbitrary deep nested list of lists of integers.

    Examples:
        >>> flatten([1, 2, [1, [10]]])
        [1, 2, 1, 10]

    Args:
        l: Union[l, Union[int, List[int]]

    Returns:
        Flatted list of integer
    """
    return [int(i.strip('[ ]')) for i in str(l).split(',')]