标签归档:multidimensional-array

如何在Python中初始化二维数组?

问题:如何在Python中初始化二维数组?

我开始使用python,并尝试使用一个二维列表,最初我在每个地方都填充了相同的变量。我想出了这个:

def initialize_twodlist(foo):
    twod_list = []
    new = []
    for i in range (0, 10):
        for j in range (0, 10):
            new.append(foo)
        twod_list.append(new)
        new = []

它给出了预期的结果,但是感觉像是一种解决方法。有没有更简单/更短/更优雅的方式来做到这一点?

I’m beginning python and I’m trying to use a two-dimensional list, that I initially fill up with the same variable in every place. I came up with this:

def initialize_twodlist(foo):
    twod_list = []
    new = []
    for i in range (0, 10):
        for j in range (0, 10):
            new.append(foo)
        twod_list.append(new)
        new = []

It gives the desired result, but feels like a workaround. Is there an easier/shorter/more elegant way to do this?


回答 0

Python中经常出现的一种模式是

bar = []
for item in some_iterable:
    bar.append(SOME EXPRESSION)

这有助于激励列表理解的引入,从而将代码段转换为

bar = [SOME EXPRESSION for item in some_iterable]

它更短,有时更清晰。通常,您养成识别这些习惯的习惯,并经常用理解代替循环。

您的代码两次遵循此模式

twod_list = []                                       \                      
for i in range (0, 10):                               \
    new = []                  \ can be replaced        } this too
    for j in range (0, 10):    } with a list          /
        new.append(foo)       / comprehension        /
    twod_list.append(new)                           /

A pattern that often came up in Python was

bar = []
for item in some_iterable:
    bar.append(SOME EXPRESSION)

which helped motivate the introduction of list comprehensions, which convert that snippet to

bar = [SOME EXPRESSION for item in some_iterable]

which is shorter and sometimes clearer. Usually you get in the habit of recognizing these and often replacing loops with comprehensions.

Your code follows this pattern twice

twod_list = []                                       \                      
for i in range (0, 10):                               \
    new = []                  \ can be replaced        } this too
    for j in range (0, 10):    } with a list          /
        new.append(foo)       / comprehension        /
    twod_list.append(new)                           /

回答 1

您可以使用列表推导

x = [[foo for i in range(10)] for j in range(10)]
# x is now a 10x10 array of 'foo' (which can depend on i and j if you want)

You can use a list comprehension:

x = [[foo for i in range(10)] for j in range(10)]
# x is now a 10x10 array of 'foo' (which can depend on i and j if you want)

回答 2

这种方法比嵌套列表理解要快

[x[:] for x in [[foo] * 10] * 10]    # for immutable foo!

这是一些python3时序,适用于大小清单

$python3 -m timeit '[x[:] for x in [[1] * 10] * 10]'
1000000 loops, best of 3: 1.55 usec per loop

$ python3 -m timeit '[[1 for i in range(10)] for j in range(10)]'
100000 loops, best of 3: 6.44 usec per loop

$ python3 -m timeit '[x[:] for x in [[1] * 1000] * 1000]'
100 loops, best of 3: 5.5 msec per loop

$ python3 -m timeit '[[1 for i in range(1000)] for j in range(1000)]'
10 loops, best of 3: 27 msec per loop

说明:

[[foo]*10]*10创建重复10次的同一对象的列表。您不能只使用它,因为修改一个元素会修改每一行中的相同元素!

x[:]等效于,list(X)但效率更高,因为它避免了名称查找。无论哪种方式,它都会为每行创建一个浅表副本,因此现在所有元素都是独立的。

foo尽管所有元素都是同一个对象,所以如果foo是mutable,则不能使用此方案。必须使用

import copy
[[copy.deepcopy(foo) for x in range(10)] for y in range(10)]

或假设某个类(或函数)Foo返回foos

[[Foo() for x in range(10)] for y in range(10)]

This way is faster than the nested list comprehensions

[x[:] for x in [[foo] * 10] * 10]    # for immutable foo!

Here are some python3 timings, for small and large lists

$python3 -m timeit '[x[:] for x in [[1] * 10] * 10]'
1000000 loops, best of 3: 1.55 usec per loop

$ python3 -m timeit '[[1 for i in range(10)] for j in range(10)]'
100000 loops, best of 3: 6.44 usec per loop

$ python3 -m timeit '[x[:] for x in [[1] * 1000] * 1000]'
100 loops, best of 3: 5.5 msec per loop

$ python3 -m timeit '[[1 for i in range(1000)] for j in range(1000)]'
10 loops, best of 3: 27 msec per loop

Explanation:

[[foo]*10]*10 creates a list of the same object repeated 10 times. You can’t just use this, because modifying one element will modify that same element in each row!

x[:] is equivalent to list(X) but is a bit more efficient since it avoids the name lookup. Either way, it creates a shallow copy of each row, so now all the elements are independent.

All the elements are the same foo object though, so if foo is mutable, you can’t use this scheme., you’d have to use

import copy
[[copy.deepcopy(foo) for x in range(10)] for y in range(10)]

or assuming a class (or function) Foo that returns foos

[[Foo() for x in range(10)] for y in range(10)]

回答 3

不要使用[[v]*n]*n,这是一个陷阱!

>>> a = [[0]*3]*3
>>> a
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> a[0][0]=1
>>> a
[[1, 0, 0], [1, 0, 0], [1, 0, 0]]

    t = [ [0]*3 for i in range(3)]

效果很好。

Don’t use [[v]*n]*n, it is a trap!

>>> a = [[0]*3]*3
>>> a
[[0, 0, 0], [0, 0, 0], [0, 0, 0]]
>>> a[0][0]=1
>>> a
[[1, 0, 0], [1, 0, 0], [1, 0, 0]]

but

    t = [ [0]*3 for i in range(3)]

works great.


回答 4

在Python中初始化二维数组:

a = [[0 for x in range(columns)] for y in range(rows)]

To initialize a two-dimensional array in Python:

a = [[0 for x in range(columns)] for y in range(rows)]

回答 5

[[foo for x in xrange(10)] for y in xrange(10)]
[[foo for x in xrange(10)] for y in xrange(10)]

回答 6

通常,当您需要多维数组时,您不需要列表列表,而是想要一个numpy数组或一个dict。

例如,使用numpy,您将执行以下操作

import numpy
a = numpy.empty((10, 10))
a.fill(foo)

Usually when you want multidimensional arrays you don’t want a list of lists, but rather a numpy array or possibly a dict.

For example, with numpy you would do something like

import numpy
a = numpy.empty((10, 10))
a.fill(foo)

回答 7

您可以这样做:

[[element] * numcols] * numrows

例如:

>>> [['a'] *3] * 2
[['a', 'a', 'a'], ['a', 'a', 'a']]

但这有不希望的副作用:

>>> b = [['a']*3]*3
>>> b
[['a', 'a', 'a'], ['a', 'a', 'a'], ['a', 'a', 'a']]
>>> b[1][1]
'a'
>>> b[1][1] = 'b'
>>> b
[['a', 'b', 'a'], ['a', 'b', 'a'], ['a', 'b', 'a']]

You can do just this:

[[element] * numcols] * numrows

For example:

>>> [['a'] *3] * 2
[['a', 'a', 'a'], ['a', 'a', 'a']]

But this has a undesired side effect:

>>> b = [['a']*3]*3
>>> b
[['a', 'a', 'a'], ['a', 'a', 'a'], ['a', 'a', 'a']]
>>> b[1][1]
'a'
>>> b[1][1] = 'b'
>>> b
[['a', 'b', 'a'], ['a', 'b', 'a'], ['a', 'b', 'a']]

回答 8

twod_list = [[foo for _ in range(m)] for _ in range(n)]

因为n是行数,m是列数,foo是值。

twod_list = [[foo for _ in range(m)] for _ in range(n)]

for n is number of rows, and m is the number of column, and foo is the value.


回答 9

如果它是一个人口稀少的数组,那么最好使用以元组为键的字典:

dict = {}
key = (a,b)
dict[key] = value
...

If it’s a sparsely-populated array, you might be better off using a dictionary keyed with a tuple:

dict = {}
key = (a,b)
dict[key] = value
...

回答 10

t = [ [0]*10 for i in [0]*10]

为每个元素[0]*10创建一个新的..

t = [ [0]*10 for i in [0]*10]

for each element a new [0]*10 will be created ..


回答 11

错误的方法:[[None * m] * n]

>>> m, n = map(int, raw_input().split())
5 5
>>> x[0][0] = 34
>>> x
[[34, None, None, None, None], [34, None, None, None, None], [34, None, None, None, None], [34, None, None, None, None], [34, None, None, None, None]]
>>> id(x[0][0])
140416461589776
>>> id(x[3][0])
140416461589776

使用这种方法,python不允许为外部列创建不同的地址空间,并且会导致各种异常行为,超出您的预期。

正确的方法,但有以下exceptions:

y = [[0 for i in range(m)] for j in range(n)]
>>> id(y[0][0]) == id(y[1][0])
False

这是一个很好的方法,但是如果将默认值设置为 None

>>> r = [[None for i in range(5)] for j in range(5)]
>>> r
[[None, None, None, None, None], [None, None, None, None, None], [None, None, None, None, None], [None, None, None, None, None], [None, None, None, None, None]]
>>> id(r[0][0]) == id(r[2][0])
True

因此,使用此方法正确设置默认值。

绝对正确:

按照迈克对double loop的答复。

Incorrect Approach: [[None*m]*n]

>>> m, n = map(int, raw_input().split())
5 5
>>> x[0][0] = 34
>>> x
[[34, None, None, None, None], [34, None, None, None, None], [34, None, None, None, None], [34, None, None, None, None], [34, None, None, None, None]]
>>> id(x[0][0])
140416461589776
>>> id(x[3][0])
140416461589776

With this approach, python does not allow creating different address space for the outer columns and will lead to various misbehaviour than your expectation.

Correct Approach but with exception:

y = [[0 for i in range(m)] for j in range(n)]
>>> id(y[0][0]) == id(y[1][0])
False

It is good approach but there is exception if you set default value to None

>>> r = [[None for i in range(5)] for j in range(5)]
>>> r
[[None, None, None, None, None], [None, None, None, None, None], [None, None, None, None, None], [None, None, None, None, None], [None, None, None, None, None]]
>>> id(r[0][0]) == id(r[2][0])
True

So set your default value properly using this approach.

Absolute correct:

Follow the mike’s reply of double loop.


回答 12

Matrix={}
for i in range(0,3):
  for j in range(0,3):
    Matrix[i,j] = raw_input("Enter the matrix:")
Matrix={}
for i in range(0,3):
  for j in range(0,3):
    Matrix[i,j] = raw_input("Enter the matrix:")

回答 13

要初始化二维数组,请使用: arr = [[]*m for i in range(n)]

实际上, arr = [[]*m]*n将创建一个2D数组,其中所有n个数组都指向同一数组,因此任何元素中值的任何变化都将反映在所有n个列表中

有关更多详细说明,请访问:https : //www.geeksforgeeks.org/python-using-2d-arrays-lists-the-right-way/

To initialize a 2-dimensional array use: arr = [[]*m for i in range(n)]

actually, arr = [[]*m]*n will create a 2D array in which all n arrays will point to same array, so any change in value in any element will be reflected in all n lists

for more further explanation visit : https://www.geeksforgeeks.org/python-using-2d-arrays-lists-the-right-way/


回答 14

使用最简单的思想来创建这个。

wtod_list = []

并添加大小:

wtod_list = [[0 for x in xrange(10)] for x in xrange(10)]

或者如果我们想先声明尺寸。我们只使用:

   wtod_list = [[0 for x in xrange(10)] for x in xrange(10)]

use the simplest think to create this.

wtod_list = []

and add the size:

wtod_list = [[0 for x in xrange(10)] for x in xrange(10)]

or if we want to declare the size firstly. we only use:

   wtod_list = [[0 for x in xrange(10)] for x in xrange(10)]

回答 15

正如@Arnab和@Mike指出的那样,数组不是列表。几乎没有什么不同:1)数组在初始化期间固定大小; 2)数组通常支持的操作少于列表。

在大多数情况下,这可能是一个过大的杀伤力,但这是一个基本的2d数组实现,它利用python ctypes(c库)来利用硬件数组实现

import ctypes
class Array:
    def __init__(self,size,foo): #foo is the initial value
        self._size = size
        ArrayType = ctypes.py_object * size
        self._array = ArrayType()
        for i in range(size):
            self._array[i] = foo
    def __getitem__(self,index):
        return self._array[index]
    def __setitem__(self,index,value):
        self._array[index] = value
    def __len__(self):
        return self._size

class TwoDArray:
    def __init__(self,columns,rows,foo):
        self._2dArray = Array(rows,foo)
        for i in range(rows):
            self._2dArray[i] = Array(columns,foo)

    def numRows(self):
        return len(self._2dArray)
    def numCols(self):
        return len((self._2dArray)[0])
    def __getitem__(self,indexTuple):
        row = indexTuple[0]
        col = indexTuple[1]
        assert row >= 0 and row < self.numRows() \
               and col >=0 and col < self.numCols(),\
               "Array script out of range"
        return ((self._2dArray)[row])[col]

if(__name__ == "__main__"):
    twodArray = TwoDArray(4,5,5)#sample input
    print(twodArray[2,3])

As @Arnab and @Mike pointed out, an array is not a list. Few differences are 1) arrays are fixed size during initialization 2) arrays normally support lesser operations than a list.

Maybe an overkill in most cases, but here is a basic 2d array implementation that leverages hardware array implementation using python ctypes(c libraries)

import ctypes
class Array:
    def __init__(self,size,foo): #foo is the initial value
        self._size = size
        ArrayType = ctypes.py_object * size
        self._array = ArrayType()
        for i in range(size):
            self._array[i] = foo
    def __getitem__(self,index):
        return self._array[index]
    def __setitem__(self,index,value):
        self._array[index] = value
    def __len__(self):
        return self._size

class TwoDArray:
    def __init__(self,columns,rows,foo):
        self._2dArray = Array(rows,foo)
        for i in range(rows):
            self._2dArray[i] = Array(columns,foo)

    def numRows(self):
        return len(self._2dArray)
    def numCols(self):
        return len((self._2dArray)[0])
    def __getitem__(self,indexTuple):
        row = indexTuple[0]
        col = indexTuple[1]
        assert row >= 0 and row < self.numRows() \
               and col >=0 and col < self.numCols(),\
               "Array script out of range"
        return ((self._2dArray)[row])[col]

if(__name__ == "__main__"):
    twodArray = TwoDArray(4,5,5)#sample input
    print(twodArray[2,3])

回答 16

我了解的重要一点是:在初始化数组(任何维度)时,我们应该为数组的所有位置提供默认值。然后,仅初始化完成。之后,我们可以更改或接收新值到数组的任何位置。下面的代码非常适合我

N=7
F=2

#INITIALIZATION of 7 x 2 array with deafult value as 0
ar=[[0]*F for x in range(N)]

#RECEIVING NEW VALUES TO THE INITIALIZED ARRAY
for i in range(N):
    for j in range(F):
        ar[i][j]=int(input())
print(ar)

The important thing I understood is: While initializing an array(in any dimension) We should give a default value to all the positions of array. Then only initialization completes. After that, we can change or receive new values to any position of the array. The below code worked for me perfectly

N=7
F=2

#INITIALIZATION of 7 x 2 array with deafult value as 0
ar=[[0]*F for x in range(N)]

#RECEIVING NEW VALUES TO THE INITIALIZED ARRAY
for i in range(N):
    for j in range(F):
        ar[i][j]=int(input())
print(ar)


回答 17

如果使用numpy,则可以轻松创建2d数组:

import numpy as np

row = 3
col = 5
num = 10
x = np.full((row, col), num)

X

array([[10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10]])

If you use numpy, you can easily create 2d arrays:

import numpy as np

row = 3
col = 5
num = 10
x = np.full((row, col), num)

x

array([[10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10],
       [10, 10, 10, 10, 10]])

回答 18

row=5
col=5
[[x]*col for x in [b for b in range(row)]]

以上将为您提供5×5 2D阵列

[[0, 0, 0, 0, 0],
 [1, 1, 1, 1, 1],
 [2, 2, 2, 2, 2],
 [3, 3, 3, 3, 3],
 [4, 4, 4, 4, 4]]

它使用嵌套列表推导。细分如下:

[[x]*col for x in [b for b in range(row)]]

[x] * col->
在-> x中对x 求值的最终表达式将是迭代器提供的值
[b表示range(row)中b的值]]->迭代器。

[b代表b在range(row)中的b]],因为row = 5,
所以其计算结果为[0,1,2,3,4],因此现在简化为

[[x]*col for x in [0,1,2,3,4]]

对于[0,1,2,3,4]中的x,这将计算为[[0] * 5]-> x = 0第一次迭代
[对于[ 0,1,2,3 中的x,[1] * 5, 3,4]]-> x = 1,第2次迭代
[[2] * 5 for x in [0,1,2,3,4]]-> x = 2,第3次迭代
[[3] * 5对于[0,1,2,3,4]中的x – x等于3 =第4次迭代
[[4] * 5对于[0,1,2,3,4]中的x [x] = 4第5次迭代

row=5
col=5
[[x]*col for x in [b for b in range(row)]]

The above will give you a 5×5 2D array

[[0, 0, 0, 0, 0],
 [1, 1, 1, 1, 1],
 [2, 2, 2, 2, 2],
 [3, 3, 3, 3, 3],
 [4, 4, 4, 4, 4]]

It is using nested list comprehension. Breakdown as below:

[[x]*col for x in [b for b in range(row)]]

[x]*col –> final expression that is evaluated
for x in –> x will be the value provided by the iterator
[b for b in range(row)]] –> Iterator.

[b for b in range(row)]] this will evaluate to [0,1,2,3,4] since row=5
so now it simplifies to

[[x]*col for x in [0,1,2,3,4]]

This will evaluate to [[0]*5 for x in [0,1,2,3,4]] –> with x=0 1st iteration
[[1]*5 for x in [0,1,2,3,4]] –> with x=1 2nd iteration
[[2]*5 for x in [0,1,2,3,4]] –> with x=2 3rd iteration
[[3]*5 for x in [0,1,2,3,4]] –> with x=3 4th iteration
[[4]*5 for x in [0,1,2,3,4]] –> with x=4 5th iteration


回答 19

这是我在不使用其他库的情况下教新程序员的最佳方法。我想要更好的东西。

def initialize_twodlist(value):
    list=[]
    for row in range(10):
        list.append([value]*10)
    return list

This is the best I’ve found for teaching new programmers, and without using additional libraries. I’d like something better though.

def initialize_twodlist(value):
    list=[]
    for row in range(10):
        list.append([value]*10)
    return list

回答 20

这是一个更简单的方法:

import numpy as np
twoD = np.array([[]*m]*n)

要使用任何“ x”值初始化所有单元格,请使用:

twoD = np.array([[x]*m]*n

Here is an easier way :

import numpy as np
twoD = np.array([[]*m]*n)

For initializing all cells with any ‘x’ value use :

twoD = np.array([[x]*m]*n

回答 21

我经常使用这种方法来初始化二维数组

n=[[int(x) for x in input().split()] for i in range(int(input())]

Often I use this approach for initializing a 2-dimensional array

n=[[int(x) for x in input().split()] for i in range(int(input())]


回答 22

可以从以下系列中得出添加尺寸的一般模式:

x = 0
mat1 = []
for i in range(3):
    mat1.append(x)
    x+=1
print(mat1)


x=0
mat2 = []
for i in range(3):
    tmp = []
    for j in range(4):
        tmp.append(x)
        x+=1
    mat2.append(tmp)

print(mat2)


x=0
mat3 = []
for i in range(3):
    tmp = []
    for j in range(4):
        tmp2 = []
        for k in range(5):
            tmp2.append(x)
            x+=1
        tmp.append(tmp2)
    mat3.append(tmp)

print(mat3)

The general pattern to add dimensions could be drawn from this series:

x = 0
mat1 = []
for i in range(3):
    mat1.append(x)
    x+=1
print(mat1)


x=0
mat2 = []
for i in range(3):
    tmp = []
    for j in range(4):
        tmp.append(x)
        x+=1
    mat2.append(tmp)

print(mat2)


x=0
mat3 = []
for i in range(3):
    tmp = []
    for j in range(4):
        tmp2 = []
        for k in range(5):
            tmp2.append(x)
            x+=1
        tmp.append(tmp2)
    mat3.append(tmp)

print(mat3)

回答 23

您可以尝试使用此[[0] * 10] * 10。这将返回由10行10列组成的二维数组,每个单元格的值为0。

You can try this [[0]*10]*10. This will return the 2d array of 10 rows and 10 columns with value 0 for each cell.


回答 24

lst = [[0] * m对于范围(n)中的i]

初始化所有矩阵n =行和m =列

lst=[[0]*m for i in range(n)]

initialize all matrix n=rows and m=columns


回答 25

另一种方法是使用字典来保存二维数组。

twoD = {}
twoD[0,0] = 0
print(twoD[0,0]) # ===> prints 0

这仅可以保存任何1D,2D值,并将其初始化为0任何其他int值,请使用collections

import collections
twoD = collections.defaultdict(int)
print(twoD[0,0]) # ==> prints 0
twoD[1,1] = 1
print(twoD[1,1]) # ==> prints 1

Another way is to use a dictionary to hold a two-dimensional array.

twoD = {}
twoD[0,0] = 0
print(twoD[0,0]) # ===> prints 0

This just can hold any 1D, 2D values and to initialize this to 0 or any other int value, use collections.

import collections
twoD = collections.defaultdict(int)
print(twoD[0,0]) # ==> prints 0
twoD[1,1] = 1
print(twoD[1,1]) # ==> prints 1

回答 26

码:

num_rows, num_cols = 4, 2
initial_val = 0
matrix = [[initial_val] * num_cols for _ in range(num_rows)]
print(matrix) 
# [[0, 0], [0, 0], [0, 0], [0, 0]]

initial_val 必须是不变的。

Code:

num_rows, num_cols = 4, 2
initial_val = 0
matrix = [[initial_val] * num_cols for _ in range(num_rows)]
print(matrix) 
# [[0, 0], [0, 0], [0, 0], [0, 0]]

initial_val must be immutable.


回答 27

from random import randint
l = []

for i in range(10):
    k=[]
    for j in range(10):
        a= randint(1,100)
        k.append(a)

    l.append(k)




print(l)
print(max(l[2]))

b = []
for i in range(10):
    a = l[i][5]
    b.append(a)

print(min(b))
from random import randint
l = []

for i in range(10):
    k=[]
    for j in range(10):
        a= randint(1,100)
        k.append(a)

    l.append(k)




print(l)
print(max(l[2]))

b = []
for i in range(10):
    a = l[i][5]
    b.append(a)

print(min(b))

numpy.array形状(R,1)和(R,)之间的区别

问题:numpy.array形状(R,1)和(R,)之间的区别

进入时numpy,一些操作恢复了形状,(R, 1)但有些恢复了(R,)。由于reshape需要显式运算,因此这将使矩阵乘法更加乏味。例如,给定矩阵M,如果我们想在numpy.dot(M[:,0], numpy.ones((1, R)))哪里做R行数(当然,同样的问题也会逐列出现)。我们会得到matrices are not aligned错误,因为M[:,0]是在外形(R,),但numpy.ones((1, R))在形状(1, R)

所以我的问题是:

  1. 什么形状之间的差异(R, 1)(R,)。我从字面上知道它是数字列表和列表列表,其中所有列表仅包含一个数字。只是想知道为什么不设计numpy使其偏爱形状(R, 1)而不是(R,)更容易进行矩阵乘法。

  2. 以上示例是否有更好的方法?无需像这样显式重塑:numpy.dot(M[:,0].reshape(R, 1), numpy.ones((1, R)))

In numpy, some of the operations return in shape (R, 1) but some return (R,). This will make matrix multiplication more tedious since explicit reshape is required. For example, given a matrix M, if we want to do numpy.dot(M[:,0], numpy.ones((1, R))) where R is the number of rows (of course, the same issue also occurs column-wise). We will get matrices are not aligned error since M[:,0] is in shape (R,) but numpy.ones((1, R)) is in shape (1, R).

So my questions are:

  1. What’s the difference between shape (R, 1) and (R,). I know literally it’s list of numbers and list of lists where all list contains only a number. Just wondering why not design numpy so that it favors shape (R, 1) instead of (R,) for easier matrix multiplication.

  2. Are there better ways for the above example? Without explicitly reshape like this: numpy.dot(M[:,0].reshape(R, 1), numpy.ones((1, R)))


回答 0

1. NumPy中形状的含义

您写道:“我从字面上知道这是一个数字列表和一个列表列表,其中所有列表都只包含一个数字”,但这是一种无益的思考方式。

考虑NumPy数组的最佳方法是它们由两部分组成,一个数据缓冲区只是一个原始元素块,另一个视图描述了如何解释数据缓冲区。

例如,如果我们创建一个包含12个整数的数组:

>>> a = numpy.arange(12)
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

然后a由一个数据缓冲区组成,排列如下:

┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
  0   1   2   3   4   5   6   7   8   9  10  11 
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

还有一个描述如何解释数据的视图:

>>> a.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> a.dtype
dtype('int64')
>>> a.itemsize
8
>>> a.strides
(8,)
>>> a.shape
(12,)

这里的形状 (12,)表示该数组由一个从0到11的单个索引建立索引。从概念上讲,如果我们标记此单个索引i,则该数组a如下所示:

i= 0    1    2    3    4    5    6    7    8    9   10   11
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
  0   1   2   3   4   5   6   7   8   9  10  11 
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

如果我们调整数组的形状,则不会更改数据缓冲区。相反,它创建一个新视图,该视图描述了另一种解释数据的方式。所以之后:

>>> b = a.reshape((3, 4))

该数组b具有与相同的数据缓冲区a,但是现在它由两个索引分别从0到2和0到3进行索引。如果我们标记两个索引ij,则数组b如下所示:

i= 0    0    0    0    1    1    1    1    2    2    2    2
j= 0    1    2    3    0    1    2    3    0    1    2    3
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
  0   1   2   3   4   5   6   7   8   9  10  11 
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

意思就是:

>>> b[2,1]
9

您可以看到第二个索引变化很快,而第一个索引变化缓慢。如果您不希望这样做,可以指定order参数:

>>> c = a.reshape((3, 4), order='F')

这将导致数组的索引如下:

i= 0    1    2    0    1    2    0    1    2    0    1    2
j= 0    0    0    1    1    1    2    2    2    3    3    3
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
  0   1   2   3   4   5   6   7   8   9  10  11 
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

意思就是:

>>> c[2,1]
5

现在应该清楚一个数组具有一个或多个尺寸为1的尺寸的形状的含义。

>>> d = a.reshape((12, 1))

数组d由两个索引索引,第一个索引的范围是0到11,第二个索引始终是0:

i= 0    1    2    3    4    5    6    7    8    9   10   11
j= 0    0    0    0    0    0    0    0    0    0    0    0
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
  0   1   2   3   4   5   6   7   8   9  10  11 
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

所以:

>>> d[10,0]
10

长度为1的尺寸是“自由的”(在某种意义上),因此没有什么可以阻止您进入城镇:

>>> e = a.reshape((1, 2, 1, 6, 1))

给出一个索引如下的数组:

i= 0    0    0    0    0    0    0    0    0    0    0    0
j= 0    0    0    0    0    0    1    1    1    1    1    1
k= 0    0    0    0    0    0    0    0    0    0    0    0
l= 0    1    2    3    4    5    0    1    2    3    4    5
m= 0    0    0    0    0    0    0    0    0    0    0    0
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
  0   1   2   3   4   5   6   7   8   9  10  11 
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

所以:

>>> e[0,1,0,0,0]
6

有关如何实现数组的更多详细信息,请参见NumPy内部文档

2.怎么办?

由于numpy.reshape只是创建了一个新视图,因此不必在必要时使用它。当您想以其他方式索引数组时,它是使用的正确工具。

但是,在较长的计算中,通常可能首先要安排构造具有“正确”形状的数组,这样就可以最大程度地减少变形和转置的次数。但是,在没有看到导致需要重塑的实际环境的情况下,很难说应该改变什么。

您问题中的示例是:

numpy.dot(M[:,0], numpy.ones((1, R)))

但这是不现实的。首先,此表达式:

M[:,0].sum()

计算结果更简单。第二,第0列真的有什么特别之处吗?也许您实际需要的是:

M.sum(axis=0)

1. The meaning of shapes in NumPy

You write, “I know literally it’s list of numbers and list of lists where all list contains only a number” but that’s a bit of an unhelpful way to think about it.

The best way to think about NumPy arrays is that they consist of two parts, a data buffer which is just a block of raw elements, and a view which describes how to interpret the data buffer.

For example, if we create an array of 12 integers:

>>> a = numpy.arange(12)
>>> a
array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

Then a consists of a data buffer, arranged something like this:

┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

and a view which describes how to interpret the data:

>>> a.flags
  C_CONTIGUOUS : True
  F_CONTIGUOUS : True
  OWNDATA : True
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
>>> a.dtype
dtype('int64')
>>> a.itemsize
8
>>> a.strides
(8,)
>>> a.shape
(12,)

Here the shape (12,) means the array is indexed by a single index which runs from 0 to 11. Conceptually, if we label this single index i, the array a looks like this:

i= 0    1    2    3    4    5    6    7    8    9   10   11
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

If we reshape an array, this doesn’t change the data buffer. Instead, it creates a new view that describes a different way to interpret the data. So after:

>>> b = a.reshape((3, 4))

the array b has the same data buffer as a, but now it is indexed by two indices which run from 0 to 2 and 0 to 3 respectively. If we label the two indices i and j, the array b looks like this:

i= 0    0    0    0    1    1    1    1    2    2    2    2
j= 0    1    2    3    0    1    2    3    0    1    2    3
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

which means that:

>>> b[2,1]
9

You can see that the second index changes quickly and the first index changes slowly. If you prefer this to be the other way round, you can specify the order parameter:

>>> c = a.reshape((3, 4), order='F')

which results in an array indexed like this:

i= 0    1    2    0    1    2    0    1    2    0    1    2
j= 0    0    0    1    1    1    2    2    2    3    3    3
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

which means that:

>>> c[2,1]
5

It should now be clear what it means for an array to have a shape with one or more dimensions of size 1. After:

>>> d = a.reshape((12, 1))

the array d is indexed by two indices, the first of which runs from 0 to 11, and the second index is always 0:

i= 0    1    2    3    4    5    6    7    8    9   10   11
j= 0    0    0    0    0    0    0    0    0    0    0    0
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

and so:

>>> d[10,0]
10

A dimension of length 1 is “free” (in some sense), so there’s nothing stopping you from going to town:

>>> e = a.reshape((1, 2, 1, 6, 1))

giving an array indexed like this:

i= 0    0    0    0    0    0    0    0    0    0    0    0
j= 0    0    0    0    0    0    1    1    1    1    1    1
k= 0    0    0    0    0    0    0    0    0    0    0    0
l= 0    1    2    3    4    5    0    1    2    3    4    5
m= 0    0    0    0    0    0    0    0    0    0    0    0
┌────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┬────┐
│  0 │  1 │  2 │  3 │  4 │  5 │  6 │  7 │  8 │  9 │ 10 │ 11 │
└────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┴────┘

and so:

>>> e[0,1,0,0,0]
6

See the NumPy internals documentation for more details about how arrays are implemented.

2. What to do?

Since numpy.reshape just creates a new view, you shouldn’t be scared about using it whenever necessary. It’s the right tool to use when you want to index an array in a different way.

However, in a long computation it’s usually possible to arrange to construct arrays with the “right” shape in the first place, and so minimize the number of reshapes and transposes. But without seeing the actual context that led to the need for a reshape, it’s hard to say what should be changed.

The example in your question is:

numpy.dot(M[:,0], numpy.ones((1, R)))

but this is not realistic. First, this expression:

M[:,0].sum()

computes the result more simply. Second, is there really something special about column 0? Perhaps what you actually need is:

M.sum(axis=0)

回答 1

(R,)和之间的区别(1,R)实际上是您需要使用的索引数。 ones((1,R))是一个二维数组,碰巧只有一行。 ones(R)是一个向量。通常,如果变量的行数/列数不超过一个,则应该使用向量,而不是单维度的矩阵。

对于您的特定情况,有两种选择:

1)只需将第二个参数设为向量。以下工作正常:

    np.dot(M[:,0], np.ones(R))

2)如果您想要矩阵等矩阵运算,请使用类matrix代替ndarray。所有*矩阵都被强制为二维数组,并且运算符执行矩阵乘法而不是按元素进行乘法(因此您不需要点)。以我的经验,这是值得解决的麻烦,但是如果您习惯使用matlab可能会很好。

The difference between (R,) and (1,R) is literally the number of indices that you need to use. ones((1,R)) is a 2-D array that happens to have only one row. ones(R) is a vector. Generally if it doesn’t make sense for the variable to have more than one row/column, you should be using a vector, not a matrix with a singleton dimension.

For your specific case, there are a couple of options:

1) Just make the second argument a vector. The following works fine:

    np.dot(M[:,0], np.ones(R))

2) If you want matlab like matrix operations, use the class matrix instead of ndarray. All matricies are forced into being 2-D arrays, and operator * does matrix multiplication instead of element-wise (so you don’t need dot). In my experience, this is more trouble that it is worth, but it may be nice if you are used to matlab.


回答 2

形状是一个元组。如果只有一维,则形状将是一个数字,并且逗号后仅是空白。对于2维以上的尺寸,所有逗号后面都会有一个数字。

# 1 dimension with 2 elements, shape = (2,). 
# Note there's nothing after the comma.
z=np.array([  # start dimension
    10,       # not a dimension
    20        # not a dimension
])            # end dimension
print(z.shape)

(2,)

# 2 dimensions, each with 1 element, shape = (2,1)
w=np.array([  # start outer dimension 
    [10],     # element is in an inner dimension
    [20]      # element is in an inner dimension
])            # end outer dimension
print(w.shape)

(2,1)

The shape is a tuple. If there is only 1 dimension the shape will be one number and just blank after a comma. For 2+ dimensions, there will be a number after all the commas.

# 1 dimension with 2 elements, shape = (2,). 
# Note there's nothing after the comma.
z=np.array([  # start dimension
    10,       # not a dimension
    20        # not a dimension
])            # end dimension
print(z.shape)

(2,)

# 2 dimensions, each with 1 element, shape = (2,1)
w=np.array([  # start outer dimension 
    [10],     # element is in an inner dimension
    [20]      # element is in an inner dimension
])            # end outer dimension
print(w.shape)

(2,1)


回答 3

对于其基本数组类,2d数组不比1d或3d数组更特殊。有一些操作可以保留尺寸,一些可以减小尺寸,其他可以组合甚至扩展尺寸。

M=np.arange(9).reshape(3,3)
M[:,0].shape # (3,) selects one column, returns a 1d array
M[0,:].shape # same, one row, 1d array
M[:,[0]].shape # (3,1), index with a list (or array), returns 2d
M[:,[0,1]].shape # (3,2)

In [20]: np.dot(M[:,0].reshape(3,1),np.ones((1,3)))

Out[20]: 
array([[ 0.,  0.,  0.],
       [ 3.,  3.,  3.],
       [ 6.,  6.,  6.]])

In [21]: np.dot(M[:,[0]],np.ones((1,3)))
Out[21]: 
array([[ 0.,  0.,  0.],
       [ 3.,  3.,  3.],
       [ 6.,  6.,  6.]])

其他给出相同数组的表达式

np.dot(M[:,0][:,np.newaxis],np.ones((1,3)))
np.dot(np.atleast_2d(M[:,0]).T,np.ones((1,3)))
np.einsum('i,j',M[:,0],np.ones((3)))
M1=M[:,0]; R=np.ones((3)); np.dot(M1[:,None], R[None,:])

MATLAB最初只是2D阵列。较新的版本允许更大的尺寸,但保留2的下限。但是,您仍然必须注意行矩阵和列1之间的差异,即形状为(1,3)v的列(3,1)。你多久写一次[1,2,3].'?我将要编写row vectorcolumn vector,但是受2d约束,MATLAB中没有任何矢量-至少从矢量的数学意义上讲不是1d。

您是否看过np.atleast_2d(还有_1d和_3d版本)?

For its base array class, 2d arrays are no more special than 1d or 3d ones. There are some operations the preserve the dimensions, some that reduce them, other combine or even expand them.

M=np.arange(9).reshape(3,3)
M[:,0].shape # (3,) selects one column, returns a 1d array
M[0,:].shape # same, one row, 1d array
M[:,[0]].shape # (3,1), index with a list (or array), returns 2d
M[:,[0,1]].shape # (3,2)

In [20]: np.dot(M[:,0].reshape(3,1),np.ones((1,3)))

Out[20]: 
array([[ 0.,  0.,  0.],
       [ 3.,  3.,  3.],
       [ 6.,  6.,  6.]])

In [21]: np.dot(M[:,[0]],np.ones((1,3)))
Out[21]: 
array([[ 0.,  0.,  0.],
       [ 3.,  3.,  3.],
       [ 6.,  6.,  6.]])

Other expressions that give the same array

np.dot(M[:,0][:,np.newaxis],np.ones((1,3)))
np.dot(np.atleast_2d(M[:,0]).T,np.ones((1,3)))
np.einsum('i,j',M[:,0],np.ones((3)))
M1=M[:,0]; R=np.ones((3)); np.dot(M1[:,None], R[None,:])

MATLAB started out with just 2D arrays. Newer versions allow more dimensions, but retain the lower bound of 2. But you still have to pay attention to the difference between a row matrix and column one, one with shape (1,3) v (3,1). How often have you written [1,2,3].'? I was going to write row vector and column vector, but with that 2d constraint, there aren’t any vectors in MATLAB – at least not in the mathematical sense of vector as being 1d.

Have you looked at np.atleast_2d (also _1d and _3d versions)?


回答 4

1)不喜欢的形状的原因(R, 1)(R,)在于,它不必要地复杂的事情。此外,为什么最好在(R, 1)长度R向量上默认使用形状而不是(1, R)?当您需要其他尺寸时,最好使其简单明了。

2)以您的示例为例,您正在计算外部产品,因此可以reshape通过使用np.outer以下命令来执行此操作而无需调用

np.outer(M[:,0], numpy.ones((1, R)))

1) The reason not to prefer a shape of (R, 1) over (R,) is that it unnecessarily complicates things. Besides, why would it be preferable to have shape (R, 1) by default for a length-R vector instead of (1, R)? It’s better to keep it simple and be explicit when you require additional dimensions.

2) For your example, you are computing an outer product so you can do this without a reshape call by using np.outer:

np.outer(M[:,0], numpy.ones((1, R)))

回答 5

这里已经有很多好的答案。但是对我来说,很难找到一些例子,其中形状或数组会破坏所有程序。

所以这是一个:

import numpy as np
a = np.array([1,2,3,4])
b = np.array([10,20,30,40])


from sklearn.linear_model import LinearRegression
regr = LinearRegression()
regr.fit(a,b)

这将因错误而失败:

ValueError:预期的2D数组,取而代之的是1D数组

但是如果我们添加reshapea

a = np.array([1,2,3,4]).reshape(-1,1)

这正常工作!

There are a lot of good answers here already. But for me it was hard to find some example, where the shape or array can break all the program.

So here is the one:

import numpy as np
a = np.array([1,2,3,4])
b = np.array([10,20,30,40])


from sklearn.linear_model import LinearRegression
regr = LinearRegression()
regr.fit(a,b)

This will fail with error:

ValueError: Expected 2D array, got 1D array instead

but if we add reshape to a:

a = np.array([1,2,3,4]).reshape(-1,1)

this works correctly!


Python / NumPy中的meshgrid的用途是什么?

问题:Python / NumPy中的meshgrid的用途是什么?

有人可以向我解释meshgridNumpy 中功能的目的是什么?我知道它会为绘图创建某种坐标网格,但是我真的看不到它的直接好处。

我正在研究Sebastian Raschka的“ Python机器学习”,他正在使用它来绘制决策边界。请参阅此处的输入11 。

我也从官方文档中尝试过此代码,但是再次,输出对我来说真的没有意义。

x = np.arange(-5, 5, 1)
y = np.arange(-5, 5, 1)
xx, yy = np.meshgrid(x, y, sparse=True)
z = np.sin(xx**2 + yy**2) / (xx**2 + yy**2)
h = plt.contourf(x,y,z)

请,如果可能的话,还请给我展示很多真实的例子。

Can someone explain to me what is the purpose of meshgrid function in Numpy? I know it creates some kind of grid of coordinates for plotting, but I can’t really see the direct benefit of it.

I am studying “Python Machine Learning” from Sebastian Raschka, and he is using it for plotting the decision borders. See input 11 here.

I have also tried this code from official documentation, but, again, the output doesn’t really make sense to me.

x = np.arange(-5, 5, 1)
y = np.arange(-5, 5, 1)
xx, yy = np.meshgrid(x, y, sparse=True)
z = np.sin(xx**2 + yy**2) / (xx**2 + yy**2)
h = plt.contourf(x,y,z)

Please, if possible, also show me a lot of real-world examples.


回答 0

目的meshgrid是根据x值数组和y值数组创建矩形网格。

因此,例如,如果我们要创建一个网格,在x和y方向上每个介于0和4之间的整数值处都有一个点。要创建矩形网格,我们需要xy点的每个组合。

这将是25分,对吧?因此,如果我们想为所有这些点创建一个x和y数组,则可以执行以下操作。

x[0,0] = 0    y[0,0] = 0
x[0,1] = 1    y[0,1] = 0
x[0,2] = 2    y[0,2] = 0
x[0,3] = 3    y[0,3] = 0
x[0,4] = 4    y[0,4] = 0
x[1,0] = 0    y[1,0] = 1
x[1,1] = 1    y[1,1] = 1
...
x[4,3] = 3    y[4,3] = 4
x[4,4] = 4    y[4,4] = 4

这将导致以下xy矩阵,使得每个矩阵中对应元素的配对给出网格中一个点的x和y坐标。

x =   0 1 2 3 4        y =   0 0 0 0 0
      0 1 2 3 4              1 1 1 1 1
      0 1 2 3 4              2 2 2 2 2
      0 1 2 3 4              3 3 3 3 3
      0 1 2 3 4              4 4 4 4 4

然后,我们可以绘制这些图形以验证它们是否为网格:

plt.plot(x,y, marker='.', color='k', linestyle='none')

在此处输入图片说明

显然,这特别繁琐,尤其对于x和的范围y。相反,meshgrid实际上可以为我们生成此代码:我们只需指定唯一值xy值即可。

xvalues = np.array([0, 1, 2, 3, 4]);
yvalues = np.array([0, 1, 2, 3, 4]);

现在,当我们调用时meshgrid,我们将自动获得先前的输出。

xx, yy = np.meshgrid(xvalues, yvalues)

plt.plot(xx, yy, marker='.', color='k', linestyle='none')

在此处输入图片说明

这些矩形网格的创建对于许多任务很有用。在您的帖子中提供的示例中,这只是一种sin(x**2 + y**2) / (x**2 + y**2)x和的值范围内对函数()进行采样的方法y

由于此函数已在矩形网格上采样,因此现在可以将其可视化为“图像”。

在此处输入图片说明

此外,现在可以将结果传递给期望在矩形网格上获得数据的函数(例如contourf

The purpose of meshgrid is to create a rectangular grid out of an array of x values and an array of y values.

So, for example, if we want to create a grid where we have a point at each integer value between 0 and 4 in both the x and y directions. To create a rectangular grid, we need every combination of the x and y points.

This is going to be 25 points, right? So if we wanted to create an x and y array for all of these points, we could do the following.

x[0,0] = 0    y[0,0] = 0
x[0,1] = 1    y[0,1] = 0
x[0,2] = 2    y[0,2] = 0
x[0,3] = 3    y[0,3] = 0
x[0,4] = 4    y[0,4] = 0
x[1,0] = 0    y[1,0] = 1
x[1,1] = 1    y[1,1] = 1
...
x[4,3] = 3    y[4,3] = 4
x[4,4] = 4    y[4,4] = 4

This would result in the following x and y matrices, such that the pairing of the corresponding element in each matrix gives the x and y coordinates of a point in the grid.

x =   0 1 2 3 4        y =   0 0 0 0 0
      0 1 2 3 4              1 1 1 1 1
      0 1 2 3 4              2 2 2 2 2
      0 1 2 3 4              3 3 3 3 3
      0 1 2 3 4              4 4 4 4 4

We can then plot these to verify that they are a grid:

plt.plot(x,y, marker='.', color='k', linestyle='none')

enter image description here

Obviously, this gets very tedious especially for large ranges of x and y. Instead, meshgrid can actually generate this for us: all we have to specify are the unique x and y values.

xvalues = np.array([0, 1, 2, 3, 4]);
yvalues = np.array([0, 1, 2, 3, 4]);

Now, when we call meshgrid, we get the previous output automatically.

xx, yy = np.meshgrid(xvalues, yvalues)

plt.plot(xx, yy, marker='.', color='k', linestyle='none')

enter image description here

Creation of these rectangular grids is useful for a number of tasks. In the example that you have provided in your post, it is simply a way to sample a function (sin(x**2 + y**2) / (x**2 + y**2)) over a range of values for x and y.

Because this function has been sampled on a rectangular grid, the function can now be visualized as an “image”.

enter image description here

Additionally, the result can now be passed to functions which expect data on rectangular grid (i.e. contourf)


回答 1

由Microsoft Excel提供: 

在此处输入图片说明

Courtesy of Microsoft Excel: 

enter image description here


回答 2

实际上np.meshgrid,文档中已经提到的目的:

np.meshgrid

从坐标向量返回坐标矩阵。

给定一维坐标数组x1,x2,…,xn,制作ND坐标数组以对ND网格上的ND标量/矢量场进行矢量化评估。

因此,其主要目的是创建坐标矩阵。

您可能只是问自己:

为什么我们需要创建坐标矩阵?

使用Python / NumPy需要坐标矩阵的原因是,从坐标到值没有直接关系,除非坐标从零开始并且是纯正整数。然后,您可以只使用数组的索引作为索引。但是,如果不是这种情况,则您需要以某种方式将坐标存储在数据旁边。那就是网格进来的地方。

假设您的数据是:

1  2  1
2  5  2
1  2  1

但是,每个值代表一个水平2公里,垂直3公里的区域。假设您的原点是左上角,并且您想要一个表示可以使用的距离的数组:

import numpy as np
h, v = np.meshgrid(np.arange(3)*3, np.arange(3)*2)

其中v是:

array([[0, 0, 0],
       [2, 2, 2],
       [4, 4, 4]])

和h:

array([[0, 3, 6],
       [0, 3, 6],
       [0, 3, 6]])

所以,如果你有两个指标,比方说xy(这就是为什么的返回值meshgrid通常是xxxs,而不是x在这种情况下,我选择h了水平!),那么你可以得到该点的x坐标,在Y点和坐标通过使用以下方法在那时的价值:

h[x, y]    # horizontal coordinate
v[x, y]    # vertical coordinate
data[x, y]  # value

这样可以更轻松地跟踪坐标(更重要的是)您可以将其传递给需要知道坐标的函数。

稍长的解释

但是,np.meshgrid本身并不经常直接使用,大多数人只是使用类似对象np.mgrid或中的一种np.ogrid。这里np.mgrid代表sparse=Falsenp.ogridsparse=True情况(我指的是的sparse参数np.meshgrid)。请注意,np.meshgridnp.ogrid和之间有显着差异 np.mgrid:返回的前两个值(如果有两个或多个)将颠倒。通常这无关紧要,但是您应该根据上下文提供有意义的变量名称。

例如,在2D网格的情况下,matplotlib.pyplot.imshow命名第一个返回的项np.meshgrid x和第二个返回项是有意义的,y而对于np.mgrid和则相反np.ogrid

np.ogrid 和稀疏的网格

>>> import numpy as np
>>> yy, xx = np.ogrid[-5:6, -5:6]
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5],
       [-4],
       [-3],
       [-2],
       [-1],
       [ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5]])

正如已经说过的,与相比np.meshgrid,输出是相反的,这就是为什么我解压缩yy, xx而不是xx, yy

>>> xx, yy = np.meshgrid(np.arange(-5, 6), np.arange(-5, 6), sparse=True)
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5],
       [-4],
       [-3],
       [-2],
       [-1],
       [ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5]])

这已经看起来像座标,特别是2D图的x和y线。

可视化:

yy, xx = np.ogrid[-5:6, -5:6]
plt.figure()
plt.title('ogrid (sparse meshgrid)')
plt.grid()
plt.xticks(xx.ravel())
plt.yticks(yy.ravel())
plt.scatter(xx, np.zeros_like(xx), color="blue", marker="*")
plt.scatter(np.zeros_like(yy), yy, color="red", marker="x")

在此处输入图片说明

np.mgrid 和密集/充实的网格

>>> yy, xx = np.mgrid[-5:6, -5:6]
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5],
       [-4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4],
       [-3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3],
       [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
       [ 2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
       [ 3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3],
       [ 4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5]])

此处同样适用:与相比,输出反转np.meshgrid

>>> xx, yy = np.meshgrid(np.arange(-5, 6), np.arange(-5, 6))
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5],
       [-4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4],
       [-3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3],
       [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
       [ 2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
       [ 3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3],
       [ 4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5]])

ogrid这些数组不同的是,它们在-5 <= xx <= 5中包含所有 xxyy坐标;-5 <= yy <= 5格。

yy, xx = np.mgrid[-5:6, -5:6]
plt.figure()
plt.title('mgrid (dense meshgrid)')
plt.grid()
plt.xticks(xx[0])
plt.yticks(yy[:, 0])
plt.scatter(xx, yy, color="red", marker="x")

在此处输入图片说明

功能性

它不仅限于二维,而且这些函数适用于任意尺寸(嗯,Python中为函数提供了最大数量的参数,而NumPy允许最大数量的尺寸):

>>> x1, x2, x3, x4 = np.ogrid[:3, 1:4, 2:5, 3:6]
>>> for i, x in enumerate([x1, x2, x3, x4]):
...     print('x{}'.format(i+1))
...     print(repr(x))
x1
array([[[[0]]],


       [[[1]]],


       [[[2]]]])
x2
array([[[[1]],

        [[2]],

        [[3]]]])
x3
array([[[[2],
         [3],
         [4]]]])
x4
array([[[[3, 4, 5]]]])

>>> # equivalent meshgrid output, note how the first two arguments are reversed and the unpacking
>>> x2, x1, x3, x4 = np.meshgrid(np.arange(1,4), np.arange(3), np.arange(2, 5), np.arange(3, 6), sparse=True)
>>> for i, x in enumerate([x1, x2, x3, x4]):
...     print('x{}'.format(i+1))
...     print(repr(x))
# Identical output so it's omitted here.

即使这些也适用于一维,也有两个(更为常见的)一维网格创建功能:

除了startand stop参数外,它还支持step参数(即使是代表步骤数的复杂步骤):

>>> x1, x2 = np.mgrid[1:10:2, 1:10:4j]
>>> x1  # The dimension with the explicit step width of 2
array([[1., 1., 1., 1.],
       [3., 3., 3., 3.],
       [5., 5., 5., 5.],
       [7., 7., 7., 7.],
       [9., 9., 9., 9.]])
>>> x2  # The dimension with the "number of steps"
array([[ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.]])

应用领域

您专门询问了目的,实际上,如果需要坐标系,这些网格非常有用。

例如,如果您有一个NumPy函数,它可以在两个维度上计算距离:

def distance_2d(x_point, y_point, x, y):
    return np.hypot(x-x_point, y-y_point)

您想知道每个点的距离:

>>> ys, xs = np.ogrid[-5:5, -5:5]
>>> distances = distance_2d(1, 2, xs, ys)  # distance to point (1, 2)
>>> distances
array([[9.21954446, 8.60232527, 8.06225775, 7.61577311, 7.28010989,
        7.07106781, 7.        , 7.07106781, 7.28010989, 7.61577311],
       [8.48528137, 7.81024968, 7.21110255, 6.70820393, 6.32455532,
        6.08276253, 6.        , 6.08276253, 6.32455532, 6.70820393],
       [7.81024968, 7.07106781, 6.40312424, 5.83095189, 5.38516481,
        5.09901951, 5.        , 5.09901951, 5.38516481, 5.83095189],
       [7.21110255, 6.40312424, 5.65685425, 5.        , 4.47213595,
        4.12310563, 4.        , 4.12310563, 4.47213595, 5.        ],
       [6.70820393, 5.83095189, 5.        , 4.24264069, 3.60555128,
        3.16227766, 3.        , 3.16227766, 3.60555128, 4.24264069],
       [6.32455532, 5.38516481, 4.47213595, 3.60555128, 2.82842712,
        2.23606798, 2.        , 2.23606798, 2.82842712, 3.60555128],
       [6.08276253, 5.09901951, 4.12310563, 3.16227766, 2.23606798,
        1.41421356, 1.        , 1.41421356, 2.23606798, 3.16227766],
       [6.        , 5.        , 4.        , 3.        , 2.        ,
        1.        , 0.        , 1.        , 2.        , 3.        ],
       [6.08276253, 5.09901951, 4.12310563, 3.16227766, 2.23606798,
        1.41421356, 1.        , 1.41421356, 2.23606798, 3.16227766],
       [6.32455532, 5.38516481, 4.47213595, 3.60555128, 2.82842712,
        2.23606798, 2.        , 2.23606798, 2.82842712, 3.60555128]])

如果通过密集网格而不是开放网格,则输出将是相同的。NumPys广播使之成为可能!

让我们可视化结果:

plt.figure()
plt.title('distance to point (1, 2)')
plt.imshow(distances, origin='lower', interpolation="none")
plt.xticks(np.arange(xs.shape[1]), xs.ravel())  # need to set the ticks manually
plt.yticks(np.arange(ys.shape[0]), ys.ravel())
plt.colorbar()

在此处输入图片说明

这也是当NumPys mgridogrid变得非常方便,因为它可以让你轻松更改网格的分辨率:

ys, xs = np.ogrid[-5:5:200j, -5:5:200j]
# otherwise same code as above

在此处输入图片说明

但是,由于imshow不支持xy输入,因此必须手动更改报价。如果接受x和,这将非常方便。y坐标,对吗?

使用NumPy编写自然处理网格的函数很容易。此外,NumPy,SciPy,matplotlib中有几个函数希望您传递到网格中。

我喜欢图片,因此让我们来探索一下matplotlib.pyplot.contour

ys, xs = np.mgrid[-5:5:200j, -5:5:200j]
density = np.sin(ys)-np.cos(xs)
plt.figure()
plt.contour(xs, ys, density)

在此处输入图片说明

注意如何正确设置坐标!如果您只是传入,则不会是这种情况density

或举另一个使用天体模型的有趣示例(这次我不太在乎坐标,我只是使用它们来创建一些网格):

from astropy.modeling import models
z = np.zeros((100, 100))
y, x = np.mgrid[0:100, 0:100]
for _ in range(10):
    g2d = models.Gaussian2D(amplitude=100, 
                           x_mean=np.random.randint(0, 100), 
                           y_mean=np.random.randint(0, 100), 
                           x_stddev=3, 
                           y_stddev=3)
    z += g2d(x, y)
    a2d = models.AiryDisk2D(amplitude=70, 
                            x_0=np.random.randint(0, 100), 
                            y_0=np.random.randint(0, 100), 
                            radius=5)
    z += a2d(x, y)

在此处输入图片说明

尽管那只是为了“外观”,但与Scipy中的功能模型和拟合(例如scipy.interpolate.interp2dscipy.interpolate.griddata甚至显示示例使用np.mgrid)相关的一些功能仍需要网格。其中大多数都适用于开放式网格和密集型网格,但是有些仅适用于其中之一。

Actually the purpose of np.meshgrid is already mentioned in the documentation:

np.meshgrid

Return coordinate matrices from coordinate vectors.

Make N-D coordinate arrays for vectorized evaluations of N-D scalar/vector fields over N-D grids, given one-dimensional coordinate arrays x1, x2,…, xn.

So it’s primary purpose is to create a coordinates matrices.

You probably just asked yourself:

Why do we need to create coordinate matrices?

The reason you need coordinate matrices with Python/NumPy is that there is no direct relation from coordinates to values, except when your coordinates start with zero and are purely positive integers. Then you can just use the indices of an array as the index. However when that’s not the case you somehow need to store coordinates alongside your data. That’s where grids come in.

Suppose your data is:

1  2  1
2  5  2
1  2  1

However, each value represents a 2 kilometers wide region horizontally and 3 kilometers vertically. Suppose your origin is the upper left corner and you want arrays that represent the distance you could use:

import numpy as np
h, v = np.meshgrid(np.arange(3)*3, np.arange(3)*2)

where v is:

array([[0, 0, 0],
       [2, 2, 2],
       [4, 4, 4]])

and h:

array([[0, 3, 6],
       [0, 3, 6],
       [0, 3, 6]])

So if you have two indices, let’s say x and y (that’s why the return value of meshgrid is usually xx or xs instead of x in this case I chose h for horizontally!) then you can get the x coordinate of the point, the y coordinate of the point and the value at that point by using:

h[x, y]    # horizontal coordinate
v[x, y]    # vertical coordinate
data[x, y]  # value

That makes it much easier to keep track of coordinates and (even more importantly) you can pass them to functions that need to know the coordinates.

A slightly longer explanation

However, np.meshgrid itself isn’t often used directly, mostly one just uses one of similar objects np.mgrid or np.ogrid. Here np.mgrid represents the sparse=False and np.ogrid the sparse=True case (I refer to the sparse argument of np.meshgrid). Note that there is a significant difference between np.meshgrid and np.ogrid and np.mgrid: The first two returned values (if there are two or more) are reversed. Often this doesn’t matter but you should give meaningful variable names depending on the context.

For example, in case of a 2D grid and matplotlib.pyplot.imshow it makes sense to name the first returned item of np.meshgrid x and the second one y while it’s the other way around for np.mgrid and np.ogrid.

np.ogrid and sparse grids

>>> import numpy as np
>>> yy, xx = np.ogrid[-5:6, -5:6]
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5],
       [-4],
       [-3],
       [-2],
       [-1],
       [ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5]])
       

As already said the output is reversed when compared to np.meshgrid, that’s why I unpacked it as yy, xx instead of xx, yy:

>>> xx, yy = np.meshgrid(np.arange(-5, 6), np.arange(-5, 6), sparse=True)
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5],
       [-4],
       [-3],
       [-2],
       [-1],
       [ 0],
       [ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5]])

This already looks like coordinates, specifically the x and y lines for 2D plots.

Visualized:

yy, xx = np.ogrid[-5:6, -5:6]
plt.figure()
plt.title('ogrid (sparse meshgrid)')
plt.grid()
plt.xticks(xx.ravel())
plt.yticks(yy.ravel())
plt.scatter(xx, np.zeros_like(xx), color="blue", marker="*")
plt.scatter(np.zeros_like(yy), yy, color="red", marker="x")

enter image description here

np.mgrid and dense/fleshed out grids

>>> yy, xx = np.mgrid[-5:6, -5:6]
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5],
       [-4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4],
       [-3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3],
       [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
       [ 2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
       [ 3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3],
       [ 4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5]])
       

The same applies here: The output is reversed compared to np.meshgrid:

>>> xx, yy = np.meshgrid(np.arange(-5, 6), np.arange(-5, 6))
>>> xx
array([[-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5],
       [-5, -4, -3, -2, -1,  0,  1,  2,  3,  4,  5]])
>>> yy
array([[-5, -5, -5, -5, -5, -5, -5, -5, -5, -5, -5],
       [-4, -4, -4, -4, -4, -4, -4, -4, -4, -4, -4],
       [-3, -3, -3, -3, -3, -3, -3, -3, -3, -3, -3],
       [-2, -2, -2, -2, -2, -2, -2, -2, -2, -2, -2],
       [-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1],
       [ 0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0],
       [ 1,  1,  1,  1,  1,  1,  1,  1,  1,  1,  1],
       [ 2,  2,  2,  2,  2,  2,  2,  2,  2,  2,  2],
       [ 3,  3,  3,  3,  3,  3,  3,  3,  3,  3,  3],
       [ 4,  4,  4,  4,  4,  4,  4,  4,  4,  4,  4],
       [ 5,  5,  5,  5,  5,  5,  5,  5,  5,  5,  5]])
       

Unlike ogrid these arrays contain all xx and yy coordinates in the -5 <= xx <= 5; -5 <= yy <= 5 grid.

yy, xx = np.mgrid[-5:6, -5:6]
plt.figure()
plt.title('mgrid (dense meshgrid)')
plt.grid()
plt.xticks(xx[0])
plt.yticks(yy[:, 0])
plt.scatter(xx, yy, color="red", marker="x")

enter image description here

Functionality

It’s not only limited to 2D, these functions work for arbitrary dimensions (well, there is a maximum number of arguments given to function in Python and a maximum number of dimensions that NumPy allows):

>>> x1, x2, x3, x4 = np.ogrid[:3, 1:4, 2:5, 3:6]
>>> for i, x in enumerate([x1, x2, x3, x4]):
...     print('x{}'.format(i+1))
...     print(repr(x))
x1
array([[[[0]]],


       [[[1]]],


       [[[2]]]])
x2
array([[[[1]],

        [[2]],

        [[3]]]])
x3
array([[[[2],
         [3],
         [4]]]])
x4
array([[[[3, 4, 5]]]])

>>> # equivalent meshgrid output, note how the first two arguments are reversed and the unpacking
>>> x2, x1, x3, x4 = np.meshgrid(np.arange(1,4), np.arange(3), np.arange(2, 5), np.arange(3, 6), sparse=True)
>>> for i, x in enumerate([x1, x2, x3, x4]):
...     print('x{}'.format(i+1))
...     print(repr(x))
# Identical output so it's omitted here.

Even if these also work for 1D there are two (much more common) 1D grid creation functions:

Besides the start and stop argument it also supports the step argument (even complex steps that represent the number of steps):

>>> x1, x2 = np.mgrid[1:10:2, 1:10:4j]
>>> x1  # The dimension with the explicit step width of 2
array([[1., 1., 1., 1.],
       [3., 3., 3., 3.],
       [5., 5., 5., 5.],
       [7., 7., 7., 7.],
       [9., 9., 9., 9.]])
>>> x2  # The dimension with the "number of steps"
array([[ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.],
       [ 1.,  4.,  7., 10.]])
       

Applications

You specifically asked about the purpose and in fact, these grids are extremely useful if you need a coordinate system.

For example if you have a NumPy function that calculates the distance in two dimensions:

def distance_2d(x_point, y_point, x, y):
    return np.hypot(x-x_point, y-y_point)
    

And you want to know the distance of each point:

>>> ys, xs = np.ogrid[-5:5, -5:5]
>>> distances = distance_2d(1, 2, xs, ys)  # distance to point (1, 2)
>>> distances
array([[9.21954446, 8.60232527, 8.06225775, 7.61577311, 7.28010989,
        7.07106781, 7.        , 7.07106781, 7.28010989, 7.61577311],
       [8.48528137, 7.81024968, 7.21110255, 6.70820393, 6.32455532,
        6.08276253, 6.        , 6.08276253, 6.32455532, 6.70820393],
       [7.81024968, 7.07106781, 6.40312424, 5.83095189, 5.38516481,
        5.09901951, 5.        , 5.09901951, 5.38516481, 5.83095189],
       [7.21110255, 6.40312424, 5.65685425, 5.        , 4.47213595,
        4.12310563, 4.        , 4.12310563, 4.47213595, 5.        ],
       [6.70820393, 5.83095189, 5.        , 4.24264069, 3.60555128,
        3.16227766, 3.        , 3.16227766, 3.60555128, 4.24264069],
       [6.32455532, 5.38516481, 4.47213595, 3.60555128, 2.82842712,
        2.23606798, 2.        , 2.23606798, 2.82842712, 3.60555128],
       [6.08276253, 5.09901951, 4.12310563, 3.16227766, 2.23606798,
        1.41421356, 1.        , 1.41421356, 2.23606798, 3.16227766],
       [6.        , 5.        , 4.        , 3.        , 2.        ,
        1.        , 0.        , 1.        , 2.        , 3.        ],
       [6.08276253, 5.09901951, 4.12310563, 3.16227766, 2.23606798,
        1.41421356, 1.        , 1.41421356, 2.23606798, 3.16227766],
       [6.32455532, 5.38516481, 4.47213595, 3.60555128, 2.82842712,
        2.23606798, 2.        , 2.23606798, 2.82842712, 3.60555128]])
        

The output would be identical if one passed in a dense grid instead of an open grid. NumPys broadcasting makes it possible!

Let’s visualize the result:

plt.figure()
plt.title('distance to point (1, 2)')
plt.imshow(distances, origin='lower', interpolation="none")
plt.xticks(np.arange(xs.shape[1]), xs.ravel())  # need to set the ticks manually
plt.yticks(np.arange(ys.shape[0]), ys.ravel())
plt.colorbar()

enter image description here

And this is also when NumPys mgrid and ogrid become very convenient because it allows you to easily change the resolution of your grids:

ys, xs = np.ogrid[-5:5:200j, -5:5:200j]
# otherwise same code as above

enter image description here

However, since imshow doesn’t support x and y inputs one has to change the ticks by hand. It would be really convenient if it would accept the x and y coordinates, right?

It’s easy to write functions with NumPy that deal naturally with grids. Furthermore, there are several functions in NumPy, SciPy, matplotlib that expect you to pass in the grid.

I like images so let’s explore matplotlib.pyplot.contour:

ys, xs = np.mgrid[-5:5:200j, -5:5:200j]
density = np.sin(ys)-np.cos(xs)
plt.figure()
plt.contour(xs, ys, density)

enter image description here

Note how the coordinates are already correctly set! That wouldn’t be the case if you just passed in the density.

Or to give another fun example using astropy models (this time I don’t care much about the coordinates, I just use them to create some grid):

from astropy.modeling import models
z = np.zeros((100, 100))
y, x = np.mgrid[0:100, 0:100]
for _ in range(10):
    g2d = models.Gaussian2D(amplitude=100, 
                           x_mean=np.random.randint(0, 100), 
                           y_mean=np.random.randint(0, 100), 
                           x_stddev=3, 
                           y_stddev=3)
    z += g2d(x, y)
    a2d = models.AiryDisk2D(amplitude=70, 
                            x_0=np.random.randint(0, 100), 
                            y_0=np.random.randint(0, 100), 
                            radius=5)
    z += a2d(x, y)
    

enter image description here

Although that’s just “for the looks” several functions related to functional models and fitting (for example scipy.interpolate.interp2d, scipy.interpolate.griddata even show examples using np.mgrid) in Scipy, etc. require grids. Most of these work with open grids and dense grids, however some only work with one of them.


回答 3

假设您有一个功能:

def sinus2d(x, y):
    return np.sin(x) + np.sin(y)

例如,您想要查看0到2 * pi范围内的图像。你会怎么做?有np.meshgrid来自于:

xx, yy = np.meshgrid(np.linspace(0,2*np.pi,100), np.linspace(0,2*np.pi,100))
z = sinus2d(xx, yy) # Create the image on this grid

这样的情节看起来像:

import matplotlib.pyplot as plt
plt.imshow(z, origin='lower', interpolation='none')
plt.show()

在此处输入图片说明

所以np.meshgrid只是一个方便。原则上,可以通过以下方式完成此操作:

z2 = sinus2d(np.linspace(0,2*np.pi,100)[:,None], np.linspace(0,2*np.pi,100)[None,:])

但是您需要了解自己的尺寸(假设您有两个以上…)和正确的广播。np.meshgrid为您完成所有这一切。

此外,例如,如果您想进行插值但排除某些值,则meshgrid允许您将坐标与数据一起删除:

condition = z>0.6
z_new = z[condition] # This will make your array 1D

那么您现在将如何进行插值?你可以给xy一个插值函数,scipy.interpolate.interp2d因此您需要一种方法来知道删除了哪些坐标:

x_new = xx[condition]
y_new = yy[condition]

然后您仍然可以使用“正确的”坐标进行插值(在没有网状网格的情况下进行尝试,您将获得很多额外的代码):

from scipy.interpolate import interp2d
interpolated = interp2d(x_new, y_new, z_new)

并且原始网格物体允许您再次在原始网格物体上获得插值:

interpolated_grid = interpolated(xx[0], yy[:, 0]).reshape(xx.shape)

这些只是我使用过的一些示例,meshgrid可能还会更多。

Suppose you have a function:

def sinus2d(x, y):
    return np.sin(x) + np.sin(y)

and you want, for example, to see what it looks like in the range 0 to 2*pi. How would you do it? There np.meshgrid comes in:

xx, yy = np.meshgrid(np.linspace(0,2*np.pi,100), np.linspace(0,2*np.pi,100))
z = sinus2d(xx, yy) # Create the image on this grid

and such a plot would look like:

import matplotlib.pyplot as plt
plt.imshow(z, origin='lower', interpolation='none')
plt.show()

enter image description here

So np.meshgrid is just a convenience. In principle the same could be done by:

z2 = sinus2d(np.linspace(0,2*np.pi,100)[:,None], np.linspace(0,2*np.pi,100)[None,:])

but there you need to be aware of your dimensions (suppose you have more than two …) and the right broadcasting. np.meshgrid does all of this for you.

Also meshgrid allows you to delete coordinates together with the data if you, for example, want to do an interpolation but exclude certain values:

condition = z>0.6
z_new = z[condition] # This will make your array 1D

so how would you do the interpolation now? You can give x and y to an interpolation function like scipy.interpolate.interp2d so you need a way to know which coordinates were deleted:

x_new = xx[condition]
y_new = yy[condition]

and then you can still interpolate with the “right” coordinates (try it without the meshgrid and you will have a lot of extra code):

from scipy.interpolate import interp2d
interpolated = interp2d(x_new, y_new, z_new)

and the original meshgrid allows you to get the interpolation on the original grid again:

interpolated_grid = interpolated(xx[0], yy[:, 0]).reshape(xx.shape)

These are just some examples where I used the meshgrid there might be a lot more.


回答 4

meshgrid帮助从两个数组的所有成对点的两个一维数组中创建一个矩形网格。

x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 1, 2, 3, 4])

现在,如果您定义了函数f(x,y),并且想将此函数应用于数组’x’和’y’的所有可能的点组合,则可以执行以下操作:

f(*np.meshgrid(x, y))

假设,如果您的函数仅产生两个元素的乘积,那么这就是可以有效地为大型数组获得笛卡尔积的方法。

这里引用

meshgrid helps in creating a rectangular grid from two 1-D arrays of all pairs of points from the two arrays.

x = np.array([0, 1, 2, 3, 4])
y = np.array([0, 1, 2, 3, 4])

Now, if you have defined a function f(x,y) and you wanna apply this function to all the possible combination of points from the arrays ‘x’ and ‘y’, then you can do this:

f(*np.meshgrid(x, y))

Say, if your function just produces the product of two elements, then this is how a cartesian product can be achieved, efficiently for large arrays.

Referred from here


回答 5

基本思想

给定可能的x值xs(将其视为图的x轴上的刻度线)和可能的y值ys,将meshgrid生成对应的(x,y)网格点集-与类似set((x, y) for x in xs for y in yx)。例如,如果xs=[1,2,3]ys=[4,5,6],我们将获得一组坐标{(1,4), (2,4), (3,4), (1,5), (2,5), (3,5), (1,6), (2,6), (3,6)}

返回值的形式

但是,meshgrid返回的表示形式在两个方面与上述表达式不同:

首先meshgrid在2d数组中布置网格点:行对应于不同的y值,列对应于不同的x值-如在中list(list((x, y) for x in xs) for y in ys),将得到以下数组:

   [[(1,4), (2,4), (3,4)],
    [(1,5), (2,5), (3,5)],
    [(1,6), (2,6), (3,6)]]

其次,分别meshgrid返回x和y坐标(即在两个不同的numpy 2d数组中):

   xcoords, ycoords = (
       array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 3]]),
       array([[4, 4, 4],
              [5, 5, 5],
              [6, 6, 6]]))
   # same thing using np.meshgrid:
   xcoords, ycoords = np.meshgrid([1,2,3], [4,5,6])
   # same thing without meshgrid:
   xcoords = np.array([xs] * len(ys)
   ycoords = np.array([ys] * len(xs)).T

注意,np.meshgrid也可以生成更大尺寸的网格。给定xs,ys和zs,您将把xcoords,ycoords,zcoords作为3d数组返回。meshgrid还支持维度的反向排序以及结果的稀疏表示。

应用领域

我们为什么要这种形式的输出?

在网格上的每个点上应用一个函数: 一种动机是像(+,-,*,/,**)这样的二进制运算符对于numpy数组作为元素操作进行了重载。这意味着,如果我有一个def f(x, y): return (x - y) ** 2可以在两个标量上使用的函数,那么我也可以将其应用于两个numpy数组以获取按元素排列的结果数组:例如f(xcoords, ycoords)f(*np.meshgrid(xs, ys))在上面的示例中给出以下内容:

array([[ 9,  4,  1],
       [16,  9,  4],
       [25, 16,  9]])

高维外部产品:我不确定这有多有效,但是您可以通过以下方式获得高维外部产品:np.prod(np.meshgrid([1,2,3], [1,2], [1,2,3,4]), axis=0)

在matplotlib云图:我遇到meshgrid调查时绘制等高线图与matplotlib用于绘制的决策边界。为此,使用生成一个网格meshgrid,在每个网格点评估函数(例如,如上所示),然后将xcoords,ycoords和计算出的f值(即zcoords)传递给contourf函数。

Basic Idea

Given possible x values, xs, (think of them as the tick-marks on the x-axis of a plot) and possible y values, ys, meshgrid generates the corresponding set of (x, y) grid points—analogous to set((x, y) for x in xs for y in yx). For example, if xs=[1,2,3] and ys=[4,5,6], we’d get the set of coordinates {(1,4), (2,4), (3,4), (1,5), (2,5), (3,5), (1,6), (2,6), (3,6)}.

Form of the Return Value

However, the representation that meshgrid returns is different from the above expression in two ways:

First, meshgrid lays out the grid points in a 2d array: rows correspond to different y-values, columns correspond to different x-values—as in list(list((x, y) for x in xs) for y in ys), which would give the following array:

   [[(1,4), (2,4), (3,4)],
    [(1,5), (2,5), (3,5)],
    [(1,6), (2,6), (3,6)]]

Second, meshgrid returns the x and y coordinates separately (i.e. in two different numpy 2d arrays):

   xcoords, ycoords = (
       array([[1, 2, 3],
              [1, 2, 3],
              [1, 2, 3]]),
       array([[4, 4, 4],
              [5, 5, 5],
              [6, 6, 6]]))
   # same thing using np.meshgrid:
   xcoords, ycoords = np.meshgrid([1,2,3], [4,5,6])
   # same thing without meshgrid:
   xcoords = np.array([xs] * len(ys)
   ycoords = np.array([ys] * len(xs)).T

Note, np.meshgrid can also generate grids for higher dimensions. Given xs, ys, and zs, you’d get back xcoords, ycoords, zcoords as 3d arrays. meshgrid also supports reverse ordering of the dimensions as well as sparse representation of the result.

Applications

Why would we want this form of output?

Apply a function at every point on a grid: One motivation is that binary operators like (+, -, *, /, **) are overloaded for numpy arrays as elementwise operations. This means that if I have a function def f(x, y): return (x - y) ** 2 that works on two scalars, I can also apply it on two numpy arrays to get an array of elementwise results: e.g. f(xcoords, ycoords) or f(*np.meshgrid(xs, ys)) gives the following on the above example:

array([[ 9,  4,  1],
       [16,  9,  4],
       [25, 16,  9]])

Higher dimensional outer product: I’m not sure how efficient this is, but you can get high-dimensional outer products this way: np.prod(np.meshgrid([1,2,3], [1,2], [1,2,3,4]), axis=0).

Contour plots in matplotlib: I came across meshgrid when investigating drawing contour plots with matplotlib for plotting decision boundaries. For this, you generate a grid with meshgrid, evaluate the function at each grid point (e.g. as shown above), and then pass the xcoords, ycoords, and computed f-values (i.e. zcoords) into the contourf function.


numpy中的flatten和ravel函数有什么区别?

问题:numpy中的flatten和ravel函数有什么区别?

import numpy as np
y = np.array(((1,2,3),(4,5,6),(7,8,9)))
OUTPUT:
print(y.flatten())
[1   2   3   4   5   6   7   8   9]
print(y.ravel())
[1   2   3   4   5   6   7   8   9]

这两个函数返回相同的列表。那么需要两个不同的功能来执行相同的工作。

import numpy as np
y = np.array(((1,2,3),(4,5,6),(7,8,9)))
OUTPUT:
print(y.flatten())
[1   2   3   4   5   6   7   8   9]
print(y.ravel())
[1   2   3   4   5   6   7   8   9]

Both function return the same list. Then what is the need of two different functions performing same job.


回答 0

当前的API是:

  • flatten 总是返回一个副本。
  • ravel尽可能返回原始数组的视图。这在打印输出中不可见,但是如果您修改ravel返回的数组,则可能会修改原始数组中的条目。如果您修改从flatten返回的数组中的条目,则将永远不会发生。ravel通常会更快,因为没有内存被复制,但是您在修改返回的数组时要格外小心。
  • reshape((-1,)) 只要数组的步幅允许,就可以得到一个视图,即使这意味着您并不总是可以获得连续的数组。

The current API is that:

  • flatten always returns a copy.
  • ravel returns a view of the original array whenever possible. This isn’t visible in the printed output, but if you modify the array returned by ravel, it may modify the entries in the original array. If you modify the entries in an array returned from flatten this will never happen. ravel will often be faster since no memory is copied, but you have to be more careful about modifying the array it returns.
  • reshape((-1,)) gets a view whenever the strides of the array allow it even if that means you don’t always get a contiguous array.

回答 1

如此所述,关键区别在于:

  • flatten 是ndarray对象的方法,因此只能用于真正的numpy数组。

  • ravel 是库级别的函数,因此可以在任何可以成功解析的对象上调用。

例如,ravel将对ndarray列表起作用,flatten而不适用于该类型的对象。

@IanH还在回答中指出了与内存处理的重要区别。

As explained here a key difference is that:

  • flatten is a method of an ndarray object and hence can only be called for true numpy arrays.

  • ravel is a library-level function and hence can be called on any object that can successfully be parsed.

For example ravel will work on a list of ndarrays, while flatten is not available for that type of object.

@IanH also points out important differences with memory handling in his answer.


回答 2

这是函数的正确命名空间:

这两个函数均返回指向新存储器结构的展平一维数组。

import numpy
a = numpy.array([[1,2],[3,4]])

r = numpy.ravel(a)
f = numpy.ndarray.flatten(a)  

print(id(a))
print(id(r))
print(id(f))

print(r)
print(f)

print("\nbase r:", r.base)
print("\nbase f:", f.base)

---returns---
140541099429760
140541099471056
140541099473216

[1 2 3 4]
[1 2 3 4]

base r: [[1 2]
 [3 4]]

base f: None

在上例中:

  • 结果的存储位置不同,
  • 结果看起来一样
  • 展平将返回副本
  • ravel将返回一个视图。

我们如何检查某物是否是副本?使用的.base属性ndarray。如果是视图,则基础将是原始数组;如果是副本,则基数为None

Here is the correct namespace for the functions:

Both functions return flattened 1D arrays pointing to the new memory structures.

import numpy
a = numpy.array([[1,2],[3,4]])

r = numpy.ravel(a)
f = numpy.ndarray.flatten(a)  

print(id(a))
print(id(r))
print(id(f))

print(r)
print(f)

print("\nbase r:", r.base)
print("\nbase f:", f.base)

---returns---
140541099429760
140541099471056
140541099473216

[1 2 3 4]
[1 2 3 4]

base r: [[1 2]
 [3 4]]

base f: None

In the upper example:

  • the memory locations of the results are different,
  • the results look the same
  • flatten would return a copy
  • ravel would return a view.

How we check if something is a copy? Using the .base attribute of the ndarray. If it’s a view, the base will be the original array; if it is a copy, the base will be None.


如何计算Python中ndarray中某些项目的出现?

问题:如何计算Python中ndarray中某些项目的出现?

在Python中,我有一个ndarray y 打印为array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

我试图计算这个数组中有多少个0和多少个1

但是当我输入y.count(0)or时y.count(1),它说

numpy.ndarray 对象没有属性 count

我该怎么办?

In Python, I have an ndarray y that is printed as array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

I’m trying to count how many 0s and how many 1s are there in this array.

But when I type y.count(0) or y.count(1), it says

numpy.ndarray object has no attribute count

What should I do?


回答 0

>>> a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
>>> unique, counts = numpy.unique(a, return_counts=True)
>>> dict(zip(unique, counts))
{0: 7, 1: 4, 2: 1, 3: 2, 4: 1}

非numpy方式

使用collections.Counter;

>> import collections, numpy

>>> a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
>>> collections.Counter(a)
Counter({0: 7, 1: 4, 3: 2, 2: 1, 4: 1})
>>> a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
>>> unique, counts = numpy.unique(a, return_counts=True)
>>> dict(zip(unique, counts))
{0: 7, 1: 4, 2: 1, 3: 2, 4: 1}

Non-numpy way:

Use collections.Counter;

>> import collections, numpy

>>> a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
>>> collections.Counter(a)
Counter({0: 7, 1: 4, 3: 2, 2: 1, 4: 1})

回答 1

那使用numpy.count_nonzero什么呢

>>> import numpy as np
>>> y = np.array([1, 2, 2, 2, 2, 0, 2, 3, 3, 3, 0, 0, 2, 2, 0])

>>> np.count_nonzero(y == 1)
1
>>> np.count_nonzero(y == 2)
7
>>> np.count_nonzero(y == 3)
3

What about using numpy.count_nonzero, something like

>>> import numpy as np
>>> y = np.array([1, 2, 2, 2, 2, 0, 2, 3, 3, 3, 0, 0, 2, 2, 0])

>>> np.count_nonzero(y == 1)
1
>>> np.count_nonzero(y == 2)
7
>>> np.count_nonzero(y == 3)
3

回答 2

就个人而言,我会去: (y == 0).sum()(y == 1).sum()

例如

import numpy as np
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
num_zeros = (y == 0).sum()
num_ones = (y == 1).sum()

Personally, I’d go for: (y == 0).sum() and (y == 1).sum()

E.g.

import numpy as np
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
num_zeros = (y == 0).sum()
num_ones = (y == 1).sum()

回答 3

对于您的情况,您还可以查看numpy.bincount

In [56]: a = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

In [57]: np.bincount(a)
Out[57]: array([8, 4])  #count of zeros is at index 0 : 8
                        #count of ones is at index 1 : 4

For your case you could also look into numpy.bincount

In [56]: a = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

In [57]: np.bincount(a)
Out[57]: array([8, 4])  #count of zeros is at index 0 : 8
                        #count of ones is at index 1 : 4

回答 4

将数组转换y为列表l,然后执行l.count(1)l.count(0)

>>> y = numpy.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
>>> l = list(y)
>>> l.count(1)
4
>>> l.count(0)
8 

Convert your array y to list l and then do l.count(1) and l.count(0)

>>> y = numpy.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
>>> l = list(y)
>>> l.count(1)
4
>>> l.count(0)
8 

回答 5

y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

如果你知道,他们只是01

np.sum(y)

给你的数量。 np.sum(1-y)给出零。

为了稍微概括起见,如果要计数0而不是零(但可能是2或3):

np.count_nonzero(y)

给出非零的数量。

但是,如果您需要更复杂的东西,我认为numpy不会提供一个不错的count选择。在这种情况下,请转到集合:

import collections
collections.Counter(y)
> Counter({0: 8, 1: 4})

这就像一个字典

collections.Counter(y)[0]
> 8
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

If you know that they are just 0 and 1:

np.sum(y)

gives you the number of ones. np.sum(1-y) gives the zeroes.

For slight generality, if you want to count 0 and not zero (but possibly 2 or 3):

np.count_nonzero(y)

gives the number of nonzero.

But if you need something more complicated, I don’t think numpy will provide a nice count option. In that case, go to collections:

import collections
collections.Counter(y)
> Counter({0: 8, 1: 4})

This behaves like a dict

collections.Counter(y)[0]
> 8

回答 6

如果您确切知道要查找的号码,则可以使用以下代码;

lst = np.array([1,1,2,3,3,6,6,6,3,2,1])
(lst == 2).sum()

返回数组中发生2的次数。

If you know exactly which number you’re looking for, you can use the following;

lst = np.array([1,1,2,3,3,6,6,6,3,2,1])
(lst == 2).sum()

returns how many times 2 is occurred in your array.


回答 7

老实说,我发现将其转换为pandas系列或DataFrame最简单:

import pandas as pd
import numpy as np

df = pd.DataFrame({'data':np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])})
print df['data'].value_counts()

或罗伯特·穆伊(Robert Muil)提出的这一好话:

pd.Series([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]).value_counts()

Honestly I find it easiest to convert to a pandas Series or DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({'data':np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])})
print df['data'].value_counts()

Or this nice one-liner suggested by Robert Muil:

pd.Series([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]).value_counts()

回答 8

没有人建议使用numpy.bincount(input, minlength)minlength = np.size(input),但它似乎是一个很好的解决方案,并且绝对是最快的

In [1]: choices = np.random.randint(0, 100, 10000)

In [2]: %timeit [ np.sum(choices == k) for k in range(min(choices), max(choices)+1) ]
100 loops, best of 3: 2.67 ms per loop

In [3]: %timeit np.unique(choices, return_counts=True)
1000 loops, best of 3: 388 µs per loop

In [4]: %timeit np.bincount(choices, minlength=np.size(choices))
100000 loops, best of 3: 16.3 µs per loop

numpy.unique(x, return_counts=True)和之间的疯狂加速numpy.bincount(x, minlength=np.max(x))

No one suggested to use numpy.bincount(input, minlength) with minlength = np.size(input), but it seems to be a good solution, and definitely the fastest:

In [1]: choices = np.random.randint(0, 100, 10000)

In [2]: %timeit [ np.sum(choices == k) for k in range(min(choices), max(choices)+1) ]
100 loops, best of 3: 2.67 ms per loop

In [3]: %timeit np.unique(choices, return_counts=True)
1000 loops, best of 3: 388 µs per loop

In [4]: %timeit np.bincount(choices, minlength=np.size(choices))
100000 loops, best of 3: 16.3 µs per loop

That’s a crazy speedup between numpy.unique(x, return_counts=True) and numpy.bincount(x, minlength=np.max(x)) !


回答 9

怎么样len(y[y==0])len(y[y==1])

What about len(y[y==0]) and len(y[y==1]) ?


回答 10

y.tolist().count(val)

使用val 0或1

由于python列表具有本机函数count,因此在使用该函数之前将其转换为list是一个简单的解决方案。

y.tolist().count(val)

with val 0 or 1

Since a python list has a native function count, converting to list before using that function is a simple solution.


回答 11

另一个简单的解决方案可能是使用numpy.count_nonzero()

import numpy as np
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
y_nonzero_num = np.count_nonzero(y==1)
y_zero_num = np.count_nonzero(y==0)
y_nonzero_num
4
y_zero_num
8

不要让名称误导您,如果您像示例中那样将其与布尔值一起使用,就可以解决问题。

Yet another simple solution might be to use numpy.count_nonzero():

import numpy as np
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
y_nonzero_num = np.count_nonzero(y==1)
y_zero_num = np.count_nonzero(y==0)
y_nonzero_num
4
y_zero_num
8

Don’t let the name mislead you, if you use it with the boolean just like in the example, it will do the trick.


回答 12

要计算出现次数,可以使用np.unique(array, return_counts=True)

In [75]: boo = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

# use bool value `True` or equivalently `1`
In [77]: uniq, cnts = np.unique(boo, return_counts=1)
In [81]: uniq
Out[81]: array([0, 1])   #unique elements in input array are: 0, 1

In [82]: cnts
Out[82]: array([8, 4])   # 0 occurs 8 times, 1 occurs 4 times

To count the number of occurrences, you can use np.unique(array, return_counts=True):

In [75]: boo = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

# use bool value `True` or equivalently `1`
In [77]: uniq, cnts = np.unique(boo, return_counts=1)
In [81]: uniq
Out[81]: array([0, 1])   #unique elements in input array are: 0, 1

In [82]: cnts
Out[82]: array([8, 4])   # 0 occurs 8 times, 1 occurs 4 times

回答 13

我会使用np.where:

how_many_0 = len(np.where(a==0.)[0])
how_many_1 = len(np.where(a==1.)[0])

I’d use np.where:

how_many_0 = len(np.where(a==0.)[0])
how_many_1 = len(np.where(a==1.)[0])

回答 14

利用系列提供的方法:

>>> import pandas as pd
>>> y = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]
>>> pd.Series(y).value_counts()
0    8
1    4
dtype: int64

take advantage of the methods offered by a Series:

>>> import pandas as pd
>>> y = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]
>>> pd.Series(y).value_counts()
0    8
1    4
dtype: int64

回答 15

一个简单的一般答案是:

numpy.sum(MyArray==x)   # sum of a binary list of the occurence of x (=0 or 1) in MyArray

这将导致完整的代码作为示例

import numpy
MyArray=numpy.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])  # array we want to search in
x=0   # the value I want to count (can be iterator, in a list, etc.)
numpy.sum(MyArray==0)   # sum of a binary list of the occurence of x in MyArray

现在,如果MyArray具有多个维度,并且您要计算行中值分布的出现次数(此后为pattern)

MyArray=numpy.array([[6, 1],[4, 5],[0, 7],[5, 1],[2, 5],[1, 2],[3, 2],[0, 2],[2, 5],[5, 1],[3, 0]])
x=numpy.array([5,1])   # the value I want to count (can be iterator, in a list, etc.)
temp = numpy.ascontiguousarray(MyArray).view(numpy.dtype((numpy.void, MyArray.dtype.itemsize * MyArray.shape[1])))  # convert the 2d-array into an array of analyzable patterns
xt=numpy.ascontiguousarray(x).view(numpy.dtype((numpy.void, x.dtype.itemsize * x.shape[0])))  # convert what you search into one analyzable pattern
numpy.sum(temp==xt)  # count of the searched pattern in the list of patterns

A general and simple answer would be:

numpy.sum(MyArray==x)   # sum of a binary list of the occurence of x (=0 or 1) in MyArray

which would result into this full code as exemple

import numpy
MyArray=numpy.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])  # array we want to search in
x=0   # the value I want to count (can be iterator, in a list, etc.)
numpy.sum(MyArray==0)   # sum of a binary list of the occurence of x in MyArray

Now if MyArray is in multiple dimensions and you want to count the occurence of a distribution of values in line (= pattern hereafter)

MyArray=numpy.array([[6, 1],[4, 5],[0, 7],[5, 1],[2, 5],[1, 2],[3, 2],[0, 2],[2, 5],[5, 1],[3, 0]])
x=numpy.array([5,1])   # the value I want to count (can be iterator, in a list, etc.)
temp = numpy.ascontiguousarray(MyArray).view(numpy.dtype((numpy.void, MyArray.dtype.itemsize * MyArray.shape[1])))  # convert the 2d-array into an array of analyzable patterns
xt=numpy.ascontiguousarray(x).view(numpy.dtype((numpy.void, x.dtype.itemsize * x.shape[0])))  # convert what you search into one analyzable pattern
numpy.sum(temp==xt)  # count of the searched pattern in the list of patterns

回答 16

您可以使用字典理解来创建整齐的单线。可以在这里找到有关字典理解的更多信息

>>>counts = {int(value): list(y).count(value) for value in set(y)}
>>>print(counts)
{0: 8, 1: 4}

这将创建一个字典,将ndarray中的值作为键,并将值的计数分别作为键的值。

每当您要计算此格式数组中某个值的出现次数时,此方法都将起作用。

You can use dictionary comprehension to create a neat one-liner. More about dictionary comprehension can be found here

>>>counts = {int(value): list(y).count(value) for value in set(y)}
>>>print(counts)
{0: 8, 1: 4}

This will create a dictionary with the values in your ndarray as keys, and the counts of the values as the values for the keys respectively.

This will work whenever you want to count occurences of a value in arrays of this format.


回答 17

尝试这个:

a = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
list(a).count(1)

Try this:

a = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
list(a).count(1)

回答 18

这可以通过以下方法轻松完成

y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
y.tolist().count(1)

This can be done easily in the following method

y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
y.tolist().count(1)

回答 19

由于您的ndarray仅包含0和1,因此您可以使用sum()获得1的出现,并使用len()-sum()获得0的出现。

num_of_ones = sum(array)
num_of_zeros = len(array)-sum(array)

Since your ndarray contains only 0 and 1, you can use sum() to get the occurrence of 1s and len()-sum() to get the occurrence of 0s.

num_of_ones = sum(array)
num_of_zeros = len(array)-sum(array)

回答 20

您有一个只有1和0的特殊数组。所以一个诀窍是使用

np.mean(x)

这将为您提供数组中1s的百分比。或者,使用

np.sum(x)
np.sum(1-x)

将为您提供数组中1和0的绝对数。

You have a special array with only 1 and 0 here. So a trick is to use

np.mean(x)

which gives you the percentage of 1s in your array. Alternatively, use

np.sum(x)
np.sum(1-x)

will give you the absolute number of 1 and 0 in your array.


回答 21

dict(zip(*numpy.unique(y, return_counts=True)))

刚刚在此处复制了Seppo Enarvi的评论,这应该是一个正确的答案

dict(zip(*numpy.unique(y, return_counts=True)))

Just copied Seppo Enarvi’s comment here which deserves to be a proper answer


回答 22

它涉及更多的步骤,但是对2d数组和更复杂的过滤器也适用的更灵活的解决方案是创建一个布尔掩码,然后在掩码上使用.sum()。

>>>>y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
>>>>mask = y == 0
>>>>mask.sum()
8

It involves one more step, but a more flexible solution which would also work for 2d arrays and more complicated filters is to create a boolean mask and then use .sum() on the mask.

>>>>y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
>>>>mask = y == 0
>>>>mask.sum()
8

回答 23

如果您不想使用numpy或collections模块,则可以使用字典:

d = dict()
a = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]
for item in a:
    try:
        d[item]+=1
    except KeyError:
        d[item]=1

结果:

>>>d
{0: 8, 1: 4}

当然,您也可以使用if / else语句。我认为Counter函数的功能几乎相同,但这更加透明。

If you don’t want to use numpy or a collections module you can use a dictionary:

d = dict()
a = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]
for item in a:
    try:
        d[item]+=1
    except KeyError:
        d[item]=1

result:

>>>d
{0: 8, 1: 4}

Of course you can also use an if/else statement. I think the Counter function does almost the same thing but this is more transparant.


回答 24

对于通用条目:

x = np.array([11, 2, 3, 5, 3, 2, 16, 10, 10, 3, 11, 4, 5, 16, 3, 11, 4])
n = {i:len([j for j in np.where(x==i)[0]]) for i in set(x)}
ix = {i:[j for j in np.where(x==i)[0]] for i in set(x)}

将输出一个计数:

{2: 2, 3: 4, 4: 2, 5: 2, 10: 2, 11: 3, 16: 2}

和索引:

{2: [1, 5],
3: [2, 4, 9, 14],
4: [11, 16],
5: [3, 12],
10: [7, 8],
11: [0, 10, 15],
16: [6, 13]}

For generic entries:

x = np.array([11, 2, 3, 5, 3, 2, 16, 10, 10, 3, 11, 4, 5, 16, 3, 11, 4])
n = {i:len([j for j in np.where(x==i)[0]]) for i in set(x)}
ix = {i:[j for j in np.where(x==i)[0]] for i in set(x)}

Will output a count:

{2: 2, 3: 4, 4: 2, 5: 2, 10: 2, 11: 3, 16: 2}

And indices:

{2: [1, 5],
3: [2, 4, 9, 14],
4: [11, 16],
5: [3, 12],
10: [7, 8],
11: [0, 10, 15],
16: [6, 13]}

回答 25

这里有一些东西,通过它您可以计算出特定数字的出现次数:根据您的代码

count_of_zero = list(y [y == 0])。count(0)

打印(count_of_zero)

//根据匹配项,将有布尔值,根据True值,将返回数字0

here I have something, through which you can count the number of occurrence of a particular number: according to your code

count_of_zero=list(y[y==0]).count(0)

print(count_of_zero)

// according to the match there will be boolean values and according to True value the number 0 will be return


回答 26

如果您对最快的执行感兴趣,那么您会事先知道要查找的值,并且您的数组是一维的,否则您对展平数组上的结果感兴趣(在这种情况下,函数的输入应是np.flatten(arr)不是只arr),然后Numba是你的朋友:

import numba as nb


@nb.jit
def count_nb(arr, value):
    result = 0
    for x in arr:
        if x == value:
            result += 1
    return result

或者,对于超大型阵列,并行化可能会有所帮助:

@nb.jit(parallel=True)
def count_nbp(arr, value):
    result = 0
    for i in nb.prange(arr.size):
        if arr[i] == value:
            result += 1
    return result

对这些基准进行基准测试np.count_nonzero()(也存在创建可以避免的临时数组的问题)和np.unique()基于-的解决方案

import numpy as np


def count_np(arr, value):
    return np.count_nonzero(arr == value)
import numpy as np


def count_np2(arr, value):
    uniques, counts = np.unique(a, return_counts=True)
    counter = dict(zip(uniques, counts))
    return counter[value] if value in counter else 0 

用于使用以下命令生成的输入:

def gen_input(n, a=0, b=100):
    return np.random.randint(a, b, n)

获得以下图(图的第二行是对更快方法的放大):

bm_full bm_zoom

表明基于Numba的解决方案比NumPy的解决方案明显更快,并且对于非常大的输入,并行方法比朴素的方法要快。


完整的代码在这里

If you are interested in the fastest execution, you know in advance which value(s) to look for, and your array is 1D, or you are otherwise interested in the result on the flattened array (in which case the input of the function should be np.flatten(arr) rather than just arr), then Numba is your friend:

import numba as nb


@nb.jit
def count_nb(arr, value):
    result = 0
    for x in arr:
        if x == value:
            result += 1
    return result

or, for very large arrays where parallelization may be beneficial:

@nb.jit(parallel=True)
def count_nbp(arr, value):
    result = 0
    for i in nb.prange(arr.size):
        if arr[i] == value:
            result += 1
    return result

Benchmarking these against np.count_nonzero() (which also has a problem of creating a temporary array which may be avoided) and np.unique()-based solution

import numpy as np


def count_np(arr, value):
    return np.count_nonzero(arr == value)
import numpy as np


def count_np2(arr, value):
    uniques, counts = np.unique(a, return_counts=True)
    counter = dict(zip(uniques, counts))
    return counter[value] if value in counter else 0 

for input generated with:

def gen_input(n, a=0, b=100):
    return np.random.randint(a, b, n)

the following plots are obtained (the second row of plots is a zoom on the faster approach):

bm_full bm_zoom

Showing that Numba-based solution are noticeably faster than the NumPy counterparts, and, for very large inputs, the parallel approach is faster than the naive one.


Full code available here.


回答 27

如果使用生成器处理非常大的数组,则可以选择。令人高兴的是,这种方法对数组和列表都适用,并且您不需要任何其他程序包。此外,您没有使用太多的内存。

my_array = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
sum(1 for val in my_array if val==0)
Out: 8

if you are dealing with very large arrays using generators could be an option. The nice thing here it that this approach works fine for both arrays and lists and you dont need any additional package. Additionally, you are not using that much memory.

my_array = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
sum(1 for val in my_array if val==0)
Out: 8

回答 28

Numpy为此提供了一个模块。只是一个小技巧。将您的输入数组作为垃圾箱。

numpy.histogram(y, bins=y)

输出是2个数组。一个带有值本身,另一个带有相应的频率。

Numpy has a module for this. Just a small hack. Put your input array as bins.

numpy.histogram(y, bins=y)

The output are 2 arrays. One with the values itself, other with the corresponding frequencies.


回答 29

using numpy.count

$ a = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]

$ np.count(a, 1)
using numpy.count

$ a = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]

$ np.count(a, 1)

如何从列表列表中制作平面列表?

问题:如何从列表列表中制作平面列表?

我想知道是否有捷径可以从Python的列表清单中做出一个简单的清单。

我可以for循环执行此操作,但是也许有一些很酷的“单线”功能?我尝试使用reduce(),但出现错误。

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
reduce(lambda x, y: x.extend(y), l)

错误信息

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'extend'

I wonder whether there is a shortcut to make a simple list out of list of lists in Python.

I can do that in a for loop, but maybe there is some cool “one-liner”? I tried it with reduce(), but I get an error.

Code

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
reduce(lambda x, y: x.extend(y), l)

Error message

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 1, in <lambda>
AttributeError: 'NoneType' object has no attribute 'extend'

回答 0

给定一个列表列表l

flat_list = [item for sublist in l for item in sublist]

意思是:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

比到目前为止发布的快捷方式快。(l是要展平的列表。)

这是相应的功能:

flatten = lambda l: [item for sublist in l for item in sublist]

作为证据,您可以使用timeit标准库中的模块:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

说明:基于快捷方式+(包括中的隐含使用sum)的必要性是O(L**2)当有L个子列表时-随着中间结果列表的长度越来越长,每一步都会分配一个新的中间结果列表对象,并且所有项目必须复制前一个中间结果中的结果(以及最后添加的一些新结果)。因此,为简单起见,而又不失去一般性,请说您有I个项目的L个子列表:第一个I项目来回复制L-1次,第二个I项目L-2次,依此类推;等等。总份数是I乘以x从1到L的x的总和,即I * (L**2)/2

列表理解只生成一次列表,然后将每个项目(从其原始居住地复制到结果列表)也恰好复制一次。

Given a list of lists l,

flat_list = [item for sublist in l for item in sublist]

which means:

flat_list = []
for sublist in l:
    for item in sublist:
        flat_list.append(item)

is faster than the shortcuts posted so far. (l is the list to flatten.)

Here is the corresponding function:

flatten = lambda l: [item for sublist in l for item in sublist]

As evidence, you can use the timeit module in the standard library:

$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' '[item for sublist in l for item in sublist]'
10000 loops, best of 3: 143 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'sum(l, [])'
1000 loops, best of 3: 969 usec per loop
$ python -mtimeit -s'l=[[1,2,3],[4,5,6], [7], [8,9]]*99' 'reduce(lambda x,y: x+y,l)'
1000 loops, best of 3: 1.1 msec per loop

Explanation: the shortcuts based on + (including the implied use in sum) are, of necessity, O(L**2) when there are L sublists — as the intermediate result list keeps getting longer, at each step a new intermediate result list object gets allocated, and all the items in the previous intermediate result must be copied over (as well as a few new ones added at the end). So, for simplicity and without actual loss of generality, say you have L sublists of I items each: the first I items are copied back and forth L-1 times, the second I items L-2 times, and so on; total number of copies is I times the sum of x for x from 1 to L excluded, i.e., I * (L**2)/2.

The list comprehension just generates one list, once, and copies each item over (from its original place of residence to the result list) also exactly once.


回答 1

您可以使用itertools.chain()

import itertools
list2d = [[1,2,3], [4,5,6], [7], [8,9]]
merged = list(itertools.chain(*list2d))

或者,您可以使用itertools.chain.from_iterable()不需要使用*运算符解压缩列表的方法:

import itertools
list2d = [[1,2,3], [4,5,6], [7], [8,9]]
merged = list(itertools.chain.from_iterable(list2d))

You can use itertools.chain():

import itertools
list2d = [[1,2,3], [4,5,6], [7], [8,9]]
merged = list(itertools.chain(*list2d))

Or you can use itertools.chain.from_iterable() which doesn’t require unpacking the list with the * operator:

import itertools
list2d = [[1,2,3], [4,5,6], [7], [8,9]]
merged = list(itertools.chain.from_iterable(list2d))

回答 2

作者注意:这是低效的。但是很有趣,因为类人猿很棒。它不适用于生产Python代码。

>>> sum(l, [])
[1, 2, 3, 4, 5, 6, 7, 8, 9]

这只是对在第一个参数中传递的iterable元素求和,将第二个参数视为和的初始值(如果未给出,0则改为使用和,这种情况下会给您带来错误)。

由于您是对嵌套列表求和,因此实际上得到[1,3]+[2,4]的结果sum([[1,3],[2,4]],[])等于[1,3,2,4]

请注意,仅适用于列表列表。对于列表列表列表,您将需要其他解决方案。

Note from the author: This is inefficient. But fun, because monoids are awesome. It’s not appropriate for production Python code.

>>> sum(l, [])
[1, 2, 3, 4, 5, 6, 7, 8, 9]

This just sums the elements of iterable passed in the first argument, treating second argument as the initial value of the sum (if not given, 0 is used instead and this case will give you an error).

Because you are summing nested lists, you actually get [1,3]+[2,4] as a result of sum([[1,3],[2,4]],[]), which is equal to [1,3,2,4].

Note that only works on lists of lists. For lists of lists of lists, you’ll need another solution.


回答 3

我使用perfplot(我的一个宠物项目,本质上是一个包装纸timeit)测试了大多数建议的解决方案,然后发现

functools.reduce(operator.iconcat, a, [])

串联多个小列表和几个长列表时,这是最快的解决方案。(operator.iadd同样快。)

在此处输入图片说明

在此处输入图片说明


复制剧情的代码:

import functools
import itertools
import numpy
import operator
import perfplot


def forfor(a):
    return [item for sublist in a for item in sublist]


def sum_brackets(a):
    return sum(a, [])


def functools_reduce(a):
    return functools.reduce(operator.concat, a)


def functools_reduce_iconcat(a):
    return functools.reduce(operator.iconcat, a, [])


def itertools_chain(a):
    return list(itertools.chain.from_iterable(a))


def numpy_flat(a):
    return list(numpy.array(a).flat)


def numpy_concatenate(a):
    return list(numpy.concatenate(a))


perfplot.show(
    setup=lambda n: [list(range(10))] * n,
    # setup=lambda n: [list(range(n))] * 10,
    kernels=[
        forfor,
        sum_brackets,
        functools_reduce,
        functools_reduce_iconcat,
        itertools_chain,
        numpy_flat,
        numpy_concatenate,
    ],
    n_range=[2 ** k for k in range(16)],
    xlabel="num lists (of length 10)",
    # xlabel="len lists (10 lists total)"
)

I tested most suggested solutions with perfplot (a pet project of mine, essentially a wrapper around timeit), and found

functools.reduce(operator.iconcat, a, [])

to be the fastest solution, both when many small lists and few long lists are concatenated. (operator.iadd is equally fast.)

enter image description here

enter image description here


Code to reproduce the plot:

import functools
import itertools
import numpy
import operator
import perfplot


def forfor(a):
    return [item for sublist in a for item in sublist]


def sum_brackets(a):
    return sum(a, [])


def functools_reduce(a):
    return functools.reduce(operator.concat, a)


def functools_reduce_iconcat(a):
    return functools.reduce(operator.iconcat, a, [])


def itertools_chain(a):
    return list(itertools.chain.from_iterable(a))


def numpy_flat(a):
    return list(numpy.array(a).flat)


def numpy_concatenate(a):
    return list(numpy.concatenate(a))


perfplot.show(
    setup=lambda n: [list(range(10))] * n,
    # setup=lambda n: [list(range(n))] * 10,
    kernels=[
        forfor,
        sum_brackets,
        functools_reduce,
        functools_reduce_iconcat,
        itertools_chain,
        numpy_flat,
        numpy_concatenate,
    ],
    n_range=[2 ** k for k in range(16)],
    xlabel="num lists (of length 10)",
    # xlabel="len lists (10 lists total)"
)

回答 4

from functools import reduce #python 3

>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(lambda x,y: x+y,l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

extend()您的示例中的方法将修改x而不是返回有用的值(期望值reduce())。

reduce版本的更快方法是

>>> import operator
>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(operator.concat, l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]
from functools import reduce #python 3

>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(lambda x,y: x+y,l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

The extend() method in your example modifies x instead of returning a useful value (which reduce() expects).

A faster way to do the reduce version would be

>>> import operator
>>> l = [[1,2,3],[4,5,6], [7], [8,9]]
>>> reduce(operator.concat, l)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

回答 5

如果您使用Django,请不要重新发明轮子:

>>> from django.contrib.admin.utils import flatten
>>> l = [[1,2,3], [4,5], [6]]
>>> flatten(l)
>>> [1, 2, 3, 4, 5, 6]

熊猫

>>> from pandas.core.common import flatten
>>> list(flatten(l))

Itertools

>>> import itertools
>>> flatten = itertools.chain.from_iterable
>>> list(flatten(l))

Matplotlib

>>> from matplotlib.cbook import flatten
>>> list(flatten(l))

Unipath

>>> from unipath.path import flatten
>>> list(flatten(l))

Setuptools

>>> from setuptools.namespaces import flatten
>>> list(flatten(l))

Don’t reinvent the wheel if you use Django:

>>> from django.contrib.admin.utils import flatten
>>> l = [[1,2,3], [4,5], [6]]
>>> flatten(l)
>>> [1, 2, 3, 4, 5, 6]

Pandas:

>>> from pandas.core.common import flatten
>>> list(flatten(l))

Itertools:

>>> import itertools
>>> flatten = itertools.chain.from_iterable
>>> list(flatten(l))

Matplotlib

>>> from matplotlib.cbook import flatten
>>> list(flatten(l))

Unipath:

>>> from unipath.path import flatten
>>> list(flatten(l))

Setuptools:

>>> from setuptools.namespaces import flatten
>>> list(flatten(l))

回答 6

这是适用于数字字符串嵌套列表和混合容器的通用方法。

#from typing import Iterable 
from collections import Iterable                            # < py38


def flatten(items):
    """Yield items from any nested iterable; see Reference."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            for sub_x in flatten(x):
                yield sub_x
        else:
            yield x

注意事项

  • 在Python 3中,yield from flatten(x)可以替换for sub_x in flatten(x): yield sub_x
  • 在Python 3.8,抽象基类移动collection.abc所述typing模块。

演示版

lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(flatten(lst))                                         # nested lists
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

mixed = [[1, [2]], (3, 4, {5, 6}, 7), 8, "9"]              # numbers, strs, nested & mixed
list(flatten(mixed))
# [1, 2, 3, 4, 5, 6, 7, 8, '9']

参考

  • 此解决方案是根据Beazley,D.和B. Jones的食谱修改的。食谱4.14,Python Cookbook第三版,O’Reilly Media Inc.,塞巴斯托波尔,加利福尼亚:2013年。
  • 找到了较早的SO帖子,可能是原始的演示。

Here is a general approach that applies to numbers, strings, nested lists and mixed containers.

Code

#from typing import Iterable 
from collections import Iterable                            # < py38


def flatten(items):
    """Yield items from any nested iterable; see Reference."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            for sub_x in flatten(x):
                yield sub_x
        else:
            yield x

Notes:

  • In Python 3, yield from flatten(x) can replace for sub_x in flatten(x): yield sub_x
  • In Python 3.8, abstract base classes are moved from collection.abc to the typing module.

Demo

lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(flatten(lst))                                         # nested lists
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

mixed = [[1, [2]], (3, 4, {5, 6}, 7), 8, "9"]              # numbers, strs, nested & mixed
list(flatten(mixed))
# [1, 2, 3, 4, 5, 6, 7, 8, '9']

Reference

  • This solution is modified from a recipe in Beazley, D. and B. Jones. Recipe 4.14, Python Cookbook 3rd Ed., O’Reilly Media Inc. Sebastopol, CA: 2013.
  • Found an earlier SO post, possibly the original demonstration.

回答 7

如果要展平不知道嵌套深度的数据结构,可以使用1iteration_utilities.deepflatten

>>> from iteration_utilities import deepflatten

>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(deepflatten(l, depth=1))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> l = [[1, 2, 3], [4, [5, 6]], 7, [8, 9]]
>>> list(deepflatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

它是一个生成器,因此您需要将结果list强制转换为或对其进行显式迭代。


如果只展平一个级别,并且每个项目本身都是可迭代的,则还可以使用iteration_utilities.flatten它本身只是一个薄包装itertools.chain.from_iterable

>>> from iteration_utilities import flatten
>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(flatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

只是添加一些时间(基于NicoSchlömer的答案,其中不包括此答案中提供的功能):

在此处输入图片说明

这是一个对数对数图,可以容纳跨度很大的值。对于定性推理:越低越好。

研究结果表明,如果迭代只包含几个内部iterables然后sum将最快,但长期iterables只itertools.chain.from_iterableiteration_utilities.deepflatten或嵌套的理解与合理的性能itertools.chain.from_iterable是最快的(如已被尼科Schlömer注意到)。

from itertools import chain
from functools import reduce
from collections import Iterable  # or from collections.abc import Iterable
import operator
from iteration_utilities import deepflatten

def nested_list_comprehension(lsts):
    return [item for sublist in lsts for item in sublist]

def itertools_chain_from_iterable(lsts):
    return list(chain.from_iterable(lsts))

def pythons_sum(lsts):
    return sum(lsts, [])

def reduce_add(lsts):
    return reduce(lambda x, y: x + y, lsts)

def pylangs_flatten(lsts):
    return list(flatten(lsts))

def flatten(items):
    """Yield items from any nested iterable; see REF."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

def reduce_concat(lsts):
    return reduce(operator.concat, lsts)

def iteration_utilities_deepflatten(lsts):
    return list(deepflatten(lsts, depth=1))


from simple_benchmark import benchmark

b = benchmark(
    [nested_list_comprehension, itertools_chain_from_iterable, pythons_sum, reduce_add,
     pylangs_flatten, reduce_concat, iteration_utilities_deepflatten],
    arguments={2**i: [[0]*5]*(2**i) for i in range(1, 13)},
    argument_name='number of inner lists'
)

b.plot()

1免责声明:我是该图书馆的作者

If you want to flatten a data-structure where you don’t know how deep it’s nested you could use iteration_utilities.deepflatten1

>>> from iteration_utilities import deepflatten

>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(deepflatten(l, depth=1))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

>>> l = [[1, 2, 3], [4, [5, 6]], 7, [8, 9]]
>>> list(deepflatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

It’s a generator so you need to cast the result to a list or explicitly iterate over it.


To flatten only one level and if each of the items is itself iterable you can also use iteration_utilities.flatten which itself is just a thin wrapper around itertools.chain.from_iterable:

>>> from iteration_utilities import flatten
>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> list(flatten(l))
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Just to add some timings (based on Nico Schlömer answer that didn’t include the function presented in this answer):

enter image description here

It’s a log-log plot to accommodate for the huge range of values spanned. For qualitative reasoning: Lower is better.

The results show that if the iterable contains only a few inner iterables then sum will be fastest, however for long iterables only the itertools.chain.from_iterable, iteration_utilities.deepflatten or the nested comprehension have reasonable performance with itertools.chain.from_iterable being the fastest (as already noticed by Nico Schlömer).

from itertools import chain
from functools import reduce
from collections import Iterable  # or from collections.abc import Iterable
import operator
from iteration_utilities import deepflatten

def nested_list_comprehension(lsts):
    return [item for sublist in lsts for item in sublist]

def itertools_chain_from_iterable(lsts):
    return list(chain.from_iterable(lsts))

def pythons_sum(lsts):
    return sum(lsts, [])

def reduce_add(lsts):
    return reduce(lambda x, y: x + y, lsts)

def pylangs_flatten(lsts):
    return list(flatten(lsts))

def flatten(items):
    """Yield items from any nested iterable; see REF."""
    for x in items:
        if isinstance(x, Iterable) and not isinstance(x, (str, bytes)):
            yield from flatten(x)
        else:
            yield x

def reduce_concat(lsts):
    return reduce(operator.concat, lsts)

def iteration_utilities_deepflatten(lsts):
    return list(deepflatten(lsts, depth=1))


from simple_benchmark import benchmark

b = benchmark(
    [nested_list_comprehension, itertools_chain_from_iterable, pythons_sum, reduce_add,
     pylangs_flatten, reduce_concat, iteration_utilities_deepflatten],
    arguments={2**i: [[0]*5]*(2**i) for i in range(1, 13)},
    argument_name='number of inner lists'
)

b.plot()

1 Disclaimer: I’m the author of that library


回答 8

我收回我的声明。总和不是赢家。尽管列表较小时速度更快。但是,列表较大时,性能会大大降低。

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10000'
    ).timeit(100)
2.0440959930419922

sum版本仍在运行一分钟以上,尚未处理!

对于中型列表:

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
20.126545906066895
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
22.242258071899414
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
16.449732065200806

使用小清单和时间:number = 1000000

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
2.4598159790039062
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.5289170742034912
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.0598428249359131

I take my statement back. sum is not the winner. Although it is faster when the list is small. But the performance degrades significantly with larger lists.

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10000'
    ).timeit(100)
2.0440959930419922

The sum version is still running for more than a minute and it hasn’t done processing yet!

For medium lists:

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
20.126545906066895
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
22.242258071899414
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]] * 10'
    ).timeit()
16.449732065200806

Using small lists and timeit: number=1000000

>>> timeit.Timer(
        '[item for sublist in l for item in sublist]',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
2.4598159790039062
>>> timeit.Timer(
        'reduce(lambda x,y: x+y,l)',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.5289170742034912
>>> timeit.Timer(
        'sum(l, [])',
        'l=[[1, 2, 3], [4, 5, 6, 7, 8], [1, 2, 3, 4, 5, 6, 7]]'
    ).timeit()
1.0598428249359131

回答 9

似乎与operator.add!当您将两个列表加在一起时,正确的术语是concat,而不是添加。operator.concat是您需要使用的。

如果您认为功能正常,那么就这么简单:

>>> from functools import reduce
>>> list2d = ((1, 2, 3), (4, 5, 6), (7,), (8, 9))
>>> reduce(operator.concat, list2d)
(1, 2, 3, 4, 5, 6, 7, 8, 9)

您会看到reduce尊重序列类型,因此在提供元组时,您会得到一个元组。让我们尝试一个列表:

>>> list2d = [[1, 2, 3],[4, 5, 6], [7], [8, 9]]
>>> reduce(operator.concat, list2d)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

啊哈,您会得到一个清单。

性能如何:

>>> list2d = [[1, 2, 3],[4, 5, 6], [7], [8, 9]]
>>> %timeit list(itertools.chain.from_iterable(list2d))
1000000 loops, best of 3: 1.36 µs per loop

from_iterable相当快!但这是无法与相比的concat

>>> list2d = ((1, 2, 3),(4, 5, 6), (7,), (8, 9))
>>> %timeit reduce(operator.concat, list2d)
1000000 loops, best of 3: 492 ns per loop

There seems to be a confusion with operator.add! When you add two lists together, the correct term for that is concat, not add. operator.concat is what you need to use.

If you’re thinking functional, it is as easy as this::

>>> from functools import reduce
>>> list2d = ((1, 2, 3), (4, 5, 6), (7,), (8, 9))
>>> reduce(operator.concat, list2d)
(1, 2, 3, 4, 5, 6, 7, 8, 9)

You see reduce respects the sequence type, so when you supply a tuple, you get back a tuple. Let’s try with a list::

>>> list2d = [[1, 2, 3],[4, 5, 6], [7], [8, 9]]
>>> reduce(operator.concat, list2d)
[1, 2, 3, 4, 5, 6, 7, 8, 9]

Aha, you get back a list.

How about performance::

>>> list2d = [[1, 2, 3],[4, 5, 6], [7], [8, 9]]
>>> %timeit list(itertools.chain.from_iterable(list2d))
1000000 loops, best of 3: 1.36 µs per loop

from_iterable is pretty fast! But it’s no comparison to reduce with concat.

>>> list2d = ((1, 2, 3),(4, 5, 6), (7,), (8, 9))
>>> %timeit reduce(operator.concat, list2d)
1000000 loops, best of 3: 492 ns per loop

回答 10

为什么使用扩展?

reduce(lambda x, y: x+y, l)

这应该工作正常。

Why do you use extend?

reduce(lambda x, y: x+y, l)

This should work fine.


回答 11

考虑安装more_itertools软件包。

> pip install more_itertools

它附带了一个实现flattensource,来自itertools配方):

import more_itertools


lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(more_itertools.flatten(lst))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

从2.4版开始,您可以使用more_itertools.collapsesource,由abarnet提供)来展平更复杂,嵌套的可迭代对象。

lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(more_itertools.collapse(lst)) 
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

lst = [[1, 2, 3], [[4, 5, 6]], [[[7]]], 8, 9]              # complex nesting
list(more_itertools.collapse(lst))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

Consider installing the more_itertools package.

> pip install more_itertools

It ships with an implementation for flatten (source, from the itertools recipes):

import more_itertools


lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(more_itertools.flatten(lst))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

As of version 2.4, you can flatten more complicated, nested iterables with more_itertools.collapse (source, contributed by abarnet).

lst = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
list(more_itertools.collapse(lst)) 
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

lst = [[1, 2, 3], [[4, 5, 6]], [[[7]]], 8, 9]              # complex nesting
list(more_itertools.collapse(lst))
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

回答 12

您的函数不起作用的原因是因为扩展名扩展了数组就位并且不返回它。您仍然可以使用以下方法从lambda返回x:

reduce(lambda x,y: x.extend(y) or x, l)

注意:扩展比列表上的+更有效。

The reason your function didn’t work is because the extend extends an array in-place and doesn’t return it. You can still return x from lambda, using something like this:

reduce(lambda x,y: x.extend(y) or x, l)

Note: extend is more efficient than + on lists.


回答 13

def flatten(l, a):
    for i in l:
        if isinstance(i, list):
            flatten(i, a)
        else:
            a.append(i)
    return a

print(flatten([[[1, [1,1, [3, [4,5,]]]], 2, 3], [4, 5],6], []))

# [1, 1, 1, 3, 4, 5, 2, 3, 4, 5, 6]
def flatten(l, a):
    for i in l:
        if isinstance(i, list):
            flatten(i, a)
        else:
            a.append(i)
    return a

print(flatten([[[1, [1,1, [3, [4,5,]]]], 2, 3], [4, 5],6], []))

# [1, 1, 1, 3, 4, 5, 2, 3, 4, 5, 6]

回答 14

递归版本

x = [1,2,[3,4],[5,[6,[7]]],8,9,[10]]

def flatten_list(k):
    result = list()
    for i in k:
        if isinstance(i,list):

            #The isinstance() function checks if the object (first argument) is an 
            #instance or subclass of classinfo class (second argument)

            result.extend(flatten_list(i)) #Recursive call
        else:
            result.append(i)
    return result

flatten_list(x)
#result = [1,2,3,4,5,6,7,8,9,10]

Recursive version

x = [1,2,[3,4],[5,[6,[7]]],8,9,[10]]

def flatten_list(k):
    result = list()
    for i in k:
        if isinstance(i,list):

            #The isinstance() function checks if the object (first argument) is an 
            #instance or subclass of classinfo class (second argument)

            result.extend(flatten_list(i)) #Recursive call
        else:
            result.append(i)
    return result

flatten_list(x)
#result = [1,2,3,4,5,6,7,8,9,10]

回答 15

matplotlib.cbook.flatten() 即使嵌套列表比示例嵌套更深,它也适用于嵌套列表。

import matplotlib
l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
print(list(matplotlib.cbook.flatten(l)))
l2 = [[1, 2, 3], [4, 5, 6], [7], [8, [9, 10, [11, 12, [13]]]]]
print list(matplotlib.cbook.flatten(l2))

结果:

[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

这比下划线._。flatten快18倍:

Average time over 1000 trials of matplotlib.cbook.flatten: 2.55e-05 sec
Average time over 1000 trials of underscore._.flatten: 4.63e-04 sec
(time for underscore._)/(time for matplotlib.cbook) = 18.1233394636

matplotlib.cbook.flatten() will work for nested lists even if they nest more deeply than the example.

import matplotlib
l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
print(list(matplotlib.cbook.flatten(l)))
l2 = [[1, 2, 3], [4, 5, 6], [7], [8, [9, 10, [11, 12, [13]]]]]
print list(matplotlib.cbook.flatten(l2))

Result:

[1, 2, 3, 4, 5, 6, 7, 8, 9]
[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]

This is 18x faster than underscore._.flatten:

Average time over 1000 trials of matplotlib.cbook.flatten: 2.55e-05 sec
Average time over 1000 trials of underscore._.flatten: 4.63e-04 sec
(time for underscore._)/(time for matplotlib.cbook) = 18.1233394636

回答 16

在处理基于文本的可变长度列表时,可接受的答案对我不起作用。这是对我有用的另一种方法。

l = ['aaa', 'bb', 'cccccc', ['xx', 'yyyyyyy']]

接受的答案无效

flat_list = [item for sublist in l for item in sublist]
print(flat_list)
['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c', 'xx', 'yyyyyyy']

新提出的解决方案,没有工作对我来说:

flat_list = []
_ = [flat_list.extend(item) if isinstance(item, list) else flat_list.append(item) for item in l if item]
print(flat_list)
['aaa', 'bb', 'cccccc', 'xx', 'yyyyyyy']

The accepted answer did not work for me when dealing with text-based lists of variable lengths. Here is an alternate approach that did work for me.

l = ['aaa', 'bb', 'cccccc', ['xx', 'yyyyyyy']]

Accepted answer that did not work:

flat_list = [item for sublist in l for item in sublist]
print(flat_list)
['a', 'a', 'a', 'b', 'b', 'c', 'c', 'c', 'c', 'c', 'c', 'xx', 'yyyyyyy']

New proposed solution that did work for me:

flat_list = []
_ = [flat_list.extend(item) if isinstance(item, list) else flat_list.append(item) for item in l if item]
print(flat_list)
['aaa', 'bb', 'cccccc', 'xx', 'yyyyyyy']

回答 17

上面的Anil函数的一个坏功能是,它要求用户始终手动将第二个参数指定为空列表[]。相反,这应该是默认设置。由于Python对象的工作方式,这些对象应在函数内部而不是参数中设置。

这是一个工作功能:

def list_flatten(l, a=None):
    #check a
    if a is None:
        #initialize with empty list
        a = []

    for i in l:
        if isinstance(i, list):
            list_flatten(i, a)
        else:
            a.append(i)
    return a

测试:

In [2]: lst = [1, 2, [3], [[4]],[5,[6]]]

In [3]: lst
Out[3]: [1, 2, [3], [[4]], [5, [6]]]

In [11]: list_flatten(lst)
Out[11]: [1, 2, 3, 4, 5, 6]

An bad feature of Anil’s function above is that it requires the user to always manually specify the second argument to be an empty list []. This should instead be a default. Due to the way Python objects work, these should be set inside the function, not in the arguments.

Here’s a working function:

def list_flatten(l, a=None):
    #check a
    if a is None:
        #initialize with empty list
        a = []

    for i in l:
        if isinstance(i, list):
            list_flatten(i, a)
        else:
            a.append(i)
    return a

Testing:

In [2]: lst = [1, 2, [3], [[4]],[5,[6]]]

In [3]: lst
Out[3]: [1, 2, [3], [[4]], [5, [6]]]

In [11]: list_flatten(lst)
Out[11]: [1, 2, 3, 4, 5, 6]

回答 18

以下对我来说似乎最简单:

>>> import numpy as np
>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> print (np.concatenate(l))
[1 2 3 4 5 6 7 8 9]

Following seem simplest to me:

>>> import numpy as np
>>> l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
>>> print (np.concatenate(l))
[1 2 3 4 5 6 7 8 9]

回答 19

也可以使用NumPy的flat

import numpy as np
list(np.array(l).flat)

编辑11/02/2016:仅当子列表具有相同尺寸时才可用。

One can also use NumPy’s flat:

import numpy as np
list(np.array(l).flat)

Edit 11/02/2016: Only works when sublists have identical dimensions.


回答 20

您可以使用numpy:
flat_list = list(np.concatenate(list_of_list))

You can use numpy :
flat_list = list(np.concatenate(list_of_list))


回答 21

如果您愿意放弃一点速度以获得更干净的外观,则可以使用numpy.concatenate().tolist()numpy.concatenate().ravel().tolist()

import numpy

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] * 99

%timeit numpy.concatenate(l).ravel().tolist()
1000 loops, best of 3: 313 µs per loop

%timeit numpy.concatenate(l).tolist()
1000 loops, best of 3: 312 µs per loop

%timeit [item for sublist in l for item in sublist]
1000 loops, best of 3: 31.5 µs per loop

您可以在docs numpy.concatenatenumpy.ravel中找到更多信息

If you are willing to give up a tiny amount of speed for a cleaner look, then you could use numpy.concatenate().tolist() or numpy.concatenate().ravel().tolist():

import numpy

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]] * 99

%timeit numpy.concatenate(l).ravel().tolist()
1000 loops, best of 3: 313 µs per loop

%timeit numpy.concatenate(l).tolist()
1000 loops, best of 3: 312 µs per loop

%timeit [item for sublist in l for item in sublist]
1000 loops, best of 3: 31.5 µs per loop

You can find out more here in the docs numpy.concatenate and numpy.ravel


回答 22

我找到的最快解决方案(无论如何都是大型列表):

import numpy as np
#turn list into an array and flatten()
np.array(l).flatten()

做完了!您当然可以通过执行list(l)将其转换为列表

Fastest solution I have found (for large list anyway):

import numpy as np
#turn list into an array and flatten()
np.array(l).flatten()

Done! You can of course turn it back into a list by executing list(l)


回答 23

underscore.py包装风扇的简单代码

from underscore import _
_.flatten([[1, 2, 3], [4, 5, 6], [7], [8, 9]])
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

它解决了所有扁平化问题(无列表项或复杂的嵌套)

from underscore import _
# 1 is none list item
# [2, [3]] is complex nesting
_.flatten([1, [2, [3]], [4, 5, 6], [7], [8, 9]])
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

您可以underscore.py使用pip 安装

pip install underscore.py

Simple code for underscore.py package fan

from underscore import _
_.flatten([[1, 2, 3], [4, 5, 6], [7], [8, 9]])
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

It solves all flatten problems (none list item or complex nesting)

from underscore import _
# 1 is none list item
# [2, [3]] is complex nesting
_.flatten([1, [2, [3]], [4, 5, 6], [7], [8, 9]])
# [1, 2, 3, 4, 5, 6, 7, 8, 9]

You can install underscore.py with pip

pip install underscore.py

回答 24

def flatten(alist):
    if alist == []:
        return []
    elif type(alist) is not list:
        return [alist]
    else:
        return flatten(alist[0]) + flatten(alist[1:])
def flatten(alist):
    if alist == []:
        return []
    elif type(alist) is not list:
        return [alist]
    else:
        return flatten(alist[0]) + flatten(alist[1:])

回答 25

注意:以下内容适用于Python 3.3+,因为它使用yield_fromsix也是第三方软件包,尽管它很稳定。或者,您可以使用sys.version


在的情况下obj = [[1, 2,], [3, 4], [5, 6]],这里的所有解决方案都不错,包括列表理解和itertools.chain.from_iterable

但是,请考虑以下稍微复杂的情况:

>>> obj = [[1, 2, 3], [4, 5], 6, 'abc', [7], [8, [9, 10]]]

这里有几个问题:

  • 一个元素6只是一个标量。它是不可迭代的,因此上述路由将在此处失败。
  • 其中一个要素,'abc'技术上可迭代(所有str s为)。但是,在行与行之间进行一点阅读时,您并不想这样处理-您希望将其视为单个元素。
  • 最后一个元素[8, [9, 10]]本身就是嵌套的可迭代对象。基本列表理解,chain.from_iterable仅提取“下一级”。

您可以通过以下方法对此进行补救:

>>> from collections import Iterable
>>> from six import string_types

>>> def flatten(obj):
...     for i in obj:
...         if isinstance(i, Iterable) and not isinstance(i, string_types):
...             yield from flatten(i)
...         else:
...             yield i


>>> list(flatten(obj))
[1, 2, 3, 4, 5, 6, 'abc', 7, 8, 9, 10]

在这里,您检查子元素(1)是否可通过Iterable,ABC从进行迭代itertools,但还要确保(2)元素不是 “字符串状”的。

Note: Below applies to Python 3.3+ because it uses yield_from. six is also a third-party package, though it is stable. Alternately, you could use sys.version.


In the case of obj = [[1, 2,], [3, 4], [5, 6]], all of the solutions here are good, including list comprehension and itertools.chain.from_iterable.

However, consider this slightly more complex case:

>>> obj = [[1, 2, 3], [4, 5], 6, 'abc', [7], [8, [9, 10]]]

There are several problems here:

  • One element, 6, is just a scalar; it’s not iterable, so the above routes will fail here.
  • One element, 'abc', is technically iterable (all strs are). However, reading between the lines a bit, you don’t want to treat it as such–you want to treat it as a single element.
  • The final element, [8, [9, 10]] is itself a nested iterable. Basic list comprehension and chain.from_iterable only extract “1 level down.”

You can remedy this as follows:

>>> from collections import Iterable
>>> from six import string_types

>>> def flatten(obj):
...     for i in obj:
...         if isinstance(i, Iterable) and not isinstance(i, string_types):
...             yield from flatten(i)
...         else:
...             yield i


>>> list(flatten(obj))
[1, 2, 3, 4, 5, 6, 'abc', 7, 8, 9, 10]

Here, you check that the sub-element (1) is iterable with Iterable, an ABC from itertools, but also want to ensure that (2) the element is not “string-like.”


回答 26

flat_list = []
for i in list_of_list:
    flat_list+=i

该代码也可以很好地工作,因为它会一直扩展列表。虽然非常相似,但是只有一个for循环。因此,它比添加2 for循环具有更少的复杂性。

flat_list = []
for i in list_of_list:
    flat_list+=i

This Code also works fine as it just extend the list all the way. Although it is much similar but only have one for loop. So It have less complexity than adding 2 for loops.


回答 27

from nltk import flatten

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
flatten(l)

这种解决方案相对于大多数其他解决方案的优势在于,如果您有类似以下的列表:

l = [1, [2, 3], [4, 5, 6], [7], [8, 9]]

虽然其他大多数解决方案都会引发错误,但此解决方案可以解决这些问题。

from nltk import flatten

l = [[1, 2, 3], [4, 5, 6], [7], [8, 9]]
flatten(l)

The advantage of this solution over most others here is that if you have a list like:

l = [1, [2, 3], [4, 5, 6], [7], [8, 9]]

while most other solutions throw an error this solution handles them.


回答 28

这可能不是最有效的方法,但我认为应该放一个衬里(实际上是两个衬里)。两种版本均可在任意层次的嵌套列表上使用,并利用语言功能(Python3.5)和递归。

def make_list_flat (l):
    flist = []
    flist.extend ([l]) if (type (l) is not list) else [flist.extend (make_list_flat (e)) for e in l]
    return flist

a = [[1, 2], [[[[3, 4, 5], 6]]], 7, [8, [9, [10, 11], 12, [13, 14, [15, [[16, 17], 18]]]]]]
flist = make_list_flat(a)
print (flist)

输出是

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

这以深度优先的方式工作。递归向下进行,直到找到一个非列表元素,然后扩展局部变量flist,然后将其回滚到父变量。每当flist返回时,它就会扩展到flist列表理解中的父级。因此,从根本上返回一个平面列表。

上面的代码创建了几个本地列表并返回它们,用于扩展父级列表。我认为解决此问题的方法可能是创建gloabl flist,如下所示。

a = [[1, 2], [[[[3, 4, 5], 6]]], 7, [8, [9, [10, 11], 12, [13, 14, [15, [[16, 17], 18]]]]]]
flist = []
def make_list_flat (l):
    flist.extend ([l]) if (type (l) is not list) else [make_list_flat (e) for e in l]

make_list_flat(a)
print (flist)

输出再次

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

尽管目前我不确定效率。

This may not be the most efficient way but I thought to put a one-liner (actually a two-liner). Both versions will work on arbitrary hierarchy nested lists, and exploits language features (Python3.5) and recursion.

def make_list_flat (l):
    flist = []
    flist.extend ([l]) if (type (l) is not list) else [flist.extend (make_list_flat (e)) for e in l]
    return flist

a = [[1, 2], [[[[3, 4, 5], 6]]], 7, [8, [9, [10, 11], 12, [13, 14, [15, [[16, 17], 18]]]]]]
flist = make_list_flat(a)
print (flist)

The output is

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

This works in a depth first manner. The recursion goes down until it finds a non-list element, then extends the local variable flist and then rolls back it to the parent. Whenever flist is returned, it is extended to the parent’s flist in the list comprehension. Therefore, at the root, a flat list is returned.

The above one creates several local lists and returns them which are used to extend the parent’s list. I think the way around for this may be creating a gloabl flist, like below.

a = [[1, 2], [[[[3, 4, 5], 6]]], 7, [8, [9, [10, 11], 12, [13, 14, [15, [[16, 17], 18]]]]]]
flist = []
def make_list_flat (l):
    flist.extend ([l]) if (type (l) is not list) else [make_list_flat (e) for e in l]

make_list_flat(a)
print (flist)

The output is again

[1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]

Although I am not sure at this time about the efficiency.


回答 29

适用于整数的异质和均质列表的另一种异常方法:

from typing import List


def flatten(l: list) -> List[int]:
    """Flatten an arbitrary deep nested list of lists of integers.

    Examples:
        >>> flatten([1, 2, [1, [10]]])
        [1, 2, 1, 10]

    Args:
        l: Union[l, Union[int, List[int]]

    Returns:
        Flatted list of integer
    """
    return [int(i.strip('[ ]')) for i in str(l).split(',')]

Another unusual approach that works for hetero- and homogeneous lists of integers:

from typing import List


def flatten(l: list) -> List[int]:
    """Flatten an arbitrary deep nested list of lists of integers.

    Examples:
        >>> flatten([1, 2, [1, [10]]])
        [1, 2, 1, 10]

    Args:
        l: Union[l, Union[int, List[int]]

    Returns:
        Flatted list of integer
    """
    return [int(i.strip('[ ]')) for i in str(l).split(',')]