标签归档:arrays

Python的清单是如何实现的?

问题:Python的清单是如何实现的?

它是一个链表,一个数组吗?我四处搜寻,只发现有人猜测。我的C语言知识不足以查看源代码。

Is it a linked list, an array? I searched around and only found people guessing. My C knowledge isn’t good enough to look at the source code.


回答 0

这是一个动态数组。实际证明:不管索引如何,索引都需要花费同一时间(当然,差异很小(0.0013微秒!)):

...>python -m timeit --setup="x = [None]*1000" "x[500]"
10000000 loops, best of 3: 0.0579 usec per loop

...>python -m timeit --setup="x = [None]*1000" "x[0]"
10000000 loops, best of 3: 0.0566 usec per loop

如果IronPython或Jython使用链接列表,我会感到惊讶-它们会破坏许多基于列表是动态数组的假设而广泛使用的库的性能。

It’s a dynamic array. Practical proof: Indexing takes (of course with extremely small differences (0.0013 µsecs!)) the same time regardless of index:

...>python -m timeit --setup="x = [None]*1000" "x[500]"
10000000 loops, best of 3: 0.0579 usec per loop

...>python -m timeit --setup="x = [None]*1000" "x[0]"
10000000 loops, best of 3: 0.0566 usec per loop

I would be astounded if IronPython or Jython used linked lists – they would ruin the performance of many many widely-used libraries built on the assumption that lists are dynamic arrays.


回答 1

实际上,C代码非常简单。扩展一个宏并修剪一些不相关的注释,其基本结构在中listobject.h,该结构将列表定义为:

typedef struct {
    PyObject_HEAD
    Py_ssize_t ob_size;

    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     */
    Py_ssize_t allocated;
} PyListObject;

PyObject_HEAD包含引用计数和类型标识符。因此,它是一个整体的向量/数组。用于在此类数组已满时调整其大小的代码在中listobject.c。它实际上并没有将数组加倍,而是通过分配来增长

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
new_allocated += newsize;

每次达到最大容量时,newsize请求的大小在哪里(不一定allocated + 1是因为您可以添加extend任意数量的元素,而不是append一个一个地添加它们)。

另请参阅Python FAQ

The C code is pretty simple, actually. Expanding one macro and pruning some irrelevant comments, the basic structure is in listobject.h, which defines a list as:

typedef struct {
    PyObject_HEAD
    Py_ssize_t ob_size;

    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
     */
    Py_ssize_t allocated;
} PyListObject;

PyObject_HEAD contains a reference count and a type identifier. So, it’s a vector/array that overallocates. The code for resizing such an array when it’s full is in listobject.c. It doesn’t actually double the array, but grows by allocating

new_allocated = (newsize >> 3) + (newsize < 9 ? 3 : 6);
new_allocated += newsize;

to the capacity each time, where newsize is the requested size (not necessarily allocated + 1 because you can extend by an arbitrary number of elements instead of append‘ing them one by one).

See also the Python FAQ.


回答 2

在CPython中,列表是指针数组。Python的其他实现可能选择以不同的方式存储它们。

In CPython, lists are arrays of pointers. Other implementations of Python may choose to store them in different ways.


回答 3

这取决于实现,但是IIRC:

  • CPython使用了一个指针数组
  • Jython使用 ArrayList
  • IronPython显然也使用数组。您可以浏览源代码以找出答案。

因此,它们都具有O(1)随机访问权限。

This is implementation dependent, but IIRC:

  • CPython uses an array of pointers
  • Jython uses an ArrayList
  • IronPython apparently also uses an array. You can browse the source code to find out.

Thus they all have O(1) random access.


回答 4

我建议Laurent Luce的文章“ Python列表实现”。这对我真的很有用,因为作者解释了该列表是如何在CPython中实现的,并为此使用了出色的图表。

列出对象C的结构

CPython中的列表对象由以下C结构表示。ob_item是指向列表元素的指针的列表。已分配是在内存中分配的插槽数。

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;

重要的是要注意分配的插槽和列表大小之间的差异。列表的大小与相同len(l)。分配的插槽数是已在内存中分配的数量。通常,您会看到分配的大小可能大于大小。这是为了避免realloc每次将新元素添加到列表时都需要调用。

附加

我们将一个整数附加到列表中:l.append(1)。怎么了?
在此处输入图片说明

我们继续添加一个元素:l.append(2)list_resize在n + 1 = 2时调用,但是由于分配的大小为4,因此无需分配更多的内存。同样的事情发生时,我们增加2个整数:l.append(3)l.append(4)。下图显示了到目前为止的内容。

在此处输入图片说明

让我们在位置1处插入一个新的整数(5),l.insert(1,5)看看内部发生了什么。在此处输入图片说明

流行音乐

当您弹出最后一个元素:时l.pop(),将listpop()被调用。list_resize在内部调用listpop(),如果新大小小于分配大小的一半,则列表将缩小。在此处输入图片说明

您可以观察到插槽4仍指向整数,但重要的是列表的大小现在为4。让我们再弹出一个元素。在中list_resize(),大小– 1 = 4 – 1 = 3小于分配的插槽的一半,因此列表缩小为6个插槽,现在列表的新大小为3。

您可以观察到插槽3和4仍指向一些整数,但重要的是列表的大小现在为3。在此处输入图片说明

删除 Python列表对象具有删除特定元素的方法:l.remove(5)在此处输入图片说明

I would suggest Laurent Luce’s article “Python list implementation”. Was really useful for me because the author explains how the list is implemented in CPython and uses excellent diagrams for this purpose.

List object C structure

A list object in CPython is represented by the following C structure. ob_item is a list of pointers to the list elements. allocated is the number of slots allocated in memory.

typedef struct {
    PyObject_VAR_HEAD
    PyObject **ob_item;
    Py_ssize_t allocated;
} PyListObject;

It is important to notice the difference between allocated slots and the size of the list. The size of a list is the same as len(l). The number of allocated slots is what has been allocated in memory. Often, you will see that allocated can be greater than size. This is to avoid needing calling realloc each time a new elements is appended to the list.

Append

We append an integer to the list: l.append(1). What happens?
enter image description here

We continue by adding one more element: l.append(2). list_resize is called with n+1 = 2 but because the allocated size is 4, there is no need to allocate more memory. Same thing happens when we add 2 more integers: l.append(3), l.append(4). The following diagram shows what we have so far.

enter image description here

Insert

Let’s insert a new integer (5) at position 1: l.insert(1,5) and look at what happens internally. enter image description here

Pop

When you pop the last element: l.pop(), listpop() is called. list_resize is called inside listpop() and if the new size is less than half of the allocated size then the list is shrunk.enter image description here

You can observe that slot 4 still points to the integer but the important thing is the size of the list which is now 4. Let’s pop one more element. In list_resize(), size – 1 = 4 – 1 = 3 is less than half of the allocated slots so the list is shrunk to 6 slots and the new size of the list is now 3.

You can observe that slot 3 and 4 still point to some integers but the important thing is the size of the list which is now 3.enter image description here

Remove Python list object has a method to remove a specific element: l.remove(5).enter image description here


回答 5

根据文档

Python的列表实际上是可变长度数组,而不是Lisp样式的链接列表。

According to the documentation,

Python’s lists are really variable-length arrays, not Lisp-style linked lists.


回答 6

正如上面其他人所述,这些列表(当列表很大时)是通过分配固定数量的空间来实现的;如果该空间应填充,则分配更大的空间并在元素上进行复制。

为了理解为什么在不损失一般性的情况下对O(1)进行摊销,假设我们插入了a = 2 ^ n个元素,现在我们必须将表加倍为2 ^(n + 1)个大小。这意味着我们当前正在执行2 ^(n + 1)个操作。最后一个副本,我们进行了2 ^ n次操作。在此之前,我们做了2 ^(n-1)…一直到8,4,2,1。现在,如果将它们加起来,我们得到1 + 2 + 4 + 8 + … + 2 ^(n + 1)= 2 ^(n + 2)-1 <4 * 2 ^ n = O(2 ^ n)= O(a)总插入(即O(1)摊销时间)。另外,应注意,如果表允许删除,则表收缩必须以其他因子(例如3倍)完成

As others have stated above, the lists (when appreciably large) are implemented by allocating a fixed amount of space, and, if that space should fill, allocating a larger amount of space and copying over the elements.

To understand why the method is O(1) amortized, without loss of generality, assume we have inserted a = 2^n elements, and we now have to double our table to 2^(n+1) size. That means we’re currently doing 2^(n+1) operations. Last copy, we did 2^n operations. Before that we did 2^(n-1)… all the way down to 8,4,2,1. Now, if we add these up, we get 1 + 2 + 4 + 8 + … + 2^(n+1) = 2^(n+2) – 1 < 4*2^n = O(2^n) = O(a) total insertions (i.e. O(1) amortized time). Also, it should be noted that if the table allows deletions the table shrinking has to be done at a different factor (e.g 3x)


回答 7

Python中的列表类似于数组,您可以在其中存储多个值。列表是可变的,这意味着您可以更改它。您应该知道的更重要的事情是,当我们创建列表时,Python会自动为该列表变量创建reference_id。如果通过分配其他变量来更改它,则主列表将被更改。让我们尝试一个例子:

list_one = [1,2,3,4]

my_list = list_one

#my_list: [1,2,3,4]

my_list.append("new")

#my_list: [1,2,3,4,'new']
#list_one: [1,2,3,4,'new']

我们追加了,my_list但是我们的主要列表已更改。这意味着没有将列表指定为副本列表,将其指定为参考。

A list in Python is something like an array, where you can store multiple values. List is mutable that means you can change it. The more important thing you should know, when we create a list, Python automatically creates a reference_id for that list variable. If you change it by assigning others variable the main list will be change. Let’s try with a example:

list_one = [1,2,3,4]

my_list = list_one

#my_list: [1,2,3,4]

my_list.append("new")

#my_list: [1,2,3,4,'new']
#list_one: [1,2,3,4,'new']

We append my_list but our main list has changed. That mean’s list didn’t assign as a copy list assign as its reference.


回答 8

在CPython中,列表是作为动态数组实现的,因此,当我们在那时追加时,不仅添加了一个宏,而且还分配了更多空间,因此每次都不应添加新空间。

In CPython list is implemented as dynamic array, and therefore when we append at that time not only one macro is added but some more space is allocated so that everytime new space should not be added.


比较两个NumPy数组的相等性,按元素

问题:比较两个NumPy数组的相等性,按元素

比较两个NumPy数组是否相等的最简单方法是什么(其中相等定义为:对于所有索引i:,A = B iff A[i] == B[i])?

简单地使用==给我一个布尔数组:

 >>> numpy.array([1,1,1]) == numpy.array([1,1,1])

array([ True,  True,  True], dtype=bool)

是否and需要确定该数组的元素是否相等,或者是否有更简单的比较方法?

What is the simplest way to compare two NumPy arrays for equality (where equality is defined as: A = B iff for all indices i: A[i] == B[i])?

Simply using == gives me a boolean array:

 >>> numpy.array([1,1,1]) == numpy.array([1,1,1])

array([ True,  True,  True], dtype=bool)

Do I have to and the elements of this array to determine if the arrays are equal, or is there a simpler way to compare?


回答 0

(A==B).all()

测试数组(A == B)的所有值是否均为True。

注意:也许您还想测试A和B形状,例如 A.shape == B.shape

特殊情况和替代方法(来自dbaupp的回答和yoavram的评论)

应当指出的是:

  • 在特定情况下,此解决方案可能会产生奇怪的行为:如果AB为空且另一个包含单个元素,则返回True。由于某种原因,比较会A==B返回一个空数组,all操作员将为此返回一个空数组True
  • 另一个风险是,如果AB形状不相同且不可广播,则此方法将引发错误。

总之,如果你有一个关于怀疑AB形状或只是想安全:的专业功能用途之一:

np.array_equal(A,B)  # test if same shape, same elements values
np.array_equiv(A,B)  # test if broadcastable shape, same elements values
np.allclose(A,B,...) # test if same shape, elements have close enough values
(A==B).all()

test if all values of array (A==B) are True.

Note: maybe you also want to test A and B shape, such as A.shape == B.shape

Special cases and alternatives (from dbaupp’s answer and yoavram’s comment)

It should be noted that:

  • this solution can have a strange behavior in a particular case: if either A or B is empty and the other one contains a single element, then it return True. For some reason, the comparison A==B returns an empty array, for which the all operator returns True.
  • Another risk is if A and B don’t have the same shape and aren’t broadcastable, then this approach will raise an error.

In conclusion, if you have a doubt about A and B shape or simply want to be safe: use one of the specialized functions:

np.array_equal(A,B)  # test if same shape, same elements values
np.array_equiv(A,B)  # test if broadcastable shape, same elements values
np.allclose(A,B,...) # test if same shape, elements have close enough values

回答 1

(A==B).all()解决方案是很整齐,但也有完成这个任务的一些内置的功能。也就是说array_equalallclosearray_equiv

(尽管,使用进行一些快速测试timeit似乎表明该(A==B).all()方法是最快的,由于必须分配一个全新的数组,因此该方法有点特殊。)

The (A==B).all() solution is very neat, but there are some built-in functions for this task. Namely array_equal, allclose and array_equiv.

(Although, some quick testing with timeit seems to indicate that the (A==B).all() method is the fastest, which is a little peculiar, given it has to allocate a whole new array.)


回答 2

让我们通过使用以下代码来评估性能。

import numpy as np
import time

exec_time0 = []
exec_time1 = []
exec_time2 = []

sizeOfArray = 5000
numOfIterations = 200

for i in xrange(numOfIterations):

    A = np.random.randint(0,255,(sizeOfArray,sizeOfArray))
    B = np.random.randint(0,255,(sizeOfArray,sizeOfArray))

    a = time.clock() 
    res = (A==B).all()
    b = time.clock()
    exec_time0.append( b - a )

    a = time.clock() 
    res = np.array_equal(A,B)
    b = time.clock()
    exec_time1.append( b - a )

    a = time.clock() 
    res = np.array_equiv(A,B)
    b = time.clock()
    exec_time2.append( b - a )

print 'Method: (A==B).all(),       ', np.mean(exec_time0)
print 'Method: np.array_equal(A,B),', np.mean(exec_time1)
print 'Method: np.array_equiv(A,B),', np.mean(exec_time2)

输出量

Method: (A==B).all(),        0.03031857
Method: np.array_equal(A,B), 0.030025185
Method: np.array_equiv(A,B), 0.030141515

根据上面的结果,numpy方法似乎比==运算符和all()方法的组合要快,并且通过比较numpy方法,最快的方法似乎是numpy.array_equal方法。

Let’s measure the performance by using the following piece of code.

import numpy as np
import time

exec_time0 = []
exec_time1 = []
exec_time2 = []

sizeOfArray = 5000
numOfIterations = 200

for i in xrange(numOfIterations):

    A = np.random.randint(0,255,(sizeOfArray,sizeOfArray))
    B = np.random.randint(0,255,(sizeOfArray,sizeOfArray))

    a = time.clock() 
    res = (A==B).all()
    b = time.clock()
    exec_time0.append( b - a )

    a = time.clock() 
    res = np.array_equal(A,B)
    b = time.clock()
    exec_time1.append( b - a )

    a = time.clock() 
    res = np.array_equiv(A,B)
    b = time.clock()
    exec_time2.append( b - a )

print 'Method: (A==B).all(),       ', np.mean(exec_time0)
print 'Method: np.array_equal(A,B),', np.mean(exec_time1)
print 'Method: np.array_equiv(A,B),', np.mean(exec_time2)

Output

Method: (A==B).all(),        0.03031857
Method: np.array_equal(A,B), 0.030025185
Method: np.array_equiv(A,B), 0.030141515

According to the results above, the numpy methods seem to be faster than the combination of the == operator and the all() method and by comparing the numpy methods the fastest one seems to be the numpy.array_equal method.


回答 3

如果要检查两个数组是否相同shape并且elements应该使用,np.array_equal因为这是文档中建议的方法。

在性能方面,不要期望任何相等性检查会胜过另一个,因为没有太多的优化空间comparing two elements。为了方便起见,我仍然进行了一些测试。

import numpy as np
import timeit

A = np.zeros((300, 300, 3))
B = np.zeros((300, 300, 3))
C = np.ones((300, 300, 3))

timeit.timeit(stmt='(A==B).all()', setup='from __main__ import A, B', number=10**5)
timeit.timeit(stmt='np.array_equal(A, B)', setup='from __main__ import A, B, np', number=10**5)
timeit.timeit(stmt='np.array_equiv(A, B)', setup='from __main__ import A, B, np', number=10**5)
> 51.5094
> 52.555
> 52.761

几乎相等,无需谈论速度。

(A==B).all()的行为几乎如下面的代码片段:

x = [1,2,3]
y = [1,2,3]
print all([x[i]==y[i] for i in range(len(x))])
> True

If you want to check if two arrays have the same shape AND elements you should use np.array_equal as it is the method recommended in the documentation.

Performance-wise don’t expect that any equality check will beat another, as there is not much room to optimize comparing two elements. Just for the sake, i still did some tests.

import numpy as np
import timeit

A = np.zeros((300, 300, 3))
B = np.zeros((300, 300, 3))
C = np.ones((300, 300, 3))

timeit.timeit(stmt='(A==B).all()', setup='from __main__ import A, B', number=10**5)
timeit.timeit(stmt='np.array_equal(A, B)', setup='from __main__ import A, B, np', number=10**5)
timeit.timeit(stmt='np.array_equiv(A, B)', setup='from __main__ import A, B, np', number=10**5)
> 51.5094
> 52.555
> 52.761

So pretty much equal, no need to talk about the speed.

The (A==B).all() behaves pretty much as the following code snippet:

x = [1,2,3]
y = [1,2,3]
print all([x[i]==y[i] for i in range(len(x))])
> True

回答 4

通常两个数组会有一些小的数字误差,

您可以使用numpy.allclose(A,B)代替(A==B).all()。这将返回布尔值True / False

Usually two arrays will have some small numeric errors,

You can use numpy.allclose(A,B), instead of (A==B).all(). This returns a bool True/False


回答 5

现在使用np.array_equal。从文档:

np.array_equal([1, 2], [1, 2])
True
np.array_equal(np.array([1, 2]), np.array([1, 2]))
True
np.array_equal([1, 2], [1, 2, 3])
False
np.array_equal([1, 2], [1, 4])
False

Now use np.array_equal. From documentation:

np.array_equal([1, 2], [1, 2])
True
np.array_equal(np.array([1, 2]), np.array([1, 2]))
True
np.array_equal([1, 2], [1, 2, 3])
False
np.array_equal([1, 2], [1, 4])
False

numpy:数组中唯一值的最有效频率计数

问题:numpy:数组中唯一值的最有效频率计数

numpy/中scipy,是否有一种有效的方法来获取数组中唯一值的频率计数?

遵循以下原则:

x = array( [1,1,1,2,2,2,5,25,1,1] )
y = freq_count( x )
print y

>> [[1, 5], [2,3], [5,1], [25,1]]

(对于您来说,R用户在那里,我基本上是在寻找该table()功能)

In numpy / scipy, is there an efficient way to get frequency counts for unique values in an array?

Something along these lines:

x = array( [1,1,1,2,2,2,5,25,1,1] )
y = freq_count( x )
print y

>> [[1, 5], [2,3], [5,1], [25,1]]

( For you, R users out there, I’m basically looking for the table() function )


回答 0

看一下np.bincount

http://docs.scipy.org/doc/numpy/reference/generation/numpy.bincount.html

import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)
ii = np.nonzero(y)[0]

然后:

zip(ii,y[ii]) 
# [(1, 5), (2, 3), (5, 1), (25, 1)]

要么:

np.vstack((ii,y[ii])).T
# array([[ 1,  5],
         [ 2,  3],
         [ 5,  1],
         [25,  1]])

或者您想将计数和唯一值结合起来。

Take a look at np.bincount:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.bincount.html

import numpy as np
x = np.array([1,1,1,2,2,2,5,25,1,1])
y = np.bincount(x)
ii = np.nonzero(y)[0]

And then:

zip(ii,y[ii]) 
# [(1, 5), (2, 3), (5, 1), (25, 1)]

or:

np.vstack((ii,y[ii])).T
# array([[ 1,  5],
         [ 2,  3],
         [ 5,  1],
         [25,  1]])

or however you want to combine the counts and the unique values.


回答 1

从Numpy 1.9开始,最简单,最快的方法是简单地使用numpy.unique,现在有了return_counts关键字参数:

import numpy as np

x = np.array([1,1,1,2,2,2,5,25,1,1])
unique, counts = np.unique(x, return_counts=True)

print np.asarray((unique, counts)).T

这使:

 [[ 1  5]
  [ 2  3]
  [ 5  1]
  [25  1]]

scipy.stats.itemfreq以下内容进行快速比较:

In [4]: x = np.random.random_integers(0,100,1e6)

In [5]: %timeit unique, counts = np.unique(x, return_counts=True)
10 loops, best of 3: 31.5 ms per loop

In [6]: %timeit scipy.stats.itemfreq(x)
10 loops, best of 3: 170 ms per loop

As of Numpy 1.9, the easiest and fastest method is to simply use numpy.unique, which now has a return_counts keyword argument:

import numpy as np

x = np.array([1,1,1,2,2,2,5,25,1,1])
unique, counts = np.unique(x, return_counts=True)

print np.asarray((unique, counts)).T

Which gives:

 [[ 1  5]
  [ 2  3]
  [ 5  1]
  [25  1]]

A quick comparison with scipy.stats.itemfreq:

In [4]: x = np.random.random_integers(0,100,1e6)

In [5]: %timeit unique, counts = np.unique(x, return_counts=True)
10 loops, best of 3: 31.5 ms per loop

In [6]: %timeit scipy.stats.itemfreq(x)
10 loops, best of 3: 170 ms per loop

回答 2

更新:不建议使用原始答案中提到的方法,而应使用新方法:

>>> import numpy as np
>>> x = [1,1,1,2,2,2,5,25,1,1]
>>> np.array(np.unique(x, return_counts=True)).T
    array([[ 1,  5],
           [ 2,  3],
           [ 5,  1],
           [25,  1]])

原始答案:

您可以使用scipy.stats.itemfreq

>>> from scipy.stats import itemfreq
>>> x = [1,1,1,2,2,2,5,25,1,1]
>>> itemfreq(x)
/usr/local/bin/python:1: DeprecationWarning: `itemfreq` is deprecated! `itemfreq` is deprecated and will be removed in a future version. Use instead `np.unique(..., return_counts=True)`
array([[  1.,   5.],
       [  2.,   3.],
       [  5.,   1.],
       [ 25.,   1.]])

Update: The method mentioned in the original answer is deprecated, we should use the new way instead:

>>> import numpy as np
>>> x = [1,1,1,2,2,2,5,25,1,1]
>>> np.array(np.unique(x, return_counts=True)).T
    array([[ 1,  5],
           [ 2,  3],
           [ 5,  1],
           [25,  1]])

Original answer:

you can use scipy.stats.itemfreq

>>> from scipy.stats import itemfreq
>>> x = [1,1,1,2,2,2,5,25,1,1]
>>> itemfreq(x)
/usr/local/bin/python:1: DeprecationWarning: `itemfreq` is deprecated! `itemfreq` is deprecated and will be removed in a future version. Use instead `np.unique(..., return_counts=True)`
array([[  1.,   5.],
       [  2.,   3.],
       [  5.,   1.],
       [ 25.,   1.]])

回答 3

我对此也很感兴趣,因此我做了一些性能比较(使用perfplot,这是我的一个宠物项目)。结果:

y = np.bincount(a)
ii = np.nonzero(y)[0]
out = np.vstack((ii, y[ii])).T

是迄今为止最快的。(请注意对数缩放。)

在此处输入图片说明


生成绘图的代码:

import numpy as np
import pandas as pd
import perfplot
from scipy.stats import itemfreq


def bincount(a):
    y = np.bincount(a)
    ii = np.nonzero(y)[0]
    return np.vstack((ii, y[ii])).T


def unique(a):
    unique, counts = np.unique(a, return_counts=True)
    return np.asarray((unique, counts)).T


def unique_count(a):
    unique, inverse = np.unique(a, return_inverse=True)
    count = np.zeros(len(unique), np.int)
    np.add.at(count, inverse, 1)
    return np.vstack((unique, count)).T


def pandas_value_counts(a):
    out = pd.value_counts(pd.Series(a))
    out.sort_index(inplace=True)
    out = np.stack([out.keys().values, out.values]).T
    return out


perfplot.show(
    setup=lambda n: np.random.randint(0, 1000, n),
    kernels=[bincount, unique, itemfreq, unique_count, pandas_value_counts],
    n_range=[2 ** k for k in range(26)],
    logx=True,
    logy=True,
    xlabel="len(a)",
)

I was also interested in this, so I did a little performance comparison (using perfplot, a pet project of mine). Result:

y = np.bincount(a)
ii = np.nonzero(y)[0]
out = np.vstack((ii, y[ii])).T

is by far the fastest. (Note the log-scaling.)

enter image description here


Code to generate the plot:

import numpy as np
import pandas as pd
import perfplot
from scipy.stats import itemfreq


def bincount(a):
    y = np.bincount(a)
    ii = np.nonzero(y)[0]
    return np.vstack((ii, y[ii])).T


def unique(a):
    unique, counts = np.unique(a, return_counts=True)
    return np.asarray((unique, counts)).T


def unique_count(a):
    unique, inverse = np.unique(a, return_inverse=True)
    count = np.zeros(len(unique), np.int)
    np.add.at(count, inverse, 1)
    return np.vstack((unique, count)).T


def pandas_value_counts(a):
    out = pd.value_counts(pd.Series(a))
    out.sort_index(inplace=True)
    out = np.stack([out.keys().values, out.values]).T
    return out


perfplot.show(
    setup=lambda n: np.random.randint(0, 1000, n),
    kernels=[bincount, unique, itemfreq, unique_count, pandas_value_counts],
    n_range=[2 ** k for k in range(26)],
    logx=True,
    logy=True,
    xlabel="len(a)",
)

回答 4

使用熊猫模块:

>>> import pandas as pd
>>> import numpy as np
>>> x = np.array([1,1,1,2,2,2,5,25,1,1])
>>> pd.value_counts(x)
1     5
2     3
25    1
5     1
dtype: int64

Using pandas module:

>>> import pandas as pd
>>> import numpy as np
>>> x = np.array([1,1,1,2,2,2,5,25,1,1])
>>> pd.value_counts(x)
1     5
2     3
25    1
5     1
dtype: int64

回答 5

这是迄今为止最通用,最有效的解决方案。惊讶的是它还没有发布。

import numpy as np

def unique_count(a):
    unique, inverse = np.unique(a, return_inverse=True)
    count = np.zeros(len(unique), np.int)
    np.add.at(count, inverse, 1)
    return np.vstack(( unique, count)).T

print unique_count(np.random.randint(-10,10,100))

与当前接受的答案不同,它适用于可排序的任何数据类型(不仅是正整数),而且具有最佳性能。唯一的重大支出是由np.unique完成的排序。

This is by far the most general and performant solution; surprised it hasn’t been posted yet.

import numpy as np

def unique_count(a):
    unique, inverse = np.unique(a, return_inverse=True)
    count = np.zeros(len(unique), np.int)
    np.add.at(count, inverse, 1)
    return np.vstack(( unique, count)).T

print unique_count(np.random.randint(-10,10,100))

Unlike the currently accepted answer, it works on any datatype that is sortable (not just positive ints), and it has optimal performance; the only significant expense is in the sorting done by np.unique.


回答 6

numpy.bincount是最好的选择。如果您的数组除了小的密集整数之外还包含其他任何内容,则将其包装起来可能会很有用:

def count_unique(keys):
    uniq_keys = np.unique(keys)
    bins = uniq_keys.searchsorted(keys)
    return uniq_keys, np.bincount(bins)

例如:

>>> x = array([1,1,1,2,2,2,5,25,1,1])
>>> count_unique(x)
(array([ 1,  2,  5, 25]), array([5, 3, 1, 1]))

numpy.bincount is the probably the best choice. If your array contains anything besides small dense integers it might be useful to wrap it something like this:

def count_unique(keys):
    uniq_keys = np.unique(keys)
    bins = uniq_keys.searchsorted(keys)
    return uniq_keys, np.bincount(bins)

For example:

>>> x = array([1,1,1,2,2,2,5,25,1,1])
>>> count_unique(x)
(array([ 1,  2,  5, 25]), array([5, 3, 1, 1]))

回答 7

即使已经回答过,我还是建议使用一种不同的方法numpy.histogram。给定一个序列的此类函数,它返回归类为bin的元素的频率。

请注意:由于数字是整数,因此在此示例中有效。如果它们是实数,则此解决方案将不太适用。

>>> from numpy import histogram
>>> y = histogram (x, bins=x.max()-1)
>>> y
(array([5, 3, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1]),
 array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,  21.,  22.,
        23.,  24.,  25.]))

Even though it has already been answered, I suggest a different approach that makes use of numpy.histogram. Such function given a sequence it returns the frequency of its elements grouped in bins.

Beware though: it works in this example because numbers are integers. If they where real numbers, then this solution would not apply as nicely.

>>> from numpy import histogram
>>> y = histogram (x, bins=x.max()-1)
>>> y
(array([5, 3, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       1]),
 array([  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.,  11.,
        12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.,  20.,  21.,  22.,
        23.,  24.,  25.]))

回答 8

import pandas as pd
import numpy as np
x = np.array( [1,1,1,2,2,2,5,25,1,1] )
print(dict(pd.Series(x).value_counts()))

这给您:{1:5,2:3,5:1,25:1}

import pandas as pd
import numpy as np
x = np.array( [1,1,1,2,2,2,5,25,1,1] )
print(dict(pd.Series(x).value_counts()))

This gives you: {1: 5, 2: 3, 5: 1, 25: 1}


回答 9

为了计算唯一的非整数 -与Eelco Hoogendoorn的答案类似,但是速度更快(在我的机器上为5),我曾经weave.inline结合numpy.unique了一些c代码;

import numpy as np
from scipy import weave

def count_unique(datain):
  """
  Similar to numpy.unique function for returning unique members of
  data, but also returns their counts
  """
  data = np.sort(datain)
  uniq = np.unique(data)
  nums = np.zeros(uniq.shape, dtype='int')

  code="""
  int i,count,j;
  j=0;
  count=0;
  for(i=1; i<Ndata[0]; i++){
      count++;
      if(data(i) > data(i-1)){
          nums(j) = count;
          count = 0;
          j++;
      }
  }
  // Handle last value
  nums(j) = count+1;
  """
  weave.inline(code,
      ['data', 'nums'],
      extra_compile_args=['-O2'],
      type_converters=weave.converters.blitz)
  return uniq, nums

个人资料信息

> %timeit count_unique(data)
> 10000 loops, best of 3: 55.1 µs per loop

Eelco的纯numpy版本:

> %timeit unique_count(data)
> 1000 loops, best of 3: 284 µs per loop

注意

这里有冗余(unique也可以执行排序),这意味着可以通过将unique功能放入c代码循环中来进一步优化代码。

To count unique non-integers – similar to Eelco Hoogendoorn’s answer but considerably faster (factor of 5 on my machine), I used weave.inline to combine numpy.unique with a bit of c-code;

import numpy as np
from scipy import weave

def count_unique(datain):
  """
  Similar to numpy.unique function for returning unique members of
  data, but also returns their counts
  """
  data = np.sort(datain)
  uniq = np.unique(data)
  nums = np.zeros(uniq.shape, dtype='int')

  code="""
  int i,count,j;
  j=0;
  count=0;
  for(i=1; i<Ndata[0]; i++){
      count++;
      if(data(i) > data(i-1)){
          nums(j) = count;
          count = 0;
          j++;
      }
  }
  // Handle last value
  nums(j) = count+1;
  """
  weave.inline(code,
      ['data', 'nums'],
      extra_compile_args=['-O2'],
      type_converters=weave.converters.blitz)
  return uniq, nums

Profile info

> %timeit count_unique(data)
> 10000 loops, best of 3: 55.1 µs per loop

Eelco’s pure numpy version:

> %timeit unique_count(data)
> 1000 loops, best of 3: 284 µs per loop

Note

There’s redundancy here (unique performs a sort also), meaning that the code could probably be further optimized by putting the unique functionality inside the c-code loop.


回答 10

有一个老问题,但是我想提供自己的解决方案,该解决方案是最快的,根据我的基准测试,使用常规list而不是np.array输入(或首先转移到列表)。

如果也遇到了,请检查一下

def count(a):
    results = {}
    for x in a:
        if x not in results:
            results[x] = 1
        else:
            results[x] += 1
    return results

例如,

>>>timeit count([1,1,1,2,2,2,5,25,1,1]) would return:

100000个循环,每个循环最好为3:2.26 µs

>>>timeit count(np.array([1,1,1,2,2,2,5,25,1,1]))

100000个循环,最佳3:每个循环8.8 µs

>>>timeit count(np.array([1,1,1,2,2,2,5,25,1,1]).tolist())

100000次循环,每循环3:5.85 µs最佳

虽然可接受的答案会更慢,但scipy.stats.itemfreq解决方案甚至更糟。


更深入的测试并没有证实制定的期望。

from zmq import Stopwatch
aZmqSTOPWATCH = Stopwatch()

aDataSETasARRAY = ( 100 * abs( np.random.randn( 150000 ) ) ).astype( np.int )
aDataSETasLIST  = aDataSETasARRAY.tolist()

import numba
@numba.jit
def numba_bincount( anObject ):
    np.bincount(    anObject )
    return

aZmqSTOPWATCH.start();np.bincount(    aDataSETasARRAY );aZmqSTOPWATCH.stop()
14328L

aZmqSTOPWATCH.start();numba_bincount( aDataSETasARRAY );aZmqSTOPWATCH.stop()
592L

aZmqSTOPWATCH.start();count(          aDataSETasLIST  );aZmqSTOPWATCH.stop()
148609L

参考 以下是有关影响小型数据集的大规模重复测试结果的缓存和RAM中其他副作用的评论。

Old question, but I’d like to provide my own solution which turn out to be the fastest, use normal list instead of np.array as input (or transfer to list firstly), based on my bench test.

Check it out if you encounter it as well.

def count(a):
    results = {}
    for x in a:
        if x not in results:
            results[x] = 1
        else:
            results[x] += 1
    return results

For example,

>>>timeit count([1,1,1,2,2,2,5,25,1,1]) would return:

100000 loops, best of 3: 2.26 µs per loop

>>>timeit count(np.array([1,1,1,2,2,2,5,25,1,1]))

100000 loops, best of 3: 8.8 µs per loop

>>>timeit count(np.array([1,1,1,2,2,2,5,25,1,1]).tolist())

100000 loops, best of 3: 5.85 µs per loop

While the accepted answer would be slower, and the scipy.stats.itemfreq solution is even worse.


A more indepth testing did not confirm the formulated expectation.

from zmq import Stopwatch
aZmqSTOPWATCH = Stopwatch()

aDataSETasARRAY = ( 100 * abs( np.random.randn( 150000 ) ) ).astype( np.int )
aDataSETasLIST  = aDataSETasARRAY.tolist()

import numba
@numba.jit
def numba_bincount( anObject ):
    np.bincount(    anObject )
    return

aZmqSTOPWATCH.start();np.bincount(    aDataSETasARRAY );aZmqSTOPWATCH.stop()
14328L

aZmqSTOPWATCH.start();numba_bincount( aDataSETasARRAY );aZmqSTOPWATCH.stop()
592L

aZmqSTOPWATCH.start();count(          aDataSETasLIST  );aZmqSTOPWATCH.stop()
148609L

Ref. comments below on cache and other in-RAM side-effects that influence a small dataset massively repetitive testing results.


回答 11

这样的事情应该做到:

#create 100 random numbers
arr = numpy.random.random_integers(0,50,100)

#create a dictionary of the unique values
d = dict([(i,0) for i in numpy.unique(arr)])
for number in arr:
    d[j]+=1   #increment when that value is found

另外,除非我缺少某些内容,否则上一篇有关 有效计数唯一元素的文章似乎与您的问题非常相似。

some thing like this should do it:

#create 100 random numbers
arr = numpy.random.random_integers(0,50,100)

#create a dictionary of the unique values
d = dict([(i,0) for i in numpy.unique(arr)])
for number in arr:
    d[j]+=1   #increment when that value is found

Also, this previous post on Efficiently counting unique elements seems pretty similar to your question, unless I’m missing something.


回答 12

多维频率计数,即计数数组。

>>> print(color_array    )
  array([[255, 128, 128],
   [255, 128, 128],
   [255, 128, 128],
   ...,
   [255, 128, 128],
   [255, 128, 128],
   [255, 128, 128]], dtype=uint8)


>>> np.unique(color_array,return_counts=True,axis=0)
  (array([[ 60, 151, 161],
    [ 60, 155, 162],
    [ 60, 159, 163],
    [ 61, 143, 162],
    [ 61, 147, 162],
    [ 61, 162, 163],
    [ 62, 166, 164],
    [ 63, 137, 162],
    [ 63, 169, 164],
   array([     1,      2,      2,      1,      4,      1,      1,      2,
         3,      1,      1,      1,      2,      5,      2,      2,
       898,      1,      1,  

multi-dimentional frequency count, i.e. counting arrays.

>>> print(color_array    )
  array([[255, 128, 128],
   [255, 128, 128],
   [255, 128, 128],
   ...,
   [255, 128, 128],
   [255, 128, 128],
   [255, 128, 128]], dtype=uint8)


>>> np.unique(color_array,return_counts=True,axis=0)
  (array([[ 60, 151, 161],
    [ 60, 155, 162],
    [ 60, 159, 163],
    [ 61, 143, 162],
    [ 61, 147, 162],
    [ 61, 162, 163],
    [ 62, 166, 164],
    [ 63, 137, 162],
    [ 63, 169, 164],
   array([     1,      2,      2,      1,      4,      1,      1,      2,
         3,      1,      1,      1,      2,      5,      2,      2,
       898,      1,      1,  

回答 13

import pandas as pd
import numpy as np

print(pd.Series(name_of_array).value_counts())
import pandas as pd
import numpy as np

print(pd.Series(name_of_array).value_counts())

回答 14

from collections import Counter
x = array( [1,1,1,2,2,2,5,25,1,1] )
mode = counter.most_common(1)[0][0]
from collections import Counter
x = array( [1,1,1,2,2,2,5,25,1,1] )
mode = counter.most_common(1)[0][0]

numpy中的ndarray和array有什么区别?

问题:numpy中的ndarray和array有什么区别?

ndarrayarrayNumpy有什么区别?我在哪里可以找到numpy源代码中的实现?

What is the difference between ndarray and array in Numpy? And where can I find the implementations in the numpy source code?


回答 0

numpy.array只是创建一个便利函数ndarray; 它本身不是类。

您也可以使用创建数组numpy.ndarray,但不建议这样做。来自的文档字符串numpy.ndarray

阵列应该使用来构造arrayzerosempty…这里给出的参数是指低级方法(ndarray(...)用于实例化阵列)。

实现的大部分内容都在C代码中(在multiarray中),但是您可以在这里开始查看ndarray接口:

https://github.com/numpy/numpy/blob/master/numpy/core/numeric.py

numpy.array is just a convenience function to create an ndarray; it is not a class itself.

You can also create an array using numpy.ndarray, but it is not the recommended way. From the docstring of numpy.ndarray:

Arrays should be constructed using array, zeros or empty … The parameters given here refer to a low-level method (ndarray(...)) for instantiating an array.

Most of the meat of the implementation is in C code, here in multiarray, but you can start looking at the ndarray interfaces here:

https://github.com/numpy/numpy/blob/master/numpy/core/numeric.py


回答 1

numpy.array是一个返回的函数numpy.ndarray。没有对象类型numpy.array。

numpy.array is a function that returns a numpy.ndarray. There is no object type numpy.array.


回答 2

只需几行示例代码即可显示numpy.array和numpy.ndarray之间的区别

热身步骤:构建列表

a = [1,2,3]

检查类型

print(type(a))

你会得到

<class 'list'>

使用np.array构造一个数组(从列表中)

a = np.array(a)

或者,您可以跳过热身步骤,直接进行

a = np.array([1,2,3])

检查类型

print(type(a))

你会得到

<class 'numpy.ndarray'>

告诉你numpy数组的类型是numpy.ndarray

您还可以通过以下方式检查类型

isinstance(a, (np.ndarray))

你会得到

True

以下两行均会给您一条错误消息

np.ndarray(a)                # should be np.array(a)
isinstance(a, (np.array))    # should be isinstance(a, (np.ndarray))

Just a few lines of example code to show the difference between numpy.array and numpy.ndarray

Warm up step: Construct a list

a = [1,2,3]

Check the type

print(type(a))

You will get

<class 'list'>

Construct an array (from a list) using np.array

a = np.array(a)

Or, you can skip the warm up step, directly have

a = np.array([1,2,3])

Check the type

print(type(a))

You will get

<class 'numpy.ndarray'>

which tells you the type of the numpy array is numpy.ndarray

You can also check the type by

isinstance(a, (np.ndarray))

and you will get

True

Either of the following two lines will give you an error message

np.ndarray(a)                # should be np.array(a)
isinstance(a, (np.array))    # should be isinstance(a, (np.ndarray))

回答 3

numpy.ndarray()是一个类,numpy.array()而是要创建的方法/函数ndarray

在numpy docs中,如果您想从ndarray类创建数组,则可以使用以下两种方式进行引用:

1-使用array()zeros()empty()方法: 阵列应该使用数组,零或空构造(参考也参见下文部分)。此处给出的参数指的是ndarray(…)用于实例化数组的低级方法()。

2- ndarray直接来自类: 有两种创建数组的方式__new__:使用:如果buffer为None,则仅使用shape,dtype和order。如果buffer是暴露buffer接口的对象,则将解释所有关键字。

下面的示例给出了一个随机数组,因为我们没有分配缓冲区值:

np.ndarray(shape=(2,2), dtype=float, order='F', buffer=None)

array([[ -1.13698227e+002,   4.25087011e-303],
       [  2.88528414e-306,   3.27025015e-309]])         #random

另一个示例是将数组对象分配给缓冲区示例:

>>> np.ndarray((2,), buffer=np.array([1,2,3]),
...            offset=np.int_().itemsize,
...            dtype=int) # offset = 1*itemsize, i.e. skip first element
array([2, 3])

从上面的示例中,我们注意到我们无法为“缓冲区”分配列表,我们不得不使用numpy.array()返回缓冲区的ndarray对象

结论:numpy.array()如果要制造numpy.ndarray()物体,则使用“

numpy.ndarray() is a class, while numpy.array() is a method / function to create ndarray.

In numpy docs if you want to create an array from ndarray class you can do it with 2 ways as quoted:

1- using array(), zeros() or empty() methods: Arrays should be constructed using array, zeros or empty (refer to the See Also section below). The parameters given here refer to a low-level method (ndarray(…)) for instantiating an array.

2- from ndarray class directly: There are two modes of creating an array using __new__: If buffer is None, then only shape, dtype, and order are used. If buffer is an object exposing the buffer interface, then all keywords are interpreted.

The example below gives a random array because we didn’t assign buffer value:

np.ndarray(shape=(2,2), dtype=float, order='F', buffer=None)

array([[ -1.13698227e+002,   4.25087011e-303],
       [  2.88528414e-306,   3.27025015e-309]])         #random

another example is to assign array object to the buffer example:

>>> np.ndarray((2,), buffer=np.array([1,2,3]),
...            offset=np.int_().itemsize,
...            dtype=int) # offset = 1*itemsize, i.e. skip first element
array([2, 3])

from above example we notice that we can’t assign a list to “buffer” and we had to use numpy.array() to return ndarray object for the buffer

Conclusion: use numpy.array() if you want to make a numpy.ndarray() object”


回答 4

我认为与np.array()您一起只能创建C,尽管您提到了该命令,但是当您使用np.isfortran()它检查时说是false。但是,np.ndarrray()当您指定订单时,它将根据提供的订单创建订单。

I think with np.array() you can only create C like though you mention the order, when you check using np.isfortran() it says false. but with np.ndarrray() when you specify the order it creates based on the order provided.


如何从多维数组中提取列?

问题:如何从多维数组中提取列?

有人知道如何在Python中从多维数组中提取列吗?

Does anybody know how to extract a column from a multi-dimensional array in Python?


回答 0

>>> import numpy as np
>>> A = np.array([[1,2,3,4],[5,6,7,8]])

>>> A
array([[1, 2, 3, 4],
    [5, 6, 7, 8]])

>>> A[:,2] # returns the third columm
array([3, 7])

另请参阅:“ numpy.arange”和“ reshape”以分配内存

示例:(使用矩阵整形(3×4)分配数组)

nrows = 3
ncols = 4
my_array = numpy.arange(nrows*ncols, dtype='double')
my_array = my_array.reshape(nrows, ncols)
>>> import numpy as np
>>> A = np.array([[1,2,3,4],[5,6,7,8]])

>>> A
array([[1, 2, 3, 4],
    [5, 6, 7, 8]])

>>> A[:,2] # returns the third columm
array([3, 7])

See also: “numpy.arange” and “reshape” to allocate memory

Example: (Allocating a array with shaping of matrix (3×4))

nrows = 3
ncols = 4
my_array = numpy.arange(nrows*ncols, dtype='double')
my_array = my_array.reshape(nrows, ncols)

回答 1

可能是您正在使用NumPy数组吗?Python具有数组模块,但是不支持多维数组。普通的Python列表也是一维的。

但是,如果您有一个简单的二维列表,例如:

A = [[1,2,3,4],
     [5,6,7,8]]

然后您可以提取一个像这样的列:

def column(matrix, i):
    return [row[i] for row in matrix]

提取第二列(索引1):

>>> column(A, 1)
[2, 6]

或者,简单地:

>>> [row[1] for row in A]
[2, 6]

Could it be that you’re using a NumPy array? Python has the array module, but that does not support multi-dimensional arrays. Normal Python lists are single-dimensional too.

However, if you have a simple two-dimensional list like this:

A = [[1,2,3,4],
     [5,6,7,8]]

then you can extract a column like this:

def column(matrix, i):
    return [row[i] for row in matrix]

Extracting the second column (index 1):

>>> column(A, 1)
[2, 6]

Or alternatively, simply:

>>> [row[1] for row in A]
[2, 6]

回答 2

如果你有一个像

a = [[1, 2], [2, 3], [3, 4]]

然后像这样提取第一列:

[row[0] for row in a]

所以结果看起来像这样:

[1, 2, 3]

If you have an array like

a = [[1, 2], [2, 3], [3, 4]]

Then you extract the first column like that:

[row[0] for row in a]

So the result looks like this:

[1, 2, 3]

回答 3

看看这个!

a = [[1, 2], [2, 3], [3, 4]]
a2 = zip(*a)
a2[0]

它与上面相同,只是它使zip工作更整洁,但需要单个数组作为参数,* a语法将多维数组解压缩为单个数组参数

check it out!

a = [[1, 2], [2, 3], [3, 4]]
a2 = zip(*a)
a2[0]

it is the same thing as above except somehow it is neater the zip does the work but requires single arrays as arguments, the *a syntax unpacks the multidimensional array into single array arguments


回答 4

def get_col(arr, col):
    return map(lambda x : x[col], arr)

a = [[1,2,3,4], [5,6,7,8], [9,10,11,12],[13,14,15,16]]

print get_col(a, 3)

Python中的map函数是另一种方法。

def get_col(arr, col):
    return map(lambda x : x[col], arr)

a = [[1,2,3,4], [5,6,7,8], [9,10,11,12],[13,14,15,16]]

print get_col(a, 3)

map function in Python is another way to go.


回答 5

>>> x = arange(20).reshape(4,5)
>>> x array([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]])

如果您想使用第二栏,可以使用

>>> x[:, 1]
array([ 1,  6, 11, 16])
>>> x = arange(20).reshape(4,5)
>>> x array([[ 0,  1,  2,  3,  4],
        [ 5,  6,  7,  8,  9],
        [10, 11, 12, 13, 14],
        [15, 16, 17, 18, 19]])

if you want the second column you can use

>>> x[:, 1]
array([ 1,  6, 11, 16])

回答 6

[matrix[i][column] for i in range(len(matrix))]
[matrix[i][column] for i in range(len(matrix))]

回答 7

如果您喜欢map-reduce样式的python,而不是列表推导,itemgetter运算符也可以提供帮助!

# tested in 2.4
from operator import itemgetter
def column(matrix,i):
    f = itemgetter(i)
    return map(f,matrix)

M = [range(x,x+5) for x in range(10)]
assert column(M,1) == range(1,11)

The itemgetter operator can help too, if you like map-reduce style python, rather than list comprehensions, for a little variety!

# tested in 2.4
from operator import itemgetter
def column(matrix,i):
    f = itemgetter(i)
    return map(f,matrix)

M = [range(x,x+5) for x in range(10)]
assert column(M,1) == range(1,11)

回答 8

您也可以使用此:

values = np.array([[1,2,3],[4,5,6]])
values[...,0] # first column
#[1,4]

注意:这不适用于内置数组且未对齐(例如np.array([[1,2,3],[4,5,6,7]]))

You can use this as well:

values = np.array([[1,2,3],[4,5,6]])
values[...,0] # first column
#[1,4]

Note: This is not working for built-in array and not aligned (e.g. np.array([[1,2,3],[4,5,6,7]]) )


回答 9

我想您想从数组(例如下面的数组)中提取列

import numpy as np
A = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])

现在,如果要获取格式的第三列

D=array[[3],
[7],
[11]]

然后,您需要首先使数组成为矩阵

B=np.asmatrix(A)
C=B[:,2]
D=asarray(C)

现在,您可以像在excel中一样进行元素明智的计算。

I think you want to extract a column from an array such as an array below

import numpy as np
A = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])

Now if you want to get the third column in the format

D=array[[3],
[7],
[11]]

Then you need to first make the array a matrix

B=np.asmatrix(A)
C=B[:,2]
D=asarray(C)

And now you can do element wise calculations much like you would do in excel.


回答 10

假设我们有n X m矩阵(n行和m列)说5行和4列

matrix = [[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16],[17,18,19,20]]

要提取python中的列,我们可以像这样使用列表理解

[ [row[i] for row in matrix] for in range(4) ]

您可以将矩阵中的列数替换为4。结果是

[ [1,5,9,13,17],[2,10,14,18],[3,7,11,15,19],[4,8,12,16,20] ]

let’s say we have n X m matrix(n rows and m columns) say 5 rows and 4 columns

matrix = [[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16],[17,18,19,20]]

To extract the columns in python, we can use list comprehension like this

[ [row[i] for row in matrix] for in range(4) ]

You can replace 4 by whatever number of columns your matrix has. The result is

[ [1,5,9,13,17],[2,10,14,18],[3,7,11,15,19],[4,8,12,16,20] ]


回答 11

array = [[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]]

col1 = [val[1] for val in array]
col2 = [val[2] for val in array]
col3 = [val[3] for val in array]
col4 = [val[4] for val in array]
print(col1)
print(col2)
print(col3)
print(col4)

Output:
[1, 5, 9, 13]
[2, 6, 10, 14]
[3, 7, 11, 15]
[4, 8, 12, 16]
array = [[1,2,3,4],[5,6,7,8],[9,10,11,12],[13,14,15,16]]

col1 = [val[1] for val in array]
col2 = [val[2] for val in array]
col3 = [val[3] for val in array]
col4 = [val[4] for val in array]
print(col1)
print(col2)
print(col3)
print(col4)

Output:
[1, 5, 9, 13]
[2, 6, 10, 14]
[3, 7, 11, 15]
[4, 8, 12, 16]

回答 12

使用矩阵的另一种方法

>>> from numpy import matrix
>>> a = [ [1,2,3],[4,5,6],[7,8,9] ]
>>> matrix(a).transpose()[1].getA()[0]
array([2, 5, 8])
>>> matrix(a).transpose()[0].getA()[0]
array([1, 4, 7])

One more way using matrices

>>> from numpy import matrix
>>> a = [ [1,2,3],[4,5,6],[7,8,9] ]
>>> matrix(a).transpose()[1].getA()[0]
array([2, 5, 8])
>>> matrix(a).transpose()[0].getA()[0]
array([1, 4, 7])

回答 13

如果您在Python中有一个二维数组(不是numpy),则可以像这样提取所有列,

data = [
['a', 1, 2], 
['b', 3, 4], 
['c', 5, 6]
]

columns = list(zip(*data))

print("column[0] = {}".format(columns[0]))
print("column[1] = {}".format(columns[1]))
print("column[2] = {}".format(columns[2]))

执行此代码会产生

>>> print("column[0] = {}".format(columns[0]))
column[0] = ('a', 'b', 'c')

>>> print("column[1] = {}".format(columns[1]))
column[1] = (1, 3, 5)

>>> print("column[2] = {}".format(columns[2]))
column[2] = (2, 4, 6)

当然,您可以按索引提取单个列(例如columns[0]

If you have a two-dimensional array in Python (not numpy), you can extract all the columns like so,

data = [
['a', 1, 2], 
['b', 3, 4], 
['c', 5, 6]
]

columns = list(zip(*data))

print("column[0] = {}".format(columns[0]))
print("column[1] = {}".format(columns[1]))
print("column[2] = {}".format(columns[2]))

Executing this code will yield,

>>> print("column[0] = {}".format(columns[0]))
column[0] = ('a', 'b', 'c')

>>> print("column[1] = {}".format(columns[1]))
column[1] = (1, 3, 5)

>>> print("column[2] = {}".format(columns[2]))
column[2] = (2, 4, 6)

Of course, you can extract a single column by index (e.g. columns[0])


回答 14

尽管使用zip(*iterable)了转置嵌套列表,但是如果嵌套列表的长度不同,也可以使用以下内容:

map(None, *[(1,2,3,), (4,5,), (6,)])

结果是:

[(1, 4, 6), (2, 5, None), (3, None, None)]

因此,第一列是:

map(None, *[(1,2,3,), (4,5,), (6,)])[0]
#>(1, 4, 6)

Despite using zip(*iterable) to transpose a nested list, you can also use the following if the nested lists vary in length:

map(None, *[(1,2,3,), (4,5,), (6,)])

results in:

[(1, 4, 6), (2, 5, None), (3, None, None)]

The first column is thus:

map(None, *[(1,2,3,), (4,5,), (6,)])[0]
#>(1, 4, 6)

回答 15

好吧,有点晚…

如果性能很重要并且您的数据呈矩形,则也可以将其存储为一维,并通过常规切片访问列,例如…

A = [[1,2,3,4],[5,6,7,8]]     #< assume this 4x2-matrix
B = reduce( operator.add, A ) #< get it one-dimensional

def column1d( matrix, dimX, colIdx ):
  return matrix[colIdx::dimX]

def row1d( matrix, dimX, rowIdx ):
  return matrix[rowIdx:rowIdx+dimX] 

>>> column1d( B, 4, 1 )
[2, 6]
>>> row1d( B, 4, 1 )
[2, 3, 4, 5]

整洁的是,这真的很快。但是,负索引在这里不起作用!因此,您无法按索引-1访问最后一列或最后一行。

如果需要负索引,可以稍微调整访问器功能,例如

def column1d( matrix, dimX, colIdx ):
  return matrix[colIdx % dimX::dimX]

def row1d( matrix, dimX, dimY, rowIdx ):
  rowIdx = (rowIdx % dimY) * dimX
  return matrix[rowIdx:rowIdx+dimX]

Well a ‘bit’ late …

In case performance matters and your data is shaped rectangular, you might also store it in one dimension and access the columns by regular slicing e.g. …

A = [[1,2,3,4],[5,6,7,8]]     #< assume this 4x2-matrix
B = reduce( operator.add, A ) #< get it one-dimensional

def column1d( matrix, dimX, colIdx ):
  return matrix[colIdx::dimX]

def row1d( matrix, dimX, rowIdx ):
  return matrix[rowIdx:rowIdx+dimX] 

>>> column1d( B, 4, 1 )
[2, 6]
>>> row1d( B, 4, 1 )
[2, 3, 4, 5]

The neat thing is this is really fast. However, negative indexes don’t work here! So you can’t access the last column or row by index -1.

If you need negative indexing you can tune the accessor-functions a bit, e.g.

def column1d( matrix, dimX, colIdx ):
  return matrix[colIdx % dimX::dimX]

def row1d( matrix, dimX, dimY, rowIdx ):
  rowIdx = (rowIdx % dimY) * dimX
  return matrix[rowIdx:rowIdx+dimX]

回答 16

如果要获取不止一列,请使用slice:

 a = np.array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])
    print(a[:, [1, 2]])
[[2 3]
[5 6]
[8 9]]

If you want to grab more than just one column just use slice:

 a = np.array([[1, 2, 3],[4, 5, 6],[7, 8, 9]])
    print(a[:, [1, 2]])
[[2 3]
[5 6]
[8 9]]

回答 17

我更喜欢下一条提示:将矩阵命名为matrix_ause column_number,例如:

import numpy as np
matrix_a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
column_number=2

# you can get the row from transposed matrix - it will be a column:
col=matrix_a.transpose()[column_number]

I prefer the next hint: having the matrix named matrix_a and use column_number, for example:

import numpy as np
matrix_a = np.array([[1,2,3,4],[5,6,7,8],[9,10,11,12]])
column_number=2

# you can get the row from transposed matrix - it will be a column:
col=matrix_a.transpose()[column_number]

回答 18

只需使用transpose(),然后您就可以像获得行一样轻松获得列

matrix=np.array(originalMatrix).transpose()
print matrix[NumberOfColum]

Just use transpose(), then you can get the colummns as easy as you get rows

matrix=np.array(originalMatrix).transpose()
print matrix[NumberOfColum]

回答 19

矩阵中的所有列到新列表中:

N = len(matrix) 
column_list = [ [matrix[row][column] for row in range(N)] for column in range(N) ]

All columns from a matrix into a new list:

N = len(matrix) 
column_list = [ [matrix[row][column] for row in range(N)] for column in range(N) ]

NumPy数组初始化(使用相同的值填充)

问题:NumPy数组初始化(使用相同的值填充)

我需要创建一个长度为NumPy的数组n,其中每个元素为v

还有什么比:

a = empty(n)
for i in range(n):
    a[i] = v

我知道zeros并且ones可以在v = 0,1下使用。我可以使用v * ones(n),但是vis 上将不起作用None,而且速度会慢很多。

I need to create a NumPy array of length n, each element of which is v.

Is there anything better than:

a = empty(n)
for i in range(n):
    a[i] = v

I know zeros and ones would work for v = 0, 1. I could use v * ones(n), but it won’t work when v is None, and also would be much slower.


回答 0

NumPy的1.8引入np.full(),这是比更直接的方法empty(),接着fill()用于创建填充有一定值的数组:

>>> np.full((3, 5), 7)
array([[ 7.,  7.,  7.,  7.,  7.],
       [ 7.,  7.,  7.,  7.,  7.],
       [ 7.,  7.,  7.,  7.,  7.]])

>>> np.full((3, 5), 7, dtype=int)
array([[7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7]])

可以说这是创建一个填充有某些值的数组方法,因为它明确描述了要实现的目标(并且从原理上讲,它可以执行非常具体的任务,因此非常高效)。

NumPy 1.8 introduced np.full(), which is a more direct method than empty() followed by fill() for creating an array filled with a certain value:

>>> np.full((3, 5), 7)
array([[ 7.,  7.,  7.,  7.,  7.],
       [ 7.,  7.,  7.,  7.,  7.],
       [ 7.,  7.,  7.,  7.,  7.]])

>>> np.full((3, 5), 7, dtype=int)
array([[7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7]])

This is arguably the way of creating an array filled with certain values, because it explicitly describes what is being achieved (and it can in principle be very efficient since it performs a very specific task).


回答 1

已为Numpy 1.7.0更新:(@ Rolf Bartstra的提示)。

a=np.empty(n); a.fill(5) 最快。

以降序排列:

%timeit a=np.empty(1e4); a.fill(5)
100000 loops, best of 3: 5.85 us per loop

%timeit a=np.empty(1e4); a[:]=5 
100000 loops, best of 3: 7.15 us per loop

%timeit a=np.ones(1e4)*5
10000 loops, best of 3: 22.9 us per loop

%timeit a=np.repeat(5,(1e4))
10000 loops, best of 3: 81.7 us per loop

%timeit a=np.tile(5,[1e4])
10000 loops, best of 3: 82.9 us per loop

Updated for Numpy 1.7.0:(Hat-tip to @Rolf Bartstra.)

a=np.empty(n); a.fill(5) is fastest.

In descending speed order:

%timeit a=np.empty(1e4); a.fill(5)
100000 loops, best of 3: 5.85 us per loop

%timeit a=np.empty(1e4); a[:]=5 
100000 loops, best of 3: 7.15 us per loop

%timeit a=np.ones(1e4)*5
10000 loops, best of 3: 22.9 us per loop

%timeit a=np.repeat(5,(1e4))
10000 loops, best of 3: 81.7 us per loop

%timeit a=np.tile(5,[1e4])
10000 loops, best of 3: 82.9 us per loop

回答 2

我相信这fill是最快的方法。

a = np.empty(10)
a.fill(7)

您还应该始终避免像在示例中那样进行迭代。一个简单的a[:] = v函数将使用numpy 广播来完成您的迭代操作。

I believe fill is the fastest way to do this.

a = np.empty(10)
a.fill(7)

You should also always avoid iterating like you are doing in your example. A simple a[:] = v will accomplish what your iteration does using numpy broadcasting.


回答 3

显然,不仅绝对速度而且速度顺序(如user1579844所报告)均取决于机器。这是我发现的:

a=np.empty(1e4); a.fill(5) 最快

以降序排列:

timeit a=np.empty(1e4); a.fill(5) 
# 100000 loops, best of 3: 10.2 us per loop
timeit a=np.empty(1e4); a[:]=5
# 100000 loops, best of 3: 16.9 us per loop
timeit a=np.ones(1e4)*5
# 100000 loops, best of 3: 32.2 us per loop
timeit a=np.tile(5,[1e4])
# 10000 loops, best of 3: 90.9 us per loop
timeit a=np.repeat(5,(1e4))
# 10000 loops, best of 3: 98.3 us per loop
timeit a=np.array([5]*int(1e4))
# 1000 loops, best of 3: 1.69 ms per loop (slowest BY FAR!)

因此,请尝试找出并使用平台上最快的功能。

Apparently, not only the absolute speeds but also the speed order (as reported by user1579844) are machine dependent; here’s what I found:

a=np.empty(1e4); a.fill(5) is fastest;

In descending speed order:

timeit a=np.empty(1e4); a.fill(5) 
# 100000 loops, best of 3: 10.2 us per loop
timeit a=np.empty(1e4); a[:]=5
# 100000 loops, best of 3: 16.9 us per loop
timeit a=np.ones(1e4)*5
# 100000 loops, best of 3: 32.2 us per loop
timeit a=np.tile(5,[1e4])
# 10000 loops, best of 3: 90.9 us per loop
timeit a=np.repeat(5,(1e4))
# 10000 loops, best of 3: 98.3 us per loop
timeit a=np.array([5]*int(1e4))
# 1000 loops, best of 3: 1.69 ms per loop (slowest BY FAR!)

So, try and find out, and use what’s fastest on your platform.


回答 4

我有

numpy.array(n * [value])

请记住,但是显然,这比所有其他建议都足够慢n

这是与perfplot(我的一个宠物项目)的完整比较。

在此处输入图片说明

这两种empty选择仍然是最快的(使用NumPy 1.12.1)。full赶上大型阵列。


生成绘图的代码:

import numpy as np
import perfplot


def empty_fill(n):
    a = np.empty(n)
    a.fill(3.14)
    return a


def empty_colon(n):
    a = np.empty(n)
    a[:] = 3.14
    return a


def ones_times(n):
    return 3.14 * np.ones(n)


def repeat(n):
    return np.repeat(3.14, (n))


def tile(n):
    return np.repeat(3.14, [n])


def full(n):
    return np.full((n), 3.14)


def list_to_array(n):
    return np.array(n * [3.14])


perfplot.show(
    setup=lambda n: n,
    kernels=[empty_fill, empty_colon, ones_times, repeat, tile, full, list_to_array],
    n_range=[2 ** k for k in range(27)],
    xlabel="len(a)",
    logx=True,
    logy=True,
)

I had

numpy.array(n * [value])

in mind, but apparently that is slower than all other suggestions for large enough n.

Here is full comparison with perfplot (a pet project of mine).

enter image description here

The two empty alternatives are still the fastest (with NumPy 1.12.1). full catches up for large arrays.


Code to generate the plot:

import numpy as np
import perfplot


def empty_fill(n):
    a = np.empty(n)
    a.fill(3.14)
    return a


def empty_colon(n):
    a = np.empty(n)
    a[:] = 3.14
    return a


def ones_times(n):
    return 3.14 * np.ones(n)


def repeat(n):
    return np.repeat(3.14, (n))


def tile(n):
    return np.repeat(3.14, [n])


def full(n):
    return np.full((n), 3.14)


def list_to_array(n):
    return np.array(n * [3.14])


perfplot.show(
    setup=lambda n: n,
    kernels=[empty_fill, empty_colon, ones_times, repeat, tile, full, list_to_array],
    n_range=[2 ** k for k in range(27)],
    xlabel="len(a)",
    logx=True,
    logy=True,
)

回答 5

您可以使用numpy.tile,例如:

v = 7
rows = 3
cols = 5
a = numpy.tile(v, (rows,cols))
a
Out[1]: 
array([[7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7]])

尽管tile是为了“平铺”一个数组(而不是这种情况下的标量),但它可以完成工作,创建任何大小和尺寸的预填充数组。

You can use numpy.tile, e.g. :

v = 7
rows = 3
cols = 5
a = numpy.tile(v, (rows,cols))
a
Out[1]: 
array([[7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7],
       [7, 7, 7, 7, 7]])

Although tile is meant to ’tile’ an array (instead of a scalar, as in this case), it will do the job, creating pre-filled arrays of any size and dimension.


回答 6

没有numpy的

>>>[2]*3
[2, 2, 2]

without numpy

>>>[2]*3
[2, 2, 2]

如何将numpy数组转换为(并显示)图像?

问题:如何将numpy数组转换为(并显示)图像?

我因此创建了一个数组:

import numpy as np
data = np.zeros( (512,512,3), dtype=np.uint8)
data[256,256] = [255,0,0]

我要执行的操作是在512×512图像的中心显示一个红点。(至少从…开始,我想我可以从那里找出其余的内容)

I have created an array thusly:

import numpy as np
data = np.zeros( (512,512,3), dtype=np.uint8)
data[256,256] = [255,0,0]

What I want this to do is display a single red dot in the center of a 512×512 image. (At least to begin with… I think I can figure out the rest from there)


回答 0

您可以使用PIL创建(并显示)图像:

from PIL import Image
import numpy as np

w, h = 512, 512
data = np.zeros((h, w, 3), dtype=np.uint8)
data[0:256, 0:256] = [255, 0, 0] # red patch in upper left
img = Image.fromarray(data, 'RGB')
img.save('my.png')
img.show()

You could use PIL to create (and display) an image:

from PIL import Image
import numpy as np

w, h = 512, 512
data = np.zeros((h, w, 3), dtype=np.uint8)
data[0:256, 0:256] = [255, 0, 0] # red patch in upper left
img = Image.fromarray(data, 'RGB')
img.save('my.png')
img.show()

回答 1

以下应该工作:

from matplotlib import pyplot as plt
plt.imshow(data, interpolation='nearest')
plt.show()

如果您使用的是Jupyter笔记本/实验室,请在导入matplotlib之前使用以下内联命令:

%matplotlib inline 

The following should work:

from matplotlib import pyplot as plt
plt.imshow(data, interpolation='nearest')
plt.show()

If you are using Jupyter notebook/lab, use this inline command before importing matplotlib:

%matplotlib inline 

回答 2

最短的路径是使用scipy,如下所示:

from scipy.misc import toimage
toimage(data).show()

这也需要安装PIL或Pillow。

同样需要PIL或Pillow但可以调用其他查看器的类似方法是:

from scipy.misc import imshow
imshow(data)

Shortest path is to use scipy, like this:

from scipy.misc import toimage
toimage(data).show()

This requires PIL or Pillow to be installed as well.

A similar approach also requiring PIL or Pillow but which may invoke a different viewer is:

from scipy.misc import imshow
imshow(data)

回答 3

使用pygame,您可以打开一个窗口,以像素阵列的形式获取表面,然后从那里进行操作。但是,您需要将numpy数组复制到Surface数组中,这比在pygame Surface本身上进行实际图形操作要慢得多。

Using pygame, you can open a window, get the surface as an array of pixels, and manipulate as you want from there. You’ll need to copy your numpy array into the surface array, however, which will be much slower than doing actual graphics operations on the pygame surfaces themselves.


回答 4

如何使用示例显示存储在numpy数组中的图像(在Jupyter笔记本中有效)

我知道有更简单的答案,但是这一答案将使您了解如何从numpy数组中淹没图像。

加载示例

from sklearn.datasets import load_digits
digits = load_digits()
digits.images.shape   #this will give you (1797, 8, 8). 1797 images, each 8 x 8 in size

显示一幅图像的阵列

digits.images[0]
array([[ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.],
       [ 0.,  0., 13., 15., 10., 15.,  5.,  0.],
       [ 0.,  3., 15.,  2.,  0., 11.,  8.,  0.],
       [ 0.,  4., 12.,  0.,  0.,  8.,  8.,  0.],
       [ 0.,  5.,  8.,  0.,  0.,  9.,  8.,  0.],
       [ 0.,  4., 11.,  0.,  1., 12.,  7.,  0.],
       [ 0.,  2., 14.,  5., 10., 12.,  0.,  0.],
       [ 0.,  0.,  6., 13., 10.,  0.,  0.,  0.]])

创建空的10 x 10子图以可视化100张图像

import matplotlib.pyplot as plt
fig, axes = plt.subplots(10,10, figsize=(8,8))

绘制100张图像

for i,ax in enumerate(axes.flat):
    ax.imshow(digits.images[i])

结果:

在此处输入图片说明

怎么axes.flat办? 它创建了numpy枚举器,因此您可以在轴上迭代以在其上绘制对象。 例:

import numpy as np
x = np.arange(6).reshape(2,3)
x.flat
for item in (x.flat):
    print (item, end=' ')

How to show images stored in numpy array with example (works in Jupyter notebook)

I know there are simpler answers but this one will give you understanding of how images are actually drawn from a numpy array.

Load example

from sklearn.datasets import load_digits
digits = load_digits()
digits.images.shape   #this will give you (1797, 8, 8). 1797 images, each 8 x 8 in size

Display array of one image

digits.images[0]
array([[ 0.,  0.,  5., 13.,  9.,  1.,  0.,  0.],
       [ 0.,  0., 13., 15., 10., 15.,  5.,  0.],
       [ 0.,  3., 15.,  2.,  0., 11.,  8.,  0.],
       [ 0.,  4., 12.,  0.,  0.,  8.,  8.,  0.],
       [ 0.,  5.,  8.,  0.,  0.,  9.,  8.,  0.],
       [ 0.,  4., 11.,  0.,  1., 12.,  7.,  0.],
       [ 0.,  2., 14.,  5., 10., 12.,  0.,  0.],
       [ 0.,  0.,  6., 13., 10.,  0.,  0.,  0.]])

Create empty 10 x 10 subplots for visualizing 100 images

import matplotlib.pyplot as plt
fig, axes = plt.subplots(10,10, figsize=(8,8))

Plotting 100 images

for i,ax in enumerate(axes.flat):
    ax.imshow(digits.images[i])

Result:

enter image description here

What does axes.flat do? It creates a numpy enumerator so you can iterate over axis in order to draw objects on them. Example:

import numpy as np
x = np.arange(6).reshape(2,3)
x.flat
for item in (x.flat):
    print (item, end=' ')

回答 5

例如,使用枕头的fromarray:

from PIL import Image
from numpy import *

im = array(Image.open('image.jpg'))
Image.fromarray(im).show()

Using pillow’s fromarray, for example:

from PIL import Image
from numpy import *

im = array(Image.open('image.jpg'))
Image.fromarray(im).show()

回答 6

Python图像库可以显示使用numpy的阵列的图像。查看此页面以获取示例代码:

编辑:正如该页面底部的注释所述,您应该检查最新的发行说明,这会使此过程变得更加简单:

http://effbot.org/zone/pil-changes-116.htm

The Python Imaging Library can display images using Numpy arrays. Take a look at this page for sample code:

EDIT: As the note on the bottom of that page says, you should check the latest release notes which make this much simpler:

http://effbot.org/zone/pil-changes-116.htm


回答 7

使用matplotlib进行补充。我发现在执行计算机视觉任务时很方便。假设您有dtype = int32的数据

from matplotlib import pyplot as plot
import numpy as np

fig = plot.figure()
ax = fig.add_subplot(1, 1, 1)
# make sure your data is in H W C, otherwise you can change it by
# data = data.transpose((_, _, _))
data = np.zeros((512,512,3), dtype=np.int32)
data[256,256] = [255,0,0]
ax.imshow(data.astype(np.uint8))

Supplement for doing so with matplotlib. I found it handy doing computer vision tasks. Let’s say you got data with dtype = int32

from matplotlib import pyplot as plot
import numpy as np

fig = plot.figure()
ax = fig.add_subplot(1, 1, 1)
# make sure your data is in H W C, otherwise you can change it by
# data = data.transpose((_, _, _))
data = np.zeros((512,512,3), dtype=np.int32)
data[256,256] = [255,0,0]
ax.imshow(data.astype(np.uint8))

python:如何识别变量是数组还是标量

问题:python:如何识别变量是数组还是标量

我有一个接受参数的函数NBins。我想用标量50或数组对此函数进行调用[0, 10, 20, 30]。我如何识别函数的长度NBins是多少?或换句话说,如果它是标量或向量?

我尝试了这个:

>>> N=[2,3,5]
>>> P = 5
>>> len(N)
3
>>> len(P)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'int' has no len()
>>> 

正如你看到的,我不能申请lenP,因为它不是一个数组….有什么样isarrayisscalar在Python?

谢谢

I have a function that takes the argument NBins. I want to make a call to this function with a scalar 50 or an array [0, 10, 20, 30]. How can I identify within the function, what the length of NBins is? or said differently, if it is a scalar or a vector?

I tried this:

>>> N=[2,3,5]
>>> P = 5
>>> len(N)
3
>>> len(P)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: object of type 'int' has no len()
>>> 

As you see, I can’t apply len to P, since it’s not an array…. Is there something like isarray or isscalar in python?

thanks


回答 0

>>> isinstance([0, 10, 20, 30], list)
True
>>> isinstance(50, list)
False

要支持任何类型的序列,请选中collections.Sequence而不是list

注意isinstance还支持一个元组类,type(x) in (..., ...)应避免检查,这是不必要的。

您可能还想检查 not isinstance(x, (str, unicode))

>>> isinstance([0, 10, 20, 30], list)
True
>>> isinstance(50, list)
False

To support any type of sequence, check collections.Sequence instead of list.

note: isinstance also supports a tuple of classes, check type(x) in (..., ...) should be avoided and is unnecessary.

You may also wanna check not isinstance(x, (str, unicode))


回答 1

先前的答案假定该数组是python标准列表。作为经常使用numpy的人,我建议使用以下Python测试:

if hasattr(N, "__len__")

Previous answers assume that the array is a python standard list. As someone who uses numpy often, I’d recommend a very pythonic test of:

if hasattr(N, "__len__")

回答 2

将@jamylak和@ jpaddison3的答案结合在一起,如果您需要对作为输入的numpy数组保持鲁棒性,并以与列表相同的方式处理它们,则应使用

import numpy as np
isinstance(P, (list, tuple, np.ndarray))

对于list,tuple和numpy数组的子类,这是可靠的。

而且,如果您还想对序列的所有其他子类(不仅是列表和元组)具有鲁棒性,请使用

import collections
import numpy as np
isinstance(P, (collections.Sequence, np.ndarray))

为什么要用这种方法isinstance而不是type(P)与目标值进行比较?这是一个示例,我们制作并研究NewListlist的一个琐碎子类的行为。

>>> class NewList(list):
...     isThisAList = '???'
... 
>>> x = NewList([0,1])
>>> y = list([0,1])
>>> print x
[0, 1]
>>> print y
[0, 1]
>>> x==y
True
>>> type(x)
<class '__main__.NewList'>
>>> type(x) is list
False
>>> type(y) is list
True
>>> type(x).__name__
'NewList'
>>> isinstance(x, list)
True

尽管xy比较平等,通过处理它们type会导致不同的行为。然而,由于x是的子类的实例list,使用isinstance(x,list)得到所需的行为和治疗xy以相同的方式。

Combining @jamylak and @jpaddison3’s answers together, if you need to be robust against numpy arrays as the input and handle them in the same way as lists, you should use

import numpy as np
isinstance(P, (list, tuple, np.ndarray))

This is robust against subclasses of list, tuple and numpy arrays.

And if you want to be robust against all other subclasses of sequence as well (not just list and tuple), use

import collections
import numpy as np
isinstance(P, (collections.Sequence, np.ndarray))

Why should you do things this way with isinstance and not compare type(P) with a target value? Here is an example, where we make and study the behaviour of NewList, a trivial subclass of list.

>>> class NewList(list):
...     isThisAList = '???'
... 
>>> x = NewList([0,1])
>>> y = list([0,1])
>>> print x
[0, 1]
>>> print y
[0, 1]
>>> x==y
True
>>> type(x)
<class '__main__.NewList'>
>>> type(x) is list
False
>>> type(y) is list
True
>>> type(x).__name__
'NewList'
>>> isinstance(x, list)
True

Despite x and y comparing as equal, handling them by type would result in different behaviour. However, since x is an instance of a subclass of list, using isinstance(x,list) gives the desired behaviour and treats x and y in the same manner.


回答 3

numpy中有与isscalar()等效的东西吗?是。

>>> np.isscalar(3.1)
True
>>> np.isscalar([3.1])
False
>>> np.isscalar(False)
True

Is there an equivalent to isscalar() in numpy? Yes.

>>> np.isscalar(3.1)
True
>>> np.isscalar([3.1])
False
>>> np.isscalar(False)
True

回答 4

虽然@jamylak的方法更好,但这是另一种方法

>>> N=[2,3,5]
>>> P = 5
>>> type(P) in (tuple, list)
False
>>> type(N) in (tuple, list)
True

While, @jamylak’s approach is the better one, here is an alternative approach

>>> N=[2,3,5]
>>> P = 5
>>> type(P) in (tuple, list)
False
>>> type(N) in (tuple, list)
True

回答 5

另一种替代方法(使用类属性):

N = [2,3,5]
P = 5

type(N).__name__ == 'list'
True

type(P).__name__ == 'int'
True

type(N).__name__ in ('list', 'tuple')
True

无需导入任何东西。

Another alternative approach (use of class name property):

N = [2,3,5]
P = 5

type(N).__name__ == 'list'
True

type(P).__name__ == 'int'
True

type(N).__name__ in ('list', 'tuple')
True

No need to import anything.


回答 6

这是我找到的最佳方法:检查__len__和的存在__getitem__

您可能会问为什么?原因包括:

  1. 该流行方法isinstance(obj, abc.Sequence)在某些对象(包括PyTorch的Tensor)上失败,因为它们未实现__contains__
  2. 不幸的是,Python的collections.abc中没有任何东西可以检查__len__并且__getitem__我认为这是处理类似数组对象的最小方法。
  3. 它适用于列表,元组,ndarray,Tensor等。

因此,事不宜迟:

def is_array_like(obj, string_is_array=False, tuple_is_array=True):
    result = hasattr(obj, "__len__") and hasattr(obj, '__getitem__') 
    if result and not string_is_array and isinstance(obj, (str, abc.ByteString)):
        result = False
    if result and not tuple_is_array and isinstance(obj, tuple):
        result = False
    return result

请注意,我添加了默认参数,因为大多数时候您可能希望将字符串视为值,而不是数组。元组也是如此。

Here is the best approach I have found: Check existence of __len__ and __getitem__.

You may ask why? The reasons includes:

  1. The popular method isinstance(obj, abc.Sequence) fails on some objects including PyTorch’s Tensor because they do not implement __contains__.
  2. Unfortunately, there is nothing in Python’s collections.abc that checks for only __len__ and __getitem__ which I feel are minimal methods for array-like objects.
  3. It works on list, tuple, ndarray, Tensor etc.

So without further ado:

def is_array_like(obj, string_is_array=False, tuple_is_array=True):
    result = hasattr(obj, "__len__") and hasattr(obj, '__getitem__') 
    if result and not string_is_array and isinstance(obj, (str, abc.ByteString)):
        result = False
    if result and not tuple_is_array and isinstance(obj, tuple):
        result = False
    return result

Note that I’ve added default parameters because most of the time you might want to consider strings as values, not arrays. Similarly for tuples.


回答 7

>>> N=[2,3,5]
>>> P = 5
>>> type(P)==type(0)
True
>>> type([1,2])==type(N)
True
>>> type(P)==type([1,2])
False
>>> N=[2,3,5]
>>> P = 5
>>> type(P)==type(0)
True
>>> type([1,2])==type(N)
True
>>> type(P)==type([1,2])
False

回答 8

您可以检查变量的数据类型。

N = [2,3,5]
P = 5
type(P)

它将以P的数据类型输出。

<type 'int'>

这样就可以区分它是整数还是数组。

You can check data type of variable.

N = [2,3,5]
P = 5
type(P)

It will give you out put as data type of P.

<type 'int'>

So that you can differentiate that it is an integer or an array.


回答 9

令我惊讶的是,这样的基本问题似乎在python中没有即时的答案。在我看来,几乎所有建议的答案都使用某种类型检查,通常在python中不建议这样做,并且它们似乎仅限于特定情况(它们因使用不同的数字类型或非元组或列表的通用可迭代对象而失败)。

对我来说,更好的方法是导入numpy并使用array.size,例如:

>>> a=1
>>> np.array(a)
Out[1]: array(1)

>>> np.array(a).size
Out[2]: 1

>>> np.array([1,2]).size
Out[3]: 2

>>> np.array('125')
Out[4]: 1

另请注意:

>>> len(np.array([1,2]))

Out[5]: 2

但:

>>> len(np.array(a))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-40-f5055b93f729> in <module>()
----> 1 len(np.array(a))

TypeError: len() of unsized object

I am surprised that such a basic question doesn’t seem to have an immediate answer in python. It seems to me that nearly all proposed answers use some kind of type checking, that is usually not advised in python and they seem restricted to a specific case (they fail with different numerical types or generic iteratable objects that are not tuples or lists).

For me, what works better is importing numpy and using array.size, for example:

>>> a=1
>>> np.array(a)
Out[1]: array(1)

>>> np.array(a).size
Out[2]: 1

>>> np.array([1,2]).size
Out[3]: 2

>>> np.array('125')
Out[4]: 1

Note also:

>>> len(np.array([1,2]))

Out[5]: 2

but:

>>> len(np.array(a))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-40-f5055b93f729> in <module>()
----> 1 len(np.array(a))

TypeError: len() of unsized object

回答 10

只需使用size代替len

>>> from numpy import size
>>> N = [2, 3, 5]
>>> size(N)
3
>>> N = array([2, 3, 5])
>>> size(N)
3
>>> P = 5
>>> size(P)
1

Simply use size instead of len!

>>> from numpy import size
>>> N = [2, 3, 5]
>>> size(N)
3
>>> N = array([2, 3, 5])
>>> size(N)
3
>>> P = 5
>>> size(P)
1

回答 11

preds_test [0]的形状为(128,128,1),让我们使用isinstance()函数检查其数据类型isinstance接受2个参数。第一个参数是数据第二个参数是数据类型isinstance(preds_test [0],np.ndarray)给出Output为True。这意味着preds_test [0]是一个数组。

preds_test[0] is of shape (128,128,1) Lets check its data type using isinstance() function isinstance takes 2 arguments. 1st argument is data 2nd argument is data type isinstance(preds_test[0], np.ndarray) gives Output as True. It means preds_test[0] is an array.


回答 12

为了回答标题中的问题,判断变量是否为标量的直接方法是尝试将其转换为浮点数。如果得到TypeError,则不是。

N = [1, 2, 3]
try:
    float(N)
except TypeError:
    print('it is not a scalar')
else:
    print('it is a scalar')

To answer the question in the title, a direct way to tell if a variable is a scalar is to try to convert it to a float. If you get TypeError, it’s not.

N = [1, 2, 3]
try:
    float(N)
except TypeError:
    print('it is not a scalar')
else:
    print('it is a scalar')

连接两个一维NumPy数组

问题:连接两个一维NumPy数组

我在NumPy中有两个简单的一维数组。我应该能够使用numpy.concatenate将它们连接起来。但是我收到以下代码的错误:

TypeError:只有length-1数组可以转换为Python标量

import numpy
a = numpy.array([1, 2, 3])
b = numpy.array([5, 6])
numpy.concatenate(a, b)

为什么?

I have two simple one-dimensional arrays in NumPy. I should be able to concatenate them using numpy.concatenate. But I get this error for the code below:

TypeError: only length-1 arrays can be converted to Python scalars

Code

import numpy
a = numpy.array([1, 2, 3])
b = numpy.array([5, 6])
numpy.concatenate(a, b)

Why?


回答 0

该行应为:

numpy.concatenate([a,b])

要连接的数组需要作为一个序列而不是作为单独的参数传递。

NumPy文档中

numpy.concatenate((a1, a2, ...), axis=0)

将一系列数组连接在一起。

它试图将您解释b为axis参数,这就是为什么它抱怨无法将其转换为标量。

The line should be:

numpy.concatenate([a,b])

The arrays you want to concatenate need to be passed in as a sequence, not as separate arguments.

From the NumPy documentation:

numpy.concatenate((a1, a2, ...), axis=0)

Join a sequence of arrays together.

It was trying to interpret your b as the axis parameter, which is why it complained it couldn’t convert it into a scalar.


回答 1

连接一维数组有多种可能性,例如,

numpy.r_[a, a],
numpy.stack([a, a]).reshape(-1),
numpy.hstack([a, a]),
numpy.concatenate([a, a])

对于大型阵列,所有这些选项都同样快。对于小型的,concatenate有一点优势:

在此处输入图片说明

该图是使用perfplot创建的:

import numpy
import perfplot

perfplot.show(
    setup=lambda n: numpy.random.rand(n),
    kernels=[
        lambda a: numpy.r_[a, a],
        lambda a: numpy.stack([a, a]).reshape(-1),
        lambda a: numpy.hstack([a, a]),
        lambda a: numpy.concatenate([a, a]),
    ],
    labels=["r_", "stack+reshape", "hstack", "concatenate"],
    n_range=[2 ** k for k in range(19)],
    xlabel="len(a)",
)

There are several possibilities for concatenating 1D arrays, e.g.,

numpy.r_[a, a],
numpy.stack([a, a]).reshape(-1),
numpy.hstack([a, a]),
numpy.concatenate([a, a])

All those options are equally fast for large arrays; for small ones, concatenate has a slight edge:

enter image description here

The plot was created with perfplot:

import numpy
import perfplot

perfplot.show(
    setup=lambda n: numpy.random.rand(n),
    kernels=[
        lambda a: numpy.r_[a, a],
        lambda a: numpy.stack([a, a]).reshape(-1),
        lambda a: numpy.hstack([a, a]),
        lambda a: numpy.concatenate([a, a]),
    ],
    labels=["r_", "stack+reshape", "hstack", "concatenate"],
    n_range=[2 ** k for k in range(19)],
    xlabel="len(a)",
)

回答 2

的第一个参数concatenate本身应该是要串联的数组序列

numpy.concatenate((a,b)) # Note the extra parentheses.

The first parameter to concatenate should itself be a sequence of arrays to concatenate:

numpy.concatenate((a,b)) # Note the extra parentheses.

回答 3

另一种方法是使用“ concatenate”的缩写形式,即“ r _ […]”或“ c _ […]”,如下面的示例代码所示(请参见http://wiki.scipy.org / NumPy_for_Matlab_Users以获取更多信息):

%pylab
vector_a = r_[0.:10.] #short form of "arange"
vector_b = array([1,1,1,1])
vector_c = r_[vector_a,vector_b]
print vector_a
print vector_b
print vector_c, '\n\n'

a = ones((3,4))*4
print a, '\n'
c = array([1,1,1])
b = c_[a,c]
print b, '\n\n'

a = ones((4,3))*4
print a, '\n'
c = array([[1,1,1]])
b = r_[a,c]
print b

print type(vector_b)

结果是:

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
[1 1 1 1]
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.  1.  1.  1.  1.] 


[[ 4.  4.  4.  4.]
 [ 4.  4.  4.  4.]
 [ 4.  4.  4.  4.]] 

[[ 4.  4.  4.  4.  1.]
 [ 4.  4.  4.  4.  1.]
 [ 4.  4.  4.  4.  1.]] 


[[ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]] 

[[ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 1.  1.  1.]]

An alternative ist to use the short form of “concatenate” which is either “r_[…]” or “c_[…]” as shown in the example code beneath (see http://wiki.scipy.org/NumPy_for_Matlab_Users for additional information):

%pylab
vector_a = r_[0.:10.] #short form of "arange"
vector_b = array([1,1,1,1])
vector_c = r_[vector_a,vector_b]
print vector_a
print vector_b
print vector_c, '\n\n'

a = ones((3,4))*4
print a, '\n'
c = array([1,1,1])
b = c_[a,c]
print b, '\n\n'

a = ones((4,3))*4
print a, '\n'
c = array([[1,1,1]])
b = r_[a,c]
print b

print type(vector_b)

Which results in:

[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.]
[1 1 1 1]
[ 0.  1.  2.  3.  4.  5.  6.  7.  8.  9.  1.  1.  1.  1.] 


[[ 4.  4.  4.  4.]
 [ 4.  4.  4.  4.]
 [ 4.  4.  4.  4.]] 

[[ 4.  4.  4.  4.  1.]
 [ 4.  4.  4.  4.  1.]
 [ 4.  4.  4.  4.  1.]] 


[[ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]] 

[[ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 4.  4.  4.]
 [ 1.  1.  1.]]

回答 4

以下是使用numpy.ravel(),的更多方法:numpy.array()利用一维数组可以解包为普通元素的事实:

# we'll utilize the concept of unpacking
In [15]: (*a, *b)
Out[15]: (1, 2, 3, 5, 6)

# using `numpy.ravel()`
In [14]: np.ravel((*a, *b))
Out[14]: array([1, 2, 3, 5, 6])

# wrap the unpacked elements in `numpy.array()`
In [16]: np.array((*a, *b))
Out[16]: array([1, 2, 3, 5, 6])

Here are more approaches for doing this by using numpy.ravel(), numpy.array(), utilizing the fact that 1D arrays can be unpacked into plain elements:

# we'll utilize the concept of unpacking
In [15]: (*a, *b)
Out[15]: (1, 2, 3, 5, 6)

# using `numpy.ravel()`
In [14]: np.ravel((*a, *b))
Out[14]: array([1, 2, 3, 5, 6])

# wrap the unpacked elements in `numpy.array()`
In [16]: np.array((*a, *b))
Out[16]: array([1, 2, 3, 5, 6])

回答 5

来自numpy 文档的更多事实:

语法为 numpy.concatenate((a1, a2, ...), axis=0, out=None)

轴= 0用于行连接轴= 1用于列连接

>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])

# Appending below last row
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])

# Appending after last column
>>> np.concatenate((a, b.T), axis=1)    # Notice the transpose
array([[1, 2, 5],
       [3, 4, 6]])

# Flattening the final array
>>> np.concatenate((a, b), axis=None)
array([1, 2, 3, 4, 5, 6])

希望对您有所帮助!

Some more facts from the numpy docs :

With syntax as numpy.concatenate((a1, a2, ...), axis=0, out=None)

axis = 0 for row-wise concatenation axis = 1 for column-wise concatenation

>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])

# Appending below last row
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])

# Appending after last column
>>> np.concatenate((a, b.T), axis=1)    # Notice the transpose
array([[1, 2, 5],
       [3, 4, 6]])

# Flattening the final array
>>> np.concatenate((a, b), axis=None)
array([1, 2, 3, 4, 5, 6])

I hope it helps !


使用python随机整理数组,使用python随机化数组项顺序

问题:使用python随机整理数组,使用python随机化数组项顺序

用python重组数组的最简单方法是什么?

What’s the easiest way to shuffle an array with python?


回答 0

import random
random.shuffle(array)
import random
random.shuffle(array)

回答 1

import random
random.shuffle(array)
import random
random.shuffle(array)

回答 2

使用sklearn的另一种方法

from sklearn.utils import shuffle
X=[1,2,3]
y = ['one', 'two', 'three']
X, y = shuffle(X, y, random_state=0)
print(X)
print(y)

输出:

[2, 1, 3]
['two', 'one', 'three']

优点:您可以同时随机分配多个阵列,而不会破坏映射。并且“ random_state”可以控制改组以实现可重现的行为。

Alternative way to do this using sklearn

from sklearn.utils import shuffle
X=[1,2,3]
y = ['one', 'two', 'three']
X, y = shuffle(X, y, random_state=0)
print(X)
print(y)

Output:

[2, 1, 3]
['two', 'one', 'three']

Advantage: You can random multiple arrays simultaneously without disrupting the mapping. And ‘random_state’ can control the shuffling for reproducible behavior.


回答 3

其他答案最简单,但是令人讨厌的是该random.shuffle方法实际上不返回任何内容,它只是对给定列表进行排序。如果要链接调用,或者只想在一行中声明一个改组数组,则可以执行以下操作:

    import random
    def my_shuffle(array):
        random.shuffle(array)
        return array

然后,您可以执行以下操作:

    for suit in my_shuffle(['hearts', 'spades', 'clubs', 'diamonds']):

The other answers are the easiest, however it’s a bit annoying that the random.shuffle method doesn’t actually return anything – it just sorts the given list. If you want to chain calls or just be able to declare a shuffled array in one line you can do:

    import random
    def my_shuffle(array):
        random.shuffle(array)
        return array

Then you can do lines like:

    for suit in my_shuffle(['hearts', 'spades', 'clubs', 'diamonds']):

回答 4

当处理常规的Python列表时,random.shuffle()将按照前面的答案所示进行操作。

但是,当谈到ndarraynumpy.array)时,random.shuffle似乎打破了原来的ndarray。这是一个例子:

import random
import numpy as np
import numpy.random

a = np.array([1,2,3,4,5,6])
a.shape = (3,2)
print a
random.shuffle(a) # a will definitely be destroyed
print a

只需使用: np.random.shuffle(a)

像一样random.shufflenp.random.shuffle就地调整数组的位置。

When dealing with regular Python lists, random.shuffle() will do the job just as the previous answers show.

But when it come to ndarray(numpy.array), random.shuffle seems to break the original ndarray. Here is an example:

import random
import numpy as np
import numpy.random

a = np.array([1,2,3,4,5,6])
a.shape = (3,2)
print a
random.shuffle(a) # a will definitely be destroyed
print a

Just use: np.random.shuffle(a)

Like random.shuffle, np.random.shuffle shuffles the array in-place.


回答 5

万一您想要一个新的数组,可以使用sample

import random
new_array = random.sample( array, len(array) )

Just in case you want a new array you can use sample:

import random
new_array = random.sample( array, len(array) )

回答 6

您可以使用随机键对数组进行排序

sorted(array, key = lambda x: random.random())

密钥只能读取一次,因此排序期间的比较项目仍然有效。

但是看起来好像random.shuffle(array)会更快,因为它是用C编写的

You can sort your array with random key

sorted(array, key = lambda x: random.random())

key only be read once so comparing item during sort still efficient.

but look like random.shuffle(array) will be faster since it written in C


回答 7

除了前面的答复,我还要介绍另一个功能。

numpy.random.shuffle以及random.shuffle执行就地改组。但是,如果要返回经过改组的数组,numpy.random.permutation则可以使用该函数。

In addition to the previous replies, I would like to introduce another function.

numpy.random.shuffle as well as random.shuffle perform in-place shuffling. However, if you want to return a shuffled array numpy.random.permutation is the function to use.


回答 8

我不知道我曾经用过,random.shuffle()但是它返回“ None”给我,所以我写了这个,可能对某人有帮助

def shuffle(arr):
    for n in range(len(arr) - 1):
        rnd = random.randint(0, (len(arr) - 1))
        val1 = arr[rnd]
        val2 = arr[rnd - 1]

        arr[rnd - 1] = val1
        arr[rnd] = val2

    return arr

I don’t know I used random.shuffle() but it return ‘None’ to me, so I wrote this, might helpful to someone

def shuffle(arr):
    for n in range(len(arr) - 1):
        rnd = random.randint(0, (len(arr) - 1))
        val1 = arr[rnd]
        val2 = arr[rnd - 1]

        arr[rnd - 1] = val1
        arr[rnd] = val2

    return arr

回答 9

# arr = numpy array to shuffle

def shuffle(arr):
    a = numpy.arange(len(arr))
    b = numpy.empty(1)
    for i in range(len(arr)):
        sel = numpy.random.random_integers(0, high=len(a)-1, size=1)
        b = numpy.append(b, a[sel])
        a = numpy.delete(a, sel)
    b = b[1:].astype(int)
    return arr[b]
# arr = numpy array to shuffle

def shuffle(arr):
    a = numpy.arange(len(arr))
    b = numpy.empty(1)
    for i in range(len(arr)):
        sel = numpy.random.random_integers(0, high=len(a)-1, size=1)
        b = numpy.append(b, a[sel])
        a = numpy.delete(a, sel)
    b = b[1:].astype(int)
    return arr[b]

回答 10

请注意,random.shuffle()不应在多维数组上使用它,因为它会引起重复。

假设您想沿数组的第一维进行混洗,我们可以创建以下测试示例,

import numpy as np
x = np.zeros((10, 2, 3))

for i in range(10):
   x[i, ...] = i*np.ones((2,3))

因此,沿着第一个轴,第i个元素对应于2×3矩阵,其中所有元素都等于i。

如果我们对多维数组使用正确的随机播放功能,即np.random.shuffle(x)该数组将根据需要沿第一个轴随机播放。但是,使用random.shuffle(x)会导致重复。您可以通过len(np.unique(x))在改组后运行来检查此问题,使用时可以得到10(按预期),np.random.shuffle()但使用时只有5 random.shuffle()

Be aware that random.shuffle() should not be used on multi-dimensional arrays as it causes repetitions.

Imagine you want to shuffle an array along its first dimension, we can create the following test example,

import numpy as np
x = np.zeros((10, 2, 3))

for i in range(10):
   x[i, ...] = i*np.ones((2,3))

so that along the first axis, the i-th element corresponds to a 2×3 matrix where all the elements are equal to i.

If we use the correct shuffle function for multi-dimensional arrays, i.e. np.random.shuffle(x), the array will be shuffled along the first axis as desired. However, using random.shuffle(x) will cause repetitions. You can check this by running len(np.unique(x)) after shuffling which gives you 10 (as expected) with np.random.shuffle() but only around 5 when using random.shuffle().