Tag archive: numpy-ndarray

How do I get the indices of the N maximum values in a NumPy array?

Question: How do I get the indices of the N maximum values in a NumPy array?

NumPy provides a way to get the index of the maximum value of an array via np.argmax.

I would like a similar thing, but returning the indexes of the N maximum values.

For instance, if I have an array, [1, 3, 2, 4, 5], function(array, n=3) would return the indices [4, 3, 1] which correspond to the elements [5, 4, 3].


Answer 0

The simplest I’ve been able to come up with is:

In [1]: import numpy as np

In [2]: arr = np.array([1, 3, 2, 4, 5])

In [3]: arr.argsort()[-3:][::-1]
Out[3]: array([4, 3, 1])

This involves a complete sort of the array. I wonder if numpy provides a built-in way to do a partial sort; so far I haven’t been able to find one.

If this solution turns out to be too slow (especially for small n), it may be worth looking at coding something up in Cython.


Answer 1

Newer NumPy versions (1.8 and up) have a function called argpartition for this. To get the indices of the four largest elements, do

>>> a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> a
array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
>>> ind = np.argpartition(a, -4)[-4:]
>>> ind
array([1, 5, 8, 0])
>>> a[ind]
array([4, 9, 6, 9])

Unlike argsort, this function runs in linear time in the worst case, but the returned indices are not sorted, as can be seen from the result of evaluating a[ind]. If you need that too, sort them afterwards:

>>> ind[np.argsort(a[ind])]
array([1, 8, 5, 0])

To get the top-k elements in sorted order in this way takes O(n + k log k) time.


Answer 2

Simpler yet:

idx = (-arr).argsort()[:n]

where n is the number of maximum values.
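
One caveat worth sketching: negating the array assumes a signed numeric dtype. As far as I know, negation wraps around for unsigned integers and is not supported for boolean arrays, so a reversed ascending argsort is a safer variant. The helper name top_n_desc below is hypothetical, and note that tied values may come back in a different order than with the negation trick.

```python
import numpy as np

def top_n_desc(arr, n):
    """Indices of the n largest values, largest first, without negating arr."""
    arr = np.asarray(arr)
    # Sort ascending, reverse the order, then keep the first n positions.
    return np.argsort(arr)[::-1][:n]

signed = np.array([1, 3, 2, 4, 5])
unsigned = np.array([0, 3, 2], dtype=np.uint8)

print(top_n_desc(signed, 3))    # [4 3 1]
print(top_n_desc(unsigned, 1))  # [1]
```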


Answer 3

Use:

>>> import heapq
>>> import numpy
>>> a = numpy.array([1, 3, 2, 4, 5])
>>> heapq.nlargest(3, range(len(a)), a.take)
[4, 3, 1]

For regular Python lists:

>>> a = [1, 3, 2, 4, 5]
>>> heapq.nlargest(3, range(len(a)), a.__getitem__)
[4, 3, 1]

If you use Python 2, use xrange instead of range.

Source: heapq — Heap queue algorithm


Answer 4

If you happen to be working with a multidimensional array then you’ll need to flatten and unravel the indices:

def largest_indices(ary, n):
    """Returns the indices of the n largest values in a numpy array."""
    flat = ary.flatten()
    indices = np.argpartition(flat, -n)[-n:]
    indices = indices[np.argsort(-flat[indices])]
    return np.unravel_index(indices, ary.shape)

For example:

>>> xs = np.sin(np.arange(9)).reshape((3, 3))
>>> xs
array([[ 0.        ,  0.84147098,  0.90929743],
       [ 0.14112001, -0.7568025 , -0.95892427],
       [-0.2794155 ,  0.6569866 ,  0.98935825]])
>>> largest_indices(xs, 3)
(array([2, 0, 0]), array([2, 2, 1]))
>>> xs[largest_indices(xs, 3)]
array([ 0.98935825,  0.90929743,  0.84147098])

Answer 5

If you don’t care about the order of the K largest elements, you can use argpartition, which should perform better than a full sort with argsort.

K = 4 # We want the indices of the four largest values
a = np.array([0, 8, 0, 4, 5, 8, 8, 0, 4, 2])
np.argpartition(a,-K)[-K:]
array([4, 1, 5, 6])

Credits go to this question.

I ran a few tests and it looks like argpartition outperforms argsort as the size of the array and the value of K increase.
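
A rough way to reproduce such a comparison is sketched below. The array size, K, the seed and the repeat count are all arbitrary assumptions, and the timings will differ from machine to machine, so treat the printed numbers as indicative only.

```python
import timeit
import numpy as np

rng = np.random.default_rng(0)   # fixed seed, chosen only for repeatability
a = rng.random(200_000)          # random floats, so ties are practically impossible
K = 100
repeats = 5

t_sort = timeit.timeit(lambda: np.argsort(a)[-K:], number=repeats)
t_part = timeit.timeit(lambda: np.argpartition(a, -K)[-K:], number=repeats)
print(f"argsort:      {t_sort / repeats:.5f} s per call")
print(f"argpartition: {t_part / repeats:.5f} s per call")

# Both approaches select the same set of indices; only the order may differ.
same = set(np.argsort(a)[-K:].tolist()) == set(np.argpartition(a, -K)[-K:].tolist())
print(same)
```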


Answer 6

For multidimensional arrays you can use the axis keyword in order to apply the partitioning along the expected axis.

# For a 2D array
indices = np.argpartition(arr, -N, axis=1)[:, -N:]

And for grabbing the items:

x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

But note that this won’t return a sorted result. In that case you can use np.argsort() along the intended axis:

indices = np.argsort(arr, axis=1)[:, -N:]

# Result
x = arr.shape[0]
arr[np.repeat(np.arange(x), N), indices.ravel()].reshape(x, N)

Here is an example:

In [42]: a = np.random.randint(0, 20, (10, 10))

In [44]: a
Out[44]:
array([[ 7, 11, 12,  0,  2,  3,  4, 10,  6, 10],
       [16, 16,  4,  3, 18,  5, 10,  4, 14,  9],
       [ 2,  9, 15, 12, 18,  3, 13, 11,  5, 10],
       [14,  0,  9, 11,  1,  4,  9, 19, 18, 12],
       [ 0, 10,  5, 15,  9, 18,  5,  2, 16, 19],
       [14, 19,  3, 11, 13, 11, 13, 11,  1, 14],
       [ 7, 15, 18,  6,  5, 13,  1,  7,  9, 19],
       [11, 17, 11, 16, 14,  3, 16,  1, 12, 19],
       [ 2,  4, 14,  8,  6,  9, 14,  9,  1,  5],
       [ 1, 10, 15,  0,  1,  9, 18,  2,  2, 12]])

In [45]: np.argpartition(a, np.argmin(a, axis=0))[:, 1:] # 1 is because the first item is the minimum one.
Out[45]:
array([[4, 5, 6, 8, 0, 7, 9, 1, 2],
       [2, 7, 5, 9, 6, 8, 1, 0, 4],
       [5, 8, 1, 9, 7, 3, 6, 2, 4],
       [4, 5, 2, 6, 3, 9, 0, 8, 7],
       [7, 2, 6, 4, 1, 3, 8, 5, 9],
       [2, 3, 5, 7, 6, 4, 0, 9, 1],
       [4, 3, 0, 7, 8, 5, 1, 2, 9],
       [5, 2, 0, 8, 4, 6, 3, 1, 9],
       [0, 1, 9, 4, 3, 7, 5, 2, 6],
       [0, 4, 7, 8, 5, 1, 9, 2, 6]])

In [46]: np.argpartition(a, np.argmin(a, axis=0))[:, -3:]
Out[46]:
array([[9, 1, 2],
       [1, 0, 4],
       [6, 2, 4],
       [0, 8, 7],
       [8, 5, 9],
       [0, 9, 1],
       [1, 2, 9],
       [3, 1, 9],
       [5, 2, 6],
       [9, 2, 6]])

In [89]: a[np.repeat(np.arange(x), 3), ind.ravel()].reshape(x, 3)
Out[89]:
array([[10, 11, 12],
       [16, 16, 18],
       [13, 15, 18],
       [14, 18, 19],
       [16, 18, 19],
       [14, 14, 19],
       [15, 18, 19],
       [16, 17, 19],
       [ 9, 14, 14],
       [12, 15, 18]])
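
If the per-row results should also be sorted without paying for a full argsort, one option is to partition first and then sort only the N candidates of each row. This is a sketch rather than the answer's own method; it assumes a 2D array and NumPy 1.15 or later (for np.take_along_axis), and the helper name top_n_per_row is hypothetical.

```python
import numpy as np

def top_n_per_row(arr, n):
    """Return (indices, values) of the n largest entries in each row, largest first."""
    # Unordered indices of the n largest entries of every row.
    part = np.argpartition(arr, -n, axis=1)[:, -n:]
    part_vals = np.take_along_axis(arr, part, axis=1)
    # Order each row's n candidates by descending value.
    order = np.argsort(part_vals, axis=1)[:, ::-1]
    idx = np.take_along_axis(part, order, axis=1)
    return idx, np.take_along_axis(arr, idx, axis=1)

arr = np.array([[7, 11, 12, 0, 2],
                [16, 4, 3, 18, 5]])
idx, vals = top_n_per_row(arr, 2)
print(idx)   # [[2 1]
             #  [3 0]]
print(vals)  # [[12 11]
             #  [18 16]]
```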

Answer 7

This can be faster than a full sort, depending on the size of your original array and the size of your selection:

>>> A = np.random.randint(0,10,10)
>>> A
array([5, 1, 5, 5, 2, 3, 2, 4, 1, 0])
>>> B = np.zeros(3, int)
>>> for i in range(3):
...     idx = np.argmax(A)
...     B[i] = idx; A[idx] = -1  # something smaller than A.min()
...     
>>> B
array([0, 2, 3])

This, of course, modifies your original array. If that matters, you can work on a copy or restore the original values afterwards, whichever is cheaper for your use case.


Answer 8

np.argpartition only returns the indices of the k largest elements; it performs a partial sort and is faster than np.argsort (which performs a full sort) when the array is large. But the returned indices are NOT in ascending/descending order. For example:

We can see that if you want the top k indices in strictly ascending order, np.argpartition won’t return what you want.
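
A minimal NumPy-only sketch of the behaviour described above (the array values are illustrative assumptions, not the author's original data): the partitioned indices come back in no particular order, and one extra argsort over just those k entries puts them in ascending or descending order.

```python
import numpy as np

a = np.array([9, 4, 4, 3, 3, 9, 0, 4, 6, 0])
k = 4

part = np.argpartition(a, -k)[-k:]  # indices of the k largest, order not guaranteed
asc = part[np.argsort(a[part])]     # same indices, ascending by value
desc = asc[::-1]                    # same indices, descending by value

print(a[part])  # the four largest values in arbitrary order
print(a[asc])   # [4 6 9 9]
print(a[desc])  # [9 9 6 4]
```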

Apart from doing a sort manually after np.argpartition, my solution is to use torch.topk from PyTorch, a tool for building neural networks that provides NumPy-like APIs with both CPU and GPU support. It’s as fast as NumPy with MKL, and offers a GPU boost if you need large matrix/vector calculations.

The code for strictly ascending/descending top k indices would be:

Note that torch.topk accepts a torch tensor and returns both the top k values and the top k indices as torch.Tensor objects. Like NumPy functions with their axis argument, torch.topk accepts a dim argument, so you can handle multi-dimensional arrays/tensors.


Answer 9

Use:

from operator import itemgetter
from heapq import nlargest
result = nlargest(N, enumerate(your_list), itemgetter(1))

The result list then contains the N (index, value) tuples with the largest values, largest first.
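
As a concrete run of the snippet above, using the list from the question as sample data (the values of your_list and N here are illustrative assumptions):

```python
from heapq import nlargest
from operator import itemgetter

your_list = [1, 3, 2, 4, 5]
N = 3

# Each item is an (index, value) pair; itemgetter(1) ranks the pairs by value.
result = nlargest(N, enumerate(your_list), key=itemgetter(1))
indices = [i for i, _ in result]

print(result)   # [(4, 5), (3, 4), (1, 3)]
print(indices)  # [4, 3, 1]
```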


Answer 10

Use:

def max_indices(arr, k):
    '''
    Returns the indices of the k first largest elements of arr
    (in descending order in values)
    '''
    assert k <= arr.size, 'k should be smaller or equal to the array size'
    arr_ = arr.astype(float)  # make a copy of arr
    max_idxs = []
    for _ in range(k):
        max_element = np.max(arr_)
        if np.isinf(max_element):
            break
        else:
            idx = np.where(arr_ == max_element)
        max_idxs.append(idx)
        arr_[idx] = -np.inf
    return max_idxs

It also works with 2D arrays. For example,

In [0]: A = np.array([[ 0.51845014,  0.72528114],
                     [ 0.88421561,  0.18798661],
                     [ 0.89832036,  0.19448609],
                     [ 0.89832036,  0.19448609]])
In [1]: max_indices(A, 8)
Out[1]:
    [(array([2, 3], dtype=int64), array([0, 0], dtype=int64)),
     (array([1], dtype=int64), array([0], dtype=int64)),
     (array([0], dtype=int64), array([1], dtype=int64)),
     (array([0], dtype=int64), array([0], dtype=int64)),
     (array([2, 3], dtype=int64), array([1, 1], dtype=int64)),
     (array([1], dtype=int64), array([1], dtype=int64))]

In [2]: A[max_indices(A, 8)[0]][0]
Out[2]: array([ 0.89832036])

Answer 11

bottleneck has a partial sort function, if the expense of sorting the entire array just to get the N largest values is too great.

I know nothing about this module; I just googled numpy partial sort.


Answer 12

The following is a very easy way to see the maximum elements and their positions. Here axis selects the direction of the reduction: axis=0 gives the column-wise maximum and axis=1 the row-wise maximum in the 2D case. For higher dimensions, choose the axis you need.

M = np.random.random((3, 4))
print(M)
print(M.max(axis=1), M.argmax(axis=1))

Answer 13

I found it most intuitive to use np.unique.

The idea is that np.unique, called with return_inverse=True, also returns for each input element the index of its value in the array of unique values. From the maximum unique value and these indices, the positions of the original values can be recovered.

multi_max = [1,1,2,2,4,0,0,4]
uniques, idx = np.unique(multi_max, return_inverse=True)
print(np.squeeze(np.argwhere(idx == np.argmax(uniques))))
>> [4 7]

Answer 14

I think the most time-efficient way is to iterate through the array manually and keep a min-heap of size k, as other people have mentioned.

I also came up with a brute-force approach:

top_k_index_list = [ ]
for i in range(k):
    top_k_index_list.append(np.argmax(my_array))
    my_array[top_k_index_list[-1]] = -float('inf')

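After using argmax to get the index of the largest element, set that element to negative infinity; the next call to argmax then returns the index of the second-largest element. Note that storing -inf requires a float array (assigning it to an integer array raises an error). You can record the original values of these elements and restore them afterwards if you want.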
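
A variant of the loop above that leaves the caller's data untouched can be sketched as follows. It assumes a 1-D numeric input and works on a float copy so that -inf can be stored; the function name top_k_by_argmax is hypothetical.

```python
import numpy as np

def top_k_by_argmax(my_array, k):
    """Indices of the k largest values, largest first; the input is not modified."""
    work = np.array(my_array, dtype=float)  # float copy, so -inf can be stored
    top_k_index_list = []
    for _ in range(k):
        i = int(np.argmax(work))
        top_k_index_list.append(i)
        work[i] = -np.inf  # mask the current maximum
    return top_k_index_list

data = np.array([1, 3, 2, 4, 5])
print(top_k_by_argmax(data, 3))  # [4, 3, 1]
print(data)                      # unchanged: [1 3 2 4 5]
```

Each pass scans the whole array, so the cost grows with k times the array length; this is only attractive for small k.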


Answer 15

This code works for a 2-D NumPy array:

import numpy as np

mat = np.array([[1, 3], [2, 5]])  # 2-D array

n = 2  # number of largest values to select
n_largest_mat = np.sort(mat, axis=None)[-n:]  # the n largest values
tf_n_largest = np.zeros(mat.shape, dtype=bool)  # all-False mask
for x in n_largest_mat:
    tf_n_largest = tf_n_largest | (mat == x)  # mark every cell equal to x

n_largest_elems = mat[tf_n_largest]  # boolean-mask indexing

This produces a boolean mask of the n largest values, which can also be used to extract those n largest elements from the array.


What are the advantages of NumPy over regular Python lists?

Question: What are the advantages of NumPy over regular Python lists?

What are the advantages of NumPy over regular Python lists?

I have approximately 100 financial markets series, and I am going to create a cube array of 100x100x100 = 1 million cells. I will be regressing (3-variable) each x with each y and z, to fill the array with standard errors.

I have heard that for “large matrices” I should use NumPy as opposed to Python lists, for performance and scalability reasons. Thing is, I know Python lists and they seem to work for me.

What will the benefits be if I move to NumPy?

What if I had 1000 series (that is, 1 billion floating point cells in the cube)?


Answer 0

NumPy’s arrays are more compact than Python lists — a list of lists as you describe, in Python, would take at least 20 MB or so, while a NumPy 3D array with single-precision floats in the cells would fit in 4 MB. Access in reading and writing items is also faster with NumPy.

Maybe you don’t care that much for just a million cells, but you definitely would for a billion cells — neither approach would fit in a 32-bit architecture, but with 64-bit builds NumPy would get away with 4 GB or so, Python alone would need at least about 12 GB (lots of pointers which double in size) — a much costlier piece of hardware!

The difference is mostly due to “indirectness”: a Python list is an array of pointers to Python objects, at least 4 bytes per pointer plus 16 bytes for even the smallest Python object on a 32-bit build (4 for the type pointer, 4 for the reference count, 4 for the value, and the memory allocator rounds up to 16). A NumPy array is an array of uniform values: single-precision numbers take 4 bytes each, double-precision ones 8 bytes. Less flexible, but you pay substantially for the flexibility of standard Python lists!
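
The gap can be measured roughly on your own machine. The sketch below assumes CPython and a flat list of one million floats rather than the nested lists from the question; sys.getsizeof only counts each object's own footprint, so the list figure is an approximation.

```python
import sys
import numpy as np

n = 1_000_000

py_list = [float(i) for i in range(n)]
# Pointer array held by the list, plus one float object per element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

arr32 = np.zeros(n, dtype=np.float32)
arr64 = np.zeros(n, dtype=np.float64)

print(f"list of floats: ~{list_bytes / 1e6:.0f} MB")
print(f"float32 array:   {arr32.nbytes / 1e6:.0f} MB")  # 4 MB
print(f"float64 array:   {arr64.nbytes / 1e6:.0f} MB")  # 8 MB
```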


Answer 1

NumPy is not just more efficient; it is also more convenient. You get a lot of vector and matrix operations for free, which sometimes allow one to avoid unnecessary work. And they are also efficiently implemented.

For example, you could read your cube directly from a file into an array:

x = numpy.fromfile(file=open("data"), dtype=float).reshape((100, 100, 100))

Sum along the second dimension:

s = x.sum(axis=1)

Find which cells are above a threshold:

(x > 0.5).nonzero()

Keep only the even-indexed slices along the third dimension (dropping the odd-indexed ones):

x[:, :, ::2]

Also, many useful libraries work with NumPy arrays. For example, statistical analysis and visualization libraries.

Even if you don’t have performance problems, learning NumPy is worth the effort.


Answer 2

Alex mentioned memory efficiency and Roberto mentioned convenience, and these are both good points. For a few more ideas, I’ll mention speed and functionality.

Functionality: You get a lot built in with NumPy: FFTs, convolutions, fast searching, basic statistics, linear algebra, histograms, etc. And really, who can live without FFTs?

Speed: Here’s a test on doing a sum over a list and a NumPy array, showing that the sum on the NumPy array is more than 10x faster (in this test; mileage may vary).

from numpy import arange
from timeit import Timer

Nelements = 10000
Ntimeits = 10000

x = arange(Nelements)
y = list(range(Nelements))  # a real list; range() is lazy in Python 3

t_numpy = Timer("x.sum()", "from __main__ import x")
t_list = Timer("sum(y)", "from __main__ import y")
print("numpy: %.3e" % (t_numpy.timeit(Ntimeits)/Ntimeits,))
print("list:  %.3e" % (t_list.timeit(Ntimeits)/Ntimeits,))

which on my systems (while I’m running a backup) gives:

numpy: 3.004e-05
list:  5.363e-04

Answer 3

Here’s a nice answer from the FAQ on the scipy.org website:

What advantages do NumPy arrays offer over (nested) Python lists?

Python’s lists are efficient general-purpose containers. They support (fairly) efficient insertion, deletion, appending, and concatenation, and Python’s list comprehensions make them easy to construct and manipulate. However, they have certain limitations: they don’t support “vectorized” operations like elementwise addition and multiplication, and the fact that they can contain objects of differing types means that Python must store type information for every element, and must execute type dispatching code when operating on each element. This also means that very few list operations can be carried out by efficient C loops: each iteration would require type checks and other Python API bookkeeping.
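
The “vectorized” difference mentioned in the FAQ can be seen in a few lines. This is only an illustrative sketch with made-up sample values:

```python
import numpy as np

xs = [1.0, 2.0, 3.0]
ys = [10.0, 20.0, 30.0]

# Lists: "+" concatenates, so elementwise work needs an explicit Python-level loop.
list_concat = xs + ys
list_sum = [x + y for x, y in zip(xs, ys)]

# Arrays: "+" and "*" operate elementwise in compiled code.
ax, ay = np.array(xs), np.array(ys)
arr_sum = ax + ay
arr_prod = ax * ay

print(list_concat)  # [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
print(list_sum)     # [11.0, 22.0, 33.0]
print(arr_sum)      # [11. 22. 33.]
print(arr_prod)     # [10. 40. 90.]
```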


Answer 4

The other answers have highlighted almost all the major differences between NumPy arrays and Python lists; I will just summarize them here:

  1. NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray creates a new array and deletes the original.

  2. The elements of a NumPy array are all required to be of the same data type (heterogeneous elements are possible too, but they do not permit mathematical operations) and thus take the same size in memory.

  3. NumPy arrays facilitate advanced mathematical and other types of operations on large amounts of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.
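
The first two points can be checked with a short sketch. The sample values are arbitrary, and dtype=object is used here as one way to hold heterogeneous elements:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.append(a, 4)          # returns a new array; a itself keeps its size
print(a.size, b.size)        # 3 4

mixed = np.array([1, 2.5])   # the int is upcast so every element shares one dtype
print(mixed.dtype)           # float64

objs = np.array([1, "two", 3.0], dtype=object)  # heterogeneous, stored as Python objects
print(objs.dtype)            # object
```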


What does -1 mean in numpy reshape?

Question: What does -1 mean in numpy reshape?

A numpy matrix can be reshaped into a vector using the reshape function with parameter -1. But I don’t know what -1 means here.

For example:

a = numpy.matrix([[1, 2, 3, 4], [5, 6, 7, 8]])
b = numpy.reshape(a, -1)

The result of b is: matrix([[1, 2, 3, 4, 5, 6, 7, 8]])

Does anyone know what -1 means here? It seems Python assigns -1 several meanings; for example, array[-1] means the last element. Can you give an explanation?


Answer 0

The criterion to satisfy when providing the new shape is that “the new shape should be compatible with the original shape”.

NumPy allows us to give one of the new shape parameters as -1 (e.g. (2, -1) or (-1, 3), but not (-1, -1)). It simply means that it is an unknown dimension and we want NumPy to figure it out. NumPy will do so by looking at the “length of the array and remaining dimensions” and making sure it satisfies the criterion above.

Now see the example.

z = np.array([[1, 2, 3, 4],
              [5, 6, 7, 8],
              [9, 10, 11, 12]])
z.shape
(3, 4)

Now try to reshape with (-1). The resulting shape is (12,), which is compatible with the original shape (3, 4).

z.reshape(-1)
array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12])

Now try to reshape with (-1, 1). We have given the number of columns as 1 and left the rows unknown, so the resulting shape is (12, 1), again compatible with the original shape (3, 4).

z.reshape(-1,1)
array([[ 1],
       [ 2],
       [ 3],
       [ 4],
       [ 5],
       [ 6],
       [ 7],
       [ 8],
       [ 9],
       [10],
       [11],
       [12]])

This matches the advice given in a common error message (it comes from scikit-learn rather than from NumPy itself): use reshape(-1, 1) for a single feature, i.e. a single column.

Reshape your data using array.reshape(-1, 1) if your data has a single feature

New shape (-1, 2): rows unknown, 2 columns. The resulting shape is (6, 2).

z.reshape(-1, 2)
array([[ 1,  2],
       [ 3,  4],
       [ 5,  6],
       [ 7,  8],
       [ 9, 10],
       [11, 12]])

Now try to leave the columns unknown. New shape (1, -1): 1 row, columns unknown. The resulting shape is (1, 12).

z.reshape(1,-1)
array([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12]])

This matches the same error message (again from scikit-learn rather than NumPy itself): use reshape(1, -1) for a single sample, i.e. a single row.

Reshape your data using array.reshape(1, -1) if it contains a single sample

New shape (2, -1): 2 rows, columns unknown. The resulting shape is (2, 6).

z.reshape(2, -1)
array([[ 1,  2,  3,  4,  5,  6],
       [ 7,  8,  9, 10, 11, 12]])

New shape (3, -1): 3 rows, columns unknown. The resulting shape is (3, 4).

z.reshape(3, -1)
array([[ 1,  2,  3,  4],
       [ 5,  6,  7,  8],
       [ 9, 10, 11, 12]])

And finally, if we try to leave both dimensions unknown, i.e. new shape (-1, -1), it will throw an error.

z.reshape(-1, -1)
ValueError: can only specify one unknown dimension
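
The compatibility criterion from the start of this answer also fails when the known dimensions do not divide the number of elements. A small sketch, reusing the 12-element array from the examples above (the variable name compatible is only for illustration):

```python
import numpy as np

z = np.arange(1, 13).reshape(3, 4)  # 12 elements, same values as above

print(z.reshape(4, -1).shape)       # (4, 3), because 12 / 4 = 3

try:
    z.reshape(5, -1)                # 12 is not divisible by 5
    compatible = True
except ValueError as exc:
    compatible = False
    print("incompatible:", exc)
```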

Answer 1

Used to reshape an array.

Say we have a 3 dimensional array of dimensions 2 x 10 x 10:

r = numpy.random.rand(2, 10, 10) 

Now we want to reshape it to 5 x 5 x 8:

numpy.reshape(r, (5, 5, 8))

will do the job.

Note that once you fix the first dimension at 5 and the second at 5, the third is already determined. To save you the arithmetic, NumPy lets you pass -1:

numpy.reshape(r, (5, 5, -1))

will give you an array of shape (5, 5, 8).

Likewise,

numpy.reshape(r, (50, -1))

will give you an array of shape (50, 4).

You can read more at http://anie.me/numpy-reshape-transpose-theano-dimshuffle/


Answer 2

According to the documentation:

newshape : int or tuple of ints

The new shape should be compatible with the original shape. If an integer, then the result will be a 1-D array of that length. One shape dimension can be -1. In this case, the value is inferred from the length of the array and remaining dimensions.


Answer 3

See the documentation of numpy.reshape(a, newshape, order) for more info: https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html

In the example you mentioned, the output shows the resulting vector as a single row: because a is a numpy.matrix, which is always 2-D, reshaping with -1 yields 1 row and 8 columns.

a = numpy.matrix([[1, 2, 3, 4], [5, 6, 7, 8]])
b = numpy.reshape(a, -1)

output:

matrix([[1, 2, 3, 4, 5, 6, 7, 8]])

this can be explained more precisely with another example:

b = np.arange(10).reshape((-1,1))

output (a 2-D array with a single column):

array([[0],
       [1],
       [2],
       [3],
       [4],
       [5],
       [6],
       [7],
       [8],
       [9]])

b = np.arange(10).reshape((1,-1))

output (a 2-D array with a single row):

array([[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]])
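
The single-row result in the question is specific to numpy.matrix. A short sketch comparing it with a plain ndarray (np.matrix may emit a deprecation warning on recent NumPy versions):

```python
import numpy as np

m = np.matrix([[1, 2, 3, 4], [5, 6, 7, 8]])
arr = np.asarray(m)  # the same data viewed as a plain ndarray

print(np.reshape(m, -1).shape)    # (1, 8): a matrix always stays 2-D
print(np.reshape(arr, -1).shape)  # (8,):   a plain ndarray becomes 1-D
```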


Answer 4

This is fairly easy to understand. The “-1” stands for an “unknown dimension” that is inferred from the other dimensions. Suppose you define your matrix like this:

a = numpy.matrix([[1, 2, 3, 4], [5, 6, 7, 8]])

and then reshape it like this:

b = numpy.reshape(a, -1)

NumPy applies its default behavior to the matrix a and returns a flattened result. Because a is a numpy.matrix, which is always 2-D, that result is a 1×8 matrix rather than a true 1-D array.

However, I don’t think it is a good idea to use code like this. Why not try:

b = a.reshape(1,-1)

It gives you the same result, and the intent is clearer to readers: b is a with another shape. We don’t know how many columns it should have (so set that to -1), but we want a single row (so set the first parameter to 1).
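
The two calls agree for a numpy.matrix but not for a plain ndarray, which is one reason to spell out the row count. A short comparison sketch (the arr variable is introduced here for illustration):

```python
import numpy as np

m = np.matrix([[1, 2, 3, 4], [5, 6, 7, 8]])  # numpy.matrix is always 2-D
arr = np.asarray(m)                           # plain ndarray with the same data

matrix_flat = np.reshape(m, -1)     # stays 2-D: shape (1, 8)
matrix_row = m.reshape(1, -1)       # shape (1, 8)
array_flat = np.reshape(arr, -1)    # truly 1-D: shape (8,)
array_row = arr.reshape(1, -1)      # shape (1, 8)

print(matrix_flat.shape, matrix_row.shape, array_flat.shape, array_row.shape)
```

With reshape(1, -1) the result is a single row in both cases, so the code reads the same whether a is a matrix or an array.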


Answer 5

Long story short: you set some dimensions and let NumPy work out the remaining one.

(userDim1, userDim2, ..., -1) -->>

(userDim1, userDim2, ..., TOTAL_SIZE / (userDim1 * userDim2 * ...))

Here TOTAL_SIZE is the number of elements in the array.
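
The value substituted for -1 is the total number of elements divided by the product of the dimensions you supplied. The sketch below reproduces that arithmetic with a hypothetical helper, infer_unknown_dim, which is not a NumPy function, and compares it with what reshape returns:

```python
import math
import numpy as np

def infer_unknown_dim(total_size, known_dims):
    """Hypothetical helper: the value NumPy would substitute for -1."""
    known = math.prod(known_dims)
    if known == 0 or total_size % known != 0:
        raise ValueError("shape is not compatible with the array size")
    return total_size // known

a = np.arange(24)
inferred = infer_unknown_dim(a.size, (2, 3))
reshaped = a.reshape(2, 3, -1)

print(inferred)         # 4
print(reshaped.shape)   # (2, 3, 4)
```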

Answer 6

It simply means that you are not sure how many rows or columns to specify, so you ask NumPy to work out the number of rows or columns for the reshaped array.

The NumPy documentation uses -1 in its last example: https://docs.scipy.org/doc/numpy/reference/generated/numpy.reshape.html

The code below and its output should make the meaning of -1 clearer:

Code:

import numpy
a = numpy.matrix([[1, 2, 3, 4], [5, 6, 7, 8]])
print("Without reshaping  -> ")
print(a)
b = numpy.reshape(a, -1)
print("HERE We don't know about what number we should give to row/col")
print("Reshaping as (a,-1)")
print(b)
c = numpy.reshape(a, (-1,2))
print("HERE We just know about number of columns")
print("Reshaping as (a,(-1,2))")
print(c)
d = numpy.reshape(a, (2,-1))
print("HERE We just know about number of rows")
print("Reshaping as (a,(2,-1))")
print(d)

Output:

Without reshaping  -> 
[[1 2 3 4]
 [5 6 7 8]]
HERE We don't know about what number we should give to row/col
Reshaping as (a,-1)
[[1 2 3 4 5 6 7 8]]
HERE We just know about number of columns
Reshaping as (a,(-1,2))
[[1 2]
 [3 4]
 [5 6]
 [7 8]]
HERE We just know about number of rows
Reshaping as (a,(2,-1))
[[1 2 3 4]
 [5 6 7 8]]

Answer 7

import numpy as np
x = np.array([[2, 3, 4], [5, 6, 7]])

# Flatten any shape to 1-D
x = np.reshape(x, (-1))  # 1-D array -> (6,)

# When you don't care about the number of rows and just want to fix the number of columns
x = np.reshape(x, (-1, 1))  # 1 column -> (6, 1)
x = np.reshape(x, (-1, 2))  # 2 columns -> (3, 2)
x = np.reshape(x, (-1, 3))  # 3 columns -> (2, 3)

# When you don't care about the number of columns and just want to fix the number of rows
x = np.reshape(x, (1, -1))  # 1 row -> (1, 6)
x = np.reshape(x, (2, -1))  # 2 rows -> (2, 3)
x = np.reshape(x, (3, -1))  # 3 rows -> (3, 2)

Answer 8

The final outcome of the conversion is that the number of elements in the final array is the same as that of the initial array or data frame.

-1 corresponds to the unknown count of rows or columns. We can think of it as x (unknown). x is obtained by dividing the number of elements in the original array by the other value in the ordered pair that contains -1.

Examples

12 elements with reshape(-1,1) correspond to an array with x=12/1=12 rows and 1 column.

12 elements with reshape(1,-1) correspond to an array with 1 row and x=12/1=12 columns.
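
The same division can be checked directly. This sketch assumes a 12-element array built with np.arange and adds one case where the division leaves a remainder, which NumPy rejects:

```python
import numpy as np

a = np.arange(12)

column = a.reshape(-1, 1)   # x = 12 / 1 = 12 rows
row = a.reshape(1, -1)      # x = 12 / 1 = 12 columns
three = a.reshape(-1, 3)    # x = 12 / 3 = 4 rows

print(column.shape, row.shape, three.shape)  # (12, 1) (1, 12) (4, 3)

# 12 is not divisible by 5, so no valid x exists and reshape fails.
failed = False
try:
    a.reshape(-1, 5)
except ValueError as exc:
    failed = True
    print("ValueError:", exc)
```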