Numpy argsort - what is it doing?

Question: Numpy argsort - what is it doing?

Why is numpy giving this result:

import numpy
x = numpy.array([1.48,1.41,0.0,0.1])
print(x.argsort())

>[2 3 1 0]

when I’d expect it to do this:

[3 2 0 1]

Clearly my understanding of the function is lacking.


Answer 0

According to the documentation:

Returns the indices that would sort an array.

  • 2 is the index of 0.0.
  • 3 is the index of 0.1.
  • 1 is the index of 1.41.
  • 0 is the index of 1.48.
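As a quick check of the list above, indexing the array with the result of argsort should reproduce the values in ascending order. This is a minimal sketch; the names order and sorted_x are only illustrative and do not come from the original answer.

```python
import numpy as np

x = np.array([1.48, 1.41, 0.0, 0.1])

# order[i] is the position in x of the i-th smallest value
order = x.argsort()

# Indexing x with those positions yields the values in ascending order
sorted_x = x[order]

print(order.tolist())     # [2, 3, 1, 0]
print(sorted_x.tolist())  # [0.0, 0.1, 1.41, 1.48]
```

So argsort answers "where does each sorted value come from?", not "what position does each original value end up in?".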

Answer 1

[2, 3, 1, 0] indicates that the smallest element is at index 2, the next smallest at index 3, then index 1, then index 0.

There are a number of ways to get the result you are looking for:

import numpy as np
import scipy.stats as stats

def using_indexed_assignment(x):
    "https://stackoverflow.com/a/5284703/190597 (Sven Marnach)"
    result = np.empty(len(x), dtype=int)
    temp = x.argsort()
    result[temp] = np.arange(len(x))
    return result

def using_rankdata(x):
    return stats.rankdata(x)-1

def using_argsort_twice(x):
    "https://stackoverflow.com/a/6266510/190597 (k.rooijers)"
    return np.argsort(np.argsort(x))

def using_digitize(x):
    unique_vals, index = np.unique(x, return_inverse=True)
    return np.digitize(x, bins=unique_vals) - 1

For example,

In [72]: x = np.array([1.48,1.41,0.0,0.1])

In [73]: using_indexed_assignment(x)
Out[73]: array([3, 2, 0, 1])
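One way to read using_indexed_assignment is that the ranks are the inverse permutation of the argsort result: if the element at position order[i] is the i-th smallest, then its rank is i. The sketch below spells that out step by step; the names order and ranks are illustrative, not from the original answer.

```python
import numpy as np

x = np.array([1.48, 1.41, 0.0, 0.1])

order = x.argsort()               # positions of the values in ascending order
ranks = np.empty(len(x), dtype=int)
ranks[order] = np.arange(len(x))  # the element at order[i] gets rank i

print(ranks.tolist())             # [3, 2, 0, 1]

# Composing a permutation with its inverse gives the identity
print(ranks[order].tolist())      # [0, 1, 2, 3]
print(order[ranks].tolist())      # [0, 1, 2, 3]
```

This also suggests why applying argsort twice gives the same answer: the argsort of a permutation is its inverse.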

This checks that they all produce the same result:

x = np.random.random(10**5)
expected = using_indexed_assignment(x)
for func in (using_argsort_twice, using_digitize, using_rankdata):
    assert np.allclose(expected, func(x))

These IPython %timeit benchmarks suggest that for large arrays using_indexed_assignment is the fastest:

In [50]: x = np.random.random(10**5)
In [66]: %timeit using_indexed_assignment(x)
100 loops, best of 3: 9.32 ms per loop

In [70]: %timeit using_rankdata(x)
100 loops, best of 3: 10.6 ms per loop

In [56]: %timeit using_argsort_twice(x)
100 loops, best of 3: 16.2 ms per loop

In [59]: %timeit using_digitize(x)
10 loops, best of 3: 27 ms per loop

For small arrays, using_argsort_twice may be faster:

In [78]: x = np.random.random(10**2)

In [81]: %timeit using_argsort_twice(x)
100000 loops, best of 3: 3.45 µs per loop

In [79]: %timeit using_indexed_assignment(x)
100000 loops, best of 3: 4.78 µs per loop

In [80]: %timeit using_rankdata(x)
100000 loops, best of 3: 19 µs per loop

In [82]: %timeit using_digitize(x)
10000 loops, best of 3: 26.2 µs per loop

Note also that stats.rankdata gives you more control over how to handle elements of equal value.


Answer 2

As the documentation says, argsort:

Returns the indices that would sort an array.

That means the first element of the argsort is the index of the element that should be sorted first, the second element is the index of the element that should be second, etc.

What you seem to want is the rank order of the values, which is what is provided by scipy.stats.rankdata. Note that you need to think about what should happen if there are ties in the ranks.
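To illustrate why ties need a decision, here is a NumPy-only sketch of two common conventions, using zero-based ranks. The input array and the names ordinal and dense are assumptions made for the example. scipy.stats.rankdata offers comparable choices through its method parameter, though its ranks start at 1.

```python
import numpy as np

x = np.array([10, 20, 10, 30])  # the value 10 appears twice

# Ordinal ranks: tied values get distinct ranks in order of appearance,
# so a stable sort is requested explicitly
ordinal = np.argsort(np.argsort(x, kind="stable"), kind="stable")

# Dense ranks: tied values share a rank and no rank is skipped
_, dense = np.unique(x, return_inverse=True)

print(ordinal.tolist())  # [0, 2, 1, 3]
print(dense.tolist())    # [0, 1, 0, 2]
```

Which convention is right depends on what the ranks are used for; neither is inherently correct.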


Answer 3

numpy.argsort(a, axis=-1, kind='quicksort', order=None)

Returns the indices that would sort an array

Perform an indirect sort along the given axis using the algorithm specified by the kind keyword. It returns an array of indices, with the same shape as a, that index the data along the given axis in sorted order.

Consider an example in Python with the following list of values:

listExample  = [0 , 2, 2456,  2000, 5000, 0, 1]

Now we use the argsort function:

import numpy as np
np.argsort(listExample).tolist()  # tolist() yields plain Python ints

The output will be

[0, 5, 6, 1, 3, 2, 4]

This is the list of indices of the values in listExample. If you map these indices to their respective values, you get the following result:

[0, 0, 1, 2, 2000, 2456, 5000]

(I find this function useful in many places. For example, if you want to sort a list or array without calling list.sort(), that is, without changing the order of the actual values in the list, you can use this function.)

For more details, refer to this link: https://docs.scipy.org/doc/numpy-1.15.0/reference/generated/numpy.argsort.html
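A common use of this indirect sort is reordering one array by the values of another while leaving the originals untouched. The sketch below assumes two hypothetical parallel arrays, names and scores, which are not part of the original answer.

```python
import numpy as np

# Hypothetical parallel arrays: names[i] belongs to scores[i]
names = np.array(["carol", "alice", "bob"])
scores = np.array([72, 95, 88])

# Indices that would sort scores in ascending order
order = np.argsort(scores)

print(names[order].tolist())   # ['carol', 'bob', 'alice']
print(scores[order].tolist())  # [72, 88, 95]

# The original array is left unchanged
print(scores.tolist())         # [72, 95, 88]
```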


Answer 4

input:
import numpy as np
x = np.array([1.48,1.41,0.0,0.1])
x.argsort().argsort()

output:
array([3, 2, 0, 1])


Answer 5

First, the array is sorted. Then an array is generated that holds the original index of each sorted element.


Answer 6

np.argsort returns the indices that would sort the array, using the sorting algorithm specified by the 'kind' argument. By contrast, np.argmax returns the index of the largest element, and np.sort returns a sorted copy of the given array or list.
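The three functions can be compared side by side on a small array. The values in a are chosen arbitrarily for this sketch and are all distinct, so no tie-breaking is involved.

```python
import numpy as np

a = np.array([3, 9, 1, 7])

# Indices that would sort a in ascending order
print(np.argsort(a).tolist())  # [2, 0, 3, 1]

# Index of the largest element; with distinct values this equals
# the last entry of the argsort result
print(int(np.argmax(a)))       # 1

# A sorted copy of the values; a itself is not modified
print(np.sort(a).tolist())     # [1, 3, 7, 9]
print(a.tolist())              # [3, 9, 1, 7]
```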


Answer 7

Just want to directly contrast the OP’s original understanding against the actual implementation with code.

numpy.argsort is defined such that for 1D arrays:

x[x.argsort()] == numpy.sort(x) # this will be an array of True's

The OP originally thought that it was defined such that for 1D arrays:

x == numpy.sort(x)[x.argsort()] # this will not be True

Note: This code doesn’t work in the general case (it only works for 1D arrays). This answer is purely for illustration purposes.
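For arrays with more than one dimension, the same identity can be written with np.take_along_axis, which applies the index array along a chosen axis. This is a sketch added for illustration and is not part of the original answer; the 2D array a is made up.

```python
import numpy as np

a = np.array([[3, 1, 2],
              [9, 7, 8]])

# Sort indices computed independently for each row
idx = np.argsort(a, axis=1)

# Apply the per-row indices; plain a[idx] would not do this for 2D input
sorted_rows = np.take_along_axis(a, idx, axis=1)

print(idx.tolist())          # [[1, 2, 0], [1, 2, 0]]
print(sorted_rows.tolist())  # [[1, 2, 3], [7, 8, 9]]
```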


Answer 8

It returns the indices of the given array, [1.48, 1.41, 0.0, 0.1], in sorted order. That means: 0.0 is the first element, at index [2]; 0.1 is the second element, at index [3]; 1.41 is the third element, at index [1]; 1.48 is the fourth element, at index [0]. Output:

[2,3,1,0]