## Question: Is it possible to use argsort in descending order?

Consider the following code:

```python
avgDists = np.array([1, 8, 6, 9, 4])
ids = avgDists.argsort()[:n]
```

This gives me indices of the `n` smallest elements. Is it possible to use this same `argsort` in descending order to get the indices of `n` highest elements?

## Answer 0

If you negate an array, the lowest elements become the highest elements and vice-versa. Therefore, the indices of the `n` highest elements are:

```python
(-avgDists).argsort()[:n]
```

Another way to reason about this, as mentioned in the comments, is to observe that the big elements are coming last in the argsort. So, you can read from the tail of the argsort to find the `n` highest elements:

```python
avgDists.argsort()[::-1][:n]
```

Both methods are O(n log n) in time complexity, because the `argsort` call is the dominant term here. But the second approach has a nice advantage: it replaces an O(n) negation of the array with an O(1) slice. If you’re working with small arrays inside loops then you may get some performance gains from avoiding that negation, and if you’re working with huge arrays then you can save on memory usage because the negation creates a copy of the entire array.

Note that these methods do not always give equivalent results: if a stable sort is requested from `argsort`, e.g. by passing the keyword argument `kind='mergesort'`, then the first strategy preserves sort stability, but the second strategy breaks it (i.e. the positions of equal items are reversed).
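The stability difference is easiest to see on an array with ties. The following is a minimal sketch; the array `a` is an illustrative assumption, not data from the question, and `kind="stable"` is used as the stable-sort request:

```python
import numpy as np

# Illustrative array with repeated values (an assumption for this demo).
a = np.array([2, 1, 2, 1, 2])

# Strategy 1: negate, then stable argsort. Equal items keep their original order.
by_negation = (-a).argsort(kind="stable")

# Strategy 2: stable argsort, then reverse. Equal items come out in reversed order.
by_reversal = a.argsort(kind="stable")[::-1]

print(by_negation)  # [0 2 4 1 3]
print(by_reversal)  # [4 2 0 3 1]
```

Both results list the 2s before the 1s, but only the negation approach keeps the tied indices in ascending order.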

Example timings:

Using a small array of 100 floats and a length-30 tail, the view method was about 15% faster:

```python
>>> avgDists = np.random.rand(100)
>>> n = 30
>>> timeit (-avgDists).argsort()[:n]
1.93 µs ± 6.68 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
1.64 µs ± 3.39 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
1.64 µs ± 3.66 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)
```

For larger arrays, the argsort is dominant and there is no significant timing difference:

```python
>>> avgDists = np.random.rand(1000)
>>> n = 300
>>> timeit (-avgDists).argsort()[:n]
21.9 µs ± 51.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[::-1][:n]
21.7 µs ± 33.3 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> timeit avgDists.argsort()[-n:][::-1]
21.9 µs ± 37.1 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
```

Please note that the comment from nedim below is incorrect. Whether to truncate before or after reversing makes no difference in efficiency, since both of these operations are only striding a view of the array differently and not actually copying data.
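As a quick check of that claim, the sketch below compares the two slicing orders on the question's array. The value `n = 3` is an assumption for illustration (and `n` must be positive, since `[-0:]` would select the whole array):

```python
import numpy as np

avgDists = np.array([1, 8, 6, 9, 4])
n = 3  # assumed value for illustration; must be > 0 for the [-n:] form

order = avgDists.argsort()
reverse_then_truncate = order[::-1][:n]
truncate_then_reverse = order[-n:][::-1]

# Both slicing orders select the same indices.
print(np.array_equal(reverse_then_truncate, truncate_then_reverse))  # True

# Both results are views onto `order`; neither slice copied any data.
print(np.shares_memory(reverse_then_truncate, order))  # True
print(np.shares_memory(truncate_then_reverse, order))  # True
```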

## Answer 1

This works just like Python list slicing: `[::-1]` reverses the array returned by `argsort()`, and `[:n]` then takes the first `n` elements of the reversed array, which are the indices of the `n` largest values:

```python
>>> avgDists=np.array([1, 8, 6, 9, 4])
>>> n=3
>>> ids = avgDists.argsort()[::-1][:n]
>>> ids
array([3, 1, 2])
```

The advantage of this method is that `ids` is a view of the index array returned by `argsort()`, so the reversal and truncation copy no data:

```python
>>> ids.flags
C_CONTIGUOUS : False
F_CONTIGUOUS : False
OWNDATA : False
WRITEABLE : True
ALIGNED : True
UPDATEIFCOPY : False
```

(The ‘OWNDATA’ being False indicates this is a view, not a copy)
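To see which buffer `ids` actually shares, `np.shares_memory` can be used. This is a small sketch on the question's array; the intermediate name `order` is introduced here only for illustration:

```python
import numpy as np

avgDists = np.array([1, 8, 6, 9, 4])
n = 3

order = avgDists.argsort()  # new index array that owns its data
ids = order[::-1][:n]       # reversed and truncated view onto `order`

print(ids.flags['OWNDATA'])             # False
print(np.shares_memory(ids, order))     # True: ids reuses the argsort result
print(np.shares_memory(ids, avgDists))  # False: argsort itself allocated a new array
print((-avgDists).flags['OWNDATA'])     # True: negation allocates another array
```

So the saving from the view approach is the extra negated copy of the input, not the index array that `argsort()` has to allocate either way.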

Another way to do this is something like:

```python
(-avgDists).argsort()[:n]
```

The problem is that this works by first negating every element of the array:

```python
>>> (-avgDists)
array([-1, -8, -6, -9, -4])
```

And it creates a copy to do so:

```python
>>> (-avgDists).flags['OWNDATA']
True
```

So if you time each, with this very small data set:

```python
>>> import timeit
>>> timeit.timeit('(-avgDists).argsort()[:3]', setup="from __main__ import avgDists")
4.2879798610229045
>>> timeit.timeit('avgDists.argsort()[::-1][:3]', setup="from __main__ import avgDists")
2.8372560259886086
```

The view method is substantially faster (and uses 1/2 the memory…)

## Answer 2

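You can use the flip functions `numpy.flipud()` or `numpy.fliplr()` to get the indices in descending order after sorting with `argsort`. That's what I usually do. Note that `numpy.fliplr()` requires an array with at least two dimensions, so for the 1-D result in the question `numpy.flipud()` is the one to use.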
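For the 1-D array from the question, a minimal sketch of this approach could look like the following (`n = 3` is an assumed value for illustration):

```python
import numpy as np

avgDists = np.array([1, 8, 6, 9, 4])
n = 3  # assumed value for illustration

# argsort gives ascending order; flipud reverses it along the first axis.
descending = np.flipud(np.argsort(avgDists))
ids = descending[:n]

print(descending)  # [3 1 2 4 0]
print(ids)         # [3 1 2]
```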

## Answer 3

Instead of using `np.argsort` you could use `np.argpartition`, if you only need the indices of the lowest/highest n elements.

This does not require sorting the whole array, only the part that you need. Note, however, that the order inside the partition is undefined, so while it returns the correct indices, they may not be in sorted order:

```python
>>> avgDists = [1, 8, 6, 9, 4]
>>> np.array(avgDists).argpartition(2)[:2]  # indices of lowest 2 items
array([0, 4], dtype=int64)

>>> np.array(avgDists).argpartition(-2)[-2:]  # indices of highest 2 items
array([1, 3], dtype=int64)
```

## Answer 4

You could create a copy of the array and then multiply each element by -1.
As a result, the elements that were previously the largest become the smallest.
The indices of the n smallest elements in the copy are the indices of the n greatest elements in the original.
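A minimal sketch of this idea on the question's array, assuming `n = 3` for illustration (the name `negated` is introduced here and is not from the original post):

```python
import numpy as np

avgDists = np.array([1, 8, 6, 9, 4])
n = 3  # assumed value for illustration

# Multiplying by -1 already produces a new array, so the original is untouched.
negated = avgDists * -1

# The n smallest entries of the negated copy are the n largest of the original.
ids = negated.argsort()[:n]

print(negated)        # [-1 -8 -6 -9 -4]
print(ids)            # [3 1 2]
print(avgDists[ids])  # [9 8 6]
```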

## Answer 5

As @Kanmani hinted, an easier to interpret implementation may use `numpy.flip`, as in the following:

```python
import numpy as np

avgDists = np.array([1, 8, 6, 9, 4])
ids = np.flip(np.argsort(avgDists))
print(ids)
```

By using free functions rather than chained method calls, it is easier to read the order of operations.

## Answer 6

With your example:

```python
avgDists = np.array([1, 8, 6, 9, 4])
```

Obtain indexes of n maximal values:

```python
ids = np.argpartition(avgDists, -n)[-n:]
```

Sort them in descending order:

```python
ids = ids[np.argsort(avgDists[ids])[::-1]]
```

Obtain results (for n=4):

```python
>>> avgDists[ids]
array([9, 8, 6, 4])
```
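The steps above can be collected into one helper. This is only a sketch: the function name `top_n_indices` is hypothetical, and it assumes a 1-D input with `1 <= n <= len(values)`:

```python
import numpy as np

def top_n_indices(values, n):
    """Return indices of the n largest entries of a 1-D array, largest first.

    Hypothetical helper; assumes 1 <= n <= len(values).
    """
    values = np.asarray(values)
    # Partial selection: the last n positions hold the n largest entries,
    # in no particular order.
    candidates = np.argpartition(values, -n)[-n:]
    # Sort only those n candidates by value, descending.
    return candidates[np.argsort(values[candidates])[::-1]]

avgDists = np.array([1, 8, 6, 9, 4])
ids = top_n_indices(avgDists, 4)
print(ids)            # [3 1 2 4]
print(avgDists[ids])  # [9 8 6 4]
```

Only the `n` selected candidates are fully sorted, which is where the saving over a full `argsort` comes from when `n` is much smaller than the array length.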

## Answer 7

Another way is to negate the argument passed to `argsort`, as in `df[np.argsort(-df[:, 0])]`, provided `df` holds your data and you want to sort it by the first column (column number `0`). Change the column number as appropriate. Note that the `df[:, 0]` indexing is NumPy-style, so this works as written for a 2-D NumPy array rather than a pandas DataFrame. Of course, the column has to be numeric.
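A small sketch of this on a 2-D NumPy array; the array `data` stands in for `df` and its values are made up for illustration:

```python
import numpy as np

# Hypothetical 2-D array standing in for `df`; values are illustrative only.
data = np.array([[2, 10],
                 [5, 20],
                 [1, 30]])

# Negating column 0 makes argsort return row positions in descending order.
row_order = np.argsort(-data[:, 0])
sorted_rows = data[row_order]

print(row_order)    # [1 0 2]
print(sorted_rows)
# [[ 5 20]
#  [ 2 10]
#  [ 1 30]]
```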