The documentation doesn’t guarantee that. Is there any other place that it is documented?
I’m guessing it might be stable since the sort method on lists is guaranteed to be stable (Notes 9th point: “Starting with Python 2.3, the sort() method is guaranteed to be stable”), and sorted is functionally similar. However, I’m not able to find any definitive source that says so.
Purpose: I need to sort based on a primary key and also a secondary key in cases where the primary key is equal in both records. If sorted() is guaranteed to be stable, I can sort on the secondary key, then sort on the primary key and get the result I need.
PS: To avoid any confusion, I’m using stable in the sense of “a sort is stable if it guarantees not to change the relative order of elements that compare equal”.
Yes, the intention of the manual is indeed to guarantee that sorted is stable and indeed that it uses exactly the same algorithm as the sort method. I do realize that the docs aren’t 100% clear about this identity; doc patches are always happily accepted!
The documentation changed in the meantime (relevant commit) and the current documentation of sorted explicitly guarantees it:
The built-in sorted() function is guaranteed to be stable. A sort is stable if it guarantees not to change the relative order of elements that compare equal — this is helpful for sorting in multiple passes (for example, sort by department, then by salary grade).
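As a minimal sketch of the multi-pass approach described above (the records and field layout here are hypothetical, made up purely for illustration), sorting on the secondary key first and then on the primary key should give the same order as a single sort on a composite key:

```python
from operator import itemgetter

# Hypothetical records: (name, department, salary_grade)
records = [
    ("alice", "sales", 2),
    ("bob", "engineering", 1),
    ("carol", "sales", 1),
    ("dave", "engineering", 2),
    ("erin", "sales", 1),
]

# Pass 1: sort by the secondary key (salary grade).
by_grade = sorted(records, key=itemgetter(2))

# Pass 2: sort by the primary key (department).
# Because sorted() is stable, records in the same department
# keep the grade order established in pass 1.
by_dept_then_grade = sorted(by_grade, key=itemgetter(1))

# A single pass with a composite (department, grade) key for comparison.
single_pass = sorted(records, key=itemgetter(1, 2))

names = [name for name, _, _ in by_dept_then_grade]
```

Here carol and erin share both keys, so stability also keeps them in their original relative order.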
This part of the documentation was added to Python 2.7 and Python 3.4(+) so any compliant implementation of that language version should have a stable sorted.
Note that in CPython, list.sort has been stable since Python 2.3:
Tim Peters rewrote his list.sort() implementation – this one is a “stable sort” (equal inputs appear in the same order in the output) and faster than before.
I’m not 100% sure about sorted. Nowadays it simply uses list.sort, but I haven’t checked the history for that. It’s likely that it “always” used list.sort.
The “What’s New” docs for Python 2.4 effectively make the point that sorted() first creates a list, then calls sort() on it, providing you with the guarantee you need though not in the “official” docs. You could also just check the source, if you’re really concerned.
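The "create a list, then call sort() on it" description can be illustrated with a rough stand-in. This is only a sketch of the behaviour as described, not CPython's actual source, and sorted_sketch is a hypothetical name:

```python
def sorted_sketch(iterable, *, key=None, reverse=False):
    """Hypothetical stand-in for sorted(): copy into a list, then sort in place."""
    result = list(iterable)                 # build a new list from any iterable
    result.sort(key=key, reverse=reverse)   # list.sort is stable
    return result


# Items with equal keys (the second tuple element) should keep their input order.
pairs = [("b", 1), ("a", 1), ("c", 0), ("a", 0)]
by_second = sorted_sketch(pairs, key=lambda p: p[1])
```

If sorted() really behaves like this stand-in, its stability follows directly from the stability of list.sort().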
Suppose I sort a list such as [1, 2, 3, 100, 5] to obtain [1, 2, 3, 5, 100]. What I want is the indices of the elements from the original list in sorted order, i.e. [0, 1, 2, 4, 3], à la MATLAB’s sort function, which returns both values and indices.
>>> myList = [1, 2, 3, 100, 5]
>>> [i[0] for i in sorted(enumerate(myList), key=lambda x:x[1])]
[0, 1, 2, 4, 3]
enumerate(myList) yields tuples of (index, value):
[(0, 1), (1, 2), (2, 3), (3, 100), (4, 5)]
You sort these tuples by passing them to sorted and specifying a function to extract the sort key (the second element of each tuple; that’s what the lambda is for). Finally, the original index of each sorted element is extracted using the [i[0] for i in ...] list comprehension.
The answers with enumerate are nice, but I personally don’t like the lambda used to sort by the value. The following just reverses the index and the value, and sorts that. So it’ll first sort by value, then by index.
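A minimal sketch of that idea, assuming the same example list as in the question (the variable names are illustrative):

```python
my_list = [1, 2, 3, 100, 5]

# Build (value, index) pairs so that plain tuple comparison sorts
# by value first and falls back to the index on ties.
value_index_pairs = sorted((value, index) for index, value in enumerate(my_list))

# Keep only the original indices, now in sorted-value order.
indices = [index for value, index in value_index_pairs]
```

This avoids the lambda, at the cost of comparing whole tuples rather than just the values.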
Zip the lists together: the first element in each tuple will be the index and the second the value. Then sort using the second element of the tuple, x[1], where x is the tuple.
Or use itemgetter from the operator module:
from operator import itemgetter
sorted(enumerate(a), key=itemgetter(1))
I did a quick performance check on these with perfplot (a project of mine) and found that it’s hard to recommend anything else but numpy (note the log scale):
Code to reproduce the plot:
import perfplot
import numpy


def sorted_enumerate(seq):
    return [i for (v, i) in sorted((v, i) for (i, v) in enumerate(seq))]


def sorted_enumerate_key(seq):
    return [x for x, y in sorted(enumerate(seq), key=lambda x: x[1])]


def sorted_range(seq):
    return sorted(range(len(seq)), key=seq.__getitem__)


def numpy_argsort(x):
    return numpy.argsort(x)


perfplot.save(
    "argsort.png",
    setup=lambda n: numpy.random.rand(n),
    kernels=[sorted_enumerate, sorted_enumerate_key, sorted_range, numpy_argsort],
    n_range=[2 ** k for k in range(15)],
    xlabel="len(x)",
)
Essentially you need to do an argsort. Which implementation you need depends on whether you want to use external libraries (e.g. NumPy) or stay pure-Python without dependencies.
The question you need to ask yourself is: do you want the indices that would sort the array/list, or the indices that the elements would have in the sorted array/list?
Unfortunately the example in the question doesn’t make it clear what is desired because both will give the same result:
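A pure-Python sketch of the two interpretations applied to the question's list (argsort here is an illustrative helper, not a built-in):

```python
def argsort(seq):
    """Return the indices that would sort seq (pure Python, stable)."""
    return sorted(range(len(seq)), key=seq.__getitem__)


my_list = [1, 2, 3, 100, 5]

# Interpretation 1: indices that would sort the list.
sorting_indices = argsort(my_list)

# Interpretation 2: index each element would have in the sorted list,
# obtained by applying argsort to the result of argsort.
positions_in_sorted = argsort(sorting_indices)
```

For this particular input both lists come out as [0, 1, 2, 4, 3], which is why the example cannot distinguish the two.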
An implementation without NumPy was mentioned in some other answers already, so I’ll just recap the fastest solution according to the benchmark answer here
Getting the indices that would sort the array/list
To get the indices that would sort the array/list you can simply call argsort on the array or list. I’m using the NumPy versions here but the Python implementation should give the same results
For example, with the original [3, 1, 2, 4], the indices that the elements would have in the sorted array/list are [2, 0, 1, 3]:
The first element of the original is 3, which is the third-smallest value, so it would have index 2 in the sorted array/list; the first element of the result is 2.
The second element of the original is 1, which is the smallest value, so it would have index 0 in the sorted array/list; the second element of the result is 0.
The third element of the original is 2, which is the second-smallest value, so it would have index 1 in the sorted array/list; the third element of the result is 1.
The fourth element of the original is 4, which is the largest value, so it would have index 3 in the sorted array/list; the last element of the result is 3.
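The walkthrough above can be checked with a small pure-Python sketch. The argsort helper is an illustrative stand-in; for this tie-free input, numpy.argsort should give the same indices:

```python
def argsort(seq):
    """Return the indices that would sort seq (pure Python, stable)."""
    return sorted(range(len(seq)), key=seq.__getitem__)


original = [3, 1, 2, 4]

# Indices that would sort the list: original[1] is smallest, then [2], [0], [3].
sorting_indices = argsort(original)

# Index each element would have in the sorted list: argsort applied twice.
positions_in_sorted = argsort(sorting_indices)

# Sanity check: placing each element at its computed position sorts the list.
rebuilt = [None] * len(original)
for value, position in zip(original, positions_in_sorted):
    rebuilt[position] = value
```

Unlike the example in the question, this input makes the two interpretations differ: [1, 2, 0, 3] versus [2, 0, 1, 3].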
Answer 8
The other answers are wrong.
Running argsort once is not the solution. For example, the following code:
import numpy as np
x =[3,1,2]
np.argsort(x)
yields array([1, 2, 0], dtype=int64), which is not what we want.
The answer is to run argsort twice:
import numpy as np
x =[3,1,2]
np.argsort(np.argsort(x))