Tag archive: sorted

Is Python’s sorted() function guaranteed to be stable?

The documentation doesn’t guarantee that. Is there any other place that it is documented?

I’m guessing it might be stable since the sort method on lists is guaranteed to be stable (Notes 9th point: “Starting with Python 2.3, the sort() method is guaranteed to be stable”), and sorted is functionally similar. However, I’m not able to find any definitive source that says so.

Purpose: I need to sort based on a primary key and also a secondary key in cases where the primary key is equal in both records. If sorted() is guaranteed to be stable, I can sort on the secondary key, then sort on the primary key and get the result I need.

PS: To avoid any confusion, I’m using stable in the sense of “a sort is stable if it guarantees not to change the relative order of elements that compare equal”.
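
A minimal sketch of the two-pass approach from the Purpose paragraph. It assumes the records are plain (primary, secondary) tuples; the sample data is made up for illustration. Sort on the secondary key first, then on the primary key, and rely on stability to keep the secondary order among records with equal primary keys.

```python
from operator import itemgetter

# Hypothetical records: (primary_key, secondary_key)
records = [("b", 2), ("a", 2), ("b", 1), ("a", 1)]

# Pass 1: order by the secondary key only.
by_secondary = sorted(records, key=itemgetter(1))

# Pass 2: order by the primary key. Because the sort is stable,
# records with equal primary keys keep the order from pass 1.
result = sorted(by_secondary, key=itemgetter(0))

print(result)  # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]
```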


Answer 0

Yes, the intention of the manual is indeed to guarantee that sorted is stable and indeed that it uses exactly the same algorithm as the sort method. I do realize that the docs aren’t 100% clear about this identity; doc patches are always happily accepted!


Answer 1

They are stable.

By the way: you can sometimes sidestep the question of whether sort and sorted are stable by combining a multi-pass sort into a single pass.

For example, if you want to sort objects by their last_name attribute and then by first_name, you can do it in one pass:

sorted_list = sorted(
    your_sequence_of_items,
    key=lambda item: (item.last_name, item.first_name))

taking advantage of tuple comparison.
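
To check that the single-pass tuple key gives the same order as two stable passes, here is a small sketch. The Person namedtuple and the sample names are hypothetical, chosen only to stand in for your_sequence_of_items.

```python
from collections import namedtuple

# Hypothetical record type with the two attributes used in the answer.
Person = namedtuple("Person", ["first_name", "last_name"])

people = [
    Person("Ann", "Smith"),
    Person("Bob", "Jones"),
    Person("Al", "Smith"),
    Person("Cy", "Jones"),
]

# Single pass: tuples compare element by element, so last_name decides
# first and first_name only breaks ties.
one_pass = sorted(people, key=lambda p: (p.last_name, p.first_name))

# Two passes: secondary key first, then primary key (relies on stability).
two_pass = sorted(sorted(people, key=lambda p: p.first_name),
                  key=lambda p: p.last_name)

assert one_pass == two_pass
print([p.first_name + " " + p.last_name for p in one_pass])
# ['Bob Jones', 'Cy Jones', 'Al Smith', 'Ann Smith']
```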

This answer, as-is, covers the original question. For further sorting-related questions, there is the Python Sorting How-To.


Answer 2

The documentation changed in the meantime (relevant commit) and the current documentation of sorted explicitly guarantees it:

The built-in sorted() function is guaranteed to be stable. A sort is stable if it guarantees not to change the relative order of elements that compare equal — this is helpful for sorting in multiple passes (for example, sort by department, then by salary grade).

This part of the documentation was added to Python 2.7 and Python 3.4(+) so any compliant implementation of that language version should have a stable sorted.

Note that in CPython, list.sort has been stable since Python 2.3:

  • Tim Peters rewrote his list.sort() implementation – this one is a “stable sort” (equal inputs appear in the same order in the output) and faster than before.

I’m not 100% sure about sorted: nowadays it simply uses list.sort, but I haven’t checked the history for that. It’s likely, though, that it has “always” used list.sort.


Answer 3

The “What’s New” docs for Python 2.4 effectively make the point that sorted() first creates a list, then calls sort() on it, providing you with the guarantee you need though not in the “official” docs. You could also just check the source, if you’re really concerned.
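
The behavior described there can be modeled in a few lines. This is only a Python-level sketch (the name sorted_sketch is hypothetical), not CPython’s actual C implementation: copy the input into a new list, then sort that list in place with list.sort, which is where the stability guarantee comes from.

```python
def sorted_sketch(iterable, *, key=None, reverse=False):
    """Illustrative model of sorted(): build a new list, then sort it in place."""
    result = list(iterable)                # new list; the input is left untouched
    result.sort(key=key, reverse=reverse)  # the stable list.sort does the work
    return result


data = ("pear", "fig", "apple", "kiwi")
print(sorted_sketch(data, key=len))  # ['fig', 'pear', 'kiwi', 'apple']
```

Note that "pear" stays ahead of "kiwi" because both have length 4 and "pear" came first in the input.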


Answer 4

The Python 3.6 doc on sorting now states that

Sorts are guaranteed to be stable

Furthermore, in that document, there is a link to the stable Timsort, which states that

Timsort has been Python’s standard sorting algorithm since version 2.3
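
A quick way to observe the guarantee in practice is to sort strings by length so that several of them tie. This only illustrates the documented behavior; a single run cannot prove stability. The Sorting HOWTO also notes that reverse=True keeps equal elements in their original order.

```python
words = ["bb", "a", "cc", "b"]

# Ties on len() keep their original relative order ("a" before "b", "bb" before "cc").
print(sorted(words, key=len))                # ['a', 'b', 'bb', 'cc']

# reverse=True also preserves the original order of equal elements.
print(sorted(words, key=len, reverse=True))  # ['bb', 'cc', 'a', 'b']
```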


How to get the indices of a sorted array in Python

I have a numerical list:

myList = [1, 2, 3, 100, 5]

If I sort this list, I get [1, 2, 3, 5, 100]. What I want is the indices of the elements from the original list in the sorted order, i.e. [0, 1, 2, 4, 3], à la MATLAB’s sort function, which returns both values and indices.


Answer 0

If you are using numpy, you have the argsort() function available:

>>> import numpy
>>> numpy.argsort(myList)
array([0, 1, 2, 4, 3])

http://docs.scipy.org/doc/numpy/reference/generated/numpy.argsort.html

This returns the indices that would sort the array or list.


Answer 1

Something like the following:

>>> myList = [1, 2, 3, 100, 5]
>>> [i[0] for i in sorted(enumerate(myList), key=lambda x:x[1])]
[0, 1, 2, 4, 3]

enumerate(myList) yields (index, value) tuples; collected into a list, they are:

[(0, 1), (1, 2), (2, 3), (3, 100), (4, 5)]

You sort these tuples by passing them to sorted and specifying a function that extracts the sort key (the second element of each tuple; that’s what the lambda is for). Finally, the [i[0] for i in ...] list comprehension extracts the original index of each sorted element.
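
Since the question mentions MATLAB’s sort, which returns both the sorted values and the indices (as in [B, I] = sort(A)), the same enumerate approach can return both at once. A minimal sketch; sort_with_indices is a hypothetical helper name, not a standard function.

```python
from operator import itemgetter


def sort_with_indices(seq):
    """Hypothetical helper: return (sorted_values, indices), similar in spirit to MATLAB's sort."""
    # Sort (index, value) pairs by value; the sort is stable, so equal
    # values keep their original index order.
    pairs = sorted(enumerate(seq), key=itemgetter(1))
    values = [v for _, v in pairs]
    indices = [i for i, _ in pairs]
    return values, indices


values, indices = sort_with_indices([1, 2, 3, 100, 5])
print(values)   # [1, 2, 3, 5, 100]
print(indices)  # [0, 1, 2, 4, 3]
```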


Answer 2

>>> myList = [1, 2, 3, 100, 5]
>>> sorted(range(len(myList)), key=myList.__getitem__)
[0, 1, 2, 4, 3]
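
The trick here is that the indices themselves are what gets sorted; the key function only looks up each index’s value. A short sketch spelling that out, with an equivalent lambda for comparison:

```python
myList = [1, 2, 3, 100, 5]

# Sort the positions 0..n-1, using each position's value as its sort key.
# myList.__getitem__ is the bound method behind myList[i], so it behaves
# like key=lambda i: myList[i] without defining an extra function.
indices = sorted(range(len(myList)), key=myList.__getitem__)
same = sorted(range(len(myList)), key=lambda i: myList[i])

print(indices)  # [0, 1, 2, 4, 3]
```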

Answer 3

The answers with enumerate are nice, but I personally don’t like the lambda used to sort by the value. The following just reverses the index and the value, and sorts that. So it’ll first sort by value, then by index.

sorted((e,i) for i,e in enumerate(myList))
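
The expression above returns (value, index) pairs; to get just the indices, take the second element of each pair. A minimal sketch using the question’s list. Because ties on the value fall back to comparing the index, equal values keep their original order, the same result a stable sort by value would give.

```python
myList = [1, 2, 3, 100, 5]

# Each pair is (value, index); tuple comparison orders by value and
# only looks at the index when two values are equal.
pairs = sorted((e, i) for i, e in enumerate(myList))
indices = [i for e, i in pairs]

print(pairs)    # [(1, 0), (2, 1), (3, 2), (5, 4), (100, 3)]
print(indices)  # [0, 1, 2, 4, 3]
```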

Answer 4

Updated answer with enumerate and itemgetter:

sorted(enumerate(myList), key=lambda x: x[1])
# [(0, 1), (1, 2), (2, 3), (4, 5), (3, 100)]

enumerate pairs each element with its index: the first element of each tuple is the index and the second is the value. The pairs are then sorted by the second value of the tuple, x[1], where x is the tuple.

Or using itemgetter from the operator module:

from operator import itemgetter
sorted(enumerate(myList), key=itemgetter(1))

Answer 5

I did a quick performance check on these with perfplot (a project of mine) and found that it’s hard to recommend anything else but numpy (note the log scale):


Code to reproduce the plot:

import perfplot
import numpy


def sorted_enumerate(seq):
    return [i for (v, i) in sorted((v, i) for (i, v) in enumerate(seq))]


def sorted_enumerate_key(seq):
    return [x for x, y in sorted(enumerate(seq), key=lambda x: x[1])]


def sorted_range(seq):
    return sorted(range(len(seq)), key=seq.__getitem__)


def numpy_argsort(x):
    return numpy.argsort(x)


perfplot.save(
    "argsort.png",
    setup=lambda n: numpy.random.rand(n),
    kernels=[sorted_enumerate, sorted_enumerate_key, sorted_range, numpy_argsort],
    n_range=[2 ** k for k in range(15)],
    xlabel="len(x)",
)
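
If perfplot and NumPy are not available, a rough comparison of the pure-Python variants can be made with the standard library’s timeit. This is a minimal sketch, not a reproduction of the plot: the input size and repeat count are arbitrary assumptions, and the timings it prints depend entirely on your machine.

```python
import random
import timeit


def sorted_enumerate(seq):
    return [i for (v, i) in sorted((v, i) for (i, v) in enumerate(seq))]


def sorted_enumerate_key(seq):
    return [i for i, v in sorted(enumerate(seq), key=lambda x: x[1])]


def sorted_range(seq):
    return sorted(range(len(seq)), key=seq.__getitem__)


# Random input with a fixed seed so runs are repeatable.
random.seed(0)
data = [random.random() for _ in range(10_000)]

# All three agree, even with ties, since each breaks ties by original index.
assert sorted_enumerate(data) == sorted_enumerate_key(data) == sorted_range(data)

for func in (sorted_enumerate, sorted_enumerate_key, sorted_range):
    seconds = timeit.timeit(lambda: func(data), number=10)
    print(f"{func.__name__}: {seconds:.3f} s for 10 runs")
```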

Answer 6

If you do not want to use numpy,

sorted(range(len(seq)), key=seq.__getitem__)

is fastest, as demonstrated here.


Answer 7

Essentially you need an argsort. Which implementation you need depends on whether you want to use external libraries (e.g. NumPy) or stay pure-Python without dependencies.

The question you need to ask yourself is: Do you want the

  • indices that would sort the array/list
  • indices that the elements would have in the sorted array/list

Unfortunately the example in the question doesn’t make it clear what is desired because both will give the same result:

>>> arr = np.array([1, 2, 3, 100, 5])

>>> np.argsort(np.argsort(arr))
array([0, 1, 2, 4, 3], dtype=int64)

>>> np.argsort(arr)
array([0, 1, 2, 4, 3], dtype=int64)

Choosing the argsort implementation

If you have NumPy at your disposal you can simply use the function numpy.argsort or method numpy.ndarray.argsort.

An implementation without NumPy was mentioned in some other answers already, so I’ll just recap the fastest solution according to the benchmark answer here

def argsort(l):
    return sorted(range(len(l)), key=l.__getitem__)

Getting the indices that would sort the array/list

To get the indices that would sort the array/list you can simply call argsort on the array or list. I’m using the NumPy versions here but the Python implementation should give the same results

>>> arr = np.array([3, 1, 2, 4])
>>> np.argsort(arr)
array([1, 2, 0, 3], dtype=int64)

The result contains the indices that are needed to get the sorted array.

Since the sorted array would be [1, 2, 3, 4] the argsorted array contains the indices of these elements in the original.

  • The smallest value is 1 and it is at index 1 in the original so the first element of the result is 1.
  • The 2 is at index 2 in the original so the second element of the result is 2.
  • The 3 is at index 0 in the original so the third element of the result is 0.
  • The largest value is 4, and it is at index 3 in the original, so the last element of the result is 3.
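
With the pure-Python argsort from this answer, the defining property can be checked directly: reading the original at the returned indices gives the sorted list. (With NumPy, the equivalent would be arr[np.argsort(arr)].)

```python
def argsort(l):
    return sorted(range(len(l)), key=l.__getitem__)


arr = [3, 1, 2, 4]
order = argsort(arr)            # [1, 2, 0, 3]

# Reading the original list at those indices yields the sorted list.
print([arr[i] for i in order])  # [1, 2, 3, 4]
```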

Getting the indices that the elements would have in the sorted array/list

In this case you would need to apply argsort twice:

>>> arr = np.array([3, 1, 2, 4])
>>> np.argsort(np.argsort(arr))
array([2, 0, 1, 3], dtype=int64)

In this case:

  • the first element of the original is 3, which is the third-smallest (second-largest) value, so it would have index 2 in the sorted array/list, so the first element is 2.
  • the second element of the original is 1, which is the smallest value so it would have index 0 in the sorted array/list so the second element is 0.
  • the third element of the original is 2, which is the second-smallest value so it would have index 1 in the sorted array/list so the third element is 1.
  • the fourth element of the original is 4 which is the largest value so it would have index 3 in the sorted array/list so the last element is 3.
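
Applying argsort twice works because the second argsort inverts the permutation returned by the first. A pure-Python sketch of both variants (the function names are illustrative), including one that inverts the permutation directly in a single linear pass instead of sorting a second time:

```python
def argsort(seq):
    return sorted(range(len(seq)), key=seq.__getitem__)


def ranks_double_argsort(seq):
    # argsort of the argsort: where each element lands in the sorted list.
    return argsort(argsort(seq))


def ranks_inverse(seq):
    # Invert the permutation from a single argsort: the element at
    # original index `index` ends up at sorted position `position`.
    order = argsort(seq)
    result = [0] * len(order)
    for position, index in enumerate(order):
        result[index] = position
    return result


arr = [3, 1, 2, 4]
print(ranks_double_argsort(arr))  # [2, 0, 1, 3]
print(ranks_inverse(arr))         # [2, 0, 1, 3]
```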

Answer 8

The other answers are WRONG.

Running argsort once is not the solution. For example, the following code:

import numpy as np
x = [3,1,2]
np.argsort(x)

yields array([1, 2, 0], dtype=int64) which is not what we want.

The answer should be to run argsort twice:

import numpy as np
x = [3,1,2]
np.argsort(np.argsort(x))

gives array([2, 0, 1], dtype=int64) as expected.


Answer 9

For the indices:

>>> import numpy as np
>>> S = [11, 2, 44, 55, 66, 0, 10, 3, 33]
>>> r = np.argsort(S)
>>> r
array([5, 1, 7, 6, 0, 8, 2, 3, 4])

argsort returns the indices that would sort S.

For the values:

>>> np.sort(S)
array([ 0,  2,  3, 10, 11, 33, 44, 55, 66])

Answer 10

We create another list of indices from 0 to n-1, zip it with the original list, and then sort the pairs by the original values:

ar = [1,2,3,4,5]
new_ar = list(zip(ar, range(len(ar))))  # (value, index) pairs
new_ar.sort()                           # sorts by value; ties fall back to the index
indices = [i for _, i in new_ar]        # original indices in sorted order