标签归档:sorting

在Python中,如何按已排序的键顺序遍历字典?

问题:在Python中,如何按已排序的键顺序遍历字典?

有一个现有功能以以下结尾,其中d是一个字典:

return d.iteritems()

返回给定字典的未排序迭代器。我想返回一个遍历按key排序的项目的迭代器。我怎么做?

There’s an existing function that ends in the following, where d is a dictionary:

return d.iteritems()

that returns an unsorted iterator for a given dictionary. I would like to return an iterator that goes through the items sorted by key. How do I do that?


回答 0

尚未对此进行广泛的测试,但是可以在Python 2.5.2中使用。

>>> d = {"x":2, "h":15, "a":2222}
>>> it = iter(sorted(d.iteritems()))
>>> it.next()
('a', 2222)
>>> it.next()
('h', 15)
>>> it.next()
('x', 2)
>>>

如果您习惯于使用for key, value in d.iteritems(): ...迭代器而不是迭代器,那么上述方法仍然可以使用

>>> d = {"x":2, "h":15, "a":2222}
>>> for key, value in sorted(d.iteritems()):
>>>     print(key, value)
('a', 2222)
('h', 15)
('x', 2)
>>>

在Python 3.x中,使用d.items()代替d.iteritems()返回迭代器。

Haven’t tested this very extensively, but works in Python 2.5.2.

>>> d = {"x":2, "h":15, "a":2222}
>>> it = iter(sorted(d.iteritems()))
>>> it.next()
('a', 2222)
>>> it.next()
('h', 15)
>>> it.next()
('x', 2)
>>>

If you are used to doing for key, value in d.iteritems(): ... instead of iterators, this will still work with the solution above

>>> d = {"x":2, "h":15, "a":2222}
>>> for key, value in sorted(d.iteritems()):
>>>     print(key, value)
('a', 2222)
('h', 15)
('x', 2)
>>>

With Python 3.x, use d.items() instead of d.iteritems() to return an iterator.


回答 1

使用 sorted()功能:

return sorted(dict.iteritems())

如果您想在排序结果上使用实际的迭代器,由于sorted()返回列表,请使用:

return iter(sorted(dict.iteritems()))

Use the sorted() function:

return sorted(dict.iteritems())

If you want an actual iterator over the sorted results, since sorted() returns a list, use:

return iter(sorted(dict.iteritems()))

回答 2

字典的键存储在哈希表中,这就是它们的“自然顺序”,即伪随机。任何其他顺序都是字典使用者的概念。

sorted()始终返回列表,而不是字典。如果将其传递给dict.items()(将生成一个元组列表),它将返回一个元组列表[[k1,v1),(k2,v2),…],可在循环中使用在某种程度上非常像一个字典,但无论如何它都不是一个字典

foo = {
    'a':    1,
    'b':    2,
    'c':    3,
    }

print foo
>>> {'a': 1, 'c': 3, 'b': 2}

print foo.items()
>>> [('a', 1), ('c', 3), ('b', 2)]

print sorted(foo.items())
>>> [('a', 1), ('b', 2), ('c', 3)]

以下内容看起来像是循环中的字典,但事实并非如此,它是将元组解压缩为k,v的列表:

for k,v in sorted(foo.items()):
    print k, v

大致相当于:

for k in sorted(foo.keys()):
    print k, foo[k]

A dict’s keys are stored in a hashtable so that is their ‘natural order’, i.e. psuedo-random. Any other ordering is a concept of the consumer of the dict.

sorted() always returns a list, not a dict. If you pass it a dict.items() (which produces a list of tuples), it will return a list of tuples [(k1,v1), (k2,v2), …] which can be used in a loop in a way very much like a dict, but it is not in anyway a dict!

foo = {
    'a':    1,
    'b':    2,
    'c':    3,
    }

print foo
>>> {'a': 1, 'c': 3, 'b': 2}

print foo.items()
>>> [('a', 1), ('c', 3), ('b', 2)]

print sorted(foo.items())
>>> [('a', 1), ('b', 2), ('c', 3)]

The following feels like a dict in a loop, but it’s not, it’s a list of tuples being unpacked into k,v:

for k,v in sorted(foo.items()):
    print k, v

Roughly equivalent to:

for k in sorted(foo.keys()):
    print k, foo[k]

回答 3

格雷格的答案是正确的。请注意,在Python 3.0中,您必须

sorted(dict.items())

iteritems将不复存在。

Greg’s answer is right. Note that in Python 3.0 you’ll have to do

sorted(dict.items())

as iteritems will be gone.


回答 4

您现在也可以OrderedDict在Python 2.7中使用:

>>> from collections import OrderedDict
>>> d = OrderedDict([('first', 1),
...                  ('second', 2),
...                  ('third', 3)])
>>> d.items()
[('first', 1), ('second', 2), ('third', 3)]

在这里,您将获得2.7版本的新功能页面和OrderedDict API

You can now use OrderedDict in Python 2.7 as well:

>>> from collections import OrderedDict
>>> d = OrderedDict([('first', 1),
...                  ('second', 2),
...                  ('third', 3)])
>>> d.items()
[('first', 1), ('second', 2), ('third', 3)]

Here you have the what’s new page for 2.7 version and the OrderedDict API.


回答 5

通常,可以将这样的命令排序为:

for k in sorted(d):
    print k, d[k]

对于问题中的特定情况,对于d.iteritems()具有“替换”功能,请添加以下函数:

def sortdict(d, **opts):
    # **opts so any currently supported sorted() options can be passed
    for k in sorted(d, **opts):
        yield k, d[k]

所以终点线从

return dict.iteritems()

return sortdict(dict)

要么

return sortdict(dict, reverse = True)

In general, one may sort a dict like so:

for k in sorted(d):
    print k, d[k]

For the specific case in the question, having a “drop in replacement” for d.iteritems(), add a function like:

def sortdict(d, **opts):
    # **opts so any currently supported sorted() options can be passed
    for k in sorted(d, **opts):
        yield k, d[k]

and so the ending line changes from

return dict.iteritems()

to

return sortdict(dict)

or

return sortdict(dict, reverse = True)

回答 6

>>> import heapq
>>> d = {"c": 2, "b": 9, "a": 4, "d": 8}
>>> def iter_sorted(d):
        keys = list(d)
        heapq.heapify(keys) # Transforms to heap in O(N) time
        while keys:
            k = heapq.heappop(keys) # takes O(log n) time
            yield (k, d[k])


>>> i = iter_sorted(d)
>>> for x in i:
        print x


('a', 4)
('b', 9)
('c', 2)
('d', 8)

此方法仍然具有O(N log N)排序,但是,经过短暂的线性堆化后,它会按排序顺序生成项目,从理论上讲,当您不总是需要整个列表时,它会更加高效。

>>> import heapq
>>> d = {"c": 2, "b": 9, "a": 4, "d": 8}
>>> def iter_sorted(d):
        keys = list(d)
        heapq.heapify(keys) # Transforms to heap in O(N) time
        while keys:
            k = heapq.heappop(keys) # takes O(log n) time
            yield (k, d[k])


>>> i = iter_sorted(d)
>>> for x in i:
        print x


('a', 4)
('b', 9)
('c', 2)
('d', 8)

This method still has an O(N log N) sort, however, after a short linear heapify, it yields the items in sorted order as it goes, making it theoretically more efficient when you do not always need the whole list.


回答 7

如果要按插入项的顺序而不是键的顺序进行排序,则应查看Python的collections.OrderedDict。(仅适用于Python 3)

If you want to sort by the order that items were inserted instead of of the order of the keys, you should have a look to Python’s collections.OrderedDict. (Python 3 only)


回答 8

sorted返回一个列表,因此在尝试对其进行迭代时会出错,但是由于无法订购字典,因此必须处理列表。

我不知道您的代码的较大上下文是什么,但是您可以尝试将迭代器添加到结果列表中。像这样吗?:

return iter(sorted(dict.iteritems()))

当然,您现在将返回元组,因为排序使您的字典变成了元组列表

例如:说您的字典是: {'a':1,'c':3,'b':2} 排序后将其变成一个列表:

[('a',1),('b',2),('c',3)]

因此,当您实际遍历该列表时,您会返回(在本示例中)一个由字符串和整数组成的元组,但是至少您可以对它进行遍历。

sorted returns a list, hence your error when you try to iterate over it, but because you can’t order a dict you will have to deal with a list.

I have no idea what the larger context of your code is, but you could try adding an iterator to the resulting list. like this maybe?:

return iter(sorted(dict.iteritems()))

of course you will be getting back tuples now because sorted turned your dict into a list of tuples

ex: say your dict was: {'a':1,'c':3,'b':2} sorted turns it into a list:

[('a',1),('b',2),('c',3)]

so when you actually iterate over the list you get back (in this example) a tuple composed of a string and an integer, but at least you will be able to iterate over it.


回答 9

假设您正在使用CPython 2.x并拥有一个较大的字典mydict,那么使用sorted(mydict)将会很慢,因为sorted会建立mydict键的排序列表。

在那种情况下,您可能要看一下我的orderdict包,其中包括sorteddictin C 的C实现。尤其是如果您必须在字典生命周期的不同阶段(即元素数)多次遍历键的排序列表时,请注意。

http://anthon.home.xs4all.nl/Python/ordereddict/

Assuming you are using CPython 2.x and have a large dictionary mydict, then using sorted(mydict) is going to be slow because sorted builds a sorted list of the keys of mydict.

In that case you might want to look at my ordereddict package which includes a C implementation of sorteddict in C. Especially if you have to go over the sorted list of keys multiple times at different stages (ie. number of elements) of the dictionaries lifetime.

http://anthon.home.xs4all.nl/Python/ordereddict/


sorted(list)和list.sort()有什么区别?

问题:sorted(list)和list.sort()有什么区别?

list.sort()对列表进行排序并替换原始列表,而sorted(list)返回列表的排序后的副本,而不更改原始列表。

  • 什么时候比另一种更好?
  • 哪个更有效?减多少
  • list.sort()执行完列表后,可以将列表还原为未排序状态吗?

list.sort() sorts the list and replaces the original list, whereas sorted(list) returns a sorted copy of the list, without changing the original list.

  • When is one preferred over the other?
  • Which is more efficient? By how much?
  • Can a list be reverted to the unsorted state after list.sort() has been performed?

回答 0

sorted()返回一个新的排序列表,而原始列表不受影响。就地list.sort()对列表进行排序,使列表索引突变,然后返回None(就像所有就地操作一样)。

sorted()适用于任何可迭代的对象,而不仅仅是列表。字符串,元组,字典(您将获得键),生成器等,返回包含所有已排序元素的列表。

  • 使用list.sort()时要变异列表中,sorted()当你想要一个新的排序的对象返回。使用sorted()时要排序的东西,是一个可迭代,而不是一个名单尚未

  • 对于列表,list.sort()它比sorted()不必创建副本要快。对于其他任何迭代,您别无选择。

  • 不,您无法检索原始位置。一旦您调用list.sort(),原始订单就消失了。

sorted() returns a new sorted list, leaving the original list unaffected. list.sort() sorts the list in-place, mutating the list indices, and returns None (like all in-place operations).

sorted() works on any iterable, not just lists. Strings, tuples, dictionaries (you’ll get the keys), generators, etc., returning a list containing all elements, sorted.

  • Use list.sort() when you want to mutate the list, sorted() when you want a new sorted object back. Use sorted() when you want to sort something that is an iterable, not a list yet.

  • For lists, list.sort() is faster than sorted() because it doesn’t have to create a copy. For any other iterable, you have no choice.

  • No, you cannot retrieve the original positions. Once you called list.sort() the original order is gone.


回答 1

sorted(list)vs和有list.sort()什么区别?

  • list.sort 原位修改列表并返回 None
  • sorted 接受所有可迭代的并返回一个已排序的新列表。

sorted 等效于此Python实现,但是CPython内置函数应该是用C编写的,因此运行起来应可测量得更快:

def sorted(iterable, key=None):
    new_list = list(iterable)    # make a new list
    new_list.sort(key=key)       # sort it
    return new_list              # return it

什么时候使用?

  • 使用list.sort时,你不希望保留原来的排序顺序(这样你就可以重新使用内存就地列表。)当你在列表的唯一所有者(如果该名单是由其他代码和你共享对其进行更改,则可以在使用该列表的地方引入错误。)
  • 使用sorted时要创建一个新的列表,只有本地代码拥有,当你想保留原来的排序顺序或。

list.sort()之后是否可以检索列表的原始位置?

否-除非您自己进行复制,否则该信息将丢失,因为该排序是就地完成的。

“哪个更快?又快多少?”

为了说明创建新列表的代价,请使用timeit模块,这是我们的设置:

import timeit
setup = """
import random
lists = [list(range(10000)) for _ in range(1000)]  # list of lists
for l in lists:
    random.shuffle(l) # shuffle each list
shuffled_iter = iter(lists) # wrap as iterator so next() yields one at a time
"""

这是我们随机排列的10000个整数的列表的结果,正如我们在此处看到的,我们已经证明了一个较旧的列表创建费用神话

Python 2.7

>>> timeit.repeat("next(shuffled_iter).sort()", setup=setup, number = 1000)
[3.75168503401801, 3.7473005310166627, 3.753129180986434]
>>> timeit.repeat("sorted(next(shuffled_iter))", setup=setup, number = 1000)
[3.702025591977872, 3.709248117986135, 3.71071034099441]

Python 3

>>> timeit.repeat("next(shuffled_iter).sort()", setup=setup, number = 1000)
[2.797430992126465, 2.796825885772705, 2.7744789123535156]
>>> timeit.repeat("sorted(next(shuffled_iter))", setup=setup, number = 1000)
[2.675589084625244, 2.8019039630889893, 2.849375009536743]

经过一番反馈,我决定采用具有不同特征的另一种测试是可取的。在这里,我为每次迭代1000次提供了相同的随机排序列表,其长度为100,000,而每次迭代的次数为1000。

import timeit
setup = """
import random
random.seed(0)
lst = list(range(100000))
random.shuffle(lst)
"""

我解释这种较大的差异是由Martijn提到的复制引起的,但是并不能支配到此处更老的答案中所指出的程度,此处的时间增加仅约10%

>>> timeit.repeat("lst[:].sort()", setup=setup, number = 10000)
[572.919036605, 573.1384446719999, 568.5923951]
>>> timeit.repeat("sorted(lst[:])", setup=setup, number = 10000)
[647.0584738299999, 653.4040515829997, 657.9457361929999]

我还以较小的类型运行了上述内容,并且看到新的sorted副本版本在1000长度的长度上仍需要大约2%的运行时间。

Poke也运行了自己的代码,这是代码:

setup = '''
import random
random.seed(12122353453462456)
lst = list(range({length}))
random.shuffle(lst)
lists = [lst[:] for _ in range({repeats})]
it = iter(lists)
'''
t1 = 'l = next(it); l.sort()'
t2 = 'l = next(it); sorted(l)'
length = 10 ** 7
repeats = 10 ** 2
print(length, repeats)
for t in t1, t2:
    print(t)
    print(timeit(t, setup=setup.format(length=length, repeats=repeats), number=repeats))

他发现长度为1000000的排序,(运行了100次)类似的结果,但时间仅增加了5%,以下是输出:

10000000 100
l = next(it); l.sort()
610.5015971539542
l = next(it); sorted(l)
646.7786222379655

结论:

sorted进行复制并进行排序的大型列表可能会主导差异,但是排序本身将主导操作,围绕这些差异组织代码将是过早的优化。我sorted需要一个新的数据排序列表list.sort时使用,而我需要就地排序列表时使用,然后由它确定我的用法。

What is the difference between sorted(list) vs list.sort()?

  • list.sort mutates the list in-place & returns None
  • sorted takes any iterable & returns a new list, sorted.

sorted is equivalent to this Python implementation, but the CPython builtin function should run measurably faster as it is written in C:

def sorted(iterable, key=None):
    new_list = list(iterable)    # make a new list
    new_list.sort(key=key)       # sort it
    return new_list              # return it

when to use which?

  • Use list.sort when you do not wish to retain the original sort order (Thus you will be able to reuse the list in-place in memory.) and when you are the sole owner of the list (if the list is shared by other code and you mutate it, you could introduce bugs where that list is used.)
  • Use sorted when you want to retain the original sort order or when you wish to create a new list that only your local code owns.

Can a list’s original positions be retrieved after list.sort()?

No – unless you made a copy yourself, that information is lost because the sort is done in-place.

“And which is faster? And how much faster?”

To illustrate the penalty of creating a new list, use the timeit module, here’s our setup:

import timeit
setup = """
import random
lists = [list(range(10000)) for _ in range(1000)]  # list of lists
for l in lists:
    random.shuffle(l) # shuffle each list
shuffled_iter = iter(lists) # wrap as iterator so next() yields one at a time
"""

And here’s our results for a list of randomly arranged 10000 integers, as we can see here, we’ve disproven an older list creation expense myth:

Python 2.7

>>> timeit.repeat("next(shuffled_iter).sort()", setup=setup, number = 1000)
[3.75168503401801, 3.7473005310166627, 3.753129180986434]
>>> timeit.repeat("sorted(next(shuffled_iter))", setup=setup, number = 1000)
[3.702025591977872, 3.709248117986135, 3.71071034099441]

Python 3

>>> timeit.repeat("next(shuffled_iter).sort()", setup=setup, number = 1000)
[2.797430992126465, 2.796825885772705, 2.7744789123535156]
>>> timeit.repeat("sorted(next(shuffled_iter))", setup=setup, number = 1000)
[2.675589084625244, 2.8019039630889893, 2.849375009536743]

After some feedback, I decided another test would be desirable with different characteristics. Here I provide the same randomly ordered list of 100,000 in length for each iteration 1,000 times.

import timeit
setup = """
import random
random.seed(0)
lst = list(range(100000))
random.shuffle(lst)
"""

I interpret this larger sort’s difference coming from the copying mentioned by Martijn, but it does not dominate to the point stated in the older more popular answer here, here the increase in time is only about 10%

>>> timeit.repeat("lst[:].sort()", setup=setup, number = 10000)
[572.919036605, 573.1384446719999, 568.5923951]
>>> timeit.repeat("sorted(lst[:])", setup=setup, number = 10000)
[647.0584738299999, 653.4040515829997, 657.9457361929999]

I also ran the above on a much smaller sort, and saw that the new sorted copy version still takes about 2% longer running time on a sort of 1000 length.

Poke ran his own code as well, here’s the code:

setup = '''
import random
random.seed(12122353453462456)
lst = list(range({length}))
random.shuffle(lst)
lists = [lst[:] for _ in range({repeats})]
it = iter(lists)
'''
t1 = 'l = next(it); l.sort()'
t2 = 'l = next(it); sorted(l)'
length = 10 ** 7
repeats = 10 ** 2
print(length, repeats)
for t in t1, t2:
    print(t)
    print(timeit(t, setup=setup.format(length=length, repeats=repeats), number=repeats))

He found for 1000000 length sort, (ran 100 times) a similar result, but only about a 5% increase in time, here’s the output:

10000000 100
l = next(it); l.sort()
610.5015971539542
l = next(it); sorted(l)
646.7786222379655

Conclusion:

A large sized list being sorted with sorted making a copy will likely dominate differences, but the sorting itself dominates the operation, and organizing your code around these differences would be premature optimization. I would use sorted when I need a new sorted list of the data, and I would use list.sort when I need to sort a list in-place, and let that determine my usage.


回答 2

主要区别是sorted(some_list)返回一个新的list

a = [3, 2, 1]
print sorted(a) # new list
print a         # is not modified

some_list.sort()将清单排序到位

a = [3, 2, 1]
print a.sort() # in place
print a         # it's modified

请注意,由于a.sort()不返回任何内容,因此print a.sort()将打印None


list.sort()之后是否可以检索列表的原始位置?

否,因为它会修改原始列表。

The main difference is that sorted(some_list) returns a new list:

a = [3, 2, 1]
print sorted(a) # new list
print a         # is not modified

and some_list.sort(), sorts the list in place:

a = [3, 2, 1]
print a.sort() # in place
print a         # it's modified

Note that since a.sort() doesn’t return anything, print a.sort() will print None.


Can a list original positions be retrieved after list.sort()?

No, because it modifies the original list.


回答 3

.sort()函数将新列表的值直接存储在list变量中;因此,第三个问题的答案为否。另外,如果使用sorted(list)进行此操作,则可以使用它,因为它没有存储在list变量中。也有时.sort()方法充当函数,或者说它在其中接受参数。

您必须将sorted(list)的值显式存储在变量中。

同样对于短数据处理,速度没有差别。但名单很长;您应该直接使用.sort()方法进行快速工作;但是您将再次面临不可逆转的行动。

The .sort() function stores the value of new list directly in the list variable; so answer for your third question would be NO. Also if you do this using sorted(list), then you can get it use because it is not stored in the list variable. Also sometimes .sort() method acts as function, or say that it takes arguments in it.

You have to store the value of sorted(list) in a variable explicitly.

Also for short data processing the speed will have no difference; but for long lists; you should directly use .sort() method for fast work; but again you will face irreversible actions.


回答 4

以下是一些简单的示例,以了解操作上的区别:

请在此处查看数字列表:

nums = [1, 9, -3, 4, 8, 5, 7, 14]

调用sorted此列表时,sorted复制该列表。(意味着您的原始列表将保持不变。)

让我们来看看。

sorted(nums)

退货

[-3, 1, 4, 5, 7, 8, 9, 14]

纵观nums再次

nums

我们看到原始列表(未更改且未排序)。sorted没有更改原始列表

[1, 2, -3, 4, 8, 5, 7, 14]

采用相同的nums列表并对其应用sort功能,将更改实际列表。

让我们来看看。

从我们的nums列表开始,以确保内容仍然相同。

nums

[-3, 1, 4, 5, 7, 8, 9, 14]

nums.sort()

现在原始的nums列表已更改,查看nums后,我们看到原始列表已更改并进行了排序。

nums
[-3, 1, 2, 4, 5, 7, 8, 14]

Here are a few simple examples to see the difference in action:

See the list of numbers here:

nums = [1, 9, -3, 4, 8, 5, 7, 14]

When calling sorted on this list, sorted will make a copy of the list. (Meaning your original list will remain unchanged.)

Let’s see.

sorted(nums)

returns

[-3, 1, 4, 5, 7, 8, 9, 14]

Looking at the nums again

nums

We see the original list (unaltered and NOT sorted.). sorted did not change the original list

[1, 2, -3, 4, 8, 5, 7, 14]

Taking the same nums list and applying the sort function on it, will change the actual list.

Let’s see.

Starting with our nums list to make sure, the content is still the same.

nums

[-3, 1, 4, 5, 7, 8, 9, 14]

nums.sort()

Now the original nums list is changed and looking at nums we see our original list has changed and is now sorted.

nums
[-3, 1, 2, 4, 5, 7, 8, 14]

回答 5

注意:sort()和sorted()之间最简单的区别是:sort()不返回任何值,而sorted()返回可迭代的列表。

sort()不返回任何值。

sort()方法仅按给定列表的元素以特定顺序排序-升序或降序,而不返回任何值。

sort()方法的语法为:

list.sort(key=..., reverse=...)

另外,您也可以出于相同目的使用Python的内置函数sorted()。排序函数返回排序列表

 list=sorted(list, key=..., reverse=...)

Note: Simplest difference between sort() and sorted() is: sort() doesn’t return any value while, sorted() returns an iterable list.

sort() doesn’t return any value.

The sort() method just sorts the elements of a given list in a specific order – Ascending or Descending without returning any value.

The syntax of sort() method is:

list.sort(key=..., reverse=...)

Alternatively, you can also use Python’s in-built function sorted() for the same purpose. sorted function return sorted list

 list=sorted(list, key=..., reverse=...)

在Python中查找列表的中位数

问题:在Python中查找列表的中位数

您如何在Python中找到列表的中位数?该列表可以是任何大小,并且不能保证数字以任何特定顺序排列。

如果列表包含偶数个元素,则该函数应返回中间两个元素的平均值。

以下是一些示例(排序用于显示目的):

median([1]) == 1
median([1, 1]) == 1
median([1, 1, 2, 4]) == 1.5
median([0, 2, 5, 6, 8, 9, 9]) == 6
median([0, 0, 0, 0, 4, 4, 6, 8]) == 2

How do you find the median of a list in Python? The list can be of any size and the numbers are not guaranteed to be in any particular order.

If the list contains an even number of elements, the function should return the average of the middle two.

Here are some examples (sorted for display purposes):

median([1]) == 1
median([1, 1]) == 1
median([1, 1, 2, 4]) == 1.5
median([0, 2, 5, 6, 8, 9, 9]) == 6
median([0, 0, 0, 0, 4, 4, 6, 8]) == 2

回答 0

Python 3.4具有statistics.median

返回数值数据的中位数(中间值)。

当数据点数为奇数时,返回中间数据点。当数据点的数量为偶数时,通过取两个中间值的平均值来对中位数进行插值:

>>> median([1, 3, 5])
3
>>> median([1, 3, 5, 7])
4.0

用法:

import statistics

items = [6, 1, 8, 2, 3]

statistics.median(items)
#>>> 3

类型也非常小心:

statistics.median(map(float, items))
#>>> 3.0

from decimal import Decimal
statistics.median(map(Decimal, items))
#>>> Decimal('3')

Python 3.4 has statistics.median:

Return the median (middle value) of numeric data.

When the number of data points is odd, return the middle data point. When the number of data points is even, the median is interpolated by taking the average of the two middle values:

>>> median([1, 3, 5])
3
>>> median([1, 3, 5, 7])
4.0

Usage:

import statistics

items = [6, 1, 8, 2, 3]

statistics.median(items)
#>>> 3

It’s pretty careful with types, too:

statistics.median(map(float, items))
#>>> 3.0

from decimal import Decimal
statistics.median(map(Decimal, items))
#>>> Decimal('3')

回答 1

(与 ):

def median(lst):
    n = len(lst)
    s = sorted(lst)
    return (sum(s[n//2-1:n//2+1])/2.0, s[n//2])[n % 2] if n else None

>>> median([-5, -5, -3, -4, 0, -1])
-3.5

numpy.median()

>>> from numpy import median
>>> median([1, -4, -1, -1, 1, -3])
-1.0

对于 ,使用statistics.median

>>> from statistics import median
>>> median([5, 2, 3, 8, 9, -2])
4.0

(Works with ):

def median(lst):
    n = len(lst)
    s = sorted(lst)
    return (sum(s[n//2-1:n//2+1])/2.0, s[n//2])[n % 2] if n else None

>>> median([-5, -5, -3, -4, 0, -1])
-3.5

numpy.median():

>>> from numpy import median
>>> median([1, -4, -1, -1, 1, -3])
-1.0

For , use statistics.median:

>>> from statistics import median
>>> median([5, 2, 3, 8, 9, -2])
4.0

回答 2

sorted()函数对此很有帮助。使用排序功能对列表进行排序,然后简单地返回中间值(如果列表包含偶数个元素,则对两个中间值求平均值)。

def median(lst):
    sortedLst = sorted(lst)
    lstLen = len(lst)
    index = (lstLen - 1) // 2

    if (lstLen % 2):
        return sortedLst[index]
    else:
        return (sortedLst[index] + sortedLst[index + 1])/2.0

The sorted() function is very helpful for this. Use the sorted function to order the list, then simply return the middle value (or average the two middle values if the list contains an even amount of elements).

def median(lst):
    sortedLst = sorted(lst)
    lstLen = len(lst)
    index = (lstLen - 1) // 2
   
    if (lstLen % 2):
        return sortedLst[index]
    else:
        return (sortedLst[index] + sortedLst[index + 1])/2.0

回答 3

这是一个更清洁的解决方案:

def median(lst):
    quotient, remainder = divmod(len(lst), 2)
    if remainder:
        return sorted(lst)[quotient]
    return sum(sorted(lst)[quotient - 1:quotient + 1]) / 2.

注意:答案已更改为将建议纳入注释中。

Here’s a cleaner solution:

def median(lst):
    quotient, remainder = divmod(len(lst), 2)
    if remainder:
        return sorted(lst)[quotient]
    return sum(sorted(lst)[quotient - 1:quotient + 1]) / 2.

Note: Answer changed to incorporate suggestion in comments.


回答 4

如果需要更快的平均情况运行时间,则可以尝试使用quickselect算法。O(n)尽管Quickselect 可能会O(n²)遇到糟糕的一天,但它具有平均(和最佳)的案例性能。

这是一个随机选择的实现的实现:

import random

def select_nth(n, items):
    pivot = random.choice(items)

    lesser = [item for item in items if item < pivot]
    if len(lesser) > n:
        return select_nth(n, lesser)
    n -= len(lesser)

    numequal = items.count(pivot)
    if numequal > n:
        return pivot
    n -= numequal

    greater = [item for item in items if item > pivot]
    return select_nth(n, greater)

您可以简单地将其转换为查找中位数的方法:

def median(items):
    if len(items) % 2:
        return select_nth(len(items)//2, items)

    else:
        left  = select_nth((len(items)-1) // 2, items)
        right = select_nth((len(items)+1) // 2, items)

        return (left + right) / 2

这是非常未经优化的,但即使是经过优化的版本也不太可能胜过Tim Sort(CPython的内置功能sort),因为这确实非常快。我以前尝试过,但输了。

You can try the quickselect algorithm if faster average-case running times are needed. Quickselect has average (and best) case performance O(n), although it can end up O(n²) on a bad day.

Here’s an implementation with a randomly chosen pivot:

import random

def select_nth(n, items):
    pivot = random.choice(items)

    lesser = [item for item in items if item < pivot]
    if len(lesser) > n:
        return select_nth(n, lesser)
    n -= len(lesser)

    numequal = items.count(pivot)
    if numequal > n:
        return pivot
    n -= numequal

    greater = [item for item in items if item > pivot]
    return select_nth(n, greater)

You can trivially turn this into a method to find medians:

def median(items):
    if len(items) % 2:
        return select_nth(len(items)//2, items)

    else:
        left  = select_nth((len(items)-1) // 2, items)
        right = select_nth((len(items)+1) // 2, items)

        return (left + right) / 2

This is very unoptimised, but it’s not likely that even an optimised version will outperform Tim Sort (CPython’s built-in sort) because that’s really fast. I’ve tried before and I lost.


回答 5

当然,您可以使用内置函数,但是如果您想创建自己的函数,则可以执行以下操作。这里的技巧是使用〜运算符将正数翻转为负数。例如〜2-> -3,并且在Python中对列表使用负数将从末尾开始计数。因此,如果您的mid == 2,则它将从开始处获取第三个元素,而从结尾处获取第三个元素。

def median(data):
    data.sort()
    mid = len(data) // 2
    return (data[mid] + data[~mid]) / 2

Of course you can use build in functions, but if you would like to create your own you can do something like this. The trick here is to use ~ operator that flip positive number to negative. For instance ~2 -> -3 and using negative in for list in Python will count items from the end. So if you have mid == 2 then it will take third element from beginning and third item from the end.

def median(data):
    data.sort()
    mid = len(data) // 2
    return (data[mid] + data[~mid]) / 2

回答 6

您可以使用list.sort来避免使用创建新列表sorted并在适当位置对列表进行排序。

另外,您不应使用它list作为变量名,因为它会遮盖python自己的list

def median(l):
    half = len(l) // 2
    l.sort()
    if not len(l) % 2:
        return (l[half - 1] + l[half]) / 2.0
    return l[half]

You can use the list.sort to avoid creating new lists with sorted and sort the lists in place.

Also you should not use list as a variable name as it shadows python’s own list.

def median(l):
    half = len(l) // 2
    l.sort()
    if not len(l) % 2:
        return (l[half - 1] + l[half]) / 2.0
    return l[half]

回答 7

def median(array):
    """Calculate median of the given list.
    """
    # TODO: use statistics.median in Python 3
    array = sorted(array)
    half, odd = divmod(len(array), 2)
    if odd:
        return array[half]
    return (array[half - 1] + array[half]) / 2.0
def median(array):
    """Calculate median of the given list.
    """
    # TODO: use statistics.median in Python 3
    array = sorted(array)
    half, odd = divmod(len(array), 2)
    if odd:
        return array[half]
    return (array[half - 1] + array[half]) / 2.0

回答 8

def median(x):
    x = sorted(x)
    listlength = len(x) 
    num = listlength//2
    if listlength%2==0:
        middlenum = (x[num]+x[num-1])/2
    else:
        middlenum = x[num]
    return middlenum
def median(x):
    x = sorted(x)
    listlength = len(x) 
    num = listlength//2
    if listlength%2==0:
        middlenum = (x[num]+x[num-1])/2
    else:
        middlenum = x[num]
    return middlenum

回答 9

我在“中位数中值”算法的Python实现中发布了我的解决方案,它比使用sort()快一点。我的解决方案每列使用15个数字,速度约为5N,这比每列使用5个数字的速度〜10N要快。最佳速度是〜4N,但是我可能会错。

根据Tom在评论中的要求,我在此处添加了代码,以供参考。我相信速度的关键部分是每列使用15个数字,而不是5个。

#!/bin/pypy
#
# TH @stackoverflow, 2016-01-20, linear time "median of medians" algorithm
#
import sys, random


items_per_column = 15


def find_i_th_smallest( A, i ):
    t = len(A)
    if(t <= items_per_column):
        # if A is a small list with less than items_per_column items, then:
        #
        # 1. do sort on A
        # 2. find i-th smallest item of A
        #
        return sorted(A)[i]
    else:
        # 1. partition A into columns of k items each. k is odd, say 5.
        # 2. find the median of every column
        # 3. put all medians in a new list, say, B
        #
        B = [ find_i_th_smallest(k, (len(k) - 1)/2) for k in [A[j:(j + items_per_column)] for j in range(0,len(A),items_per_column)]]

        # 4. find M, the median of B
        #
        M = find_i_th_smallest(B, (len(B) - 1)/2)


        # 5. split A into 3 parts by M, { < M }, { == M }, and { > M }
        # 6. find which above set has A's i-th smallest, recursively.
        #
        P1 = [ j for j in A if j < M ]
        if(i < len(P1)):
            return find_i_th_smallest( P1, i)
        P3 = [ j for j in A if j > M ]
        L3 = len(P3)
        if(i < (t - L3)):
            return M
        return find_i_th_smallest( P3, i - (t - L3))


# How many numbers should be randomly generated for testing?
#
number_of_numbers = int(sys.argv[1])


# create a list of random positive integers
#
L = [ random.randint(0, number_of_numbers) for i in range(0, number_of_numbers) ]


# Show the original list
#
# print L


# This is for validation
#
# print sorted(L)[int((len(L) - 1)/2)]


# This is the result of the "median of medians" function.
# Its result should be the same as the above.
#
print find_i_th_smallest( L, (len(L) - 1) / 2)

I posted my solution at Python implementation of “median of medians” algorithm , which is a little bit faster than using sort(). My solution uses 15 numbers per column, for a speed ~5N which is faster than the speed ~10N of using 5 numbers per column. The optimal speed is ~4N, but I could be wrong about it.

Per Tom’s request in his comment, I added my code here, for reference. I believe the critical part for speed is using 15 numbers per column, instead of 5.

#!/bin/pypy
#
# TH @stackoverflow, 2016-01-20, linear time "median of medians" algorithm
#
import sys, random


items_per_column = 15


def find_i_th_smallest( A, i ):
    t = len(A)
    if(t <= items_per_column):
        # if A is a small list with less than items_per_column items, then:
        #
        # 1. do sort on A
        # 2. find i-th smallest item of A
        #
        return sorted(A)[i]
    else:
        # 1. partition A into columns of k items each. k is odd, say 5.
        # 2. find the median of every column
        # 3. put all medians in a new list, say, B
        #
        B = [ find_i_th_smallest(k, (len(k) - 1)/2) for k in [A[j:(j + items_per_column)] for j in range(0,len(A),items_per_column)]]

        # 4. find M, the median of B
        #
        M = find_i_th_smallest(B, (len(B) - 1)/2)


        # 5. split A into 3 parts by M, { < M }, { == M }, and { > M }
        # 6. find which above set has A's i-th smallest, recursively.
        #
        P1 = [ j for j in A if j < M ]
        if(i < len(P1)):
            return find_i_th_smallest( P1, i)
        P3 = [ j for j in A if j > M ]
        L3 = len(P3)
        if(i < (t - L3)):
            return M
        return find_i_th_smallest( P3, i - (t - L3))


# How many numbers should be randomly generated for testing?
#
number_of_numbers = int(sys.argv[1])


# create a list of random positive integers
#
L = [ random.randint(0, number_of_numbers) for i in range(0, number_of_numbers) ]


# Show the original list
#
# print L


# This is for validation
#
# print sorted(L)[int((len(L) - 1)/2)]


# This is the result of the "median of medians" function.
# Its result should be the same as the above.
#
print find_i_th_smallest( L, (len(L) - 1) / 2)

回答 10

这是我在Codecademy练习中想出的内容:

def median(data):
    new_list = sorted(data)
    if len(new_list)%2 > 0:
        return new_list[len(new_list)/2]
    elif len(new_list)%2 == 0:
        return (new_list[(len(new_list)/2)] + new_list[(len(new_list)/2)-1]) /2.0

print median([1,2,3,4,5,9])

Here what I came up with during this exercise in Codecademy:

def median(data):
    new_list = sorted(data)
    if len(new_list)%2 > 0:
        return new_list[len(new_list)/2]
    elif len(new_list)%2 == 0:
        return (new_list[(len(new_list)/2)] + new_list[(len(new_list)/2)-1]) /2.0

print median([1,2,3,4,5,9])

回答 11

中位数函数

def median(midlist):
    midlist.sort()
    lens = len(midlist)
    if lens % 2 != 0: 
        midl = (lens / 2)
        res = midlist[midl]
    else:
        odd = (lens / 2) -1
        ev = (lens / 2) 
        res = float(midlist[odd] + midlist[ev]) / float(2)
    return res

median Function

def median(midlist):
    midlist.sort()
    lens = len(midlist)
    if lens % 2 != 0: 
        midl = (lens / 2)
        res = midlist[midl]
    else:
        odd = (lens / 2) -1
        ev = (lens / 2) 
        res = float(midlist[odd] + midlist[ev]) / float(2)
    return res

回答 12

我在浮点值列表方面遇到了一些问题。我最终使用了来自python3 statistics.median的代码片段,并且可以完美地处理没有导入的浮点值。资源

def calculateMedian(list):
    data = sorted(list)
    n = len(data)
    if n == 0:
        return None
    if n % 2 == 1:
        return data[n // 2]
    else:
        i = n // 2
        return (data[i - 1] + data[i]) / 2

I had some problems with lists of float values. I ended up using a code snippet from the python3 statistics.median and is working perfect with float values without imports. source

def calculateMedian(list):
    data = sorted(list)
    n = len(data)
    if n == 0:
        return None
    if n % 2 == 1:
        return data[n // 2]
    else:
        i = n // 2
        return (data[i - 1] + data[i]) / 2

回答 13

def midme(list1):

    list1.sort()
    if len(list1)%2>0:
            x = list1[int((len(list1)/2))]
    else:
            x = ((list1[int((len(list1)/2))-1])+(list1[int(((len(list1)/2)))]))/2
    return x


midme([4,5,1,7,2])
def midme(list1):

    list1.sort()
    if len(list1)%2>0:
            x = list1[int((len(list1)/2))]
    else:
            x = ((list1[int((len(list1)/2))-1])+(list1[int(((len(list1)/2)))]))/2
    return x


midme([4,5,1,7,2])

回答 14

我为数字列表定义了一个中位数函数为

def median(numbers):
    return (sorted(numbers)[int(round((len(numbers) - 1) / 2.0))] + sorted(numbers)[int(round((len(numbers) - 1) // 2.0))]) / 2.0

I defined a median function for a list of numbers as

def median(numbers):
    return (sorted(numbers)[int(round((len(numbers) - 1) / 2.0))] + sorted(numbers)[int(round((len(numbers) - 1) // 2.0))]) / 2.0

回答 15

def median(array):
    if len(array) < 1:
        return(None)
    if len(array) % 2 == 0:
        median = (array[len(array)//2-1: len(array)//2+1])
        return sum(median) / len(median)
    else:
        return(array[len(array)//2])
def median(array):
    if len(array) < 1:
        return(None)
    if len(array) % 2 == 0:
        median = (array[len(array)//2-1: len(array)//2+1])
        return sum(median) / len(median)
    else:
        return(array[len(array)//2])

回答 16

功能中位数:

def median(d):
    d=np.sort(d)
    n2=int(len(d)/2)
    r=n2%2
    if (r==0):
        med=d[n2] 
    else:
        med=(d[n2] + data[m+1]) / 2
    return med

fuction median:

def median(d):
    d=np.sort(d)
    n2=int(len(d)/2)
    r=n2%2
    if (r==0):
        med=d[n2] 
    else:
        med=(d[n2] + data[m+1]) / 2
    return med

回答 17

如果您需要有关列表分配的其他信息,则百分比方法可能会很有用。中值对应于列表的第50个百分位数:

import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9])
median_value = np.percentile(a, 50) # return 50th percentile
print median_value 

In case you need additional information on the distribution of your list, the percentile method will probably be useful. And a median value corresponds to the 50th percentile of a list:

import numpy as np
a = np.array([1,2,3,4,5,6,7,8,9])
median_value = np.percentile(a, 50) # return 50th percentile
print median_value 

回答 18

import numpy as np
def get_median(xs):
        mid = len(xs) // 2  # Take the mid of the list
        if len(xs) % 2 == 1: # check if the len of list is odd
            return sorted(xs)[mid] #if true then mid will be median after sorting
        else:
            #return 0.5 * sum(sorted(xs)[mid - 1:mid + 1])
            return 0.5 * np.sum(sorted(xs)[mid - 1:mid + 1]) #if false take the avg of mid
print(get_median([7, 7, 3, 1, 4, 5]))
print(get_median([1,2,3, 4,5]))
import numpy as np
def get_median(xs):
        mid = len(xs) // 2  # Take the mid of the list
        if len(xs) % 2 == 1: # check if the len of list is odd
            return sorted(xs)[mid] #if true then mid will be median after sorting
        else:
            #return 0.5 * sum(sorted(xs)[mid - 1:mid + 1])
            return 0.5 * np.sum(sorted(xs)[mid - 1:mid + 1]) #if false take the avg of mid
print(get_median([7, 7, 3, 1, 4, 5]))
print(get_median([1,2,3, 4,5]))

回答 19

中位数(和百分位数)的更通用方法是:

def get_percentile(data, percentile):
    # Get the number of observations
    cnt=len(data)
    # Sort the list
    data=sorted(data)
    # Determine the split point
    i=(cnt-1)*percentile
    # Find the `floor` of the split point
    diff=i-int(i)
    # Return the weighted average of the value above and below the split point
    return data[int(i)]*(1-diff)+data[int(i)+1]*(diff)

# Data
data=[1,2,3,4,5]
# For the median
print(get_percentile(data=data, percentile=.50))
# > 3
print(get_percentile(data=data, percentile=.75))
# > 4

# Note the weighted average difference when an int is not returned by the percentile
print(get_percentile(data=data, percentile=.51))
# > 3.04

A more generalized approach for median (and percentiles) would be:

def get_percentile(data, percentile):
    # Get the number of observations
    cnt=len(data)
    # Sort the list
    data=sorted(data)
    # Determine the split point
    i=(cnt-1)*percentile
    # Find the `floor` of the split point
    diff=i-int(i)
    # Return the weighted average of the value above and below the split point
    return data[int(i)]*(1-diff)+data[int(i)+1]*(diff)

# Data
data=[1,2,3,4,5]
# For the median
print(get_percentile(data=data, percentile=.50))
# > 3
print(get_percentile(data=data, percentile=.75))
# > 4

# Note the weighted average difference when an int is not returned by the percentile
print(get_percentile(data=data, percentile=.51))
# > 3.04


回答 20

一个简单的函数返回给定列表的中位数:

def median(lsts):
        if len(lsts)%2 == 0:  #Checking if the length is even
            return (lsts[len(lsts)//2] + lsts[(len(lsts) - 1) //2]) //2 # Applying formula which is sum of middle two divided by 2
            
        else:
            return lsts[len(lsts)//2] # If length is odd then get middle value
            
        
median([2,3,5,6,10]) #Calling function

如果您想使用库,则只需做一下即可;

import statistics

statistics.median([9, 12, 20, 21, 34, 80])

A simple function to return the median of the given list:

def median(lst):
    lst.sort()  # Sort the list first
    if len(lst) % 2 == 0:  # Checking if the length is even
        # Applying formula which is sum of middle two divided by 2
        return (lst[len(lst) // 2] + lst[(len(lst) - 1) // 2]) / 2
    else:
        # If length is odd then get middle value
        return lst[len(lst) // 2]

Some examples with the medain function:

>>> median([9, 12, 20, 21, 34, 80])  # Even
20.5
>>> median([9, 12, 80, 21, 34])  # Odd
21

If you want to use library you can just simply do:

>>> import statistics
>>> statistics.median([9, 12, 20, 21, 34, 80])  # Even
20.5
>>> statistics.median([9, 12, 80, 21, 34])  # Odd
21

回答 21

这是不使用median函数来查找中位数的乏味方法:

def median(*arg):
    order(arg)
    numArg = len(arg)
    half = int(numArg/2)
    if numArg/2 ==half:
        print((arg[half-1]+arg[half])/2)
    else:
        print(int(arg[half]))

def order(tup):
    ordered = [tup[i] for i in range(len(tup))]
    test(ordered)
    while(test(ordered)):
        test(ordered)
    print(ordered)


def test(ordered):
    whileloop = 0 
    for i in range(len(ordered)-1):
        print(i)
        if (ordered[i]>ordered[i+1]):
            print(str(ordered[i]) + ' is greater than ' + str(ordered[i+1]))
            original = ordered[i+1]
            ordered[i+1]=ordered[i]
            ordered[i]=original
            whileloop = 1 #run the loop again if you had to switch values
    return whileloop

Here’s the tedious way to find median without using the median function:

def median(*arg):
    order(arg)
    numArg = len(arg)
    half = int(numArg/2)
    if numArg/2 ==half:
        print((arg[half-1]+arg[half])/2)
    else:
        print(int(arg[half]))

def order(tup):
    ordered = [tup[i] for i in range(len(tup))]
    test(ordered)
    while(test(ordered)):
        test(ordered)
    print(ordered)


def test(ordered):
    whileloop = 0 
    for i in range(len(ordered)-1):
        print(i)
        if (ordered[i]>ordered[i+1]):
            print(str(ordered[i]) + ' is greater than ' + str(ordered[i+1]))
            original = ordered[i+1]
            ordered[i+1]=ordered[i]
            ordered[i]=original
            whileloop = 1 #run the loop again if you had to switch values
    return whileloop

回答 22

这很简单;

def median(alist):
    #to find median you will have to sort the list first
    sList = sorted(alist)
    first = 0
    last = len(sList)-1
    midpoint = (first + last)//2
    return midpoint

你可以这样使用返回值 median = median(anyList)

It is very simple;

def median(alist):
    #to find median you will have to sort the list first
    sList = sorted(alist)
    first = 0
    last = len(sList)-1
    midpoint = (first + last)//2
    return midpoint

And you can use the return value like this median = median(anyList)


如何按两列或更多列对python pandas中的dataFrame进行排序?

问题:如何按两列或更多列对python pandas中的dataFrame进行排序?

假设我有一个数据列a,其中包含,bc,我想b按升序按列对数据帧进行排序,然后按c降序按列对数据帧进行排序,我该怎么做?

Suppose I have a dataframe with columns a, b and c, I want to sort the dataframe by column b in ascending order, and by column c in descending order, how do I do this?


回答 0

从0.17.0版本开始,sort不推荐使用该方法,而推荐使用sort_valuessort在0.20.0版本中被完全删除。参数(和结果)保持不变:

df.sort_values(['a', 'b'], ascending=[True, False])

您可以使用的升序参数sort

df.sort(['a', 'b'], ascending=[True, False])

例如:

In [11]: df1 = pd.DataFrame(np.random.randint(1, 5, (10,2)), columns=['a','b'])

In [12]: df1.sort(['a', 'b'], ascending=[True, False])
Out[12]:
   a  b
2  1  4
7  1  3
1  1  2
3  1  2
4  3  2
6  4  4
0  4  3
9  4  3
5  4  1
8  4  1

正如@renadeen所评论

默认情况下,排序不正确!因此,您应该将sort方法的结果分配给变量,或者将inplace = True添加到方法调用中。

也就是说,如果您想将df1用作已排序的DataFrame:

df1 = df1.sort(['a', 'b'], ascending=[True, False])

要么

df1.sort(['a', 'b'], ascending=[True, False], inplace=True)

As of the 0.17.0 release, the sort method was deprecated in favor of sort_values. sort was completely removed in the 0.20.0 release. The arguments (and results) remain the same:

df.sort_values(['a', 'b'], ascending=[True, False])

You can use the ascending argument of sort:

df.sort(['a', 'b'], ascending=[True, False])

For example:

In [11]: df1 = pd.DataFrame(np.random.randint(1, 5, (10,2)), columns=['a','b'])

In [12]: df1.sort(['a', 'b'], ascending=[True, False])
Out[12]:
   a  b
2  1  4
7  1  3
1  1  2
3  1  2
4  3  2
6  4  4
0  4  3
9  4  3
5  4  1
8  4  1

As commented by @renadeen

Sort isn’t in place by default! So you should assign result of the sort method to a variable or add inplace=True to method call.

that is, if you want to reuse df1 as a sorted DataFrame:

df1 = df1.sort(['a', 'b'], ascending=[True, False])

or

df1.sort(['a', 'b'], ascending=[True, False], inplace=True)

回答 1

从熊猫0.17.0开始,DataFrame.sort()已弃用该熊猫,并将其设置为在以后的熊猫版本中将其删除。现在,按值对数据框进行排序的方法是DataFrame.sort_values

因此,您问题的答案现在是

df.sort_values(['b', 'c'], ascending=[True, False], inplace=True)

As of pandas 0.17.0, DataFrame.sort() is deprecated, and set to be removed in a future version of pandas. The way to sort a dataframe by its values is now is DataFrame.sort_values

As such, the answer to your question would now be

df.sort_values(['b', 'c'], ascending=[True, False], inplace=True)

回答 2

对于数字数据的大型数据框,您可能会通过看到显着的性能改进numpy.lexsort,该方法使用一系列键执行间接排序:

import pandas as pd
import numpy as np

np.random.seed(0)

df1 = pd.DataFrame(np.random.randint(1, 5, (10,2)), columns=['a','b'])
df1 = pd.concat([df1]*100000)

def pdsort(df1):
    return df1.sort_values(['a', 'b'], ascending=[True, False])

def lex(df1):
    arr = df1.values
    return pd.DataFrame(arr[np.lexsort((-arr[:, 1], arr[:, 0]))])

assert (pdsort(df1).values == lex(df1).values).all()

%timeit pdsort(df1)  # 193 ms per loop
%timeit lex(df1)     # 143 ms per loop

一个特殊之处是定义的排序顺序numpy.lexsort颠倒了:首先(-'b', 'a')按系列排序a。我们否定级数b以反映我们希望该级数降序排列。

请注意,np.lexsort仅使用数字值排序,而同时pd.DataFrame.sort_values使用字符串或数字值。np.lexsort与字符串一起使用将给出:TypeError: bad operand type for unary -: 'str'

For large dataframes of numeric data, you may see a significant performance improvement via numpy.lexsort, which performs an indirect sort using a sequence of keys:

import pandas as pd
import numpy as np

np.random.seed(0)

df1 = pd.DataFrame(np.random.randint(1, 5, (10,2)), columns=['a','b'])
df1 = pd.concat([df1]*100000)

def pdsort(df1):
    return df1.sort_values(['a', 'b'], ascending=[True, False])

def lex(df1):
    arr = df1.values
    return pd.DataFrame(arr[np.lexsort((-arr[:, 1], arr[:, 0]))])

assert (pdsort(df1).values == lex(df1).values).all()

%timeit pdsort(df1)  # 193 ms per loop
%timeit lex(df1)     # 143 ms per loop

One peculiarity is that the defined sorting order with numpy.lexsort is reversed: (-'b', 'a') sorts by series a first. We negate series b to reflect we want this series in descending order.

Be aware that np.lexsort only sorts with numeric values, while pd.DataFrame.sort_values works with either string or numeric values. Using np.lexsort with strings will give: TypeError: bad operand type for unary -: 'str'.


如何从一列中排序熊猫数据框

问题:如何从一列中排序熊猫数据框

我有一个像这样的数据框:

print(df)

        0          1     2
0   354.7      April   4.0
1    55.4     August   8.0
2   176.5   December  12.0
3    95.5   February   2.0
4    85.6    January   1.0
5     152       July   7.0
6   238.7       June   6.0
7   104.8      March   3.0
8   283.5        May   5.0
9   278.8   November  11.0
10  249.6    October  10.0
11  212.7  September   9.0

如您所见,月份不是按日历顺序排列的。因此,我创建了第二列以获取与每月(1-12)相对应的月份号。从那里,如何根据日历月的顺序对数据框进行排序?

I have a data frame like this:

print(df)

        0          1     2
0   354.7      April   4.0
1    55.4     August   8.0
2   176.5   December  12.0
3    95.5   February   2.0
4    85.6    January   1.0
5     152       July   7.0
6   238.7       June   6.0
7   104.8      March   3.0
8   283.5        May   5.0
9   278.8   November  11.0
10  249.6    October  10.0
11  212.7  September   9.0

As you can see, months are not in calendar order. So I created a second column to get the month number corresponding to each month (1-12). From there, how can I sort this data frame according to calendar months’ order?


回答 0

用于sort_values按特定列的值对df进行排序:

In [18]:
df.sort_values('2')

Out[18]:
        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5   152.0       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0

如果要按两列进行排序sort_values,请将列标签列表传递给,并根据排序优先级对列标签进行排序。如果使用df.sort_values(['2', '0']),结果将按列2然后按列排序0。当然,对于这个示例,这实际上没有任何意义,因为其中的每个值df['2']都是唯一的。

Use sort_values to sort the df by a specific column’s values:

In [18]:
df.sort_values('2')

Out[18]:
        0          1     2
4    85.6    January   1.0
3    95.5   February   2.0
7   104.8      March   3.0
0   354.7      April   4.0
8   283.5        May   5.0
6   238.7       June   6.0
5   152.0       July   7.0
1    55.4     August   8.0
11  212.7  September   9.0
10  249.6    October  10.0
9   278.8   November  11.0
2   176.5   December  12.0

If you want to sort by two columns, pass a list of column labels to sort_values with the column labels ordered according to sort priority. If you use df.sort_values(['2', '0']), the result would be sorted by column 2 then column 0. Granted, this does not really make sense for this example because each value in df['2'] is unique.


回答 1

我尝试了上述解决方案,但没有取得结果,因此我找到了一个对我有用的解决方案。该升=假是订购数据框在递减顺序,默认为真。我正在使用python 3.6.6和pandas 0.23.4版本。

final_df = df.sort_values(by=['2'], ascending=False)

您可以在此处查看pandas文档中的更多详细信息。

I tried the solutions above and I do not achieve results, so I found a different solution that works for me. The ascending=False is to order the dataframe in descending order, by default it is True. I am using python 3.6.6 and pandas 0.23.4 versions.

final_df = df.sort_values(by=['2'], ascending=False)

You can see more details in pandas documentation here.


回答 2

只是添加一些对数据的操作。假设我们有一个数据框df,我们可以执行几个操作以获得所需的输出

ID         cost      tax    label
1       216590      1600    test      
2       523213      1800    test 
3          250      1500    experiment

(df['label'].value_counts().to_frame().reset_index()).sort_values('label', ascending=False)

sorted标签输出作为dataframe

    index   label
0   test        2
1   experiment  1

Just adding some more operations on data. Suppose we have a dataframe df, we can do several operations to get desired outputs

ID         cost      tax    label
1       216590      1600    test      
2       523213      1800    test 
3          250      1500    experiment

(df['label'].value_counts().to_frame().reset_index()).sort_values('label', ascending=False)

will give sorted output of labels as a dataframe

    index   label
0   test        2
1   experiment  1

回答 3

就像另一个解决方案:

您可以对字符串数据(月份名称)进行分类并按如下方式进行排序:

df.rename(columns={1:'month'},inplace=True)
df['month'] = pd.Categorical(df['month'],categories=['December','November','October','September','August','July','June','May','April','March','February','January'],ordered=True)
df = df.sort_values('month',ascending=False)

它将month name按照您在创建Categorical对象时指定的顺序为您提供排序的数据。

Just as another solution:

Instead of creating the second column, you can categorize your string data(month name) and sort by that like this:

df.rename(columns={1:'month'},inplace=True)
df['month'] = pd.Categorical(df['month'],categories=['December','November','October','September','August','July','June','May','April','March','February','January'],ordered=True)
df = df.sort_values('month',ascending=False)

It will give you the ordered data by month name as you specified while creating the Categorical object.


如何按内部列表的特定索引对列表列表进行排序?

问题:如何按内部列表的特定索引对列表列表进行排序?

我有一个清单清单。例如,

[
[0,1,'f'],
[4,2,'t'],
[9,4,'afsd']
]

如果我想通过内部列表的字符串字段对外部列表进行排序,那么您将如何在python中执行此操作?

I have a list of lists. For example,

[
[0,1,'f'],
[4,2,'t'],
[9,4,'afsd']
]

If I wanted to sort the outer list by the string field of the inner lists, how would you do that in python?


回答 0

这是itemgetter的工作

>>> from operator import itemgetter
>>> L=[[0, 1, 'f'], [4, 2, 't'], [9, 4, 'afsd']]
>>> sorted(L, key=itemgetter(2))
[[9, 4, 'afsd'], [0, 1, 'f'], [4, 2, 't']]

也可以在此处使用lambda函数,但是在这种简单情况下,lambda函数的运行速度较慢

This is a job for itemgetter

>>> from operator import itemgetter
>>> L=[[0, 1, 'f'], [4, 2, 't'], [9, 4, 'afsd']]
>>> sorted(L, key=itemgetter(2))
[[9, 4, 'afsd'], [0, 1, 'f'], [4, 2, 't']]

It is also possible to use a lambda function here, however the lambda function is slower in this simple case


回答 1

到位

>>> l = [[0, 1, 'f'], [4, 2, 't'], [9, 4, 'afsd']]
>>> l.sort(key=lambda x: x[2])

使用排序不到位:

>>> sorted(l, key=lambda x: x[2])

in place

>>> l = [[0, 1, 'f'], [4, 2, 't'], [9, 4, 'afsd']]
>>> l.sort(key=lambda x: x[2])

not in place using sorted:

>>> sorted(l, key=lambda x: x[2])

回答 2

Itemgetter使您可以按多个条件/列进行排序:

sorted_list = sorted(list_to_sort, key=itemgetter(2,0,1))

Itemgetter lets you to sort by multiple criteria / columns:

sorted_list = sorted(list_to_sort, key=itemgetter(2,0,1))

回答 3

也可以通过lambda函数实现多个条件

sorted_list = sorted(list_to_sort, key=lambda x: (x[1], x[0]))

multiple criteria can also be implemented through lambda function

sorted_list = sorted(list_to_sort, key=lambda x: (x[1], x[0]))

回答 4

array.sort(key = lambda x:x[1])

您可以使用此代码段轻松进行排序,其中1是元素的索引。

array.sort(key = lambda x:x[1])

You can easily sort using this snippet, where 1 is the index of the element.


回答 5

像这样:

import operator
l = [...]
sorted_list = sorted(l, key=operator.itemgetter(desired_item_index))

Like this:

import operator
l = [...]
sorted_list = sorted(l, key=operator.itemgetter(desired_item_index))

回答 6

我认为lambda函数可以解决您的问题。

old_list = [[0,1,'f'], [4,2,'t'],[9,4,'afsd']]

#let's assume we want to sort lists by last value ( old_list[2] )
new_list = sorted(old_list, key=lambda x: x[2])

#Resulst of new_list will be:

[[9, 4, 'afsd'], [0, 1, 'f'], [4, 2, 't']]

I think lambda function can solve your problem.

old_list = [[0,1,'f'], [4,2,'t'],[9,4,'afsd']]

#let's assume we want to sort lists by last value ( old_list[2] )
new_list = sorted(old_list, key=lambda x: x[2])

#Resulst of new_list will be:

[[9, 4, 'afsd'], [0, 1, 'f'], [4, 2, 't']]

回答 7

**old_list = [[0,1,'f'], [4,2,'t'],[9,4,'afsd']]
    #let's assume we want to sort lists by last value ( old_list[2] )
    new_list = sorted(old_list, key=lambda x: x[2])**

如果我错了,请纠正我,但不是’x [2]’调用列表中的第三项,而不是嵌套列表中的第三项吗?应该是x [2] [2]吗?

**old_list = [[0,1,'f'], [4,2,'t'],[9,4,'afsd']]
    #let's assume we want to sort lists by last value ( old_list[2] )
    new_list = sorted(old_list, key=lambda x: x[2])**

correct me if i’m wrong but isnt the ‘x[2]’ calling the 3rd item in the list, not the 3rd item in the nested list? should it be x[2][2]?


回答 8

更容易理解(Lambda实际在做什么):

ls2=[[0,1,'f'],[4,2,'t'],[9,4,'afsd']]
def thirdItem(ls):
    #return the third item of the list
    return ls[2]
#Sort according to what the thirdItem function return 
ls2.sort(key=thirdItem)

More easy to understand (What is Lambda actually doing):

ls2=[[0,1,'f'],[4,2,'t'],[9,4,'afsd']]
def thirdItem(ls):
    #return the third item of the list
    return ls[2]
#Sort according to what the thirdItem function return 
ls2.sort(key=thirdItem)

回答 9

排序多维数组在这里执行

arr=[[2,1],[1,2],[3,5],[4,5],[3,1],[5,2],[3,8],[1,9],[1,3]]



arr.sort(key=lambda x:x[0])
la=set([i[0] for i in Points])

for i in la:
    tempres=list()
    for j in arr:
        if j[0]==i:
            tempres.append(j[1])

    for j in sorted(tempres,reverse=True):
        print(i,j)

Sorting a Multidimensional Array execute here

arr=[[2,1],[1,2],[3,5],[4,5],[3,1],[5,2],[3,8],[1,9],[1,3]]



arr.sort(key=lambda x:x[0])
la=set([i[0] for i in Points])

for i in la:
    tempres=list()
    for j in arr:
        if j[0]==i:
            tempres.append(j[1])

    for j in sorted(tempres,reverse=True):
        print(i,j)

Django order_by查询集,升序和降序

问题:Django order_by查询集,升序和降序

如何按日期降序对django中的查询集进行排序?

Reserved.objects.all().filter(client=client_id).order_by('check_in')

我只想从所有的check_in日期按降序过滤。

How can I order by descending my query set in django by date?

Reserved.objects.all().filter(client=client_id).order_by('check_in')

I just want to filter from descending all the Reserved by check_in date.


回答 0

Reserved.objects.filter(client=client_id).order_by('-check_in')

注意-之前check_in

Django文档

Reserved.objects.filter(client=client_id).order_by('-check_in')

Notice the - before check_in.

Django Documentation


回答 1

Reserved.objects.filter(client=client_id).order_by('-check_in')

“ check_in”前面的连字符“-”表示降序。隐含升序。

我们不必在filter()之前添加all()。那仍然可以工作,但是只需要从根QuerySet中获取所有对象时就需要添加all()。

此处的更多信息:https : //docs.djangoproject.com/en/dev/topics/db/queries/#retrieving-specific-objects-with-filters

Reserved.objects.filter(client=client_id).order_by('-check_in')

A hyphen “-” in front of “check_in” indicates descending order. Ascending order is implied.

We don’t have to add an all() before filter(). That would still work, but you only need to add all() when you want all objects from the root QuerySet.

More on this here: https://docs.djangoproject.com/en/dev/topics/db/queries/#retrieving-specific-objects-with-filters


回答 2

您还可以使用以下说明:

Reserved.objects.filter(client=client_id).order_by('check_in').reverse()

You can also use the following instruction:

Reserved.objects.filter(client=client_id).order_by('check_in').reverse()

回答 3

对于升序:

Reserved.objects.filter(client=client_id).order_by('check_in')

对于降序:

1.  Reserved.objects.filter(client=client_id).order_by('-check_in')

要么

2.  Reserved.objects.filter(client=client_id).order_by('check_in')[::-1]

for ascending order:

Reserved.objects.filter(client=client_id).order_by('check_in')

for descending order:

1.  Reserved.objects.filter(client=client_id).order_by('-check_in')

or

2.  Reserved.objects.filter(client=client_id).order_by('check_in')[::-1]

回答 4

它可以删除.all()

Reserved.objects.filter(client=client_id).order_by('-check_in')

It works removing .all():

Reserved.objects.filter(client=client_id).order_by('-check_in')

回答 5

添加-将以降序排列。您还可以通过向模型的元数据添加默认顺序来进行设置。这意味着当您执行查询时,只需执行MyModel.objects.all(),它将以正确的顺序显示。

class MyModel(models.Model):

    check_in = models.DateField()

    class Meta:
        ordering = ('-check_in',)

Adding the – will order it in descending order. You can also set this by adding a default ordering to the meta of your model. This will mean that when you do a query you just do MyModel.objects.all() and it will come out in the correct order.

class MyModel(models.Model):

    check_in = models.DateField()

    class Meta:
        ordering = ('-check_in',)

回答 6

  1. 升序

    Reserved.objects.all().filter(client=client_id).order_by('check_in')
  2. 降序

    Reserved.objects.all().filter(client=client_id).order_by('-check_in')

- (连字符)用于指示降序。

  1. Ascending order

    Reserved.objects.all().filter(client=client_id).order_by('check_in')
    
  2. Descending order

    Reserved.objects.all().filter(client=client_id).order_by('-check_in')
    

- (hyphen) is used to indicate descending order here.


回答 7

这对我有用。

latestsetuplist = SetupTemplate.objects.order_by('-creationTime')[:10][::1]

This is working for me.

latestsetuplist = SetupTemplate.objects.order_by('-creationTime')[:10][::1]

回答 8

67

Reserved.objects.filter(client = client_id).order_by(’-check_in’)

“-”表示降序,对于升序,仅提供类属性

67

Reserved.objects.filter(client=client_id).order_by(‘-check_in’)

‘-‘ is indicates Descending order and for Ascending order just give class attribute


是否有用于字符串自然排序的内置函数?

问题:是否有用于字符串自然排序的内置函数?

使用Python 3.x,我有一个要对其执行自然字母排序的字符串列表。

自然排序: Windows中文件的排序顺序。

例如,以下列表是自然排序的(我想要的):

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

这是上面列表的“排序”版本(我所拥有的):

['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

我正在寻找一种类似于第一个的排序功能。

I have a list of strings for which I would like to perform a natural alphabetical sort.

For instance, the following list is naturally sorted (what I want):

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

And here’s the “sorted” version of the above list (what I get using sorted()):

['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

I’m looking for a sort function which behaves like the first one.


回答 0

在PyPI上有一个名为natsort的第三方库(完整披露,我是软件包的作者)。对于您的情况,可以执行以下任一操作:

>>> from natsort import natsorted, ns
>>> x = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
>>> natsorted(x, key=lambda y: y.lower())
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsorted(x, alg=ns.IGNORECASE)  # or alg=ns.IC
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

您应该注意,它natsort使用通用算法,因此它几乎可以处理您向其抛出的任何输入。如果您想了解为什么选择一个库而不是滚动自己的函数的更多详细信息,请查阅natsort文档的“ 如何工作”页面,尤其是“ 到处都是特殊情况”!部分。


如果您需要排序键而不是排序功能,请使用以下公式之一。

>>> from natsort import natsort_keygen, ns
>>> l1 = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> l2 = l1[:]
>>> natsort_key1 = natsort_keygen(key=lambda y: y.lower())
>>> l1.sort(key=natsort_key1)
>>> l1
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsort_key2 = natsort_keygen(alg=ns.IGNORECASE)
>>> l2.sort(key=natsort_key2)
>>> l2
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

There is a third party library for this on PyPI called natsort (full disclosure, I am the package’s author). For your case, you can do either of the following:

>>> from natsort import natsorted, ns
>>> x = ['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']
>>> natsorted(x, key=lambda y: y.lower())
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsorted(x, alg=ns.IGNORECASE)  # or alg=ns.IC
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

You should note that natsort uses a general algorithm so it should work for just about any input that you throw at it. If you want more details on why you might choose a library to do this rather than rolling your own function, check out the natsort documentation’s How It Works page, in particular the Special Cases Everywhere! section.


If you need a sorting key instead of a sorting function, use either of the below formulas.

>>> from natsort import natsort_keygen, ns
>>> l1 = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> l2 = l1[:]
>>> natsort_key1 = natsort_keygen(key=lambda y: y.lower())
>>> l1.sort(key=natsort_key1)
>>> l1
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> natsort_key2 = natsort_keygen(alg=ns.IGNORECASE)
>>> l2.sort(key=natsort_key2)
>>> l2
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

回答 1

试试这个:

import re

def natural_sort(l): 
    convert = lambda text: int(text) if text.isdigit() else text.lower() 
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 
    return sorted(l, key = alphanum_key)

输出:

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

从此处改编的代码:为人类分类:自然排序顺序

Try this:

import re

def natural_sort(l): 
    convert = lambda text: int(text) if text.isdigit() else text.lower() 
    alphanum_key = lambda key: [ convert(c) for c in re.split('([0-9]+)', key) ] 
    return sorted(l, key = alphanum_key)

Output:

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

Code adapted from here: Sorting for Humans : Natural Sort Order.


回答 2

这是Mark Byer答案的更多Python版本:

import re

def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
    return [int(text) if text.isdigit() else text.lower()
            for text in _nsre.split(s)]    

现在,这个功能可以作为在使用它的任何功能,就象是一把钥匙list.sortsortedmax,等。

作为lambda:

lambda s: [int(t) if t.isdigit() else t.lower() for t in re.split('(\d+)', s)]

Here’s a much more pythonic version of Mark Byer’s answer:

import re

def natural_sort_key(s, _nsre=re.compile('([0-9]+)')):
    return [int(text) if text.isdigit() else text.lower()
            for text in _nsre.split(s)]    

Now this function can be used as a key in any function that uses it, like list.sort, sorted, max, etc.

As a lambda:

lambda s: [int(t) if t.isdigit() else t.lower() for t in re.split('(\d+)', s)]

回答 3

我基于http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html编写了一个函数,该函数增加了仍可以传递您自己的“键”参数的功能。我需要这样做,以便执行包含更复杂对象(不仅仅是字符串)的自然列表。

import re

def natural_sort(list, key=lambda s:s):
    """
    Sort the list into natural alphanumeric order.
    """
    def get_alphanum_key_func(key):
        convert = lambda text: int(text) if text.isdigit() else text 
        return lambda s: [convert(c) for c in re.split('([0-9]+)', key(s))]
    sort_key = get_alphanum_key_func(key)
    list.sort(key=sort_key)

例如:

my_list = [{'name':'b'}, {'name':'10'}, {'name':'a'}, {'name':'1'}, {'name':'9'}]
natural_sort(my_list, key=lambda x: x['name'])
print my_list
[{'name': '1'}, {'name': '9'}, {'name': '10'}, {'name': 'a'}, {'name': 'b'}]

I wrote a function based on http://www.codinghorror.com/blog/2007/12/sorting-for-humans-natural-sort-order.html which adds the ability to still pass in your own ‘key’ parameter. I need this in order to perform a natural sort of lists that contain more complex objects (not just strings).

import re

def natural_sort(list, key=lambda s:s):
    """
    Sort the list into natural alphanumeric order.
    """
    def get_alphanum_key_func(key):
        convert = lambda text: int(text) if text.isdigit() else text 
        return lambda s: [convert(c) for c in re.split('([0-9]+)', key(s))]
    sort_key = get_alphanum_key_func(key)
    list.sort(key=sort_key)

For example:

my_list = [{'name':'b'}, {'name':'10'}, {'name':'a'}, {'name':'1'}, {'name':'9'}]
natural_sort(my_list, key=lambda x: x['name'])
print my_list
[{'name': '1'}, {'name': '9'}, {'name': '10'}, {'name': 'a'}, {'name': 'b'}]

回答 4

data = ['elm13', 'elm9', 'elm0', 'elm1', 'Elm11', 'Elm2', 'elm10']

让我们分析数据。所有元素的数字容量为2。公共文字部分共有3个字母'elm'

因此,元素的最大长度为5。我们可以增加此值以确保(例如,增加到8)。

牢记这一点,我们有一个单线解决方案:

data.sort(key=lambda x: '{0:0>8}'.format(x).lower())

没有正则表达式和外部库!

print(data)

>>> ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'elm13']

说明:

for elm in data:
    print('{0:0>8}'.format(elm).lower())

>>>
0000elm0
0000elm1
0000elm2
0000elm9
000elm10
000elm11
000elm13
data = ['elm13', 'elm9', 'elm0', 'elm1', 'Elm11', 'Elm2', 'elm10']

Let’s analyse the data. The digit capacity of all elements is 2. And there are 3 letters in common literal part 'elm'.

So, the maximal length of element is 5. We can increase this value to make sure (for example, to 8).

Bearing that in mind, we’ve got a one-line solution:

data.sort(key=lambda x: '{0:0>8}'.format(x).lower())

without regular expressions and external libraries!

print(data)

>>> ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'elm13']

Explanation:

for elm in data:
    print('{0:0>8}'.format(elm).lower())

>>>
0000elm0
0000elm1
0000elm2
0000elm9
000elm10
000elm11
000elm13

回答 5

鉴于:

data=['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

类似于SergO的解决方案,没有外部库1-liner将是

data.sort(key=lambda x : int(x[3:]))

要么

sorted_data=sorted(data, key=lambda x : int(x[3:]))

说明:

此解决方案使用排序关键功能来定义将用于排序的功能。因为我们知道每个数据条目都以’elm’开头,所以排序函数会将字符串中第三个字符之后的部分(即int(x [3:]))转换为整数。如果数据的数字部分位于其他位置,则函数的这一部分将必须更改。

干杯

Given:

data=['Elm11', 'Elm12', 'Elm2', 'elm0', 'elm1', 'elm10', 'elm13', 'elm9']

Similar to SergO’s solution, a 1-liner without external libraries would be:

data.sort(key=lambda x : int(x[3:]))

or

sorted_data=sorted(data, key=lambda x : int(x[3:]))

Explanation:

This solution uses the key feature of sort to define a function that will be employed for the sorting. Because we know that every data entry is preceded by ‘elm’ the sorting function converts to integer the portion of the string after the 3rd character (i.e. int(x[3:])). If the numerical part of the data is in a different location, then this part of the function would have to change.

Cheers


回答 6

现在, 有了更多*优雅(pythonic )的功能,只需触摸一下

有很多实现,尽管有些实现已经接近,但没有一个实现能够完全抓住现代python提供的优雅。

  • 使用python(3.5.1)测试
  • 包括一个附加列表,以证明当数字为中间字符串时它可以工作
  • 但是,没有测试,我假设如果您的列表很大,那么事先编译正则表达式会更有效
    • 如果这是错误的假设,我敢肯定有人会纠正我

速成
from re import compile, split    
dre = compile(r'(\d+)')
mylist.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])
全码
#!/usr/bin/python3
# coding=utf-8
"""
Natural-Sort Test
"""

from re import compile, split

dre = compile(r'(\d+)')
mylist = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13', 'elm']
mylist2 = ['e0lm', 'e1lm', 'E2lm', 'e9lm', 'e10lm', 'E12lm', 'e13lm', 'elm', 'e01lm']

mylist.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])
mylist2.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])

print(mylist)  
  # ['elm', 'elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
print(mylist2)  
  # ['e0lm', 'e1lm', 'e01lm', 'E2lm', 'e9lm', 'e10lm', 'E12lm', 'e13lm', 'elm']

使用时的注意事项

  • from os.path import split
    • 您将需要区分进口

灵感来自

And now for something more* elegant (pythonic) -just a touch

There are many implementations out there, and while some have come close, none quite captured the elegance modern python affords.

  • Tested using python(3.5.1)
  • Included an additional list to demonstrate that it works when the numbers are mid string
  • Didn’t test, however, I am assuming that if your list was sizable it would be more efficient to compile the regex beforehand
    • I’m sure someone will correct me if this is an erroneous assumption

Quicky
from re import compile, split    
dre = compile(r'(\d+)')
mylist.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])
Full-Code
#!/usr/bin/python3
# coding=utf-8
"""
Natural-Sort Test
"""

from re import compile, split

dre = compile(r'(\d+)')
mylist = ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13', 'elm']
mylist2 = ['e0lm', 'e1lm', 'E2lm', 'e9lm', 'e10lm', 'E12lm', 'e13lm', 'elm', 'e01lm']

mylist.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])
mylist2.sort(key=lambda l: [int(s) if s.isdigit() else s.lower() for s in split(dre, l)])

print(mylist)  
  # ['elm', 'elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
print(mylist2)  
  # ['e0lm', 'e1lm', 'e01lm', 'E2lm', 'e9lm', 'e10lm', 'E12lm', 'e13lm', 'elm']

Caution when using

  • from os.path import split
    • you will need to differentiate the imports

Inspiration from


回答 7

这篇文章的价值

我的观点是提供可以普遍应用的非正则表达式解决方案。
我将创建三个函数:

  1. find_first_digit我从@AnuragUniyal借来的。它将查找字符串中第一个数字或非数字的位置。
  2. split_digits这是一个生成器,用于将字符串分成数字和非数字块。yield如果是数字,它也将是整数。
  3. natural_key只是包装split_digits成一个tuple。这是我们作为一个关键的使用sortedmaxmin

功能

def find_first_digit(s, non=False):
    for i, x in enumerate(s):
        if x.isdigit() ^ non:
            return i
    return -1

def split_digits(s, case=False):
    non = True
    while s:
        i = find_first_digit(s, non)
        if i == 0:
            non = not non
        elif i == -1:
            yield int(s) if s.isdigit() else s if case else s.lower()
            s = ''
        else:
            x, s = s[:i], s[i:]
            yield int(x) if x.isdigit() else x if case else x.lower()

def natural_key(s, *args, **kwargs):
    return tuple(split_digits(s, *args, **kwargs))

我们可以看到它是通用的,因为我们可以有多个数字块:

# Note that the key has lower case letters
natural_key('asl;dkfDFKJ:sdlkfjdf809lkasdjfa_543_hh')

('asl;dkfdfkj:sdlkfjdf', 809, 'lkasdjfa_', 543, '_hh')

或区分大小写:

natural_key('asl;dkfDFKJ:sdlkfjdf809lkasdjfa_543_hh', True)

('asl;dkfDFKJ:sdlkfjdf', 809, 'lkasdjfa_', 543, '_hh')

我们可以看到它以适当的顺序对OP的列表进行了排序

sorted(
    ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13'],
    key=natural_key
)

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

但是它也可以处理更复杂的列表:

sorted(
    ['f_1', 'e_1', 'a_2', 'g_0', 'd_0_12:2', 'd_0_1_:2'],
    key=natural_key
)

['a_2', 'd_0_1_:2', 'd_0_12:2', 'e_1', 'f_1', 'g_0']

我的正则表达式将是

def int_maybe(x):
    return int(x) if str(x).isdigit() else x

def split_digits_re(s, case=False):
    parts = re.findall('\d+|\D+', s)
    if not case:
        return map(int_maybe, (x.lower() for x in parts))
    else:
        return map(int_maybe, parts)
    
def natural_key_re(s, *args, **kwargs):
    return tuple(split_digits_re(s, *args, **kwargs))

Value Of This Post

My point is to offer a non regex solution that can be applied generally.
I’ll create three functions:

  1. find_first_digit which I borrowed from @AnuragUniyal. It will find the position of the first digit or non-digit in a string.
  2. split_digits which is a generator that picks apart a string into digit and non digit chunks. It will also yield integers when it is a digit.
  3. natural_key just wraps split_digits into a tuple. This is what we use as a key for sorted, max, min.

Functions

def find_first_digit(s, non=False):
    for i, x in enumerate(s):
        if x.isdigit() ^ non:
            return i
    return -1

def split_digits(s, case=False):
    non = True
    while s:
        i = find_first_digit(s, non)
        if i == 0:
            non = not non
        elif i == -1:
            yield int(s) if s.isdigit() else s if case else s.lower()
            s = ''
        else:
            x, s = s[:i], s[i:]
            yield int(x) if x.isdigit() else x if case else x.lower()

def natural_key(s, *args, **kwargs):
    return tuple(split_digits(s, *args, **kwargs))

We can see that it is general in that we can have multiple digit chunks:

# Note that the key has lower case letters
natural_key('asl;dkfDFKJ:sdlkfjdf809lkasdjfa_543_hh')

('asl;dkfdfkj:sdlkfjdf', 809, 'lkasdjfa_', 543, '_hh')

Or leave as case sensitive:

natural_key('asl;dkfDFKJ:sdlkfjdf809lkasdjfa_543_hh', True)

('asl;dkfDFKJ:sdlkfjdf', 809, 'lkasdjfa_', 543, '_hh')

We can see that it sorts the OP’s list in the appropriate order

sorted(
    ['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13'],
    key=natural_key
)

['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

But it can handle more complicated lists as well:

sorted(
    ['f_1', 'e_1', 'a_2', 'g_0', 'd_0_12:2', 'd_0_1_:2'],
    key=natural_key
)

['a_2', 'd_0_1_:2', 'd_0_12:2', 'e_1', 'f_1', 'g_0']

My regex equivalent would be

def int_maybe(x):
    return int(x) if str(x).isdigit() else x

def split_digits_re(s, case=False):
    parts = re.findall('\d+|\D+', s)
    if not case:
        return map(int_maybe, (x.lower() for x in parts))
    else:
        return map(int_maybe, parts)
    
def natural_key_re(s, *args, **kwargs):
    return tuple(split_digits_re(s, *args, **kwargs))

回答 8

一种选择是使用扩展形式http://wiki.answers.com/Q/What_does_expanded_form_mean将字符串转换为元组并替换数字

这样,a90将变为(“ a”,90,0),而a1将变为(“ a”,1)

下面是一些示例代码(由于它从数字中删除前导0的方式,因此效率不高)

alist=["something1",
    "something12",
    "something17",
    "something2",
    "something25and_then_33",
    "something25and_then_34",
    "something29",
    "beta1.1",
    "beta2.3.0",
    "beta2.33.1",
    "a001",
    "a2",
    "z002",
    "z1"]

def key(k):
    nums=set(list("0123456789"))
        chars=set(list(k))
    chars=chars-nums
    for i in range(len(k)):
        for c in chars:
            k=k.replace(c+"0",c)
    l=list(k)
    base=10
    j=0
    for i in range(len(l)-1,-1,-1):
        try:
            l[i]=int(l[i])*base**j
            j+=1
        except:
            j=0
    l=tuple(l)
    print l
    return l

print sorted(alist,key=key)

输出:

('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 1)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 10, 2)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 10, 7)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 2)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 5, 'a', 'n', 'd', '_', 't', 'h', 'e', 'n', '_', 30, 3)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 5, 'a', 'n', 'd', '_', 't', 'h', 'e', 'n', '_', 30, 4)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 9)
('b', 'e', 't', 'a', 1, '.', 1)
('b', 'e', 't', 'a', 2, '.', 3, '.')
('b', 'e', 't', 'a', 2, '.', 30, 3, '.', 1)
('a', 1)
('a', 2)
('z', 2)
('z', 1)
['a001', 'a2', 'beta1.1', 'beta2.3.0', 'beta2.33.1', 'something1', 'something2', 'something12', 'something17', 'something25and_then_33', 'something25and_then_34', 'something29', 'z1', 'z002']

One option is to turn the string into a tuple and replace digits using expanded form http://wiki.answers.com/Q/What_does_expanded_form_mean

that way a90 would become (“a”,90,0) and a1 would become (“a”,1)

below is some sample code (which isn’t very efficient due to the way It removes leading 0’s from numbers)

alist=["something1",
    "something12",
    "something17",
    "something2",
    "something25and_then_33",
    "something25and_then_34",
    "something29",
    "beta1.1",
    "beta2.3.0",
    "beta2.33.1",
    "a001",
    "a2",
    "z002",
    "z1"]

def key(k):
    nums=set(list("0123456789"))
        chars=set(list(k))
    chars=chars-nums
    for i in range(len(k)):
        for c in chars:
            k=k.replace(c+"0",c)
    l=list(k)
    base=10
    j=0
    for i in range(len(l)-1,-1,-1):
        try:
            l[i]=int(l[i])*base**j
            j+=1
        except:
            j=0
    l=tuple(l)
    print l
    return l

print sorted(alist,key=key)

output:

('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 1)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 10, 2)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 10, 7)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 2)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 5, 'a', 'n', 'd', '_', 't', 'h', 'e', 'n', '_', 30, 3)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 5, 'a', 'n', 'd', '_', 't', 'h', 'e', 'n', '_', 30, 4)
('s', 'o', 'm', 'e', 't', 'h', 'i', 'n', 'g', 20, 9)
('b', 'e', 't', 'a', 1, '.', 1)
('b', 'e', 't', 'a', 2, '.', 3, '.')
('b', 'e', 't', 'a', 2, '.', 30, 3, '.', 1)
('a', 1)
('a', 2)
('z', 2)
('z', 1)
['a001', 'a2', 'beta1.1', 'beta2.3.0', 'beta2.33.1', 'something1', 'something2', 'something12', 'something17', 'something25and_then_33', 'something25and_then_34', 'something29', 'z1', 'z002']

回答 9

根据此处的答案,我编写了一个natural_sorted行为类似于内置函数的函数sorted

# Copyright (C) 2018, Benjamin Drung <bdrung@posteo.de>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

import re

def natural_sorted(iterable, key=None, reverse=False):
    """Return a new naturally sorted list from the items in *iterable*.

    The returned list is in natural sort order. The string is ordered
    lexicographically (using the Unicode code point number to order individual
    characters), except that multi-digit numbers are ordered as a single
    character.

    Has two optional arguments which must be specified as keyword arguments.

    *key* specifies a function of one argument that is used to extract a
    comparison key from each list element: ``key=str.lower``.  The default value
    is ``None`` (compare the elements directly).

    *reverse* is a boolean value.  If set to ``True``, then the list elements are
    sorted as if each comparison were reversed.

    The :func:`natural_sorted` function is guaranteed to be stable. A sort is
    stable if it guarantees not to change the relative order of elements that
    compare equal --- this is helpful for sorting in multiple passes (for
    example, sort by department, then by salary grade).
    """
    prog = re.compile(r"(\d+)")

    def alphanum_key(element):
        """Split given key in list of strings and digits"""
        return [int(c) if c.isdigit() else c for c in prog.split(key(element)
                if key else element)]

    return sorted(iterable, key=alphanum_key, reverse=reverse)

源代码也可以在我的GitHub代码段存储库中找到:https//github.com/bdrung/snippets/blob/master/natural_sorted.py

Based on the answers here, I wrote a natural_sorted function that behaves like the built-in function sorted:

# Copyright (C) 2018, Benjamin Drung <bdrung@posteo.de>
#
# Permission to use, copy, modify, and/or distribute this software for any
# purpose with or without fee is hereby granted, provided that the above
# copyright notice and this permission notice appear in all copies.
#
# THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
# WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
# MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
# ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
# WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
# ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
# OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.

import re

def natural_sorted(iterable, key=None, reverse=False):
    """Return a new naturally sorted list from the items in *iterable*.

    The returned list is in natural sort order. The string is ordered
    lexicographically (using the Unicode code point number to order individual
    characters), except that multi-digit numbers are ordered as a single
    character.

    Has two optional arguments which must be specified as keyword arguments.

    *key* specifies a function of one argument that is used to extract a
    comparison key from each list element: ``key=str.lower``.  The default value
    is ``None`` (compare the elements directly).

    *reverse* is a boolean value.  If set to ``True``, then the list elements are
    sorted as if each comparison were reversed.

    The :func:`natural_sorted` function is guaranteed to be stable. A sort is
    stable if it guarantees not to change the relative order of elements that
    compare equal --- this is helpful for sorting in multiple passes (for
    example, sort by department, then by salary grade).
    """
    prog = re.compile(r"(\d+)")

    def alphanum_key(element):
        """Split given key in list of strings and digits"""
        return [int(c) if c.isdigit() else c for c in prog.split(key(element)
                if key else element)]

    return sorted(iterable, key=alphanum_key, reverse=reverse)

The source code is also available in my GitHub snippets repository: https://github.com/bdrung/snippets/blob/master/natural_sorted.py


回答 10

上面的答案对于显示的特定示例很好,但是对于更自然的自然排序问题却错过了几个有用的案例。我只是被其中一种情况所困扰,因此创建了一个更彻底的解决方案:

def natural_sort_key(string_or_number):
    """
    by Scott S. Lawton <scott@ProductArchitect.com> 2014-12-11; public domain and/or CC0 license

    handles cases where simple 'int' approach fails, e.g.
        ['0.501', '0.55'] floating point with different number of significant digits
        [0.01, 0.1, 1]    already numeric so regex and other string functions won't work (and aren't required)
        ['elm1', 'Elm2']  ASCII vs. letters (not case sensitive)
    """

    def try_float(astring):
        try:
            return float(astring)
        except:
            return astring

    if isinstance(string_or_number, basestring):
        string_or_number = string_or_number.lower()

        if len(re.findall('[.]\d', string_or_number)) <= 1:
            # assume a floating point value, e.g. to correctly sort ['0.501', '0.55']
            # '.' for decimal is locale-specific, e.g. correct for the Anglosphere and Asia but not continental Europe
            return [try_float(s) for s in re.split(r'([\d.]+)', string_or_number)]
        else:
            # assume distinct fields, e.g. IP address, phone number with '.', etc.
            # caveat: might want to first split by whitespace
            # TBD: for unicode, replace isdigit with isdecimal
            return [int(s) if s.isdigit() else s for s in re.split(r'(\d+)', string_or_number)]
    else:
        # consider: add code to recurse for lists/tuples and perhaps other iterables
        return string_or_number

测试代码和一些链接(StackOverflow的打开和关闭)在这里:http : //productarchitect.com/code/better-natural-sort.py

欢迎反馈。这并不是一个确定的解决方案。只是前进了一步。

The above answers are good for the specific example that was shown, but miss several useful cases for the more general question of natural sort. I just got bit by one of those cases, so created a more thorough solution:

def natural_sort_key(string_or_number):
    """
    by Scott S. Lawton <scott@ProductArchitect.com> 2014-12-11; public domain and/or CC0 license

    handles cases where simple 'int' approach fails, e.g.
        ['0.501', '0.55'] floating point with different number of significant digits
        [0.01, 0.1, 1]    already numeric so regex and other string functions won't work (and aren't required)
        ['elm1', 'Elm2']  ASCII vs. letters (not case sensitive)
    """

    def try_float(astring):
        try:
            return float(astring)
        except:
            return astring

    if isinstance(string_or_number, basestring):
        string_or_number = string_or_number.lower()

        if len(re.findall('[.]\d', string_or_number)) <= 1:
            # assume a floating point value, e.g. to correctly sort ['0.501', '0.55']
            # '.' for decimal is locale-specific, e.g. correct for the Anglosphere and Asia but not continental Europe
            return [try_float(s) for s in re.split(r'([\d.]+)', string_or_number)]
        else:
            # assume distinct fields, e.g. IP address, phone number with '.', etc.
            # caveat: might want to first split by whitespace
            # TBD: for unicode, replace isdigit with isdecimal
            return [int(s) if s.isdigit() else s for s in re.split(r'(\d+)', string_or_number)]
    else:
        # consider: add code to recurse for lists/tuples and perhaps other iterables
        return string_or_number

Test code and several links (on and off of StackOverflow) are here: http://productarchitect.com/code/better-natural-sort.py

Feedback welcome. That’s not meant to be a definitive solution; just a step forward.


回答 11

最有可能functools.cmp_to_key()与python sort的底层实现紧密相关。此外,cmp参数是旧式的。现代方法是将输入项转换为支持所需的丰富比较操作的对象。

在CPython 2.x中,即使尚未实现各自的丰富比较运算符,也可以对不同类型的对象进行排序。在CPython 3.x下,不同类型的对象必须显式支持比较。请参阅Python如何比较string和int?链接到官方文档。大多数答案取决于这种隐式排序。切换到Python 3.x将需要一种新类型来实现和统一数字和字符串之间的比较。

Python 2.7.12 (default, Sep 29 2016, 13:30:34) 
>>> (0,"foo") < ("foo",0)
True  
Python 3.5.2 (default, Oct 14 2016, 12:54:53) 
>>> (0,"foo") < ("foo",0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  TypeError: unorderable types: int() < str()

有三种不同的方法。第一种使用嵌套类来利用Python的Iterable比较算法。第二个将嵌套展开为单个类。第三类放弃了子类化str,而是专注于性能。都是定时的;第二个速度快两倍,而第三个速度快六倍。子类化str不是必需的,并且一开始可能不是一个好主意,但是确实带来了一些便利。

复制排序字符以强制按大小写排序,并交换大小写以强制小写字母优先排序;这是“自然排序”的典型定义。我无法决定分组的类型;有些人可能更喜欢以下内容,这也带来了显着的性能优势:

d = lambda s: s.lower()+s.swapcase()

如果使用了比较运算符,则将它们设置为,object因此不会被忽略functools.total_ordering

import functools
import itertools


@functools.total_ordering
class NaturalStringA(str):
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , super().__repr__()
            )
    d = lambda c, s: [ c.NaturalStringPart("".join(v))
                        for k,v in
                       itertools.groupby(s, c.isdigit)
                     ]
    d = classmethod(d)
    @functools.total_ordering
    class NaturalStringPart(str):
        d = lambda s: "".join(c.lower()+c.swapcase() for c in s)
        d = staticmethod(d)
        def __lt__(self, other):
            if not isinstance(self, type(other)):
                return NotImplemented
            try:
                return int(self) < int(other)
            except ValueError:
                if self.isdigit():
                    return True
                elif other.isdigit():
                    return False
                else:
                    return self.d(self) < self.d(other)
        def __eq__(self, other):
            if not isinstance(self, type(other)):
                return NotImplemented
            try:
                return int(self) == int(other)
            except ValueError:
                if self.isdigit() or other.isdigit():
                    return False
                else:
                    return self.d(self) == self.d(other)
        __le__ = object.__le__
        __ne__ = object.__ne__
        __gt__ = object.__gt__
        __ge__ = object.__ge__
    def __lt__(self, other):
        return self.d(self) < self.d(other)
    def __eq__(self, other):
        return self.d(self) == self.d(other)
    __le__ = object.__le__
    __ne__ = object.__ne__
    __gt__ = object.__gt__
    __ge__ = object.__ge__
import functools
import itertools


@functools.total_ordering
class NaturalStringB(str):
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , super().__repr__()
            )
    d = lambda s: "".join(c.lower()+c.swapcase() for c in s)
    d = staticmethod(d)
    def __lt__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        groups = map(lambda i: itertools.groupby(i, type(self).isdigit), (self, other))
        zipped = itertools.zip_longest(*groups)
        for s,o in zipped:
            if s is None:
                return True
            if o is None:
                return False
            s_k, s_v = s[0], "".join(s[1])
            o_k, o_v = o[0], "".join(o[1])
            if s_k and o_k:
                s_v, o_v = int(s_v), int(o_v)
                if s_v == o_v:
                    continue
                return s_v < o_v
            elif s_k:
                return True
            elif o_k:
                return False
            else:
                s_v, o_v = self.d(s_v), self.d(o_v)
                if s_v == o_v:
                    continue
                return s_v < o_v
        return False
    def __eq__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        groups = map(lambda i: itertools.groupby(i, type(self).isdigit), (self, other))
        zipped = itertools.zip_longest(*groups)
        for s,o in zipped:
            if s is None or o is None:
                return False
            s_k, s_v = s[0], "".join(s[1])
            o_k, o_v = o[0], "".join(o[1])
            if s_k and o_k:
                s_v, o_v = int(s_v), int(o_v)
                if s_v == o_v:
                    continue
                return False
            elif s_k or o_k:
                return False
            else:
                s_v, o_v = self.d(s_v), self.d(o_v)
                if s_v == o_v:
                    continue
                return False
        return True
    __le__ = object.__le__
    __ne__ = object.__ne__
    __gt__ = object.__gt__
    __ge__ = object.__ge__
import functools
import itertools
import enum


class OrderingType(enum.Enum):
    PerWordSwapCase         = lambda s: s.lower()+s.swapcase()
    PerCharacterSwapCase    = lambda s: "".join(c.lower()+c.swapcase() for c in s)


class NaturalOrdering:
    @classmethod
    def by(cls, ordering):
        def wrapper(string):
            return cls(string, ordering)
        return wrapper
    def __init__(self, string, ordering=OrderingType.PerCharacterSwapCase):
        self.string = string
        self.groups = [ (k,int("".join(v)))
                            if k else
                        (k,ordering("".join(v)))
                            for k,v in
                        itertools.groupby(string, str.isdigit)
                      ]
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , self.string
            )
    def __lesser(self, other, default):
        if not isinstance(self, type(other)):
            return NotImplemented
        for s,o in itertools.zip_longest(self.groups, other.groups):
            if s is None:
                return True
            if o is None:
                return False
            s_k, s_v = s
            o_k, o_v = o
            if s_k and o_k:
                if s_v == o_v:
                    continue
                return s_v < o_v
            elif s_k:
                return True
            elif o_k:
                return False
            else:
                if s_v == o_v:
                    continue
                return s_v < o_v
        return default
    def __lt__(self, other):
        return self.__lesser(other, default=False)
    def __le__(self, other):
        return self.__lesser(other, default=True)
    def __eq__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        for s,o in itertools.zip_longest(self.groups, other.groups):
            if s is None or o is None:
                return False
            s_k, s_v = s
            o_k, o_v = o
            if s_k and o_k:
                if s_v == o_v:
                    continue
                return False
            elif s_k or o_k:
                return False
            else:
                if s_v == o_v:
                    continue
                return False
        return True
    # functools.total_ordering doesn't create single-call wrappers if both
    # __le__ and __lt__ exist, so do it manually.
    def __gt__(self, other):
        op_result = self.__le__(other)
        if op_result is NotImplemented:
            return op_result
        return not op_result
    def __ge__(self, other):
        op_result = self.__lt__(other)
        if op_result is NotImplemented:
            return op_result
        return not op_result
    # __ne__ is the only implied ordering relationship, it automatically
    # delegates to __eq__
>>> import natsort
>>> import timeit
>>> l1 = ['Apple', 'corn', 'apPlE', 'arbour', 'Corn', 'Banana', 'apple', 'banana']
>>> l2 = list(map(str, range(30)))
>>> l3 = ["{} {}".format(x,y) for x in l1 for y in l2]
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalStringA)', number=10000, globals=globals()))
362.4729259099986
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalStringB)', number=10000, globals=globals()))
189.7340817489967
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalOrdering.by(OrderingType.PerCharacterSwapCase))', number=10000, globals=globals()))
69.34636392899847
>>> print(timeit.timeit('natsort.natsorted(l3+["0"], alg=natsort.ns.GROUPLETTERS | natsort.ns.LOWERCASEFIRST)', number=10000, globals=globals()))
98.2531585780016

自然排序既复杂又模糊地定义为问题。不要忘记unicodedata.normalize(...)事先运行,并考虑使用str.casefold()而不是str.lower()。我可能没有考虑过细微的编码问题。因此,我暂时推荐natsort库。我快速浏览了github仓库;代码维护一直很出色。

我见过的所有算法都取决于技巧,例如复制和降低字符以及交换大小写。虽然这使运行时间加倍,但是另一种方法将要求对输入字符集进行自然排序。我认为这不是unicode规范的一部分,并且由于unicode位数比得多[0-9],因此创建这种排序同样令人生畏。如果要进行区域设置比较,请locale.strxfrm按照Python的Sorting HOW TO准备字符串。

Most likely functools.cmp_to_key() is closely tied to the underlying implementation of python’s sort. Besides, the cmp parameter is legacy. The modern way is to transform the input items into objects that support the desired rich comparison operations.

Under CPython 2.x, objects of disparate types can be ordered even if the respective rich comparison operators haven’t been implemented. Under CPython 3.x, objects of different types must explicitly support the comparison. See How does Python compare string and int? which links to the official documentation. Most of the answers depend on this implicit ordering. Switching to Python 3.x will require a new type to implement and unify comparisons between numbers and strings.

Python 2.7.12 (default, Sep 29 2016, 13:30:34) 
>>> (0,"foo") < ("foo",0)
True  
Python 3.5.2 (default, Oct 14 2016, 12:54:53) 
>>> (0,"foo") < ("foo",0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  TypeError: unorderable types: int() < str()

There are three different approaches. The first uses nested classes to take advantage of Python’s Iterable comparison algorithm. The second unrolls this nesting into a single class. The third foregoes subclassing str to focus on performance. All are timed; the second is twice as fast while the third almost six times faster. Subclassing str isn’t required, and was probably a bad idea in the first place, but it does come with certain conveniences.

The sort characters are duplicated to force ordering by case, and case-swapped to force lower case letter to sort first; this is the typical definition of “natural sort”. I couldn’t decide on the type of grouping; some might prefer the following, which also brings significant performance benefits:

d = lambda s: s.lower()+s.swapcase()

Where utilized, the comparison operators are set to that of object so they won’t be ignored by functools.total_ordering.

import functools
import itertools


@functools.total_ordering
class NaturalStringA(str):
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , super().__repr__()
            )
    d = lambda c, s: [ c.NaturalStringPart("".join(v))
                        for k,v in
                       itertools.groupby(s, c.isdigit)
                     ]
    d = classmethod(d)
    @functools.total_ordering
    class NaturalStringPart(str):
        d = lambda s: "".join(c.lower()+c.swapcase() for c in s)
        d = staticmethod(d)
        def __lt__(self, other):
            if not isinstance(self, type(other)):
                return NotImplemented
            try:
                return int(self) < int(other)
            except ValueError:
                if self.isdigit():
                    return True
                elif other.isdigit():
                    return False
                else:
                    return self.d(self) < self.d(other)
        def __eq__(self, other):
            if not isinstance(self, type(other)):
                return NotImplemented
            try:
                return int(self) == int(other)
            except ValueError:
                if self.isdigit() or other.isdigit():
                    return False
                else:
                    return self.d(self) == self.d(other)
        __le__ = object.__le__
        __ne__ = object.__ne__
        __gt__ = object.__gt__
        __ge__ = object.__ge__
    def __lt__(self, other):
        return self.d(self) < self.d(other)
    def __eq__(self, other):
        return self.d(self) == self.d(other)
    __le__ = object.__le__
    __ne__ = object.__ne__
    __gt__ = object.__gt__
    __ge__ = object.__ge__
import functools
import itertools


@functools.total_ordering
class NaturalStringB(str):
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , super().__repr__()
            )
    d = lambda s: "".join(c.lower()+c.swapcase() for c in s)
    d = staticmethod(d)
    def __lt__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        groups = map(lambda i: itertools.groupby(i, type(self).isdigit), (self, other))
        zipped = itertools.zip_longest(*groups)
        for s,o in zipped:
            if s is None:
                return True
            if o is None:
                return False
            s_k, s_v = s[0], "".join(s[1])
            o_k, o_v = o[0], "".join(o[1])
            if s_k and o_k:
                s_v, o_v = int(s_v), int(o_v)
                if s_v == o_v:
                    continue
                return s_v < o_v
            elif s_k:
                return True
            elif o_k:
                return False
            else:
                s_v, o_v = self.d(s_v), self.d(o_v)
                if s_v == o_v:
                    continue
                return s_v < o_v
        return False
    def __eq__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        groups = map(lambda i: itertools.groupby(i, type(self).isdigit), (self, other))
        zipped = itertools.zip_longest(*groups)
        for s,o in zipped:
            if s is None or o is None:
                return False
            s_k, s_v = s[0], "".join(s[1])
            o_k, o_v = o[0], "".join(o[1])
            if s_k and o_k:
                s_v, o_v = int(s_v), int(o_v)
                if s_v == o_v:
                    continue
                return False
            elif s_k or o_k:
                return False
            else:
                s_v, o_v = self.d(s_v), self.d(o_v)
                if s_v == o_v:
                    continue
                return False
        return True
    __le__ = object.__le__
    __ne__ = object.__ne__
    __gt__ = object.__gt__
    __ge__ = object.__ge__
import functools
import itertools
import enum


class OrderingType(enum.Enum):
    PerWordSwapCase         = lambda s: s.lower()+s.swapcase()
    PerCharacterSwapCase    = lambda s: "".join(c.lower()+c.swapcase() for c in s)


class NaturalOrdering:
    @classmethod
    def by(cls, ordering):
        def wrapper(string):
            return cls(string, ordering)
        return wrapper
    def __init__(self, string, ordering=OrderingType.PerCharacterSwapCase):
        self.string = string
        self.groups = [ (k,int("".join(v)))
                            if k else
                        (k,ordering("".join(v)))
                            for k,v in
                        itertools.groupby(string, str.isdigit)
                      ]
    def __repr__(self):
        return "{}({})".format\
            ( type(self).__name__
            , self.string
            )
    def __lesser(self, other, default):
        if not isinstance(self, type(other)):
            return NotImplemented
        for s,o in itertools.zip_longest(self.groups, other.groups):
            if s is None:
                return True
            if o is None:
                return False
            s_k, s_v = s
            o_k, o_v = o
            if s_k and o_k:
                if s_v == o_v:
                    continue
                return s_v < o_v
            elif s_k:
                return True
            elif o_k:
                return False
            else:
                if s_v == o_v:
                    continue
                return s_v < o_v
        return default
    def __lt__(self, other):
        return self.__lesser(other, default=False)
    def __le__(self, other):
        return self.__lesser(other, default=True)
    def __eq__(self, other):
        if not isinstance(self, type(other)):
            return NotImplemented
        for s,o in itertools.zip_longest(self.groups, other.groups):
            if s is None or o is None:
                return False
            s_k, s_v = s
            o_k, o_v = o
            if s_k and o_k:
                if s_v == o_v:
                    continue
                return False
            elif s_k or o_k:
                return False
            else:
                if s_v == o_v:
                    continue
                return False
        return True
    # functools.total_ordering doesn't create single-call wrappers if both
    # __le__ and __lt__ exist, so do it manually.
    def __gt__(self, other):
        op_result = self.__le__(other)
        if op_result is NotImplemented:
            return op_result
        return not op_result
    def __ge__(self, other):
        op_result = self.__lt__(other)
        if op_result is NotImplemented:
            return op_result
        return not op_result
    # __ne__ is the only implied ordering relationship, it automatically
    # delegates to __eq__
>>> import natsort
>>> import timeit
>>> l1 = ['Apple', 'corn', 'apPlE', 'arbour', 'Corn', 'Banana', 'apple', 'banana']
>>> l2 = list(map(str, range(30)))
>>> l3 = ["{} {}".format(x,y) for x in l1 for y in l2]
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalStringA)', number=10000, globals=globals()))
362.4729259099986
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalStringB)', number=10000, globals=globals()))
189.7340817489967
>>> print(timeit.timeit('sorted(l3+["0"], key=NaturalOrdering.by(OrderingType.PerCharacterSwapCase))', number=10000, globals=globals()))
69.34636392899847
>>> print(timeit.timeit('natsort.natsorted(l3+["0"], alg=natsort.ns.GROUPLETTERS | natsort.ns.LOWERCASEFIRST)', number=10000, globals=globals()))
98.2531585780016

Natural sorting is both pretty complicated and vaguely defined as a problem. Don’t forget to run unicodedata.normalize(...) beforehand, and consider use str.casefold() rather than str.lower(). There are probably subtle encoding issues I haven’t considered. So I tentatively recommend the natsort library. I took a quick glance at the github repository; the code maintenance has been stellar.

All the algorithms I’ve seen depend on tricks such as duplicating and lowering characters, and swapping case. While this doubles the running time, an alternative would require a total natural ordering on the input character set. I don’t think this is part of the unicode specification, and since there are many more unicode digits than [0-9], creating such a sorting would be equally daunting. If you want locale-aware comparisons, prepare your strings with locale.strxfrm per Python’s Sorting HOW TO.


回答 12

让我对这种需要提出自己的看法:

from typing import Tuple, Union, Optional, Generator


StrOrInt = Union[str, int]


# On Python 3.6, string concatenation is REALLY fast
# Tested myself, and this fella also tested:
# https://blog.ganssle.io/articles/2019/11/string-concat.html
def griter(s: str) -> Generator[StrOrInt, None, None]:
    last_was_digit: Optional[bool] = None
    cluster: str = ""
    for c in s:
        if last_was_digit is None:
            last_was_digit = c.isdigit()
            cluster += c
            continue
        if c.isdigit() != last_was_digit:
            if last_was_digit:
                yield int(cluster)
            else:
                yield cluster
            last_was_digit = c.isdigit()
            cluster = ""
        cluster += c
    if last_was_digit:
        yield int(cluster)
    else:
        yield cluster
    return


def grouper(s: str) -> Tuple[StrOrInt, ...]:
    return tuple(griter(s))

现在,如果我们有这样的列表:

filelist = [
    'File3', 'File007', 'File3a', 'File10', 'File11', 'File1', 'File4', 'File5',
    'File9', 'File8', 'File8b1', 'File8b2', 'File8b11', 'File6'
]

我们可以简单地使用key=kwarg进行自然排序:

>>> sorted(filelist, key=grouper)
['File1', 'File3', 'File3a', 'File4', 'File5', 'File6', 'File007', 'File8', 
'File8b1', 'File8b2', 'File8b11', 'File9', 'File10', 'File11']

当然,这里的缺点是,因为现在,函数将大写字母排在小写字母之前。

我将把不区分大小写的石斑鱼的实现留给读者:-)

Let me submit my own take on this need:

from typing import Tuple, Union, Optional, Generator


StrOrInt = Union[str, int]


# On Python 3.6, string concatenation is REALLY fast
# Tested myself, and this fella also tested:
# https://blog.ganssle.io/articles/2019/11/string-concat.html
def griter(s: str) -> Generator[StrOrInt, None, None]:
    last_was_digit: Optional[bool] = None
    cluster: str = ""
    for c in s:
        if last_was_digit is None:
            last_was_digit = c.isdigit()
            cluster += c
            continue
        if c.isdigit() != last_was_digit:
            if last_was_digit:
                yield int(cluster)
            else:
                yield cluster
            last_was_digit = c.isdigit()
            cluster = ""
        cluster += c
    if last_was_digit:
        yield int(cluster)
    else:
        yield cluster
    return


def grouper(s: str) -> Tuple[StrOrInt, ...]:
    return tuple(griter(s))

Now if we have the list like such:

filelist = [
    'File3', 'File007', 'File3a', 'File10', 'File11', 'File1', 'File4', 'File5',
    'File9', 'File8', 'File8b1', 'File8b2', 'File8b11', 'File6'
]

We can simply use the key= kwarg to do a natural sort:

>>> sorted(filelist, key=grouper)
['File1', 'File3', 'File3a', 'File4', 'File5', 'File6', 'File007', 'File8', 
'File8b1', 'File8b2', 'File8b11', 'File9', 'File10', 'File11']

The drawback here is of course, as it is now, the function will sort uppercase letters before lowercase letters.

I’ll leave the implementation of a case-insenstive grouper to the reader :-)


回答 13

我建议您只使用key关键字参数of sorted来获得所需的列表
,例如:

to_order= [e2,E1,e5,E4,e3]
ordered= sorted(to_order, key= lambda x: x.lower())
    # ordered should be [E1,e2,e3,E4,e5]

I suggest you simply use the key keyword argument of sorted to achieve your desired list
For example:

to_order= [e2,E1,e5,E4,e3]
ordered= sorted(to_order, key= lambda x: x.lower())
    # ordered should be [E1,e2,e3,E4,e5]

回答 14

在@Mark Byers回答之后,这是一个接受key参数的改编,并且更符合PEP8。

def natsorted(seq, key=None):
    def convert(text):
        return int(text) if text.isdigit() else text

    def alphanum(obj):
        if key is not None:
            return [convert(c) for c in re.split(r'([0-9]+)', key(obj))]
        return [convert(c) for c in re.split(r'([0-9]+)', obj)]

    return sorted(seq, key=alphanum)

我也做了一个要点

Following @Mark Byers answer, here is an adaptation which accepts the key parameter, and is more PEP8-compliant.

def natsorted(seq, key=None):
    def convert(text):
        return int(text) if text.isdigit() else text

    def alphanum(obj):
        if key is not None:
            return [convert(c) for c in re.split(r'([0-9]+)', key(obj))]
        return [convert(c) for c in re.split(r'([0-9]+)', obj)]

    return sorted(seq, key=alphanum)

I also made a Gist


回答 15

Claudiu对Mark Byer答案的改进;-)

import re

def natural_sort_key(s, _re=re.compile(r'(\d+)')):
    return [int(t) if i & 1 else t.lower() for i, t in enumerate(_re.split(s))]

...
my_naturally_sorted_list = sorted(my_list, key=natural_sort_key)

顺便说一句,也许不是每个人都记得函数参数默认值是在def时间评估的

An improvement on Claudiu’s improvement on Mark Byers’ answer ;-)

import re

def natural_sort_key(s, _re=re.compile(r'(\d+)')):
    return [int(t) if i & 1 else t.lower() for i, t in enumerate(_re.split(s))]

...
my_naturally_sorted_list = sorted(my_list, key=natural_sort_key)

BTW, maybe not everyone remembers that function argument defaults are evaluated at def time


回答 16

a = ['H1', 'H100', 'H10', 'H3', 'H2', 'H6', 'H11', 'H50', 'H5', 'H99', 'H8']
b = ''
c = []

def bubble(bad_list):#bubble sort method
        length = len(bad_list) - 1
        sorted = False

        while not sorted:
                sorted = True
                for i in range(length):
                        if bad_list[i] > bad_list[i+1]:
                                sorted = False
                                bad_list[i], bad_list[i+1] = bad_list[i+1], bad_list[i] #sort the integer list 
                                a[i], a[i+1] = a[i+1], a[i] #sort the main list based on the integer list index value

for a_string in a: #extract the number in the string character by character
        for letter in a_string:
                if letter.isdigit():
                        #print letter
                        b += letter
        c.append(b)
        b = ''

print 'Before sorting....'
print a
c = map(int, c) #converting string list into number list
print c
bubble(c)

print 'After sorting....'
print c
print a

致谢

泡泡排序作业

如何在python中一次读取一个字母的字符串

a = ['H1', 'H100', 'H10', 'H3', 'H2', 'H6', 'H11', 'H50', 'H5', 'H99', 'H8']
b = ''
c = []

def bubble(bad_list):#bubble sort method
        length = len(bad_list) - 1
        sorted = False

        while not sorted:
                sorted = True
                for i in range(length):
                        if bad_list[i] > bad_list[i+1]:
                                sorted = False
                                bad_list[i], bad_list[i+1] = bad_list[i+1], bad_list[i] #sort the integer list 
                                a[i], a[i+1] = a[i+1], a[i] #sort the main list based on the integer list index value

for a_string in a: #extract the number in the string character by character
        for letter in a_string:
                if letter.isdigit():
                        #print letter
                        b += letter
        c.append(b)
        b = ''

print 'Before sorting....'
print a
c = map(int, c) #converting string list into number list
print c
bubble(c)

print 'After sorting....'
print c
print a

Acknowledgments:

Bubble Sort Homework

How to read a string one letter at a time in python


回答 17

>>> import re
>>> sorted(lst, key=lambda x: int(re.findall(r'\d+$', x)[0]))
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']
>>> import re
>>> sorted(lst, key=lambda x: int(re.findall(r'\d+$', x)[0]))
['elm0', 'elm1', 'Elm2', 'elm9', 'elm10', 'Elm11', 'Elm12', 'elm13']

如何保持键/值与声明的顺序相同?

问题:如何保持键/值与声明的顺序相同?

我有一本按照特定顺序声明的字典,并希望一直保持该顺序。键/值实际上不能根据它们的值按顺序保留,我只希望按声明的顺序保留。

因此,如果我有字典:

d = {'ac': 33, 'gw': 20, 'ap': 102, 'za': 321, 'bs': 10}

如果我查看它或遍历它,则不是按此顺序进行的,有什么方法可以确保Python保持我声明键/值的显式顺序?

I have a dictionary that I declared in a particular order and want to keep it in that order all the time. The keys/values can’t really be kept in order based on their value, I just want it in the order that I declared it.

So if I have the dictionary:

d = {'ac': 33, 'gw': 20, 'ap': 102, 'za': 321, 'bs': 10}

It isn’t in that order if I view it or iterate through it, is there any way to make sure Python will keep the explicit order that I declared the keys/values in?


回答 0

从Python 3.6开始,标准dict类型默认会保留插入顺序。

定义

d = {'ac':33, 'gw':20, 'ap':102, 'za':321, 'bs':10}

将产生字典,其中的键按源代码中列出的顺序排列。

这是通过将一个带有整数的简单数组用于稀疏哈希表来实现的,其中这些整数索引到另一个存储键值对(加上计算得出的哈希值)的数组中。后面的数组恰好按插入顺序存储项目,实际上,整个组合使用的内存少于Python 3.5及之前版本中使用的实现。有关详细信息,请参阅Raymond Hettinger原始想法帖子

在3.6中,这仍被视为实现细节;请参阅Python 3.6文档的新增功能

此新实现的顺序保留方面被认为是实现细节,因此不应依赖(将来可能会更改,但是希望在更改语言规范之前,先在几个版本中使用该新dict实现该语言,为所有当前和将来的Python实现强制要求保留顺序的语义;这还有助于保留与仍旧有效的随机迭代顺序的旧版本语言(例如Python 3.5)的向后兼容性。

Python 3.7将此实现细节提升为语言规范,因此现在必须dict保留与该版本或更高版本兼容的所有Python实现中的顺序。参见BDFL的声明

在某些情况下,您可能仍想使用collections.OrderedDict()该类,因为它在标准dict类型的基础上提供了一些附加功能。例如是可逆的(这扩展到视图对象),并支持重新排序(通过move_to_end()方法)。

From Python 3.6 onwards, the standard dict type maintains insertion order by default.

Defining

d = {'ac':33, 'gw':20, 'ap':102, 'za':321, 'bs':10}

will result in a dictionary with the keys in the order listed in the source code.

This was achieved by using a simple array with integers for the sparse hash table, where those integers index into another array that stores the key-value pairs (plus the calculated hash). That latter array just happens to store the items in insertion order, and the whole combination actually uses less memory than the implementation used in Python 3.5 and before. See the original idea post by Raymond Hettinger for details.

In 3.6 this was still considered an implementation detail; see the What’s New in Python 3.6 documentation:

The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon (this may change in the future, but it is desired to have this new dict implementation in the language for a few releases before changing the language spec to mandate order-preserving semantics for all current and future Python implementations; this also helps preserve backwards-compatibility with older versions of the language where random iteration order is still in effect, e.g. Python 3.5).

Python 3.7 elevates this implementation detail to a language specification, so it is now mandatory that dict preserves order in all Python implementations compatible with that version or newer. See the pronouncement by the BDFL.

You may still want to use the collections.OrderedDict() class in certain cases, as it offers some additional functionality on top of the standard dict type. Such as as being reversible (this extends to the view objects), and supporting reordering (via the move_to_end() method).


回答 1

from collections import OrderedDict
OrderedDict((word, True) for word in words)

包含

OrderedDict([('He', True), ('will', True), ('be', True), ('the', True), ('winner', True)])

如果值是True(或任何其他不可变对象),则还可以使用:

OrderedDict.fromkeys(words, True)
from collections import OrderedDict
OrderedDict((word, True) for word in words)

contains

OrderedDict([('He', True), ('will', True), ('be', True), ('the', True), ('winner', True)])

If the values are True (or any other immutable object), you can also use:

OrderedDict.fromkeys(words, True)

回答 2

我不会解释理论部分,而是给出一个简单的示例。

>>> from collections import OrderedDict
>>> my_dictionary=OrderedDict()
>>> my_dictionary['foo']=3
>>> my_dictionary['aol']=1
>>> my_dictionary
OrderedDict([('foo', 3), ('aol', 1)])
>>> dict(my_dictionary)
{'foo': 3, 'aol': 1}

Rather than explaining the theoretical part, I’ll give a simple example.

>>> from collections import OrderedDict
>>> my_dictionary=OrderedDict()
>>> my_dictionary['foo']=3
>>> my_dictionary['aol']=1
>>> my_dictionary
OrderedDict([('foo', 3), ('aol', 1)])
>>> dict(my_dictionary)
{'foo': 3, 'aol': 1}

回答 3

请注意,此答案适用于python3.7之前的python版本。CPython 3.6在大多数情况下都将插入顺序作为实现细节来维护。从Python3.7开始,已经声明实现必须维护插入顺序以使其兼容。


python字典是无序的。如果需要有序字典,请尝试collections.OrderedDict

注意,OrderedDict是在python 2.7中引入的标准库。如果您使用的是python的旧版本,则可以在ActiveState上找到有序词典的食谱。

Note that this answer applies to python versions prior to python3.7. CPython 3.6 maintains insertion order under most circumstances as an implementation detail. Starting from Python3.7 onward, it has been declared that implementations MUST maintain insertion order to be compliant.


python dictionaries are unordered. If you want an ordered dictionary, try collections.OrderedDict.

Note that OrderedDict was introduced into the standard library in python 2.7. If you have an older version of python, you can find recipes for ordered dictionaries on ActiveState.


回答 4

字典将使用使搜索有效的顺序,您无法更改该顺序,

您可以只使用对象列表(在简单情况下为2元素元组,甚至是一个类),然后将项目附加到末尾。然后,您可以使用线性搜索在其中查找项目。

或者,您可以创建或使用为维护顺序而创建的其他数据结构。

Dictionaries will use an order that makes searching efficient, and you cant change that,

You could just use a list of objects (a 2 element tuple in a simple case, or even a class), and append items to the end. You can then use linear search to find items in it.

Alternatively you could create or use a different data structure created with the intention of maintaining order.


回答 5

我在尝试弄清楚如何使OrderedDict工作时遇到了这篇文章。用于Eclipse的PyDev根本找不到OrderedDict,所以我最终决定根据字典的键值创建一个元组,因为我希望对它们进行排序。当我需要输出列表时,我只需遍历元组的值,然后将元组中迭代的“键”插入字典中,即可按需要的顺序检索值。

例:

test_dict = dict( val1 = "hi", val2 = "bye", val3 = "huh?", val4 = "what....")
test_tuple = ( 'val1', 'val2', 'val3', 'val4')
for key in test_tuple: print(test_dict[key])

这有点麻烦,但是我时间紧迫,这是我想出的解决方法。

注意:其他人建议的列表列表方法对我来说真的没有意义,因为列表是有序的和索引的(并且结构与字典不同)。

I came across this post while trying to figure out how to get OrderedDict to work. PyDev for Eclipse couldn’t find OrderedDict at all, so I ended up deciding to make a tuple of my dictionary’s key values as I would like them to be ordered. When I needed to output my list, I just iterated through the tuple’s values and plugged the iterated ‘key’ from the tuple into the dictionary to retrieve my values in the order I needed them.

example:

test_dict = dict( val1 = "hi", val2 = "bye", val3 = "huh?", val4 = "what....")
test_tuple = ( 'val1', 'val2', 'val3', 'val4')
for key in test_tuple: print(test_dict[key])

It’s a tad cumbersome, but I’m pressed for time and it’s the workaround I came up with.

note: the list of lists approach that somebody else suggested does not really make sense to me, because lists are ordered and indexed (and are also a different structure than dictionaries).


回答 6

您实际上无法使用字典来完成所需的工作。您已经d = {'ac':33, 'gw':20, 'ap':102, 'za':321, 'bs':10}创建了词典。我发现一旦创建就无法保持秩序。我所做的是用对象制作了一个json文件:

{"ac":33,"gw":20,"ap":102,"za":321,"bs":10}

我用了:

r = json.load(open('file.json'), object_pairs_hook=OrderedDict)

然后使用:

print json.dumps(r)

核实。

You can’t really do what you want with a dictionary. You already have the dictionary d = {'ac':33, 'gw':20, 'ap':102, 'za':321, 'bs':10}created. I found there was no way to keep in order once it is already created. What I did was make a json file instead with the object:

{"ac":33,"gw":20,"ap":102,"za":321,"bs":10}

I used:

r = json.load(open('file.json'), object_pairs_hook=OrderedDict)

then used:

print json.dumps(r)

to verify.


回答 7

from collections import OrderedDict
list1 = ['k1', 'k2']
list2 = ['v1', 'v2']
new_ordered_dict = OrderedDict(zip(list1, list2))
print new_ordered_dict
# OrderedDict([('k1', 'v1'), ('k2', 'v2')])
from collections import OrderedDict
list1 = ['k1', 'k2']
list2 = ['v1', 'v2']
new_ordered_dict = OrderedDict(zip(list1, list2))
print new_ordered_dict
# OrderedDict([('k1', 'v1'), ('k2', 'v2')])

回答 8

另一种选择是使用Pandas,dataframe因为它可以保证像dict这样的结构中项目的顺序和索引位置。

Another alternative is to use Pandas dataframe as it guarantees the order and the index locations of the items in a dict-like structure.


回答 9

一般情况下,你可以设计一个类,像一本字典的行为,主要是实现方法__contains____getitem____delitem____setitem__和更多一些。该类可以具有您喜欢的任何行为,例如,对键使用排序的迭代器…

Generally, you can design a class that behaves like a dictionary, mainly be implementing the methods __contains__, __getitem__, __delitem__, __setitem__ and some more. That class can have any behaviour you like, for example prividing a sorted iterator over the keys …


回答 10

如果您希望按特定顺序拥有字典,则还可以创建一个列表列表,其中第一项为键,第二项为值,看起来像此示例

>>> list =[[1,2],[2,3]]
>>> for i in list:
...     print i[0]
...     print i[1]

1
2
2
3

if you would like to have a dictionary in a specific order, you can also create a list of lists, where the first item will be the key, and the second item will be the value and will look like this example

>>> list =[[1,2],[2,3]]
>>> for i in list:
...     print i[0]
...     print i[1]

1
2
2
3

回答 11

开发Django项目时遇到类似的问题。我无法使用OrderedDict,因为我正在运行旧版本的python,因此解决方案是使用Django的SortedDict类:

https://code.djangoproject.com/wiki/SortedDict

例如,

from django.utils.datastructures import SortedDict
d2 = SortedDict()
d2['b'] = 1
d2['a'] = 2
d2['c'] = 3

注意:此答案最初来自2011年。如果您可以访问python 2.7或更高版本,则应该可以访问现在的standard collections.OrderedDict,该线程中的其他示例已经提供了其中的很多示例。

I had a similar problem when developing a Django project. I couldn’t use OrderedDict, because I was running an old version of python, so the solution was to use Django’s SortedDict class:

https://code.djangoproject.com/wiki/SortedDict

e.g.,

from django.utils.datastructures import SortedDict
d2 = SortedDict()
d2['b'] = 1
d2['a'] = 2
d2['c'] = 3

Note: This answer is originally from 2011. If you have access to Python version 2.7 or higher, then you should have access to the now standard collections.OrderedDict, of which many examples have been provided by others in this thread.


回答 12

您可以做与字典一样的事情。

创建一个列表并清空字典:

dictionary_items = {}
fields = [['Name', 'Himanshu Kanojiya'], ['email id', 'hima@gmail.com']]
l = fields[0][0]
m = fields[0][1]
n = fields[1][0]
q = fields[1][1]
dictionary_items[l] = m
dictionary_items[n] = q
print dictionary_items

You can do the same thing which i did for dictionary.

Create a list and empty dictionary:

dictionary_items = {}
fields = [['Name', 'Himanshu Kanojiya'], ['email id', 'hima@gmail.com']]
l = fields[0][0]
m = fields[0][1]
n = fields[1][0]
q = fields[1][1]
dictionary_items[l] = m
dictionary_items[n] = q
print dictionary_items

根据另一个列表中的值对列表进行排序?

问题:根据另一个列表中的值对列表进行排序?

我有一个这样的字符串列表:

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,   0,   1,   2,   2,   0,   1 ]

使用Y中的值对X进行排序以获取以下输出的最短方法是什么?

["a", "d", "h", "b", "c", "e", "i", "f", "g"]

具有相同“键”的元素的顺序无关紧要。我可以求助于for结构的使用,但我好奇是否有更短的方法。有什么建议么?

I have a list of strings like this:

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,   0,   1,   2,   2,   0,   1 ]

What is the shortest way of sorting X using values from Y to get the following output?

["a", "d", "h", "b", "c", "e", "i", "f", "g"]

The order of the elements having the same “key” does not matter. I can resort to the use of for constructs but I am curious if there is a shorter way. Any suggestions?


回答 0

最短代码

[x for _,x in sorted(zip(Y,X))]

例:

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,    0,   1,   2,   2,   0,   1]

Z = [x for _,x in sorted(zip(Y,X))]
print(Z)  # ["a", "d", "h", "b", "c", "e", "i", "f", "g"]

一般来说

[x for _, x in sorted(zip(Y,X), key=lambda pair: pair[0])]

解释:

  1. zip两个list
  2. 创建一个新的,list基于zip使用排序sorted()
  3. 使用列表推导从排序的,压缩的中提取每对的第一个元素list

有关如何设置\使用key参数以及sorted一般功能的更多信息,请查看this


Shortest Code

[x for _,x in sorted(zip(Y,X))]

Example:

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,    0,   1,   2,   2,   0,   1]

Z = [x for _,x in sorted(zip(Y,X))]
print(Z)  # ["a", "d", "h", "b", "c", "e", "i", "f", "g"]

Generally Speaking

[x for _, x in sorted(zip(Y,X), key=lambda pair: pair[0])]

Explained:

  1. zip the two lists.
  2. create a new, sorted list based on the zip using sorted().
  3. using a list comprehension extract the first elements of each pair from the sorted, zipped list.

For more information on how to set\use the key parameter as well as the sorted function in general, take a look at this.



回答 1

将两个列表压缩在一起,对其进行排序,然后选择所需的部分:

>>> yx = zip(Y, X)
>>> yx
[(0, 'a'), (1, 'b'), (1, 'c'), (0, 'd'), (1, 'e'), (2, 'f'), (2, 'g'), (0, 'h'), (1, 'i')]
>>> yx.sort()
>>> yx
[(0, 'a'), (0, 'd'), (0, 'h'), (1, 'b'), (1, 'c'), (1, 'e'), (1, 'i'), (2, 'f'), (2, 'g')]
>>> x_sorted = [x for y, x in yx]
>>> x_sorted
['a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g']

将它们结合在一起可获得:

[x for y, x in sorted(zip(Y, X))]

Zip the two lists together, sort it, then take the parts you want:

>>> yx = zip(Y, X)
>>> yx
[(0, 'a'), (1, 'b'), (1, 'c'), (0, 'd'), (1, 'e'), (2, 'f'), (2, 'g'), (0, 'h'), (1, 'i')]
>>> yx.sort()
>>> yx
[(0, 'a'), (0, 'd'), (0, 'h'), (1, 'b'), (1, 'c'), (1, 'e'), (1, 'i'), (2, 'f'), (2, 'g')]
>>> x_sorted = [x for y, x in yx]
>>> x_sorted
['a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g']

Combine these together to get:

[x for y, x in sorted(zip(Y, X))]

回答 2

另外,如果您不介意使用numpy数组(或者实际上已经在处理numpy数组…),这是另一个不错的解决方案:

people = ['Jim', 'Pam', 'Micheal', 'Dwight']
ages = [27, 25, 4, 9]

import numpy
people = numpy.array(people)
ages = numpy.array(ages)
inds = ages.argsort()
sortedPeople = people[inds]

我在这里找到它:http : //scienceoss.com/sort-one-list-by-another-list/

Also, if you don’t mind using numpy arrays (or in fact already are dealing with numpy arrays…), here is another nice solution:

people = ['Jim', 'Pam', 'Micheal', 'Dwight']
ages = [27, 25, 4, 9]

import numpy
people = numpy.array(people)
ages = numpy.array(ages)
inds = ages.argsort()
sortedPeople = people[inds]

I found it here: http://scienceoss.com/sort-one-list-by-another-list/


回答 3

对我来说,最明显的解决方案是使用key关键字arg。

>>> X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
>>> Y = [ 0,   1,   1,    0,   1,   2,   2,   0,   1]
>>> keydict = dict(zip(X, Y))
>>> X.sort(key=keydict.get)
>>> X
['a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g']

请注意,如果您愿意,可以将其缩短为单线:

>>> X.sort(key=dict(zip(X, Y)).get)

The most obvious solution to me is to use the key keyword arg.

>>> X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
>>> Y = [ 0,   1,   1,    0,   1,   2,   2,   0,   1]
>>> keydict = dict(zip(X, Y))
>>> X.sort(key=keydict.get)
>>> X
['a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g']

Note that you can shorten this to a one-liner if you care to:

>>> X.sort(key=dict(zip(X, Y)).get)

回答 4

我实际上是来这里寻找按值匹配的列表对列表进行排序的。

list_a = ['foo', 'bar', 'baz']
list_b = ['baz', 'bar', 'foo']
sorted(list_b, key=lambda x: list_a.index(x))
# ['foo', 'bar', 'baz']

I actually came here looking to sort a list by a list where the values matched.

list_a = ['foo', 'bar', 'baz']
list_b = ['baz', 'bar', 'foo']
sorted(list_b, key=lambda x: list_a.index(x))
# ['foo', 'bar', 'baz']

回答 5

more_itertools 有一个用于并行迭代可迭代对象的工具:

给定

from more_itertools import sort_together


X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,    0,   1,   2,   2,   0,   1]

演示版

sort_together([Y, X])[1]
# ('a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g')

more_itertools has a tool for sorting iterables in parallel:

Given

from more_itertools import sort_together


X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,    0,   1,   2,   2,   0,   1]

Demo

sort_together([Y, X])[1]
# ('a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g')

回答 6

我喜欢列出排序索引。这样,我可以按照与源列表相同的顺序对任何列表进行排序。一旦有了排序索引的列表,简单的列表理解就可以解决问题:

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,    0,   1,   2,   2,   0,   1]

sorted_y_idx_list = sorted(range(len(Y)),key=lambda x:Y[x])
Xs = [X[i] for i in sorted_y_idx_list ]

print( "Xs:", Xs )
# prints: Xs: ["a", "d", "h", "b", "c", "e", "i", "f", "g"]

请注意,也可以使用来获得排序后的索引列表numpy.argsort()

I like having a list of sorted indices. That way, I can sort any list in the same order as the source list. Once you have a list of sorted indices, a simple list comprehension will do the trick:

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,    0,   1,   2,   2,   0,   1]

sorted_y_idx_list = sorted(range(len(Y)),key=lambda x:Y[x])
Xs = [X[i] for i in sorted_y_idx_list ]

print( "Xs:", Xs )
# prints: Xs: ["a", "d", "h", "b", "c", "e", "i", "f", "g"]

Note that the sorted index list can also be gotten using numpy.argsort().


回答 7

另一个选择,结合几个答案。

zip(*sorted(zip(Y,X)))[1]

为了使用python3:

list(zip(*sorted(zip(B,A))))[1]

Another alternative, combining several of the answers.

zip(*sorted(zip(Y,X)))[1]

In order to work for python3:

list(zip(*sorted(zip(B,A))))[1]

回答 8

zip,按第二列排序,返回第一列。

zip(*sorted(zip(X,Y), key=operator.itemgetter(1)))[0]

zip, sort by the second column, return the first column.

zip(*sorted(zip(X,Y), key=operator.itemgetter(1)))[0]

回答 9

快速的单线。

list_a = [5,4,3,2,1]
list_b = [1,1.5,1.75,2,3,3.5,3.75,4,5]

假设您希望列表a与列表b匹配。

orderedList =  sorted(list_a, key=lambda x: list_b.index(x))

当需要将较小的列表排序为较大的值时,这将很有帮助。假设较大的列表包含较小列表中的所有值,则可以完成此操作。

A quick one-liner.

list_a = [5,4,3,2,1]
list_b = [1,1.5,1.75,2,3,3.5,3.75,4,5]

Say you want list a to match list b.

orderedList =  sorted(list_a, key=lambda x: list_b.index(x))

This is helpful when needing to order a smaller list to values in larger. Assuming that the larger list contains all values in the smaller list, it can be done.


回答 10

您可以创建一个pandas Series,使用主列表作为data,其他列表作为index,然后按索引排序:

import pandas as pd
pd.Series(data=X,index=Y).sort_index().tolist()

输出:

['a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g']

You can create a pandas Series, using the primary list as data and the other list as index, and then just sort by the index:

import pandas as pd
pd.Series(data=X,index=Y).sort_index().tolist()

output:

['a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g']

回答 11

如果您想同时获得两个排序列表(python3),这是Whatangs的答案。

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,    0,   1,   2,   2,   0,   1]

Zx, Zy = zip(*[(x, y) for x, y in sorted(zip(Y, X))])

print(list(Zx))  # [0, 0, 0, 1, 1, 1, 1, 2, 2]
print(list(Zy))  # ['a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g']

只要记住Zx和Zy是元组即可。如果有更好的方法,我也在徘徊。

警告:如果使用空列表运行它,则会崩溃。

Here is Whatangs answer if you want to get both sorted lists (python3).

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,    0,   1,   2,   2,   0,   1]

Zx, Zy = zip(*[(x, y) for x, y in sorted(zip(Y, X))])

print(list(Zx))  # [0, 0, 0, 1, 1, 1, 1, 2, 2]
print(list(Zy))  # ['a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g']

Just remember Zx and Zy are tuples. I am also wandering if there is a better way to do that.

Warning: If you run it with empty lists it crashes.


回答 12

我创建了一个更通用的函数,该函数根据@Whatang的答案启发,根据另一个列表对两个以上的列表进行排序。

def parallel_sort(*lists):
    """
    Sorts the given lists, based on the first one.
    :param lists: lists to be sorted

    :return: a tuple containing the sorted lists
    """

    # Create the initially empty lists to later store the sorted items
    sorted_lists = tuple([] for _ in range(len(lists)))

    # Unpack the lists, sort them, zip them and iterate over them
    for t in sorted(zip(*lists)):
        # list items are now sorted based on the first list
        for i, item in enumerate(t):    # for each item...
            sorted_lists[i].append(item)  # ...store it in the appropriate list

    return sorted_lists

I have created a more general function, that sorts more than two lists based on another one, inspired by @Whatang’s answer.

def parallel_sort(*lists):
    """
    Sorts the given lists, based on the first one.
    :param lists: lists to be sorted

    :return: a tuple containing the sorted lists
    """

    # Create the initially empty lists to later store the sorted items
    sorted_lists = tuple([] for _ in range(len(lists)))

    # Unpack the lists, sort them, zip them and iterate over them
    for t in sorted(zip(*lists)):
        # list items are now sorted based on the first list
        for i, item in enumerate(t):    # for each item...
            sorted_lists[i].append(item)  # ...store it in the appropriate list

    return sorted_lists

回答 13

list1 = ['a','b','c','d','e','f','g','h','i']
list2 = [0,1,1,0,1,2,2,0,1]

output=[]
cur_loclist = []

要获取存在的唯一值 list2

list_set = set(list2)

在以下位置查找索引的位置 list2

list_str = ''.join(str(s) for s in list2)

list2使用跟踪索引的位置cur_loclist

[0、3、7、1、2、4、8、5、6]

for i in list_set:
cur_loc = list_str.find(str(i))

while cur_loc >= 0:
    cur_loclist.append(cur_loc)
    cur_loc = list_str.find(str(i),cur_loc+1)

print(cur_loclist)

for i in range(0,len(cur_loclist)):
output.append(list1[cur_loclist[i]])
print(output)
list1 = ['a','b','c','d','e','f','g','h','i']
list2 = [0,1,1,0,1,2,2,0,1]

output=[]
cur_loclist = []

To get unique values present in list2

list_set = set(list2)

To find the loc of the index in list2

list_str = ''.join(str(s) for s in list2)

Location of index in list2 is tracked using cur_loclist

[0, 3, 7, 1, 2, 4, 8, 5, 6]

for i in list_set:
cur_loc = list_str.find(str(i))

while cur_loc >= 0:
    cur_loclist.append(cur_loc)
    cur_loc = list_str.find(str(i),cur_loc+1)

print(cur_loclist)

for i in range(0,len(cur_loclist)):
output.append(list1[cur_loclist[i]])
print(output)

回答 14

这是一个古老的问题,但是我看到的一些答案由于zip无法编写脚本而无法实际使用。其他答案没有困扰import operator并在此处提供有关此模块及其好处的更多信息。

这个问题至少有两个好的习惯用法。从您提供的示例输入开始:

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,   0,   1,   2,   2,   0,   1 ]

使用“ 装饰-排序-未装饰 ”习惯用法

R. Schwartz在90年代在Perl中推广了这种模式之后,这也称为Schwartzian_transform

# Zip (decorate), sort and unzip (undecorate).
# Converting to list to script the output and extract X
list(zip(*(sorted(zip(Y,X)))))[1]                                                                                                                       
# Results in: ('a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g')

请注意,在这种情况下YX会按字典顺序进行排序和比较。也就是说,比较第一个项目(来自Y);如果它们相同,则比较第二个项目(来自X),依此类推。这会造成不稳定除非您为字典顺序包括原始列表索引,以使重复项保持原始顺序,否则输出。

使用operator模块

这使您可以更直接地控制对输入进行排序的方式,因此您可以通过简单地说明要作为排序依据的特定键来获得排序稳定性在这里查看更多示例。

import operator    

# Sort by Y (1) and extract X [0]
list(zip(*sorted(zip(X,Y), key=operator.itemgetter(1))))[0]                                                                                                 
# Results in: ('a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g')

This is an old question but some of the answers I see posted don’t actually work because zip is not scriptable. Other answers didn’t bother to import operator and provide more info about this module and its benefits here.

There are at least two good idioms for this problem. Starting with the example input you provided:

X = ["a", "b", "c", "d", "e", "f", "g", "h", "i"]
Y = [ 0,   1,   1,   0,   1,   2,   2,   0,   1 ]

Using the “Decorate-Sort-Undecorate” idiom

This is also known as the Schwartzian_transform after R. Schwartz who popularized this pattern in Perl in the 90s:

# Zip (decorate), sort and unzip (undecorate).
# Converting to list to script the output and extract X
list(zip(*(sorted(zip(Y,X)))))[1]                                                                                                                       
# Results in: ('a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g')

Note that in this case Y and X are sorted and compared lexicographically. That is, the first items (from Y) are compared; and if they are the same then the second items (from X) are compared, and so on. This can create unstable outputs unless you include the original list indices for the lexicographic ordering to keep duplicates in their original order.

Using the operator module

This gives you more direct control over how to sort the input, so you can get sorting stability by simply stating the specific key to sort by. See more examples here.

import operator    

# Sort by Y (1) and extract X [0]
list(zip(*sorted(zip(X,Y), key=operator.itemgetter(1))))[0]                                                                                                 
# Results in: ('a', 'd', 'h', 'b', 'c', 'e', 'i', 'f', 'g')