Python 实用宝典

Question 1

I have an array of numbers and I’d like to create another array that represents the rank of each item in the first array. I’m using Python and NumPy.

For example:

array = [4,2,7,1]
ranks = [2,1,3,0]

Here’s the best method I’ve come up with:

array = numpy.array([4,2,7,1])
temp = array.argsort()
ranks = numpy.arange(len(array))[temp.argsort()]

Are there any better/faster methods that avoid sorting the array twice?

Question 2

Use slicing on the left-hand side in the last step:

array = numpy.array([4,2,7,1])
temp = array.argsort()
ranks = numpy.empty_like(temp)
ranks[temp] = numpy.arange(len(array))

This avoids sorting twice by inverting the permutation in the last step.

Question 3

Use argsort twice, first to obtain the order of the array, then to obtain ranking:

array = numpy.array([4,2,7,1])
order = array.argsort()
ranks = order.argsort()

When dealing with 2D (or higher dimensional) arrays, be sure to pass an axis argument to argsort to order over the correct axis.

Question 4

This question is a few years old, and the accepted answer is great, but I think the following is still worth mentioning. If you don’t mind the dependency on scipy, you can use scipy.stats.rankdata:

In [22]: from scipy.stats import rankdata

In [23]: a = [4, 2, 7, 1]

In [24]: rankdata(a)
Out[24]: array([ 3.,  2.,  4.,  1.])

In [25]: (rankdata(a) - 1).astype(int)
Out[25]: array([2, 1, 3, 0])

A nice feature of rankdata is that the method argument provides several options for handling ties. For example, there are three occurrences of 20 and two occurrences of 40 in b:

In [26]: b = [40, 20, 70, 10, 20, 50, 30, 40, 20]

The default assigns the average rank to the tied values:

In [27]: rankdata(b)
Out[27]: array([ 6.5,  3. ,  9. ,  1. ,  3. ,  8. ,  5. ,  6.5,  3. ])

method='ordinal' assigns consecutive ranks:

In [28]: rankdata(b, method='ordinal')
Out[28]: array([6, 2, 9, 1, 3, 8, 5, 7, 4])

method='min' assigns the minimum rank of the tied values to all the tied values:

In [29]: rankdata(b, method='min')
Out[29]: array([6, 2, 9, 1, 2, 8, 5, 6, 2])

See the docstring for more options.

Question 5

I tried to extend both solution for arrays A of more than one dimension, supposing you process your array row-by-row (axis=1).

I extended the first code with a loop on rows; probably it can be improved

temp = A.argsort(axis=1)
rank = np.empty_like(temp)
rangeA = np.arange(temp.shape[1])
for iRow in xrange(temp.shape[0]): 
    rank[iRow, temp[iRow,:]] = rangeA

And the second one, following k.rooijers suggestion, becomes:

temp = A.argsort(axis=1)
rank = temp.argsort(axis=1)

I randomly generated 400 arrays with shape (1000,100); the first code took about 7.5, the second one 3.8.

Question 6

For a vectorized version of an averaged rank, see below. I love np.unique, it really widens the scope of what code can and cannot be efficiently vectorized. Aside from avoiding python for-loops, this approach also avoids the implicit double loop over ‘a’.

import numpy as np

a = np.array( [4,1,6,8,4,1,6])

a = np.array([4,2,7,2,1])
rank = a.argsort().argsort()

unique, inverse = np.unique(a, return_inverse = True)

unique_rank_sum = np.zeros_like(unique)
np.add.at(unique_rank_sum, inverse, rank)
unique_count = np.zeros_like(unique)
np.add.at(unique_count, inverse, 1)

unique_rank_mean = unique_rank_sum.astype(np.float) / unique_count

rank_mean = unique_rank_mean[inverse]

print rank_mean

Question 7

Apart from the elegance and shortness of solutions, there is also the question of performance. Here is a little benchmark:

import numpy as np
from scipy.stats import rankdata
l = list(reversed(range(1000)))

%%timeit -n10000 -r5
x = (rankdata(l) - 1).astype(int)
>>> 128 µs ± 2.72 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)

%%timeit -n10000 -r5
a = np.array(l)
r = a.argsort().argsort()
>>> 69.1 µs ± 464 ns per loop (mean ± std. dev. of 5 runs, 10000 loops each)

%%timeit -n10000 -r5
a = np.array(l)
temp = a.argsort()
r = np.empty_like(temp)
r[temp] = np.arange(len(a))
>>> 63.7 µs ± 1.27 µs per loop (mean ± std. dev. of 5 runs, 10000 loops each)

Question 8

Use argsort() twice will do it:

>>> array = [4,2,7,1]
>>> ranks = numpy.array(array).argsort().argsort()
>>> ranks
array([2, 1, 3, 0])

Question 9

I tried the above methods, but failed because I had many zeores. Yes, even with floats duplicate items may be important.

So I wrote a modified 1D solution by adding a tie-checking step:

def ranks (v):
    import numpy as np
    t = np.argsort(v)
    r = np.empty(len(v),int)
    r[t] = np.arange(len(v))
    for i in xrange(1, len(r)):
        if v[t[i]] <= v[t[i-1]]: r[t[i]] = r[t[i-1]]
    return r

# test it
print sorted(zip(ranks(v), v))

I believe it’s as efficient as it can be.

Question 10

I liked the method by k.rooijers, but as rcoup wrote, repeated numbers are ranked according to array position. This was no good for me, so I modified the version to postprocess the ranks and merge any repeated numbers into a combined average rank:

import numpy as np
a = np.array([4,2,7,2,1])
r = np.array(a.argsort().argsort(), dtype=float)
f = a==a
for i in xrange(len(a)):
   if not f[i]: continue
   s = a == a[i]
   ls = np.sum(s)
   if ls > 1:
      tr = np.sum(r[s])
      r[s] = float(tr)/ls
   f[s] = False

print r  # array([ 3. ,  1.5,  4. ,  1.5,  0. ])

I hope this might help others too, I tried to find anothers solution to this, but couldn’t find any…

Question 11

argsort and slice are symmetry operations.

try slice twice instead of argsort twice. since slice is faster than argsort

array = numpy.array([4,2,7,1])
order = array.argsort()
ranks = np.arange(array.shape[0])[order][order]

Question 12

More general version of one of the answers:

In [140]: x = np.random.randn(10, 3)

In [141]: i = np.argsort(x, axis=0)

In [142]: ranks = np.empty_like(i)

In [143]: np.put_along_axis(ranks, i, np.repeat(np.arange(x.shape[0])[:,None], x.shape[1], axis=1), axis=0)

See How to use numpy.argsort() as indices in more than 2 dimensions? to generalize to more dims.

Question 13

In Python 2.x, I could pass custom function to sorted and .sort functions

>>> x=['kar','htar','har','ar']
>>>
>>> sorted(x)
['ar', 'har', 'htar', 'kar']
>>> 
>>> sorted(x,cmp=customsort)
['kar', 'htar', 'har', 'ar']

Because, in My language, consonents are comes with this order

"k","kh",....,"ht",..."h",...,"a"

But In Python 3.x, looks like I could not pass cmp keyword

>>> sorted(x,cmp=customsort)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: 'cmp' is an invalid keyword argument for this function

Is there any alternatives or should I write my own sorted function too?

_{Note: I simplified by using “k”, “kh”, etc. Actual characters are Unicodes and even more complicated, sometimes there is vowels comes before and after consonents, I’ve done custom comparison function, So that part is ok. Only the problem is I could not pass my custom comparison function to sorted or .sort}

Question 14

Use the key argument (and follow the recipe on how to convert your old cmp function to a key function).

functools has a function cmp_to_key mentioned at docs.python.org/3.6/library/functools.html#functools.cmp_to_key

Question 15

Use the key keyword and functools.cmp_to_key to transform your comparison function:

sorted(x, key=functools.cmp_to_key(customsort))

Question 16

Instead of a customsort(), you need a function that translates each word into something that Python already knows how to sort. For example, you could translate each word into a list of numbers where each number represents where each letter occurs in your alphabet. Something like this:

my_alphabet = ['a', 'b', 'c']

def custom_key(word):
   numbers = []
   for letter in word:
      numbers.append(my_alphabet.index(letter))
   return numbers

x=['cbaba', 'ababa', 'bbaa']
x.sort(key=custom_key)

Since your language includes multi-character letters, your custom_key function will obviously need to be more complicated. That should give you the general idea though.

Question 17

A complete python3 cmp_to_key lambda example:

from functools import cmp_to_key

nums = [28, 50, 17, 12, 121]
nums.sort(key=cmp_to_key(lambda x, y: 1 if str(x)+str(y) < str(y)+str(x) else -1))

compare to common object sorting:

class NumStr:
    def __init__(self, v):
        self.v = v
    def __lt__(self, other):
        return self.v + other.v < other.v + self.v


A = [NumStr("12"), NumStr("121")]
A.sort()
print(A[0].v, A[1].v)

A = [obj.v for obj in A]
print(A)

Question 18

I don’t know if this will help, but you may check out the locale module. It looks like you can set the locale to your language and use locale.strcoll to compare strings using your language’s sorting rules.

Question 19

Use the key argument instead. It takes a function that takes the value being processed and returns a single value giving the key to use to sort by.

sorted(x, key=somekeyfunc)

Question 20

I want to group my dataframe by two columns and then sort the aggregated results within the groups.

In [167]:
df

Out[167]:
count   job source
0   2   sales   A
1   4   sales   B
2   6   sales   C
3   3   sales   D
4   7   sales   E
5   5   market  A
6   3   market  B
7   2   market  C
8   4   market  D
9   1   market  E

In [168]:
df.groupby(['job','source']).agg({'count':sum})

Out[168]:
            count
job     source  
market  A   5
        B   3
        C   2
        D   4
        E   1
sales   A   2
        B   4
        C   6
        D   3
        E   7

I would now like to sort the count column in descending order within each of the groups. And then take only the top three rows. To get something like:

            count
job     source  
market  A   5
        D   4
        B   3
sales   E   7
        C   6
        B   4

Question 21

What you want to do is actually again a groupby (on the result of the first groupby): sort and take the first three elements per group.

Starting from the result of the first groupby:

In [60]: df_agg = df.groupby(['job','source']).agg({'count':sum})

We group by the first level of the index:

In [63]: g = df_agg['count'].groupby('job', group_keys=False)

Then we want to sort (‘order’) each group and take the first three elements:

In [64]: res = g.apply(lambda x: x.sort_values(ascending=False).head(3))

However, for this, there is a shortcut function to do this, nlargest:

In [65]: g.nlargest(3)
Out[65]:
job     source
market  A         5
        D         4
        B         3
sales   E         7
        C         6
        B         4
dtype: int64

So in one go, this looks like:

df_agg['count'].groupby('job', group_keys=False).nlargest(3)

Question 22

You could also just do it in one go, by doing the sort first and using head to take the first 3 of each group.

In[34]: df.sort_values(['job','count'],ascending=False).groupby('job').head(3)

Out[35]: 
   count     job source
4      7   sales      E
2      6   sales      C
1      4   sales      B
5      5  market      A
8      4  market      D
6      3  market      B

Question 23

Here’s other example of taking top 3 on sorted order, and sorting within the groups:

In [43]: import pandas as pd                                                                                                                                                       

In [44]:  df = pd.DataFrame({"name":["Foo", "Foo", "Baar", "Foo", "Baar", "Foo", "Baar", "Baar"], "count_1":[5,10,12,15,20,25,30,35], "count_2" :[100,150,100,25,250,300,400,500]})

In [45]: df                                                                                                                                                                        
Out[45]: 
   count_1  count_2  name
0        5      100   Foo
1       10      150   Foo
2       12      100  Baar
3       15       25   Foo
4       20      250  Baar
5       25      300   Foo
6       30      400  Baar
7       35      500  Baar


### Top 3 on sorted order:
In [46]: df.groupby(["name"])["count_1"].nlargest(3)                                                                                                                               
Out[46]: 
name   
Baar  7    35
      6    30
      4    20
Foo   5    25
      3    15
      1    10
dtype: int64


### Sorting within groups based on column "count_1":
In [48]: df.groupby(["name"]).apply(lambda x: x.sort_values(["count_1"], ascending = False)).reset_index(drop=True)
Out[48]: 
   count_1  count_2  name
0       35      500  Baar
1       30      400  Baar
2       20      250  Baar
3       12      100  Baar
4       25      300   Foo
5       15       25   Foo
6       10      150   Foo
7        5      100   Foo

Question 24

Try this Instead

simple way to do ‘groupby’ and sorting in descending order

df.groupby(['companyName'])['overallRating'].sum().sort_values(ascending=False).head(20)

Question 25

If you don’t need to sum a column, then use @tvashtar’s answer. If you do need to sum, then you can use @joris’ answer or this one which is very similar to it.

df.groupby(['job']).apply(lambda x: (x.groupby('source')
                                      .sum()
                                      .sort_values('count', ascending=False))
                                     .head(3))

Question 26

I am a bit confused regarding data structure in python; (),[], and {}. I am trying to sort a simple list, probably since I cannot identify the type of data I am failing to sort it.

My list is simple: ['Stem', 'constitute', 'Sedge', 'Eflux', 'Whim', 'Intrigue']

My question is what type of data this is, and how to sort the words alphabetically?

Question 27

[] denotes a list, () denotes a tuple and {} denotes a dictionary. You should take a look at the official Python tutorial as these are the very basics of programming in Python.

What you have is a list of strings. You can sort it like this:

In [1]: lst = ['Stem', 'constitute', 'Sedge', 'Eflux', 'Whim', 'Intrigue']

In [2]: sorted(lst)
Out[2]: ['Eflux', 'Intrigue', 'Sedge', 'Stem', 'Whim', 'constitute']

As you can see, words that start with an uppercase letter get preference over those starting with a lowercase letter. If you want to sort them independently, do this:

In [4]: sorted(lst, key=str.lower)
Out[4]: ['constitute', 'Eflux', 'Intrigue', 'Sedge', 'Stem', 'Whim']

You can also sort the list in reverse order by doing this:

In [12]: sorted(lst, reverse=True)
Out[12]: ['constitute', 'Whim', 'Stem', 'Sedge', 'Intrigue', 'Eflux']

In [13]: sorted(lst, key=str.lower, reverse=True)
Out[13]: ['Whim', 'Stem', 'Sedge', 'Intrigue', 'Eflux', 'constitute']

Please note: If you work with Python 3, then str is the correct data type for every string that contains human-readable text. However, if you still need to work with Python 2, then you might deal with unicode strings which have the data type unicode in Python 2, and not str. In such a case, if you have a list of unicode strings, you must write key=unicode.lower instead of key=str.lower.

Question 28

Python has a built-in function called sorted, which will give you a sorted list from any iterable you feed it (such as a list ([1,2,3]); a dict ({1:2,3:4}, although it will just return a sorted list of the keys; a set ({1,2,3,4); or a tuple ((1,2,3,4))).

>>> x = [3,2,1]
>>> sorted(x)
[1, 2, 3]
>>> x
[3, 2, 1]

Lists also have a sort method that will perform the sort in-place (x.sort() returns None but changes the x object) .

>>> x = [3,2,1]
>>> x.sort()
>>> x
[1, 2, 3]

Both also take a key argument, which should be a callable (function/lambda) you can use to change what to sort by.
For example, to get a list of (key,value)-pairs from a dict which is sorted by value you can use the following code:

>>> x = {3:2,2:1,1:5}
>>> sorted(x.items(), key=lambda kv: kv[1])  # Items returns a list of `(key,value)`-pairs
[(2, 1), (3, 2), (1, 5)]

Question 29

You’re dealing with a python list, and sorting it is as easy as doing this.

my_list = ['Stem', 'constitute', 'Sedge', 'Eflux', 'Whim', 'Intrigue']
my_list.sort()

Question 30

You can use built-in sorted function.

print sorted(['Stem', 'constitute', 'Sedge', 'Eflux', 'Whim', 'Intrigue'])

Question 31

ListName.sort() will sort it alphabetically. You can add reverse=False/True in the brackets to reverse the order of items: ListName.sort(reverse=False)

Question 32

>>> a = ()
>>> type(a)
<type 'tuple'>
>>> a = []
>>> type(a)
<type 'list'>
>>> a = {}
>>> type(a)
<type 'dict'>
>>> a =  ['Stem', 'constitute', 'Sedge', 'Eflux', 'Whim', 'Intrigue'] 
>>> a.sort()
>>> a
['Eflux', 'Intrigue', 'Sedge', 'Stem', 'Whim', 'constitute']
>>>

Question 33

Given a list of integers, I want to find which number is the closest to a number I give in input:

>>> myList = [4, 1, 88, 44, 3]
>>> myNumber = 5
>>> takeClosest(myList, myNumber)
...
4

Is there any quick way to do this?

Question 34

If we are not sure that the list is sorted, we could use the built-in min() function, to find the element which has the minimum distance from the specified number.

>>> min(myList, key=lambda x:abs(x-myNumber))
4

Note that it also works with dicts with int keys, like {1: "a", 2: "b"}. This method takes O(n) time.

If the list is already sorted, or you could pay the price of sorting the array once only, use the bisection method illustrated in @Lauritz’s answer which only takes O(log n) time (note however checking if a list is already sorted is O(n) and sorting is O(n log n).)

Question 35

I’ll rename the function take_closest to conform with PEP8 naming conventions.

If you mean quick-to-execute as opposed to quick-to-write, min should not be your weapon of choice, except in one very narrow use case. The min solution needs to examine every number in the list and do a calculation for each number. Using bisect.bisect_left instead is almost always faster.

The “almost” comes from the fact that bisect_left requires the list to be sorted to work. Hopefully, your use case is such that you can sort the list once and then leave it alone. Even if not, as long as you don’t need to sort before every time you call take_closest, the bisect module will likely come out on top. If you’re in doubt, try both and look at the real-world difference.

from bisect import bisect_left

def take_closest(myList, myNumber):
    """
    Assumes myList is sorted. Returns closest value to myNumber.

    If two numbers are equally close, return the smallest number.
    """
    pos = bisect_left(myList, myNumber)
    if pos == 0:
        return myList[0]
    if pos == len(myList):
        return myList[-1]
    before = myList[pos - 1]
    after = myList[pos]
    if after - myNumber < myNumber - before:
       return after
    else:
       return before

Bisect works by repeatedly halving a list and finding out which half myNumber has to be in by looking at the middle value. This means it has a running time of O(log n) as opposed to the O(n) running time of the highest voted answer. If we compare the two methods and supply both with a sorted myList, these are the results:

$ python -m timeit -s "
from closest import take_closest
from random import randint
a = range(-1000, 1000, 10)" "take_closest(a, randint(-1100, 1100))"

100000 loops, best of 3: 2.22 usec per loop

$ python -m timeit -s "
from closest import with_min
from random import randint
a = range(-1000, 1000, 10)" "with_min(a, randint(-1100, 1100))"

10000 loops, best of 3: 43.9 usec per loop

So in this particular test, bisect is almost 20 times faster. For longer lists, the difference will be greater.

What if we level the playing field by removing the precondition that myList must be sorted? Let’s say we sort a copy of the list every time take_closest is called, while leaving the min solution unaltered. Using the 200-item list in the above test, the bisect solution is still the fastest, though only by about 30%.

This is a strange result, considering that the sorting step is O(n log(n))! The only reason min is still losing is that the sorting is done in highly optimalized c code, while min has to plod along calling a lambda function for every item. As myList grows in size, the min solution will eventually be faster. Note that we had to stack everything in its favour for the min solution to win.

Question 36

>>> takeClosest = lambda num,collection:min(collection,key=lambda x:abs(x-num))
>>> takeClosest(5,[4,1,88,44,3])
4

A lambda is a special way of writing an “anonymous” function (a function that doesn’t have a name). You can assign it any name you want because a lambda is an expression.

The “long” way of writing the the above would be:

def takeClosest(num,collection):
   return min(collection,key=lambda x:abs(x-num))

Question 37

def closest(list, Number):
    aux = []
    for valor in list:
        aux.append(abs(Number-valor))

    return aux.index(min(aux))

This code will give you the index of the closest number of Number in the list.

The solution given by KennyTM is the best overall, but in the cases you cannot use it (like brython), this function will do the work

Question 38

Iterate over the list and compare the current closest number with abs(currentNumber - myNumber):

def takeClosest(myList, myNumber):
    closest = myList[0]
    for i in range(1, len(myList)):
        if abs(i - myNumber) < closest:
            closest = i
    return closest

Question 39

It’s important to note that Lauritz’s suggestion idea of using bisect does not actually find the closest value in MyList to MyNumber. Instead, bisect finds the next value in order after MyNumber in MyList. So in OP’s case you’d actually get the position of 44 returned instead of the position of 4.

>>> myList = [1, 3, 4, 44, 88] 
>>> myNumber = 5
>>> pos = (bisect_left(myList, myNumber))
>>> myList[pos]
...
44

To get the value that’s closest to 5 you could try converting the list to an array and using argmin from numpy like so.

>>> import numpy as np
>>> myNumber = 5   
>>> myList = [1, 3, 4, 44, 88] 
>>> myArray = np.array(myList)
>>> pos = (np.abs(myArray-myNumber)).argmin()
>>> myArray[pos]
...
4

I don’t know how fast this would be though, my guess would be “not very”.

Question 40

Expanding upon Gustavo Lima’s answer. The same thing can be done without creating an entirely new list. The values in the list can be replaced with the differentials as the FOR loop progresses.

def f_ClosestVal(v_List, v_Number):
"""Takes an unsorted LIST of INTs and RETURNS INDEX of value closest to an INT"""
for _index, i in enumerate(v_List):
    v_List[_index] = abs(v_Number - i)
return v_List.index(min(v_List))

myList = [1, 88, 44, 4, 4, -2, 3]
v_Num = 5
print(f_ClosestVal(myList, v_Num)) ## Gives "3," the index of the first "4" in the list.

Question 41

If I may add to @Lauritz’s answer

In order not to have a run error don’t forget to add a condition before the bisect_left line:

if (myNumber > myList[-1] or myNumber < myList[0]):
    return False

so the full code will look like:

from bisect import bisect_left

def takeClosest(myList, myNumber):
    """
    Assumes myList is sorted. Returns closest value to myNumber.
    If two numbers are equally close, return the smallest number.
    If number is outside of min or max return False
    """
    if (myNumber > myList[-1] or myNumber < myList[0]):
        return False
    pos = bisect_left(myList, myNumber)
    if pos == 0:
            return myList[0]
    if pos == len(myList):
            return myList[-1]
    before = myList[pos - 1]
    after = myList[pos]
    if after - myNumber < myNumber - before:
       return after
    else:
       return before

Question 42

I’ve been able to verify that the findUniqueWords does result in a sorted list. However, it does not return the list. Why?

def findUniqueWords(theList):
    newList = []
    words = []

    # Read a line at a time
    for item in theList:

        # Remove any punctuation from the line
        cleaned = cleanUp(item)

        # Split the line into separate words
        words = cleaned.split()

        # Evaluate each word
        for word in words:

            # Count each unique word
            if word not in newList:
                newList.append(word)

    answer = newList.sort()
    return answer

Question 43

list.sort sorts the list in place, i.e. it doesn’t return a new list. Just write

newList.sort()
return newList

Question 44

The problem is here:

answer = newList.sort()

sort does not return the sorted list; rather, it sorts the list in place.

Use:

answer = sorted(newList)

Question 45

Here is an email from Guido van Rossum in Python’s dev list explaining why he choose not to return self on operations that affects the object and don’t return a new one.

This comes from a coding style (popular in various other languages, I believe especially Lisp revels in it) where a series of side effects on a single object can be chained like this:
 x.compress().chop(y).sort(z)
which would be the same as
  x.compress()
  x.chop(y)
  x.sort(z)
I find the chaining form a threat to readability; it requires that the reader must be intimately familiar with each of the methods. The second form makes it clear that each of these calls acts on the same object, and so even if you don’t know the class and its methods very well, you can understand that the second and third call are applied to x (and that all calls are made for their side-effects), and not to something else.

I’d like to reserve chaining for operations that return new values, like string processing operations:
 y = x.rstrip("\n").split(":").lower()

Question 46

Python habitually returns None from functions and methods that mutate the data, such as list.sort, list.append, and random.shuffle, with the idea being that it hints to the fact that it was mutating.

If you want to take an iterable and return a new, sorted list of its items, use the sorted builtin function.

Question 47

Python has two kinds of sorts: a sort method (or “member function”) and a sort function. The sort method operates on the contents of the object named — think of it as an action that the object is taking to re-order itself. The sort function is an operation over the data represented by an object and returns a new object with the same contents in a sorted order.

Given a list of integers named l the list itself will be reordered if we call l.sort():

>>> l = [1, 5, 2341, 467, 213, 123]
>>> l.sort()
>>> l
[1, 5, 123, 213, 467, 2341]

This method has no return value. But what if we try to assign the result of l.sort()?

>>> l = [1, 5, 2341, 467, 213, 123]
>>> r = l.sort()
>>> print(r)
None

r now equals actually nothing. This is one of those weird, somewhat annoying details that a programmer is likely to forget about after a period of absence from Python (which is why I am writing this, so I don’t forget again).

The function sorted(), on the other hand, will not do anything to the contents of l, but will return a new, sorted list with the same contents as l:

>>> l = [1, 5, 2341, 467, 213, 123]
>>> r = sorted(l)
>>> l
[1, 5, 2341, 467, 213, 123]
>>> r
[1, 5, 123, 213, 467, 2341]

Be aware that the returned value is not a deep copy, so be cautious about side-effecty operations over elements contained within the list as usual:

>>> spam = [8, 2, 4, 7]
>>> eggs = [3, 1, 4, 5]
>>> l = [spam, eggs]
>>> r = sorted(l)
>>> l
[[8, 2, 4, 7], [3, 1, 4, 5]]
>>> r
[[3, 1, 4, 5], [8, 2, 4, 7]]
>>> spam.sort()
>>> eggs.sort()
>>> l
[[2, 4, 7, 8], [1, 3, 4, 5]]
>>> r
[[1, 3, 4, 5], [2, 4, 7, 8]]

Question 48

To understand why it does not return the list:

sort() doesn’t return any value while the sort() method just sorts the elements of a given list in a specific order – ascending or descending without returning any value.

So problem is with answer = newList.sort() where answer is none.

Instead you can just do return newList.sort().

The syntax of the sort() method is:

list.sort(key=..., reverse=...)

Alternatively, you can also use Python’s in-built function sorted() for the same purpose.

sorted(list, key=..., reverse=...)

Note: The simplest difference between sort() and sorted() is: sort() doesn’t return any value while, sorted() returns an iterable list.

So in your case answer = sorted(newList).

Question 49

you can use sorted() method if you want it to return the sorted list. It’s more convenient.

l1 = []
n = int(input())

for i in range(n):
  user = int(input())
  l1.append(user)
sorted(l1,reverse=True)

list.sort() method modifies the list in-place and returns None.

if you still want to use sort you can do this.

l1 = []
n = int(input())

for i in range(n):
  user = int(input())
  l1.append(user)
l1.sort(reverse=True)
print(l1)

Question 50

I don’t quite understand the syntax behind the sorted() argument:

key=lambda variable: variable[0]

Isn’t lambda arbitrary? Why is variable stated twice in what looks like a dict?

Question 51

key is a function that will be called to transform the collection’s items before they are compared. The parameter passed to key must be something that is callable.

The use of lambda creates an anonymous function (which is callable). In the case of sorted the callable only takes one parameters. Python’s lambda is pretty simple. It can only do and return one thing really.

The syntax of lambda is the word lambda followed by the list of parameter names then a single block of code. The parameter list and code block are delineated by colon. This is similar to other constructs in python as well such as while, for, if and so on. They are all statements that typically have a code block. Lambda is just another instance of a statement with a code block.

We can compare the use of lambda with that of def to create a function.

adder_lambda = lambda parameter1,parameter2: parameter1+parameter2
def adder_regular(parameter1, parameter2): return parameter1+parameter2

lambda just gives us a way of doing this without assigning a name. Which makes it great for using as a parameter to a function.

variable is used twice here because on the left hand of the colon it is the name of a parameter and on the right hand side it is being used in the code block to compute something.

Question 52

I think all of the answers here cover the core of what the lambda function does in the context of sorted() quite nicely, however I still feel like a description that leads to an intuitive understanding is lacking, so here is my two cents.

For the sake of completeness, I’ll state the obvious up front: sorted() returns a list of sorted elements and if we want to sort in a particular way or if we want to sort a complex list of elements (e.g. nested lists or a list of tuples) we can invoke the key argument.

For me, the intuitive understanding of the key argument, why it has to be callable, and the use of lambda as the (anonymous) callable function to accomplish this comes in two parts.

Using lamba ultimately means you don’t have to write (define) an entire function, like the one sblom provided an example of. Lambda functions are created, used, and immediately destroyed – so they don’t funk up your code with more code that will only ever be used once. This, as I understand it, is the core utility of the lambda function and its applications for such roles are broad. Its syntax is purely by convention, which is in essence the nature of programmatic syntax in general. Learn the syntax and be done with it.

Lambda syntax is as follows:

lambda input_variable(s): tasty one liner

e.g.

In [1]: f00 = lambda x: x/2

In [2]: f00(10)
Out[2]: 5.0

In [3]: (lambda x: x/2)(10)
Out[3]: 5.0

In [4]: (lambda x, y: x / y)(10, 2)
Out[4]: 5.0

In [5]: (lambda: 'amazing lambda')() # func with no args!
Out[5]: 'amazing lambda'

The idea behind the key argument is that it should take in a set of instructions that will essentially point the ‘sorted()’ function at those list elements which should used to sort by. When it says key=, what it really means is: As I iterate through the list one element at a time (i.e. for e in list), I’m going to pass the current element to the function I provide in the key argument and use that to create a transformed list which will inform me on the order of final sorted list.

Check it out:

mylist = [3,6,3,2,4,8,23]
sorted(mylist, key=WhatToSortBy)

Base example:

sorted(mylist)

[2, 3, 3, 4, 6, 8, 23] # all numbers are in order from small to large.

Example 1:

mylist = [3,6,3,2,4,8,23]
sorted(mylist, key=lambda x: x%2==0)

[3, 3, 23, 6, 2, 4, 8] # Does this sorted result make intuitive sense to you?

Notice that my lambda function told sorted to check if (e) was even or odd before sorting.

BUT WAIT! You may (or perhaps should) be wondering two things – first, why are my odds coming before my evens (since my key value seems to be telling my sorted function to prioritize evens by using the mod operator in x%2==0). Second, why are my evens out of order? 2 comes before 6 right? By analyzing this result, we’ll learn something deeper about how the sorted() ‘key’ argument works, especially in conjunction with the anonymous lambda function.

Firstly, you’ll notice that while the odds come before the evens, the evens themselves are not sorted. Why is this?? Lets read the docs:

Key Functions Starting with Python 2.4, both list.sort() and sorted() added a key parameter to specify a function to be called on each list element prior to making comparisons.

We have to do a little bit of reading between the lines here, but what this tells us is that the sort function is only called once, and if we specify the key argument, then we sort by the value that key function points us to.

So what does the example using a modulo return? A boolean value: True == 1, False == 0. So how does sorted deal with this key? It basically transforms the original list to a sequence of 1s and 0s.

[3,6,3,2,4,8,23] becomes [0,1,0,1,1,1,0]

Now we’re getting somewhere. What do you get when you sort the transformed list?

[0,0,0,1,1,1,1]

Okay, so now we know why the odds come before the evens. But the next question is: Why does the 6 still come before the 2 in my final list? Well that’s easy – its because sorting only happens once! i.e. Those 1s still represent the original list values, which are in their original positions relative to each other. Since sorting only happens once, and we don’t call any kind of sort function to order the original even values from low to high, those values remain in their original order relative to one another.

The final question is then this: How do I think conceptually about how the order of my boolean values get transformed back in to the original values when I print out the final sorted list?

Sorted() is a built-in method that (fun fact) uses a hybrid sorting algorithm called Timsort that combines aspects of merge sort and insertion sort. It seems clear to me that when you call it, there is a mechanic that holds these values in memory and bundles them with their boolean identity (mask) determined by (…!) the lambda function. The order is determined by their boolean identity calculated from the lambda function, but keep in mind that these sublists (of one’s and zeros) are not themselves sorted by their original values. Hence, the final list, while organized by Odds and Evens, is not sorted by sublist (the evens in this case are out of order). The fact that the odds are ordered is because they were already in order by coincidence in the original list. The takeaway from all this is that when lambda does that transformation, the original order of the sublists are retained.

So how does this all relate back to the original question, and more importantly, our intuition on how we should implement sorted() with its key argument and lambda?

That lambda function can be thought of as a pointer that points to the values we need to sort by, whether its a pointer mapping a value to its boolean transformed by the lambda function, or if its a particular element in a nested list, tuple, dict, etc., again determined by the lambda function.

Lets try and predict what happens when I run the following code.

mylist = [(3, 5, 8), (6, 2, 8), ( 2, 9, 4), (6, 8, 5)]
sorted(mylist, key=lambda x: x[1])

My sorted call obviously says, “Please sort this list”. The key argument makes that a little more specific by saying, for each element (x) in mylist, return index 1 of that element, then sort all of the elements of the original list ‘mylist’ by the sorted order of the list calculated by the lambda function. Since we have a list of tuples, we can return an indexed element from that tuple. So we get:

[(6, 2, 8), (3, 5, 8), (6, 8, 5), (2, 9, 4)]

Run that code, and you’ll find that this is the order. Try indexing a list of integers and you’ll find that the code breaks.

This was a long winded explanation, but I hope this helps to ‘sort’ your intuition on the use of lambda functions as the key argument in sorted() and beyond.

Question 53

lambda is a Python keyword that is used to generate anonymous functions.

>>> (lambda x: x+2)(3)
5

Question 54

The variable left of the : is a parameter name. The use of variable on the right is making use of the parameter.

Means almost exactly the same as:

def some_method(variable):
  return variable[0]

Question 55

One more example of usage sorted() function with key=lambda. Let’s consider you have a list of tuples. In each tuple you have a brand, model and weight of the car and you want to sort this list of tuples by brand, model or weight. You can do it with lambda.

cars = [('citroen', 'xsara', 1100), ('lincoln', 'navigator', 2000), ('bmw', 'x5', 1700)]

print(sorted(cars, key=lambda car: car[0]))
print(sorted(cars, key=lambda car: car[1]))
print(sorted(cars, key=lambda car: car[2]))

Results:

[('bmw', 'x5', '1700'), ('citroen', 'xsara', 1100), ('lincoln', 'navigator', 2000)]
[('lincoln', 'navigator', 2000), ('bmw', 'x5', '1700'), ('citroen', 'xsara', 1100)]
[('citroen', 'xsara', 1100), ('bmw', 'x5', 1700), ('lincoln', 'navigator', 2000)]

Question 56

lambda is an anonymous function, not an arbitrary function. The parameter being accepted would be the variable you’re working with, and the column in which you’re sorting it on.

Question 57

Since the usage of lambda was asked in the context of sorted(), take a look at this as well https://wiki.python.org/moin/HowTo/Sorting/#Key_Functions

Question 58

Just to rephrase, the key (Optional. A Function to execute to decide the order. Default is None) in sorted functions expects a function and you use lambda.

To define lambda, you specify the object property you want to sort and python’s built-in sorted function will automatically take care of it.

If you want to sort by multiple properties then assign key = lambda x: (property1, property2).

To specify order-by, pass reverse= true as the third argument(Optional. A Boolean. False will sort ascending, True will sort descending. Default is False) of sorted function.

Question 59

Simple and not time consuming answer with an example relevant to the question asked Follow this example:

 user = [{"name": "Dough", "age": 55}, 
            {"name": "Ben", "age": 44}, 
            {"name": "Citrus", "age": 33},
            {"name": "Abdullah", "age":22},
            ]
    print(sorted(user, key=lambda el: el["name"]))
    print(sorted(user, key= lambda y: y["age"]))

Look at the names in the list, they starts with D, B, C and A. And if you notice the ages, they are 55, 44, 33 and 22. The first print code

print(sorted(user, key=lambda el: el["name"]))

Results to:

[{'name': 'Abdullah', 'age': 22}, 
{'name': 'Ben', 'age': 44}, 
{'name': 'Citrus', 'age': 33}, 
{'name': 'Dough', 'age': 55}]

sorts the name, because by key=lambda el: el[“name”] we are sorting the names and the names return in alphabetical order.

The second print code

print(sorted(user, key= lambda y: y["age"]))

Result:

[{'name': 'Abdullah', 'age': 22},
 {'name': 'Citrus', 'age': 33},
 {'name': 'Ben', 'age': 44}, 
 {'name': 'Dough', 'age': 55}]

sorts by age, and hence the list returns by ascending order of age.

Try this code for better understanding.

Question 60

Say I have two lists:

list1 = [3, 2, 4, 1, 1]
list2 = ['three', 'two', 'four', 'one', 'one2']

If I run list1.sort(), it’ll sort it to [1,1,2,3,4] but is there a way to get list2 in sync as well (so I can say item 4 belongs to 'three')? So, the expected output would be:

list1 = [1, 1, 2, 3, 4]
list2 = ['one', 'one2', 'two', 'three', 'four']

My problem is I have a pretty complex program that is working fine with lists but I sort of need to start referencing some data. I know this is a perfect situation for dictionaries but I’m trying to avoid dictionaries in my processing because I do need to sort the key values (if I must use dictionaries I know how to use them).

Basically the nature of this program is, the data comes in a random order (like above), I need to sort it, process it and then send out the results (order doesn’t matter but users need to know which result belongs to which key). I thought about putting it in a dictionary first, then sorting list one but I would have no way of differentiating of items in the with the same value if order is not maintained (it may have an impact when communicating the results to users). So ideally, once I get the lists I would rather figure out a way to sort both lists together. Is this possible?

Question 61

One classic approach to this problem is to use the “decorate, sort, undecorate” idiom, which is especially simple using python’s built-in zip function:

>>> list1 = [3,2,4,1, 1]
>>> list2 = ['three', 'two', 'four', 'one', 'one2']
>>> list1, list2 = zip(*sorted(zip(list1, list2)))
>>> list1
(1, 1, 2, 3, 4)
>>> list2 
('one', 'one2', 'two', 'three', 'four')

These of course are no longer lists, but that’s easily remedied, if it matters:

>>> list1, list2 = (list(t) for t in zip(*sorted(zip(list1, list2))))
>>> list1
[1, 1, 2, 3, 4]
>>> list2
['one', 'one2', 'two', 'three', 'four']

It’s worth noting that the above may sacrifice speed for terseness; the in-place version, which takes up 3 lines, is a tad faster on my machine for small lists:

>>> %timeit zip(*sorted(zip(list1, list2)))
100000 loops, best of 3: 3.3 us per loop
>>> %timeit tups = zip(list1, list2); tups.sort(); zip(*tups)
100000 loops, best of 3: 2.84 us per loop

On the other hand, for larger lists, the one-line version could be faster:

>>> %timeit zip(*sorted(zip(list1, list2)))
100 loops, best of 3: 8.09 ms per loop
>>> %timeit tups = zip(list1, list2); tups.sort(); zip(*tups)
100 loops, best of 3: 8.51 ms per loop

As Quantum7 points out, JSF’s suggestion is a bit faster still, but it will probably only ever be a little bit faster, because Python uses the very same DSU idiom internally for all key-based sorts. It’s just happening a little closer to the bare metal. (This shows just how well optimized the zip routines are!)

I think the zip-based approach is more flexible and is a little more readable, so I prefer it.

Question 62

You can sort indexes using values as keys:

indexes = range(len(list1))
indexes.sort(key=list1.__getitem__)

To get sorted lists given sorted indexes:

sorted_list1 = map(list1.__getitem__, indexes)
sorted_list2 = map(list2.__getitem__, indexes)

In your case you shouldn’t have list1, list2 but rather a single list of pairs:

data = [(3, 'three'), (2, 'two'), (4, 'four'), (1, 'one'), (1, 'one2')]

It is easy to create; it is easy to sort in Python:

data.sort() # sort using a pair as a key

Sort by the first value only:

data.sort(key=lambda pair: pair[0])

Question 63

I have used the answer given by senderle for a long time until I discovered np.argsort. Here is how it works.

# idx works on np.array and not lists.
list1 = np.array([3,2,4,1])
list2 = np.array(["three","two","four","one"])
idx   = np.argsort(list1)

list1 = np.array(list1)[idx]
list2 = np.array(list2)[idx]

I find this solution more intuitive, and it works really well. The perfomance:

def sorting(l1, l2):
    # l1 and l2 has to be numpy arrays
    idx = np.argsort(l1)
    return l1[idx], l2[idx]

# list1 and list2 are np.arrays here...
%timeit sorting(list1, list2)
100000 loops, best of 3: 3.53 us per loop

# This works best when the lists are NOT np.array
%timeit zip(*sorted(zip(list1, list2)))
100000 loops, best of 3: 2.41 us per loop

# 0.01us better for np.array (I think this is negligible)
%timeit tups = zip(list1, list2); tups.sort(); zip(*tups)
100000 loops, best for 3 loops: 1.96 us per loop

Even though np.argsort isn’t the fastest one, I find it easier to use.

Question 64

Schwartzian transform. The built-in Python sorting is stable, so the two 1s don’t cause a problem.

>>> l1 = [3, 2, 4, 1, 1]
>>> l2 = ['three', 'two', 'four', 'one', 'second one']
>>> zip(*sorted(zip(l1, l2)))
[(1, 1, 2, 3, 4), ('one', 'second one', 'two', 'three', 'four')]

问题：使用Python / NumPy对数组中的项目进行排名，而无需对数组进行两次排序

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

问题：如何在Python 3中使用自定义比较功能？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：熊猫groupby排序

回答 0

回答 1

回答 2

回答 3

试试这个代替

执行“ groupby”并按降序排序的简单方法

Try this Instead

simple way to do ‘groupby’ and sorting in descending order

回答 4

问题：Python数据结构按字母顺序排序列表

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：从整数列表中，获取最接近给定值的数字

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

问题：为什么“ return list.sort（）”返回None，而不返回列表？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

问题：sorted（key = lambda：…）后面的语法

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

问题：如何以完全相同的方式对两个列表（相互引用）进行排序

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

问题：检查列表是否已排序的Python方法

回答 0

回答 1