Python 实用宝典

Question 1

How can I use a Python list (e.g. params = ['a',3.4,None]) as parameters to a function, e.g.:

def some_func(a_char,a_float,a_something):
   # do stuff

Question 2

You can do this using the splat operator:

some_func(*params)

This causes the function to receive each list item as a separate parameter. There’s a description here: http://docs.python.org/tutorial/controlflow.html#unpacking-argument-lists

Question 3

This has already been answered perfectly, but since I just came to this page and did not understand immediately I am just going to add a simple but complete example.

def some_func(a_char, a_float, a_something):
    print a_char

params = ['a', 3.4, None]
some_func(*params)

>> a

Question 4

Use an asterisk:

some_func(*params)

Question 5

You want the argument unpacking operator *.

Question 6

I want to find mean and standard deviation of 1st, 2nd,… digits of several (Z) lists. For example, I have

A_rank=[0.8,0.4,1.2,3.7,2.6,5.8]
B_rank=[0.1,2.8,3.7,2.6,5,3.4]
C_Rank=[1.2,3.4,0.5,0.1,2.5,6.1]
# etc (up to Z_rank )...

Now I want to take the mean and std of *_Rank[0], the mean and std of *_Rank[1], etc.
(ie: mean and std of the 1st digit from all the (A..Z)_rank lists;
the mean and std of the 2nd digit from all the (A..Z)_rank lists;
the mean and std of the 3rd digit…; etc).

Question 7

Since Python 3.4 / PEP450 there is a statistics module in the standard library, which has a method stdev for calculating the standard deviation of iterables like yours:

>>> A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
>>> import statistics
>>> statistics.stdev(A_rank)
2.0634114147853952

Question 8

I would put A_Rank et al into a 2D NumPy array, and then use numpy.mean() and numpy.std() to compute the means and the standard deviations:

In [17]: import numpy

In [18]: arr = numpy.array([A_rank, B_rank, C_rank])

In [20]: numpy.mean(arr, axis=0)
Out[20]: 
array([ 0.7       ,  2.2       ,  1.8       ,  2.13333333,  3.36666667,
        5.1       ])

In [21]: numpy.std(arr, axis=0)
Out[21]: 
array([ 0.45460606,  1.29614814,  1.37355985,  1.50628314,  1.15566239,
        1.2083046 ])

Question 9

Here’s some pure-Python code you can use to calculate the mean and standard deviation.

All code below is based on the statistics module in Python 3.4+.

def mean(data):
    """Return the sample arithmetic mean of data."""
    n = len(data)
    if n < 1:
        raise ValueError('mean requires at least one data point')
    return sum(data)/n # in Python 2 use sum(data)/float(n)

def _ss(data):
    """Return sum of square deviations of sequence data."""
    c = mean(data)
    ss = sum((x-c)**2 for x in data)
    return ss

def stddev(data, ddof=0):
    """Calculates the population standard deviation
    by default; specify ddof=1 to compute the sample
    standard deviation."""
    n = len(data)
    if n < 2:
        raise ValueError('variance requires at least two data points')
    ss = _ss(data)
    pvar = ss/(n-ddof)
    return pvar**0.5

Note: for improved accuracy when summing floats, the statistics module uses a custom function _sum rather than the built-in sum which I’ve used in its place.

Now we have for example:

>>> mean([1, 2, 3])
2.0
>>> stddev([1, 2, 3]) # population standard deviation
0.816496580927726
>>> stddev([1, 2, 3], ddof=1) # sample standard deviation
0.1

Question 10

In Python 2.7.1, you may calculate standard deviation using numpy.std() for:

Population std: Just use numpy.std() with no additional arguments besides to your data list.
Sample std: You need to pass ddof (i.e. Delta Degrees of Freedom) set to 1, as in the following example:

numpy.std(< your-list >, ddof=1)

The divisor used in calculations is N – ddof, where N represents the number of elements. By default ddof is zero.

It calculates sample std rather than population std.

Question 11

In python 2.7 you can use NumPy’s numpy.std() gives the population standard deviation.

In Python 3.4 statistics.stdev() returns the sample standard deviation. The pstdv() function is the same as numpy.std().

Question 12

Using python, here are few methods:

import statistics as st

n = int(input())
data = list(map(int, input().split()))

Approach1 – using a function

stdev = st.pstdev(data)

Approach2: calculate variance and take square root of it

variance = st.pvariance(data)
devia = math.sqrt(variance)

Approach3: using basic math

mean = sum(data)/n
variance = sum([((x - mean) ** 2) for x in X]) / n
stddev = variance ** 0.5

print("{0:0.1f}".format(stddev))

Note:

variance calculates variance of sample population
pvariance calculates variance of entire population
similar differences between stdev and pstdev

Question 13

pure python code:

from math import sqrt

def stddev(lst):
    mean = float(sum(lst)) / len(lst)
    return sqrt(float(reduce(lambda x, y: x + y, map(lambda x: (x - mean) ** 2, lst))) / len(lst))

Question 14

The other answers cover how to do std dev in python sufficiently, but no one explains how to do the bizarre traversal you’ve described.

I’m going to assume A-Z is the entire population. If not see Ome‘s answer on how to inference from a sample.

So to get the standard deviation/mean of the first digit of every list you would need something like this:

#standard deviation
numpy.std([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])

#mean
numpy.mean([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])

To shorten the code and generalize this to any nth digit use the following function I generated for you:

def getAllNthRanks(n):
    return [A_rank[n], B_rank[n], C_rank[n], D_rank[n], E_rank[n], F_rank[n], G_rank[n], H_rank[n], I_rank[n], J_rank[n], K_rank[n], L_rank[n], M_rank[n], N_rank[n], O_rank[n], P_rank[n], Q_rank[n], R_rank[n], S_rank[n], T_rank[n], U_rank[n], V_rank[n], W_rank[n], X_rank[n], Y_rank[n], Z_rank[n]]

Now you can simply get the stdd and mean of all the nth places from A-Z like this:

#standard deviation
numpy.std(getAllNthRanks(n))

#mean
numpy.mean(getAllNthRanks(n))

Question 15

I want to check if my list of objects contain an object with a certain attribute value.

class Test:
    def __init__(self, name):
        self.name = name

# in main()
l = []
l.append(Test("t1"))
l.append(Test("t2"))
l.append(Test("t2"))

I want a way of checking if list contains an object with name "t1" for example. How can it be done? I found https://stackoverflow.com/a/598415/292291,

[x for x in myList if x.n == 30]               # list of all matches
any(x.n == 30 for x in myList)                 # if there is any matches
[i for i,x in enumerate(myList) if x.n == 30]  # indices of all matches

def first(iterable, default=None):
    for item in iterable:
        return item
    return default

first(x for x in myList if x.n == 30)          # the first match, if any

I don’t want to go through the whole list every time, I just need to know if there’s 1 instance which matches. Will first(...) or any(...) or something else do that?

Question 16

As you can easily see from the documentation, the any() function short-circuits an returns True as soon as a match has been found.

any(x.name == "t2" for x in l)

Question 17

I have two lists as below

tags = [u'man', u'you', u'are', u'awesome']
entries = [[u'man', u'thats'],[ u'right',u'awesome']]

I want to extract entries from entries when they are in tags:

result = []

for tag in tags:
    for entry in entries:
        if tag in entry:
            result.extend(entry)

How can I write the two loops as a single line list comprehension?

Question 18

This should do it:

[entry for tag in tags for entry in entries if tag in entry]

Question 19

The best way to remember this is that the order of for loop inside the list comprehension is based on the order in which they appear in traditional loop approach. Outer most loop comes first, and then the inner loops subsequently.

So, the equivalent list comprehension would be:

[entry for tag in tags for entry in entries if tag in entry]

In general, if-else statement comes before the first for loop, and if you have just an if statement, it will come at the end. For e.g, if you would like to add an empty list, if tag is not in entry, you would do it like this:

[entry if tag in entry else [] for tag in tags for entry in entries]

Question 20

The appropriate LC would be

[entry for tag in tags for entry in entries if tag in entry]

The order of the loops in the LC is similar to the ones in nested loops, the if statements go to the end and the conditional expressions go in the beginning, something like

[a if a else b for a in sequence]

See the Demo –

>>> tags = [u'man', u'you', u'are', u'awesome']
>>> entries = [[u'man', u'thats'],[ u'right',u'awesome']]
>>> [entry for tag in tags for entry in entries if tag in entry]
[[u'man', u'thats'], [u'right', u'awesome']]
>>> result = []
    for tag in tags:
        for entry in entries:
            if tag in entry:
                result.append(entry)


>>> result
[[u'man', u'thats'], [u'right', u'awesome']]

EDIT – Since, you need the result to be flattened, you could use a similar list comprehension and then flatten the results.

>>> result = [entry for tag in tags for entry in entries if tag in entry]
>>> from itertools import chain
>>> list(chain.from_iterable(result))
[u'man', u'thats', u'right', u'awesome']

Adding this together, you could just do

>>> list(chain.from_iterable(entry for tag in tags for entry in entries if tag in entry))
[u'man', u'thats', u'right', u'awesome']

You use a generator expression here instead of a list comprehension. (Perfectly matches the 79 character limit too (without the list call))

Question 21

tags = [u'man', u'you', u'are', u'awesome']
entries = [[u'man', u'thats'],[ u'right',u'awesome']]

result = []
[result.extend(entry) for tag in tags for entry in entries if tag in entry]

print(result)

Output:

['man', 'thats', 'right', 'awesome']

Question 22

In comprehension, the nested lists iteration should follow the same order than the equivalent imbricated for loops.

To understand, we will take a simple example from NLP. You want to create a list of all words from a list of sentences where each sentence is a list of words.

>>> list_of_sentences = [['The','cat','chases', 'the', 'mouse','.'],['The','dog','barks','.']]
>>> all_words = [word for sentence in list_of_sentences for word in sentence]
>>> all_words
['The', 'cat', 'chases', 'the', 'mouse', '.', 'The', 'dog', 'barks', '.']

To remove the repeated words, you can use a set {} instead of a list []

>>> all_unique_words = list({word for sentence in list_of_sentences for word in sentence}]
>>> all_unique_words
['.', 'dog', 'the', 'chase', 'barks', 'mouse', 'The', 'cat']

or apply list(set(all_words))

>>> all_unique_words = list(set(all_words))
['.', 'dog', 'the', 'chases', 'barks', 'mouse', 'The', 'cat']

Question 23

return=[entry for tag in tags for entry in entries if tag in entry for entry in entry]

Question 24

I’ve got:

words = ['hello', 'world', 'you', 'look', 'nice']

I want to have:

'"hello", "world", "you", "look", "nice"'

What’s the easiest way to do this with Python?

Question 25

>>> words = ['hello', 'world', 'you', 'look', 'nice']
>>> ', '.join('"{0}"'.format(w) for w in words)
'"hello", "world", "you", "look", "nice"'

Question 26

you may also perform a single format call

>>> words = ['hello', 'world', 'you', 'look', 'nice']
>>> '"{0}"'.format('", "'.join(words))
'"hello", "world", "you", "look", "nice"'

Update: Some benchmarking (performed on a 2009 mbp):

>>> timeit.Timer("""words = ['hello', 'world', 'you', 'look', 'nice'] * 100; ', '.join('"{0}"'.format(w) for w in words)""").timeit(1000)
0.32559704780578613

>>> timeit.Timer("""words = ['hello', 'world', 'you', 'look', 'nice'] * 100; '"{}"'.format('", "'.join(words))""").timeit(1000)
0.018904924392700195

So it seems that format is actually quite expensive

Update 2: following @JCode’s comment, adding a map to ensure that join will work, Python 2.7.12

>>> timeit.Timer("""words = ['hello', 'world', 'you', 'look', 'nice'] * 100; ', '.join('"{0}"'.format(w) for w in words)""").timeit(1000)
0.08646488189697266

>>> timeit.Timer("""words = ['hello', 'world', 'you', 'look', 'nice'] * 100; '"{}"'.format('", "'.join(map(str, words)))""").timeit(1000)
0.04855608940124512

>>> timeit.Timer("""words = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * 100; ', '.join('"{0}"'.format(w) for w in words)""").timeit(1000)
0.17348504066467285

>>> timeit.Timer("""words = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * 100; '"{}"'.format('", "'.join(map(str, words)))""").timeit(1000)
0.06372308731079102

Question 27

You can try this :

str(words)[1:-1]

Question 28

>>> ', '.join(['"%s"' % w for w in words])

Question 29

An updated version of @jamylak answer with F Strings (for python 3.6+), I’ve used backticks for a string used for a SQL script.

keys = ['foo', 'bar' , 'omg']
', '.join(f'`{k}`' for k in keys)
# result: '`foo`, `bar`, `omg`'

Question 30

给定list ['a','ab','abc','bac']，我想计算一个包含字符串的列表'ab'。即结果是['ab','abc']。如何在Python中完成？

Question 31

Given the list ['a','ab','abc','bac'], I want to compute a list with strings that have 'ab' in them. I.e. the result is ['ab','abc']. How can this be done in Python?

Question 32

使用Python，可以通过多种方式实现这种简单的过滤。最好的方法是使用“列表推导”，如下所示：

>>> lst = ['a', 'ab', 'abc', 'bac']
>>> [k for k in lst if 'ab' in k]
['ab', 'abc']

另一种方法是使用该filter功能。在Python 2中：

>>> filter(lambda k: 'ab' in k, lst)
['ab', 'abc']

在Python 3中，它返回一个迭代器而不是列表，但是您可以强制转换它：

>>> list(filter(lambda k: 'ab' in k, lst))
['ab', 'abc']

尽管使用理解是更好的做法。

Question 33

This simple filtering can be achieved in many ways with Python. The best approach is to use “list comprehensions” as follows:

>>> lst = ['a', 'ab', 'abc', 'bac']
>>> [k for k in lst if 'ab' in k]
['ab', 'abc']

Another way is to use the filter function. In Python 2:

>>> filter(lambda k: 'ab' in k, lst)
['ab', 'abc']

In Python 3, it returns an iterator instead of a list, but you can cast it:

>>> list(filter(lambda k: 'ab' in k, lst))
['ab', 'abc']

Though it’s better practice to use a comprehension.

Question 34

[x for x in L if 'ab' in x]

Question 35

[x for x in L if 'ab' in x]

Question 36

# To support matches from the beginning, not any matches:

items = ['a', 'ab', 'abc', 'bac']
prefix = 'ab'

filter(lambda x: x.startswith(prefix), items)

Question 37

# To support matches from the beginning, not any matches:

items = ['a', 'ab', 'abc', 'bac']
prefix = 'ab'

filter(lambda x: x.startswith(prefix), items)

Question 38

在交互式shell中快速尝试了一下：

>>> l = ['a', 'ab', 'abc', 'bac']
>>> [x for x in l if 'ab' in x]
['ab', 'abc']
>>>

为什么这样做？因为为字符串定义了in运算符，以表示：“是”的子字符串。

另外，您可能要考虑写出循环，而不是使用上面使用的列表理解语法：

l = ['a', 'ab', 'abc', 'bac']
result = []
for s in l:
   if 'ab' in s:
       result.append(s)

Question 39

Tried this out quickly in the interactive shell:

>>> l = ['a', 'ab', 'abc', 'bac']
>>> [x for x in l if 'ab' in x]
['ab', 'abc']
>>>

Why does this work? Because the in operator is defined for strings to mean: “is substring of”.

Also, you might want to consider writing out the loop as opposed to using the list comprehension syntax used above:

l = ['a', 'ab', 'abc', 'bac']
result = []
for s in l:
   if 'ab' in s:
       result.append(s)

Question 40

mylist = ['a', 'ab', 'abc']
assert 'ab' in mylist

Question 41

mylist = ['a', 'ab', 'abc']
assert 'ab' in mylist

Question 42

So I want to create a list which is a sublist of some existing list.

For example,

L = [1, 2, 3, 4, 5, 6, 7], I want to create a sublist li such that li contains all the elements in L at odd positions.

While I can do it by

L = [1, 2, 3, 4, 5, 6, 7]
li = []
count = 0
for i in L:
    if count % 2 == 1:
        li.append(i)
    count += 1

But I want to know if there is another way to do the same efficiently and in fewer number of steps.

Question 43

Solution

Yes, you can:

l = L[1::2]

And this is all. The result will contain the elements placed on the following positions (0-based, so first element is at position 0, second at 1 etc.):

1, 3, 5

so the result (actual numbers) will be:

2, 4, 6

Explanation

The [1::2] at the end is just a notation for list slicing. Usually it is in the following form:

some_list[start:stop:step]

If we omitted start, the default (0) would be used. So the first element (at position 0, because the indexes are 0-based) would be selected. In this case the second element will be selected.

Because the second element is omitted, the default is being used (the end of the list). So the list is being iterated from the second element to the end.

We also provided third argument (step) which is 2. Which means that one element will be selected, the next will be skipped, and so on…

So, to sum up, in this case [1::2] means:

take the second element (which, by the way, is an odd element, if you judge from the index),
skip one element (because we have step=2, so we are skipping one, as a contrary to step=1 which is default),
take the next element,
Repeat steps 2.-3. until the end of the list is reached,

EDIT: @PreetKukreti gave a link for another explanation on Python’s list slicing notation. See here: Explain Python’s slice notation

Extras – replacing counter with `enumerate()`

In your code, you explicitly create and increase the counter. In Python this is not necessary, as you can enumerate through some iterable using enumerate():

for count, i in enumerate(L):
    if count % 2 == 1:
        l.append(i)

The above serves exactly the same purpose as the code you were using:

count = 0
for i in L:
    if count % 2 == 1:
        l.append(i)
    count += 1

More on emulating for loops with counter in Python: Accessing the index in Python ‘for’ loops

Question 44

For the odd positions, you probably want:

>>>> list_ = list(range(10))
>>>> print list_[1::2]
[1, 3, 5, 7, 9]
>>>>

Question 45

I like List comprehensions because of their Math (Set) syntax. So how about this:

L = [1, 2, 3, 4, 5, 6, 7]
odd_numbers = [y for x,y in enumerate(L) if x%2 != 0]
even_numbers = [y for x,y in enumerate(L) if x%2 == 0]

Basically, if you enumerate over a list, you’ll get the index x and the value y. What I’m doing here is putting the value y into the output list (even or odd) and using the index x to find out if that point is odd (x%2 != 0).

Question 46

You can make use of bitwise AND operator &. Let’s see below:

x = [1, 2, 3, 4, 5, 6, 7]
y = [i for i in x if i&1]
>>> 
[1, 3, 5, 7]

Bitwise AND operator is used with 1, and the reason it works because, odd number when written in binary must have its first digit as 1. Let’s check

23 = 1 * (2**4) + 0 * (2**3) + 1 * (2**2) + 1 * (2**1) + 1 * (2**0) = 10111
14 = 1 * (2**3) + 1 * (2**2) + 1 * (2**1) + 0 * (2**0) = 1110

AND operation with 1 will only return 1 (1 in binary will also have last digit 1), iff the value is odd.

Check the Python Bitwise Operator page for more.

P.S: You can tactically use this method if you want to select odd and even columns in a dataframe. Let’s say x and y coordinates of facial key-points are given as columns x1, y1, x2, etc… To normalize the x and y coordinates with width and height values of each image you can simply perform

for i in range(df.shape[1]):
    if i&1:
        df.iloc[:, i] /= heights
    else:
        df.iloc[:, i] /= widths

This is not exactly related to the question but for data scientists and computer vision engineers this method could be useful.

Cheers!

Question 47

list_ = list(range(9)) print(list_[1::2])

Question 48

I’m a bit confused about what can/can’t be used as a key for a python dict.

dicked = {}
dicked[None] = 'foo'     # None ok
dicked[(1,3)] = 'baz'    # tuple ok
import sys
dicked[sys] = 'bar'      # wow, even a module is ok !
dicked[(1,[3])] = 'qux'  # oops, not allowed

So a tuple is an immutable type but if I hide a list inside of it, then it can’t be a key.. couldn’t I just as easily hide a list inside a module?

I had some vague idea that that the key has to be “hashable” but I’m just going to admit my own ignorance about the technical details; I don’t know what’s really going on here. What would go wrong if you tried to use lists as keys, with the hash as, say, their memory location?

Question 49

There’s a good article on the topic in the Python wiki: Why Lists Can’t Be Dictionary Keys. As explained there:

What would go wrong if you tried to use lists as keys, with the hash as, say, their memory location?

It can be done without really breaking any of the requirements, but it leads to unexpected behavior. Lists are generally treated as if their value was derived from their content’s values, for instance when checking (in-)equality. Many would – understandably – expect that you can use any list [1, 2] to get the same key, where you’d have to keep around exactly the same list object. But lookup by value breaks as soon as a list used as key is modified, and for lookup by identity requires you to keep around exactly the same list – which isn’t requires for any other common list operation (at least none I can think of).

Other objects such as modules and object make a much bigger deal out of their object identity anyway (when was the last time you had two distinct module objects called sys?), and are compared by that anyway. Therefore, it’s less surprising – or even expected – that they, when used as dict keys, compare by identity in that case as well.

Question 50

Why can’t I use a list as a dict key in python?

>>> d = {repr([1,2,3]): 'value'}
{'[1, 2, 3]': 'value'}

(for anybody who stumbles on this question looking for a way around it)

as explained by others here, indeed you cannot. You can however use its string representation instead if you really want to use your list.

Question 51

Just found you can change List into tuple, then use it as keys.

d = {tuple([1,2,3]): 'value'}

Question 52

The issue is that tuples are immutable, and lists are not. Consider the following

d = {}
li = [1,2,3]
d[li] = 5
li.append(4)

What should d[li] return? Is it the same list? How about d[[1,2,3]]? It has the same values, but is a different list?

Ultimately, there is no satisfactory answer. For example, if the only key that works is the original key, then if you have no reference to that key, you can never again access the value. With every other allowed key, you can construct a key without a reference to the original.

If both of my suggestions work, then you have very different keys that return the same value, which is more than a little surprising. If only the original contents work, then your key will quickly go bad, since lists are made to be modified.

Question 53

Here’s an answer http://wiki.python.org/moin/DictionaryKeys

What would go wrong if you tried to use lists as keys, with the hash as, say, their memory location?

Looking up different lists with the same contents would produce different results, even though comparing lists with the same contents would indicate them as equivalent.

What about Using a list literal in a dictionary lookup?

Question 54

Your awnser can be found here:

Why Lists Can’t Be Dictionary Keys

Newcomers to Python often wonder why, while the language includes both a tuple and a list type, tuples are usable as a dictionary keys, while lists are not. This was a deliberate design decision, and can best be explained by first understanding how Python dictionaries work.

Source & more info: http://wiki.python.org/moin/DictionaryKeys

Question 55

Because lists are mutable, dict keys (and set members) need to be hashable, and hashing mutable objects is a bad idea because hash values should be computed on the basis of instance attributes.

In this answer, I will give some concrete examples, hopefully adding value on top of the existing answers. Every insight applies to the elements of the set datastructure as well.

Example 1: hashing a mutable object where the hash value is based on a mutable characteristic of the object.

>>> class stupidlist(list):
...     def __hash__(self):
...         return len(self)
... 
>>> stupid = stupidlist([1, 2, 3])
>>> d = {stupid: 0}
>>> stupid.append(4)
>>> stupid
[1, 2, 3, 4]
>>> d
{[1, 2, 3, 4]: 0}
>>> stupid in d
False
>>> stupid in d.keys()
False
>>> stupid in list(d.keys())
True

After mutating stupid, it cannot be found in the dict any longer because the hash changed. Only a linear scan over the list of the dict’s keys finds stupid.

Example 2: … but why not just a constant hash value?

>>> class stupidlist2(list):
...     def __hash__(self):
...         return id(self)
... 
>>> stupidA = stupidlist2([1, 2, 3])
>>> stupidB = stupidlist2([1, 2, 3])
>>> 
>>> stupidA == stupidB
True
>>> stupidA in {stupidB: 0}
False

That’s not a good idea as well because equal objects should hash identically such that you can find them in a dict or set.

Example 3: … ok, what about constant hashes across all instances?!

>>> class stupidlist3(list):
...     def __hash__(self):
...         return 1
... 
>>> stupidC = stupidlist3([1, 2, 3])
>>> stupidD = stupidlist3([1, 2, 3])
>>> stupidE = stupidlist3([1, 2, 3, 4])
>>> 
>>> stupidC in {stupidD: 0}
True
>>> stupidC in {stupidE: 0}
False
>>> d = {stupidC: 0}
>>> stupidC.append(5)
>>> stupidC in d
True

Things seem to work as expected, but think about what’s happening: when all instances of your class produce the same hash value, you will have a hash collision whenever there are more than two instances as keys in a dict or present in a set.

Finding the right instance with my_dict[key] or key in my_dict (or item in my_set) needs to perform as many equality checks as there are instances of stupidlist3 in the dict’s keys (in the worst case). At this point, the purpose of the dictionary – O(1) lookup – is completely defeated. This is demonstrated in the following timings (done with IPython).

Some Timings for Example 3

>>> lists_list = [[i]  for i in range(1000)]
>>> stupidlists_set = {stupidlist3([i]) for i in range(1000)}
>>> tuples_set = {(i,) for i in range(1000)}
>>> l = [999]
>>> s = stupidlist3([999])
>>> t = (999,)
>>> 
>>> %timeit l in lists_list
25.5 µs ± 442 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit s in stupidlists_set
38.5 µs ± 61.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit t in tuples_set
77.6 ns ± 1.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

As you can see, the membership test in our stupidlists_set is even slower than a linear scan over the whole lists_list, while you have the expected super fast lookup time (factor 500) in a set without loads of hash collisions.

TL; DR: you can use tuple(yourlist) as dict keys, because tuples are immutable and hashable.

Question 56

The simple answer to your question is that the class list does not implement the method hash which is required for any object which wishes to be used as a key in a dictionary. However the reason why hash is not implemented the same way it is in say the tuple class (based on the content of the container) is because a list is mutable so editing the list would require the hash to be recalculated which may mean the list in now located in the wrong bucket within the underling hash table. Note that since you cannot modify a tuple (immutable) it doesn’t run into this problem.

As a side note, the actual implementation of the dictobjects lookup is based on Algorithm D from Knuth Vol. 3, Sec. 6.4. If you have that book available to you it might be a worthwhile read, in addition if you’re really, really interested you may like to take a peek at the developer comments on the actual implementation of dictobject here. It goes into great detail as to exactly how it works. There is also a python lecture on the implementation of dictionaries which you may be interested in. They go through the definition of a key and what a hash is in the first few minutes.

Question 57

According to the Python 2.7.2 documentation:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a hash() method), and can be compared to other objects (it needs an eq() or cmp() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal, and their hash value is their id().

A tuple is immutable in the sense that you cannot add, remove or replace its elements, but the elements themselves may be mutable. List’s hash value depends on the hash values of its elements, and so it changes when you change the elements.

Using id’s for list hashes would imply that all lists compare differently, which would be surprising and inconvenient.

Question 58

A Dictionary is a HashMap it stores map of your keys, value converted to a hashed new key and value mapping.

something like (psuedo code):

{key : val}  
hash(key) = val

If you are wondering which are available options that can be used as key for your dictionary. Then

anything which is hashable(can be converted to hash, and hold static value i.e immutable so as to make a hashed key as stated above) is eligible but as list or set objects can be vary on the go so hash(key) should also needs to vary just to be in sync with your list or set.

You can try :

hash(<your key here>)

If it works fine it can be used as key for your dictionary or else convert it to something hashable.

Inshort :

Convert that list to tuple(<your list>).
Convert that list to str(<your list>).

Question 59

dict keys need to be hashable. Lists are Mutable and they do not provide a valid hash method.

Question 60

Suppose I have ;

LIST = [[array([1, 2, 3, 4, 5]), array([1, 2, 3, 4, 5],[1,2,3,4,5])] # inner lists are numpy arrays

I try to convert;

array([[1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5])

I am solving it by iteration on vstack right now but it is really slow for especially large LIST

What do you suggest for the best efficient way?

Question 61

In general you can concatenate a whole sequence of arrays along any axis:

numpy.concatenate( LIST, axis=0 )

but you do have to worry about the shape and dimensionality of each array in the list (for a 2-dimensional 3×5 output, you need to ensure that they are all 2-dimensional n-by-5 arrays already). If you want to concatenate 1-dimensional arrays as the rows of a 2-dimensional output, you need to expand their dimensionality.

As Jorge’s answer points out, there is also the function stack, introduced in numpy 1.10:

numpy.stack( LIST, axis=0 )

This takes the complementary approach: it creates a new view of each input array and adds an extra dimension (in this case, on the left, so each n-element 1D array becomes a 1-by-n 2D array) before concatenating. It will only work if all the input arrays have the same shape—even along the axis of concatenation.

vstack (or equivalently row_stack) is often an easier-to-use solution because it will take a sequence of 1- and/or 2-dimensional arrays and expand the dimensionality automatically where necessary and only where necessary, before concatenating the whole list together. Where a new dimension is required, it is added on the left. Again, you can concatenate a whole list at once without needing to iterate:

numpy.vstack( LIST )

This flexible behavior is also exhibited by the syntactic shortcut numpy.r_[ array1, ...., arrayN ] (note the square brackets). This is good for concatenating a few explicitly-named arrays but is no good for your situation because this syntax will not accept a sequence of arrays, like your LIST.

There is also an analogous function column_stack and shortcut c_[...], for horizontal (column-wise) stacking, as well as an almost-analogous function hstack—although for some reason the latter is less flexible (it is stricter about input arrays’ dimensionality, and tries to concatenate 1-D arrays end-to-end instead of treating them as columns).

Finally, in the specific case of vertical stacking of 1-D arrays, the following also works:

numpy.array( LIST )

…because arrays can be constructed out of a sequence of other arrays, adding a new dimension to the beginning.

Question 62

Starting in NumPy version 1.10, we have the method stack. It can stack arrays of any dimension (all equal):

# List of arrays.
L = [np.random.randn(5,4,2,5,1,2) for i in range(10)]

# Stack them using axis=0.
M = np.stack(L)
M.shape # == (10,5,4,2,5,1,2)
np.all(M == L) # == True

M = np.stack(L, axis=1)
M.shape # == (5,10,4,2,5,1,2)
np.all(M == L) # == False (Don't Panic)

# This are all true    
np.all(M[:,0,:] == L[0]) # == True
all(np.all(M[:,i,:] == L[i]) for i in range(10)) # == True

Enjoy,

Question 63

I checked some of the methods for speed performance and find that there is no difference! The only difference is that using some methods you must carefully check dimension.

Timing:

|------------|----------------|-------------------|
|            | shape (10000)  |  shape (1,10000)  |
|------------|----------------|-------------------|
| np.concat  |    0.18280     |      0.17960      |
|------------|----------------|-------------------|
|  np.stack  |    0.21501     |      0.16465      |
|------------|----------------|-------------------|
| np.vstack  |    0.21501     |      0.17181      |
|------------|----------------|-------------------|
|  np.array  |    0.21656     |      0.16833      |
|------------|----------------|-------------------|

As you can see I tried 2 experiments – using np.random.rand(10000) and np.random.rand(1, 10000) And if we use 2d arrays than np.stack and np.array create additional dimension – result.shape is (1,10000,10000) and (10000,1,10000) so they need additional actions to avoid this.

Code:

from time import perf_counter
from tqdm import tqdm_notebook
import numpy as np
l = []
for i in tqdm_notebook(range(10000)):
    new_np = np.random.rand(10000)
    l.append(new_np)



start = perf_counter()
stack = np.stack(l, axis=0 )
print(f'np.stack: {perf_counter() - start:.5f}')

start = perf_counter()
vstack = np.vstack(l)
print(f'np.vstack: {perf_counter() - start:.5f}')

start = perf_counter()
wrap = np.array(l)
print(f'np.array: {perf_counter() - start:.5f}')

start = perf_counter()
l = [el.reshape(1,-1) for el in l]
conc = np.concatenate(l, axis=0 )
print(f'np.concatenate: {perf_counter() - start:.5f}')

Question 64

I need to iterate over a circular list, possibly many times, each time starting with the last visited item.

The use case is a connection pool. A client asks for connection, an iterator checks if pointed-to connection is available and returns it, otherwise loops until it finds one that is available.

Is there a neat way to do it in Python?

问题：Python-使用列表作为函数参数

回答 0

回答 1

回答 2

回答 3

问题：列表的标准偏差

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

方法1-使用功能

方法2：计算方差并求平方根

方法3：使用基本数学

注意：

Approach1 – using a function

Approach2: calculate variance and take square root of it

Approach3: using basic math

Note:

回答 6

回答 7

问题：检查对象列表是否包含具有特定属性值的对象

回答 0

问题：如何在列表理解Python中构建两个for循环

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：在python中加入字符串列表，并将每个字符串都用引号引起来

回答 0

回答 1

回答 2

回答 3

回答 4

问题：根据内容过滤字符串列表

回答 0

回答 1

回答 2

回答 3

回答 4

问题：在奇数位置提取列表元素

回答 0

解

说明

额外功能-以取代柜台 enumerate()

Solution

Explanation

Extras – replacing counter with enumerate()

回答 1

回答 2

回答 3

回答 4

问题：为什么我不能在Python中使用列表作为字典键？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

问题：如何将numpy数组列表转换为单个numpy数组？

回答 0

回答 1

回答 2

问题：Python中的循环列表迭代器

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

有趣好用的Python教程

额外功能-以取代柜台 `enumerate()`

Extras – replacing counter with `enumerate()`