Tag archive: list

Python - Using a list as function parameters

Question: Python - Using a list as function parameters

How can I use a Python list (e.g. params = ['a',3.4,None]) as parameters to a function, e.g.:

def some_func(a_char, a_float, a_something):
    # do stuff
    pass

Answer 0

You can do this using the splat operator:

some_func(*params)

This causes the function to receive each list item as a separate parameter. There’s a description here: http://docs.python.org/tutorial/controlflow.html#unpacking-argument-lists
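A minimal sketch of what the unpacking does, reusing the question's some_func but making it return its arguments so the binding is visible. The ** form for dictionaries is covered in the same tutorial section; the mismatched call at the end shows the error you get when the list does not fit the signature.

```python
def some_func(a_char, a_float, a_something):
    # Return the arguments so we can see how they were bound.
    return (a_char, a_float, a_something)

params = ['a', 3.4, None]

# *params spreads the list into positional arguments:
# this call is equivalent to some_func('a', 3.4, None).
result = some_func(*params)
print(result)  # ('a', 3.4, None)

# ** does the same for a dict whose keys match the parameter names.
kwargs = {'a_char': 'b', 'a_float': 1.5, 'a_something': []}
kw_result = some_func(**kwargs)
print(kw_result)  # ('b', 1.5, [])

# If the list length does not match the signature, Python raises TypeError.
try:
    some_func(*['a', 3.4])
    mismatch_error = None
except TypeError as exc:
    mismatch_error = exc
print(type(mismatch_error).__name__)  # TypeError
```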


Answer 1

This has already been answered perfectly, but since I came to this page and did not understand it immediately, I am going to add a simple but complete example.

def some_func(a_char, a_float, a_something):
    print(a_char)

params = ['a', 3.4, None]
some_func(*params)

Output:

a

Answer 2

Use an asterisk:

some_func(*params)

Answer 3

You need the argument unpacking operator *.


Standard deviation of a list

Question: Standard deviation of a list

I want to find the mean and standard deviation of the 1st, 2nd, … elements of several lists (up to Z). For example, I have

A_rank=[0.8,0.4,1.2,3.7,2.6,5.8]
B_rank=[0.1,2.8,3.7,2.6,5,3.4]
C_rank=[1.2,3.4,0.5,0.1,2.5,6.1]
# etc (up to Z_rank )...

Now I want to take the mean and std of *_rank[0], the mean and std of *_rank[1], etc.
(i.e. the mean and std of the 1st element from all the (A..Z)_rank lists;
the mean and std of the 2nd element from all the (A..Z)_rank lists;
the mean and std of the 3rd element…; etc.)


Answer 0

Since Python 3.4 (PEP 450) there is a statistics module in the standard library, which has a function stdev for calculating the sample standard deviation of iterables like yours:

>>> A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
>>> import statistics
>>> statistics.stdev(A_rank)
2.0634114147853952
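Applied to the question's actual goal, the same module can work position by position. A minimal sketch, assuming the lists are collected into one list called ranks (a name introduced here for illustration) instead of being kept as 26 separate variables. zip(*ranks) groups element 0 of every list, then element 1, and so on; pstdev gives the population standard deviation (numpy.std's default) and stdev the sample one.

```python
import statistics

A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
B_rank = [0.1, 2.8, 3.7, 2.6, 5, 3.4]
C_rank = [1.2, 3.4, 0.5, 0.1, 2.5, 6.1]
ranks = [A_rank, B_rank, C_rank]  # append the remaining *_rank lists here

# One tuple per position: (A_rank[0], B_rank[0], C_rank[0]), (A_rank[1], ...), ...
columns = list(zip(*ranks))

means = [statistics.mean(col) for col in columns]
population_stds = [statistics.pstdev(col) for col in columns]
sample_stds = [statistics.stdev(col) for col in columns]

for i, (m, s) in enumerate(zip(means, population_stds)):
    print(f"position {i}: mean={m:.4f} pstdev={s:.4f}")
```

Note that zip() stops at the shortest list, so lists of unequal length are silently truncated.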

Answer 1

I would put A_rank et al. into a 2D NumPy array, and then use numpy.mean() and numpy.std() with axis=0 to compute the means and standard deviations for each position:

In [17]: import numpy

In [18]: arr = numpy.array([A_rank, B_rank, C_rank])

In [20]: numpy.mean(arr, axis=0)
Out[20]: 
array([ 0.7       ,  2.2       ,  1.8       ,  2.13333333,  3.36666667,
        5.1       ])

In [21]: numpy.std(arr, axis=0)
Out[21]: 
array([ 0.45460606,  1.29614814,  1.37355985,  1.50628314,  1.15566239,
        1.2083046 ])

Answer 2

Here’s some pure-Python code you can use to calculate the mean and standard deviation.

All code below is based on the statistics module in Python 3.4+.

def mean(data):
    """Return the sample arithmetic mean of data."""
    n = len(data)
    if n < 1:
        raise ValueError('mean requires at least one data point')
    return sum(data)/n # in Python 2 use sum(data)/float(n)

def _ss(data):
    """Return sum of square deviations of sequence data."""
    c = mean(data)
    ss = sum((x-c)**2 for x in data)
    return ss

def stddev(data, ddof=0):
    """Calculates the population standard deviation
    by default; specify ddof=1 to compute the sample
    standard deviation."""
    n = len(data)
    if n < 2:
        raise ValueError('variance requires at least two data points')
    ss = _ss(data)
    pvar = ss/(n-ddof)
    return pvar**0.5

Note: for improved accuracy when summing floats, the statistics module uses a custom function _sum rather than the built-in sum which I’ve used in its place.

Now we have for example:

>>> mean([1, 2, 3])
2.0
>>> stddev([1, 2, 3]) # population standard deviation
0.816496580927726
>>> stddev([1, 2, 3], ddof=1) # sample standard deviation
1.0
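The note about _sum matters because the built-in sum rounds after every addition. The sketch below swaps in math.fsum, a standard-library function that tracks the lost low-order bits; it is a stand-in for the statistics module's private _sum, not the same function. fsum_mean and fsum_stddev are hypothetical names mirroring mean and stddev above.

```python
import math

values = [0.1] * 10
naive = sum(values)           # rounding error accumulates: 0.9999999999999999
accurate = math.fsum(values)  # correctly rounded sum: 1.0

def fsum_mean(data):
    """Arithmetic mean using math.fsum instead of the built-in sum."""
    if len(data) < 1:
        raise ValueError('mean requires at least one data point')
    return math.fsum(data) / len(data)

def fsum_stddev(data, ddof=0):
    """Population std by default; pass ddof=1 for the sample std."""
    n = len(data)
    if n < 2:
        raise ValueError('variance requires at least two data points')
    c = fsum_mean(data)
    ss = math.fsum((x - c) ** 2 for x in data)
    return (ss / (n - ddof)) ** 0.5

print(naive, accurate)                 # 0.9999999999999999 1.0
print(fsum_stddev([1, 2, 3], ddof=1))  # 1.0
```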

Answer 3

In Python 2.7.1, you may calculate the standard deviation using numpy.std() for:

  • Population std: just call numpy.std() with no arguments besides your data list.
  • Sample std: you need to pass ddof (i.e. Delta Degrees of Freedom) set to 1, as in the following example:

numpy.std(< your-list >, ddof=1)

The divisor used in the calculation is N - ddof, where N is the number of elements. By default, ddof is zero.

With ddof=1, it calculates the sample std rather than the population std.
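Written out, the divisor rule above corresponds to the following formula; the worked numbers for the data [1, 2, 3] are included only as an illustration.

```latex
\sigma = \sqrt{\frac{1}{N - \mathrm{ddof}} \sum_{i=1}^{N} \left(x_i - \bar{x}\right)^2},
\qquad \bar{x} = \frac{1}{N} \sum_{i=1}^{N} x_i

% Worked example for x = [1, 2, 3]: \bar{x} = 2, \sum_i (x_i - \bar{x})^2 = 1 + 0 + 1 = 2
\mathrm{ddof} = 0:\ \sigma = \sqrt{2/3} \approx 0.8165
\qquad
\mathrm{ddof} = 1:\ \sigma = \sqrt{2/2} = 1
```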


Answer 4

In Python 2.7 you can use NumPy’s numpy.std(), which gives the population standard deviation.

In Python 3.4+, statistics.stdev() returns the sample standard deviation. The statistics.pstdev() function returns the population standard deviation, the same quantity numpy.std() computes by default.
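A small check of the two standard-library functions on the same data. By definition these values correspond to numpy.std(data, ddof=1) and numpy.std(data); the sketch itself does not run NumPy.

```python
import statistics

data = [1, 2, 3]

sample = statistics.stdev(data)       # divides by N - 1: sqrt(2 / 2)
population = statistics.pstdev(data)  # divides by N: sqrt(2 / 3)

print(sample)      # 1.0
print(population)  # about 0.8165
```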


Answer 5

Here are a few methods in Python:

import math
import statistics as st

n = int(input())
data = list(map(int, input().split()))

Approach 1: using a function

stdev = st.pstdev(data)

Approach 2: calculate the variance and take its square root

variance = st.pvariance(data)
devia = math.sqrt(variance)

Approach 3: using basic math

mean = sum(data)/n
variance = sum([((x - mean) ** 2) for x in data]) / n
stddev = variance ** 0.5

print("{0:0.1f}".format(stddev))

Note:

  • variance calculates the sample variance (dividing by n - 1)
  • pvariance calculates the population variance (dividing by n)
  • stdev and pstdev differ in the same way

Answer 6

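Pure Python code (computes the population standard deviation):

from functools import reduce  # reduce is no longer a built-in in Python 3
from math import sqrt

def stddev(lst):
    mean = float(sum(lst)) / len(lst)
    return sqrt(float(reduce(lambda x, y: x + y, map(lambda x: (x - mean) ** 2, lst))) / len(lst))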

Answer 7

The other answers cover how to do std dev in python sufficiently, but no one explains how to do the bizarre traversal you’ve described.

I’m going to assume A-Z is the entire population. If not, see Ome’s answer on how to make inferences from a sample.

So to get the standard deviation/mean of the first element of every list, you would need something like this:

#standard deviation
numpy.std([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])

#mean
numpy.mean([A_rank[0], B_rank[0], C_rank[0], ..., Z_rank[0]])

To shorten the code and generalize this to any nth element, use the following function I generated for you:

def getAllNthRanks(n):
    return [A_rank[n], B_rank[n], C_rank[n], D_rank[n], E_rank[n], F_rank[n], G_rank[n], H_rank[n], I_rank[n], J_rank[n], K_rank[n], L_rank[n], M_rank[n], N_rank[n], O_rank[n], P_rank[n], Q_rank[n], R_rank[n], S_rank[n], T_rank[n], U_rank[n], V_rank[n], W_rank[n], X_rank[n], Y_rank[n], Z_rank[n]] 

Now you can simply get the std and mean of all the nth places from A-Z like this:

#standard deviation
numpy.std(getAllNthRanks(n))

#mean
numpy.mean(getAllNthRanks(n))
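Rather than spelling out all 26 names, the lists can be passed in as a single list. A sketch under that assumption: get_all_nth and rank_lists are hypothetical names introduced here, and the statistics module is used instead of numpy only to avoid a third-party dependency (numpy.mean and numpy.std on the returned list compute the same mean and population std).

```python
import statistics

def get_all_nth(rank_lists, n):
    """Hypothetical generalization of getAllNthRanks: the n-th element of every list."""
    return [ranks[n] for ranks in rank_lists]

A_rank = [0.8, 0.4, 1.2, 3.7, 2.6, 5.8]
B_rank = [0.1, 2.8, 3.7, 2.6, 5, 3.4]
C_rank = [1.2, 3.4, 0.5, 0.1, 2.5, 6.1]
rank_lists = [A_rank, B_rank, C_rank]  # ..., up to Z_rank

first_column = get_all_nth(rank_lists, 0)  # [0.8, 0.1, 1.2]
first_mean = statistics.mean(first_column)
first_std = statistics.pstdev(first_column)  # population std, like numpy.std's default
print(first_mean, first_std)
```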

Check if a list of objects contains an object with a certain attribute value

Question: Check if a list of objects contains an object with a certain attribute value

I want to check if my list of objects contains an object with a certain attribute value.

class Test:
    def __init__(self, name):
        self.name = name

# in main()
l = []
l.append(Test("t1"))
l.append(Test("t2"))
l.append(Test("t2"))

I want a way of checking if the list contains an object with name "t1", for example. How can it be done? I found https://stackoverflow.com/a/598415/292291:

[x for x in myList if x.n == 30]               # list of all matches
any(x.n == 30 for x in myList)                 # whether there are any matches
[i for i,x in enumerate(myList) if x.n == 30]  # indices of all matches

def first(iterable, default=None):
    for item in iterable:
        return item
    return default

first(x for x in myList if x.n == 30)          # the first match, if any

I don’t want to go through the whole list every time; I just need to know if there’s one instance that matches. Will first(...) or any(...) or something else do that?


Answer 0

As you can easily see from the documentation, the any() function short-circuits and returns True as soon as a match has been found.

any(x.name == "t2" for x in l)
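A small sketch that makes the short-circuiting visible by recording which objects get inspected; has_name and checked are names introduced only for this illustration. The built-in next() with a default value does the same job as the first() helper from the question.

```python
class Test:
    def __init__(self, name):
        self.name = name

l = [Test("t1"), Test("t2"), Test("t2")]

checked = []  # names of the objects any() actually looked at

def has_name(obj, name):
    """Record that obj was inspected, then compare its name."""
    checked.append(obj.name)
    return obj.name == name

found = any(has_name(x, "t1") for x in l)
print(found, checked)  # True ['t1']: any() stopped after the first match

# next() returns the first matching object, or the default if none match.
first_t2 = next((x for x in l if x.name == "t2"), None)
missing = next((x for x in l if x.name == "t9"), None)
print(first_t2 is l[1], missing)  # True None
```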

How to frame two for loops in a list comprehension in Python

Question: How to frame two for loops in a list comprehension in Python

I have two lists as below:

tags = [u'man', u'you', u'are', u'awesome']
entries = [[u'man', u'thats'],[ u'right',u'awesome']]

I want to extract entries from entries when they are in tags:

result = []

for tag in tags:
    for entry in entries:
        if tag in entry:
            result.extend(entry)

How can I write the two loops as a single line list comprehension?


Answer 0

This should do it:

[entry for tag in tags for entry in entries if tag in entry]

Answer 1

The best way to remember this is that the for loops inside a list comprehension appear in the same order as in the traditional nested-loop approach: the outermost loop comes first, followed by the inner loops.

So, the equivalent list comprehension would be:

[entry for tag in tags for entry in entries if tag in entry]

In general, an if-else (conditional) expression comes before the first for, while a plain if (a filter) comes at the end. For example, if you would like to add an empty list whenever tag is not in entry, you would do it like this:

[entry if tag in entry else [] for tag in tags for entry in entries]
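A sketch that checks both rules from this answer on the question's data (the u'' prefixes are dropped, which changes nothing in Python 3): the comprehension's for clauses follow the nested loops' order, a trailing if filters, and a leading if/else keeps one item per (tag, entry) pair.

```python
tags = ['man', 'you', 'are', 'awesome']
entries = [['man', 'thats'], ['right', 'awesome']]

# Nested loops: outer loop first, inner loop second, filter innermost.
looped = []
for tag in tags:
    for entry in entries:
        if tag in entry:
            looped.append(entry)

# Same order in the comprehension; the filtering "if" goes at the end.
filtered = [entry for tag in tags for entry in entries if tag in entry]

# A conditional expression goes before the first "for"
# and produces one item for every (tag, entry) pair.
with_blanks = [entry if tag in entry else [] for tag in tags for entry in entries]

print(filtered)          # [['man', 'thats'], ['right', 'awesome']]
print(len(with_blanks))  # 8, i.e. len(tags) * len(entries)
```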

Answer 2

The appropriate list comprehension (LC) would be:

[entry for tag in tags for entry in entries if tag in entry]

The order of the loops in the LC is the same as in the nested loops; the if filters go at the end and the conditional expressions go at the beginning, something like:

[a if a else b for a in sequence]

See the demo:

>>> tags = [u'man', u'you', u'are', u'awesome']
>>> entries = [[u'man', u'thats'],[ u'right',u'awesome']]
>>> [entry for tag in tags for entry in entries if tag in entry]
[[u'man', u'thats'], [u'right', u'awesome']]
>>> result = []
>>> for tag in tags:
...     for entry in entries:
...         if tag in entry:
...             result.append(entry)
...
>>> result
[[u'man', u'thats'], [u'right', u'awesome']]

EDIT: Since you need the result to be flattened, you could use a similar list comprehension and then flatten the results.

>>> result = [entry for tag in tags for entry in entries if tag in entry]
>>> from itertools import chain
>>> list(chain.from_iterable(result))
[u'man', u'thats', u'right', u'awesome']

Adding this together, you could just do

>>> list(chain.from_iterable(entry for tag in tags for entry in entries if tag in entry))
[u'man', u'thats', u'right', u'awesome']

This uses a generator expression instead of a list comprehension. (It also fits within the 79-character limit, without the list call.)


Answer 3

tags = [u'man', u'you', u'are', u'awesome']
entries = [[u'man', u'thats'],[ u'right',u'awesome']]

result = []
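# The comprehension is used only for its side effect; it builds and discards a list of None values.
[result.extend(entry) for tag in tags for entry in entries if tag in entry]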

print(result)

Output:

['man', 'thats', 'right', 'awesome']

Answer 4

In a comprehension, nested iteration follows the same order as the equivalent nested for loops.

To understand, we will take a simple example from NLP. You want to create a list of all words from a list of sentences where each sentence is a list of words.

>>> list_of_sentences = [['The','cat','chases', 'the', 'mouse','.'],['The','dog','barks','.']]
>>> all_words = [word for sentence in list_of_sentences for word in sentence]
>>> all_words
['The', 'cat', 'chases', 'the', 'mouse', '.', 'The', 'dog', 'barks', '.']

To remove the repeated words, you can use a set {} instead of a list []:

>>> all_unique_words = list({word for sentence in list_of_sentences for word in sentence})
>>> all_unique_words
['.', 'dog', 'the', 'chases', 'barks', 'mouse', 'The', 'cat']

or apply list(set(all_words)):

>>> all_unique_words = list(set(all_words))
>>> all_unique_words
['.', 'dog', 'the', 'chases', 'barks', 'mouse', 'The', 'cat']

(Sets are unordered, so the order of the words in your output may differ.)

Answer 5

result = [word for tag in tags for entry in entries if tag in entry for word in entry]
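A three-clause comprehension can reproduce the question's extend() loop directly. A sketch that builds both and compares them (plain str literals replace the u'' prefixes, which make no difference in Python 3); word is a fresh name so the innermost loop does not reuse entry.

```python
tags = ['man', 'you', 'are', 'awesome']
entries = [['man', 'thats'], ['right', 'awesome']]

# The question's loop, with extend() flattening each matching entry.
expected = []
for tag in tags:
    for entry in entries:
        if tag in entry:
            expected.extend(entry)

# A third "for" clause does the flattening inside the comprehension.
flat = [word for tag in tags for entry in entries if tag in entry for word in entry]

print(flat)  # ['man', 'thats', 'right', 'awesome']
```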

Join a list of strings in Python and wrap each string in quotation marks

Question: Join a list of strings in Python and wrap each string in quotation marks

I’ve got:

words = ['hello', 'world', 'you', 'look', 'nice']

I want to have:

'"hello", "world", "you", "look", "nice"'

What’s the easiest way to do this with Python?


Answer 0

>>> words = ['hello', 'world', 'you', 'look', 'nice']
>>> ', '.join('"{0}"'.format(w) for w in words)
'"hello", "world", "you", "look", "nice"'

Answer 1

You may also make a single format call:

>>> words = ['hello', 'world', 'you', 'look', 'nice']
>>> '"{0}"'.format('", "'.join(words))
'"hello", "world", "you", "look", "nice"'

Update: Some benchmarking (performed on a 2009 mbp):

>>> import timeit
>>> timeit.Timer("""words = ['hello', 'world', 'you', 'look', 'nice'] * 100; ', '.join('"{0}"'.format(w) for w in words)""").timeit(1000)
0.32559704780578613

>>> timeit.Timer("""words = ['hello', 'world', 'you', 'look', 'nice'] * 100; '"{}"'.format('", "'.join(words))""").timeit(1000)
0.018904924392700195

So it seems that format is actually quite expensive.

Update 2: following @JCode’s comment, I added a map(str, ...) so that join also works with non-string items (Python 2.7.12):

>>> timeit.Timer("""words = ['hello', 'world', 'you', 'look', 'nice'] * 100; ', '.join('"{0}"'.format(w) for w in words)""").timeit(1000)
0.08646488189697266

>>> timeit.Timer("""words = ['hello', 'world', 'you', 'look', 'nice'] * 100; '"{}"'.format('", "'.join(map(str, words)))""").timeit(1000)
0.04855608940124512

>>> timeit.Timer("""words = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * 100; ', '.join('"{0}"'.format(w) for w in words)""").timeit(1000)
0.17348504066467285

>>> timeit.Timer("""words = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] * 100; '"{}"'.format('", "'.join(map(str, words)))""").timeit(1000)
0.06372308731079102
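Beyond speed, the two approaches behave differently on an edge case the benchmarks do not cover: an empty list. A small check using only the expressions from the answers above:

```python
words = ['hello', 'world', 'you', 'look', 'nice']

per_item = ', '.join('"{0}"'.format(w) for w in words)
single_format = '"{0}"'.format('", "'.join(words))
print(per_item == single_format)  # True for a non-empty list of strings

# On an empty list the results differ:
empty_per_item = ', '.join('"{0}"'.format(w) for w in [])
empty_single = '"{0}"'.format('", "'.join([]))
print(repr(empty_per_item), repr(empty_single))  # '' '""'

# join() only accepts strings, hence map(str, ...) for other item types.
numbers = [1, 2, 3]
quoted_numbers = '"{0}"'.format('", "'.join(map(str, numbers)))
print(quoted_numbers)  # "1", "2", "3"
```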

Answer 2

You can try this:

str(words)[1:-1]
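Note that str() on a list uses repr() for the items, which quotes strings with single quotes, so this slice does not produce the double quotes asked for. A sketch comparing it with json.dumps, a standard-library alternative swapped in here (not part of the original answer) that always emits double quotes and escapes embedded quotes (and, by default, non-ASCII characters):

```python
import json

words = ['hello', 'world', 'you', 'look', 'nice']

# repr() of a list uses single quotes, so the slice keeps single quotes.
single_quoted = str(words)[1:-1]
print(single_quoted)  # 'hello', 'world', 'you', 'look', 'nice'

# json.dumps() wraps each string in double quotes.
double_quoted = ', '.join(json.dumps(w) for w in words)
print(double_quoted)  # "hello", "world", "you", "look", "nice"
```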

Answer 3

>>> ', '.join(['"%s"' % w for w in words])

Answer 4

An updated version of @jamylak’s answer using f-strings (Python 3.6+). I’ve used backticks because the string was for a SQL script.

keys = ['foo', 'bar' , 'omg']
', '.join(f'`{k}`' for k in keys)
# result: '`foo`, `bar`, `omg`'

Filter a list of strings based on their contents

Question: Filter a list of strings based on their contents

Given the list ['a','ab','abc','bac'], I want to compute a list with strings that have 'ab' in them. I.e. the result is ['ab','abc']. How can this be done in Python?


Answer 0

This simple filtering can be achieved in many ways with Python. The best approach is to use “list comprehensions” as follows:

>>> lst = ['a', 'ab', 'abc', 'bac']
>>> [k for k in lst if 'ab' in k]
['ab', 'abc']

Another way is to use the filter function. In Python 2:

>>> filter(lambda k: 'ab' in k, lst)
['ab', 'abc']

In Python 3, it returns an iterator instead of a list, but you can cast it:

>>> list(filter(lambda k: 'ab' in k, lst))
['ab', 'abc']

Though it’s better practice to use a comprehension.


Answer 1

[x for x in L if 'ab' in x]

Answer 2

# To match only at the beginning of each string, not anywhere in it:

items = ['a', 'ab', 'abc', 'bac']
prefix = 'ab'

list(filter(lambda x: x.startswith(prefix), items))  # ['ab', 'abc']; list() is needed in Python 3

Answer 3

Tried this out quickly in the interactive shell:

>>> l = ['a', 'ab', 'abc', 'bac']
>>> [x for x in l if 'ab' in x]
['ab', 'abc']
>>>

Why does this work? Because the in operator is defined for strings to mean: “is substring of”.

Also, you might want to consider writing out the loop as opposed to using the list comprehension syntax used above:

l = ['a', 'ab', 'abc', 'bac']
result = []
for s in l:
    if 'ab' in s:
        result.append(s)
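The "is substring of" meaning applies only when the right-hand operand is a string; on a list, in checks for an equal element. A quick illustration of the difference:

```python
lst = ['a', 'ab', 'abc', 'bac']

print('ab' in 'abc')           # True: substring test on a string
print('ab' in 'bac')           # False: the characters are not adjacent in that order
print('ab' in ['abc', 'bac'])  # False: on a list, "in" looks for an equal element
print('ab' in lst)             # True, but only because lst contains 'ab' itself

matches = [s for s in lst if 'ab' in s]  # apply the substring test to each string
print(matches)                 # ['ab', 'abc']
```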

Answer 4

mylist = ['a', 'ab', 'abc']
assert 'ab' in mylist

Extract list elements at odd positions

Question: Extract list elements at odd positions

So I want to create a list which is a sublist of some existing list.

For example,

L = [1, 2, 3, 4, 5, 6, 7], I want to create a sublist li such that li contains all the elements in L at odd positions.

While I can do it like this:

L = [1, 2, 3, 4, 5, 6, 7]
li = []
count = 0
for i in L:
    if count % 2 == 1:
        li.append(i)
    count += 1

But I want to know if there is another way to do the same efficiently and in fewer steps.


Answer 0

Solution

Yes, you can:

l = L[1::2]

And this is all. The result will contain the elements placed on the following positions (0-based, so first element is at position 0, second at 1 etc.):

1, 3, 5

so the result (actual numbers) will be:

2, 4, 6

Explanation

The [1::2] at the end is just a notation for list slicing. Usually it is in the following form:

some_list[start:stop:step]

If we omitted start, the default (0) would be used. So the first element (at position 0, because the indexes are 0-based) would be selected. In this case the second element will be selected.

Because the second argument (stop) is omitted, the default is used (the end of the list). So the list is iterated from the second element to the end.

We also provided the third argument (step), which is 2. This means that one element is selected, the next is skipped, and so on…

So, to sum up, in this case [1::2] means:

  1. take the second element (which, by the way, is at an odd index),
  2. skip one element (because we have step=2, we skip one, as opposed to the default step=1),
  3. take the next element,
  4. repeat steps 2-3 until the end of the list is reached.

EDIT: @PreetKukreti gave a link for another explanation on Python’s list slicing notation. See here: Explain Python’s slice notation
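A few slices of the question's list make the start, stop and step rules concrete; the comparison with range() at the end is an equivalent way to spell the same index sequence.

```python
L = [1, 2, 3, 4, 5, 6, 7]

odd_positions = L[1::2]   # start=1, stop omitted (end), step=2 -> indices 1, 3, 5
even_positions = L[::2]   # start omitted (0), step=2 -> indices 0, 2, 4, 6
middle = L[1:5:2]         # stop=5 is exclusive -> indices 1, 3

print(odd_positions)   # [2, 4, 6]
print(even_positions)  # [1, 3, 5, 7]
print(middle)          # [2, 4]

# The slice picks the same indices as range(start, len(L), step).
same = [L[i] for i in range(1, len(L), 2)]
print(same == odd_positions)  # True
```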

Extras – replacing counter with enumerate()

In your code, you explicitly create and increase the counter. In Python this is not necessary, as you can enumerate through some iterable using enumerate():

li = []
for count, i in enumerate(L):
    if count % 2 == 1:
        li.append(i)

The above serves exactly the same purpose as the code you were using:

li = []
count = 0
for i in L:
    if count % 2 == 1:
        li.append(i)
    count += 1

More on emulating for loops with counter in Python: Accessing the index in Python ‘for’ loops


Answer 1

For the odd positions, you probably want:

>>> list_ = list(range(10))
>>> print(list_[1::2])
[1, 3, 5, 7, 9]

Answer 2

I like list comprehensions because of their math (set-builder) syntax. So how about this:

L = [1, 2, 3, 4, 5, 6, 7]
odd_positions = [y for x,y in enumerate(L) if x%2 != 0]   # [2, 4, 6]
even_positions = [y for x,y in enumerate(L) if x%2 == 0]  # [1, 3, 5, 7]

Basically, if you enumerate over a list, you’ll get the index x and the value y. What I’m doing here is putting the value y into the output list (even or odd) and using the index x to find out if that point is odd (x%2 != 0).


Answer 3

You can make use of bitwise AND operator &. Let’s see below:

x = [1, 2, 3, 4, 5, 6, 7]
y = [i for i in x if i&1]  # note: this tests each value, not its position
# y == [1, 3, 5, 7]

The bitwise AND is taken with 1, and the reason it works is that an odd number written in binary always has 1 as its last (least significant) digit. Let’s check:

23 = 1 * (2**4) + 0 * (2**3) + 1 * (2**2) + 1 * (2**1) + 1 * (2**0) = 10111
14 = 1 * (2**3) + 1 * (2**2) + 1 * (2**1) + 0 * (2**0) = 1110

An AND with 1 returns 1 if and only if the value is odd (1 in binary has only its last digit set).

Check the Python Bitwise Operator page for more.
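The bitwise test can be applied to positions instead of values. Since the question asks for odd positions, a sketch that uses the index from enumerate() and checks that i & 1 agrees with i % 2 for Python ints:

```python
L = [1, 2, 3, 4, 5, 6, 7]

# i & 1 matches i % 2 for every Python int, including negatives.
same_parity = all((i & 1) == (i % 2) for i in range(-10, 11))
print(same_parity)  # True

# Applied to the values, the test keeps odd numbers...
odd_values = [v for v in L if v & 1]                   # [1, 3, 5, 7]
# ...applied to the indices from enumerate(), it keeps odd positions.
odd_positions = [v for i, v in enumerate(L) if i & 1]  # [2, 4, 6]
print(odd_positions == L[1::2])  # True
```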

P.S.: You can also use this method to select odd and even columns in a DataFrame. Say the x and y coordinates of facial key points are given as columns x1, y1, x2, etc. To normalize the x and y coordinates by the width and height of each image, you can simply do:

for i in range(df.shape[1]):
    if i&1:
        df.iloc[:, i] /= heights
    else:
        df.iloc[:, i] /= widths

This is not exactly related to the question but for data scientists and computer vision engineers this method could be useful.

Cheers!


Answer 4

list_ = list(range(9))
print(list_[1::2])


Why can’t I use a list as a dictionary key in Python?

Question: Why can’t I use a list as a dictionary key in Python?

I’m a bit confused about what can/can’t be used as a key for a python dict.

dicked = {}
dicked[None] = 'foo'     # None ok
dicked[(1,3)] = 'baz'    # tuple ok
import sys
dicked[sys] = 'bar'      # wow, even a module is ok !
dicked[(1,[3])] = 'qux'  # oops, not allowed

So a tuple is an immutable type, but if I hide a list inside of it, then it can’t be a key. Couldn’t I just as easily hide a list inside a module?

I had some vague idea that the key has to be “hashable”, but I’m just going to admit my own ignorance about the technical details; I don’t know what’s really going on here. What would go wrong if you tried to use lists as keys, with the hash as, say, their memory location?


Answer 0

There’s a good article on the topic in the Python wiki: Why Lists Can’t Be Dictionary Keys. As explained there:

What would go wrong if you tried to use lists as keys, with the hash as, say, their memory location?

It can be done without really breaking any of the requirements, but it leads to unexpected behavior. Lists are generally treated as if their value was derived from their contents’ values, for instance when checking (in-)equality. Many would, understandably, expect that you can use any list [1, 2] to get the same key, whereas you’d have to keep around exactly the same list object. But lookup by value breaks as soon as a list used as a key is modified, and lookup by identity requires you to keep around exactly the same list, which isn’t required for any other common list operation (at least none I can think of).

Other objects such as modules and object make a much bigger deal out of their object identity anyway (when was the last time you had two distinct module objects called sys?), and are compared by that anyway. Therefore, it’s less surprising – or even expected – that they, when used as dict keys, compare by identity in that case as well.
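Here is a small sketch of that identity behaviour. Instances of plain object, like module objects such as sys, are hashable and compare by identity, so only the very same object finds its entry.

```python
import sys

first = object()
second = object()

d = {first: "first", sys: "the sys module"}

print(first == second)          # False: plain objects compare by identity
print(first in d, second in d)  # True False
print(d[sys])                   # the sys module
```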


Answer 1

Why can’t I use a list as a dict key in Python?

>>> d = {repr([1,2,3]): 'value'}
>>> d
{'[1, 2, 3]': 'value'}

(for anybody who stumbles on this question looking for a way around it)

As explained by others here, indeed you cannot. You can, however, use its string representation instead if you really want to use your list.
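Because repr() of equal lists produces equal strings, a freshly built list can still find the entry. This is a minimal sketch. It assumes the list elements have stable, distinctive reprs. For example, the int 2 and the string '2' produce different keys.

```python
d = {repr([1, 2, 3]): 'value'}

key = [1, 2, 3]                 # a different list object with the same contents
print(d[repr(key)])             # value
print(repr([1, '2', 3]) in d)   # False: the string '2' has a different repr than the int 2
```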


Answer 2

Just found that you can convert a list into a tuple and then use it as a key.

d = {tuple([1,2,3]): 'value'}
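A tuple built from an equal list hashes and compares equal, so lookups work with any list that has the same contents. The conversion is shallow, though. A list nested inside the list stays a list, so the resulting tuple is still unhashable. A short sketch:

```python
d = {tuple([1, 2, 3]): 'value'}

other = [1, 2, 3]               # a separate list with equal contents
print(d[tuple(other)])          # value

nested = [1, [2, 3]]
frozen = tuple(nested)          # (1, [2, 3]): the inner list is still a list
try:
    hash(frozen)
except TypeError as exc:
    print(exc)                  # unhashable type: 'list'
```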

Answer 3

The issue is that tuples are immutable, and lists are not. Consider the following:

d = {}
li = [1,2,3]
d[li] = 5
li.append(4)

What should d[li] return? Is it the same list? How about d[[1,2,3]]? It has the same values, but it is a different list.

Ultimately, there is no satisfactory answer. For example, if the only key that works is the original key, then if you have no reference to that key, you can never again access the value. With every other allowed key, you can construct a key without a reference to the original.

If both of my suggestions work, then you have very different keys that return the same value, which is more than a little surprising. If only the original contents work, then your key will quickly go bad, since lists are made to be modified.


Answer 4

Here’s an answer: http://wiki.python.org/moin/DictionaryKeys

What would go wrong if you tried to use lists as keys, with the hash as, say, their memory location?

Looking up different lists with the same contents would produce different results, even though comparing lists with the same contents would indicate them as equivalent.

What about using a list literal in a dictionary lookup?


Answer 5

Your answer can be found here:

Why Lists Can’t Be Dictionary Keys

Newcomers to Python often wonder why, while the language includes both a tuple and a list type, tuples are usable as dictionary keys, while lists are not. This was a deliberate design decision, and can best be explained by first understanding how Python dictionaries work.

Source & more info: http://wiki.python.org/moin/DictionaryKeys


Answer 6

Because lists are mutable. Dict keys (and set members) need to be hashable, and hashing mutable objects is a bad idea, because hash values should be computed on the basis of instance attributes, which for a mutable object can change at any time.

In this answer, I will give some concrete examples, hopefully adding value on top of the existing answers. Every insight applies to the elements of the set data structure as well.

Example 1: hashing a mutable object where the hash value is based on a mutable characteristic of the object.

>>> class stupidlist(list):
...     def __hash__(self):
...         return len(self)
... 
>>> stupid = stupidlist([1, 2, 3])
>>> d = {stupid: 0}
>>> stupid.append(4)
>>> stupid
[1, 2, 3, 4]
>>> d
{[1, 2, 3, 4]: 0}
>>> stupid in d
False
>>> stupid in d.keys()
False
>>> stupid in list(d.keys())
True

After mutating stupid, it cannot be found in the dict any longer because the hash changed. Only a linear scan over the list of the dict’s keys finds stupid.

Example 2: … but why not just a constant hash value?

>>> class stupidlist2(list):
...     def __hash__(self):
...         return id(self)
... 
>>> stupidA = stupidlist2([1, 2, 3])
>>> stupidB = stupidlist2([1, 2, 3])
>>> 
>>> stupidA == stupidB
True
>>> stupidA in {stupidB: 0}
False

That’s not a good idea either, because equal objects should hash identically so that you can find them in a dict or set.

Example 3: … ok, what about constant hashes across all instances?!

>>> class stupidlist3(list):
...     def __hash__(self):
...         return 1
... 
>>> stupidC = stupidlist3([1, 2, 3])
>>> stupidD = stupidlist3([1, 2, 3])
>>> stupidE = stupidlist3([1, 2, 3, 4])
>>> 
>>> stupidC in {stupidD: 0}
True
>>> stupidC in {stupidE: 0}
False
>>> d = {stupidC: 0}
>>> stupidC.append(5)
>>> stupidC in d
True

Things seem to work as expected, but think about what’s happening: when all instances of your class produce the same hash value, you will have a hash collision whenever there is more than one instance as a key in a dict or as a member of a set.

Finding the right instance with my_dict[key] or key in my_dict (or item in my_set) needs to perform as many equality checks as there are instances of stupidlist3 in the dict’s keys (in the worst case). At this point, the purpose of the dictionary – O(1) lookup – is completely defeated. This is demonstrated in the following timings (done with IPython).

Some Timings for Example 3

>>> lists_list = [[i]  for i in range(1000)]
>>> stupidlists_set = {stupidlist3([i]) for i in range(1000)}
>>> tuples_set = {(i,) for i in range(1000)}
>>> l = [999]
>>> s = stupidlist3([999])
>>> t = (999,)
>>> 
>>> %timeit l in lists_list
25.5 µs ± 442 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit s in stupidlists_set
38.5 µs ± 61.2 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
>>> %timeit t in tuples_set
77.6 ns ± 1.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)

As you can see, the membership test in our stupidlists_set is even slower than a linear scan over the whole lists_list, while you have the expected super fast lookup time (factor 500) in a set without loads of hash collisions.
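The timings above were taken with IPython’s %timeit. A plain-Python sketch using the standard timeit module repeats the same comparison outside IPython. Absolute numbers will vary by machine and Python version, so no figures are claimed here.

```python
import timeit


class stupidlist3(list):
    # Every instance hashes to 1, so all elements of a set collide
    def __hash__(self):
        return 1


lists_list = [[i] for i in range(1000)]
stupidlists_set = {stupidlist3([i]) for i in range(1000)}
tuples_set = {(i,) for i in range(1000)}

l = [999]
s = stupidlist3([999])
t = (999,)

checks = {
    "l in lists_list": lambda: l in lists_list,
    "s in stupidlists_set": lambda: s in stupidlists_set,
    "t in tuples_set": lambda: t in tuples_set,
}

runs = 1000
for label, check in checks.items():
    seconds = timeit.timeit(check, number=runs)
    print(f"{label}: {seconds / runs * 1e6:.3f} µs per lookup")
```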


TL;DR: you can use tuple(yourlist) as a dict key, because tuples are immutable and hashable (as long as all of their elements are hashable).


Answer 7

The simple answer to your question is that the class list does not implement the method __hash__, which is required for any object that is to be used as a key in a dictionary. However, the reason __hash__ is not implemented the same way it is in, say, the tuple class (based on the contents of the container) is that a list is mutable: editing the list would require the hash to be recalculated, which may mean the list is now located in the wrong bucket within the underlying hash table. Note that since you cannot modify a tuple (it is immutable), it doesn’t run into this problem.
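This can be checked directly. The list class sets __hash__ to None, so hash() refuses lists. Tuples compute their hash from their contents, so equal tuples hash equally. A small sketch:

```python
print(list.__hash__ is None)   # True: lists explicitly opt out of hashing

try:
    hash([1, 2, 3])
except TypeError as exc:
    print(exc)                 # unhashable type: 'list'

# Tuples hash their contents: two separately built, equal tuples share a hash
a = tuple([1, 2, 3])
b = (1, 2, 3)
print(a == b, hash(a) == hash(b))   # True True
```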

As a side note, the actual implementation of dictobject lookup is based on Algorithm D from Knuth Vol. 3, Sec. 6.4. If you have that book available, it might be a worthwhile read. In addition, if you’re really, really interested, you may like to take a peek at the developer comments on the actual implementation of dictobject here; they go into great detail as to exactly how it works. There is also a Python lecture on the implementation of dictionaries that you may be interested in. It goes through the definition of a key and what a hash is in the first few minutes.


Answer 8

According to the Python 2.7.2 documentation:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal, and their hash value is their id().

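A tuple is immutable in the sense that you cannot add, remove or replace its elements, but the elements themselves may be mutable. A tuple’s hash value is computed from the hash values of its elements, so a tuple is only hashable if all of its elements are. A list hashed the same way would have a hash value that depends on its elements, and so it would change whenever you changed the elements.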
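This explains the (1, [3]) case in the question. Hashing a tuple hashes each element in turn, so the tuple is only hashable if every element is. A minimal sketch:

```python
print(hash((1, 3)) == hash(tuple([1, 3])))   # True: all elements are hashable

try:
    hash((1, [3]))              # the inner list makes the whole tuple unhashable
except TypeError as exc:
    print(exc)                  # unhashable type: 'list'

dicked = {}
dicked[(1, 3)] = 'baz'          # works
dicked[(1, (3,))] = 'qux'       # works once the inner list is a tuple
print(dicked)                   # {(1, 3): 'baz', (1, (3,)): 'qux'}
```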

Using ids as list hashes would mean that lists with equal contents behave as different keys, which would be surprising and inconvenient.


Answer 9

A dictionary is a hash map: it hashes each key and uses that hash to decide where the key and its value are stored.

Something like (pseudocode):

{key : val}
table[hash(key)] = (key, val)

If you are wondering which objects can be used as keys for your dictionary:

Anything hashable (anything whose hash can be computed and stays the same for its whole lifetime, i.e. typically immutable objects) is eligible. List and set objects, however, can change at any time, so hash(key) would also have to change to stay in sync with the list or set, which breaks the lookup scheme above.

You can try:

hash(<your key here>)

If it works, the object can be used as a key for your dictionary; otherwise, convert it to something hashable.


In short:

  1. Convert that list to tuple(<your list>).
  2. Convert that list to str(<your list>).
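The hash() check above can be wrapped in a small helper. In this sketch, is_hashable is a hypothetical name, not a built-in. Calling hash() is a stricter test than isinstance(obj, collections.abc.Hashable): a tuple that contains a list passes the isinstance check but still fails hash().

```python
def is_hashable(obj):
    """Return True if hash(obj) succeeds, i.e. obj can be used as a dict key."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True


my_list = [1, 2, 3]
print(is_hashable(my_list))          # False
print(is_hashable(tuple(my_list)))   # True
print(is_hashable(str(my_list)))     # True
print(is_hashable((1, [2])))         # False: the tuple contains a list
```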

Answer 10

Dict keys need to be hashable. Lists are mutable, and they do not provide a valid hash method.


How do I convert a list of numpy arrays into a single numpy array?

Question: How do I convert a list of numpy arrays into a single numpy array?

Suppose I have:

LIST = [array([1, 2, 3, 4, 5]), array([1, 2, 3, 4, 5]), array([1, 2, 3, 4, 5])]  # the elements are numpy arrays

I am trying to convert it to:

array([[1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5],
       [1, 2, 3, 4, 5]])

I am currently solving this by iterating with vstack, but it is really slow for a particularly large LIST.

What is the most efficient way to do this?


Answer 0

In general you can concatenate a whole sequence of arrays along any axis:

numpy.concatenate( LIST, axis=0 )

but you do have to worry about the shape and dimensionality of each array in the list (for a 2-dimensional 3×5 output, you need to ensure that they are all 2-dimensional n-by-5 arrays already). If you want to concatenate 1-dimensional arrays as the rows of a 2-dimensional output, you need to expand their dimensionality.

As Jorge’s answer points out, there is also the function stack, introduced in numpy 1.10:

numpy.stack( LIST, axis=0 )

This takes the complementary approach: it creates a new view of each input array and adds an extra dimension (in this case, on the left, so each n-element 1D array becomes a 1-by-n 2D array) before concatenating. It will only work if all the input arrays have the same shape, even along the axis of concatenation.

vstack (or equivalently row_stack, which NumPy 2.0 deprecates as an alias of vstack) is often an easier-to-use solution because it will take a sequence of 1- and/or 2-dimensional arrays and expand the dimensionality automatically where necessary and only where necessary, before concatenating the whole list together. Where a new dimension is required, it is added on the left. Again, you can concatenate a whole list at once without needing to iterate:

numpy.vstack( LIST )

This flexible behavior is also exhibited by the syntactic shortcut numpy.r_[ array1, ...., arrayN ] (note the square brackets). This is good for concatenating a few explicitly-named arrays but is no good for your situation because this syntax will not accept a sequence of arrays, like your LIST.

There is also an analogous function column_stack and shortcut c_[...] for horizontal (column-wise) stacking, as well as an almost-analogous function hstack, although for some reason the latter is less flexible (it is stricter about input arrays’ dimensionality, and tries to concatenate 1-D arrays end-to-end instead of treating them as columns).

Finally, in the specific case of vertical stacking of 1-D arrays, the following also works:

numpy.array( LIST )

…because arrays can be constructed out of a sequence of other arrays, adding a new dimension to the beginning.
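A quick way to compare these options is to check the resulting shapes. This sketch assumes LIST holds three equal-length 1-D arrays, as in the question:

```python
import numpy as np

LIST = [np.array([1, 2, 3, 4, 5]) for _ in range(3)]

via_vstack = np.vstack(LIST)          # 1-D inputs become rows: shape (3, 5)
via_stack = np.stack(LIST, axis=0)    # new leading axis: shape (3, 5)
via_array = np.array(LIST)            # same as stack for equal-length 1-D arrays
via_concat = np.concatenate([a[np.newaxis, :] for a in LIST], axis=0)  # add the row axis first
flat = np.concatenate(LIST)           # without it, 1-D arrays join end to end: shape (15,)

print(via_vstack.shape, via_stack.shape, via_array.shape, via_concat.shape, flat.shape)
# (3, 5) (3, 5) (3, 5) (3, 5) (15,)
```

Only the plain concatenate call without the extra axis differs: it produces one long 1-D array instead of a 3-by-5 matrix.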


Answer 1

Starting in NumPy version 1.10, we have the function stack. It can stack arrays of any dimension, as long as they all have the same shape:

import numpy as np

# List of arrays.
L = [np.random.randn(5,4,2,5,1,2) for i in range(10)]

# Stack them using axis=0.
M = np.stack(L)
M.shape # == (10,5,4,2,5,1,2)
np.all(M == L) # == True

M = np.stack(L, axis=1)
M.shape # == (5,10,4,2,5,1,2)
M.shape == np.shape(L) # == False (Don't Panic): the new axis is now in position 1, so M == L no longer lines up

# These are all true
np.all(M[:,0,:] == L[0]) # == True
all(np.all(M[:,i,:] == L[i]) for i in range(10)) # == True

Enjoy,


Answer 2

I checked some of these methods for speed and found no significant difference between them! The only difference is that with some methods you must carefully check the dimensions.

Timing (seconds; each list holds 10,000 arrays of the given shape):

|----------------|----------------|-------------------|
|                | shape (10000,) | shape (1, 10000)  |
|----------------|----------------|-------------------|
| np.concatenate |    0.18280     |      0.17960      |
|----------------|----------------|-------------------|
| np.stack       |    0.21501     |      0.16465      |
|----------------|----------------|-------------------|
| np.vstack      |    0.21501     |      0.17181      |
|----------------|----------------|-------------------|
| np.array       |    0.21656     |      0.16833      |
|----------------|----------------|-------------------|

As you can see, I tried two experiments: using np.random.rand(10000) and np.random.rand(1, 10000). If we use 2-D arrays, np.stack and np.array add an extra dimension (result.shape is (10000, 1, 10000) for both), so they need additional steps to avoid this.

Code:

from time import perf_counter
import numpy as np

# 10,000 arrays of shape (10000,); use np.random.rand(1, 10000) for the second column of the table.
# Each stacked result holds 10,000 x 10,000 float64 values (about 800 MB).
l = [np.random.rand(10000) for _ in range(10000)]

start = perf_counter()
stack = np.stack(l, axis=0)
print(f'np.stack: {perf_counter() - start:.5f}')

start = perf_counter()
vstack = np.vstack(l)
print(f'np.vstack: {perf_counter() - start:.5f}')

start = perf_counter()
wrap = np.array(l)
print(f'np.array: {perf_counter() - start:.5f}')

start = perf_counter()
l = [el.reshape(1, -1) for el in l]  # the reshape is included in this timing
conc = np.concatenate(l, axis=0)
print(f'np.concatenate: {perf_counter() - start:.5f}')

Circular list iterator in Python

Question: Circular list iterator in Python

I need to iterate over a circular list, possibly many times, each time starting with the last visited item.

The use case is a connection pool. A client asks for connection, an iterator checks if pointed-to connection is available and returns it, otherwise loops until it finds one that is available.

Is there a neat way to do it in Python?


Answer 0

Use itertools.cycle, that’s its exact purpose:

from itertools import cycle

lst = ['a', 'b', 'c']

pool = cycle(lst)

for item in pool:
    print(item, end=' ')

Output:

a b c a b c ...

(Loops forever, obviously)


In order to manually advance the iterator and pull values from it one by one, simply call next(pool):

>>> next(pool)
'a'
>>> next(pool)
'b'
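Because a cycle never ends, itertools.islice is a convenient way to take a bounded number of items from it, for example in tests or when only a fixed number of turns is needed. A short sketch:

```python
from itertools import cycle, islice

pool = cycle(['a', 'b', 'c'])
first_seven = list(islice(pool, 7))   # stop after 7 items instead of looping forever
print(first_seven)   # ['a', 'b', 'c', 'a', 'b', 'c', 'a']

resumed = next(pool)  # the cycle resumes where islice stopped
print(resumed)        # b
```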

Answer 1

The correct answer is to use itertools.cycle. But, let’s assume that library function doesn’t exist. How would you implement it?

Use a generator:

def circular():
    while True:
        for connection in ['a', 'b', 'c']:
            yield connection

Then, you can either use a for statement to iterate infinitely, or you can call next() to get the single next value from the generator iterator:

connections = circular()
next(connections) # 'a'
next(connections) # 'b'
next(connections) # 'c'
next(connections) # 'a'
next(connections) # 'b'
next(connections) # 'c'
next(connections) # 'a'
#....
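The generator above hard-codes the list. Below is a more general version, modelled on the pure-Python equivalent shown in the itertools.cycle documentation. It accepts any iterable and saves the items on the first pass, so even one-shot iterators can be repeated. It also stops when the input is empty, whereas the hard-coded version would spin forever without yielding anything.

```python
def circular(iterable):
    saved = []
    for element in iterable:   # first pass: yield and remember each item
        yield element
        saved.append(element)
    while saved:               # later passes: replay the saved items forever
        for element in saved:
            yield element


connections = circular(iter(['a', 'b', 'c']))   # works even for a one-shot iterator
print([next(connections) for _ in range(5)])    # ['a', 'b', 'c', 'a', 'b']
print(list(circular([])))                       # []: nothing to cycle, so it stops
```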

Answer 2

Or you can do it like this:

conn = ['a', 'b', 'c', 'd', 'e', 'f']
conn_len = len(conn)
index = 0
while True:
    print(conn[index])
    index = (index + 1) % conn_len

prints a b c d e f a b c… forever
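The modulo approach also makes it easy to resume from the last visited position between calls, which is what the question asks for. This is a minimal sketch; RoundRobin and next_item are hypothetical names.

```python
class RoundRobin:
    """Hand out items in circular order, remembering the position between calls."""

    def __init__(self, items):
        self.items = list(items)
        self.index = 0

    def next_item(self):
        item = self.items[self.index]
        self.index = (self.index + 1) % len(self.items)   # wrap around at the end
        return item


rr = RoundRobin(['a', 'b', 'c'])
print([rr.next_item() for _ in range(4)])  # ['a', 'b', 'c', 'a']
print(rr.next_item())                      # b
```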


Answer 3

You can accomplish this with an append(pop()) loop:

l = ['a','b','c','d']
while True:
    print(l[0])
    l.append(l.pop(0))

or with a for i in range() loop:

l = ['a','b','c','d']
ll = len(l)
while True:
    for i in range(ll):
        print(l[i])

or simply:

l = ['a','b','c','d']

while True:
    for i in l:
        print(i)

all of which print:

>>>
a
b
c
d
a
b
c
d
...etc.

Of the three, I’d lean towards the append(pop()) approach, wrapped in a function:

servers = ['a','b','c','d']

def rotate_servers(servers):
    servers.append(servers.pop(0))
    return servers

while True:
    servers = rotate_servers(servers)
    print(servers[0])
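A possible refinement, offered here as an alternative rather than part of the answer above: list.pop(0) shifts every remaining element, so each rotation costs O(n). collections.deque supports the same rotation with rotate(-1) without shifting the other elements. A sketch, bounded to six rotations so that it terminates:

```python
from collections import deque

servers = deque(['a', 'b', 'c', 'd'])


def rotate_servers(servers):
    servers.rotate(-1)   # move the first element to the end, like append(pop(0))
    return servers


order = []
for _ in range(6):       # bounded here instead of while True
    servers = rotate_servers(servers)
    order.append(servers[0])
print(order)  # ['b', 'c', 'd', 'a', 'b', 'c']
```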

Answer 4

You need a custom iterator — I’ll adapt the iterator from this answer.

from itertools import cycle

class ConnectionPool:
    def __init__(self, ...):
        # whatever is appropriate here to initialize
        # your data
        self.pool = cycle([blah, blah, etc])
    def __iter__(self):
        return self
    def __next__(self):
        for connection in self.pool:
            if connection.is_available:  # or however you spell it
                return connection
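The class above is a sketch with placeholders. Below is a self-contained version that makes a few assumptions. Connection and NoConnectionAvailable are hypothetical stand-ins, and Connection simply exposes an is_available attribute. The search also gives up after one full lap when every connection is busy, whereas the loop in the sketch above would spin indefinitely in that case.

```python
from itertools import cycle


class NoConnectionAvailable(LookupError):
    """Hypothetical error raised when every connection is busy."""


class Connection:
    """Hypothetical stand-in for a real connection object."""

    def __init__(self, name, is_available=True):
        self.name = name
        self.is_available = is_available


class ConnectionPool:
    def __init__(self, connections):
        self.connections = list(connections)
        self.pool = cycle(self.connections)

    def __iter__(self):
        return self

    def __next__(self):
        # Check each connection at most once per call, resuming where the last call stopped
        for _ in range(len(self.connections)):
            connection = next(self.pool)
            if connection.is_available:
                return connection
        raise NoConnectionAvailable("all connections are busy")


pool = ConnectionPool([Connection('a'), Connection('b', is_available=False), Connection('c')])
print(next(pool).name)  # a
print(next(pool).name)  # c  ('b' is skipped because it is busy)
print(next(pool).name)  # a
```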

Answer 5

If you wish to cycle n times, implement the ncycles itertools recipe:

from itertools import chain, repeat


def ncycles(iterable, n):
    "Returns the sequence elements n times"
    return chain.from_iterable(repeat(tuple(iterable), n))


list(ncycles(["a", "b", "c"], 3))
# ['a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c']