分类目录归档:知识问答

如何从列表中删除第一个项目?

问题:如何从列表中删除第一个项目?

我有表[0, 1, 2, 3, 4],我想将它做成[1, 2, 3, 4]。我该怎么办?

I have the list [0, 1, 2, 3, 4] I’d like to make it into [1, 2, 3, 4]. How do I go about this?


回答 0

Python清单

list.pop(索引)

>>> l = ['a', 'b', 'c', 'd']
>>> l.pop(0)
'a'
>>> l
['b', 'c', 'd']
>>> 

删除列表[索引]

>>> l = ['a', 'b', 'c', 'd']
>>> del l[0]
>>> l
['b', 'c', 'd']
>>> 

这些都将修改您的原始列表。

其他人建议使用切片:

  • 复制清单
  • 可以返回一个子集

另外,如果要执行许多pop(0),则应查看collections.deque

from collections import deque
>>> l = deque(['a', 'b', 'c', 'd'])
>>> l.popleft()
'a'
>>> l
deque(['b', 'c', 'd'])
  • 从列表的左端提供更高的性能

Python List

list.pop(index)

>>> l = ['a', 'b', 'c', 'd']
>>> l.pop(0)
'a'
>>> l
['b', 'c', 'd']
>>> 

del list[index]

>>> l = ['a', 'b', 'c', 'd']
>>> del l[0]
>>> l
['b', 'c', 'd']
>>> 

These both modify your original list.

Others have suggested using slicing:

  • Copies the list
  • Can return a subset

Also, if you are performing many pop(0), you should look at collections.deque

from collections import deque
>>> l = deque(['a', 'b', 'c', 'd'])
>>> l.popleft()
'a'
>>> l
deque(['b', 'c', 'd'])
  • Provides higher performance popping from left end of the list

回答 1

切片:

x = [0,1,2,3,4]
x = x[1:]

这实际上将返回原始的子集,但不会对其进行修改。

Slicing:

x = [0,1,2,3,4]
x = x[1:]

Which would actually return a subset of the original but not modify it.


回答 2

>>> x = [0, 1, 2, 3, 4]
>>> x.pop(0)
0

更多关于此这里

>>> x = [0, 1, 2, 3, 4]
>>> x.pop(0)
0

More on this here.


回答 3

使用列表切片,请参阅有关列表的Python教程以获取更多详细信息:

>>> l = [0, 1, 2, 3, 4]
>>> l[1:]
[1, 2, 3, 4]

With list slicing, see the Python tutorial about lists for more details:

>>> l = [0, 1, 2, 3, 4]
>>> l[1:]
[1, 2, 3, 4]

回答 4

你会这样做

l = [0, 1, 2, 3, 4]
l.pop(0)

要么 l = l[1:]

利弊

使用pop可以获取值

x = l.pop(0) x0

you would just do this

l = [0, 1, 2, 3, 4]
l.pop(0)

or l = l[1:]

Pros and Cons

Using pop you can retrieve the value

say x = l.pop(0) x would be 0


回答 5

然后将其删除:

x = [0, 1, 2, 3, 4]
del x[0]
print x
# [1, 2, 3, 4]

Then just delete it:

x = [0, 1, 2, 3, 4]
del x[0]
print x
# [1, 2, 3, 4]

回答 6

您可以list.reverse()用来反转列表,然后list.pop()删除最后一个元素,例如:

l = [0, 1, 2, 3, 4]
l.reverse()
print l
[4, 3, 2, 1, 0]


l.pop()
0
l.pop()
1
l.pop()
2
l.pop()
3
l.pop()
4

You can use list.reverse() to reverse the list, then list.pop() to remove the last element, for example:

l = [0, 1, 2, 3, 4]
l.reverse()
print l
[4, 3, 2, 1, 0]


l.pop()
0
l.pop()
1
l.pop()
2
l.pop()
3
l.pop()
4

回答 7

您也可以使用list.remove(a[0])pop删除列表中的第一个元素。

>>>> a=[1,2,3,4,5]
>>>> a.remove(a[0])
>>>> print a
>>>> [2,3,4,5]

You can also use list.remove(a[0]) to pop out the first element in the list.

>>>> a=[1,2,3,4,5]
>>>> a.remove(a[0])
>>>> print a
>>>> [2,3,4,5]

回答 8

如果使用numpy,则需要使用delete方法:

import numpy as np

a = np.array([1, 2, 3, 4, 5])

a = np.delete(a, 0)

print(a) # [2 3 4 5]

If you are working with numpy you need to use the delete method:

import numpy as np

a = np.array([1, 2, 3, 4, 5])

a = np.delete(a, 0)

print(a) # [2 3 4 5]

回答 9

有一个称为“双端队列”或双头队列的数据结构,它比列表更快,更高效。您可以使用列表并将其转换为双端队列,并在其中进行所需的转换。您也可以将双端队列转换回列表。

import collections
mylist = [0, 1, 2, 3, 4]

#make a deque from your list
de = collections.deque(mylist)

#you can remove from a deque from either left side or right side
de.popleft()
print(de)

#you can covert the deque back to list
mylist = list(de)
print(mylist)

Deque还提供了非常有用的功能,例如将元素插入列表的任一侧或任何特定的索引。您也可以旋转或反转双端队列。试试看!!

There is a datastructure called “deque” or double ended queue which is faster and efficient than a list. You can use your list and convert it to deque and do the required transformations in it. You can also convert the deque back to list.

import collections
mylist = [0, 1, 2, 3, 4]

#make a deque from your list
de = collections.deque(mylist)

#you can remove from a deque from either left side or right side
de.popleft()
print(de)

#you can covert the deque back to list
mylist = list(de)
print(mylist)

Deque also provides very useful functions like inserting elements to either side of the list or to any specific index. You can also rotate or reverse a deque. Give it a try!!


改组对象列表

问题:改组对象列表

我有一个对象列表,我想对其进行洗牌。我以为可以使用该random.shuffle方法,但是当列表中包含对象时,这似乎失败了。是否有一种用于改组对象的方法或解决此问题的另一种方法?

import random

class A:
    foo = "bar"

a1 = a()
a2 = a()
b = [a1, a2]

print(random.shuffle(b))

这将失败。

I have a list of objects and I want to shuffle them. I thought I could use the random.shuffle method, but this seems to fail when the list is of objects. Is there a method for shuffling objects or another way around this?

import random

class A:
    foo = "bar"

a1 = a()
a2 = a()
b = [a1, a2]

print(random.shuffle(b))

This will fail.


回答 0

random.shuffle应该管用。这是一个示例,其中对象是列表:

from random import shuffle
x = [[i] for i in range(10)]
shuffle(x)

# print(x)  gives  [[9], [2], [7], [0], [4], [5], [3], [1], [8], [6]]
# of course your results will vary

请注意,随机播放在适当的地方起作用,并返回None。

random.shuffle should work. Here’s an example, where the objects are lists:

from random import shuffle
x = [[i] for i in range(10)]
shuffle(x)

# print(x)  gives  [[9], [2], [7], [0], [4], [5], [3], [1], [8], [6]]
# of course your results will vary

Note that shuffle works in place, and returns None.


回答 1

当您了解到就地改组就是问题所在。我也经常遇到问题,而且似乎也常常忘记如何复制列表。使用sample(a, len(a))是解决方案,使用len(a)作为样本量。有关Python文档,请参见https://docs.python.org/3.6/library/random.html#random.sample

这是使用的简单版本random.sample(),它将经过改组的结果作为新列表返回。

import random

a = range(5)
b = random.sample(a, len(a))
print a, b, "two list same:", a == b
# print: [0, 1, 2, 3, 4] [2, 1, 3, 4, 0] two list same: False

# The function sample allows no duplicates.
# Result can be smaller but not larger than the input.
a = range(555)
b = random.sample(a, len(a))
print "no duplicates:", a == list(set(b))

try:
    random.sample(a, len(a) + 1)
except ValueError as e:
    print "Nope!", e

# print: no duplicates: True
# print: Nope! sample larger than population

As you learned the in-place shuffling was the problem. I also have problem frequently, and often seem to forget how to copy a list, too. Using sample(a, len(a)) is the solution, using len(a) as the sample size. See https://docs.python.org/3.6/library/random.html#random.sample for the Python documentation.

Here’s a simple version using random.sample() that returns the shuffled result as a new list.

import random

a = range(5)
b = random.sample(a, len(a))
print a, b, "two list same:", a == b
# print: [0, 1, 2, 3, 4] [2, 1, 3, 4, 0] two list same: False

# The function sample allows no duplicates.
# Result can be smaller but not larger than the input.
a = range(555)
b = random.sample(a, len(a))
print "no duplicates:", a == list(set(b))

try:
    random.sample(a, len(a) + 1)
except ValueError as e:
    print "Nope!", e

# print: no duplicates: True
# print: Nope! sample larger than population

回答 2

我也花了一些时间来做到这一点。但是洗牌的文档非常清楚:

列表随机排列x ; 不返回。

所以你不应该print(random.shuffle(b))。相反random.shuffle(b),然后print(b)

It took me some time to get that too. But the documentation for shuffle is very clear:

shuffle list x in place; return None.

So you shouldn’t print(random.shuffle(b)). Instead do random.shuffle(b) and then print(b).


回答 3

#!/usr/bin/python3

import random

s=list(range(5))
random.shuffle(s) # << shuffle before print or assignment
print(s)

# print: [2, 4, 1, 3, 0]
#!/usr/bin/python3

import random

s=list(range(5))
random.shuffle(s) # << shuffle before print or assignment
print(s)

# print: [2, 4, 1, 3, 0]

回答 4

如果您碰巧已经使用numpy(在科学和金融应用中非常流行),则可以节省导入时间。

import numpy as np    
np.random.shuffle(b)
print(b)

http://docs.scipy.org/doc/numpy/reference/generation/numpy.random.shuffle.html

If you happen to be using numpy already (very popular for scientific and financial applications) you can save yourself an import.

import numpy as np    
np.random.shuffle(b)
print(b)

http://docs.scipy.org/doc/numpy/reference/generated/numpy.random.shuffle.html


回答 5

>>> import random
>>> a = ['hi','world','cat','dog']
>>> random.shuffle(a,random.random)
>>> a
['hi', 'cat', 'dog', 'world']

这对我来说可以。确保设置随机方法。

>>> import random
>>> a = ['hi','world','cat','dog']
>>> random.shuffle(a,random.random)
>>> a
['hi', 'cat', 'dog', 'world']

It works fine for me. Make sure to set the random method.


回答 6

如果您有多个列表,则可能要先定义排列(随机排列列表/重新排列列表中项目的方式),然后将其应用于所有列表:

import random

perm = list(range(len(list_one)))
random.shuffle(perm)
list_one = [list_one[index] for index in perm]
list_two = [list_two[index] for index in perm]

脾气暴躁

如果您的列表是numpy数组,则更为简单:

import numpy as np

perm = np.random.permutation(len(list_one))
list_one = list_one[perm]
list_two = list_two[perm]

处理器

我创建了mpu具有以下consistent_shuffle功能的小型实用程序包:

import mpu

# Necessary if you want consistent results
import random
random.seed(8)

# Define example lists
list_one = [1,2,3]
list_two = ['a', 'b', 'c']

# Call the function
list_one, list_two = mpu.consistent_shuffle(list_one, list_two)

请注意,它mpu.consistent_shuffle接受任意数量的参数。因此,您也可以使用它洗牌三个或更多列表。

If you have multiple lists, you might want to define the permutation (the way you shuffle the list / rearrange the items in the list) first and then apply it to all lists:

import random

perm = list(range(len(list_one)))
random.shuffle(perm)
list_one = [list_one[index] for index in perm]
list_two = [list_two[index] for index in perm]

Numpy / Scipy

If your lists are numpy arrays, it is simpler:

import numpy as np

perm = np.random.permutation(len(list_one))
list_one = list_one[perm]
list_two = list_two[perm]

mpu

I’ve created the small utility package mpu which has the consistent_shuffle function:

import mpu

# Necessary if you want consistent results
import random
random.seed(8)

# Define example lists
list_one = [1,2,3]
list_two = ['a', 'b', 'c']

# Call the function
list_one, list_two = mpu.consistent_shuffle(list_one, list_two)

Note that mpu.consistent_shuffle takes an arbitrary number of arguments. So you can also shuffle three or more lists with it.


回答 7

from random import random
my_list = range(10)
shuffled_list = sorted(my_list, key=lambda x: random())

对于要交换订购功能的某些应用程序,此替代方法可能很有用。

from random import random
my_list = range(10)
shuffled_list = sorted(my_list, key=lambda x: random())

This alternative may be useful for some applications where you want to swap the ordering function.


回答 8

在某些情况下,使用numpy数组时,请random.shuffle在数组中使用创建的重复数据。

另一种方法是使用numpy.random.shuffle。如果您已经在使用numpy,那么这是优于generic的首选方法random.shuffle

numpy.random.shuffle

>>> import numpy as np
>>> import random

使用random.shuffle

>>> foo = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> foo

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])


>>> random.shuffle(foo)
>>> foo

array([[1, 2, 3],
       [1, 2, 3],
       [4, 5, 6]])

使用numpy.random.shuffle

>>> foo = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> foo

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])


>>> np.random.shuffle(foo)
>>> foo

array([[1, 2, 3],
       [7, 8, 9],
       [4, 5, 6]])

In some cases when using numpy arrays, using random.shuffle created duplicate data in the array.

An alternative is to use numpy.random.shuffle. If you’re working with numpy already, this is the preferred method over the generic random.shuffle.

numpy.random.shuffle

Example

>>> import numpy as np
>>> import random

Using random.shuffle:

>>> foo = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> foo

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])


>>> random.shuffle(foo)
>>> foo

array([[1, 2, 3],
       [1, 2, 3],
       [4, 5, 6]])

Using numpy.random.shuffle:

>>> foo = np.array([[1,2,3],[4,5,6],[7,8,9]])
>>> foo

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])


>>> np.random.shuffle(foo)
>>> foo

array([[1, 2, 3],
       [7, 8, 9],
       [4, 5, 6]])

回答 9

对于单行代码,请使用random.sample(list_to_be_shuffled, length_of_the_list)示例:

import random
random.sample(list(range(10)), 10)

输出:[2、9、7、8、3、0、4、1、6、5]

For one-liners, userandom.sample(list_to_be_shuffled, length_of_the_list) with an example:

import random
random.sample(list(range(10)), 10)

outputs: [2, 9, 7, 8, 3, 0, 4, 1, 6, 5]


回答 10

当使用’foo’调用时,’print func(foo)’将输出’func’的返回值。但是,’shuffle’的返回类型为None,因为该列表将被修改,因此不打印任何内容。解决方法:

# shuffle the list in place 
random.shuffle(b)

# print it
print(b)

如果您更喜欢函数式编程风格,则可能需要创建以下包装函数:

def myshuffle(ls):
    random.shuffle(ls)
    return ls

‘print func(foo)’ will print the return value of ‘func’ when called with ‘foo’. ‘shuffle’ however has None as its return type, as the list will be modified in place, hence it prints nothing. Workaround:

# shuffle the list in place 
random.shuffle(b)

# print it
print(b)

If you’re more into functional programming style you might want to make the following wrapper function:

def myshuffle(ls):
    random.shuffle(ls)
    return ls

回答 11

可以定义一个函数shuffledsort与vs 相同sorted

def shuffled(x):
    import random
    y = x[:]
    random.shuffle(y)
    return y

x = shuffled([1, 2, 3, 4])
print x

One can define a function called shuffled (in the same sense of sort vs sorted)

def shuffled(x):
    import random
    y = x[:]
    random.shuffle(y)
    return y

x = shuffled([1, 2, 3, 4])
print x

回答 12

import random

class a:
    foo = "bar"

a1 = a()
a2 = a()
a3 = a()
a4 = a()
b = [a1,a2,a3,a4]

random.shuffle(b)
print(b)

shuffle 到位,因此不要打印结果None,而是列表。

import random

class a:
    foo = "bar"

a1 = a()
a2 = a()
a3 = a()
a4 = a()
b = [a1,a2,a3,a4]

random.shuffle(b)
print(b)

shuffle is in place, so do not print result, which is None, but the list.


回答 13

您可以这样做:

>>> A = ['r','a','n','d','o','m']
>>> B = [1,2,3,4,5,6]
>>> import random
>>> random.sample(A+B, len(A+B))
[3, 'r', 4, 'n', 6, 5, 'm', 2, 1, 'a', 'o', 'd']

如果要返回到两个列表,则可以将此长列表分成两部分。

You can go for this:

>>> A = ['r','a','n','d','o','m']
>>> B = [1,2,3,4,5,6]
>>> import random
>>> random.sample(A+B, len(A+B))
[3, 'r', 4, 'n', 6, 5, 'm', 2, 1, 'a', 'o', 'd']

if you want to go back to two lists, you then split this long list into two.


回答 14

您可以构建一个将列表作为参数并返回列表的随机版本的函数:

from random import *

def listshuffler(inputlist):
    for i in range(len(inputlist)):
        swap = randint(0,len(inputlist)-1)
        temp = inputlist[swap]
        inputlist[swap] = inputlist[i]
        inputlist[i] = temp
    return inputlist

you could build a function that takes a list as a parameter and returns a shuffled version of the list:

from random import *

def listshuffler(inputlist):
    for i in range(len(inputlist)):
        swap = randint(0,len(inputlist)-1)
        temp = inputlist[swap]
        inputlist[swap] = inputlist[i]
        inputlist[i] = temp
    return inputlist

回答 15

""" to shuffle random, set random= True """

def shuffle(x,random=False):
     shuffled = []
     ma = x
     if random == True:
         rando = [ma[i] for i in np.random.randint(0,len(ma),len(ma))]
         return rando
     if random == False:
          for i in range(len(ma)):
          ave = len(ma)//3
          if i < ave:
             shuffled.append(ma[i+ave])
          else:
             shuffled.append(ma[i-ave])    
     return shuffled
""" to shuffle random, set random= True """

def shuffle(x,random=False):
     shuffled = []
     ma = x
     if random == True:
         rando = [ma[i] for i in np.random.randint(0,len(ma),len(ma))]
         return rando
     if random == False:
          for i in range(len(ma)):
          ave = len(ma)//3
          if i < ave:
             shuffled.append(ma[i+ave])
          else:
             shuffled.append(ma[i-ave])    
     return shuffled

回答 16

您可以使用随机播放或采样。两者均来自随机模块。

import random
def shuffle(arr1):
    n=len(arr1)
    b=random.sample(arr1,n)
    return b

要么

import random
def shuffle(arr1):
    random.shuffle(arr1)
    return arr1

you can either use shuffle or sample . both of which come from random module.

import random
def shuffle(arr1):
    n=len(arr1)
    b=random.sample(arr1,n)
    return b

OR

import random
def shuffle(arr1):
    random.shuffle(arr1)
    return arr1

回答 17

确保您没有命名源文件random.py,并且工作目录中没有名为random.pyc ..的文件,这可能会导致程序尝试导入本地random.py文件而不是pythons random模块。

Make sure you are not naming your source file random.py, and that there is not a file in your working directory called random.pyc.. either could cause your program to try and import your local random.py file instead of pythons random module.


回答 18

def shuffle(_list):
    if not _list == []:
        import random
        list2 = []
        while _list != []:
            card = random.choice(_list)
            _list.remove(card)
            list2.append(card)
        while list2 != []:
            card1 = list2[0]
            list2.remove(card1)
            _list.append(card1)
        return _list
def shuffle(_list):
    if not _list == []:
        import random
        list2 = []
        while _list != []:
            card = random.choice(_list)
            _list.remove(card)
            list2.append(card)
        while list2 != []:
            card1 = list2[0]
            list2.remove(card1)
            _list.append(card1)
        return _list

回答 19

import random
class a:
    foo = "bar"

a1 = a()
a2 = a()
b = [a1.foo,a2.foo]
random.shuffle(b)
import random
class a:
    foo = "bar"

a1 = a()
a2 = a()
b = [a1.foo,a2.foo]
random.shuffle(b)

回答 20

改组过程是“有替换的”,因此每个项目的出现可能会改变!至少当列表中的项目也同时列出时。

例如,

ml = [[0], [1]] * 10

后,

random.shuffle(ml)

[0]的数目可以是9或8,但不完全是10。

The shuffling process is “with replacement”, so the occurrence of each item may change! At least when when items in your list is also list.

E.g.,

ml = [[0], [1]] * 10

After,

random.shuffle(ml)

The number of [0] may be 9 or 8, but not exactly 10.


回答 21

计划:无需依赖库就可以完成改组工作。示例:从元素0的开头开始浏览列表;找到一个新的随机位置,例如6,将0的值放在6中,将6的值放在0中。移到元素1并重复此过程,以此类推。

import random
iteration = random.randint(2, 100)
temp_var = 0
while iteration > 0:

    for i in range(1, len(my_list)): # have to use range with len()
        for j in range(1, len(my_list) - i):
            # Using temp_var as my place holder so I don't lose values
            temp_var = my_list[i]
            my_list[i] = my_list[j]
            my_list[j] = temp_var

        iteration -= 1

Plan: Write out the shuffle without relying on a library to do the heavy lifting. Example: Go through the list from the beginning starting with element 0; find a new random position for it, say 6, put 0’s value in 6 and 6’s value in 0. Move on to element 1 and repeat this process, and so on through the rest of the list

import random
iteration = random.randint(2, 100)
temp_var = 0
while iteration > 0:

    for i in range(1, len(my_list)): # have to use range with len()
        for j in range(1, len(my_list) - i):
            # Using temp_var as my place holder so I don't lose values
            temp_var = my_list[i]
            my_list[i] = my_list[j]
            my_list[j] = temp_var

        iteration -= 1

回答 22

它工作正常。我在这里尝试使用功能作为列表对象:

    from random import shuffle

    def foo1():
        print "foo1",

    def foo2():
        print "foo2",

    def foo3():
        print "foo3",

    A=[foo1,foo2,foo3]

    for x in A:
        x()

    print "\r"

    shuffle(A)
    for y in A:
        y()

它打印出来:foo1 foo2 foo3 foo2 foo3 foo1(最后一行中的foos具有随机​​顺序)

It works fine. I am trying it here with functions as list objects:

    from random import shuffle

    def foo1():
        print "foo1",

    def foo2():
        print "foo2",

    def foo3():
        print "foo3",

    A=[foo1,foo2,foo3]

    for x in A:
        x()

    print "\r"

    shuffle(A)
    for y in A:
        y()

It prints out: foo1 foo2 foo3 foo2 foo3 foo1 (the foos in the last row have a random order)


如何在保留订单的同时从列表中删除重复项?

问题:如何在保留订单的同时从列表中删除重复项?

是否有内置的程序在保留顺序的同时从Python列表中删除重复项?我知道我可以使用集合来删除重复项,但这会破坏原始顺序。我也知道我可以这样滚动自己:

def uniq(input):
  output = []
  for x in input:
    if x not in output:
      output.append(x)
  return output

(感谢您放松代码示例。)

但是如果可能的话,我想利用一个内置的或更Pythonic的习惯用法。

相关问题:在Python中,从列表中删除重复项以使所有元素在保持顺序唯一的同时最快的算法是什么?

Is there a built-in that removes duplicates from list in Python, whilst preserving order? I know that I can use a set to remove duplicates, but that destroys the original order. I also know that I can roll my own like this:

def uniq(input):
  output = []
  for x in input:
    if x not in output:
      output.append(x)
  return output

(Thanks to unwind for that code sample.)

But I’d like to avail myself of a built-in or a more Pythonic idiom if possible.

Related question: In Python, what is the fastest algorithm for removing duplicates from a list so that all elements are unique while preserving order?


回答 0

在这里,您有一些选择:http : //www.peterbe.com/plog/uniqifiers-benchmark

最快的:

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

为什么要分配seen.addseen_add而不是仅打电话给seen.add?Python是一种动态语言,与解决seen.add局部变量相比,解决每次迭代的成本更高。seen.add可能在两次迭代之间发生了变化,并且运行时不够智能,无法排除这种情况。为了安全起见,它必须每次检查对象。

如果您打算在同一数据集上大量使用此功能,则最好使用有序集:http : //code.activestate.com/recipes/528878/

O(1)每个操作的插入,删除和成员检查。

(小的附加说明:seen.add()始终返回None,因此or以上内容仅是尝试进行集合更新的一种方法,而不是逻辑测试的组成部分。)

Here you have some alternatives: http://www.peterbe.com/plog/uniqifiers-benchmark

Fastest one:

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

Why assign seen.add to seen_add instead of just calling seen.add? Python is a dynamic language, and resolving seen.add each iteration is more costly than resolving a local variable. seen.add could have changed between iterations, and the runtime isn’t smart enough to rule that out. To play it safe, it has to check the object each time.

If you plan on using this function a lot on the same dataset, perhaps you would be better off with an ordered set: http://code.activestate.com/recipes/528878/

O(1) insertion, deletion and member-check per operation.

(Small additional note: seen.add() always returns None, so the or above is there only as a way to attempt a set update, and not as an integral part of the logical test.)


回答 1

编辑2016

正如Raymond所指出的那样,在OrderedDictC语言中实现的python 3.5+中,列表理解方法将比OrderedDict(除非您实际上需要列表的末尾-甚至在输入非常短的情况下)慢一些。因此,针对3.5+的最佳解决方案是OrderedDict

重要编辑2015

@abarnert所述,more_itertools库(pip install more_itertools)包含一个unique_everseen为解决此问题而构建的函数,列表理解中没有任何不可读的not seen.add突变。这也是最快的解决方案:

>>> from  more_itertools import unique_everseen
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(unique_everseen(items))
[1, 2, 0, 3]

只需导入一个简单的库,就不会有黑客入侵。这来自itertools配方的实现,unique_everseen如下所示:

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

在Python中2.7+这种用法公认的通用用法(它可以工作,但没有针对速度进行优化,我现在将使用unique_everseencollections.OrderedDict

运行时间:O(N)

>>> from collections import OrderedDict
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(OrderedDict.fromkeys(items))
[1, 2, 0, 3]

看起来比:

seen = set()
[x for x in seq if x not in seen and not seen.add(x)]

并且没有利用丑陋的hack

not seen.add(x)

这取决于set.add始终返回的就地方法的事实,None因此not None求值为True

但是请注意,尽管破解解决方案具有相同的运行时复杂度O(N),但其原始速度更快。

Edit 2016

As Raymond pointed out, in python 3.5+ where OrderedDict is implemented in C, the list comprehension approach will be slower than OrderedDict (unless you actually need the list at the end – and even then, only if the input is very short). So the best solution for 3.5+ is OrderedDict.

Important Edit 2015

As @abarnert notes, the more_itertools library (pip install more_itertools) contains a unique_everseen function that is built to solve this problem without any unreadable (not seen.add) mutations in list comprehensions. This is also the fastest solution too:

>>> from  more_itertools import unique_everseen
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(unique_everseen(items))
[1, 2, 0, 3]

Just one simple library import and no hacks. This comes from an implementation of the itertools recipe unique_everseen which looks like:

def unique_everseen(iterable, key=None):
    "List unique elements, preserving order. Remember all elements ever seen."
    # unique_everseen('AAAABBBCCDAABBB') --> A B C D
    # unique_everseen('ABBCcAD', str.lower) --> A B C D
    seen = set()
    seen_add = seen.add
    if key is None:
        for element in filterfalse(seen.__contains__, iterable):
            seen_add(element)
            yield element
    else:
        for element in iterable:
            k = key(element)
            if k not in seen:
                seen_add(k)
                yield element

In Python 2.7+ the accepted common idiom (which works but isn’t optimized for speed, I would now use unique_everseen) for this uses collections.OrderedDict:

Runtime: O(N)

>>> from collections import OrderedDict
>>> items = [1, 2, 0, 1, 3, 2]
>>> list(OrderedDict.fromkeys(items))
[1, 2, 0, 3]

This looks much nicer than:

seen = set()
[x for x in seq if x not in seen and not seen.add(x)]

and doesn’t utilize the ugly hack:

not seen.add(x)

which relies on the fact that set.add is an in-place method that always returns None so not None evaluates to True.

Note however that the hack solution is faster in raw speed though it has the same runtime complexity O(N).


回答 2

在Python 2.7中,从迭代器中删除重复项并同时保持其原始顺序的新方法是:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

在Python 3.5中,OrderedDict具有C实现。我的时间表明,这是Python 3.5各种方法中最快也是最短的。

在Python 3.6中,常规字典变得有序且紧凑。(此功能适用于CPython和PyPy,但在其他实现中可能不存在)。这为我们提供了一种在保留订单的同时进行重复数据删除的最快方法:

>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

在Python 3.7中,保证常规dict在所有实现中都排序。 因此,最短,最快的解决方案是:

>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

对@max的响应:一旦移至3.6或3.7并使用常规dict而不是OrderedDict,就无法真正以其他任何方式击败性能。该词典非常密集,几乎可以无开销地转换为列表。目标列表的大小预先设置为len(d),这样可以保存列表推导中发生的所有调整大小。同样,由于内部键列表很密集,因此指针的复制几乎快如列表副本。

In Python 2.7, the new way of removing duplicates from an iterable while keeping it in the original order is:

>>> from collections import OrderedDict
>>> list(OrderedDict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

In Python 3.5, the OrderedDict has a C implementation. My timings show that this is now both the fastest and shortest of the various approaches for Python 3.5.

In Python 3.6, the regular dict became both ordered and compact. (This feature is holds for CPython and PyPy but may not present in other implementations). That gives us a new fastest way of deduping while retaining order:

>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

In Python 3.7, the regular dict is guaranteed to both ordered across all implementations. So, the shortest and fastest solution is:

>>> list(dict.fromkeys('abracadabra'))
['a', 'b', 'r', 'c', 'd']

Response to @max: Once you move to 3.6 or 3.7 and use the regular dict instead of OrderedDict, you can’t really beat the performance in any other way. The dictionary is dense and readily converts to a list with almost no overhead. The target list is pre-sized to len(d) which saves all the resizes that occur in a list comprehension. Also, since the internal key list is dense, copying the pointers is about almost fast as a list copy.


回答 3

sequence = ['1', '2', '3', '3', '6', '4', '5', '6']
unique = []
[unique.append(item) for item in sequence if item not in unique]

独特→ ['1', '2', '3', '6', '4', '5']

sequence = ['1', '2', '3', '3', '6', '4', '5', '6']
unique = []
[unique.append(item) for item in sequence if item not in unique]

unique → ['1', '2', '3', '6', '4', '5']


回答 4

不要踢死马(这个问题很老了,已经有很多好的答案了),但是这里有一种使用熊猫的解决方案,在很多情况下都非常快,而且使用起来很简单。

import pandas as pd

my_list = [0, 1, 2, 3, 4, 1, 2, 3, 5]

>>> pd.Series(my_list).drop_duplicates().tolist()
# Output:
# [0, 1, 2, 3, 4, 5]

Not to kick a dead horse (this question is very old and already has lots of good answers), but here is a solution using pandas that is quite fast in many circumstances and is dead simple to use.

import pandas as pd

my_list = [0, 1, 2, 3, 4, 1, 2, 3, 5]

>>> pd.Series(my_list).drop_duplicates().tolist()
# Output:
# [0, 1, 2, 3, 4, 5]

回答 5

from itertools import groupby
[ key for key,_ in groupby(sortedList)]

该列表甚至不必排序,充分的条件是将相等的值分组在一起。

编辑:我认为“保留顺序”意味着该列表实际上是有序的。如果不是这种情况,那么MizardX的解决方案就是正确的解决方案。

社区编辑:但是,这是“将重复的连续元素压缩为单个元素”的最优雅的方法。

from itertools import groupby
[ key for key,_ in groupby(sortedList)]

The list doesn’t even have to be sorted, the sufficient condition is that equal values are grouped together.

Edit: I assumed that “preserving order” implies that the list is actually ordered. If this is not the case, then the solution from MizardX is the right one.

Community edit: This is however the most elegant way to “compress duplicate consecutive elements into a single element”.


回答 6

我想如果您想维持订单,

您可以尝试以下方法:

list1 = ['b','c','d','b','c','a','a']    
list2 = list(set(list1))    
list2.sort(key=list1.index)    
print list2

或者类似地,您可以执行以下操作:

list1 = ['b','c','d','b','c','a','a']  
list2 = sorted(set(list1),key=list1.index)  
print list2 

您也可以这样做:

list1 = ['b','c','d','b','c','a','a']    
list2 = []    
for i in list1:    
    if not i in list2:  
        list2.append(i)`    
print list2

也可以这样写:

list1 = ['b','c','d','b','c','a','a']    
list2 = []    
[list2.append(i) for i in list1 if not i in list2]    
print list2 

I think if you wanna maintain the order,

you can try this:

list1 = ['b','c','d','b','c','a','a']    
list2 = list(set(list1))    
list2.sort(key=list1.index)    
print list2

OR similarly you can do this:

list1 = ['b','c','d','b','c','a','a']  
list2 = sorted(set(list1),key=list1.index)  
print list2 

You can also do this:

list1 = ['b','c','d','b','c','a','a']    
list2 = []    
for i in list1:    
    if not i in list2:  
        list2.append(i)`    
print list2

It can also be written as this:

list1 = ['b','c','d','b','c','a','a']    
list2 = []    
[list2.append(i) for i in list1 if not i in list2]    
print list2 

回答 7

Python 3.7及更高版本中,保证字典记住其键插入顺序。这个问题的答案总结了当前的状况。

OrderedDict因此,该解决方案变得过时了,没有任何导入语句,我们可以简单地发出:

>>> lst = [1, 2, 1, 3, 3, 2, 4]
>>> list(dict.fromkeys(lst))
[1, 2, 3, 4]

In Python 3.7 and above, dictionaries are guaranteed to remember their key insertion order. The answer to this question summarizes the current state of affairs.

The OrderedDict solution thus becomes obsolete and without any import statements we can simply issue:

>>> lst = [1, 2, 1, 3, 3, 2, 4]
>>> list(dict.fromkeys(lst))
[1, 2, 3, 4]

回答 8

对于另一个非常老的问题的另一个非常晚的答案:

itertools食谱有做到这一点,利用函数seen集技术,但是:

  • 处理标准key功能。
  • 不使用不雅观的骇客。
  • 通过预绑定seen.add而不是查找N次来优化循环。(f7也可以这样做,但某些版本则不能。)
  • 通过使用来优化循环ifilterfalse,因此您只需要遍历Python中的唯一元素,而不是所有元素。(ifilterfalse当然,您仍然可以在内部遍历所有这些对象,但这是在C中,而且速度更快。)

实际上比它快f7吗?这取决于您的数据,因此您必须对其进行测试并查看。如果您最后想要一个列表,请f7使用listcomp,在这里没有办法做到这一点。(您可以直接append代替yielding,也可以将生成器输入到list函数中,但没有一个可以比listcomp内的LIST_APPEND快。)无论如何,通常挤出几微秒的时间不会像之所以重要,是因为它具有易于理解,可重用,已编写的功能,在您要装饰时不需要DSU。

与所有食谱一样,它也可以在中找到more-iterools

如果只希望没有key条件,可以将其简化为:

def unique(iterable):
    seen = set()
    seen_add = seen.add
    for element in itertools.ifilterfalse(seen.__contains__, iterable):
        seen_add(element)
        yield element

For another very late answer to another very old question:

The itertools recipes have a function that does this, using the seen set technique, but:

  • Handles a standard key function.
  • Uses no unseemly hacks.
  • Optimizes the loop by pre-binding seen.add instead of looking it up N times. (f7 also does this, but some versions don’t.)
  • Optimizes the loop by using ifilterfalse, so you only have to loop over the unique elements in Python, instead of all of them. (You still iterate over all of them inside ifilterfalse, of course, but that’s in C, and much faster.)

Is it actually faster than f7? It depends on your data, so you’ll have to test it and see. If you want a list in the end, f7 uses a listcomp, and there’s no way to do that here. (You can directly append instead of yielding, or you can feed the generator into the list function, but neither one can be as fast as the LIST_APPEND inside a listcomp.) At any rate, usually, squeezing out a few microseconds is not going to be as important as having an easily-understandable, reusable, already-written function that doesn’t require DSU when you want to decorate.

As with all of the recipes, it’s also available in more-iterools.

If you just want the no-key case, you can simplify it as:

def unique(iterable):
    seen = set()
    seen_add = seen.add
    for element in itertools.ifilterfalse(seen.__contains__, iterable):
        seen_add(element)
        yield element

回答 9

刚添加另一个(非常高性能的)实现的这样的功能从外部模块1iteration_utilities.unique_everseen

>>> from iteration_utilities import unique_everseen
>>> lst = [1,1,1,2,3,2,2,2,1,3,4]

>>> list(unique_everseen(lst))
[1, 2, 3, 4]

时机

我做了一些计时(Python的3.6),这些表明,它比我测试的所有其他办法,包括更快OrderedDict.fromkeysf7并且more_itertools.unique_everseen

%matplotlib notebook

from iteration_utilities import unique_everseen
from collections import OrderedDict
from more_itertools import unique_everseen as mi_unique_everseen

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

def iteration_utilities_unique_everseen(seq):
    return list(unique_everseen(seq))

def more_itertools_unique_everseen(seq):
    return list(mi_unique_everseen(seq))

def odict(seq):
    return list(OrderedDict.fromkeys(seq))

from simple_benchmark import benchmark

b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
              {2**i: list(range(2**i)) for i in range(1, 20)},
              'list size (no duplicates)')
b.plot()

在此处输入图片说明

并确保我也进行了更多重复的测试,以检查是否有所不同:

import random

b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
              {2**i: [random.randint(0, 2**(i-1)) for _ in range(2**i)] for i in range(1, 20)},
              'list size (lots of duplicates)')
b.plot()

在此处输入图片说明

还有一个仅包含一个值:

b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
              {2**i: [1]*(2**i) for i in range(1, 20)},
              'list size (only duplicates)')
b.plot()

在此处输入图片说明

在所有这些情况下,该iteration_utilities.unique_everseen功能都是最快的(在我的计算机上)。


iteration_utilities.unique_everseen函数还可以处理输入中不可散列的值(但是使用O(n*n)性能而不是O(n)可散列值时的性能)。

>>> lst = [{1}, {1}, {2}, {1}, {3}]

>>> list(unique_everseen(lst))
[{1}, {2}, {3}]

1免责声明:我是该软件包的作者。

Just to add another (very performant) implementation of such a functionality from an external module1: iteration_utilities.unique_everseen:

>>> from iteration_utilities import unique_everseen
>>> lst = [1,1,1,2,3,2,2,2,1,3,4]

>>> list(unique_everseen(lst))
[1, 2, 3, 4]

Timings

I did some timings (Python 3.6) and these show that it’s faster than all other alternatives I tested, including OrderedDict.fromkeys, f7 and more_itertools.unique_everseen:

%matplotlib notebook

from iteration_utilities import unique_everseen
from collections import OrderedDict
from more_itertools import unique_everseen as mi_unique_everseen

def f7(seq):
    seen = set()
    seen_add = seen.add
    return [x for x in seq if not (x in seen or seen_add(x))]

def iteration_utilities_unique_everseen(seq):
    return list(unique_everseen(seq))

def more_itertools_unique_everseen(seq):
    return list(mi_unique_everseen(seq))

def odict(seq):
    return list(OrderedDict.fromkeys(seq))

from simple_benchmark import benchmark

b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
              {2**i: list(range(2**i)) for i in range(1, 20)},
              'list size (no duplicates)')
b.plot()

enter image description here

And just to make sure I also did a test with more duplicates just to check if it makes a difference:

import random

b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
              {2**i: [random.randint(0, 2**(i-1)) for _ in range(2**i)] for i in range(1, 20)},
              'list size (lots of duplicates)')
b.plot()

enter image description here

And one containing only one value:

b = benchmark([f7, iteration_utilities_unique_everseen, more_itertools_unique_everseen, odict],
              {2**i: [1]*(2**i) for i in range(1, 20)},
              'list size (only duplicates)')
b.plot()

enter image description here

In all of these cases the iteration_utilities.unique_everseen function is the fastest (on my computer).


This iteration_utilities.unique_everseen function can also handle unhashable values in the input (however with an O(n*n) performance instead of the O(n) performance when the values are hashable).

>>> lst = [{1}, {1}, {2}, {1}, {3}]

>>> list(unique_everseen(lst))
[{1}, {2}, {3}]

1 Disclaimer: I’m the author of that package.


回答 10

对于没有可散列的类型(例如列表列表),基于MizardX:

def f7_noHash(seq)
    seen = set()
    return [ x for x in seq if str( x ) not in seen and not seen.add( str( x ) )]

For no hashable types (e.g. list of lists), based on MizardX’s:

def f7_noHash(seq)
    seen = set()
    return [ x for x in seq if str( x ) not in seen and not seen.add( str( x ) )]

回答 11

借用在定义Haskell nub函数的列表时使用的递归思想,这将是一种递归方法:

def unique(lst):
    return [] if lst==[] else [lst[0]] + unique(filter(lambda x: x!= lst[0], lst[1:]))

例如:

In [118]: unique([1,5,1,1,4,3,4])
Out[118]: [1, 5, 4, 3]

我尝试使用它来增加数据量,并看到了次线性时间复杂度(不确定,但建议对于正常数据应该很好)。

In [122]: %timeit unique(np.random.randint(5, size=(1)))
10000 loops, best of 3: 25.3 us per loop

In [123]: %timeit unique(np.random.randint(5, size=(10)))
10000 loops, best of 3: 42.9 us per loop

In [124]: %timeit unique(np.random.randint(5, size=(100)))
10000 loops, best of 3: 132 us per loop

In [125]: %timeit unique(np.random.randint(5, size=(1000)))
1000 loops, best of 3: 1.05 ms per loop

In [126]: %timeit unique(np.random.randint(5, size=(10000)))
100 loops, best of 3: 11 ms per loop

我也认为很有趣的是,其他操作可以很容易地将其推广到唯一性。像这样:

import operator
def unique(lst, cmp_op=operator.ne):
    return [] if lst==[] else [lst[0]] + unique(filter(lambda x: cmp_op(x, lst[0]), lst[1:]), cmp_op)

例如,您可以传入一个函数,该函数使用舍入的概念将其舍入为相同的整数,就好像它是“相等”是出于唯一性目的一样,如下所示:

def test_round(x,y):
    return round(x) != round(y)

然后unique(some_list,test_round)将提供列表的唯一元素,其中唯一性不再意味着传统的相等性(这是通过使用任何基于集合或基于dict-key的方法来解决的),而是意味着对于每个可能舍入到的整数K,只有第一个舍入到K的元素,例如:

In [6]: unique([1.2, 5, 1.9, 1.1, 4.2, 3, 4.8], test_round)
Out[6]: [1.2, 5, 1.9, 4.2, 3]

Borrowing the recursive idea used in definining Haskell’s nub function for lists, this would be a recursive approach:

def unique(lst):
    return [] if lst==[] else [lst[0]] + unique(filter(lambda x: x!= lst[0], lst[1:]))

e.g.:

In [118]: unique([1,5,1,1,4,3,4])
Out[118]: [1, 5, 4, 3]

I tried it for growing data sizes and saw sub-linear time-complexity (not definitive, but suggests this should be fine for normal data).

In [122]: %timeit unique(np.random.randint(5, size=(1)))
10000 loops, best of 3: 25.3 us per loop

In [123]: %timeit unique(np.random.randint(5, size=(10)))
10000 loops, best of 3: 42.9 us per loop

In [124]: %timeit unique(np.random.randint(5, size=(100)))
10000 loops, best of 3: 132 us per loop

In [125]: %timeit unique(np.random.randint(5, size=(1000)))
1000 loops, best of 3: 1.05 ms per loop

In [126]: %timeit unique(np.random.randint(5, size=(10000)))
100 loops, best of 3: 11 ms per loop

I also think it’s interesting that this could be readily generalized to uniqueness by other operations. Like this:

import operator
def unique(lst, cmp_op=operator.ne):
    return [] if lst==[] else [lst[0]] + unique(filter(lambda x: cmp_op(x, lst[0]), lst[1:]), cmp_op)

For example, you could pass in a function that uses the notion of rounding to the same integer as if it was “equality” for uniqueness purposes, like this:

def test_round(x,y):
    return round(x) != round(y)

then unique(some_list, test_round) would provide the unique elements of the list where uniqueness no longer meant traditional equality (which is implied by using any sort of set-based or dict-key-based approach to this problem) but instead meant to take only the first element that rounds to K for each possible integer K that the elements might round to, e.g.:

In [6]: unique([1.2, 5, 1.9, 1.1, 4.2, 3, 4.8], test_round)
Out[6]: [1.2, 5, 1.9, 4.2, 3]

回答 12

5倍速减少变体但更复杂

>>> l = [5, 6, 6, 1, 1, 2, 2, 3, 4]
>>> reduce(lambda r, v: v in r[1] and r or (r[0].append(v) or r[1].add(v)) or r, l, ([], set()))[0]
[5, 6, 1, 2, 3, 4]

说明:

default = (list(), set())
# use list to keep order
# use set to make lookup faster

def reducer(result, item):
    if item not in result[1]:
        result[0].append(item)
        result[1].add(item)
    return result

>>> reduce(reducer, l, default)[0]
[5, 6, 1, 2, 3, 4]

5 x faster reduce variant but more sophisticated

>>> l = [5, 6, 6, 1, 1, 2, 2, 3, 4]
>>> reduce(lambda r, v: v in r[1] and r or (r[0].append(v) or r[1].add(v)) or r, l, ([], set()))[0]
[5, 6, 1, 2, 3, 4]

Explanation:

default = (list(), set())
# use list to keep order
# use set to make lookup faster

def reducer(result, item):
    if item not in result[1]:
        result[0].append(item)
        result[1].add(item)
    return result

>>> reduce(reducer, l, default)[0]
[5, 6, 1, 2, 3, 4]

回答 13

您可以引用列表理解,因为它是由符号“ _ [1]”构建的。
例如,以下函数通过引用元素列表理解来唯一化元素列表,而不更改其顺序。

def unique(my_list): 
    return [x for x in my_list if x not in locals()['_[1]']]

演示:

l1 = [1, 2, 3, 4, 1, 2, 3, 4, 5]
l2 = [x for x in l1 if x not in locals()['_[1]']]
print l2

输出:

[1, 2, 3, 4, 5]

You can reference a list comprehension as it is being built by the symbol ‘_[1]’.
For example, the following function unique-ifies a list of elements without changing their order by referencing its list comprehension.

def unique(my_list): 
    return [x for x in my_list if x not in locals()['_[1]']]

Demo:

l1 = [1, 2, 3, 4, 1, 2, 3, 4, 5]
l2 = [x for x in l1 if x not in locals()['_[1]']]
print l2

Output:

[1, 2, 3, 4, 5]

回答 14

MizardX的答案很好地总结了多种方法。

这是我在大声思考时想到的:

mylist = [x for i,x in enumerate(mylist) if x not in mylist[i+1:]]

MizardX’s answer gives a good collection of multiple approaches.

This is what I came up with while thinking aloud:

mylist = [x for i,x in enumerate(mylist) if x not in mylist[i+1:]]

回答 15

这是一种简单的方法:

list1 = ["hello", " ", "w", "o", "r", "l", "d"]
sorted(set(list1 ), key=lambda x:list1.index(x))

给出输出:

["hello", " ", "w", "o", "r", "l", "d"]

here is a simple way to do it:

list1 = ["hello", " ", "w", "o", "r", "l", "d"]
sorted(set(list1 ), key=lambda x:list1.index(x))

that gives the output:

["hello", " ", "w", "o", "r", "l", "d"]

回答 16

您可以进行某种丑陋的列表理解技巧。

[l[i] for i in range(len(l)) if l.index(l[i]) == i]

You could do a sort of ugly list comprehension hack.

[l[i] for i in range(len(l)) if l.index(l[i]) == i]

回答 17

相对有效_sorted_numpy数组方法:

b = np.array([1,3,3, 8, 12, 12,12])    
numpy.hstack([b[0], [x[0] for x in zip(b[1:], b[:-1]) if x[0]!=x[1]]])

输出:

array([ 1,  3,  8, 12])

Relatively effective approach with _sorted_ a numpy arrays:

b = np.array([1,3,3, 8, 12, 12,12])    
numpy.hstack([b[0], [x[0] for x in zip(b[1:], b[:-1]) if x[0]!=x[1]]])

Outputs:

array([ 1,  3,  8, 12])

回答 18

l = [1,2,2,3,3,...]
n = []
n.extend(ele for ele in l if ele not in set(n))

生成器表达式使用集合的O(1)查找来确定是否在新列表中包括元素。

l = [1,2,2,3,3,...]
n = []
n.extend(ele for ele in l if ele not in set(n))

A generator expression that uses the O(1) look up of a set to determine whether or not to include an element in the new list.


回答 19

一个简单的递归解决方案:

def uniquefy_list(a):
    return uniquefy_list(a[1:]) if a[0] in a[1:] else [a[0]]+uniquefy_list(a[1:]) if len(a)>1 else [a[0]]

A simple recursive solution:

def uniquefy_list(a):
    return uniquefy_list(a[1:]) if a[0] in a[1:] else [a[0]]+uniquefy_list(a[1:]) if len(a)>1 else [a[0]]

回答 20

消除序列中的重复值,但保留其余项目的顺序。使用通用发生器功能。

# for hashable sequence
def remove_duplicates(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)

a = [1, 5, 2, 1, 9, 1, 5, 10]
list(remove_duplicates(a))
# [1, 5, 2, 9, 10]



# for unhashable sequence
def remove_duplicates(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

a = [ {'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
list(remove_duplicates(a, key=lambda d: (d['x'],d['y'])))
# [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]

Eliminating the duplicate values in a sequence, but preserve the order of the remaining items. Use of general purpose generator function.

# for hashable sequence
def remove_duplicates(items):
    seen = set()
    for item in items:
        if item not in seen:
            yield item
            seen.add(item)

a = [1, 5, 2, 1, 9, 1, 5, 10]
list(remove_duplicates(a))
# [1, 5, 2, 9, 10]



# for unhashable sequence
def remove_duplicates(items, key=None):
    seen = set()
    for item in items:
        val = item if key is None else key(item)
        if val not in seen:
            yield item
            seen.add(val)

a = [ {'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 1, 'y': 2}, {'x': 2, 'y': 4}]
list(remove_duplicates(a, key=lambda d: (d['x'],d['y'])))
# [{'x': 1, 'y': 2}, {'x': 1, 'y': 3}, {'x': 2, 'y': 4}]

回答 21

如果您需要一支班轮,那么这可能会有所帮助:

reduce(lambda x, y: x + y if y[0] not in x else x, map(lambda x: [x],lst))

…应该可以,但是如果我错了,请纠正我

If you need one liner then maybe this would help:

reduce(lambda x, y: x + y if y[0] not in x else x, map(lambda x: [x],lst))

… should work but correct me if i’m wrong


回答 22

如果您经常使用pandas,并且美观优先于性能,那么请考虑内置功能pandas.Series.drop_duplicates

    import pandas as pd
    import numpy as np

    uniquifier = lambda alist: pd.Series(alist).drop_duplicates().tolist()

    # from the chosen answer 
    def f7(seq):
        seen = set()
        seen_add = seen.add
        return [ x for x in seq if not (x in seen or seen_add(x))]

    alist = np.random.randint(low=0, high=1000, size=10000).tolist()

    print uniquifier(alist) == f7(alist)  # True

定时:

    In [104]: %timeit f7(alist)
    1000 loops, best of 3: 1.3 ms per loop
    In [110]: %timeit uniquifier(alist)
    100 loops, best of 3: 4.39 ms per loop

If you routinely use pandas, and aesthetics is preferred over performance, then consider the built-in function pandas.Series.drop_duplicates:

    import pandas as pd
    import numpy as np

    uniquifier = lambda alist: pd.Series(alist).drop_duplicates().tolist()

    # from the chosen answer 
    def f7(seq):
        seen = set()
        seen_add = seen.add
        return [ x for x in seq if not (x in seen or seen_add(x))]

    alist = np.random.randint(low=0, high=1000, size=10000).tolist()

    print uniquifier(alist) == f7(alist)  # True

Timing:

    In [104]: %timeit f7(alist)
    1000 loops, best of 3: 1.3 ms per loop
    In [110]: %timeit uniquifier(alist)
    100 loops, best of 3: 4.39 ms per loop

回答 23

这将保留订单并在O(n)时间运行。基本上,这个想法是在发现重复的地方创建一个孔,并将其下沉到底部。利用读写指针。每当发现重复项时,只有读指针前进,写指针停留在重复项上以覆盖它。

def deduplicate(l):
    count = {}
    (read,write) = (0,0)
    while read < len(l):
        if l[read] in count:
            read += 1
            continue
        count[l[read]] = True
        l[write] = l[read]
        read += 1
        write += 1
    return l[0:write]

this will preserve order and run in O(n) time. basically the idea is to create a hole wherever there is a duplicate found and sink it down to the bottom. makes use of a read and write pointer. whenever a duplicate is found only the read pointer advances and write pointer stays on the duplicate entry to overwrite it.

def deduplicate(l):
    count = {}
    (read,write) = (0,0)
    while read < len(l):
        if l[read] in count:
            read += 1
            continue
        count[l[read]] = True
        l[write] = l[read]
        read += 1
        write += 1
    return l[0:write]

回答 24

不使用导入的模块或集的解决方案:

text = "ask not what your country can do for you ask what you can do for your country"
sentence = text.split(" ")
noduplicates = [(sentence[i]) for i in range (0,len(sentence)) if sentence[i] not in sentence[:i]]
print(noduplicates)

给出输出:

['ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you']

A solution without using imported modules or sets:

text = "ask not what your country can do for you ask what you can do for your country"
sentence = text.split(" ")
noduplicates = [(sentence[i]) for i in range (0,len(sentence)) if sentence[i] not in sentence[:i]]
print(noduplicates)

Gives output:

['ask', 'not', 'what', 'your', 'country', 'can', 'do', 'for', 'you']

回答 25

就地方法

此方法是二次方的,因为我们对列表的每个元素都有一个线性查找列表(由于存在以下原因,我们必须增加重新排列列表的成本) del)。

就是说,如果我们从列表的末尾开始,朝着原点前进,删除子列表左侧的每个术语,就有可能进行适当的操作

代码中的这个想法很简单

for i in range(len(l)-1,0,-1): 
    if l[i] in l[:i]: del l[i] 

实施的简单测试

In [91]: from random import randint, seed                                                                                            
In [92]: seed('20080808') ; l = [randint(1,6) for _ in range(12)] # Beijing Olympics                                                                 
In [93]: for i in range(len(l)-1,0,-1): 
    ...:     print(l) 
    ...:     print(i, l[i], l[:i], end='') 
    ...:     if l[i] in l[:i]: 
    ...:          print( ': remove', l[i]) 
    ...:          del l[i] 
    ...:     else: 
    ...:          print() 
    ...: print(l)
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5, 2]
11 2 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]: remove 2
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]
10 5 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4]: remove 5
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4]
9 4 [6, 5, 1, 4, 6, 1, 6, 2, 2]: remove 4
[6, 5, 1, 4, 6, 1, 6, 2, 2]
8 2 [6, 5, 1, 4, 6, 1, 6, 2]: remove 2
[6, 5, 1, 4, 6, 1, 6, 2]
7 2 [6, 5, 1, 4, 6, 1, 6]
[6, 5, 1, 4, 6, 1, 6, 2]
6 6 [6, 5, 1, 4, 6, 1]: remove 6
[6, 5, 1, 4, 6, 1, 2]
5 1 [6, 5, 1, 4, 6]: remove 1
[6, 5, 1, 4, 6, 2]
4 6 [6, 5, 1, 4]: remove 6
[6, 5, 1, 4, 2]
3 4 [6, 5, 1]
[6, 5, 1, 4, 2]
2 1 [6, 5]
[6, 5, 1, 4, 2]
1 5 [6]
[6, 5, 1, 4, 2]

In [94]:                                                                                                                             

An in-place method

This method is quadratic, because we have a linear lookup into the list for every element of the list (to that we have to add the cost of rearranging the list because of the del s).

That said, it is possible to operate in place if we start from the end of the list and proceed toward the origin removing each term that is present in the sub-list at its left

This idea in code is simply

for i in range(len(l)-1,0,-1): 
    if l[i] in l[:i]: del l[i] 

A simple test of the implementation

In [91]: from random import randint, seed                                                                                            
In [92]: seed('20080808') ; l = [randint(1,6) for _ in range(12)] # Beijing Olympics                                                                 
In [93]: for i in range(len(l)-1,0,-1): 
    ...:     print(l) 
    ...:     print(i, l[i], l[:i], end='') 
    ...:     if l[i] in l[:i]: 
    ...:          print( ': remove', l[i]) 
    ...:          del l[i] 
    ...:     else: 
    ...:          print() 
    ...: print(l)
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5, 2]
11 2 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]: remove 2
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4, 5]
10 5 [6, 5, 1, 4, 6, 1, 6, 2, 2, 4]: remove 5
[6, 5, 1, 4, 6, 1, 6, 2, 2, 4]
9 4 [6, 5, 1, 4, 6, 1, 6, 2, 2]: remove 4
[6, 5, 1, 4, 6, 1, 6, 2, 2]
8 2 [6, 5, 1, 4, 6, 1, 6, 2]: remove 2
[6, 5, 1, 4, 6, 1, 6, 2]
7 2 [6, 5, 1, 4, 6, 1, 6]
[6, 5, 1, 4, 6, 1, 6, 2]
6 6 [6, 5, 1, 4, 6, 1]: remove 6
[6, 5, 1, 4, 6, 1, 2]
5 1 [6, 5, 1, 4, 6]: remove 1
[6, 5, 1, 4, 6, 2]
4 6 [6, 5, 1, 4]: remove 6
[6, 5, 1, 4, 2]
3 4 [6, 5, 1]
[6, 5, 1, 4, 2]
2 1 [6, 5]
[6, 5, 1, 4, 2]
1 5 [6]
[6, 5, 1, 4, 2]

In [94]:                                                                                                                             

回答 26

zmk的方法使用列表理解,它非常快,但自然保持顺序。对于区分大小写的字符串,可以轻松对其进行修改。这也保留了原始情况。

def DelDupes(aseq) :
    seen = set()
    return [x for x in aseq if (x.lower() not in seen) and (not seen.add(x.lower()))]

紧密相关的功能是:

def HasDupes(aseq) :
    s = set()
    return any(((x.lower() in s) or s.add(x.lower())) for x in aseq)

def GetDupes(aseq) :
    s = set()
    return set(x for x in aseq if ((x.lower() in s) or s.add(x.lower())))

zmk’s approach uses list comprehension which is very fast, yet keeps the order naturally. For applying to case sensitive strings it can be easily modified. This also preserves the original case.

def DelDupes(aseq) :
    seen = set()
    return [x for x in aseq if (x.lower() not in seen) and (not seen.add(x.lower()))]

Closely associated functions are:

def HasDupes(aseq) :
    s = set()
    return any(((x.lower() in s) or s.add(x.lower())) for x in aseq)

def GetDupes(aseq) :
    s = set()
    return set(x for x in aseq if ((x.lower() in s) or s.add(x.lower())))

回答 27

熊猫用户应退房pandas.unique

>>> import pandas as pd
>>> lst = [1, 2, 1, 3, 3, 2, 4]
>>> pd.unique(lst)
array([1, 2, 3, 4])

该函数返回一个NumPy数组。如果需要,可以使用tolist方法将其转换为列表。

pandas users should check out pandas.unique.

>>> import pandas as pd
>>> lst = [1, 2, 1, 3, 3, 2, 4]
>>> pd.unique(lst)
array([1, 2, 3, 4])

The function returns a NumPy array. If needed, you can convert it to a list with the tolist method.


回答 28

一种班轮清单理解:

values_non_duplicated = [value for index, value in enumerate(values) if value not in values[ : index]]

只需添加一个条件以检查该不在先前位置

One liner list comprehension:

values_non_duplicated = [value for index, value in enumerate(values) if value not in values[ : index]]

Simply add a conditional to check that value is not on a previous position


将字典的字符串表示形式转换为字典?

问题:将字典的字符串表示形式转换为字典?

如何将a的str表示形式(dict例如以下字符串)转换为a dict

s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"

我宁愿不使用eval。我还能使用什么?

这样做的主要原因是他写的我的同事类之一,将所有输入都转换为字符串。我不打算去修改他的类,以解决这个问题。

How can I convert the str representation of a dict, such as the following string, into a dict?

s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"

I prefer not to use eval. What else can I use?

The main reason for this, is one of my coworkers classes he wrote, converts all input into strings. I’m not in the mood to go and modify his classes, to deal with this issue.


回答 0

从Python 2.6开始,您可以使用内置的ast.literal_eval

>>> import ast
>>> ast.literal_eval("{'muffin' : 'lolz', 'foo' : 'kitty'}")
{'muffin': 'lolz', 'foo': 'kitty'}

这比使用更为安全eval。正如其文档所说:

>>>帮助(ast.literal_eval)
帮助ast模块中的literal_eval函数:

literal_eval(node_or_string)
    安全地评估表达式节点或包含Python的字符串
    表达。提供的字符串或节点只能由以下内容组成
    Python文字结构:字符串,数字,元组,列表,字典,布尔值,
    和没有。

例如:

>>> eval("shutil.rmtree('mongo')")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
  File "/opt/Python-2.6.1/lib/python2.6/shutil.py", line 208, in rmtree
    onerror(os.listdir, path, sys.exc_info())
  File "/opt/Python-2.6.1/lib/python2.6/shutil.py", line 206, in rmtree
    names = os.listdir(path)
OSError: [Errno 2] No such file or directory: 'mongo'
>>> ast.literal_eval("shutil.rmtree('mongo')")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/Python-2.6.1/lib/python2.6/ast.py", line 68, in literal_eval
    return _convert(node_or_string)
  File "/opt/Python-2.6.1/lib/python2.6/ast.py", line 67, in _convert
    raise ValueError('malformed string')
ValueError: malformed string

Starting in Python 2.6 you can use the built-in ast.literal_eval:

>>> import ast
>>> ast.literal_eval("{'muffin' : 'lolz', 'foo' : 'kitty'}")
{'muffin': 'lolz', 'foo': 'kitty'}

This is safer than using eval. As its own docs say:

>>> help(ast.literal_eval)
Help on function literal_eval in module ast:

literal_eval(node_or_string)
    Safely evaluate an expression node or a string containing a Python
    expression.  The string or node provided may only consist of the following
    Python literal structures: strings, numbers, tuples, lists, dicts, booleans,
    and None.

For example:

>>> eval("shutil.rmtree('mongo')")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<string>", line 1, in <module>
  File "/opt/Python-2.6.1/lib/python2.6/shutil.py", line 208, in rmtree
    onerror(os.listdir, path, sys.exc_info())
  File "/opt/Python-2.6.1/lib/python2.6/shutil.py", line 206, in rmtree
    names = os.listdir(path)
OSError: [Errno 2] No such file or directory: 'mongo'
>>> ast.literal_eval("shutil.rmtree('mongo')")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/Python-2.6.1/lib/python2.6/ast.py", line 68, in literal_eval
    return _convert(node_or_string)
  File "/opt/Python-2.6.1/lib/python2.6/ast.py", line 67, in _convert
    raise ValueError('malformed string')
ValueError: malformed string

回答 1

https://docs.python.org/3.8/library/json.html

JSON可以解决此问题,尽管其解码器希望在键和值周围使用双引号。如果您不介意更换骇客…

import json
s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"
json_acceptable_string = s.replace("'", "\"")
d = json.loads(json_acceptable_string)
# d = {u'muffin': u'lolz', u'foo': u'kitty'}

请注意,如果将单引号作为键或值的一部分,则由于字符替换不当而导致此操作失败。仅当您对评估解决方案强烈反对时,才建议使用此解决方案。

有关JSON单引号的更多信息:jQuery.parseJSON由于JSON中的单引号已转义而引发“无效JSON”错误

https://docs.python.org/3.8/library/json.html

JSON can solve this problem though its decoder wants double quotes around keys and values. If you don’t mind a replace hack…

import json
s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"
json_acceptable_string = s.replace("'", "\"")
d = json.loads(json_acceptable_string)
# d = {u'muffin': u'lolz', u'foo': u'kitty'}

NOTE that if you have single quotes as a part of your keys or values this will fail due to improper character replacement. This solution is only recommended if you have a strong aversion to the eval solution.

More about json single quote: jQuery.parseJSON throws “Invalid JSON” error due to escaped single quote in JSON


回答 2

使用json.loads

>>> import json
>>> h = '{"foo":"bar", "foo2":"bar2"}'
>>> d = json.loads(h)
>>> d
{u'foo': u'bar', u'foo2': u'bar2'}
>>> type(d)
<type 'dict'>

using json.loads:

>>> import json
>>> h = '{"foo":"bar", "foo2":"bar2"}'
>>> d = json.loads(h)
>>> d
{u'foo': u'bar', u'foo2': u'bar2'}
>>> type(d)
<type 'dict'>

回答 3

以OP为例:

s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"

我们可以使用Yaml处理字符串中的这种非标准json:

>>> import yaml
>>> s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"
>>> s
"{'muffin' : 'lolz', 'foo' : 'kitty'}"
>>> yaml.load(s)
{'muffin': 'lolz', 'foo': 'kitty'}

To OP’s example:

s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"

We can use Yaml to deal with this kind of non-standard json in string:

>>> import yaml
>>> s = "{'muffin' : 'lolz', 'foo' : 'kitty'}"
>>> s
"{'muffin' : 'lolz', 'foo' : 'kitty'}"
>>> yaml.load(s)
{'muffin': 'lolz', 'foo': 'kitty'}

回答 4

如果始终可以信任该字符串,则可以使用eval(或literal_eval按建议使用;无论该字符串是什么都是安全的。)否则,您需要一个解析器。如果JSON解析器(例如simplejson)仅存储符合JSON方案的内容,则该解析器将起作用。

If the string can always be trusted, you could use eval (or use literal_eval as suggested; it’s safe no matter what the string is.) Otherwise you need a parser. A JSON parser (such as simplejson) would work if he only ever stores content that fits with the JSON scheme.


回答 5

使用json。该ast库消耗大量内存,并且速度较慢。我有一个过程需要读取156Mb的文本文件。Ast转换字典需要5分钟的延迟,json而使用内存减少60%则需要1分钟!

Use json. the ast library consumes a lot of memory and and slower. I have a process that needs to read a text file of 156Mb. Ast with 5 minutes delay for the conversion dictionary json and 1 minutes using 60% less memory!


回答 6

总结一下:

import ast, yaml, json, timeit

descs=['short string','long string']
strings=['{"809001":2,"848545":2,"565828":1}','{"2979":1,"30581":1,"7296":1,"127256":1,"18803":2,"41619":1,"41312":1,"16837":1,"7253":1,"70075":1,"3453":1,"4126":1,"23599":1,"11465":3,"19172":1,"4019":1,"4775":1,"64225":1,"3235":2,"15593":1,"7528":1,"176840":1,"40022":1,"152854":1,"9878":1,"16156":1,"6512":1,"4138":1,"11090":1,"12259":1,"4934":1,"65581":1,"9747":2,"18290":1,"107981":1,"459762":1,"23177":1,"23246":1,"3591":1,"3671":1,"5767":1,"3930":1,"89507":2,"19293":1,"92797":1,"32444":2,"70089":1,"46549":1,"30988":1,"4613":1,"14042":1,"26298":1,"222972":1,"2982":1,"3932":1,"11134":1,"3084":1,"6516":1,"486617":1,"14475":2,"2127":1,"51359":1,"2662":1,"4121":1,"53848":2,"552967":1,"204081":1,"5675":2,"32433":1,"92448":1}']
funcs=[json.loads,eval,ast.literal_eval,yaml.load]

for  desc,string in zip(descs,strings):
    print('***',desc,'***')
    print('')
    for  func in funcs:
        print(func.__module__+' '+func.__name__+':')
        %timeit func(string)        
    print('')

结果:

*** short string ***

json loads:
4.47 µs ± 33.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
builtins eval:
24.1 µs ± 163 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
ast literal_eval:
30.4 µs ± 299 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
yaml load:
504 µs ± 1.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

*** long string ***

json loads:
29.6 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
builtins eval:
219 µs ± 3.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
ast literal_eval:
331 µs ± 1.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
yaml load:
9.02 ms ± 92.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

结论:更喜欢json.loads

To summarize:

import ast, yaml, json, timeit

descs=['short string','long string']
strings=['{"809001":2,"848545":2,"565828":1}','{"2979":1,"30581":1,"7296":1,"127256":1,"18803":2,"41619":1,"41312":1,"16837":1,"7253":1,"70075":1,"3453":1,"4126":1,"23599":1,"11465":3,"19172":1,"4019":1,"4775":1,"64225":1,"3235":2,"15593":1,"7528":1,"176840":1,"40022":1,"152854":1,"9878":1,"16156":1,"6512":1,"4138":1,"11090":1,"12259":1,"4934":1,"65581":1,"9747":2,"18290":1,"107981":1,"459762":1,"23177":1,"23246":1,"3591":1,"3671":1,"5767":1,"3930":1,"89507":2,"19293":1,"92797":1,"32444":2,"70089":1,"46549":1,"30988":1,"4613":1,"14042":1,"26298":1,"222972":1,"2982":1,"3932":1,"11134":1,"3084":1,"6516":1,"486617":1,"14475":2,"2127":1,"51359":1,"2662":1,"4121":1,"53848":2,"552967":1,"204081":1,"5675":2,"32433":1,"92448":1}']
funcs=[json.loads,eval,ast.literal_eval,yaml.load]

for  desc,string in zip(descs,strings):
    print('***',desc,'***')
    print('')
    for  func in funcs:
        print(func.__module__+' '+func.__name__+':')
        %timeit func(string)        
    print('')

Results:

*** short string ***

json loads:
4.47 µs ± 33.4 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
builtins eval:
24.1 µs ± 163 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
ast literal_eval:
30.4 µs ± 299 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
yaml load:
504 µs ± 1.29 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

*** long string ***

json loads:
29.6 µs ± 230 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
builtins eval:
219 µs ± 3.92 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
ast literal_eval:
331 µs ± 1.89 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
yaml load:
9.02 ms ± 92.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Conclusion: prefer json.loads


回答 7

string = "{'server1':'value','server2':'value'}"

#Now removing { and }
s = string.replace("{" ,"")
finalstring = s.replace("}" , "")

#Splitting the string based on , we get key value pairs
list = finalstring.split(",")

dictionary ={}
for i in list:
    #Get Key Value pairs separately to store in dictionary
    keyvalue = i.split(":")

    #Replacing the single quotes in the leading.
    m= keyvalue[0].strip('\'')
    m = m.replace("\"", "")
    dictionary[m] = keyvalue[1].strip('"\'')

print dictionary
string = "{'server1':'value','server2':'value'}"

#Now removing { and }
s = string.replace("{" ,"")
finalstring = s.replace("}" , "")

#Splitting the string based on , we get key value pairs
list = finalstring.split(",")

dictionary ={}
for i in list:
    #Get Key Value pairs separately to store in dictionary
    keyvalue = i.split(":")

    #Replacing the single quotes in the leading.
    m= keyvalue[0].strip('\'')
    m = m.replace("\"", "")
    dictionary[m] = keyvalue[1].strip('"\'')

print dictionary

回答 8

没有使用任何库:

dict_format_string = "{'1':'one', '2' : 'two'}"
d = {}
elems  = filter(str.isalnum,dict_format_string.split("'"))
values = elems[1::2]
keys   = elems[0::2]
d.update(zip(keys,values))

注意:由于已进行硬编码,split("'")因此仅适用于“单引号”数据的字符串。

no any libs are used:

dict_format_string = "{'1':'one', '2' : 'two'}"
d = {}
elems  = filter(str.isalnum,dict_format_string.split("'"))
values = elems[1::2]
keys   = elems[0::2]
d.update(zip(keys,values))

NOTE: As it has hardcoded split("'") will work only for strings where data is “single quoted”.


找不到pg_config可执行文件

问题:找不到pg_config可执行文件

我在安装psycopg2时遇到问题。尝试执行以下操作时出现以下错误pip install psycopg2

Error: pg_config executable not found.

Please add the directory containing pg_config to the PATH

or specify the full executable path with the option:



    python setup.py build_ext --pg-config /path/to/pg_config build ...



or with the pg_config option in 'setup.cfg'.

----------------------------------------
Command python setup.py egg_info failed with error code 1 in /tmp/pip-build/psycopg2

但是问题出pg_config在我身上PATH; 它运行没有任何问题:

$ which pg_config
/usr/pgsql-9.1/bin/pg_config

我尝试将pg_config路径添加到setup.cfg文件中,并使用从其网站(http://initd.org/psycopg/)下载的源文件进行构建,然后收到以下错误消息!

Error: Unable to find 'pg_config' file in '/usr/pgsql-9.1/bin/'

但是实际上是那里!!!

这些错误使我感到困惑。有人可以帮忙吗?

顺便说一下,我sudo所有的命令。我也在使用RHEL 5.5。

I am having trouble installing psycopg2. I get the following error when I try to pip install psycopg2:

Error: pg_config executable not found.

Please add the directory containing pg_config to the PATH

or specify the full executable path with the option:



    python setup.py build_ext --pg-config /path/to/pg_config build ...



or with the pg_config option in 'setup.cfg'.

----------------------------------------
Command python setup.py egg_info failed with error code 1 in /tmp/pip-build/psycopg2

But the problem is pg_config is actually in my PATH; it runs without any problem:

$ which pg_config
/usr/pgsql-9.1/bin/pg_config

I tried adding the pg_config path to the setup.cfg file and building it using the source files I downloaded from their website (http://initd.org/psycopg/) and I get the following error message!

Error: Unable to find 'pg_config' file in '/usr/pgsql-9.1/bin/'

But it is actually THERE!!!

I am baffled by these errors. Can anyone help please?

By the way, I sudo all the commands. Also I am on RHEL 5.5.


回答 0

pg_config位于postgresql-devellibpq-devlibpq-develCentos / Cygwin / Babun的Debian / Ubuntu中。)

pg_config is in postgresql-devel (libpq-dev in Debian/Ubuntu, libpq-devel on Centos/Cygwin/Babun.)


回答 1

在Mac OS X上,我使用自制软件包管理器解决了该问题

brew install postgresql

On Mac OS X, I solved it using the homebrew package manager

brew install postgresql

回答 2

你安装好了python-dev吗?如果已经拥有,请尝试安装libpq-dev

sudo apt-get install libpq-dev python-dev

从文章:如何在virtualenv下安装psycopg2

Have you installed python-dev? If you already have, try also installing libpq-dev

sudo apt-get install libpq-dev python-dev

From the article: How to install psycopg2 under virtualenv


回答 3

同样在OSX上。从http://postgresapp.com/安装了Postgress.app,但存在相同的问题。

pg_config在该应用程序的内容中找到了目录,并将目录添加到$PATH

到了/Applications/Postgres.app/Contents/Versions/latest/bin。所以这个工作:export PATH="/Applications/Postgres.app/Contents/Versions/latest/bin:$PATH"

Also on OSX. Installed Postgress.app from http://postgresapp.com/ but had the same issue.

I found pg_config in that app’s contents and added the dir to $PATH.

It was at /Applications/Postgres.app/Contents/Versions/latest/bin. So this worked: export PATH="/Applications/Postgres.app/Contents/Versions/latest/bin:$PATH".


回答 4

在高山上,包含的库pg_configpostgresql-dev。要安装,请运行:

apk add postgresql-dev

On alpine, the library containing pg_config is postgresql-dev. To install, run:

apk add postgresql-dev

回答 5

这是在CentOS上首次安装对我有用的东西:

sudo yum install postgresql postgresql-devel python-devel

在Ubuntu上,只需使用等效的apt-get软件包。

sudo apt-get install postgresql postgresql-dev python-dev

现在在pip安装中包含postgresql二进制目录的路径,这对于基于Debain或RHEL的Linux应该都适用:

sudo PATH=$PATH:/usr/pgsql-9.3/bin/ pip install psycopg2

确保包括正确的路径。就这样 :)

This is what worked for me on CentOS, first install:

sudo yum install postgresql postgresql-devel python-devel

On Ubuntu just use the equivilent apt-get packages.

sudo apt-get install postgresql postgresql-dev python-dev

And now include the path to your postgresql binary dir with you pip install, this should work for either Debain or RHEL based Linux:

sudo PATH=$PATH:/usr/pgsql-9.3/bin/ pip install psycopg2

Make sure to include the correct path. Thats all :)


回答 6

apt-get build-dep python-psycopg2
apt-get build-dep python-psycopg2

回答 7

您应该添加在Ubuntu上的Postgres中使用的python要求。跑:

sudo apt-get install libpq-dev python-dev

You should add python requirements used in Postgres on Ubuntu. Run:

sudo apt-get install libpq-dev python-dev

回答 8

综上所述,我也面临着完全相同的问题。在阅读了很多stackoverflow帖子和在线博客之后,对我有用的最终解决方案是:

1)在安装psycopg2之前,应先安装PostgreSQL(开发版或任何稳定版本)。

2)在安装psycopg2之前,必须显式设置pg_config文件(该文件通常位于PostgreSQL安装文件夹的bin文件夹中)。就我而言,PostgreSQL的安装路径为:

/opt/local/lib/postgresql91/

因此,为了显式设置pg_config文件的PATH,我在终端中输入了以下命令:

PATH=$PATH:/opt/local/lib/postgresql91/bin/

此命令可确保当您尝试通过pip安装psycopg2时,它将自动找到pg_config的路径。

我还在我的博客上发布了有关trace及其解决方案的完整错误,您可能希望参考。它适用于Mac OS X,但是pg_config PATH问题是通用的,也适用于Linux。

Just to sum up, I also faced exactly same problem. After reading a lot of stackoverflow posts and online blogs, the final solution which worked for me is this:

1) PostgreSQL(development or any stable version) should be installed before installing psycopg2.

2) The pg_config file (this file normally resides in the bin folder of the PostgreSQL installation folder) PATH had to be explicitly setup before installing psycopg2. In my case, the installation PATH for PostgreSQL is:

/opt/local/lib/postgresql91/

so in order to explicitly set the PATH of pg_config file, I entered following command in my terminal:

PATH=$PATH:/opt/local/lib/postgresql91/bin/

This command ensures that when you try to pip install psycopg2, it would find the PATH to pg_config automatically this time.

I have also posted a full error with trace and its solution on my blog which you may want to refer. Its for Mac OS X but the pg_config PATH problem is generic and applicable to Linux also.


回答 9

sudo apt-get install libpq-dev 在Ubuntu 15.4上为我工作

sudo apt-get install libpq-dev works for me on Ubuntu 15.4


回答 10

在Linux上Mint sudo apt-get install libpq-dev为我工作。

On Linux Mint sudo apt-get install libpq-dev worked for me.


回答 11

您可以使用pip或在任何平台上安装预编译的二进制文件conda

python -m pip install psycopg2-binary

要么

conda install psycopg2

请注意,psycopg2-binary pypi页面建议在生产中从源代码构建:

二进制软件包是开发和测试的实际选择,但在生产中,建议使用从源构建的软件包

要使用从源构建的软件包,请使用python -m pip install psycopg2。该过程将需要几个依赖项(文档)(重点是我的):

  • 交流编译器
  • Python的头文件。它们通常安装在python-dev之类的软件包中。诸如错误消息:Python.h:没有这样的文件或目录表示缺少Python标头。
  • libpq的头文件。它们通常安装在libpq-dev之类的软件包中。如果出现错误:libpq-fe.h:没有此类文件或目录,您将丢失它们。
  • pg_config程序:它通常是由安装的libpq-dev的包,但有时它是不是在路径目录。将它放在PATH中可以大大简化安装,因此请尝试运行pg_config –version:如果返回错误或意外的版本号,请找到包含正确libpq版本附带的pg_config的目录(通常是/ usr / lib / postgresql / XY / bin /)并将其添加到PATH: $ export PATH=/usr/lib/postgresql/X.Y/bin/:$PATH 您仅需要pg_config来编译psycopg2,而无需常规使用。

You can install pre-compiled binaries on any platform with pip or conda:

python -m pip install psycopg2-binary

or

conda install psycopg2

Please be advised that the psycopg2-binary pypi page recommends building from source in production:

The binary package is a practical choice for development and testing but in production it is advised to use the package built from sources

To use the package built from sources, use python -m pip install psycopg2. That process will require several dependencies (documentation) (emphasis mine):

  • A C compiler.
  • The Python header files. They are usually installed in a package such as python-dev. A message such as error: Python.h: No such file or directory is an indication that the Python headers are missing.
  • The libpq header files. They are usually installed in a package such as libpq-dev. If you get an error: libpq-fe.h: No such file or directory you are missing them.
  • The pg_config program: it is usually installed by the libpq-dev package but sometimes it is not in a PATH directory. Having it in the PATH greatly streamlines the installation, so try running pg_config –version: if it returns an error or an unexpected version number then locate the directory containing the pg_config shipped with the right libpq version (usually /usr/lib/postgresql/X.Y/bin/) and add it to the PATH: $ export PATH=/usr/lib/postgresql/X.Y/bin/:$PATH You only need pg_config to compile psycopg2, not for its regular usage.

回答 12

UPDATE /etc/yum.repos.d/CentOS-Base.repo、[base]和[updates]部分
ADD exclude = postgresql *

curl -O http://yum.postgresql.org/9.1/redhat/rhel-6-i386/pgdg-centos91-9.1-4.noarch.rpmr  
rpm -ivh pgdg-centos91-9.1-4.noarch.rpm

yum install postgresql  
yum install postgresql-devel

PATH=$PATH:/usr/pgsql-9.1/bin/

pip install psycopg2

UPDATE /etc/yum.repos.d/CentOS-Base.repo, [base] and [updates] sections
ADD exclude=postgresql*

curl -O http://yum.postgresql.org/9.1/redhat/rhel-6-i386/pgdg-centos91-9.1-4.noarch.rpmr  
rpm -ivh pgdg-centos91-9.1-4.noarch.rpm

yum install postgresql  
yum install postgresql-devel

PATH=$PATH:/usr/pgsql-9.1/bin/

pip install psycopg2

回答 13

对于运行OS X的用户,此解决方案对我有用:

1)安装Postgres.app:

http://www.postgresql.org/download/macosx/

2)然后打开终端并运行以下命令,将其显示为{{version}}的位置替换为Postgres版本号:

导出PATH = $ PATH:/Applications/Postgres.app/Contents/Versions / {{version}} / bin

例如

导出PATH = $ PATH:/Applications/Postgres.app/Contents/Versions/9.4/bin

For those running OS X, this solution worked for me:

1) Install Postgres.app:

http://www.postgresql.org/download/macosx/

2) Then open the Terminal and run this command, replacing where it says {{version}} with the Postgres version number:

export PATH=$PATH:/Applications/Postgres.app/Contents/Versions/{{version}}/bin

e.g.

export PATH=$PATH:/Applications/Postgres.app/Contents/Versions/9.4/bin


回答 14

尝试将其添加到PATH:

PATH=$PATH:/usr/pgsql-9.1/bin/ ./pip install psycopg2

Try to add it to PATH:

PATH=$PATH:/usr/pgsql-9.1/bin/ ./pip install psycopg2

回答 15

Ali的解决方案对我有用,但是我在查找bin文件夹位置时遇到了麻烦。在Mac OS X上查找路径的快速方法是打开psql(顶部菜单栏中有一个快速链接)。这将打开一个单独的终端窗口,在第二行,您的Postgres安装路径将如下所示:

My-MacBook-Pro:~ Me$ /Applications/Postgres93.app/Contents/MacOS/bin/psql ; exit;

您的pg_config文件在该bin文件夹中。因此,在安装psycopg2之前,请设置pg_config文件的路径:

PATH=$PATH:/Applications/Postgres93.app/Contents/MacOS/bin/

或较新版本:

PATH=$PATH:/Applications/Postgres.app/Contents/Versions/9.3/bin

然后安装psycopg2。

Ali’s solution worked for me but I was having trouble finding the bin folder location. A quick way to find the path on Mac OS X is to open psql (there’s a quick link in the top menu bar). This will open a separate terminal window and on the second line the path of your Postgres installation will appear like so:

My-MacBook-Pro:~ Me$ /Applications/Postgres93.app/Contents/MacOS/bin/psql ; exit;

Your pg_config file is in that bin folder. Therefore, before installing psycopg2 set the path of the pg_config file:

PATH=$PATH:/Applications/Postgres93.app/Contents/MacOS/bin/

or for newer version:

PATH=$PATH:/Applications/Postgres.app/Contents/Versions/9.3/bin

Then install psycopg2.


回答 16

在安装psycopg2之前,您需要升级您的pip。使用此命令

pip install --upgrade pip

You need to upgrade your pip before installing psycopg2. Use this command

pip install --upgrade pip

回答 17

我将把这个留给下一个不幸的灵魂,尽管所有提供的解决方案都无法解决这个问题。只需使用sudo pip3 install psycopg2-binary

I’m going to leave this here for the next unfortunate soul who can’t get around this problem despite all the provided solutions. Simply use sudo pip3 install psycopg2-binary


回答 18

刚刚通过以下方法解决了Cent OS 7中的问题:

export PATH=$PATH:/usr/pgsql-9.5/bin

确保您的PostgreSql版本与上面的正确版本匹配。

Just solved the problem in Cent OS 7 by:

export PATH=$PATH:/usr/pgsql-9.5/bin

make sure your PostgreSql version matches the right version above.


回答 19

在Mac OS X上,如果您使用的是Postgres App(http://postgresapp.com/):

export PATH=$PATH:/Applications/Postgres.app/Contents/Versions/latest/bin

无需在此命令中指定Postgres的版本。它将始终指向最新。

并做

pip install psycopg2

PS:如果更改未反映出您可能需要重新启动终端/命令提示符

资源

On Mac OS X and If you are using Postgres App (http://postgresapp.com/):

export PATH=$PATH:/Applications/Postgres.app/Contents/Versions/latest/bin

No need to specify version of Postgres in this command. It will be always pointed to latest.

and do

pip install psycopg2

P.S: If Changes doesn’t reflect you may need to restart the Terminal/Command prompt

Source


回答 20

安装python-psycopg2在Arch Linux上为我解决了这个问题:

pacman -S python-psycopg2

Installing python-psycopg2 solved it for me on Arch Linux:

pacman -S python-psycopg2

回答 21

在Windows上,您可能需要安装PsycopgWindows端口,该端口psycopg的文档中建议使用。

On Windows, You may want to install the Windows port of Psycopg, which is recommended in psycopg’s documentation.


回答 22

在MacOS上,最简单的解决方案是将正确的二进制文件符号链接到Postgres软件包下。

sudo ln -s /Applications/Postgres.app/Contents/Versions/latest/bin/pg_config /usr/local/bin/pg_config

这是相当无害的,并且如果需要,所有应用程序都可以在系统范围内使用它。

On MacOS, the simplest solution will be to symlink the correct binary, that is under the Postgres package.

sudo ln -s /Applications/Postgres.app/Contents/Versions/latest/bin/pg_config /usr/local/bin/pg_config

This is fairly harmless, and all the applications will be able to use it system wide, if required.


回答 23

sudo yum安装postgresql-devel(centos6X)

点安装psycopg2 == 2.5.2

sudo yum install postgresql-devel (centos6X)

pip install psycopg2==2.5.2


回答 24

在这里,为了确保OS X的完整性:如果从MacPorts安装PostgreSQL,则pg_config将位于 /opt/local/lib/postgresql94/bin/pg_config

安装MacPorts时,它已经添加了 /opt/local/bin到PATH中。

因此,这将解决问题: $ sudo ln -s /opt/local/lib/postgresql94/bin/pg_config /opt/local/bin/pg_config

现在pip install psycopg2将可以pg_config毫无问题地运行。

Here, for OS X completeness: if you install PostgreSQL from MacPorts, pg_config will be in /opt/local/lib/postgresql94/bin/pg_config.

When you installed MacPorts, it already added /opt/local/bin to your PATH.

So, this will fix the problem: $ sudo ln -s /opt/local/lib/postgresql94/bin/pg_config /opt/local/bin/pg_config

Now pip install psycopg2 will be able to run pg_config without issues.


回答 25

对于使用zshshell的macOS Catalina 上也安装了postgres应用程序的用户

打开~/.zshrc文件,并添加以下行:

export PATH="/Applications/Postgres.app/Contents/Versions/latest/bin:$PATH"

然后关闭所有终端,重新打开它们,您将解决问题。

如果您不想关闭终端,只需输入要继续使用的终端即可source ~/.zshrc

To those on macOS Catalina using the zsh shell who have also installed the postgres app:

Open your ~/.zshrc file, and add the following line:

export PATH="/Applications/Postgres.app/Contents/Versions/latest/bin:$PATH"

Then close all your terminals, reopen them, and you’ll have resolved your problem.

If you don’t want to close your terminals, simply enter source ~/.zshrc in whatever terminal you’d like to keep working on.


回答 26

对于mac用户,扩展您的path变量以包括PostgreSQL这样的export PATH=$PATH:/Library/PostgreSQL/12/bin

For mac users, extend your path variable to include PostgreSQL like this export PATH=$PATH:/Library/PostgreSQL/12/bin.


回答 27

这是我设法安装psycopg2的方法

$ wget http://initd.org/psycopg/tarballs/PSYCOPG-2-5/psycopg2-2.5.3.tar.gz
$ tar -xzf psycopg2-2.5.3.tar.gz
$ cd psycopg2-2.5.3
$ pip install .

This is how I managed to install psycopg2

$ wget http://initd.org/psycopg/tarballs/PSYCOPG-2-5/psycopg2-2.5.3.tar.gz
$ tar -xzf psycopg2-2.5.3.tar.gz
$ cd psycopg2-2.5.3
$ pip install .

回答 28

我敢肯定,您会遇到与我相同的“问题”,因此,我将为您提供极为简单的解决方案…

在您的情况下,您需要添加到$ PATH(或作为命令参数)的实际路径是:

/usr/pgsql-9.1/bin/pg_config

/usr/pgsql-9.1/bin

例如,如果您随后运行python setup.py脚本,则可以这样运行它:

python setup.py build_ext --pg-config /usr/pgsql-9.1/bin/pg_config build

可能为时已晚,但仍然是最简单的解决方案。

之后编辑:

在进一步测试下,我发现如果您最初以以下形式将路径添加到pg_config

/usr/pgsql-9.1/bin

(在../bin之后没有/ pg_config)并运行pip install命令,它将起作用。

但是,如果您随后决定按照说明运行python setup.py,则必须在….. / bin之后使用/ pg_config指定路径,即

python setup.py build_ext --pg-config /usr/pgsql-9.1/bin/pg_config build

I am pretty sure you’ve experienced the same “problem” i did, therefore I’ll offer you the extremely easy solution…

In your case, the actual path that you need to add to $PATH (or as a command param) is:

/usr/pgsql-9.1/bin/pg_config

not

/usr/pgsql-9.1/bin

E.g. if you run the python setup.py script afterwards, you would run it like this:

python setup.py build_ext --pg-config /usr/pgsql-9.1/bin/pg_config build

Probably too late, but still the easiest solution.

LATER EDIT:

Under further test I found out that if you initially add the path to pg_config in the form

/usr/pgsql-9.1/bin

(without /pg_config after …../bin) and run the pip install command it will work.

However, if you then decide to follow the indication to run python setup.py, you will have to specify the path with /pg_config after …../bin, i.e.

python setup.py build_ext --pg-config /usr/pgsql-9.1/bin/pg_config build

回答 29

对于CentOS / RedHat,请确保这/etc/alternatives/pgsql-pg_config是一个不间断的符号链接

for CentOS/RedHat make sure that /etc/alternatives/pgsql-pg_config is a non-broken symlink


如何检查变量的类型是否为字符串?

问题:如何检查变量的类型是否为字符串?

有没有一种方法可以检查python中变量的类型是否为string,例如:

isinstance(x,int);

对于整数值?

Is there a way to check if the type of a variable in python is a string, like:

isinstance(x,int);

for integer values?


回答 0

在Python 2.x中,您可以

isinstance(s, basestring)

basestring抽象的超类strunicode。它可用于测试对象是否是str或的实例unicode


在Python 3.x中,正确的测试是

isinstance(s, str)

bytes在Python 3中,该类不被视为字符串类型。

In Python 2.x, you would do

isinstance(s, basestring)

basestring is the abstract superclass of str and unicode. It can be used to test whether an object is an instance of str or unicode.


In Python 3.x, the correct test is

isinstance(s, str)

The bytes class isn’t considered a string type in Python 3.


回答 1

我知道这是一个古老的话题,但是作为第一个显示在google上的话题,鉴于我没有找到满意的答案,因此我将其留在此处以供将来参考:

第六个是Python 2和3兼容性库,它已经解决了这个问题。然后,您可以执行以下操作:

import six

if isinstance(value, six.string_types):
    pass # It's a string !!

检查代码,您会发现:

import sys

PY3 = sys.version_info[0] == 3

if PY3:
    string_types = str,
else:
    string_types = basestring,

I know this is an old topic, but being the first one shown on google and given that I don’t find any of the answers satisfactory, I’ll leave this here for future reference:

six is a Python 2 and 3 compatibility library which already covers this issue. You can then do something like this:

import six

if isinstance(value, six.string_types):
    pass # It's a string !!

Inspecting the code, this is what you find:

import sys

PY3 = sys.version_info[0] == 3

if PY3:
    string_types = str,
else:
    string_types = basestring,

回答 2

在Python 3.x或Python 2.7.6中

if type(x) == str:

In Python 3.x or Python 2.7.6

if type(x) == str:

回答 3

你可以做:

var = 1
if type(var) == int:
   print('your variable is an integer')

要么:

var2 = 'this is variable #2'
if type(var2) == str:
    print('your variable is a string')
else:
    print('your variable IS NOT a string')

希望这可以帮助!

you can do:

var = 1
if type(var) == int:
   print('your variable is an integer')

or:

var2 = 'this is variable #2'
if type(var2) == str:
    print('your variable is a string')
else:
    print('your variable IS NOT a string')

hope this helps!


回答 4

如果要检查的内容多于整数和字符串,则类型模块也存在。 http://docs.python.org/library/types.html

The type module also exists if you are checking more than ints and strings. http://docs.python.org/library/types.html


回答 5

根据以下更好的答案进行编辑。记下3个答案,找出基弦的凉爽。

旧答案:当心unicode字符串,您可以从多个地方获得unicode字符串,包括Windows中的所有COM调用。

if isinstance(target, str) or isinstance(target, unicode):

Edit based on better answer below. Go down about 3 answers and find out about the coolness of basestring.

Old answer: Watch out for unicode strings, which you can get from several places, including all COM calls in Windows.

if isinstance(target, str) or isinstance(target, unicode):

回答 6

由于basestring未在Python3中定义,因此此小技巧可能有助于使代码兼容:

try: # check whether python knows about 'basestring'
   basestring
except NameError: # no, it doesn't (it's Python3); use 'str' instead
   basestring=str

之后,您可以在Python2和Python3上运行以下测试

isinstance(myvar, basestring)

since basestring isn’t defined in Python3, this little trick might help to make the code compatible:

try: # check whether python knows about 'basestring'
   basestring
except NameError: # no, it doesn't (it's Python3); use 'str' instead
   basestring=str

after that you can run the following test on both Python2 and Python3

isinstance(myvar, basestring)

回答 7

Python 2/3包括unicode

from __future__ import unicode_literals
from builtins import str  #  pip install future
isinstance('asdf', str)   #  True
isinstance(u'asdf', str)  #  True

http://python-future.org/overview.html

Python 2 / 3 including unicode

from __future__ import unicode_literals
from builtins import str  #  pip install future
isinstance('asdf', str)   #  True
isinstance(u'asdf', str)  #  True

http://python-future.org/overview.html


回答 8

我还要注意,如果要检查变量的类型是否为特定类型,可以将变量的类型与已知对象的类型进行比较。

对于字符串,您可以使用此

type(s) == type('')

Also I want notice that if you want to check whether the type of a variable is a specific kind, you can compare the type of the variable to the type of a known object.

For string you can use this

type(s) == type('')

回答 9

其他人在这里提供了很多好的建议,但是我看不到一个很好的跨平台摘要。对于任何Python程序来说,以下内容都是不错的选择:

def isstring(s):
    # if we use Python 3
    if (sys.version_info[0] >= 3):
        return isinstance(s, str)
    # we use Python 2
    return isinstance(s, basestring)

在此函数中,我们用于isinstance(object, classinfo)查看输入是str在Python 3中还是basestring在Python 2中。

Lots of good suggestions provided by others here, but I don’t see a good cross-platform summary. The following should be a good drop in for any Python program:

def isstring(s):
    # if we use Python 3
    if (sys.version_info[0] >= 3):
        return isinstance(s, str)
    # we use Python 2
    return isinstance(s, basestring)

In this function, we use isinstance(object, classinfo) to see if our input is a str in Python 3 or a basestring in Python 2.


回答 10

不使用basestring的Python 2替代方法:

isinstance(s, (str, unicode))

但由于unicode未定义(在Python 3中),因此在Python 3中仍然无法使用。

Alternative way for Python 2, without using basestring:

isinstance(s, (str, unicode))

But still won’t work in Python 3 since unicode isn’t defined (in Python 3).


回答 11

所以,

您可以使用很多选项来检查变量是否为字符串:

a = "my string"
type(a) == str # first 
a.__class__ == str # second
isinstance(a, str) # third
str(a) == a # forth
type(a) == type('') # fifth

此命令是有目的的。

So,

You have plenty of options to check whether your variable is string or not:

a = "my string"
type(a) == str # first 
a.__class__ == str # second
isinstance(a, str) # third
str(a) == a # forth
type(a) == type('') # fifth

This order is for purpose.


回答 12

a = '1000' # also tested for 'abc100', 'a100bc', '100abc'

isinstance(a, str) or isinstance(a, unicode)

返回True

type(a) in [str, unicode]

返回True

a = '1000' # also tested for 'abc100', 'a100bc', '100abc'

isinstance(a, str) or isinstance(a, unicode)

returns True

type(a) in [str, unicode]

returns True


回答 13

这是我对同时支持Python 2和Python 3以及这些要求的回答:

  • 用最少的Py2兼容代码以Py3代码编写。
  • 稍后删除Py2兼容代码而不会受到干扰。即仅旨在删除,不修改Py3代码。
  • 避免使用 six或类似的compat模块,因为它们倾向于隐藏试图实现的目标。
  • 面向未来的潜在Py4。

import sys
PY2 = sys.version_info.major == 2

# Check if string (lenient for byte-strings on Py2):
isinstance('abc', basestring if PY2 else str)

# Check if strictly a string (unicode-string):
isinstance('abc', unicode if PY2 else str)

# Check if either string (unicode-string) or byte-string:
isinstance('abc', basestring if PY2 else (str, bytes))

# Check for byte-string (Py3 and Py2.7):
isinstance('abc', bytes)

Here is my answer to support both Python 2 and Python 3 along with these requirements:

  • Written in Py3 code with minimal Py2 compat code.
  • Remove Py2 compat code later without disruption. I.e. aim for deletion only, no modification to Py3 code.
  • Avoid using six or similar compat module as they tend to hide away what is trying to be achieved.
  • Future-proof for a potential Py4.

import sys
PY2 = sys.version_info.major == 2

# Check if string (lenient for byte-strings on Py2):
isinstance('abc', basestring if PY2 else str)

# Check if strictly a string (unicode-string):
isinstance('abc', unicode if PY2 else str)

# Check if either string (unicode-string) or byte-string:
isinstance('abc', basestring if PY2 else (str, bytes))

# Check for byte-string (Py3 and Py2.7):
isinstance('abc', bytes)

回答 14

如果您不想依赖外部库,那么这对于Python 2.7+和Python 3(http://ideone.com/uB4Kdc)都适用:

# your code goes here
s = ["test"];
#s = "test";
isString = False;

if(isinstance(s, str)):
    isString = True;
try:
    if(isinstance(s, basestring)):
        isString = True;
except NameError:
    pass;

if(isString):
    print("String");
else:
    print("Not String");

If you do not want to depend on external libs, this works both for Python 2.7+ and Python 3 (http://ideone.com/uB4Kdc):

# your code goes here
s = ["test"];
#s = "test";
isString = False;

if(isinstance(s, str)):
    isString = True;
try:
    if(isinstance(s, basestring)):
        isString = True;
except NameError:
    pass;

if(isString):
    print("String");
else:
    print("Not String");

回答 15

您可以简单地使用isinstance函数来确保输入数据的格式为stringunicode。以下示例将帮助您轻松理解。

>>> isinstance('my string', str)
True
>>> isinstance(12, str)
False
>>> isinstance('my string', unicode)
False
>>> isinstance(u'my string',  unicode)
True

You can simply use the isinstance function to make sure that the input data is of format string or unicode. Below examples will help you to understand easily.

>>> isinstance('my string', str)
True
>>> isinstance(12, str)
False
>>> isinstance('my string', unicode)
False
>>> isinstance(u'my string',  unicode)
True

回答 16

s = '123'
issubclass(s.__class__, str)
s = '123'
issubclass(s.__class__, str)

回答 17

这是我的方法:

if type(x) == type(str()):

This is how I do it:

if type(x) == type(str()):

回答 18

我见过:

hasattr(s, 'endswith') 

I’ve seen:

hasattr(s, 'endswith') 

回答 19

>>> thing = 'foo'
>>> type(thing).__name__ == 'str' or type(thing).__name__ == 'unicode'
True
>>> thing = 'foo'
>>> type(thing).__name__ == 'str' or type(thing).__name__ == 'unicode'
True

有什么办法可以杀死线程吗?

问题:有什么办法可以杀死线程吗?

是否可以在不设置/检查任何标志/信号灯/等的情况下终止正在运行的线程?

Is it possible to terminate a running thread without setting/checking any flags/semaphores/etc.?


回答 0

用Python和任何语言突然终止线程通常是一个错误的模式。考虑以下情况:

  • 线程持有必须正确关闭的关键资源
  • 该线程创建了其他几个必须同时终止的线程。

如果您负担得起的话(如果您要管理自己的线程),处理此问题的一种好方法是有一个exit_request标志,每个线程定期检查一次,以查看是否该退出。

例如:

import threading

class StoppableThread(threading.Thread):
    """Thread class with a stop() method. The thread itself has to check
    regularly for the stopped() condition."""

    def __init__(self,  *args, **kwargs):
        super(StoppableThread, self).__init__(*args, **kwargs)
        self._stop_event = threading.Event()

    def stop(self):
        self._stop_event.set()

    def stopped(self):
        return self._stop_event.is_set()

在此代码中,您应该stop()在希望退出线程时调用该线程,然后使用来等待线程正确退出join()。线程应定期检查停止标志。

但是,在某些情况下,您确实需要杀死线程。一个示例是当您包装一个忙于长时间调用的外部库并且想要中断它时。

以下代码允许(有一些限制)在Python线程中引发Exception:

def _async_raise(tid, exctype):
    '''Raises an exception in the threads with id tid'''
    if not inspect.isclass(exctype):
        raise TypeError("Only types can be raised (not instances)")
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(tid),
                                                     ctypes.py_object(exctype))
    if res == 0:
        raise ValueError("invalid thread id")
    elif res != 1:
        # "if it returns a number greater than one, you're in trouble,
        # and you should call it again with exc=NULL to revert the effect"
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(tid), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

class ThreadWithExc(threading.Thread):
    '''A thread class that supports raising exception in the thread from
       another thread.
    '''
    def _get_my_tid(self):
        """determines this (self's) thread id

        CAREFUL : this function is executed in the context of the caller
        thread, to get the identity of the thread represented by this
        instance.
        """
        if not self.isAlive():
            raise threading.ThreadError("the thread is not active")

        # do we have it cached?
        if hasattr(self, "_thread_id"):
            return self._thread_id

        # no, look for it in the _active dict
        for tid, tobj in threading._active.items():
            if tobj is self:
                self._thread_id = tid
                return tid

        # TODO: in python 2.6, there's a simpler way to do : self.ident

        raise AssertionError("could not determine the thread's id")

    def raiseExc(self, exctype):
        """Raises the given exception type in the context of this thread.

        If the thread is busy in a system call (time.sleep(),
        socket.accept(), ...), the exception is simply ignored.

        If you are sure that your exception should terminate the thread,
        one way to ensure that it works is:

            t = ThreadWithExc( ... )
            ...
            t.raiseExc( SomeException )
            while t.isAlive():
                time.sleep( 0.1 )
                t.raiseExc( SomeException )

        If the exception is to be caught by the thread, you need a way to
        check that your thread has caught it.

        CAREFUL : this function is executed in the context of the
        caller thread, to raise an excpetion in the context of the
        thread represented by this instance.
        """
        _async_raise( self._get_my_tid(), exctype )

(基于Tomer Filiba的Killable Threads。有关return值的引用PyThreadState_SetAsyncExc似乎来自旧版本的Python。)

如文档中所述,这不是灵丹妙药,因为如果线程在Python解释器之外很忙,它将无法捕获中断。

此代码的一个很好的用法模式是让线程捕获特定的异常并执行清理。这样,您可以中断任务并仍然进行适当的清理。

It is generally a bad pattern to kill a thread abruptly, in Python and in any language. Think of the following cases:

  • the thread is holding a critical resource that must be closed properly
  • the thread has created several other threads that must be killed as well.

The nice way of handling this if you can afford it (if you are managing your own threads) is to have an exit_request flag that each threads checks on regular interval to see if it is time for it to exit.

For example:

import threading

class StoppableThread(threading.Thread):
    """Thread class with a stop() method. The thread itself has to check
    regularly for the stopped() condition."""

    def __init__(self,  *args, **kwargs):
        super(StoppableThread, self).__init__(*args, **kwargs)
        self._stop_event = threading.Event()

    def stop(self):
        self._stop_event.set()

    def stopped(self):
        return self._stop_event.is_set()

In this code, you should call stop() on the thread when you want it to exit, and wait for the thread to exit properly using join(). The thread should check the stop flag at regular intervals.

There are cases however when you really need to kill a thread. An example is when you are wrapping an external library that is busy for long calls and you want to interrupt it.

The following code allows (with some restrictions) to raise an Exception in a Python thread:

def _async_raise(tid, exctype):
    '''Raises an exception in the threads with id tid'''
    if not inspect.isclass(exctype):
        raise TypeError("Only types can be raised (not instances)")
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(tid),
                                                     ctypes.py_object(exctype))
    if res == 0:
        raise ValueError("invalid thread id")
    elif res != 1:
        # "if it returns a number greater than one, you're in trouble,
        # and you should call it again with exc=NULL to revert the effect"
        ctypes.pythonapi.PyThreadState_SetAsyncExc(ctypes.c_long(tid), None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

class ThreadWithExc(threading.Thread):
    '''A thread class that supports raising exception in the thread from
       another thread.
    '''
    def _get_my_tid(self):
        """determines this (self's) thread id

        CAREFUL : this function is executed in the context of the caller
        thread, to get the identity of the thread represented by this
        instance.
        """
        if not self.isAlive():
            raise threading.ThreadError("the thread is not active")

        # do we have it cached?
        if hasattr(self, "_thread_id"):
            return self._thread_id

        # no, look for it in the _active dict
        for tid, tobj in threading._active.items():
            if tobj is self:
                self._thread_id = tid
                return tid

        # TODO: in python 2.6, there's a simpler way to do : self.ident

        raise AssertionError("could not determine the thread's id")

    def raiseExc(self, exctype):
        """Raises the given exception type in the context of this thread.

        If the thread is busy in a system call (time.sleep(),
        socket.accept(), ...), the exception is simply ignored.

        If you are sure that your exception should terminate the thread,
        one way to ensure that it works is:

            t = ThreadWithExc( ... )
            ...
            t.raiseExc( SomeException )
            while t.isAlive():
                time.sleep( 0.1 )
                t.raiseExc( SomeException )

        If the exception is to be caught by the thread, you need a way to
        check that your thread has caught it.

        CAREFUL : this function is executed in the context of the
        caller thread, to raise an excpetion in the context of the
        thread represented by this instance.
        """
        _async_raise( self._get_my_tid(), exctype )

(Based on Killable Threads by Tomer Filiba. The quote about the return value of PyThreadState_SetAsyncExc appears to be from an old version of Python.)

As noted in the documentation, this is not a magic bullet because if the thread is busy outside the Python interpreter, it will not catch the interruption.

A good usage pattern of this code is to have the thread catch a specific exception and perform the cleanup. That way, you can interrupt a task and still have proper cleanup.


回答 1

没有官方的API可以这样做。

您需要使用平台API杀死线程,例如pthread_kill或TerminateThread。您可以通过pythonwin或ctypes访问此类API。

请注意,这本质上是不安全的。如果被杀死的线程在被杀死时具有GIL,则可能会导致无法收集的垃圾(来自成为垃圾的堆栈帧的局部变量),并可能导致死锁。

There is no official API to do that, no.

You need to use platform API to kill the thread, e.g. pthread_kill, or TerminateThread. You can access such API e.g. through pythonwin, or through ctypes.

Notice that this is inherently unsafe. It will likely lead to uncollectable garbage (from local variables of the stack frames that become garbage), and may lead to deadlocks, if the thread being killed has the GIL at the point when it is killed.


回答 2

一个multiprocessing.Process罐头p.terminate()

在我想杀死一个线程,但又不想使用标志/锁/信号/信号量/事件/任何东西的情况下,我会将线程提升为完整进程。对于仅使用几个线程的代码,开销并不是那么糟糕。

例如,这样做很方便,可以轻松终止执行阻塞I / O的助手“线程”

转换是微不足道的:在相关代码中,将全部替换threading.Threadmultiprocessing.Process和全部替换queue.Queuemultiprocessing.Queue,并将所需的调用添加p.terminate()到要杀死其子级的父进程中p

请参阅Python文档multiprocessing

A multiprocessing.Process can p.terminate()

In the cases where I want to kill a thread, but do not want to use flags/locks/signals/semaphores/events/whatever, I promote the threads to full blown processes. For code that makes use of just a few threads the overhead is not that bad.

E.g. this comes in handy to easily terminate helper “threads” which execute blocking I/O

The conversion is trivial: In related code replace all threading.Thread with multiprocessing.Process and all queue.Queue with multiprocessing.Queue and add the required calls of p.terminate() to your parent process which wants to kill its child p

See the Python documentation for multiprocessing.


回答 3

如果试图终止整个程序,则可以将线程设置为“守护程序”。参见 Thread.daemon

If you are trying to terminate the whole program you can set the thread as a “daemon”. see Thread.daemon


回答 4

正如其他人提到的那样,规范是设置停止标志。对于轻量级的东西(没有Thread的子类,没有全局变量),可以选择使用lambda回调。(请注意中的括号if stop()。)

import threading
import time

def do_work(id, stop):
    print("I am thread", id)
    while True:
        print("I am thread {} doing something".format(id))
        if stop():
            print("  Exiting loop.")
            break
    print("Thread {}, signing off".format(id))


def main():
    stop_threads = False
    workers = []
    for id in range(0,3):
        tmp = threading.Thread(target=do_work, args=(id, lambda: stop_threads))
        workers.append(tmp)
        tmp.start()
    time.sleep(3)
    print('main: done sleeping; time to stop the threads.')
    stop_threads = True
    for worker in workers:
        worker.join()
    print('Finis.')

if __name__ == '__main__':
    main()

替换print()pr()始终刷新(sys.stdout.flush())的函数可以提高Shell输出的精度。

(仅在Windows / Eclipse / Python3.3上测试过)

As others have mentioned, the norm is to set a stop flag. For something lightweight (no subclassing of Thread, no global variable), a lambda callback is an option. (Note the parentheses in if stop().)

import threading
import time

def do_work(id, stop):
    print("I am thread", id)
    while True:
        print("I am thread {} doing something".format(id))
        if stop():
            print("  Exiting loop.")
            break
    print("Thread {}, signing off".format(id))


def main():
    stop_threads = False
    workers = []
    for id in range(0,3):
        tmp = threading.Thread(target=do_work, args=(id, lambda: stop_threads))
        workers.append(tmp)
        tmp.start()
    time.sleep(3)
    print('main: done sleeping; time to stop the threads.')
    stop_threads = True
    for worker in workers:
        worker.join()
    print('Finis.')

if __name__ == '__main__':
    main()

Replacing print() with a pr() function that always flushes (sys.stdout.flush()) may improve the precision of the shell output.

(Only tested on Windows/Eclipse/Python3.3)


回答 5

这基于thread2-可杀死的线程(Python配方)

您需要调用PyThreadState_SetasyncExc(),该方法仅可通过ctypes使用。

该功能仅在Python 2.7.3上进行了测试,但可能与其他最新的2.x版本一起使用。

import ctypes

def terminate_thread(thread):
    """Terminates a python thread from another thread.

    :param thread: a threading.Thread instance
    """
    if not thread.isAlive():
        return

    exc = ctypes.py_object(SystemExit)
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_long(thread.ident), exc)
    if res == 0:
        raise ValueError("nonexistent thread id")
    elif res > 1:
        # """if it returns a number greater than one, you're in trouble,
        # and you should call it again with exc=NULL to revert the effect"""
        ctypes.pythonapi.PyThreadState_SetAsyncExc(thread.ident, None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

This is based on thread2 — killable threads (Python recipe)

You need to call PyThreadState_SetasyncExc(), which is only available through ctypes.

This has only been tested on Python 2.7.3, but it is likely to work with other recent 2.x releases.

import ctypes

def terminate_thread(thread):
    """Terminates a python thread from another thread.

    :param thread: a threading.Thread instance
    """
    if not thread.isAlive():
        return

    exc = ctypes.py_object(SystemExit)
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_long(thread.ident), exc)
    if res == 0:
        raise ValueError("nonexistent thread id")
    elif res > 1:
        # """if it returns a number greater than one, you're in trouble,
        # and you should call it again with exc=NULL to revert the effect"""
        ctypes.pythonapi.PyThreadState_SetAsyncExc(thread.ident, None)
        raise SystemError("PyThreadState_SetAsyncExc failed")

回答 6

如果不配合使用线程,切勿强行杀死它。

杀死线程会删除所有尝试/最终阻止设置的保证,因此您可能会锁定锁,打开文件等。

唯一可以断言强行杀死线程是个好主意的方法是快速杀死程序,但绝不要杀死单个线程。

You should never forcibly kill a thread without cooperating with it.

Killing a thread removes any guarantees that try/finally blocks set up so you might leave locks locked, files open, etc.

The only time you can argue that forcibly killing threads is a good idea is to kill a program fast, but never single threads.


回答 7

在Python中,您根本无法直接杀死线程。

如果您实际上并不需要Thread(!),则可以使用 multiprocessing软件包,而不是使用threading软件包来做。在这里,要杀死一个进程,您可以简单地调用方法:

yourProcess.terminate()  # kill the process!

Python将杀死您的进程(在Unix上通过SIGTERM信号,而在Windows上通过TerminateProcess()调用)。在使用队列或管道时,请注意使用它!(它可能会损坏队列/管道中的数据)

请注意,multiprocessing.Event和的multiprocessing.Semaphore工作方式与threading.Event和的完全相同threading.Semaphore。实际上,第一个是后者的克隆。

如果您确实需要使用线程,则无法直接杀死它。但是,您可以使用“守护程序线程”。实际上,在Python中,可以将Thread标记为守护程序

yourThread.daemon = True  # set the Thread as a "daemon thread"

当没有活动的非守护线程时,主程序将退出。换句话说,当您的主线程(当然,这是一个非守护程序线程)完成其操作时,即使仍有一些守护程序线程在工作,程序也会退出。

请注意,有必要daemonstart()调用该方法之前设置一个线程!

当然,daemon甚至可以使用multiprocessing。在这里,当主进程退出时,它将尝试终止其所有守护进程。

最后,请注意,sys.exit()os.kill()不是选择。

In Python, you simply cannot kill a Thread directly.

If you do NOT really need to have a Thread (!), what you can do, instead of using the threading package , is to use the multiprocessing package . Here, to kill a process, you can simply call the method:

yourProcess.terminate()  # kill the process!

Python will kill your process (on Unix through the SIGTERM signal, while on Windows through the TerminateProcess() call). Pay attention to use it while using a Queue or a Pipe! (it may corrupt the data in the Queue/Pipe)

Note that the multiprocessing.Event and the multiprocessing.Semaphore work exactly in the same way of the threading.Event and the threading.Semaphore respectively. In fact, the first ones are clones of the latters.

If you REALLY need to use a Thread, there is no way to kill it directly. What you can do, however, is to use a “daemon thread”. In fact, in Python, a Thread can be flagged as daemon:

yourThread.daemon = True  # set the Thread as a "daemon thread"

The main program will exit when no alive non-daemon threads are left. In other words, when your main thread (which is, of course, a non-daemon thread) will finish its operations, the program will exit even if there are still some daemon threads working.

Note that it is necessary to set a Thread as daemon before the start() method is called!

Of course you can, and should, use daemon even with multiprocessing. Here, when the main process exits, it attempts to terminate all of its daemonic child processes.

Finally, please, note that sys.exit() and os.kill() are not choices.


回答 8

您可以通过在将退出线程的线程中安装跟踪来杀死线程。请参阅附件链接以获取一种可能的实现。

在Python中杀死线程

You can kill a thread by installing trace into the thread that will exit the thread. See attached link for one possible implementation.

Kill a thread in Python


回答 9

如果你是显式调用time.sleep()为你的线程(比如查询一些外部的服务)的一部分,在菲利普的方法的改进是使用了超时的eventwait()方法,无论你sleep()

例如:

import threading

class KillableThread(threading.Thread):
    def __init__(self, sleep_interval=1):
        super().__init__()
        self._kill = threading.Event()
        self._interval = sleep_interval

    def run(self):
        while True:
            print("Do Something")

            # If no kill signal is set, sleep for the interval,
            # If kill signal comes in while sleeping, immediately
            #  wake up and handle
            is_killed = self._kill.wait(self._interval)
            if is_killed:
                break

        print("Killing Thread")

    def kill(self):
        self._kill.set()

然后运行

t = KillableThread(sleep_interval=5)
t.start()
# Every 5 seconds it prints:
#: Do Something
t.kill()
#: Killing Thread

使用wait()而不是sleep()ing并定期检查事件的优点是您可以在更长的睡眠间隔内进行编程,线程几乎立即停止(原本应该是sleep()ing),并且我认为处理退出的代码明显更简单。

If you are explicitly calling time.sleep() as part of your thread (say polling some external service), an improvement upon Phillipe’s method is to use the timeout in the event‘s wait() method wherever you sleep()

For example:

import threading

class KillableThread(threading.Thread):
    def __init__(self, sleep_interval=1):
        super().__init__()
        self._kill = threading.Event()
        self._interval = sleep_interval

    def run(self):
        while True:
            print("Do Something")

            # If no kill signal is set, sleep for the interval,
            # If kill signal comes in while sleeping, immediately
            #  wake up and handle
            is_killed = self._kill.wait(self._interval)
            if is_killed:
                break

        print("Killing Thread")

    def kill(self):
        self._kill.set()

Then to run it

t = KillableThread(sleep_interval=5)
t.start()
# Every 5 seconds it prints:
#: Do Something
t.kill()
#: Killing Thread

The advantage of using wait() instead of sleep()ing and regularly checking the event is that you can program in longer intervals of sleep, the thread is stopped almost immediately (when you would otherwise be sleep()ing) and in my opinion, the code for handling exit is significantly simpler.


回答 10

如果您不杀死线程,那就更好了。一种方法可能是在线程的循环中引入“ try”块,并在您要停止线程时引发异常(例如,break / return / …会停止for / while / …)。我已经在我的应用程序上使用了它,并且可以使用…

It is better if you don’t kill a thread. A way could be to introduce a “try” block into the thread’s cycle and to throw an exception when you want to stop the thread (for example a break/return/… that stops your for/while/…). I’ve used this on my app and it works…


回答 11

绝对有可能实现Thread.stop以下示例代码中所示的方法:

import sys
import threading
import time


class StopThread(StopIteration):
    pass

threading.SystemExit = SystemExit, StopThread


class Thread2(threading.Thread):

    def stop(self):
        self.__stop = True

    def _bootstrap(self):
        if threading._trace_hook is not None:
            raise ValueError('Cannot run thread with tracing!')
        self.__stop = False
        sys.settrace(self.__trace)
        super()._bootstrap()

    def __trace(self, frame, event, arg):
        if self.__stop:
            raise StopThread()
        return self.__trace


class Thread3(threading.Thread):

    def _bootstrap(self, stop_thread=False):
        def stop():
            nonlocal stop_thread
            stop_thread = True
        self.stop = stop

        def tracer(*_):
            if stop_thread:
                raise StopThread()
            return tracer
        sys.settrace(tracer)
        super()._bootstrap()

###############################################################################


def main():
    test1 = Thread2(target=printer)
    test1.start()
    time.sleep(1)
    test1.stop()
    test1.join()
    test2 = Thread2(target=speed_test)
    test2.start()
    time.sleep(1)
    test2.stop()
    test2.join()
    test3 = Thread3(target=speed_test)
    test3.start()
    time.sleep(1)
    test3.stop()
    test3.join()


def printer():
    while True:
        print(time.time() % 1)
        time.sleep(0.1)


def speed_test(count=0):
    try:
        while True:
            count += 1
    except StopThread:
        print('Count =', count)

if __name__ == '__main__':
    main()

Thread3类似乎比快大约33%的运行代码Thread2类。

It is definitely possible to implement a Thread.stop method as shown in the following example code:

import sys
import threading
import time


class StopThread(StopIteration):
    pass

threading.SystemExit = SystemExit, StopThread


class Thread2(threading.Thread):

    def stop(self):
        self.__stop = True

    def _bootstrap(self):
        if threading._trace_hook is not None:
            raise ValueError('Cannot run thread with tracing!')
        self.__stop = False
        sys.settrace(self.__trace)
        super()._bootstrap()

    def __trace(self, frame, event, arg):
        if self.__stop:
            raise StopThread()
        return self.__trace


class Thread3(threading.Thread):

    def _bootstrap(self, stop_thread=False):
        def stop():
            nonlocal stop_thread
            stop_thread = True
        self.stop = stop

        def tracer(*_):
            if stop_thread:
                raise StopThread()
            return tracer
        sys.settrace(tracer)
        super()._bootstrap()

###############################################################################


def main():
    test1 = Thread2(target=printer)
    test1.start()
    time.sleep(1)
    test1.stop()
    test1.join()
    test2 = Thread2(target=speed_test)
    test2.start()
    time.sleep(1)
    test2.stop()
    test2.join()
    test3 = Thread3(target=speed_test)
    test3.start()
    time.sleep(1)
    test3.stop()
    test3.join()


def printer():
    while True:
        print(time.time() % 1)
        time.sleep(0.1)


def speed_test(count=0):
    try:
        while True:
            count += 1
    except StopThread:
        print('Count =', count)

if __name__ == '__main__':
    main()

The Thread3 class appears to run code approximately 33% faster than the Thread2 class.


回答 12

from ctypes import *
pthread = cdll.LoadLibrary("libpthread-2.15.so")
pthread.pthread_cancel(c_ulong(t.ident))

t是您的Thread对象。

阅读python源代码(Modules/threadmodule.cPython/thread_pthread.h),您可以看到的Thread.ident是一种pthread_t类型,因此您可以pthread在python use中做任何事情libpthread

from ctypes import *
pthread = cdll.LoadLibrary("libpthread-2.15.so")
pthread.pthread_cancel(c_ulong(t.ident))

t is your Thread object.

Read the python source (Modules/threadmodule.c and Python/thread_pthread.h) you can see the Thread.ident is an pthread_t type, so you can do anything pthread can do in python use libpthread.


回答 13

可以使用以下解决方法杀死线程:

kill_threads = False

def doSomething():
    global kill_threads
    while True:
        if kill_threads:
            thread.exit()
        ......
        ......

thread.start_new_thread(doSomething, ())

这甚至可以用于终止从主线程终止其代码在另一个模块中编写的线程。我们可以在该模块中声明一个全局变量,并使用它终止该模块中产生的线程。

我通常使用它在程序出口处终止所有线程。这可能不是终止线程的理想方法,但可能会有所帮助。

Following workaround can be used to kill a thread:

kill_threads = False

def doSomething():
    global kill_threads
    while True:
        if kill_threads:
            thread.exit()
        ......
        ......

thread.start_new_thread(doSomething, ())

This can be used even for terminating threads, whose code is written in another module, from main thread. We can declare a global variable in that module and use it to terminate thread/s spawned in that module.

I usually use this to terminate all the threads at the program exit. This might not be the perfect way to terminate thread/s but could help.


回答 14

我玩这个游戏很晚,但是我一直在努力解决类似的问题,以下内容似乎可以为我很好地解决问题,并且让守护子线程退出时让我做一些基本的线程状态检查和清理工作:

import threading
import time
import atexit

def do_work():

  i = 0
  @atexit.register
  def goodbye():
    print ("'CLEANLY' kill sub-thread with value: %s [THREAD: %s]" %
           (i, threading.currentThread().ident))

  while True:
    print i
    i += 1
    time.sleep(1)

t = threading.Thread(target=do_work)
t.daemon = True
t.start()

def after_timeout():
  print "KILL MAIN THREAD: %s" % threading.currentThread().ident
  raise SystemExit

threading.Timer(2, after_timeout).start()

Yield:

0
1
KILL MAIN THREAD: 140013208254208
'CLEANLY' kill sub-thread with value: 2 [THREAD: 140013674317568]

I’m way late to this game, but I’ve been wrestling with a similar question and the following appears to both resolve the issue perfectly for me AND lets me do some basic thread state checking and cleanup when the daemonized sub-thread exits:

import threading
import time
import atexit

def do_work():

  i = 0
  @atexit.register
  def goodbye():
    print ("'CLEANLY' kill sub-thread with value: %s [THREAD: %s]" %
           (i, threading.currentThread().ident))

  while True:
    print i
    i += 1
    time.sleep(1)

t = threading.Thread(target=do_work)
t.daemon = True
t.start()

def after_timeout():
  print "KILL MAIN THREAD: %s" % threading.currentThread().ident
  raise SystemExit

threading.Timer(2, after_timeout).start()

Yields:

0
1
KILL MAIN THREAD: 140013208254208
'CLEANLY' kill sub-thread with value: 2 [THREAD: 140013674317568]

回答 15

我想补充的一件事是,如果您在线程lib Python中阅读了官方文档,建议您避免使用“恶魔”线程,如果您不希望线程突然结束,请使用Paolo Rovelli 提到的标志。

根据官方文档:

守护程序线程在关闭时突然停止。它们的资源(例如打开的文件,数据库事务等)可能无法正确释放。如果您希望线程正常停止,请将它们设置为非守护进程,并使用适当的信令机制(例如事件)。

我认为创建守护线程取决于您的应用程序,但总的来说(我认为)最好避免杀死它们或使其成为守护线程。在多处理中,您可以is_alive()用来检查过程状态并“终止”以完成过程(还可以避免GIL问题)。但是,有时在Windows中执行代码时会发现更多问题。

永远记住,如果您有“活动线程”,Python解释器将运行以等待它们。(因为这个守护进程可以帮助您,如果没关系突然结束)。

One thing I want to add is that if you read official documentation in threading lib Python, it’s recommended to avoid use of “demonic” threads, when you don’t want threads end abruptly, with the flag that Paolo Rovelli mentioned.

From official documentation:

Daemon threads are abruptly stopped at shutdown. Their resources (such as open files, database transactions, etc.) may not be released properly. If you want your threads to stop gracefully, make them non-daemonic and use a suitable signaling mechanism such as an Event.

I think that creating daemonic threads depends of your application, but in general (and in my opinion) it’s better to avoid killing them or making them daemonic. In multiprocessing you can use is_alive() to check process status and “terminate” for finish them (Also you avoid GIL problems). But you can find more problems, sometimes, when you execute your code in Windows.

And always remember that if you have “live threads”, the Python interpreter will be running for wait them. (Because of this daemonic can help you if don’t matter abruptly ends).


回答 16

有一个为此目的而建立的库stopit。尽管此处列出的某些注意事项仍然适用,但至少此库提供了一种常规的,可重复的技术来实现所述目标。

There is a library built for this purpose, stopit. Although some of the same cautions listed herein still apply, at least this library presents a regular, repeatable technique for achieving the stated goal.


回答 17

虽然它已经很老了,对于某些人来说可能是一个方便的解决方案:

一个扩展了线程的模块功能的小模块-允许一个线程在另一个线程的上下文中引发异常。通过提高SystemExit,您最终可以杀死python线程。

import threading
import ctypes     

def _async_raise(tid, excobj):
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, ctypes.py_object(excobj))
    if res == 0:
        raise ValueError("nonexistent thread id")
    elif res > 1:
        # """if it returns a number greater than one, you're in trouble, 
        # and you should call it again with exc=NULL to revert the effect"""
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, 0)
        raise SystemError("PyThreadState_SetAsyncExc failed")

class Thread(threading.Thread):
    def raise_exc(self, excobj):
        assert self.isAlive(), "thread must be started"
        for tid, tobj in threading._active.items():
            if tobj is self:
                _async_raise(tid, excobj)
                return

        # the thread was alive when we entered the loop, but was not found 
        # in the dict, hence it must have been already terminated. should we raise
        # an exception here? silently ignore?

    def terminate(self):
        # must raise the SystemExit type, instead of a SystemExit() instance
        # due to a bug in PyThreadState_SetAsyncExc
        self.raise_exc(SystemExit)

因此,它允许“线程在另一个线程的上下文中引发异常”,这样,终止的线程可以处理终止,而无需定期检查中止标志。

但是,根据其原始来源,此代码存在一些问题。

  • 仅在执行python字节码时才会引发异常。如果您的线程调用了本机/内置阻止函数,则仅当执行返回到python代码时才会引发异常。
    • 如果内置函数在内部调用PyErr_Clear()也会存在一个问题,这将有效地取消您的未决异常。您可以尝试再次提高它。
  • 只有异常类型可以安全地引发。异常实例可能会导致意外行为,因此受到限制。
  • 我要求在内置线程模块中公开此功能,但是由于ctypes已成为标准库(从2.5版开始),并且此
    功能不太可能与实现无关,因此可以不公开

While it’s rather old, this might be a handy solution for some:

A little module that extends the threading’s module functionality — allows one thread to raise exceptions in the context of another thread. By raising SystemExit, you can finally kill python threads.

import threading
import ctypes     

def _async_raise(tid, excobj):
    res = ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, ctypes.py_object(excobj))
    if res == 0:
        raise ValueError("nonexistent thread id")
    elif res > 1:
        # """if it returns a number greater than one, you're in trouble, 
        # and you should call it again with exc=NULL to revert the effect"""
        ctypes.pythonapi.PyThreadState_SetAsyncExc(tid, 0)
        raise SystemError("PyThreadState_SetAsyncExc failed")

class Thread(threading.Thread):
    def raise_exc(self, excobj):
        assert self.isAlive(), "thread must be started"
        for tid, tobj in threading._active.items():
            if tobj is self:
                _async_raise(tid, excobj)
                return

        # the thread was alive when we entered the loop, but was not found 
        # in the dict, hence it must have been already terminated. should we raise
        # an exception here? silently ignore?

    def terminate(self):
        # must raise the SystemExit type, instead of a SystemExit() instance
        # due to a bug in PyThreadState_SetAsyncExc
        self.raise_exc(SystemExit)

So, it allows a “thread to raise exceptions in the context of another thread” and in this way, the terminated thread can handle the termination without regularly checking an abort flag.

However, according to its original source, there are some issues with this code.

  • The exception will be raised only when executing python bytecode. If your thread calls a native/built-in blocking function, the exception will be raised only when execution returns to the python code.
    • There is also an issue if the built-in function internally calls PyErr_Clear(), which would effectively cancel your pending exception. You can try to raise it again.
  • Only exception types can be raised safely. Exception instances are likely to cause unexpected behavior, and are thus restricted.
  • I asked to expose this function in the built-in thread module, but since ctypes has become a standard library (as of 2.5), and this
    feature is not likely to be implementation-agnostic, it may be kept
    unexposed.

回答 18

这似乎与Windows 7上的pywin32一起使用

my_thread = threading.Thread()
my_thread.start()
my_thread._Thread__stop()

This seems to work with pywin32 on windows 7

my_thread = threading.Thread()
my_thread.start()
my_thread._Thread__stop()

回答 19

ØMQ项目的创始人之一Pieter Hintjens 说,使用ØMQ并避免使用同步原语(例如锁,互斥体,事件等),是编写多线程程序的最安全的方法:

http://zguide.zeromq.org/py:all#Multithreading-with-ZeroMQ

这包括告诉子线程应该取消其工作。这可以通过为该线程配备一个ØMQ套接字并在该套接字上轮询以获取一条消息指出该线程应取消的信息来完成。

该链接还提供了有关带有ØMQ的多线程python代码的示例。

Pieter Hintjens — one of the founders of the ØMQ-project — says, using ØMQ and avoiding synchronization primitives like locks, mutexes, events etc., is the sanest and securest way to write multi-threaded programs:

http://zguide.zeromq.org/py:all#Multithreading-with-ZeroMQ

This includes telling a child thread, that it should cancel its work. This would be done by equipping the thread with a ØMQ-socket and polling on that socket for a message saying that it should cancel.

The link also provides an example on multi-threaded python code with ØMQ.


回答 20

假设您要具有多个具有相同功能的线程,这是恕我直言,最简单的实现是通过id停止一个线程:

import time
from threading import Thread

def doit(id=0):
    doit.stop=0
    print("start id:%d"%id)
    while 1:
        time.sleep(1)
        print(".")
        if doit.stop==id:
            doit.stop=0
            break
    print("end thread %d"%id)

t5=Thread(target=doit, args=(5,))
t6=Thread(target=doit, args=(6,))

t5.start() ; t6.start()
time.sleep(2)
doit.stop =5  #kill t5
time.sleep(2)
doit.stop =6  #kill t6

不错的是,您可以拥有多个相同和不同的功能,并通过以下方式将其全部停止 functionname.stop

如果您只想使用该函数的一个线程,则无需记住ID。如果doit.stop> 0 ,则停止。

Asuming, that you want to have multiple threads of the same function, this is IMHO the easiest implementation to stop one by id:

import time
from threading import Thread

def doit(id=0):
    doit.stop=0
    print("start id:%d"%id)
    while 1:
        time.sleep(1)
        print(".")
        if doit.stop==id:
            doit.stop=0
            break
    print("end thread %d"%id)

t5=Thread(target=doit, args=(5,))
t6=Thread(target=doit, args=(6,))

t5.start() ; t6.start()
time.sleep(2)
doit.stop =5  #kill t5
time.sleep(2)
doit.stop =6  #kill t6

The nice thing is here, you can have multiple of same and different functions, and stop them all by functionname.stop

If you want to have only one thread of the function then you don’t need to remember the id. Just stop, if doit.stop > 0.


回答 21

只是基于@SCB的想法(这正是我所需要的)来创建具有自定义函数的KillableThread子类:

from threading import Thread, Event

class KillableThread(Thread):
    def __init__(self, sleep_interval=1, target=None, name=None, args=(), kwargs={}):
        super().__init__(None, target, name, args, kwargs)
        self._kill = Event()
        self._interval = sleep_interval
        print(self._target)

    def run(self):
        while True:
            # Call custom function with arguments
            self._target(*self._args)

        # If no kill signal is set, sleep for the interval,
        # If kill signal comes in while sleeping, immediately
        #  wake up and handle
        is_killed = self._kill.wait(self._interval)
        if is_killed:
            break

    print("Killing Thread")

def kill(self):
    self._kill.set()

if __name__ == '__main__':

    def print_msg(msg):
        print(msg)

    t = KillableThread(10, print_msg, args=("hello world"))
    t.start()
    time.sleep(6)
    print("About to kill thread")
    t.kill()

自然,就像@SBC一样,线程不必等待运行新循环来停止。在此示例中,您将在“即将杀死线程”之后看到“杀死线程”消息,而不是再等待4秒钟才能完成线程(因为我们已经睡了6秒钟)。

KillableThread构造函数中的第二个参数是您的自定义函数(此处为print_msg)。Args参数是在此处调用函数((“ hello world”))时将使用的参数。

Just to build up on @SCB’s idea (which was exactly what I needed) to create a KillableThread subclass with a customized function:

from threading import Thread, Event

class KillableThread(Thread):
    def __init__(self, sleep_interval=1, target=None, name=None, args=(), kwargs={}):
        super().__init__(None, target, name, args, kwargs)
        self._kill = Event()
        self._interval = sleep_interval
        print(self._target)

    def run(self):
        while True:
            # Call custom function with arguments
            self._target(*self._args)

        # If no kill signal is set, sleep for the interval,
        # If kill signal comes in while sleeping, immediately
        #  wake up and handle
        is_killed = self._kill.wait(self._interval)
        if is_killed:
            break

    print("Killing Thread")

def kill(self):
    self._kill.set()

if __name__ == '__main__':

    def print_msg(msg):
        print(msg)

    t = KillableThread(10, print_msg, args=("hello world"))
    t.start()
    time.sleep(6)
    print("About to kill thread")
    t.kill()

Naturally, like with @SBC, the thread doesn’t wait to run a new loop to stop. In this example, you would see the “Killing Thread” message printed right after the “About to kill thread” instead of waiting for 4 more seconds for the thread to complete (since we have slept for 6 seconds already).

Second argument in KillableThread constructor is your custom function (print_msg here). Args argument are the arguments that will be used when calling the function ((“hello world”)) here.


回答 22

如@Kozyarchuk的答案中所述,安装跟踪有效。由于此答案不包含任何代码,因此下面是一个可以使用的示例:

import sys, threading, time 

class TraceThread(threading.Thread): 
    def __init__(self, *args, **keywords): 
        threading.Thread.__init__(self, *args, **keywords) 
        self.killed = False
    def start(self): 
        self._run = self.run 
        self.run = self.settrace_and_run
        threading.Thread.start(self) 
    def settrace_and_run(self): 
        sys.settrace(self.globaltrace) 
        self._run()
    def globaltrace(self, frame, event, arg): 
        return self.localtrace if event == 'call' else None
    def localtrace(self, frame, event, arg): 
        if self.killed and event == 'line': 
            raise SystemExit() 
        return self.localtrace 

def f(): 
    while True: 
        print('1') 
        time.sleep(2)
        print('2') 
        time.sleep(2)
        print('3') 
        time.sleep(2)

t = TraceThread(target=f) 
t.start() 
time.sleep(2.5) 
t.killed = True

打印1并打印后停止23不打印。

As mentioned in @Kozyarchuk’s answer, installing trace works. Since this answer contained no code, here is a working ready-to-use example:

import sys, threading, time 

class TraceThread(threading.Thread): 
    def __init__(self, *args, **keywords): 
        threading.Thread.__init__(self, *args, **keywords) 
        self.killed = False
    def start(self): 
        self._run = self.run 
        self.run = self.settrace_and_run
        threading.Thread.start(self) 
    def settrace_and_run(self): 
        sys.settrace(self.globaltrace) 
        self._run()
    def globaltrace(self, frame, event, arg): 
        return self.localtrace if event == 'call' else None
    def localtrace(self, frame, event, arg): 
        if self.killed and event == 'line': 
            raise SystemExit() 
        return self.localtrace 

def f(): 
    while True: 
        print('1') 
        time.sleep(2)
        print('2') 
        time.sleep(2)
        print('3') 
        time.sleep(2)

t = TraceThread(target=f) 
t.start() 
time.sleep(2.5) 
t.killed = True

It stops after having printed 1 and 2. 3 is not printed.


回答 23

您可以在进程中执行命令,然后使用进程ID将其杀死。我需要在两个线程之间进行同步,其中一个线程不会自行返回。

processIds = []

def executeRecord(command):
    print(command)

    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    processIds.append(process.pid)
    print(processIds[0])

    #Command that doesn't return by itself
    process.stdout.read().decode("utf-8")
    return;


def recordThread(command, timeOut):

    thread = Thread(target=executeRecord, args=(command,))
    thread.start()
    thread.join(timeOut)

    os.kill(processIds.pop(), signal.SIGINT)

    return;

You can execute your command in a process and then kill it using the process id. I needed to sync between two thread one of which doesn’t return by itself.

processIds = []

def executeRecord(command):
    print(command)

    process = subprocess.Popen(command, stdout=subprocess.PIPE)
    processIds.append(process.pid)
    print(processIds[0])

    #Command that doesn't return by itself
    process.stdout.read().decode("utf-8")
    return;


def recordThread(command, timeOut):

    thread = Thread(target=executeRecord, args=(command,))
    thread.start()
    thread.join(timeOut)

    os.kill(processIds.pop(), signal.SIGINT)

    return;

回答 24

使用setDaemon(True)启动子线程。

def bootstrap(_filename):
    mb = ModelBootstrap(filename=_filename) # Has many Daemon threads. All get stopped automatically when main thread is stopped.

t = threading.Thread(target=bootstrap,args=('models.conf',))
t.setDaemon(False)

while True:
    t.start()
    time.sleep(10) # I am just allowing the sub-thread to run for 10 sec. You can listen on an event to stop execution.
    print('Thread stopped')
    break

Start the sub thread with setDaemon(True).

def bootstrap(_filename):
    mb = ModelBootstrap(filename=_filename) # Has many Daemon threads. All get stopped automatically when main thread is stopped.

t = threading.Thread(target=bootstrap,args=('models.conf',))
t.setDaemon(False)

while True:
    t.start()
    time.sleep(10) # I am just allowing the sub-thread to run for 10 sec. You can listen on an event to stop execution.
    print('Thread stopped')
    break

回答 25

这是一个错误的答案,请参阅评论

方法如下:

from threading import *

...

for thread in enumerate():
    if thread.isAlive():
        try:
            thread._Thread__stop()
        except:
            print(str(thread.getName()) + ' could not be terminated'))

给它几秒钟,然后应该停止线程。还要检查thread._Thread__delete()方法。

thread.quit()为了方便起见,我建议使用一种方法。例如,如果您的线程中有一个套接字,建议quit()您在套接字句柄类中创建一个方法,终止该套接字,然后在的thread._Thread__stop()内部运行quit()

This is a bad answer, see the comments

Here’s how to do it:

from threading import *

...

for thread in enumerate():
    if thread.isAlive():
        try:
            thread._Thread__stop()
        except:
            print(str(thread.getName()) + ' could not be terminated'))

Give it a few seconds then your thread should be stopped. Check also the thread._Thread__delete() method.

I’d recommend a thread.quit() method for convenience. For example if you have a socket in your thread, I’d recommend creating a quit() method in your socket-handle class, terminate the socket, then run a thread._Thread__stop() inside of your quit().


回答 26

如果您确实需要杀死子任务的能力,请使用替代实现。multiprocessing并且gevent都支持滥杀“线程”。

Python的线程不支持取消。想都别想。您的代码很可能会死锁,损坏或泄漏内存,或者具有其他意想不到的“有趣”难以调试的效果,这种情况很少且不确定地发生。

If you really need the ability to kill a sub-task, use an alternate implementation. multiprocessing and gevent both support indiscriminately killing a “thread”.

Python’s threading does not support cancellation. Do not even try. Your code is very likely to deadlock, corrupt or leak memory, or have other unintended “interesting” hard-to-debug effects which happen rarely and nondeterministically.


在Python中将十六进制字符串转换为int

问题:在Python中将十六进制字符串转换为int

如何在Python中将十六进制字符串转换为int?

我可能将其命名为“ 0xffff”或“ ffff”。

How do I convert a hex string to an int in Python?

I may have it as “0xffff” or just “ffff“.


回答 0

如果没有 0x前缀,则需要显式指定基数,否则无法告诉:

x = int("deadbeef", 16)

使用 0x前缀,Python可以自动区分十六进制和十进制。

>>> print int("0xdeadbeef", 0)
3735928559
>>> print int("10", 0)
10

(您必须指定0作为基准才能调用此前缀猜测行为;省略第二个参数意味着假定基准为10。)

Without the 0x prefix, you need to specify the base explicitly, otherwise there’s no way to tell:

x = int("deadbeef", 16)

With the 0x prefix, Python can distinguish hex and decimal automatically.

>>> print int("0xdeadbeef", 0)
3735928559
>>> print int("10", 0)
10

(You must specify 0 as the base in order to invoke this prefix-guessing behavior; omitting the second parameter means to assume base-10.)


回答 1

int(hexString, 16) 可以解决问题,并且可以使用和不使用0x前缀:

>>> int("a", 16)
10
>>> int("0xa",16)
10

int(hexString, 16) does the trick, and works with and without the 0x prefix:

>>> int("a", 16)
10
>>> int("0xa",16)
10

回答 2

对于任何给定的字符串s:

int(s, 16)

For any given string s:

int(s, 16)

回答 3

在Python中将十六进制字符串转换为int

我可能有它"0xffff"或只是它"ffff"

要将字符串转换为int,请将字符串int与要转换的基数一起传递给。

两个字符串都可以通过以下方式进行转换:

>>> string_1 = "0xffff"
>>> string_2 = "ffff"
>>> int(string_1, 16)
65535
>>> int(string_2, 16)
65535

int推断

如果您将0作为基数,int则将从字符串中的前缀推断基数。

>>> int(string_1, 0)
65535

如果没有十六进制前缀0xint没有足够的信息与猜测:

>>> int(string_2, 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 0: 'ffff'

文字:

如果您要输入源代码或解释器,Python将为您进行转换:

>>> integer = 0xffff
>>> integer
65535

这将无法使用,ffff因为Python会认为您正在尝试编写合法的Python名称:

>>> integer = ffff
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'ffff' is not defined

Python数字以数字字符开头,而Python名称不能以数字字符开头。

Convert hex string to int in Python

I may have it as "0xffff" or just "ffff".

To convert a string to an int, pass the string to int along with the base you are converting from.

Both strings will suffice for conversion in this way:

>>> string_1 = "0xffff"
>>> string_2 = "ffff"
>>> int(string_1, 16)
65535
>>> int(string_2, 16)
65535

Letting int infer

If you pass 0 as the base, int will infer the base from the prefix in the string.

>>> int(string_1, 0)
65535

Without the hexadecimal prefix, 0x, int does not have enough information with which to guess:

>>> int(string_2, 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 0: 'ffff'

literals:

If you’re typing into source code or an interpreter, Python will make the conversion for you:

>>> integer = 0xffff
>>> integer
65535

This won’t work with ffff because Python will think you’re trying to write a legitimate Python name instead:

>>> integer = ffff
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'ffff' is not defined

Python numbers start with a numeric character, while Python names cannot start with a numeric character.


回答 4

在上述Dan的答案中加上:如果为int()函数提供了十六进制字符串,则必须将基数指定为16,否则它不会认为您给了它有效的值。对于字符串中不包含的十六进制数字,无需指定基数16。

print int(0xdeadbeef) # valid

myHex = "0xdeadbeef"
print int(myHex) # invalid, raises ValueError
print int(myHex , 16) # valid

Adding to Dan’s answer above: if you supply the int() function with a hex string, you will have to specify the base as 16 or it will not think you gave it a valid value. Specifying base 16 is unnecessary for hex numbers not contained in strings.

print int(0xdeadbeef) # valid

myHex = "0xdeadbeef"
print int(myHex) # invalid, raises ValueError
print int(myHex , 16) # valid

回答 5

最坏的方法:

>>> def hex_to_int(x):
    return eval("0x" + x)

>>> hex_to_int("c0ffee")
12648430

请不要这样做!

在Python中使用eval是不好的做法吗?

The worst way:

>>> def hex_to_int(x):
    return eval("0x" + x)

>>> hex_to_int("c0ffee")
12648430

Please don’t do this!

Is using eval in Python a bad practice?


回答 6

或者ast.literal_eval(这很安全,不像eval):

ast.literal_eval("0xffff")

演示:

>>> import ast
>>> ast.literal_eval("0xffff")
65535
>>> 

Or ast.literal_eval (this is safe, unlike eval):

ast.literal_eval("0xffff")

Demo:

>>> import ast
>>> ast.literal_eval("0xffff")
65535
>>> 

回答 7

格式化程序选项’%x’%对我来说似乎也可以在赋值语句中使用。(假设Python 3.0及更高版本)

a = int('0x100', 16)
print(a)   #256
print('%x' % a) #100
b = a
print(b) #256
c = '%x' % a
print(c) #100

The formatter option ‘%x’ % seems to work in assignment statements as well for me. (Assuming Python 3.0 and later)

Example

a = int('0x100', 16)
print(a)   #256
print('%x' % a) #100
b = a
print(b) #256
c = '%x' % a
print(c) #100

回答 8

如果您使用的是python解释器,则只需键入0x(您的十六进制值),解释器就会自动为您转换。

>>> 0xffff

65535

If you are using the python interpreter, you can just type 0x(your hex value) and the interpreter will convert it automatically for you.

>>> 0xffff

65535

回答 9

处理十六进制,八进制,二进制,整数和浮点数

使用标准前缀(即0x,0b,0和0o),此函数会将任何合适的字符串转换为数字。我在这里回答了这个问题:https : //stackoverflow.com/a/58997070/2464381,但这是必需的功能。

def to_number(n):
    ''' Convert any number representation to a number 
    This covers: float, decimal, hex, and octal numbers.
    '''

    try:
        return int(str(n), 0)
    except:
        try:
            # python 3 doesn't accept "010" as a valid octal.  You must use the
            # '0o' prefix
            return int('0o' + n, 0)
        except:
            return float(n)

Handles hex, octal, binary, int, and float

Using the standard prefixes (i.e. 0x, 0b, 0, and 0o) this function will convert any suitable string to a number. I answered this here: https://stackoverflow.com/a/58997070/2464381 but here is the needed function.

def to_number(n):
    ''' Convert any number representation to a number 
    This covers: float, decimal, hex, and octal numbers.
    '''

    try:
        return int(str(n), 0)
    except:
        try:
            # python 3 doesn't accept "010" as a valid octal.  You must use the
            # '0o' prefix
            return int('0o' + n, 0)
        except:
            return float(n)

回答 10

在Python 2.7中,int('deadbeef',10)似乎不起作用。

以下对我有用:

>>a = int('deadbeef',16)
>>float(a)
3735928559.0

In Python 2.7, int('deadbeef',10) doesn’t seem to work.

The following works for me:

>>a = int('deadbeef',16)
>>float(a)
3735928559.0

回答 11

加上“ 0x”前缀,您也可以使用eval函数

例如

>>a='0xff'
>>eval(a)
255

with ‘0x’ prefix, you might also use eval function

For example

>>a='0xff'
>>eval(a)
255

** wargs的目的和用途是什么?

问题:** wargs的目的和用途是什么?

**kwargsPython 的用途是什么?

我知道您可以objects.filter在表上进行传递**kwargs参数。  

我还可以指定时间增量timedelta(hours = time1)吗?

它是如何工作的?它被归类为“拆包”吗?喜欢a,b=1,2吗?

What are the uses for **kwargs in Python?

I know you can do an objects.filter on a table and pass in a **kwargs argument.  

Can I also do this for specifying time deltas i.e. timedelta(hours = time1)?

How exactly does it work? Is it classes as ‘unpacking’? Like a,b=1,2?


回答 0

您可以**kwargs用来让函数接受任意数量的关键字参数(“ kwargs”表示“关键字参数”):

>>> def print_keyword_args(**kwargs):
...     # kwargs is a dict of the keyword args passed to the function
...     for key, value in kwargs.iteritems():
...         print "%s = %s" % (key, value)
... 
>>> print_keyword_args(first_name="John", last_name="Doe")
first_name = John
last_name = Doe

您还可以**kwargs在调用函数时使用语法,方法是构造关键字参数字典并将其传递给函数:

>>> kwargs = {'first_name': 'Bobby', 'last_name': 'Smith'}
>>> print_keyword_args(**kwargs)
first_name = Bobby
last_name = Smith

Python指南,包含了如何工作的,有一些很好的例子沿着一个很好的解释。

<-更新->

对于使用Python 3的用户,请使用items()代替iteritems()

You can use **kwargs to let your functions take an arbitrary number of keyword arguments (“kwargs” means “keyword arguments”):

>>> def print_keyword_args(**kwargs):
...     # kwargs is a dict of the keyword args passed to the function
...     for key, value in kwargs.iteritems():
...         print "%s = %s" % (key, value)
... 
>>> print_keyword_args(first_name="John", last_name="Doe")
first_name = John
last_name = Doe

You can also use the **kwargs syntax when calling functions by constructing a dictionary of keyword arguments and passing it to your function:

>>> kwargs = {'first_name': 'Bobby', 'last_name': 'Smith'}
>>> print_keyword_args(**kwargs)
first_name = Bobby
last_name = Smith

The Python Tutorial contains a good explanation of how it works, along with some nice examples.

<–Update–>

For people using Python 3, instead of iteritems(), use items()


回答 1

开箱字典

** 打开字典包装。

这个

func(a=1, b=2, c=3)

是相同的

args = {'a': 1, 'b': 2, 'c':3}
func(**args)

如果必须构造参数,这将非常有用:

args = {'name': person.name}
if hasattr(person, "address"):
    args["address"] = person.address
func(**args)  # either expanded to func(name=person.name) or
              #                    func(name=person.name, address=person.address)

函数的打包参数

def setstyle(**styles):
    for key, value in styles.iteritems():      # styles is a regular dictionary
        setattr(someobject, key, value)

这使您可以使用如下功能:

setstyle(color="red", bold=False)

Unpacking dictionaries

** unpacks dictionaries.

This

func(a=1, b=2, c=3)

is the same as

args = {'a': 1, 'b': 2, 'c':3}
func(**args)

It’s useful if you have to construct parameters:

args = {'name': person.name}
if hasattr(person, "address"):
    args["address"] = person.address
func(**args)  # either expanded to func(name=person.name) or
              #                    func(name=person.name, address=person.address)

Packing parameters of a function

def setstyle(**styles):
    for key, value in styles.iteritems():      # styles is a regular dictionary
        setattr(someobject, key, value)

This lets you use the function like this:

setstyle(color="red", bold=False)

回答 2

kwargs只是添加到参数的字典。

字典可以包含键,值对。那就是怪兽。好的,这就是方法。

目的不是那么简单。

例如(非常假设),您有一个仅调用其他例程来完成工作的接口:

def myDo(what, where, why):
   if what == 'swim':
      doSwim(where, why)
   elif what == 'walk':
      doWalk(where, why)
   ...

现在,您将获得一个新方法“ drive”:

elif what == 'drive':
   doDrive(where, why, vehicle)

但是请稍等,这里有一个新的参数“ vehicle”-您以前不知道。现在,您必须将其添加到myDo函数的签名中。

在这里,您可以使用kwargs-您只需在签名中添加kwargs:

def myDo(what, where, why, **kwargs):
   if what == 'drive':
      doDrive(where, why, **kwargs)
   elif what == 'swim':
      doSwim(where, why, **kwargs)

这样,您不必每次某些被调用的例程可能更改时都更改接口函数的签名。

这只是一个很好的例子,您可能会发现kwargs有帮助。

kwargs is just a dictionary that is added to the parameters.

A dictionary can contain key, value pairs. And that are the kwargs. Ok, this is how.

The what for is not so simple.

For example (very hypothetical) you have an interface that just calls other routines to do the job:

def myDo(what, where, why):
   if what == 'swim':
      doSwim(where, why)
   elif what == 'walk':
      doWalk(where, why)
   ...

Now you get a new method “drive”:

elif what == 'drive':
   doDrive(where, why, vehicle)

But wait a minute, there is a new parameter “vehicle” — you did not know it before. Now you must add it to the signature of the myDo-function.

Here you can throw kwargs into play — you just add kwargs to the signature:

def myDo(what, where, why, **kwargs):
   if what == 'drive':
      doDrive(where, why, **kwargs)
   elif what == 'swim':
      doSwim(where, why, **kwargs)

This way you don’t need to change the signature of your interface function every time some of your called routines might change.

This is just one nice example you could find kwargs helpful.


回答 3

基于好的样本有时比冗长的论述更好,我将使用所有python变量参数传递工具(位置参数和命名参数)编写两个函数。您应该可以轻松地自己查看它的作用:

def f(a = 0, *args, **kwargs):
    print("Received by f(a, *args, **kwargs)")
    print("=> f(a=%s, args=%s, kwargs=%s" % (a, args, kwargs))
    print("Calling g(10, 11, 12, *args, d = 13, e = 14, **kwargs)")
    g(10, 11, 12, *args, d = 13, e = 14, **kwargs)

def g(f, g = 0, *args, **kwargs):
    print("Received by g(f, g = 0, *args, **kwargs)")
    print("=> g(f=%s, g=%s, args=%s, kwargs=%s)" % (f, g, args, kwargs))

print("Calling f(1, 2, 3, 4, b = 5, c = 6)")
f(1, 2, 3, 4, b = 5, c = 6)

这是输出:

Calling f(1, 2, 3, 4, b = 5, c = 6)
Received by f(a, *args, **kwargs) 
=> f(a=1, args=(2, 3, 4), kwargs={'c': 6, 'b': 5}
Calling g(10, 11, 12, *args, d = 13, e = 14, **kwargs)
Received by g(f, g = 0, *args, **kwargs)
=> g(f=10, g=11, args=(12, 2, 3, 4), kwargs={'c': 6, 'b': 5, 'e': 14, 'd': 13})

On the basis that a good sample is sometimes better than a long discourse I will write two functions using all python variable argument passing facilities (both positional and named arguments). You should easily be able to see what it does by yourself:

def f(a = 0, *args, **kwargs):
    print("Received by f(a, *args, **kwargs)")
    print("=> f(a=%s, args=%s, kwargs=%s" % (a, args, kwargs))
    print("Calling g(10, 11, 12, *args, d = 13, e = 14, **kwargs)")
    g(10, 11, 12, *args, d = 13, e = 14, **kwargs)

def g(f, g = 0, *args, **kwargs):
    print("Received by g(f, g = 0, *args, **kwargs)")
    print("=> g(f=%s, g=%s, args=%s, kwargs=%s)" % (f, g, args, kwargs))

print("Calling f(1, 2, 3, 4, b = 5, c = 6)")
f(1, 2, 3, 4, b = 5, c = 6)

And here is the output:

Calling f(1, 2, 3, 4, b = 5, c = 6)
Received by f(a, *args, **kwargs) 
=> f(a=1, args=(2, 3, 4), kwargs={'c': 6, 'b': 5}
Calling g(10, 11, 12, *args, d = 13, e = 14, **kwargs)
Received by g(f, g = 0, *args, **kwargs)
=> g(f=10, g=11, args=(12, 2, 3, 4), kwargs={'c': 6, 'b': 5, 'e': 14, 'd': 13})

回答 4

Motif:*args**kwargs用作需要传递给函数调用的参数的占位符

使用*args**kwargs调用函数

def args_kwargs_test(arg1, arg2, arg3):
    print "arg1:", arg1
    print "arg2:", arg2
    print "arg3:", arg3

现在我们将使用*args上面定义的函数

#args can either be a "list" or "tuple"
>>> args = ("two", 3, 5)  
>>> args_kwargs_test(*args)

结果:

arg1:两个
arg2:3
arg3:5


现在,使用**kwargs来调用相同的功能

#keyword argument "kwargs" has to be a dictionary
>>> kwargs = {"arg3":3, "arg2":'two', "arg1":5}
>>> args_kwargs_test(**kwargs)

结果:

arg1:5
arg2:两个
arg3:3

底线:*args没有智能,它只是将传入的args插值到参数(按从左到右的顺序),同时**kwargs通过将适当的值放在所需的位置@来智能地运行

Motif: *args and **kwargs serves as a placeholder for the arguments that need to be passed to a function call

using *args and **kwargs to call a function

def args_kwargs_test(arg1, arg2, arg3):
    print "arg1:", arg1
    print "arg2:", arg2
    print "arg3:", arg3

Now we’ll use *args to call the above defined function

#args can either be a "list" or "tuple"
>>> args = ("two", 3, 5)  
>>> args_kwargs_test(*args)

result:

arg1: two
arg2: 3
arg3: 5


Now, using **kwargs to call the same function

#keyword argument "kwargs" has to be a dictionary
>>> kwargs = {"arg3":3, "arg2":'two', "arg1":5}
>>> args_kwargs_test(**kwargs)

result:

arg1: 5
arg2: two
arg3: 3

Bottomline : *args has no intelligence, it simply interpolates the passed args to the parameters(in left-to-right order) while **kwargs behaves intelligently by placing the appropriate value @ the required place


回答 5

  • kwargs**kwargs只是变量名。你可以很好地拥有**anyVariableName
  • kwargs代表“关键字参数”。但是我觉得最好将它们称为“命名参数”,因为它们只是随名称一起传递的参数(我对“关键字参数”一词中的“关键字”一词没有任何意义。我猜“关键字”通常是指编程语言保留的单词,因此程序员不要将其用于变量名。因此,我们给名称 param1param2两个传递给函数的参数值如下:func(param1="val1",param2="val2"),而不是仅传递值:func(val1,val2)。因此,我认为应将它们适当地称为“命名参数的任意数量”,因为我们可以指定任意数量的这些参数(即,funcfunc(**kwargs)

可以这么说,让我先解释“命名参数”,然后再解释“任意数量的命名参数” kwargs

命名参数

  • 命名的args应该跟随位置args
  • args的顺序并不重要
  • def function1(param1,param2="arg2",param3="arg3"):
        print("\n"+str(param1)+" "+str(param2)+" "+str(param3)+"\n")
    
    function1(1)                      #1 arg2 arg3   #1 positional arg
    function1(param1=1)               #1 arg2 arg3   #1 named arg
    function1(1,param2=2)             #1 2 arg3      #1 positional arg, 1 named arg
    function1(param1=1,param2=2)      #1 2 arg3      #2 named args       
    function1(param2=2, param1=1)     #1 2 arg3      #2 named args out of order
    function1(1, param3=3, param2=2)  #1 2 3         #
    
    #function1()                      #invalid: required argument missing
    #function1(param2=2,1)            #invalid: SyntaxError: non-keyword arg after keyword arg
    #function1(1,param1=11)           #invalid: TypeError: function1() got multiple values for argument 'param1'
    #function1(param4=4)              #invalid: TypeError: function1() got an unexpected keyword argument 'param4'

任意数量的命名参数 kwargs

  • 功能参数顺序:
    1. 位置参数
    2. 捕获任意数量参数的形式参数(带*前缀)
    3. 命名形式参数
    4. 形式参数,用于捕获任意数量的命名参数(带**前缀)
  • def function2(param1, *tupleParams, param2, param3, **dictionaryParams):
        print("param1: "+ param1)
        print("param2: "+ param2)
        print("param3: "+ param3)
        print("custom tuple params","-"*10)
        for p in tupleParams:
            print(str(p) + ",")
        print("custom named params","-"*10)
        for k,v in dictionaryParams.items():
            print(str(k)+":"+str(v))
    
    function2("arg1",
              "custom param1",
              "custom param2",
              "custom param3",
              param3="arg3",
              param2="arg2", 
              customNamedParam1 = "val1",
              customNamedParam2 = "val2"
              )
    
    # Output
    #
    #param1: arg1
    #param2: arg2
    #param3: arg3
    #custom tuple params ----------
    #custom param1,
    #custom param2,
    #custom param3,
    #custom named params ----------
    #customNamedParam2:val2
    #customNamedParam1:val1

为自定义参数传递元组和dict变量

最后,请允许我注意我们可以通过

  • 作为元组变量的“捕获任意数量参数的形式参数”
  • “形式参数捕获任意数量的命名参数”作为dict变量

因此,可以进行以下相同的调用:

tupleCustomArgs = ("custom param1", "custom param2", "custom param3")
dictCustomNamedArgs = {"customNamedParam1":"val1", "customNamedParam2":"val2"}

function2("arg1",
      *tupleCustomArgs,    #note *
      param3="arg3",
      param2="arg2", 
      **dictCustomNamedArgs     #note **
      )

最后请注意***上面的函数调用。如果我们忽略它们,可能会导致不良结果。

省略*元组参数:

function2("arg1",
      tupleCustomArgs,   #omitting *
      param3="arg3",
      param2="arg2", 
      **dictCustomNamedArgs
      )

版画

param1: arg1
param2: arg2
param3: arg3
custom tuple params ----------
('custom param1', 'custom param2', 'custom param3'),
custom named params ----------
customNamedParam2:val2
customNamedParam1:val1

元组上方('custom param1', 'custom param2', 'custom param3')按原样打印。

省略dict参数:

function2("arg1",
      *tupleCustomArgs,   
      param3="arg3",
      param2="arg2", 
      dictCustomNamedArgs   #omitting **
      )

dictCustomNamedArgs
         ^
SyntaxError: non-keyword arg after keyword arg
  • kwargs in **kwargs is just variable name. You can very well have **anyVariableName
  • kwargs stands for “keyword arguments”. But I feel they should better be called as “named arguments”, as these are simply arguments passed along with names (I dont find any significance to the word “keyword” in the term “keyword arguments”. I guess “keyword” usually means words reserved by programming language and hence not to be used by the programmer for variable names. No such thing is happening here in case of kwargs.). So we give names param1 and param2 to two parameter values passed to the function as follows: func(param1="val1",param2="val2"), instead of passing only values: func(val1,val2). Thus, I feel they should be appropriately called “arbitrary number of named arguments” as we can specify any number of these parameters (that is, arguments) if func has signature func(**kwargs)

So being said that let me explain “named arguments” first and then “arbitrary number of named arguments” kwargs.

Named arguments

  • named args should follow positional args
  • order of named args is not important
  • Example

    def function1(param1,param2="arg2",param3="arg3"):
        print("\n"+str(param1)+" "+str(param2)+" "+str(param3)+"\n")
    
    function1(1)                      #1 arg2 arg3   #1 positional arg
    function1(param1=1)               #1 arg2 arg3   #1 named arg
    function1(1,param2=2)             #1 2 arg3      #1 positional arg, 1 named arg
    function1(param1=1,param2=2)      #1 2 arg3      #2 named args       
    function1(param2=2, param1=1)     #1 2 arg3      #2 named args out of order
    function1(1, param3=3, param2=2)  #1 2 3         #
    
    #function1()                      #invalid: required argument missing
    #function1(param2=2,1)            #invalid: SyntaxError: non-keyword arg after keyword arg
    #function1(1,param1=11)           #invalid: TypeError: function1() got multiple values for argument 'param1'
    #function1(param4=4)              #invalid: TypeError: function1() got an unexpected keyword argument 'param4'
    

Arbitrary number of named arguments kwargs

  • Sequence of function parameters:
    1. positional parameters
    2. formal parameter capturing arbitrary number of arguments (prefixed with *)
    3. named formal parameters
    4. formal parameter capturing arbitrary number of named parameters (prefixed with **)
  • Example

    def function2(param1, *tupleParams, param2, param3, **dictionaryParams):
        print("param1: "+ param1)
        print("param2: "+ param2)
        print("param3: "+ param3)
        print("custom tuple params","-"*10)
        for p in tupleParams:
            print(str(p) + ",")
        print("custom named params","-"*10)
        for k,v in dictionaryParams.items():
            print(str(k)+":"+str(v))
    
    function2("arg1",
              "custom param1",
              "custom param2",
              "custom param3",
              param3="arg3",
              param2="arg2", 
              customNamedParam1 = "val1",
              customNamedParam2 = "val2"
              )
    
    # Output
    #
    #param1: arg1
    #param2: arg2
    #param3: arg3
    #custom tuple params ----------
    #custom param1,
    #custom param2,
    #custom param3,
    #custom named params ----------
    #customNamedParam2:val2
    #customNamedParam1:val1
    

Passing tuple and dict variables for custom args

To finish it up, let me also note that we can pass

  • “formal parameter capturing arbitrary number of arguments” as tuple variable and
  • “formal parameter capturing arbitrary number of named parameters” as dict variable

Thus the same above call can be made as follows:

tupleCustomArgs = ("custom param1", "custom param2", "custom param3")
dictCustomNamedArgs = {"customNamedParam1":"val1", "customNamedParam2":"val2"}

function2("arg1",
      *tupleCustomArgs,    #note *
      param3="arg3",
      param2="arg2", 
      **dictCustomNamedArgs     #note **
      )

Finally note * and ** in function calls above. If we omit them, we may get ill results.

Omitting * in tuple args:

function2("arg1",
      tupleCustomArgs,   #omitting *
      param3="arg3",
      param2="arg2", 
      **dictCustomNamedArgs
      )

prints

param1: arg1
param2: arg2
param3: arg3
custom tuple params ----------
('custom param1', 'custom param2', 'custom param3'),
custom named params ----------
customNamedParam2:val2
customNamedParam1:val1

Above tuple ('custom param1', 'custom param2', 'custom param3') is printed as is.

Omitting dict args:

function2("arg1",
      *tupleCustomArgs,   
      param3="arg3",
      param2="arg2", 
      dictCustomNamedArgs   #omitting **
      )

gives

dictCustomNamedArgs
         ^
SyntaxError: non-keyword arg after keyword arg

回答 6

此外,在调用kwargs函数时,还可以混合使用不同的用法:

def test(**kwargs):
    print kwargs['a']
    print kwargs['b']
    print kwargs['c']


args = { 'b': 2, 'c': 3}

test( a=1, **args )

给出以下输出:

1
2
3

注意** kwargs必须是最后一个参数

As an addition, you can also mix different ways of usage when calling kwargs functions:

def test(**kwargs):
    print kwargs['a']
    print kwargs['b']
    print kwargs['c']


args = { 'b': 2, 'c': 3}

test( a=1, **args )

gives this output:

1
2
3

Note that **kwargs has to be the last argument


回答 7

kwargs是一种语法糖,用于将名称参数作为字典(对于func)传递,或将字典作为命名参数(对func)传递

kwargs are a syntactic sugar to pass name arguments as dictionaries(for func), or dictionaries as named arguments(to func)


回答 8

这是一个用于解释用法的简单函数:

def print_wrap(arg1, *args, **kwargs):
    print(arg1)
    print(args)
    print(kwargs)
    print(arg1, *args, **kwargs)

函数定义中指定的所有参数都将放入args列表或kwargs列表中,具体取决于它们是否为关键字参数:

>>> print_wrap('one', 'two', 'three', end='blah', sep='--')
one
('two', 'three')
{'end': 'blah', 'sep': '--'}
one--two--threeblah

如果添加永远不会传递给函数的关键字参数,则会引发错误:

>>> print_wrap('blah', dead_arg='anything')
TypeError: 'dead_arg' is an invalid keyword argument for this function

Here’s a simple function that serves to explain the usage:

def print_wrap(arg1, *args, **kwargs):
    print(arg1)
    print(args)
    print(kwargs)
    print(arg1, *args, **kwargs)

Any arguments that are not specified in the function definition will be put in the args list, or the kwargs list, depending on whether they are keyword arguments or not:

>>> print_wrap('one', 'two', 'three', end='blah', sep='--')
one
('two', 'three')
{'end': 'blah', 'sep': '--'}
one--two--threeblah

If you add a keyword argument that never gets passed to a function, an error will be raised:

>>> print_wrap('blah', dead_arg='anything')
TypeError: 'dead_arg' is an invalid keyword argument for this function

回答 9

这是一个示例,希望对您有所帮助:

#! /usr/bin/env python
#
def g( **kwargs) :
  print ( "In g ready to print kwargs" )
  print kwargs
  print ( "in g, calling f")
  f ( **kwargs )
  print ( "In g, after returning from f")

def f( **kwargs ) :
  print ( "in f, printing kwargs")
  print ( kwargs )
  print ( "In f, after printing kwargs")


g( a="red", b=5, c="Nassau")

g( q="purple", w="W", c="Charlie", d=[4, 3, 6] )

运行该程序时,您将获得:

$ python kwargs_demo.py 
In g ready to print kwargs
{'a': 'red', 'c': 'Nassau', 'b': 5}
in g, calling f
in f, printing kwargs
{'a': 'red', 'c': 'Nassau', 'b': 5}
In f, after printing kwargs
In g, after returning from f
In g ready to print kwargs
{'q': 'purple', 'c': 'Charlie', 'd': [4, 3, 6], 'w': 'W'}
in g, calling f
in f, printing kwargs
{'q': 'purple', 'c': 'Charlie', 'd': [4, 3, 6], 'w': 'W'}
In f, after printing kwargs
In g, after returning from f

这里的关键是,调用中可变数量的命名实参将转换为函数中的字典。

Here is an example that I hope is helpful:

#! /usr/bin/env python
#
def g( **kwargs) :
  print ( "In g ready to print kwargs" )
  print kwargs
  print ( "in g, calling f")
  f ( **kwargs )
  print ( "In g, after returning from f")

def f( **kwargs ) :
  print ( "in f, printing kwargs")
  print ( kwargs )
  print ( "In f, after printing kwargs")


g( a="red", b=5, c="Nassau")

g( q="purple", w="W", c="Charlie", d=[4, 3, 6] )

When you run the program, you get:

$ python kwargs_demo.py 
In g ready to print kwargs
{'a': 'red', 'c': 'Nassau', 'b': 5}
in g, calling f
in f, printing kwargs
{'a': 'red', 'c': 'Nassau', 'b': 5}
In f, after printing kwargs
In g, after returning from f
In g ready to print kwargs
{'q': 'purple', 'c': 'Charlie', 'd': [4, 3, 6], 'w': 'W'}
in g, calling f
in f, printing kwargs
{'q': 'purple', 'c': 'Charlie', 'd': [4, 3, 6], 'w': 'W'}
In f, after printing kwargs
In g, after returning from f

The key take away here is that the variable number of named arguments in the call translate into a dictionary in the function.


回答 10

这是了解python拆包的简单示例,

>>> def f(*args, **kwargs):
...    print 'args', args, 'kwargs', kwargs

eg1:

>>>f(1, 2)
>>> args (1,2) kwargs {} #args return parameter without reference as a tuple
>>>f(a = 1, b = 2)
>>> args () kwargs {'a': 1, 'b': 2} #args is empty tuple and kwargs return parameter with reference as a dictionary

This is the simple example to understand about python unpacking,

>>> def f(*args, **kwargs):
...    print 'args', args, 'kwargs', kwargs

eg1:

>>>f(1, 2)
>>> args (1,2) kwargs {} #args return parameter without reference as a tuple
>>>f(a = 1, b = 2)
>>> args () kwargs {'a': 1, 'b': 2} #args is empty tuple and kwargs return parameter with reference as a dictionary

回答 11

在Java中,可以使用构造函数重载类并允许多个输入参数。在python中,您可以使用kwargs提供类似的行为。

Java示例:https//beginnersbook.com/2013/05/constructor-overloading/

python示例:

class Robot():
    # name is an arg and color is a kwarg
    def __init__(self,name, color='red'):
        self.name = name
        self.color = color

red_robot = Robot('Bob')
blue_robot = Robot('Bob', color='blue')

print("I am a {color} robot named {name}.".format(color=red_robot.color, name=red_robot.name))
print("I am a {color} robot named {name}.".format(color=blue_robot.color, name=blue_robot.name))

>>> I am a red robot named Bob.
>>> I am a blue robot named Bob.

只是另一种思考方式。

In Java, you use constructors to overload classes and allow for multiple input parameters. In python, you can use kwargs to provide similar behavior.

java example: https://beginnersbook.com/2013/05/constructor-overloading/

python example:

class Robot():
    # name is an arg and color is a kwarg
    def __init__(self,name, color='red'):
        self.name = name
        self.color = color

red_robot = Robot('Bob')
blue_robot = Robot('Bob', color='blue')

print("I am a {color} robot named {name}.".format(color=red_robot.color, name=red_robot.name))
print("I am a {color} robot named {name}.".format(color=blue_robot.color, name=blue_robot.name))

>>> I am a red robot named Bob.
>>> I am a blue robot named Bob.

just another way to think about it.


回答 12

关键字参数通常在Python中简化为kwargs。在计算机编程中

关键字参数指的是一种计算机语言对函数调用的支持,该函数清楚地说明了函数调用中每个参数的名称。

参数名称** kwargs之前的两个星号的用法是,当一个人不知道将多少个关键字参数传递给该函数时。在这种情况下,它称为任意/通配符关键字参数。

Django的接收器函数就是一个例子。

def my_callback(sender, **kwargs):
    print("Request finished!")

请注意,该函数带有一个sender参数以及通配符关键字参数(** kwargs);所有信号处理程序都必须采用这些参数。所有信号都发送关键字参数,并且可以随时更改这些关键字参数。在request_finished的情况下,它被记录为不发送任何参数,这意味着我们可能很想将信号处理编写为my_callback(sender)。

这将是错误的-实际上,如果您这样做,Django将抛出错误。这是因为在任何时候都可以将参数添加到信号中,并且您的接收器必须能够处理这些新参数。

请注意,它不必称为kwargs,但必须具有**(名称kwargs是一个约定)。

Keyword Arguments are often shortened to kwargs in Python. In computer programming,

keyword arguments refer to a computer language’s support for function calls that clearly state the name of each parameter within the function call.

The usage of the two asterisk before the parameter name, **kwargs, is when one doesn’t know how many keyword arguments will be passed into the function. When that’s the case, it’s called Arbitrary / Wildcard Keyword Arguments.

One example of this is Django’s receiver functions.

def my_callback(sender, **kwargs):
    print("Request finished!")

Notice that the function takes a sender argument, along with wildcard keyword arguments (**kwargs); all signal handlers must take these arguments. All signals send keyword arguments, and may change those keyword arguments at any time. In the case of request_finished, it’s documented as sending no arguments, which means we might be tempted to write our signal handling as my_callback(sender).

This would be wrong – in fact, Django will throw an error if you do so. That’s because at any point arguments could get added to the signal and your receiver must be able to handle those new arguments.

Note that it doesn’t have to be called kwargs, but it needs to have ** (the name kwargs is a convention).


__slots__的用法?

问题:__slots__的用法?

__slots__Python 的目的是什么-尤其是关于何时要使用它,何时不使用它?

What is the purpose of __slots__ in Python — especially with respect to when I would want to use it, and when not?


回答 0

在Python中,目的是__slots__什么?在什么情况下应该避免这种情况?

TLDR:

特殊属性__slots__允许您显式说明您希望对象实例具有哪些实例属性,并具有预期的结果:

  1. 更快的属性访问。
  2. 节省内存空间

节省的空间来自

  1. 将值引用存储在插槽中而不是中__dict__
  2. 如果父类拒绝它们并且您声明,则拒绝__dict____weakref__创建__slots__

快速警告

请注意,您只应在继承树中一次声明一个特定的插槽。例如:

class Base:
    __slots__ = 'foo', 'bar'

class Right(Base):
    __slots__ = 'baz', 

class Wrong(Base):
    __slots__ = 'foo', 'bar', 'baz'        # redundant foo and bar

遇到错误时,Python不会反对(它应该会),否则问题可能不会显现出来,但是您的对象将比原先占用更多的空间。Python 3.8:

>>> from sys import getsizeof
>>> getsizeof(Right()), getsizeof(Wrong())
(56, 72)

这是因为基准站的插槽描述符的插槽与错误的插槽分开。通常不应该这样,但是可以:

>>> w = Wrong()
>>> w.foo = 'foo'
>>> Base.foo.__get__(w)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: foo
>>> Wrong.foo.__get__(w)
'foo'

最大的警告是多重继承-无法将多个“具有非空插槽的父类”组合在一起。

为适应此限制,请遵循最佳做法:排除其父母(或他们的具体类)将共同继承的除一个或所有父代之外的所有抽象-给这些抽象留出空位(就像在父类中的抽象基类一样)标准库)。

有关示例,请参见下面有关多重继承的部分。

要求:

  • 要使名为in的属性__slots__实际上存储在插槽中而不是存储在插槽中__dict__,则类必须从继承object

  • 为防止创建__dict__,您必须继承,object并且继承中的所有类都必须声明,__slots__并且它们都不能具有'__dict__'条目。

如果您想继续阅读,有很多细节。

为什么使用__slots__:更快的属性访问。

Python的创建者Guido van Rossum 指出,他实际上是__slots__为了更快地访问属性而创建的。

证明可观的显着更快访问是微不足道的:

import timeit

class Foo(object): __slots__ = 'foo',

class Bar(object): pass

slotted = Foo()
not_slotted = Bar()

def get_set_delete_fn(obj):
    def get_set_delete():
        obj.foo = 'foo'
        obj.foo
        del obj.foo
    return get_set_delete

>>> min(timeit.repeat(get_set_delete_fn(slotted)))
0.2846834529991611
>>> min(timeit.repeat(get_set_delete_fn(not_slotted)))
0.3664822799983085

在Ubuntu 3.5上的Python 3.5中,插槽式访问的速度几乎快了30%。

>>> 0.3664822799983085 / 0.2846834529991611
1.2873325658284342

在Windows上的Python 2中,我测得的速度要快15%。

为何使用__slots__:内存节省

的另一个目的__slots__是减少每个对象实例占用的内存空间。

我自己对文档的贡献清楚地说明了背后的原因

通过使用节省的空间__dict__可能很大。

SQLAlchemy将大量内存节省归因__slots__

为了验证这一点,请在Ubuntu Linux上使用Python 2.7的Anaconda发行版(带有guppy.hpy(又是堆的)和)sys.getsizeof,不__slots__声明且没有其他声明的类实例的大小为64字节。但这包括__dict__。再次感谢Python的惰性求值,在__dict__引用它之前,显然不会调用,但是没有数据的类通常是无用的。当存在时,该__dict__属性另外至少为280个字节。

相反,__slots__声明为()(无数据)的类实例只有16个字节,插槽中有一项的总字节数为56个,插槽中有两项的总数为64个字节。

对于64位Python,我说明了dict在3.6中增长的每个点(0、1和2属性除外)的for __slots____dict__(未定义插槽)在Python 2.7和3.6中以字节为单位的内存消耗:

       Python 2.7             Python 3.6
attrs  __slots__  __dict__*   __slots__  __dict__* | *(no slots defined)
none   16         56 + 272   16         56 + 112 | if __dict__ referenced
one    48         56 + 272    48         56 + 112
two    56         56 + 272    56         56 + 112
six    88         56 + 1040   88         56 + 152
11     128        56 + 1040   128        56 + 240
22     216        56 + 3344   216        56 + 408     
43     384        56 + 3344   384        56 + 752

因此,尽管Python 3中的指令较小,但我们仍然可以看到__slots__实例可以很好地扩展以节省内存,这是您要使用的主要原因__slots__

只是为了完整起见,请注意,在类的命名空间中,每个插槽的一次性成本为Python 2中64字节,而在Python 3中为72字节,因为插槽使用数据描述符(如属性)称为“成员”。

>>> Foo.foo
<member 'foo' of 'Foo' objects>
>>> type(Foo.foo)
<class 'member_descriptor'>
>>> getsizeof(Foo.foo)
72

演示__slots__

要拒绝创建__dict__,必须子类化object

class Base(object): 
    __slots__ = ()

现在:

>>> b = Base()
>>> b.a = 'a'
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    b.a = 'a'
AttributeError: 'Base' object has no attribute 'a'

或子类化另一个定义的类 __slots__

class Child(Base):
    __slots__ = ('a',)

现在:

c = Child()
c.a = 'a'

但:

>>> c.b = 'b'
Traceback (most recent call last):
  File "<pyshell#42>", line 1, in <module>
    c.b = 'b'
AttributeError: 'Child' object has no attribute 'b'

要在对有槽位的__dict__对象进行子类化时允许创建,只需添加'__dict__'__slots__(请注意,槽位是有序的,并且您不应重复父类中已经存在的槽位):

class SlottedWithDict(Child): 
    __slots__ = ('__dict__', 'b')

swd = SlottedWithDict()
swd.a = 'a'
swd.b = 'b'
swd.c = 'c'

>>> swd.__dict__
{'c': 'c'}

或者甚至不需要__slots__在子类中声明,并且仍将使用父级的插槽,但不限制创建__dict__

class NoSlots(Child): pass
ns = NoSlots()
ns.a = 'a'
ns.b = 'b'

和:

>>> ns.__dict__
{'b': 'b'}

但是,__slots__可能会导致多重继承问题:

class BaseA(object): 
    __slots__ = ('a',)

class BaseB(object): 
    __slots__ = ('b',)

由于从具有两个非空插槽的父母创建子类失败:

>>> class Child(BaseA, BaseB): __slots__ = ()
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    class Child(BaseA, BaseB): __slots__ = ()
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict

如果遇到此问题,则可以将其__slots__从父级中移除,或者如果您可以控制父级,则给他们留空的空位,或重构为抽象:

from abc import ABC

class AbstractA(ABC):
    __slots__ = ()

class BaseA(AbstractA): 
    __slots__ = ('a',)

class AbstractB(ABC):
    __slots__ = ()

class BaseB(AbstractB): 
    __slots__ = ('b',)

class Child(AbstractA, AbstractB): 
    __slots__ = ('a', 'b')

c = Child() # no problem!

添加'__dict__'__slots__以获得动态分配:

class Foo(object):
    __slots__ = 'bar', 'baz', '__dict__'

现在:

>>> foo = Foo()
>>> foo.boink = 'boink'

因此,'__dict__'在具有插槽的情况下,我们将失去一些尺寸上的好处,因为它具有动态分配的优势,并且仍具有我们确实期望的名称的插槽。

当您从未插入槽的对象继承时,使用时会得到相同的语义__slots____slots__指向插入槽的值的名称,而其他所有值都放在实例的中__dict__

避免这样做__slots__是因为您不希望出现这种情况,因为它实际上并不是一个很好的理由- 如果需要,只需添加"__dict__"您的属性即可__slots__

如果需要该功能,可以类似地将__weakref____slots__显式添加。

子类化namedtuple时,设置为空tuple:

内置namedtuple使不可变的实例非常轻巧(本质上是元组的大小),但是要获得好处,如果您将它们子类化,则需要自己做:

from collections import namedtuple
class MyNT(namedtuple('MyNT', 'bar baz')):
    """MyNT is an immutable and lightweight object"""
    __slots__ = ()

用法:

>>> nt = MyNT('bar', 'baz')
>>> nt.bar
'bar'
>>> nt.baz
'baz'

尝试分配意外属性会引发,AttributeError因为我们已阻止创建__dict__

>>> nt.quux = 'quux'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyNT' object has no attribute 'quux'

可以__dict__通过设置off 允许创建__slots__ = (),但是不能__slots__对元组的子类型使用非空。

最大的警告:多重继承

即使多个父级的非空插槽相同,也不能一起使用:

class Foo(object): 
    __slots__ = 'foo', 'bar'
class Bar(object):
    __slots__ = 'foo', 'bar' # alas, would work if empty, i.e. ()

>>> class Baz(Foo, Bar): pass
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict

使用空__slots__父似乎提供了最大的灵活性,允许孩子选择阻止或允许(通过增加'__dict__'获得动态分配,见上面部分)创建的__dict__

class Foo(object): __slots__ = ()
class Bar(object): __slots__ = ()
class Baz(Foo, Bar): __slots__ = ('foo', 'bar')
b = Baz()
b.foo, b.bar = 'foo', 'bar'

你不具备有槽-因此,如果您添加它们,后来删除它们,它不应引起任何问题。

走出放在这里肢体:如果您撰写的混入或使用抽象基类,它不打算被实例化,空__slots__在那些父母似乎是在灵活性作为子类方面最好的一段路要走。

为了演示,首先,让我们创建一个我们希望在多重继承下使用的代码的类。

class AbstractBase:
    __slots__ = ()
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __repr__(self):
        return f'{type(self).__name__}({repr(self.a)}, {repr(self.b)})'

我们可以通过继承并声明预期的位置来直接使用以上内容:

class Foo(AbstractBase):
    __slots__ = 'a', 'b'

但是我们对此并不在意,这是微不足道的单一继承,我们需要另一个我们也可能继承的类,也许带有嘈杂的属性:

class AbstractBaseC:
    __slots__ = ()
    @property
    def c(self):
        print('getting c!')
        return self._c
    @c.setter
    def c(self, arg):
        print('setting c!')
        self._c = arg

现在,如果两个基地都有非空插槽,我们将无法进行以下操作。(实际上,如果我们愿意,我们可以给AbstractBase非空槽a和b,并将它们排除在下面的声明之外-将它们留在里面是错误的):

class Concretion(AbstractBase, AbstractBaseC):
    __slots__ = 'a b _c'.split()

现在,我们具有通过多重继承的功能,并且仍然可以拒绝__dict____weakref__实例化:

>>> c = Concretion('a', 'b')
>>> c.c = c
setting c!
>>> c.c
getting c!
Concretion('a', 'b')
>>> c.d = 'd'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Concretion' object has no attribute 'd'

其他避免插槽的情况:

  • __class__除非插槽布局相同,否则要在不具有它们的另一个类(并且不能添加它们)上执行分配时,请避免使用它们。(我对了解谁在做什么以及为什么这样做很感兴趣。)
  • 如果您想将诸如long,tuple或str之类的可变长度内建子类化,并想为其添加属性,请避免使用它们。
  • 如果您坚持通过实例变量的类属性提供默认值,请避免使用它们。

您也许可以从__slots__ 文档的其余部分(最新的3.7 dev文档)中找出更多的警告,我最近做出了很大的贡献。

对其他答案的批评

当前的最佳答案引用了过时的信息,而且非常容易波动,并且在某些重要方面未达到要求。

不要“仅__slots__在实例化许多对象时使用”

我引用:

__slots__如果要实例化大量(数百个,数千个)同一类的对象,则需要使用。”

例如,来自collections模块的抽象基类未实例化,但__slots__已为其声明。

为什么?

如果用户希望拒绝__dict____weakref__创建,则这些内容在父类中必须不可用。

__slots__ 创建接口或混入时有助于重用。

的确,许多Python用户并不是为可重用性而编写的,但是当您这样做时,可以选择拒绝不必要的空间使用是很有价值的。

__slots__ 不会破坏酸洗

腌制开槽的物体时,您可能会发现它带有误导性的抱怨TypeError

>>> pickle.loads(pickle.dumps(f))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled

这实际上是不正确的。此消息来自最早的协议,这是默认协议。您可以使用-1参数选择最新的协议。在Python 2.7中为2(在2.3中引入),在3.6中为4

>>> pickle.loads(pickle.dumps(f, -1))
<__main__.Foo object at 0x1129C770>

在Python 2.7中:

>>> pickle.loads(pickle.dumps(f, 2))
<__main__.Foo object at 0x1129C770>

在Python 3.6中

>>> pickle.loads(pickle.dumps(f, 4))
<__main__.Foo object at 0x1129C770>

所以我会牢记这一点,因为这是一个已解决的问题。

评论(至2016年10月2日)被接受

第一段是一半简短的解释,一半是预测的。这是真正回答问题的唯一部分

正确的用法__slots__是节省对象空间。静态结构不允许创建后添加任何内容,而不是具有允许随时向对象添加属性的动态命令。这样可以为使用插槽的每个对象节省一个指令的开销

后半部分是一厢情愿的想法,并且超出了预期:

尽管有时这是一个有用的优化,但是如果Python解释器足够动态,则仅在实际向对象添加内容时才需要dict,就完全没有必要了。

Python实际上做了类似的事情,只在__dict__访问时创建,但是创建许多没有数据的对象是相当荒谬的。

第二段过分简化,错过了避免的实际原因__slots__。以下不是避免使用插槽的真正原因(出于实际原因,请参阅上面我的回答的其余部分。):

它们以一种可被控制怪胎和静态类型临时表滥用的方式更改具有插槽的对象的行为。

然后,它继续讨论了使用Python实现该有害目标的其他方法,而不是讨论与之相关的任何方法__slots__

第三段是更多的如意算盘。答案者甚至根本没有写过这些杂乱的内容,而是为该网站的批评者弹药。

内存使用证据

创建一些普通对象和带槽对象:

>>> class Foo(object): pass
>>> class Bar(object): __slots__ = ()

实例化其中的一百万:

>>> foos = [Foo() for f in xrange(1000000)]
>>> bars = [Bar() for b in xrange(1000000)]

检查guppy.hpy().heap()

>>> guppy.hpy().heap()
Partition of a set of 2028259 objects. Total size = 99763360 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 1000000  49 64000000  64  64000000  64 __main__.Foo
     1     169   0 16281480  16  80281480  80 list
     2 1000000  49 16000000  16  96281480  97 __main__.Bar
     3   12284   1   987472   1  97268952  97 str
...

访问常规对象及其对象,__dict__然后再次检查:

>>> for f in foos:
...     f.__dict__
>>> guppy.hpy().heap()
Partition of a set of 3028258 objects. Total size = 379763480 bytes.
 Index  Count   %      Size    % Cumulative  % Kind (class / dict of class)
     0 1000000  33 280000000  74 280000000  74 dict of __main__.Foo
     1 1000000  33  64000000  17 344000000  91 __main__.Foo
     2     169   0  16281480   4 360281480  95 list
     3 1000000  33  16000000   4 376281480  99 __main__.Bar
     4   12284   0    987472   0 377268952  99 str
...

这与Python历史一致,来自Python 2.2中的Unifying类型和类。

如果您将内置类型作为子类,则多余的空间会自动添加到实例中以容纳__dict____weakrefs__。(__dict__尽管直到使用完,它才会被初始化,因此您不必担心空字典为您创建的每个实例所占用的空间。)如果不需要此多余的空间,则可以在短语中添加“ __slots__ = []”你的班。

In Python, what is the purpose of __slots__ and what are the cases one should avoid this?

TLDR:

The special attribute __slots__ allows you to explicitly state which instance attributes you expect your object instances to have, with the expected results:

  1. faster attribute access.
  2. space savings in memory.

The space savings is from

  1. Storing value references in slots instead of __dict__.
  2. Denying __dict__ and __weakref__ creation if parent classes deny them and you declare __slots__.

Quick Caveats

Small caveat, you should only declare a particular slot one time in an inheritance tree. For example:

class Base:
    __slots__ = 'foo', 'bar'

class Right(Base):
    __slots__ = 'baz', 

class Wrong(Base):
    __slots__ = 'foo', 'bar', 'baz'        # redundant foo and bar

Python doesn’t object when you get this wrong (it probably should), problems might not otherwise manifest, but your objects will take up more space than they otherwise should. Python 3.8:

>>> from sys import getsizeof
>>> getsizeof(Right()), getsizeof(Wrong())
(56, 72)

This is because the Base’s slot descriptor has a slot separate from the Wrong’s. This shouldn’t usually come up, but it could:

>>> w = Wrong()
>>> w.foo = 'foo'
>>> Base.foo.__get__(w)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: foo
>>> Wrong.foo.__get__(w)
'foo'

The biggest caveat is for multiple inheritance – multiple “parent classes with nonempty slots” cannot be combined.

To accommodate this restriction, follow best practices: Factor out all but one or all parents’ abstraction which their concrete class respectively and your new concrete class collectively will inherit from – giving the abstraction(s) empty slots (just like abstract base classes in the standard library).

See section on multiple inheritance below for an example.

Requirements:

  • To have attributes named in __slots__ to actually be stored in slots instead of a __dict__, a class must inherit from object.

  • To prevent the creation of a __dict__, you must inherit from object and all classes in the inheritance must declare __slots__ and none of them can have a '__dict__' entry.

There are a lot of details if you wish to keep reading.

Why use __slots__: Faster attribute access.

The creator of Python, Guido van Rossum, states that he actually created __slots__ for faster attribute access.

It is trivial to demonstrate measurably significant faster access:

import timeit

class Foo(object): __slots__ = 'foo',

class Bar(object): pass

slotted = Foo()
not_slotted = Bar()

def get_set_delete_fn(obj):
    def get_set_delete():
        obj.foo = 'foo'
        obj.foo
        del obj.foo
    return get_set_delete

and

>>> min(timeit.repeat(get_set_delete_fn(slotted)))
0.2846834529991611
>>> min(timeit.repeat(get_set_delete_fn(not_slotted)))
0.3664822799983085

The slotted access is almost 30% faster in Python 3.5 on Ubuntu.

>>> 0.3664822799983085 / 0.2846834529991611
1.2873325658284342

In Python 2 on Windows I have measured it about 15% faster.

Why use __slots__: Memory Savings

Another purpose of __slots__ is to reduce the space in memory that each object instance takes up.

My own contribution to the documentation clearly states the reasons behind this:

The space saved over using __dict__ can be significant.

SQLAlchemy attributes a lot of memory savings to __slots__.

To verify this, using the Anaconda distribution of Python 2.7 on Ubuntu Linux, with guppy.hpy (aka heapy) and sys.getsizeof, the size of a class instance without __slots__ declared, and nothing else, is 64 bytes. That does not include the __dict__. Thank you Python for lazy evaluation again, the __dict__ is apparently not called into existence until it is referenced, but classes without data are usually useless. When called into existence, the __dict__ attribute is a minimum of 280 bytes additionally.

In contrast, a class instance with __slots__ declared to be () (no data) is only 16 bytes, and 56 total bytes with one item in slots, 64 with two.

For 64 bit Python, I illustrate the memory consumption in bytes in Python 2.7 and 3.6, for __slots__ and __dict__ (no slots defined) for each point where the dict grows in 3.6 (except for 0, 1, and 2 attributes):

       Python 2.7             Python 3.6
attrs  __slots__  __dict__*   __slots__  __dict__* | *(no slots defined)
none   16         56 + 272†   16         56 + 112† | †if __dict__ referenced
one    48         56 + 272    48         56 + 112
two    56         56 + 272    56         56 + 112
six    88         56 + 1040   88         56 + 152
11     128        56 + 1040   128        56 + 240
22     216        56 + 3344   216        56 + 408     
43     384        56 + 3344   384        56 + 752

So, in spite of smaller dicts in Python 3, we see how nicely __slots__ scale for instances to save us memory, and that is a major reason you would want to use __slots__.

Just for completeness of my notes, note that there is a one-time cost per slot in the class’s namespace of 64 bytes in Python 2, and 72 bytes in Python 3, because slots use data descriptors like properties, called “members”.

>>> Foo.foo
<member 'foo' of 'Foo' objects>
>>> type(Foo.foo)
<class 'member_descriptor'>
>>> getsizeof(Foo.foo)
72

Demonstration of __slots__:

To deny the creation of a __dict__, you must subclass object:

class Base(object): 
    __slots__ = ()

now:

>>> b = Base()
>>> b.a = 'a'
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    b.a = 'a'
AttributeError: 'Base' object has no attribute 'a'

Or subclass another class that defines __slots__

class Child(Base):
    __slots__ = ('a',)

and now:

c = Child()
c.a = 'a'

but:

>>> c.b = 'b'
Traceback (most recent call last):
  File "<pyshell#42>", line 1, in <module>
    c.b = 'b'
AttributeError: 'Child' object has no attribute 'b'

To allow __dict__ creation while subclassing slotted objects, just add '__dict__' to the __slots__ (note that slots are ordered, and you shouldn’t repeat slots that are already in parent classes):

class SlottedWithDict(Child): 
    __slots__ = ('__dict__', 'b')

swd = SlottedWithDict()
swd.a = 'a'
swd.b = 'b'
swd.c = 'c'

and

>>> swd.__dict__
{'c': 'c'}

Or you don’t even need to declare __slots__ in your subclass, and you will still use slots from the parents, but not restrict the creation of a __dict__:

class NoSlots(Child): pass
ns = NoSlots()
ns.a = 'a'
ns.b = 'b'

And:

>>> ns.__dict__
{'b': 'b'}

However, __slots__ may cause problems for multiple inheritance:

class BaseA(object): 
    __slots__ = ('a',)

class BaseB(object): 
    __slots__ = ('b',)

Because creating a child class from parents with both non-empty slots fails:

>>> class Child(BaseA, BaseB): __slots__ = ()
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    class Child(BaseA, BaseB): __slots__ = ()
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict

If you run into this problem, You could just remove __slots__ from the parents, or if you have control of the parents, give them empty slots, or refactor to abstractions:

from abc import ABC

class AbstractA(ABC):
    __slots__ = ()

class BaseA(AbstractA): 
    __slots__ = ('a',)

class AbstractB(ABC):
    __slots__ = ()

class BaseB(AbstractB): 
    __slots__ = ('b',)

class Child(AbstractA, AbstractB): 
    __slots__ = ('a', 'b')

c = Child() # no problem!

Add '__dict__' to __slots__ to get dynamic assignment:

class Foo(object):
    __slots__ = 'bar', 'baz', '__dict__'

and now:

>>> foo = Foo()
>>> foo.boink = 'boink'

So with '__dict__' in slots we lose some of the size benefits with the upside of having dynamic assignment and still having slots for the names we do expect.

When you inherit from an object that isn’t slotted, you get the same sort of semantics when you use __slots__ – names that are in __slots__ point to slotted values, while any other values are put in the instance’s __dict__.

Avoiding __slots__ because you want to be able to add attributes on the fly is actually not a good reason – just add "__dict__" to your __slots__ if this is required.

You can similarly add __weakref__ to __slots__ explicitly if you need that feature.

Set to empty tuple when subclassing a namedtuple:

The namedtuple builtin make immutable instances that are very lightweight (essentially, the size of tuples) but to get the benefits, you need to do it yourself if you subclass them:

from collections import namedtuple
class MyNT(namedtuple('MyNT', 'bar baz')):
    """MyNT is an immutable and lightweight object"""
    __slots__ = ()

usage:

>>> nt = MyNT('bar', 'baz')
>>> nt.bar
'bar'
>>> nt.baz
'baz'

And trying to assign an unexpected attribute raises an AttributeError because we have prevented the creation of __dict__:

>>> nt.quux = 'quux'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyNT' object has no attribute 'quux'

You can allow __dict__ creation by leaving off __slots__ = (), but you can’t use non-empty __slots__ with subtypes of tuple.

Biggest Caveat: Multiple inheritance

Even when non-empty slots are the same for multiple parents, they cannot be used together:

class Foo(object): 
    __slots__ = 'foo', 'bar'
class Bar(object):
    __slots__ = 'foo', 'bar' # alas, would work if empty, i.e. ()

>>> class Baz(Foo, Bar): pass
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict

Using an empty __slots__ in the parent seems to provide the most flexibility, allowing the child to choose to prevent or allow (by adding '__dict__' to get dynamic assignment, see section above) the creation of a __dict__:

class Foo(object): __slots__ = ()
class Bar(object): __slots__ = ()
class Baz(Foo, Bar): __slots__ = ('foo', 'bar')
b = Baz()
b.foo, b.bar = 'foo', 'bar'

You don’t have to have slots – so if you add them, and remove them later, it shouldn’t cause any problems.

Going out on a limb here: If you’re composing mixins or using abstract base classes, which aren’t intended to be instantiated, an empty __slots__ in those parents seems to be the best way to go in terms of flexibility for subclassers.

To demonstrate, first, let’s create a class with code we’d like to use under multiple inheritance

class AbstractBase:
    __slots__ = ()
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __repr__(self):
        return f'{type(self).__name__}({repr(self.a)}, {repr(self.b)})'

We could use the above directly by inheriting and declaring the expected slots:

class Foo(AbstractBase):
    __slots__ = 'a', 'b'

But we don’t care about that, that’s trivial single inheritance, we need another class we might also inherit from, maybe with a noisy attribute:

class AbstractBaseC:
    __slots__ = ()
    @property
    def c(self):
        print('getting c!')
        return self._c
    @c.setter
    def c(self, arg):
        print('setting c!')
        self._c = arg

Now if both bases had nonempty slots, we couldn’t do the below. (In fact, if we wanted, we could have given AbstractBase nonempty slots a and b, and left them out of the below declaration – leaving them in would be wrong):

class Concretion(AbstractBase, AbstractBaseC):
    __slots__ = 'a b _c'.split()

And now we have functionality from both via multiple inheritance, and can still deny __dict__ and __weakref__ instantiation:

>>> c = Concretion('a', 'b')
>>> c.c = c
setting c!
>>> c.c
getting c!
Concretion('a', 'b')
>>> c.d = 'd'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Concretion' object has no attribute 'd'

Other cases to avoid slots:

  • Avoid them when you want to perform __class__ assignment with another class that doesn’t have them (and you can’t add them) unless the slot layouts are identical. (I am very interested in learning who is doing this and why.)
  • Avoid them if you want to subclass variable length builtins like long, tuple, or str, and you want to add attributes to them.
  • Avoid them if you insist on providing default values via class attributes for instance variables.

You may be able to tease out further caveats from the rest of the __slots__ documentation (the 3.7 dev docs are the most current), which I have made significant recent contributions to.

Critiques of other answers

The current top answers cite outdated information and are quite hand-wavy and miss the mark in some important ways.

Do not “only use __slots__ when instantiating lots of objects”

I quote:

“You would want to use __slots__ if you are going to instantiate a lot (hundreds, thousands) of objects of the same class.”

Abstract Base Classes, for example, from the collections module, are not instantiated, yet __slots__ are declared for them.

Why?

If a user wishes to deny __dict__ or __weakref__ creation, those things must not be available in the parent classes.

__slots__ contributes to reusability when creating interfaces or mixins.

It is true that many Python users aren’t writing for reusability, but when you are, having the option to deny unnecessary space usage is valuable.

__slots__ doesn’t break pickling

When pickling a slotted object, you may find it complains with a misleading TypeError:

>>> pickle.loads(pickle.dumps(f))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled

This is actually incorrect. This message comes from the oldest protocol, which is the default. You can select the latest protocol with the -1 argument. In Python 2.7 this would be 2 (which was introduced in 2.3), and in 3.6 it is 4.

>>> pickle.loads(pickle.dumps(f, -1))
<__main__.Foo object at 0x1129C770>

in Python 2.7:

>>> pickle.loads(pickle.dumps(f, 2))
<__main__.Foo object at 0x1129C770>

in Python 3.6

>>> pickle.loads(pickle.dumps(f, 4))
<__main__.Foo object at 0x1129C770>

So I would keep this in mind, as it is a solved problem.

Critique of the (until Oct 2, 2016) accepted answer

The first paragraph is half short explanation, half predictive. Here’s the only part that actually answers the question

The proper use of __slots__ is to save space in objects. Instead of having a dynamic dict that allows adding attributes to objects at anytime, there is a static structure which does not allow additions after creation. This saves the overhead of one dict for every object that uses slots

The second half is wishful thinking, and off the mark:

While this is sometimes a useful optimization, it would be completely unnecessary if the Python interpreter was dynamic enough so that it would only require the dict when there actually were additions to the object.

Python actually does something similar to this, only creating the __dict__ when it is accessed, but creating lots of objects with no data is fairly ridiculous.

The second paragraph oversimplifies and misses actual reasons to avoid __slots__. The below is not a real reason to avoid slots (for actual reasons, see the rest of my answer above.):

They change the behavior of the objects that have slots in a way that can be abused by control freaks and static typing weenies.

It then goes on to discuss other ways of accomplishing that perverse goal with Python, not discussing anything to do with __slots__.

The third paragraph is more wishful thinking. Together it is mostly off-the-mark content that the answerer didn’t even author and contributes to ammunition for critics of the site.

Memory usage evidence

Create some normal objects and slotted objects:

>>> class Foo(object): pass
>>> class Bar(object): __slots__ = ()

Instantiate a million of them:

>>> foos = [Foo() for f in xrange(1000000)]
>>> bars = [Bar() for b in xrange(1000000)]

Inspect with guppy.hpy().heap():

>>> guppy.hpy().heap()
Partition of a set of 2028259 objects. Total size = 99763360 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 1000000  49 64000000  64  64000000  64 __main__.Foo
     1     169   0 16281480  16  80281480  80 list
     2 1000000  49 16000000  16  96281480  97 __main__.Bar
     3   12284   1   987472   1  97268952  97 str
...

Access the regular objects and their __dict__ and inspect again:

>>> for f in foos:
...     f.__dict__
>>> guppy.hpy().heap()
Partition of a set of 3028258 objects. Total size = 379763480 bytes.
 Index  Count   %      Size    % Cumulative  % Kind (class / dict of class)
     0 1000000  33 280000000  74 280000000  74 dict of __main__.Foo
     1 1000000  33  64000000  17 344000000  91 __main__.Foo
     2     169   0  16281480   4 360281480  95 list
     3 1000000  33  16000000   4 376281480  99 __main__.Bar
     4   12284   0    987472   0 377268952  99 str
...

This is consistent with the history of Python, from Unifying types and classes in Python 2.2

If you subclass a built-in type, extra space is automatically added to the instances to accomodate __dict__ and __weakrefs__. (The __dict__ is not initialized until you use it though, so you shouldn’t worry about the space occupied by an empty dictionary for each instance you create.) If you don’t need this extra space, you can add the phrase “__slots__ = []” to your class.


回答 1

引用雅各布·哈伦Jacob Hallen)的话

正确的用法__slots__是节省对象空间。静态结构不允许创建后添加任何内容,而不是具有允许随时向对象添加属性的动态命令。[这种使用__slots__消除了每个对象一个字典的开销。]尽管这有时是有用的优化,但是如果Python解释器足够动态,以至仅在实际添加了dict时才需要该字典,则完全没有必要。宾语。

不幸的是,插槽有副作用。它们以一种可被控制怪胎和静态类型临时表滥用的方式更改具有插槽的对象的行为。这是不好的,因为控制怪胎应该滥用元类,而静态类型之间应该滥用装饰器,因为在Python中,应该只有一种明显的方法。

使CPython足够智能来处理节省的空间__slots__是一项艰巨的任务,这可能就是为什么它不在P3k更改列表中的原因(至今)。

Quoting Jacob Hallen:

The proper use of __slots__ is to save space in objects. Instead of having a dynamic dict that allows adding attributes to objects at anytime, there is a static structure which does not allow additions after creation. [This use of __slots__ eliminates the overhead of one dict for every object.] While this is sometimes a useful optimization, it would be completely unnecessary if the Python interpreter was dynamic enough so that it would only require the dict when there actually were additions to the object.

Unfortunately there is a side effect to slots. They change the behavior of the objects that have slots in a way that can be abused by control freaks and static typing weenies. This is bad, because the control freaks should be abusing the metaclasses and the static typing weenies should be abusing decorators, since in Python, there should be only one obvious way of doing something.

Making CPython smart enough to handle saving space without __slots__ is a major undertaking, which is probably why it is not on the list of changes for P3k (yet).


回答 2

你会想使用__slots__,如果你要实例化同一个类的对象很多(几百,几千)。__slots__仅作为内存优化工具存在。

不建议将其__slots__用于约束属性创建。

__slots__使用默认的(最旧的)泡菜协议将不能使用酸洗对象。有必要指定一个更高的版本。

python的其他一些自省功能也可能受到不利影响。

You would want to use __slots__ if you are going to instantiate a lot (hundreds, thousands) of objects of the same class. __slots__ only exists as a memory optimization tool.

It’s highly discouraged to use __slots__ for constraining attribute creation.

Pickling objects with __slots__ won’t work with the default (oldest) pickle protocol; it’s necessary to specify a later version.

Some other introspection features of python may also be adversely affected.


回答 3

每个python对象都有一个__dict__属性,该属性是包含所有其他属性的字典。例如,当您键入self.attrpython时实际上正在执行self.__dict__['attr']。您可以想象使用字典存储属性会花费一些额外的空间和时间来访问它。

但是,当您使用 __slots__,为该类创建的任何对象将没有__dict__属性。相反,所有属性访问都直接通过指针进行。

因此,如果要使用C样式结构而不是完整的类,则可以使用它__slots__来压缩对象的大小并减少属性访问时间。一个很好的例子是一个包含属性x&y的Point类。如果您有很多要点,可以尝试使用__slots__以节省一些内存。

Each python object has a __dict__ atttribute which is a dictionary containing all other attributes. e.g. when you type self.attr python is actually doing self.__dict__['attr']. As you can imagine using a dictionary to store attribute takes some extra space & time for accessing it.

However, when you use __slots__, any object created for that class won’t have a __dict__ attribute. Instead, all attribute access is done directly via pointers.

So if want a C style structure rather than a full fledged class you can use __slots__ for compacting size of the objects & reducing attribute access time. A good example is a Point class containing attributes x & y. If you are going to have a lot of points, you can try using __slots__ in order to conserve some memory.


回答 4

除了其他答案,这是使用的示例__slots__

>>> class Test(object):   #Must be new-style class!
...  __slots__ = ['x', 'y']
... 
>>> pt = Test()
>>> dir(pt)
['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__', 
 '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', 
 '__repr__', '__setattr__', '__slots__', '__str__', 'x', 'y']
>>> pt.x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: x
>>> pt.x = 1
>>> pt.x
1
>>> pt.z = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Test' object has no attribute 'z'
>>> pt.__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Test' object has no attribute '__dict__'
>>> pt.__slots__
['x', 'y']

因此,要实现__slots__,只需要多花一行(如果还没有,请将您的类变成新样式的类)。这样,您可以将这些类的内存占用量减少5倍,而在必要时以及必须编写自定义的pickle代码的代价是。

In addition to the other answers, here is an example of using __slots__:

>>> class Test(object):   #Must be new-style class!
...  __slots__ = ['x', 'y']
... 
>>> pt = Test()
>>> dir(pt)
['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__', 
 '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', 
 '__repr__', '__setattr__', '__slots__', '__str__', 'x', 'y']
>>> pt.x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: x
>>> pt.x = 1
>>> pt.x
1
>>> pt.z = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Test' object has no attribute 'z'
>>> pt.__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Test' object has no attribute '__dict__'
>>> pt.__slots__
['x', 'y']

So, to implement __slots__, it only takes an extra line (and making your class a new-style class if it isn’t already). This way you can reduce the memory footprint of those classes 5-fold, at the expense of having to write custom pickle code, if and when that becomes necessary.


回答 5

插槽对于库调用非常有用,以消除进行函数调用时的“命名方法分派”。SWIG 文档中提到了这一点。对于想要减少使用插槽的通常称为函数的函数开销的高性能库,速度要快得多。

现在,这可能与OP问题没有直接关系。它与构建扩展有关,而不是与在对象上使用slot语法有关。但这确实有助于完整了解插槽的使用情况以及它们背后的一些原因。

Slots are very useful for library calls to eliminate the “named method dispatch” when making function calls. This is mentioned in the SWIG documentation. For high performance libraries that want to reduce function overhead for commonly called functions using slots is much faster.

Now this may not be directly related to the OPs question. It is related more to building extensions than it does to using the slots syntax on an object. But it does help complete the picture for the usage of slots and some of the reasoning behind them.


回答 6

类实例的属性具有3个属性:实例,属性名称和属性值。

常规属性访问中,实例充当字典,而属性名称充当该字典中查找值的键。

instance(attribute)->值

__slots__访问,属性名称充当字典,实例充当字典中查找值的键。

属性(实例)->值

flyweight模式中,属性的名称充当字典,而值充当该字典中查找实例的键。

attribute(value)->实例

An attribute of a class instance has 3 properties: the instance, the name of the attribute, and the value of the attribute.

In regular attribute access, the instance acts as a dictionary and the name of the attribute acts as the key in that dictionary looking up value.

instance(attribute) –> value

In __slots__ access, the name of the attribute acts as the dictionary and the instance acts as the key in the dictionary looking up value.

attribute(instance) –> value

In flyweight pattern, the name of the attribute acts as the dictionary and the value acts as the key in that dictionary looking up the instance.

attribute(value) –> instance


回答 7

一个非常简单的__slot__属性示例。

问题:无 __slots__

如果__slot__我的Class中没有属性,则可以向对象添加新属性。

class Test:
    pass

obj1=Test()
obj2=Test()

print(obj1.__dict__)  #--> {}
obj1.x=12
print(obj1.__dict__)  # --> {'x': 12}
obj1.y=20
print(obj1.__dict__)  # --> {'x': 12, 'y': 20}

obj2.x=99
print(obj2.__dict__)  # --> {'x': 99}

如果您看上面的示例,可以看到obj1obj2都有自己的xy属性,而python还dict为每个对象创建了一个属性(obj1obj2)。

假设我的类Test是否有成千上万个此类对象?dict为每个对象创建一个附加属性将导致我的代码中大量开销(内存,计算能力等)。

解决方案: __slots__

现在,在下面的示例中,我的类Test包含__slots__属性。现在,我无法向对象添加新属性(attribute除外x),而python不再创建dict属性。这消除了每个对象的开销,如果您有很多对象,开销可能会变得很大。

class Test:
    __slots__=("x")

obj1=Test()
obj2=Test()
obj1.x=12
print(obj1.x)  # --> 12
obj2.x=99
print(obj2.x)  # --> 99

obj1.y=28
print(obj1.y)  # --> AttributeError: 'Test' object has no attribute 'y'

A very simple example of __slot__ attribute.

Problem: Without __slots__

If I don’t have __slot__ attribute in my class, I can add new attributes to my objects.

class Test:
    pass

obj1=Test()
obj2=Test()

print(obj1.__dict__)  #--> {}
obj1.x=12
print(obj1.__dict__)  # --> {'x': 12}
obj1.y=20
print(obj1.__dict__)  # --> {'x': 12, 'y': 20}

obj2.x=99
print(obj2.__dict__)  # --> {'x': 99}

If you look at example above, you can see that obj1 and obj2 have their own x and y attributes and python has also created a dict attribute for each object (obj1 and obj2).

Suppose if my class Test has thousands of such objects? Creating an additional attribute dict for each object will cause lot of overhead (memory, computing power etc.) in my code.

Solution: With __slots__

Now in the following example my class Test contains __slots__ attribute. Now I can’t add new attributes to my objects (except attribute x) and python doesn’t create a dict attribute anymore. This eliminates overhead for each object, which can become significant if you have many objects.

class Test:
    __slots__=("x")

obj1=Test()
obj2=Test()
obj1.x=12
print(obj1.x)  # --> 12
obj2.x=99
print(obj2.x)  # --> 99

obj1.y=28
print(obj1.y)  # --> AttributeError: 'Test' object has no attribute 'y'

回答 8

另一个晦涩的用法__slots__是从ProxyTypes包(以前是PEAK项目的一部分)中向对象代理添加属性。它ObjectWrapper允许您代理另一个对象,但拦截与代理对象的所有交互。它不是很常用(并且没有Python 3支持),但是我们已经使用它来实现基于龙卷风的异步实现周围的线程安全的阻塞包装,该龙卷风使用线程安全来反弹通过ioloop对代理对象的所有访问concurrent.Future对象以同步并返回结果。

默认情况下,对代理对象的任何属性访问都将为您提供代理对象的结果。如果需要在代理对象上添加属性,__slots__则可以使用。

from peak.util.proxies import ObjectWrapper

class Original(object):
    def __init__(self):
        self.name = 'The Original'

class ProxyOriginal(ObjectWrapper):

    __slots__ = ['proxy_name']

    def __init__(self, subject, proxy_name):
        # proxy_info attributed added directly to the
        # Original instance, not the ProxyOriginal instance
        self.proxy_info = 'You are proxied by {}'.format(proxy_name)

        # proxy_name added to ProxyOriginal instance, since it is
        # defined in __slots__
        self.proxy_name = proxy_name

        super(ProxyOriginal, self).__init__(subject)

if __name__ == "__main__":
    original = Original()
    proxy = ProxyOriginal(original, 'Proxy Overlord')

    # Both statements print "The Original"
    print "original.name: ", original.name
    print "proxy.name: ", proxy.name

    # Both statements below print 
    # "You are proxied by Proxy Overlord", since the ProxyOriginal
    # __init__ sets it to the original object 
    print "original.proxy_info: ", original.proxy_info
    print "proxy.proxy_info: ", proxy.proxy_info

    # prints "Proxy Overlord"
    print "proxy.proxy_name: ", proxy.proxy_name
    # Raises AttributeError since proxy_name is only set on 
    # the proxy object
    print "original.proxy_name: ", proxy.proxy_name

Another somewhat obscure use of __slots__ is to add attributes to an object proxy from the ProxyTypes package, formerly part of the PEAK project. Its ObjectWrapper allows you to proxy another object, but intercept all interactions with the proxied object. It is not very commonly used (and no Python 3 support), but we have used it to implement a thread-safe blocking wrapper around an async implementation based on tornado that bounces all access to the proxied object through the ioloop, using thread-safe concurrent.Future objects to synchronise and return results.

By default any attribute access to the proxy object will give you the result from the proxied object. If you need to add an attribute on the proxy object, __slots__ can be used.

from peak.util.proxies import ObjectWrapper

class Original(object):
    def __init__(self):
        self.name = 'The Original'

class ProxyOriginal(ObjectWrapper):

    __slots__ = ['proxy_name']

    def __init__(self, subject, proxy_name):
        # proxy_info attributed added directly to the
        # Original instance, not the ProxyOriginal instance
        self.proxy_info = 'You are proxied by {}'.format(proxy_name)

        # proxy_name added to ProxyOriginal instance, since it is
        # defined in __slots__
        self.proxy_name = proxy_name

        super(ProxyOriginal, self).__init__(subject)

if __name__ == "__main__":
    original = Original()
    proxy = ProxyOriginal(original, 'Proxy Overlord')

    # Both statements print "The Original"
    print "original.name: ", original.name
    print "proxy.name: ", proxy.name

    # Both statements below print 
    # "You are proxied by Proxy Overlord", since the ProxyOriginal
    # __init__ sets it to the original object 
    print "original.proxy_info: ", original.proxy_info
    print "proxy.proxy_info: ", proxy.proxy_info

    # prints "Proxy Overlord"
    print "proxy.proxy_name: ", proxy.proxy_name
    # Raises AttributeError since proxy_name is only set on 
    # the proxy object
    print "original.proxy_name: ", proxy.proxy_name

回答 9

您基本上没有用__slots__

在您认为自己可能需要的时间里__slots__,您实际上想使用轻量级轻量级设计模式。在这些情况下,您不再希望使用纯Python对象。相反,您希望在数组,结构或numpy数组周围使用类似Python对象的包装器。

class Flyweight(object):

    def get(self, theData, index):
        return theData[index]

    def set(self, theData, index, value):
        theData[index]= value

类似于类的包装器没有属性-它仅提供对基础数据起作用的方法。这些方法可以简化为类方法。实际上,可以将其简化为仅对底层数据数组进行操作的函数。

You have — essentially — no use for __slots__.

For the time when you think you might need __slots__, you actually want to use Lightweight or Flyweight design patterns. These are cases when you no longer want to use purely Python objects. Instead, you want a Python object-like wrapper around an array, struct, or numpy array.

class Flyweight(object):

    def get(self, theData, index):
        return theData[index]

    def set(self, theData, index, value):
        theData[index]= value

The class-like wrapper has no attributes — it just provides methods that act on the underlying data. The methods can be reduced to class methods. Indeed, it could be reduced to just functions operating on the underlying array of data.


回答 10

最初的问题是关于一般用例,而不仅仅是内存。因此,在这里应该提到的是,在实例化大量对象时,您也会获得更好的性能 -有趣的是,例如,将大型文档解析为对象或从数据库中解析时。

这是使用插槽和不使用插槽创建具有一百万个条目的对象树的比较。作为参考,在对树使用普通字典时的性能(在OSX上为Py2.7.10):

********** RUN 1 **********
1.96036410332 <class 'css_tree_select.element.Element'>
3.02922606468 <class 'css_tree_select.element.ElementNoSlots'>
2.90828204155 dict
********** RUN 2 **********
1.77050495148 <class 'css_tree_select.element.Element'>
3.10655999184 <class 'css_tree_select.element.ElementNoSlots'>
2.84120798111 dict
********** RUN 3 **********
1.84069895744 <class 'css_tree_select.element.Element'>
3.21540498734 <class 'css_tree_select.element.ElementNoSlots'>
2.59615707397 dict
********** RUN 4 **********
1.75041103363 <class 'css_tree_select.element.Element'>
3.17366290092 <class 'css_tree_select.element.ElementNoSlots'>
2.70941114426 dict

测试类(ident,来自插槽的appart):

class Element(object):
    __slots__ = ['_typ', 'id', 'parent', 'childs']
    def __init__(self, typ, id, parent=None):
        self._typ = typ
        self.id = id
        self.childs = []
        if parent:
            self.parent = parent
            parent.childs.append(self)

class ElementNoSlots(object): (same, w/o slots)

测试代码,详细模式:

na, nb, nc = 100, 100, 100
for i in (1, 2, 3, 4):
    print '*' * 10, 'RUN', i, '*' * 10
    # tree with slot and no slot:
    for cls in Element, ElementNoSlots:
        t1 = time.time()
        root = cls('root', 'root')
        for i in xrange(na):
            ela = cls(typ='a', id=i, parent=root)
            for j in xrange(nb):
                elb = cls(typ='b', id=(i, j), parent=ela)
                for k in xrange(nc):
                    elc = cls(typ='c', id=(i, j, k), parent=elb)
        to =  time.time() - t1
        print to, cls
        del root

    # ref: tree with dicts only:
    t1 = time.time()
    droot = {'childs': []}
    for i in xrange(na):
        ela =  {'typ': 'a', id: i, 'childs': []}
        droot['childs'].append(ela)
        for j in xrange(nb):
            elb =  {'typ': 'b', id: (i, j), 'childs': []}
            ela['childs'].append(elb)
            for k in xrange(nc):
                elc =  {'typ': 'c', id: (i, j, k), 'childs': []}
                elb['childs'].append(elc)
    td = time.time() - t1
    print td, 'dict'
    del droot

The original question was about general use cases not only about memory. So it should be mentioned here that you also get better performance when instantiating large amounts of objects – interesting e.g. when parsing large documents into objects or from a database.

Here is a comparison of creating object trees with a million entries, using slots and without slots. As a reference also the performance when using plain dicts for the trees (Py2.7.10 on OSX):

********** RUN 1 **********
1.96036410332 <class 'css_tree_select.element.Element'>
3.02922606468 <class 'css_tree_select.element.ElementNoSlots'>
2.90828204155 dict
********** RUN 2 **********
1.77050495148 <class 'css_tree_select.element.Element'>
3.10655999184 <class 'css_tree_select.element.ElementNoSlots'>
2.84120798111 dict
********** RUN 3 **********
1.84069895744 <class 'css_tree_select.element.Element'>
3.21540498734 <class 'css_tree_select.element.ElementNoSlots'>
2.59615707397 dict
********** RUN 4 **********
1.75041103363 <class 'css_tree_select.element.Element'>
3.17366290092 <class 'css_tree_select.element.ElementNoSlots'>
2.70941114426 dict

Test classes (ident, appart from slots):

class Element(object):
    __slots__ = ['_typ', 'id', 'parent', 'childs']
    def __init__(self, typ, id, parent=None):
        self._typ = typ
        self.id = id
        self.childs = []
        if parent:
            self.parent = parent
            parent.childs.append(self)

class ElementNoSlots(object): (same, w/o slots)

testcode, verbose mode:

na, nb, nc = 100, 100, 100
for i in (1, 2, 3, 4):
    print '*' * 10, 'RUN', i, '*' * 10
    # tree with slot and no slot:
    for cls in Element, ElementNoSlots:
        t1 = time.time()
        root = cls('root', 'root')
        for i in xrange(na):
            ela = cls(typ='a', id=i, parent=root)
            for j in xrange(nb):
                elb = cls(typ='b', id=(i, j), parent=ela)
                for k in xrange(nc):
                    elc = cls(typ='c', id=(i, j, k), parent=elb)
        to =  time.time() - t1
        print to, cls
        del root

    # ref: tree with dicts only:
    t1 = time.time()
    droot = {'childs': []}
    for i in xrange(na):
        ela =  {'typ': 'a', id: i, 'childs': []}
        droot['childs'].append(ela)
        for j in xrange(nb):
            elb =  {'typ': 'b', id: (i, j), 'childs': []}
            ela['childs'].append(elb)
            for k in xrange(nc):
                elc =  {'typ': 'c', id: (i, j, k), 'childs': []}
                elb['childs'].append(elc)
    td = time.time() - t1
    print td, 'dict'
    del droot