How do I get a value (any value) out of s without doing s.pop()? I want to leave the item in the set until I am sure I can remove it – something I can only be sure of after an asynchronous call to another host.
Quick and dirty:
>>> elem = s.pop()
>>> s.add(elem)
But do you know of a better way? Ideally in constant time.
Obviously this would create a new list which contains each member of the set, so not great if your set is very large.
回答 2
我想知道这些功能如何在不同的集合上执行,所以我做了一个基准测试:
from random import sample
defForLoop(s):for e in s:breakreturn e
defIterNext(s):return next(iter(s))defListIndex(s):return list(s)[0]defPopAdd(s):
e = s.pop()
s.add(e)return e
defRandomSample(s):return sample(s,1)defSetUnpacking(s):
e,*_ = s
return e
from simple_benchmark import benchmark
b = benchmark([ForLoop,IterNext,ListIndex,PopAdd,RandomSample,SetUnpacking],{2**i: set(range(2**i))for i in range(1,20)},
argument_name='set size',
function_aliases={first:'First'})
b.plot()
I wondered how the functions will perform for different sets, so I did a benchmark:
from random import sample
def ForLoop(s):
for e in s:
break
return e
def IterNext(s):
return next(iter(s))
def ListIndex(s):
return list(s)[0]
def PopAdd(s):
e = s.pop()
s.add(e)
return e
def RandomSample(s):
return sample(s, 1)
def SetUnpacking(s):
e, *_ = s
return e
from simple_benchmark import benchmark
b = benchmark([ForLoop, IterNext, ListIndex, PopAdd, RandomSample, SetUnpacking],
{2**i: set(range(2**i)) for i in range(1, 20)},
argument_name='set size',
function_aliases={first: 'First'})
b.plot()
This plot clearly shows that some approaches (RandomSample, SetUnpacking and ListIndex) depend on the size of the set and should be avoided in the general case (at least if performance might be important). As already shown by the other answers the fastest way is ForLoop.
However as long as one of the constant time approaches is used the performance difference will be negligible.
iteration_utilities (Disclaimer: I’m the author) contains a convenience function for this use-case: first:
>>> from iteration_utilities import first
>>> first({1,2,3,4})
1
I also included it in the benchmark above. It can compete with the other two “fast” solutions but the difference isn’t much either way.
回答 3
tl; dr
for first_item in muh_set: break仍然是Python 3.x中的最佳方法。诅咒你,圭多。
from timeit importTimer
stats =["for i in range(1000): \n\tfor x in s: \n\t\tbreak","for i in range(1000): next(iter(s))","for i in range(1000): s.add(s.pop())","for i in range(1000): list(s)[0]","for i in range(1000): random.sample(s, 1)",]for stat in stats:
t =Timer(stat, setup="import random\ns=set(range(100))")try:print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))except:
t.print_exc()
快速过时的计时
看哪!按最快到最慢的片段排序:
$ ./test_get.py
Timeforfor i in range(1000):for x in s:break:0.249871Timeforfor i in range(1000): next(iter(s)):0.526266Timeforfor i in range(1000): s.add(s.pop()):0.658832Timeforfor i in range(1000): list(s)[0]:4.117106Timeforfor i in range(1000): random.sample(s,1):21.851104
全家人的面部植物
毫不奇怪,手动迭代的速度至少是下一个最快解决方案的两倍。尽管与Bad Old Python 2.x相比(以前的手动迭代速度至少快四倍),差距有所减小,但PPE 20狂热者对我而言最冗长的解决方案是最好的解决方案感到失望。至少将一个集转换为一个列表以仅提取该集的第一个元素是预期的那样可怕。感谢Guido,愿他的光芒继续指导我们。
from timeit import Timer
stats = [
"for i in range(1000): \n\tfor x in s: \n\t\tbreak",
"for i in range(1000): next(iter(s))",
"for i in range(1000): s.add(s.pop())",
"for i in range(1000): list(s)[0]",
"for i in range(1000): random.sample(s, 1)",
]
for stat in stats:
t = Timer(stat, setup="import random\ns=set(range(100))")
try:
print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
except:
t.print_exc()
Quickly Obsoleted Timeless Timings
Behold! Ordered by fastest to slowest snippets:
$ ./test_get.py
Time for for i in range(1000):
for x in s:
break: 0.249871
Time for for i in range(1000): next(iter(s)): 0.526266
Time for for i in range(1000): s.add(s.pop()): 0.658832
Time for for i in range(1000): list(s)[0]: 4.117106
Time for for i in range(1000): random.sample(s, 1): 21.851104
Faceplants for the Whole Family
Unsurprisingly, manual iteration remains at least twice as fast as the next fastest solution. Although the gap has decreased from the Bad Old Python 2.x days (in which manual iteration was at least four times as fast), it disappoints the PEP 20 zealot in me that the most verbose solution is the best. At least converting a set into a list just to extract the first element of the set is as horrible as expected. Thank Guido, may his light continue to guide us.
Surprisingly, the RNG-based solution is absolutely horrible. List conversion is bad, but randomreally takes the awful-sauce cake. So much for the Random Number God.
I just wish the amorphous They would PEP up a set.get_first() method for us already. If you’re reading this, They: “Please. Do something.”
from timeit import*
stats =["for i in xrange(1000): iter(s).next() ","for i in xrange(1000): \n\tfor x in s: \n\t\tbreak","for i in xrange(1000): s.add(s.pop()) ","for i in xrange(1000): s.get() "]for stat in stats:
t =Timer(stat, setup="s=set(range(100))")try:print"Time for %s:\t %f"%(stat, t.timeit(number=1000))except:
t.print_exc()
输出为:
$ ./test_get.py
Timeforfor i in xrange(1000): iter(s).next():0.433080Timeforfor i in xrange(1000):for x in s:break:0.148695Timeforfor i in xrange(1000): s.add(s.pop()):0.317418Timeforfor i in xrange(1000): s.get():0.146673
To provide some timing figures behind the different approaches, consider the following code.
The get() is my custom addition to Python’s setobject.c, being just a pop() without removing the element.
from timeit import *
stats = ["for i in xrange(1000): iter(s).next() ",
"for i in xrange(1000): \n\tfor x in s: \n\t\tbreak",
"for i in xrange(1000): s.add(s.pop()) ",
"for i in xrange(1000): s.get() "]
for stat in stats:
t = Timer(stat, setup="s=set(range(100))")
try:
print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
except:
t.print_exc()
The output is:
$ ./test_get.py
Time for for i in xrange(1000): iter(s).next() : 0.433080
Time for for i in xrange(1000):
for x in s:
break: 0.148695
Time for for i in xrange(1000): s.add(s.pop()) : 0.317418
Time for for i in xrange(1000): s.get() : 0.146673
This means that the for/break solution is the fastest (sometimes faster than the custom get() solution).
回答 5
由于您需要一个随机元素,因此也可以使用:
>>>import random
>>> s = set([1,2,3])>>> random.sample(s,1)[2]
Since you want a random element, this will also work:
>>> import random
>>> s = set([1,2,3])
>>> random.sample(s, 1)
[2]
The documentation doesn’t seem to mention performance of random.sample. From a really quick empirical test with a huge list and a huge set, it seems to be constant time for a list but not for the set. Also, iteration over a set isn’t random; the order is undefined but predictable:
>>> list(set(range(10))) == range(10)
True
If randomness is important and you need a bunch of elements in constant time (large sets), I’d use random.sample and convert to a list first:
>>> lst = list(s) # once, O(len(s))?
...
>>> e = random.sample(lst, 1)[0] # constant time
from timeit import*
stats =["for i in range(1000): next(iter(s))","for i in range(1000): \n\tfor x in s: \n\t\tbreak","for i in range(1000): s.add(s.pop())"]for stat in stats:
t =Timer(stat, setup="s=set(range(100000))")try:print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))except:
t.print_exc()
输出:
Timeforfor i in range(1000): next(iter(s)):0.205888Timeforfor i in range(1000):for x in s:break:0.083397Timeforfor i in range(1000): s.add(s.pop()):0.226570
from timeit import*
stats =["while s:\n\ta = next(iter(s))\n\ts.remove(a)","while s:\n\tfor x in s: break\n\ts.remove(x)","while s:\n\tx=s.pop()\n\ts.add(x)\n\ts.remove(x)"]for stat in stats:
t =Timer(stat, setup="s=set(range(100000))")try:print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))except:
t.print_exc()
结果是:
Timeforwhile s:
a = next(iter(s))
s.remove(a):2.938494Timeforwhile s:for x in s:break
s.remove(x):2.728367Timeforwhile s:
x=s.pop()
s.add(x)
s.remove(x):0.030272
Following @wr. post, I get similar results (for Python3.5)
from timeit import *
stats = ["for i in range(1000): next(iter(s))",
"for i in range(1000): \n\tfor x in s: \n\t\tbreak",
"for i in range(1000): s.add(s.pop())"]
for stat in stats:
t = Timer(stat, setup="s=set(range(100000))")
try:
print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
except:
t.print_exc()
Output:
Time for for i in range(1000): next(iter(s)): 0.205888
Time for for i in range(1000):
for x in s:
break: 0.083397
Time for for i in range(1000): s.add(s.pop()): 0.226570
However, when changing the underlying set (e.g. call to remove()) things go badly for the iterable examples (for, iter):
from timeit import *
stats = ["while s:\n\ta = next(iter(s))\n\ts.remove(a)",
"while s:\n\tfor x in s: break\n\ts.remove(x)",
"while s:\n\tx=s.pop()\n\ts.add(x)\n\ts.remove(x)"]
for stat in stats:
t = Timer(stat, setup="s=set(range(100000))")
try:
print("Time for %s:\t %f"%(stat, t.timeit(number=1000)))
except:
t.print_exc()
Results in:
Time for while s:
a = next(iter(s))
s.remove(a): 2.938494
Time for while s:
for x in s: break
s.remove(x): 2.728367
Time for while s:
x=s.pop()
s.add(x)
s.remove(x): 0.030272
A side effect of this choice is that your code will be backwards compatible with older, pre-set versions of Python. It’s maybe not the best answer but it’s another option.
Edit: You can even do something like this to hide the fact that you used a dict instead of an array or set: