Tag archive: language-design

Why doesn’t Python’s dict.update() return the object?

Question: Why doesn’t Python’s dict.update() return the object?

I’m trying to do:

award_dict = {
    "url" : "http://facebook.com",
    "imageurl" : "http://farm4.static.flickr.com/3431/3939267074_feb9eb19b1_o.png",
    "count" : 1,
}

def award(name, count, points, desc_string, my_size, parent) :
    if my_size > count :
        a = {
            "name" : name,
            "description" : desc_string % count,
            "points" : points,
            "parent_award" : parent,
        }
        a.update(award_dict)
        return self.add_award(a, siteAlias, alias).award

But it felt really cumbersome in the function, and I would rather have done:

        return self.add_award({
            "name" : name,
            "description" : desc_string % count,
            "points" : points,
            "parent_award" : parent,
        }.update(award_dict), siteAlias, alias).award

Why doesn’t update return the object so you can chain?

jQuery does this to allow chaining. Why isn’t it acceptable in Python?


Answer 0

Python’s mostly implementing a pragmatically tinged flavor of command-query separation: mutators return None (with pragmatically induced exceptions such as pop;-) so they can’t possibly be confused with accessors (and in the same vein, assignment is not an expression, the statement-expression separation is there, and so forth).

That doesn’t mean there aren’t a lot of ways to merge things up when you really want, e.g., dict(a, **award_dict) makes a new dict much like the one you appear to wish .update returned — so why not use THAT if you really feel it’s important?

Edit: btw, no need, in your specific case, to create a along the way, either:

dict(name=name, description=desc % count, points=points, parent_award=parent,
     **award_dict)

creates a single dict with the same result as your a.update(award_dict), provided the explicit keys and the keys of award_dict don’t overlap. (If a key were given both explicitly and in award_dict, the call would raise a TypeError for the repeated keyword argument rather than letting award_dict win.) To have explicit entries “win” such conflicts, pass award_dict as the sole positional arg, before the keyword ones, and without the ** form: dict(award_dict, name=name, ...).
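As a rough illustration of which side wins in each form of the dict() call, here is a minimal sketch. The dictionaries and variable names are made up for the example and are not taken from the question’s code.

```python
base = {"url": "http://facebook.com", "count": 1}
explicit = {"name": "gold", "count": 5}

# Mapping first, keywords second: the keyword arguments override the mapping.
merged_kw_wins = dict(base, name="gold", count=5)

# A second dict unpacked with ** over a positional one: the unpacked dict wins.
merged_base_wins = dict(explicit, **base)

# The same key supplied twice as a keyword is an error, not an override.
try:
    dict(count=5, **base)
except TypeError as exc:
    duplicate_error = exc
else:
    duplicate_error = None

print(merged_kw_wins)
print(merged_base_wins)
print(type(duplicate_error).__name__)
```

Note that every form involving ** requires the unpacked dict to have string keys only.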


Answer 1

Python’s API, by convention, distinguishes between procedures and functions. Functions compute new values out of their parameters (including any target object); procedures modify objects and don’t return anything (i.e. they return None). So procedures have side effects, functions don’t. update is a procedure, hence it doesn’t return a value.

The motivation for doing it that way is that otherwise, you may get undesirable side effects. Consider

bar = foo.reverse()

If reverse (which reverses the list in place) also returned the list, users might think that reverse returns a new list, which gets assigned to bar, and never notice that foo was modified as well. Because reverse returns None, they immediately recognize that bar is not the result of the reversal, and will look more closely at what reverse actually does.
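The procedure/function split described here can be seen with the list built-ins themselves. This is a small sketch with illustrative variable names, not code from the answer.

```python
foo = [1, 2, 3]

# Procedure: mutates foo in place and returns None.
bar = foo.reverse()

# Function-style alternatives: build a new list and leave the input alone.
qux = list(reversed(foo))
quux = foo[::-1]

print(bar)   # the "result" of the in-place call
print(foo)   # foo itself has been reversed
print(qux)   # a new list, reversed relative to the current foo
```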


Answer 2

This is as easy as:

(lambda d: d.update(dict2) or d)(d1)

Answer 3

>>> dict_merge = lambda a,b: a.update(b) or a
>>> dict_merge({'a':1, 'b':3},{'c':5})
{'a': 1, 'c': 5, 'b': 3}

Note that as well as returning the merged dict, it modifies the first parameter in-place. So dict_merge(a,b) will modify a.

Or, of course, you can do it all inline:

>>> (lambda a,b: a.update(b) or a)({'a':1, 'b':3},{'c':5})
{'a': 1, 'c': 5, 'b': 3}

Answer 4

I don’t have enough reputation to comment on the top answer.

@beardc this doesn’t seem to be a CPython-only thing. PyPy gives me “TypeError: keywords must be strings”.

The solution with **kwargs only works because the dictionary to be merged only has keys of type string.

i.e.

>>> dict({1:2}, **{3:4})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: keyword arguments must be strings

vs

>>> dict({1:2}, **{'3':4})
{1: 2, '3': 4}
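If the keys may be non-strings, a merge that avoids keyword arguments sidesteps the TypeError. The sketch below is one way to do that; the helper name merged is hypothetical, and the {**a, **b} form assumes Python 3.5 or later.

```python
def merged(first, second):
    """Return a new dict; keys can be any hashable type, and second wins on conflicts."""
    result = dict(first)   # shallow copy, so the inputs are left untouched
    result.update(second)
    return result


a = {1: 2}
b = {3: 4, 1: 20}

via_helper = merged(a, b)
via_display = {**a, **b}   # dict display unpacking does not go through keyword arguments

print(via_helper)
print(via_display)
```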

Answer 5

It’s not that it isn’t acceptable, but rather that dicts weren’t implemented that way.

If you look at Django’s ORM, it makes extensive use of chaining. It’s not discouraged; you could even inherit from dict and override only update to do the update and return self, if you really want it.

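class myDict(dict):
    def update(self, *args, **kwargs):
        # Delegate to dict.update (keyword arguments included), then
        # return self so that calls can be chained.
        dict.update(self, *args, **kwargs)
        return self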
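A short usage sketch of such a subclass follows. The class name ChainableDict and the sample keys are assumptions made for the example; only the idea of returning self comes from the answer.

```python
class ChainableDict(dict):
    """Hypothetical dict subclass whose update() returns self."""

    def update(self, *args, **kwargs):
        super().update(*args, **kwargs)
        return self


# Because update() returns the dict itself, calls can be chained.
award = ChainableDict(name="gold").update({"points": 10}).update(count=1)
print(award)
```

The chained calls still mutate one object in place; chaining changes the spelling, not the semantics.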

Answer 6

This is as close to your proposed solution as I could get:

from collections import ChainMap

return self.add_award(ChainMap(award_dict, {
    "name" : name,
    "description" : desc_string % count,
    "points" : points,
    "parent_award" : parent,
}), siteAlias, alias).award
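One thing worth knowing before passing a ChainMap around is that it is a live view over the underlying dicts rather than a merged copy, and that earlier mappings win lookups. A minimal sketch with made-up data:

```python
from collections import ChainMap

defaults = {"count": 1, "url": "http://facebook.com"}
explicit = {"name": "gold", "count": 5}

# Lookups search the mappings left to right, so defaults wins for "count".
view = ChainMap(defaults, explicit)
before = view["count"]

# dict(view) materializes an independent, ordinary dict.
snapshot = dict(view)

# The view tracks later changes to the underlying dicts; the snapshot does not.
defaults["count"] = 2
after = view["count"]

print(before, after, snapshot["count"])
```

Whether a view is acceptable depends on what the receiving function does with it; code that expects a real dict (for serialization, say) may need the dict(view) form.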

Answer 7

For those coming late to the party, I put some timings together (Py 3.7), showing that .update()-based methods look a bit (~5%) faster when inputs are preserved and noticeably (~30%) faster when just updating in place.

As usual, all the benchmarks should be taken with a grain of salt.

def join2(dict1, dict2, inplace=False):
    result = dict1 if inplace else dict1.copy()
    result.update(dict2)
    return result


def join(*items):
    iter_items = iter(items)
    result = next(iter_items).copy()
    for item in iter_items:
        result.update(item)
    return result


def update_or(dict1, dict2):
    return dict1.update(dict2) or dict1


d1 = {i: str(i) for i in range(1000000)}
d2 = {str(i): i for i in range(1000000)}

%timeit join2(d1, d2)
# 258 ms ± 1.47 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit join(d1, d2)
# 262 ms ± 2.97 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit dict(d1, **d2)
# 267 ms ± 2.74 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit {**d1, **d2}
# 267 ms ± 1.84 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

The timings for the in-place operations are a bit trickier, because each run has to include an extra copy operation (the first timing is just for reference):

%timeit dd = d1.copy()
# 44.9 ms ± 495 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit dd = d1.copy(); join2(dd, d2)
# 296 ms ± 2.05 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit dd = d1.copy(); join2(dd, d2, True)
# 234 ms ± 1.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit dd = d1.copy(); update_or(dd, d2)
# 235 ms ± 1.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

Answer 8

import itertools
# The original used d.iteritems(), which exists only in Python 2;
# d.items() works in both Python 2 and Python 3.
dict_merge = lambda *args: dict(itertools.chain(*[d.items() for d in args]))

Answer 9

Just been trying this myself in Python 3.4 (so wasn’t able to use the fancy {**dict_1, **dict_2} syntax).

I wanted to be able to have non-string keys in dictionaries as well as provide an arbitrary number of dictionaries.

Also, I wanted to make a new dictionary, so I opted not to use collections.ChainMap on its own (kind of the same reason I didn’t want to use dict.update initially).

Here’s what I ended up writing:

from collections import ChainMap

def merge_dicts(*dicts):
    # Collect every key, then look each one up with later dicts taking precedence.
    all_keys  = set(k for d in dicts for k in d.keys())
    chain_map = ChainMap(*reversed(dicts))
    return {k: chain_map[k] for k in all_keys}

merge_dicts({'1': 1}, {'2': 2, '3': 3}, {'1': 4, '3': 5})
# {'1': 4, '3': 5, '2': 2}

Why doesn’t Python have a sign function?

Question: Why doesn’t Python have a sign function?

I can’t understand why Python doesn’t have a sign function. It has an abs builtin (which I consider sign’s sister), but no sign.

In Python 2.6 there is even a copysign function (in math), but no sign. Why bother to write a copysign(x,y) when you could just write a sign and then get the copysign directly from abs(x) * sign(y)? The latter would be much clearer: x with the sign of y, whereas with copysign you have to remember whether it’s x with the sign of y or y with the sign of x!

Obviously sign(x) does not provide anything more than cmp(x,0), but it would be much more readable than that too (and for a highly readable language like Python, this would have been a big plus).

If I were a Python designer, I would have gone the other way around: no cmp builtin, but a sign. When you need cmp(x,y), you could just do sign(x-y) (or, even better for non-numerical stuff, just x>y; of course this would have required sorted to accept a boolean instead of an integer comparator). This would also be clearer: positive when x>y (whereas with cmp you have to remember the convention that the result is positive when the first is bigger, but it could be the other way around). Of course cmp makes sense on its own for other reasons (e.g. when sorting non-numerical things, or if you want the sort to be stable, which is not possible with simply a boolean).

So, the question is: why did the Python designer(s) decide to leave the sign function out of the language? Why the heck bother with copysign and not its parent sign?

Am I missing something?

EDIT – after Peter Hansen’s comment. Fair enough that you didn’t use it, but you didn’t say what you use Python for. In the 7 years that I have used Python, I have needed it countless times, and the last was the straw that broke the camel’s back!

Yes, you can pass cmp around, but 90% of the times that I needed to pass it, it was in an idiom like lambda x,y: cmp(score(x),score(y)) that would have worked with sign just fine.

Finally, I hope you agree that sign would be more useful than copysign, so even if I bought your view, why bother defining that in math instead of sign? How can copysign be so much more useful than sign?


Answer 0

EDIT:

Indeed there was a patch which included sign() in math, but it wasn’t accepted, because they didn’t agree on what it should return in all the edge cases (+/-0, +/-nan, etc)

So they decided to implement only copysign, which (although more verbose) can be used to delegate to the end user the desired behavior for edge cases – which sometimes might require the call to cmp(x,0).


I don’t know why it’s not a built-in, but I have some thoughts.

copysign(x,y):
Return x with the sign of y.

Most importantly, copysign is a superset of sign! Calling copysign with x=1 is the same as a sign function. So you could just use copysign and forget about it.

>>> math.copysign(1, -4)
-1.0
>>> math.copysign(1, 3)
1.0

If you get sick of passing two whole arguments, you can implement sign this way, and it will still be compatible with the IEEE stuff mentioned by others:

>>> sign = functools.partial(math.copysign, 1) # either of these
>>> sign = lambda x: math.copysign(1, x) # two will work
>>> sign(-4)
-1.0
>>> sign(3)
1.0
>>> sign(0)
1.0
>>> sign(-0.0)
-1.0
>>> sign(float('nan'))
-1.0

Secondly, usually when you want the sign of something, you just end up multiplying it with another value. And of course that’s basically what copysign does.

So, instead of:

s = sign(a)
b = b * s

You can just do:

b = copysign(b, a)

And yes, I’m surprised you’ve been using Python for 7 years and think cmp could be so easily removed and replaced by sign! Have you never implemented a class with a __cmp__ method? Have you never called cmp and specified a custom comparator function?

In summary, I’ve found myself wanting a sign function too, but copysign with the first argument being 1 will work just fine. I disagree that sign would be more useful than copysign, as I’ve shown that it’s merely a subset of the same functionality.


Answer 1

“copysign” is defined by IEEE 754, and part of the C99 specification. That’s why it’s in Python. The function cannot be implemented in full by abs(x) * sign(y) because of how it’s supposed to handle NaN values.

>>> import math
>>> math.copysign(1, float("nan"))
1.0
>>> math.copysign(1, float("-nan"))
-1.0
>>> math.copysign(float("nan"), 1)
nan
>>> math.copysign(float("nan"), -1)
nan
>>> float("nan") * -1
nan
>>> float("nan") * 1
nan
>>> 

That makes copysign() a more useful function than sign().

As to specific reasons why IEEE’s signbit(x) is not available in standard Python, I don’t know. I can make assumptions, but it would be guessing.

The math module itself uses copysign(1, x) as a way to check if x is negative or non-negative. For most cases dealing with mathematical functions that seems more useful than having a sign(x) which returns 1, 0, or -1 because there’s one less case to consider. For example, the following is from Python’s math module:

static double
m_atan2(double y, double x)
{
        if (Py_IS_NAN(x) || Py_IS_NAN(y))
                return Py_NAN;
        if (Py_IS_INFINITY(y)) {
                if (Py_IS_INFINITY(x)) {
                        if (copysign(1., x) == 1.)
                                /* atan2(+-inf, +inf) == +-pi/4 */
                                return copysign(0.25*Py_MATH_PI, y);
                        else
                                /* atan2(+-inf, -inf) == +-pi*3/4 */
                                return copysign(0.75*Py_MATH_PI, y);
                }
                /* atan2(+-inf, x) == +-pi/2 for finite x */
                return copysign(0.5*Py_MATH_PI, y);

There you can clearly see that copysign() is a more effective function than a three-valued sign() function.
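The same point can be checked from Python. In the sketch below, three_valued_sign and is_negative are hypothetical helper names; only math.copysign and math.atan2 are standard library calls. It shows that a three-valued sign cannot tell 0.0 from -0.0, while copysign can, and that atan2 depends on exactly that distinction.

```python
import math


def three_valued_sign(x):
    """Hypothetical three-valued sign: -1, 0 or 1."""
    return (x > 0) - (x < 0)


def is_negative(x):
    """Two-way test on the sign bit, in the style of the C code above."""
    return math.copysign(1.0, x) < 0


# Both zeros collapse to 0 under a three-valued sign...
print(three_valued_sign(0.0), three_valued_sign(-0.0))

# ...but copysign still sees the sign bit.
print(is_negative(0.0), is_negative(-0.0))

# atan2 needs that bit: the two calls differ by pi.
print(math.atan2(0.0, 0.0), math.atan2(0.0, -0.0))
```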

You wrote:

If I were a Python designer, I would have gone the other way around: no cmp() builtin, but a sign()

That means you don’t know that cmp() is used for things besides numbers. cmp("This", "That") cannot be implemented with a sign() function.

Edit to collate my additional answers elsewhere:

You base your justifications on how abs() and sign() are often seen together. As the C standard library does not contain a ‘sign(x)’ function of any sort, I don’t know how you justify your views. There’s an abs(int) and fabs(double) and fabsf(float) and fabsl(long double) but no mention of sign. There is “copysign()” and “signbit()” but those only apply to IEEE 754 numbers.

With complex numbers, what would sign(-3+4j) return in Python, were it to be implemented? abs(-3+4j) returns 5.0. That’s a clear example of how abs() can be used in places where sign() makes no sense.

Suppose sign(x) were added to Python, as a complement to abs(x). If ‘x’ is an instance of a user-defined class which implements the __abs__(self) method then abs(x) will call x.__abs__(). For sign(x) to be handled in the same way as abs(x), Python would have to gain a sign(x) slot.

This is excessive for a relatively unneeded function. Besides, why should sign(x) exist and nonnegative(x) and nonpositive(x) not exist? My snippet from Python’s math module implementation shows how copysign(x, y) can be used to implement nonnegative(), which a simple sign(x) cannot do.

Python should have better support for IEEE 754/C99 math functions. That would add a signbit(x) function, which would do what you want in the case of floats. It would not work for integers or complex numbers, much less strings, and it wouldn’t have the name you are looking for.

You ask “why”, and the answer is “sign(x) isn’t useful.” You assert that it is useful. Yet your comments show that you do not know enough to be able to make that assertion, which means you would have to show convincing evidence of its need. Saying that NumPy implements it is not convincing enough. You would need to show cases of how existing code would be improved with a sign function.

And that is outside the scope of StackOverflow. Take it instead to one of the Python lists.


Answer 2

Another one-liner for sign():

sign = lambda x: (1, -1)[x<0]

If you want it to return 0 for x = 0:

sign = lambda x: x and (1, -1)[x<0]

Answer 3

Since cmp has been removed, you can get the same functionality with

def cmp(a, b):
    return (a > b) - (a < b)

def sign(a):
    return (a > 0) - (a < 0)

It works for float, int and even Fraction. In the case of float, notice sign(float("nan")) is zero.

Python doesn’t require that comparisons return a boolean, so coercing the comparison results with bool() protects against allowable, but uncommon, implementations:

def sign(a):
    return bool(a > 0) - bool(a < 0)
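To make the “allowable, but uncommon” case concrete, here is a sketch with a deliberately odd class. Loud and naive_sign are hypothetical names invented for the example; the point is only that bool() turns any truthy or falsy comparison result into something that can be subtracted.

```python
class Loud:
    """Hypothetical type whose comparisons return strings instead of bools."""

    def __init__(self, value):
        self.value = value

    def __gt__(self, other):
        return "yes" if self.value > other else ""

    def __lt__(self, other):
        return "yes" if self.value < other else ""


def sign(x):
    # bool() normalizes whatever the comparison returned to True/False.
    return bool(x > 0) - bool(x < 0)


def naive_sign(x):
    # Works for ordinary numbers, but subtracts the raw comparison results.
    return (x > 0) - (x < 0)


print(sign(Loud(-3)), sign(Loud(0)), sign(Loud(7)))
print(sign(float("nan")))  # both comparisons are False for NaN

try:
    naive_sign(Loud(-3))
except TypeError as exc:
    print("naive_sign failed:", exc)
```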

Answer 4

Only correct answer compliant with the Wikipedia definition

The definition on Wikipedia reads: sgn(x) is -1 if x < 0, 0 if x = 0, and 1 if x > 0.

Hence,

sign = lambda x: -1 if x < 0 else (1 if x > 0 else (0 if x == 0 else float('nan')))

Which for all intents and purposes may be simplified to:

sign = lambda x: -1 if x < 0 else (1 if x > 0 else 0)

This function definition executes fast and yields guaranteed correct results for 0, 0.0, -0.0, -4 and 5 (see comments to other incorrect answers).

Note that zero (0) is neither positive nor negative.


Answer 5

numpy has a sign function, and gives you a bonus of other functions as well. So:

import numpy as np
x = np.sign(y)

Just be careful that the result is a numpy.float64:

>>> type(np.sign(1.0))
<type 'numpy.float64'>

For things like json, this matters, as json does not know how to serialize numpy.float64 types. In that case, you could do:

float(np.sign(y))

to get a regular float.


Answer 6

Try running this, where x is any number:

int_sign = bool(x > 0) - bool(x < 0)

The coercion to bool() handles the possibility that the comparison operator doesn’t return a boolean.


Answer 7

Yes, a correct sign() function should at least be in the math module, as it is in numpy, because one frequently needs it for math-oriented code.

But math.copysign() is also useful independently.

cmp() and obj.__cmp__() … have generally high importance independently. Not just for math oriented code. Consider comparing/sorting tuples, date objects, …

The dev arguments at http://bugs.python.org/issue1640 regarding the omission of math.sign() are odd, because:

  • There is no separate -NaN
  • sign(nan) == nan without worry (like exp(nan) )
  • sign(-0.0) == sign(0.0) == 0 without worry
  • sign(-inf) == -1 without worry

— as it is in numpy


Answer 8

In Python 2, cmp() returns an integer: there’s no requirement that the result be -1, 0, or 1, so sign(x) is not the same as cmp(x,0).

In Python 3, cmp() has been removed in favor of rich comparison. For cmp(), Python 3 suggests this:

def cmp(a, b):
    return (a > b) - (a < b)

which is fine for cmp(), but again can’t be used for sign() because the comparison operators need not return booleans.

To deal with this possibility, the comparison results must be coerced to booleans:

def sign(x):
    return bool(x > 0) - bool(x < 0)

This works for any type that can be compared with 0, including infinities. For NaN, both comparisons are false, so the result is 0.


Answer 9

You don’t need one; you can just use:

if not number == 0:
    sig = number/abs(number)
else:
    sig = 0

Or create a function as described by others:

sign = lambda x: bool(x > 0) - bool(x < 0)

def sign(x):
    return bool(x > 0) - bool(x < 0)

Answer 10

It just doesn’t.

The best way to fix this is:

sign = lambda x: bool(x > 0) - bool(x < 0)

Answer 11

The reason “sign” is not included is that if we included every useful one-liner in the list of built-in functions, Python wouldn’t be easy and practical to work with anymore. If you use this function so often, why don’t you factor it out yourself? It’s not like it’s remotely hard or even tedious to do so.


What blocks Ruby, Python to get Javascript V8 speed? [closed]

Question: What blocks Ruby, Python to get Javascript V8 speed? [closed]

Are there any Ruby / Python features that are blocking implementation of optimizations (e.g. inline caching) V8 engine has?

Python is co-developed by Google guys so it shouldn’t be blocked by software patents.

Or is this rather a matter of the resources Google has put into the V8 project?


Answer 0

What blocks Ruby, Python to get Javascript V8 speed?

Nothing.

Well, okay: money. (And time, people, resources, but if you have money, you can buy those.)

V8 has a team of brilliant, highly-specialized, highly-experienced (and thus highly-paid) engineers working on it, that have decades of experience (I’m talking individually – collectively it’s more like centuries) in creating high-performance execution engines for dynamic OO languages. They are basically the same people who also created the Sun HotSpot JVM (among many others).

Lars Bak, the lead developer, has been literally working on VMs for 25 years (and all of those VMs have led up to V8), which is basically his entire (professional) life. Some of the people writing Ruby VMs aren’t even 25 years old.

Are there any Ruby / Python features that are blocking implementation of optimizations (e.g. inline caching) V8 engine has?

Given that at least IronRuby, JRuby, MagLev, MacRuby and Rubinius have either monomorphic (IronRuby) or polymorphic inline caching, the answer is obviously no.
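For readers unfamiliar with the term, the idea behind a monomorphic inline cache can be sketched in a few lines of Python. This is a toy model only: CallSite and its fields are hypothetical names, real VMs patch the cache into generated machine code, and they also invalidate it when a class is modified, which this sketch ignores.

```python
class CallSite:
    """Toy monomorphic inline cache for a single method-call site."""

    def __init__(self, name):
        self.name = name
        self.cached_type = None    # receiver type seen on the last call
        self.cached_method = None  # method found for that type
        self.hits = 0
        self.misses = 0

    def call(self, obj, *args):
        cls = type(obj)
        if cls is self.cached_type:
            # Fast path: same receiver type as last time, skip the lookup.
            self.hits += 1
        else:
            # Slow path: do the full lookup and remember the result.
            self.misses += 1
            self.cached_type = cls
            self.cached_method = getattr(cls, self.name)
        return self.cached_method(obj, *args)


site = CallSite("upper")
results = [site.call(s) for s in ["a", "b", "c"]]  # one miss, then two hits
other = site.call(b"d")                            # new receiver type: miss again
print(results, other, site.hits, site.misses)
```

A polymorphic inline cache extends the same idea by remembering several receiver types per call site instead of one.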

Modern Ruby implementations already do a great deal of optimizations. For example, for certain operations, Rubinius’s Hash class is faster than YARV’s. Now, this doesn’t sound terribly exciting until you realize that Rubinius’s Hash class is implemented in 100% pure Ruby, while YARV’s is implemented in 100% hand-optimized C.

So, at least in some cases, Rubinius can generate better code than GCC!

Or is this rather a matter of the resources Google has put into the V8 project?

Yes. Not just Google. The lineage of V8’s source code is 25 years old now. The people who are working on V8 also created the Self VM (to this day one of the fastest dynamic OO language execution engines ever created), the Animorphic Smalltalk VM (to this day one of the fastest Smalltalk execution engines ever created), the HotSpot JVM (the fastest JVM ever created, probably the fastest VM period) and OOVM (one of the most efficient Smalltalk VMs ever created).

In fact, Lars Bak, the lead developer of V8, worked on every single one of those, plus a few others.


回答 1

高度优化JavaScript解释器的动力更大,这就是为什么我们看到Mozilla,Google和Microsoft之间投入了大量资源的原因。JavaScript必须在(通常是不耐烦的)人在等待时进行下载,解析,编译和实时运行,它必须在有人与之交互时运行,并且它是在不受控制的客户端中进行的可以是计算机,电话或烤面包机的环境。为了有效地在这些条件下运行,它必须高效。

Python和Ruby在开发人员/部署者控制的环境中运行。通常,功能强大的服务器或台式机系统的限制因素将是诸如内存或磁盘I / O之类的因素,而不是执行时间。或者在可以利用非引擎优化(例如缓存)的地方。对于这些语言,将重点放在语言和库功能集而不是速度优化上可能更有意义。

这样做的附带好处是,我们有两个出色的高性能开源JavaScript引擎,这些引擎可以并且正在重新用于所有类型的应用程序,例如Node.js。

There’s a lot more impetus to highly optimize JavaScript interpreters, which is why we see so many resources being put into them by Mozilla, Google, and Microsoft. JavaScript has to be downloaded, parsed, compiled, and run in real time while a (usually impatient) human being is waiting for it, it has to run WHILE a person is interacting with it, and it’s doing this in an uncontrolled client-end environment that could be a computer, a phone, or a toaster. It HAS to be efficient in order to run under these conditions effectively.

Python and Ruby are run in an environment controlled by the developer/deployer: generally a beefy server or desktop system where the limiting factor will be things like memory or disk I/O and not execution time, or where non-engine optimizations like caching can be used. For these languages it probably does make more sense to focus on language and library feature set over speed optimization.

The side benefit of this is that we have two great high-performance open source JavaScript engines that can be, and are being, repurposed for all manner of applications such as Node.js.


回答 2

其中很大一部分与社区有关。大多数情况下,Python和Ruby没有公司的支持。没有人获得全职从事Python和Ruby工作的报酬(尤其是他们没有获得始终从事CPython或MRI工作的报酬)。另一方面,V8得到了世界上最强大的IT公司的支持。

此外,V8可以更快,因为对V8员工唯一重要的是解释器-他们没有标准的库可以使用,也无需担心语言设计。他们只是写翻译。而已。

它与知识产权法无关。Python也不是由Google伙计共同开发的(它的创建者与其他一些提交者在那儿一起工作,但是在Python上工作他们没有得到报酬)。

Python 3的另一个障碍是Python3。它的采用似乎是语言开发人员的主要关注点,以至于他们冻结了新语言功能的开发,直到其他实现赶上来。

关于技术细节,我对Ruby不太了解,但是Python在很多地方都可以使用优化功能(Google项目Unladen Swallow在开始努力之前就开始实现这些功能)。这是他们计划的一些优化。如果为CPython实现JIT la PyPy,我可以看到Python在将来获得V8的速度,但这在未来几年似乎不太可能(目前的重点是采用Python 3,而不是JIT)。

许多人还认为Ruby和Python可以从删除各自的全局解释器锁中受益匪浅。

您还必须了解Python和Ruby都是比JS重得多的语言-它们以标准库,语言功能和结构的方式提供了更多内容。单独的面向对象的类系统增加了很多权重(我认为这是一种很好的方式)。我几乎认为Javascript是一种旨在嵌入的语言,例如Lua(在许多方面,它们是相似的)。Ruby和Python具有更丰富的功能集,而表现力通常是以速度为代价的。

A good part of it has to do with community. Python and Ruby for the most part have no corporate backing. No one gets paid to work on Python and Ruby full-time (and they especially don’t get paid to work on CPython or MRI the whole time). V8, on the other hand, is backed by the most powerful IT company in the world.

Furthermore, V8 can be faster because the only thing that matters to the V8 people is the interpreter — they have no standard library to work on, no concerns about language design. They just write the interpreter. That’s it.

It has nothing to do with intellectual property law. Nor is Python co-developed by Google guys (its creator works there along with a few other committers, but they don’t get paid to work on Python).

Another obstacle to Python speed is Python 3. Its adoption seems to be the main concern of the language developers — to the point that they have frozen development of new language features until other implementations catch up.

On to the technical details, I don’t know much about Ruby, but Python has a number of places where optimizations could be used (and Unladen Swallow, a Google project, started to implement these before biting the dust). Here are some of the optimizations that they planned. I could see Python gaining V8 speed in the future if a JIT a la PyPy gets implemented for CPython, but that does not seem likely for the coming years (the focus right now is Python 3 adoption, not a JIT).

Many also feel that Ruby and Python could benefit immensely from removing their respective global interpreter locks.

You also have to understand that Python and Ruby are both much heavier languages than JS — they provide far more in the way of standard library, language features, and structure. The class system of object-orientation alone adds a great deal of weight (in a good way, I think). I almost think of Javascript as a language designed to be embedded, like Lua (and in many ways, they are similar). Ruby and Python have a much richer set of features, and that expressiveness is usually going to come at the cost of speed.


回答 3

性能似乎并不是核心Python开发人员的主要关注点,他们似乎认为“足够快”足够好,并且帮助程序员提高生产率的功能比帮助计算机更快地运行代码的功能更为重要。

但是,确实的确有一个Google项目(现在已被废弃),未满载吞咽,用于生成与标准解释器兼容的更快的Python解释器。PyPy是另一个旨在产生更快的Python的项目。还有PsyPy的先驱PsycoCython,它们可以在不更改整个解释器的情况下提高许多Python脚本的性能,而Cython则可以让您使用非常类似于Python语法的方式为Python编写高性能的C库。

Performance doesn’t seem to be a major focus of the core Python developers, who seem to feel that “fast enough” is good enough, and that features that help programmers be more productive are more important than features that help computers run code faster.

There was, however, a (now abandoned) Google project, unladen-swallow, to produce a faster Python interpreter compatible with the standard interpreter. PyPy is another project that intends to produce a faster Python. There is also Psyco, the forerunner of PyPy, which can provide performance boosts to many Python scripts without changing out the whole interpreter, and Cython, which lets you write high-performance C libraries for Python using something very much like Python syntax.


回答 4

误导性的问题。V8是JavaScript的JIT(即时编译器)实现,在其最流行的非浏览器实现Node.js中,它是围绕事件循环构建的。CPython不是JIT,也不是事件。但是这些在PyPy项目中最普遍地存在于Python中-一个兼容CPython 2.7(不久将成为3.0+)的JIT。并且有大量事件服务器库,例如Tornado。在运行Tornado与Node.js的PyPy之间存在真实的测试,并且性能差异很小。

Misleading question. V8 is a JIT (a just in time compiler) implementation of JavaScript and in its most popular non-browser implementation Node.js it is constructed around an event loop. CPython is not a JIT & not evented. But these exist in Python most commonly in the PyPy project – a CPython 2.7 (and soon to be 3.0+) compatible JIT. And there are loads of evented server libraries like Tornado for example. Real world tests exist between PyPy running Tornado vs Node.js and the performance differences are slight.


回答 5

我只是遇到了这个问题,而性能差异也有一个未提及的重大技术原因。Python拥有功能强大的软件扩展的庞大生态系统,但是其中大多数扩展都是用C或其他低级语言编写的,以提高性能,并且与CPython API紧密相关。

有许多众所周知的技术(JIT,现代垃圾收集器等)可用于加速CPython实现,但是所有这些都需要对API进行实质性更改,从而破坏了该过程中的大多数扩展。CPython会更快,但是很多使Python如此吸引人的东西(广泛的软件堆栈)将会丢失。恰当的例子是,那里有几种更快的Python实现,但是与CPython相比,它们几乎没有吸引力。

I just ran across this question and there is also a big technical reason for the performance difference that wasn’t mentioned. Python has a very large ecosystem of powerful software extensions, but most of these extensions are written in C or other low-level languages for performance and are heavily tied to the CPython API.

There are lots of well-known techniques (JIT, modern garbage collector, etc) that could be used to speed up the CPython implementation but all would require substantial changes to the API, breaking most of the extensions in the process. CPython would be faster, but a lot of what makes Python so attractive (the extensive software stack) would be lost. Case in point, there are several faster Python implementations out there but they have little traction compared to CPython.


回答 6

由于不同的设计优先级和用例目标,我相信。

通常,脚本(aka动态)语言的主要目的是成为本机函数调用之间的“胶水”。这些本机功能应a)覆盖最关键/经常使用的区域,并且b)尽可能有效。

这是一个示例: 导致iOS Safari冻结jQuery排序冻结是由于过度使用了按选择器调用而引起的。如果按选择器获取将以本机代码实现并且有效,那么根本就不会出现这样的问题。

考虑ray-tracer演示,它是V8演示的常用演示。在Python世界中,由于Python提供了本机扩展的所有功能,因此可以用本机代码实现。但是在V8领域(客户端沙箱)中,您没有其他选择,只能使VM发挥更大的作用。因此,唯一的选择是使用脚本代码来查看ray-tracer的实现。

因此有不同的优先事项和动机。

Sciter中,我通过本地实现几乎完整的jQurey核心进行了测试。在诸如ScIDE(由HTML / CSS / Script制成的IDE)之类的实际任务上,我相信这种解决方案比任何VM优化都有效。

Because of different design priorities and use case goals I believe.

In general, the main purpose of scripting (a.k.a. dynamic) languages is to be “glue” between calls to native functions. These native functions should a) cover the most critical/frequently used areas and b) be as efficient as possible.

Here is an example: “jQuery sort causing iOS Safari to freeze”. The freeze there is caused by excessive use of get-by-selector calls. If get-by-selector were implemented efficiently in native code, there would be no such problem at all.

Consider the ray-tracer demo that is frequently used to demonstrate V8. In the Python world it can be implemented in native code, as Python provides all the facilities for native extensions. But in the V8 realm (a client-side sandbox) you have no option other than making the VM as effective as possible. And so the only way to implement a ray tracer there is in script code.

So different priorities and motivations.

In Sciter I’ve made a test by implementing pretty much the full jQuery core natively. On practical tasks like ScIDE (an IDE made of HTML/CSS/Script), I believe such a solution works significantly better than any VM optimization.


回答 7

正如其他人所提到的,Python具有PyPy形式的高性能JIT编译器。

进行有意义的基准测试总是很微妙的,但是我碰巧有一个用不同语言编写的K均值的简单基准测试-您可以在这里找到。约束之一是,各种语言都应实现相同的算法,并应努力做到简单和惯用(相对于速度优化而言)。我已经编写了所有实现,所以我知道我没有被欺骗,尽管我不能对所有语言都声称我所写的东西是惯用的(我只对其中的一些有所了解)。

我没有任何明确的结论,但是PyPy是我得到的最快的实现之一,远胜于Node。相反,CPython是排名最慢的一端。

As other people have mentioned, Python has a performant JIT compiler in the form of PyPy.

Making meaningful benchmarks is always subtle, but I happen to have a simple benchmark of K-means written in different languages – you can find it here. One of the constraints was that the various languages should all implement the same algorithm and should strive to be simple and idiomatic (as opposed to optimized for speed). I have written all the implementations, so I know I have not cheated, although I cannot claim for all languages that what I have written is idiomatic (I only have a passing knowledge of some of those).

I do not claim any definitive conclusion, but PyPy was among the fastest implementations I got, far better than Node. CPython, instead, was at the slowest end of the ranking.


回答 8

该说法不完全正确

就像V8只是JS的实现一样,CPython也是Python的一种实现。Pypy具有与V8匹配的性能

此外,还存在可感知的性能问题:由于V8本质上是非阻塞的,因此Web开发人员可以节省更多的IO等待,从而导致性能更高的项目。V8主要用于以IO为关键的dev Web,因此他们将其与类似项目进行了比较。但是您可以在Web开发人员之外的许多其他领域中使用Python。您甚至可以使用C扩展来完成许多任务,例如科学计算或加密,并以出色的性能处理数据。

但是在网络上,最流行的Python和Ruby项目正在阻止。尤其是Python,它具有同步WSGI标准的遗产,像著名的Django之类的框架都基于它。

您可以编写异步Python(例如Twisted,Tornado,gevent或asyncio)或Ruby。但是它并不经常执行。最好的工具仍在阻塞。

但是,这是为什么Ruby和Python中的默认实现没有V8这么快的一些原因。

经验

就像JörgW Mittag指出的那样,从事V8工作的人都是VM天才。Python是一群热情的开发人员,在很多领域都非常擅长,但并不擅长VM调优。

资源资源

Python软件基金会的资金很少:一年不到40K的资金来投资Python。当您认为像Google,Facebook或Apple这样的大公司都在使用Python时,这有点疯狂,但这是一个丑陋的事实:大多数工作都是免费完成的。自愿者在Java手工制作之前为Youtube提供支持的语言。

他们是聪明而敬业的志愿者,但是当他们确定自己在田间需要更多果汁时,就无法要求30万聘请该领域的顶尖专家。他们必须四处寻找免费提供服务的人。

在此有效的同时,这意味着您必须非常注意自己的优先事项。因此,现在我们需要查看:

目标

即使具有最新的现代功能,编写Java脚本也很糟糕。您遇到范围问题,集合极少,可怕的字符串和数组操作,几乎没有日期,数学和正则表达式之外的stdlist,甚至对于非常常见的操作也没有语法糖。

但是在V8中,您已经有了速度。

这是因为,速度是Google的主要目标,因为它是Chrome浏览器页面渲染的瓶颈。

在Python中,可用性是主要目标。因为这几乎从来不是项目的瓶颈。这里的稀缺资源是开发人员时间。针对开发人员进行了优化。

The statement is not exactly true

Just as V8 is just one implementation of JS, CPython is just one implementation of Python. PyPy has performance matching V8’s.

Also, there is the problem of perceived performance: since V8 is natively non-blocking, web development with it leads to more performant projects because you save the I/O wait. And V8 is mainly used for web development, where I/O is key, so people compare it to similar projects. But you can use Python in many, many areas other than web development. You can even use C extensions for a lot of tasks, such as scientific computation or encryption, and crunch data with blazing performance.

But on the web, most popular Python and Ruby projects are blocking. Python, especially, has the legacy of the synchronous WSGI standard, and frameworks like the famous Django are based on it.

You can write asynchronous Python (like with Twisted, Tornado, gevent or asyncio) or Ruby. But it’s not done often. The best tools are still blocking.
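
As a rough illustration of the non-blocking style mentioned above, here is a minimal sketch using the standard-library asyncio module. The `fetch` coroutine and its delays are made up for illustration, and `asyncio.sleep` merely stands in for a real network wait.

```python
import asyncio
import time

async def fetch(name, delay):
    # asyncio.sleep stands in for a network call; it yields to the event loop
    await asyncio.sleep(delay)
    return name

async def main():
    # Both waits overlap, so the total time is close to the longest delay
    return await asyncio.gather(fetch("a", 0.2), fetch("b", 0.2))

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, round(elapsed, 2))
```

A blocking version would wait for each call in turn, so its elapsed time would be roughly the sum of the delays rather than the longest one.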

However, there are some reasons why the default implementations of Ruby and Python are not as speedy as V8.

Experience

As Jörg W Mittag pointed out, the people working on V8 are VM geniuses. Python is developed by a bunch of passionate people who are very good in a lot of domains but are not as specialized in VM tuning.

Resources

The Python Software Foundation has very little money: less than 40k a year to invest in Python. This is kind of crazy when you consider that big players such as Google, Facebook or Apple all use Python, but it’s the ugly truth: most work is done for free. The language that powers YouTube and existed before Java has been handcrafted by volunteers.

They are smart and dedicated volunteers, but when they identify they need more juice in a field, they can’t ask for 300k to hire a top notch specialist for this area of expertise. They have to look around for somebody who would do it for free.

While this works, it means you have to be very careful about your priorities. Hence, we now need to look at:

Objectives

Even with the latest modern features, writing JavaScript is terrible. You have scoping issues, very few collections, terrible string and array manipulation, almost no stdlib apart from date, maths and regexes, and no syntactic sugar even for very common operations.

But in V8, you’ve got speed.

This is because speed was the main objective for Google, since it’s a bottleneck for page rendering in Chrome.

In Python, usability is the main objective, because execution speed is almost never the bottleneck on a project. The scarce resource here is developer time, so the language is optimized for the developer.


回答 9

因为JavaScript实现不需要关心其绑定的向后兼容性。

直到最近,JavaScript实现的唯一用户还是Web浏览器。由于安全性要求,仅Web浏览器供应商有权通过向运行时编写绑定来扩展功能。因此,无需保持绑定的C API向后兼容,可以允许Web浏览器开发人员随着JavaScript运行时的发展来更新其源代码。他们反正一起工作。甚至是V8,它是游戏的后来者,也是由一个非常有经验的开发人员领导的,随着它变得越来越好,它也改变了API。

OTOH Ruby(主要)用于服务器端。许多流行的ruby扩展都被编写为C绑定(考虑RDBMS驱动程序)。换句话说,如果不保持兼容性,Ruby永远不会成功。

今天,这种差异在一定程度上仍然存在。使用node.js的开发人员抱怨说,随着V8随着时间的推移更改API,很难保持其本机扩展向后兼容(这是node.js被派生的原因之一)。在这方面,IIRC红宝石仍采取更为保守的方法。

Because JavaScript implementations need not care about backwards compatibility of their bindings.

Until recently the only users of the JavaScript implementations have been web browsers. Due to security requirements, only the web browser vendors had the privilege of extending the functionality by writing bindings to the runtimes. Thus there was no need to keep the C API of the bindings backwards compatible; it was permissible to ask the web browser developers to update their source code as the JavaScript runtimes evolved, since they were working together anyway. Even V8, which was a latecomer to the game and is also led by a very, very experienced developer, has changed its API as it became better.

OTOH Ruby is used (mainly) on the server-side. Many popular ruby extensions are written as C bindings (consider an RDBMS driver). In other words, Ruby would have never succeeded without maintaining the compatibility.

Today, the difference still exists to some extent. Developers using node.js complain that it is hard to keep their native extensions backwards compatible, as V8 changes the API over time (and that is one of the reasons node.js has been forked). IIRC Ruby still takes a much more conservative approach in this respect.


回答 10

由于JIT,曲轴,类型推断器和数据优化代码,V8速度很快。标记指针,NaN标记双打。当然,它在中间进行常规的编译器优化。

普通的ruby,python和perl引擎都不会做这些,而只是进行一些基本的优化。

唯一接近的主要虚拟机是luajit,它甚至不进行类型推断,常量折叠,NaN标记或整数,但是使用类似的小代码和数据结构,不如不良语言那么胖。我的原型动态语言potion和p2具有与luajit类似的功能,并且胜过v8。使用可选的类型系统“渐进式键入”,您可以轻松超越v8,因为您可以绕过曲轴。参见飞镖。

已知的优化后端(如pypy或jruby)仍然遭受各种过度设计技术的困扰。

V8 is fast due to the JIT, Crankshaft, the type inferencer and data-optimized code. Tagged pointers, NaN-tagging of doubles. And of course it does normal compiler optimizations in the middle.

The plain ruby, python and perl engines do none of those, just minor basic optimizations.

The only major vm which comes close is luajit, which doesn’t even do type inference, constant folding, NaN-tagging nor integers, but uses similar small code and data structures, not as fat as the bad languages. And my prototype dynamic languages, potion and p2 have similar features as luajit, and outperform v8. With an optional type system, “gradual typing”, you could easily outperform v8, as you can bypass crankshaft. See dart.

The known optimized backends, like pypy or jruby still suffer from various over-engineering techniques.


“最少惊讶”和可变默认参数

问题:“最少惊讶”和可变默认参数

长时间修改Python的任何人都被以下问题咬伤(或弄成碎片):

def foo(a=[]):
    a.append(5)
    return a

Python新手希望此函数始终返回仅包含一个元素的列表[5]。结果是非常不同的,并且非常令人惊讶(对于新手而言):

>>> foo()
[5]
>>> foo()
[5, 5]
>>> foo()
[5, 5, 5]
>>> foo()
[5, 5, 5, 5]
>>> foo()

我的一位经理曾经第一次遇到此功能,并将其称为该语言的“巨大设计缺陷”。我回答说,这种行为有一个潜在的解释,如果您不了解内部原理,那确实是非常令人困惑和意外的。但是,我无法(对自己)回答以下问题:在函数定义而不是函数执行时绑定默认参数的原因是什么?我怀疑经验丰富的行为是否具有实际用途(谁真正在C中使用了静态变量,却没有滋生bug?)

编辑

巴泽克举了一个有趣的例子。连同您的大多数评论,特别是Utaal的评论,我进一步阐述了:

>>> def a():
...     print("a executed")
...     return []
... 
>>>            
>>> def b(x=a()):
...     x.append(5)
...     print(x)
... 
a executed
>>> b()
[5]
>>> b()
[5, 5]

在我看来,设计决策似乎与将参数范围放置在何处有关:在函数内部还是“一起”使用?

在函数内部进行绑定将意味着x在调用该函数(未定义)时,该绑定实际上已绑定到指定的默认值,这会带来深层的缺陷:def从绑定的一部分(即函数对象)将在定义时发生,部分(默认参数的分配)将在函数调用时发生。

实际行为更加一致:执行该行时将评估该行的所有内容,即在函数定义时进行评估。

Anyone tinkering with Python long enough has been bitten (or torn to pieces) by the following issue:

def foo(a=[]):
    a.append(5)
    return a

Python novices would expect this function to always return a list with only one element: [5]. The result is instead very different, and very astonishing (for a novice):

>>> foo()
[5]
>>> foo()
[5, 5]
>>> foo()
[5, 5, 5]
>>> foo()
[5, 5, 5, 5]
>>> foo()

A manager of mine once had his first encounter with this feature, and called it “a dramatic design flaw” of the language. I replied that the behavior had an underlying explanation, and it is indeed very puzzling and unexpected if you don’t understand the internals. However, I was not able to answer (to myself) the following question: what is the reason for binding the default argument at function definition, and not at function execution? I doubt the experienced behavior has a practical use (who really used static variables in C, without breeding bugs?)

Edit:

Baczek made an interesting example. Together with most of your comments and Utaal’s in particular, I elaborated further:

>>> def a():
...     print("a executed")
...     return []
... 
>>>            
>>> def b(x=a()):
...     x.append(5)
...     print(x)
... 
a executed
>>> b()
[5]
>>> b()
[5, 5]

To me, it seems that the design decision was relative to where to put the scope of parameters: inside the function or “together” with it?

Doing the binding inside the function would mean that x is effectively bound to the specified default when the function is called, not defined, something that would present a deep flaw: the def line would be “hybrid” in the sense that part of the binding (of the function object) would happen at definition, and part (assignment of default parameters) at function invocation time.

The actual behavior is more consistent: everything of that line gets evaluated when that line is executed, meaning at function definition.


回答 0

实际上,这不是设计缺陷,也不是由于内部因素或性能所致。
这完全是因为Python中的函数是一流的对象,而不仅仅是一段代码。

一旦您想到这种方式,就完全有道理了:函数是根据其定义求值的对象;默认参数属于“成员数据”,因此它们的状态可能会从一个调用更改为另一个调用-完全与其他任何对象一样。

无论如何,Effbot 在Python的Default Parameter Values中都很好地解释了这种现象的原因。
我发现它很清晰,我真的建议您阅读它,以更好地了解函数对象的工作原理。

Actually, this is not a design flaw, and it is not because of internals, or performance.
It comes simply from the fact that functions in Python are first-class objects, and not only a piece of code.

As soon as you get to think into this way, then it completely makes sense: a function is an object being evaluated on its definition; default parameters are kind of “member data” and therefore their state may change from one call to the other – exactly as in any other object.
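
To make the “member data” analogy concrete, here is a small sketch that puts a hand-written callable object next to a function with a mutable default. The `Appender` class is a made-up name for illustration, not something from the question.

```python
class Appender:
    """A callable object whose state lives alongside its code."""

    def __init__(self):
        self.items = []          # created once, when the object is built

    def __call__(self):
        self.items.append(5)     # mutates the object's own state
        return self.items


def foo(a=[]):                   # the list is created once, when def runs
    a.append(5)
    return a


appender = Appender()
first_obj, second_obj = list(appender()), list(appender())
first_fn, second_fn = list(foo()), list(foo())
print(first_obj, second_obj)     # state carried between calls on the object
print(first_fn, second_fn)       # same pattern for the function's default
```

In both cases the list outlives a single call because it belongs to a long-lived object: the `Appender` instance in one case, the function object in the other.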

In any case, Effbot has a very nice explanation of the reasons for this behavior in Default Parameter Values in Python.
I found it very clear, and I really suggest reading it for a better knowledge of how function objects work.


回答 1

假设您有以下代码

fruits = ("apples", "bananas", "loganberries")

def eat(food=fruits):
    ...

当我看到eat的声明时,最令人吃惊的事情是认为,如果没有给出第一个参数,它将等于元组 ("apples", "bananas", "loganberries")

但是,假设稍后在代码中,我做类似

def some_random_function():
    global fruits
    fruits = ("blueberries", "mangos")

然后,如果默认参数是在函数执行时绑定的,而不是在函数声明时绑定的,那么我会以一种非常糟糕的方式惊讶地发现结果已经改变。与发现foo上面的功能正在使列表发生变化相比,这将使IMO更加令人惊讶。

真正的问题在于可变变量,所有语言都在一定程度上存在此问题。这是一个问题:假设在Java中,我有以下代码:

StringBuffer s = new StringBuffer("Hello World!");
Map<StringBuffer,Integer> counts = new HashMap<StringBuffer,Integer>();
counts.put(s, 5);
s.append("!!!!");
System.out.println( counts.get(s) );  // does this work?

现在,我的地图StringBuffer在放入地图时会使用密钥的值吗,还是通过引用存储密钥?无论哪种方式,都会有人感到惊讶。尝试Map使用与其放入对象的值相同的值从对象中取出对象的人,或者即使他们使用的键实际上是同一个对象,似乎也无法检索其对象的人用来将其放入地图中(这实际上就是Python不允许将其可变的内置数据类型用作字典键的原因)。

您的示例很好地说明了Python新手会感到惊讶和被咬的情况。但是我认为,如果我们“解决”这个问题,那只会造成一种不同的情况,那就是被它们咬住,而且这种情况甚至不那么直观。而且,在处理可变变量时总是如此。您总是遇到这样的情况:根据编写的代码,某人可以直观地预期一种或相反的行为。

我个人喜欢Python当前的方法:定义函数时会评估默认函数参数,而该对象始终是默认对象。我想他们可以使用空列表来特殊情况,但是这种特殊的大小写会引起更多的惊讶,更不用说向后不兼容了。

Suppose you have the following code

fruits = ("apples", "bananas", "loganberries")

def eat(food=fruits):
    ...

When I see the declaration of eat, the least astonishing thing is to think that if the first parameter is not given, it will be equal to the tuple ("apples", "bananas", "loganberries")

However, suppose that later on in the code, I do something like

def some_random_function():
    global fruits
    fruits = ("blueberries", "mangos")

then if default parameters were bound at function execution rather than function declaration then I would be astonished (in a very bad way) to discover that fruits had been changed. This would be more astonishing IMO than discovering that your foo function above was mutating the list.
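
A minimal sketch of what actually happens with the current definition-time binding, reusing the names from the snippets above (the body of `eat` is filled in here only so the result can be observed):

```python
fruits = ("apples", "bananas", "loganberries")

def eat(food=fruits):
    # The default was captured when this def statement ran
    return food

def some_random_function():
    global fruits
    fruits = ("blueberries", "mangos")   # rebinds the name, not the default

before = eat()
some_random_function()
after = eat()
print(before == after)
```

Rebinding the global name leaves the default untouched, because the function object holds a reference to the original tuple rather than to the name `fruits`.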

The real problem lies with mutable variables, and all languages have this problem to some extent. Here’s a question: suppose in Java I have the following code:

StringBuffer s = new StringBuffer("Hello World!");
Map<StringBuffer,Integer> counts = new HashMap<StringBuffer,Integer>();
counts.put(s, 5);
s.append("!!!!");
System.out.println( counts.get(s) );  // does this work?

Now, does my map use the value of the StringBuffer key when it was placed into the map, or does it store the key by reference? Either way, someone is astonished; either the person who tried to get the object out of the Map using a value identical to the one they put it in with, or the person who can’t seem to retrieve their object even though the key they’re using is literally the same object that was used to put it into the map (this is actually why Python doesn’t allow its mutable built-in data types to be used as dictionary keys).
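
The parenthetical about Python can be checked directly. This is a small sketch; the variable names are arbitrary.

```python
counts = {}
error = None
try:
    counts[["Hello"]] = 5        # a list is mutable, hence unhashable
except TypeError as exc:
    error = str(exc)

counts[("Hello",)] = 5           # an immutable tuple is accepted as a key
print(error)
print(counts)
```

Python sidesteps the Java dilemma for its built-in types by refusing mutable keys outright instead of picking one of the two surprising behaviors.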

Your example is a good one of a case where Python newcomers will be surprised and bitten. But I’d argue that if we “fixed” this, then that would only create a different situation where they’d be bitten instead, and that one would be even less intuitive. Moreover, this is always the case when dealing with mutable variables; you always run into cases where someone could intuitively expect one or the opposite behavior depending on what code they’re writing.

I personally like Python’s current approach: default function arguments are evaluated when the function is defined and that object is always the default. I suppose they could special-case using an empty list, but that kind of special casing would cause even more astonishment, not to mention be backwards incompatible.


回答 2

文档的相关部分:

执行功能定义时,默认参数值从左到右评估。这意味着,在定义函数时,表达式将被计算一次,并且每次调用均使用相同的“预计算”值。这对于理解默认参数是可变对象(例如列表或字典)时尤其重要:如果函数修改了该对象(例如,通过将项目附加到列表中),则默认值实际上已被修改。这通常不是预期的。解决此问题的方法是使用None默认值,并在函数主体中显式测试它,例如:

def whats_on_the_telly(penguin=None):
    if penguin is None:
        penguin = []
    penguin.append("property of the zoo")
    return penguin

The relevant part of the documentation:

Default parameter values are evaluated from left to right when the function definition is executed. This means that the expression is evaluated once, when the function is defined, and that the same “pre-computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use None as the default, and explicitly test for it in the body of the function, e.g.:

def whats_on_the_telly(penguin=None):
    if penguin is None:
        penguin = []
    penguin.append("property of the zoo")
    return penguin

回答 3

我对Python解释器的内部运作一无所知(而且我也不是编译器和解释器的专家),所以如果我提出任何不明智或不可能的事情,也不要怪我。

假设python对象是可变的,我认为在设计默认参数时应考虑到这一点。实例化列表时:

a = []

您希望获得由引用的列表a

为什么要a=[]

def x(a=[]):

在函数定义而不是调用上实例化一个新列表?就像您要问“如果用户不提供参数,则实例化一个新列表并像调用方产生的那样使用它”。我认为这是模棱两可的:

def x(a=datetime.datetime.now()):

用户,是否要a默认为定义或执行时的日期时间x?在这种情况下,与上一个例子一样,我将保持相同的行为,就像默认参数“赋值”是该函数的第一条指令(datetime.now()在函数调用时调用)一样。另一方面,如果用户想要定义时间映射,则可以编写:

b = datetime.datetime.now()
def x(a=b):

我知道,我知道:那是一个封闭。另外,Python可以提供一个关键字来强制定义时间绑定:

def x(static a=b):

I know nothing about the Python interpreter inner workings (and I’m not an expert in compilers and interpreters either) so don’t blame me if I propose anything unsensible or impossible.

Provided that python objects are mutable I think that this should be taken into account when designing the default arguments stuff. When you instantiate a list:

a = []

you expect to get a new list referenced by a.

Why should the a=[] in

def x(a=[]):

instantiate a new list on function definition and not on invocation? It’s just like you’re asking “if the user doesn’t provide the argument then instantiate a new list and use it as if it was produced by the caller”. I think this is ambiguous instead:

def x(a=datetime.datetime.now()):

user, do you want a to default to the datetime corresponding to when you’re defining or executing x? In this case, as in the previous one, I’ll keep the same behaviour as if the default argument “assignment” was the first instruction of the function (datetime.now() called on function invocation). On the other hand, if the user wanted the definition-time mapping he could write:

b = datetime.datetime.now()
def x(a=b):

I know, I know: that’s a closure. Alternatively Python might provide a keyword to force definition-time binding:

def x(static a=b):
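
For comparison, here is a sketch of how the two behaviors discussed above for the datetime example can be written in Python as it exists today. The function names are invented for illustration; the `static` keyword above remains hypothetical.

```python
import datetime
import time

def stamp_at_definition(when=datetime.datetime.now()):
    # Default evaluated once, when the def statement executed
    return when

def stamp_at_call(when=None):
    # None sentinel: the real default is computed on each call
    if when is None:
        when = datetime.datetime.now()
    return when

d1, c1 = stamp_at_definition(), stamp_at_call()
time.sleep(0.05)
d2, c2 = stamp_at_definition(), stamp_at_call()
print(d1 is d2, c2 > c1)
```

Both intents are expressible; the language simply makes definition-time evaluation the one that needs no extra code.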

回答 4

好吧,原因很简单:绑定是在执行代码时完成的,而函数定义是在执行时定义的。

比较一下:

class BananaBunch:
    bananas = []

    def addBanana(self, banana):
        self.bananas.append(banana)

此代码遭受完全相同的意外情况。bananas是一个类属性,因此,当您向其中添加内容时,它将被添加到该类的所有实例中。原因是完全一样的。

只是“它是如何工作的”,要使其在函数情况下以不同的方式工作可能会很复杂,而在类情况下则可能是不可能的,或者至少会大大减慢对象实例化,因为您必须保留类代码并在创建对象时执行它。

是的,这是意外的。但是一旦一分钱下降,它就完全适合Python的工作方式。实际上,这是一个很好的教学辅助工具,一旦您了解了为什么会发生这种情况,就可以更好地使用python。

也就是说,它应该在任何优秀的Python教程中都非常突出。因为正如您提到的,每个人迟早都会遇到此问题。

Well, the reason is quite simply that bindings are done when code is executed, and the function definition is executed, well… when the function is defined.

Compare this:

class BananaBunch:
    bananas = []

    def addBanana(self, banana):
        self.bananas.append(banana)

This code suffers from the exact same unexpected happenstance. bananas is a class attribute, and hence, when you add things to it, it’s added to all instances of that class. The reason is exactly the same.
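
A short sketch of the class case and its usual remedy. The class and method names here are illustrative variations on the example above, not part of the original answer.

```python
class SharedBunch:
    bananas = []                 # evaluated once, when the class body runs

    def add_banana(self, banana):
        self.bananas.append(banana)


class OwnBunch:
    def __init__(self):
        self.bananas = []        # evaluated on every instantiation

    def add_banana(self, banana):
        self.bananas.append(banana)


a, b = SharedBunch(), SharedBunch()
a.add_banana("x")
c, d = OwnBunch(), OwnBunch()
c.add_banana("x")
print(b.bananas, d.bananas)
```

Moving the assignment into `__init__` plays the same role as the `None` default in the function case: the list is created by code that runs each time, not by code that ran once at definition.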

It’s just “How It Works”, and making it work differently in the function case would probably be complicated, and in the class case likely impossible, or at least slow down object instantiation a lot, as you would have to keep the class code around and execute it when objects are created.

Yes, it is unexpected. But once the penny drops, it fits in perfectly with how Python works in general. In fact, it’s a good teaching aid, and once you understand why this happens, you’ll grok python much better.

That said it should feature prominently in any good Python tutorial. Because as you mention, everyone runs into this problem sooner or later.


回答 5

你为什么不自省?

我真的惊讶,没有人对可调用对象执行Python提供的深刻的自省(23适用)。

给定一个简单的小函数,func定义为:

>>> def func(a = []):
...    a.append(5)

当Python遇到它时,它要做的第一件事就是对其进行编译,以便code为此函数创建一个对象。完成此编译步骤后,Python 计算 *,然后默认参数([]此处为空列表)存储在函数对象本身中。正如上面提到的最高答案:a现在可以将列表视为函数的成员func

因此,让我们进行一些自省,前后检查清单如何在内部扩展在函数对象。我Python 3.x为此使用,对于Python 2同样适用(在python 2中使用__defaults__func_defaults;是的,同一事物有两个名称)。

执行前的功能:

>>> def func(a = []):
...     a.append(5)
...     

Python执行此定义后,它将采用指定的任何默认参数(a = []在此处)并将其填充到__defaults__函数对象的属性中(相关部分:Callables):

>>> func.__defaults__
([],)

好的,所以__defaults__正如您期望的那样,将空列表作为中的单个条目。

执行后功能:

现在执行以下功能:

>>> func()

现在,让我们__defaults__再次看看:

>>> func.__defaults__
([5],)

吃惊吗 对象内部的值改变了!现在,对该函数的连续调用将简单地追加到该嵌入式list对象:

>>> func(); func(); func()
>>> func.__defaults__
([5, 5, 5, 5],)

因此,出现“缺陷”的原因是因为默认参数是函数对象的一部分。这里没有什么奇怪的事情,这一切都令人惊讶。

解决此问题的常见方法是使用None默认值,然后在函数体内进行初始化:

def func(a = None):
    # or: a = [] if a is None else a
    if a is None:
        a = []

由于函数主体每次都会重新执行,因此如果没有为传递任何参数,则始终会得到一个新的空列表a


要进一步验证in中的列表__defaults__与函数中使用的列表相同,func您只需更改函数以返回函数体内使用id的列表的列表即可a。然后,把它比作在列表中__defaults__(位置[0]__defaults__),你会看到这些确实是指的同一个列表实例:

>>> def func(a = []): 
...     a.append(5)
...     return id(a)
>>>
>>> id(func.__defaults__[0]) == func()
True

具备内省的力量!


*要验证在函数编译期间Python是否评估默认参数,请尝试执行以下命令:

def bar(a=input('Did you just see me without calling the function?')): 
    pass  # use raw_input in Py2

您会注意到,input()在构建函数并将其绑定到名称的过程完成之前会被调用bar

Why don’t you introspect?

I’m really surprised no one has performed the insightful introspection offered by Python (2 and 3 apply) on callables.

Given a simple little function func defined as:

>>> def func(a = []):
...    a.append(5)

When Python encounters it, the first thing it will do is compile it in order to create a code object for this function. Once this compilation step is done, Python evaluates* and then stores the default arguments (an empty list [] here) in the function object itself. As the top answer mentioned: the list a can now be considered a member of the function func.

So, let’s do some introspection, a before and after to examine how the list gets expanded inside the function object. I’m using Python 3.x for this, for Python 2 the same applies (use __defaults__ or func_defaults in Python 2; yes, two names for the same thing).

Function Before Execution:

>>> def func(a = []):
...     a.append(5)
...     

After Python executes this definition it will take any default parameters specified (a = [] here) and cram them in the __defaults__ attribute for the function object (relevant section: Callables):

>>> func.__defaults__
([],)

O.k, so an empty list as the single entry in __defaults__, just as expected.

Function After Execution:

Let’s now execute this function:

>>> func()

Now, let’s see those __defaults__ again:

>>> func.__defaults__
([5],)

Astonished? The value inside the object changes! Consecutive calls to the function will now simply append to that embedded list object:

>>> func(); func(); func()
>>> func.__defaults__
([5, 5, 5, 5],)

So, there you have it, the reason why this ‘flaw’ happens, is because default arguments are part of the function object. There’s nothing weird going on here, it’s all just a bit surprising.

The common solution to combat this is to use None as the default and then initialize in the function body:

def func(a = None):
    # or: a = [] if a is None else a
    if a is None:
        a = []

Since the function body is executed anew each time, you always get a fresh new empty list if no argument was passed for a.
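
A minimal sketch that checks this claim side by side. The function names append_shared and append_fresh are made up for illustration and do not appear in the answer:

```python
def append_shared(a=[]):
    # The default list is created once, when the def statement runs.
    a.append(5)
    return a


def append_fresh(a=None):
    # A new list is created on every call that omits the argument.
    if a is None:
        a = []
    a.append(5)
    return a


shared_first = append_shared()
shared_second = append_shared()
fresh_first = append_fresh()
fresh_second = append_fresh()

print(shared_first is shared_second)  # True: both calls returned the default list
print(fresh_first is fresh_second)    # False: each call built its own list
print(shared_second, fresh_second)    # [5, 5] [5]
```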


To further verify that the list in __defaults__ is the same as that used in the function func you can just change your function to return the id of the list a used inside the function body. Then, compare it to the list in __defaults__ (position [0] in __defaults__) and you’ll see how these are indeed referring to the same list instance:

>>> def func(a = []): 
...     a.append(5)
...     return id(a)
>>>
>>> id(func.__defaults__[0]) == func()
True

All with the power of introspection!


* To verify that Python evaluates the default arguments when the function is defined, rather than when it is called, try executing the following:

def bar(a=input('Did you just see me without calling the function?')): 
    pass  # use raw_input in Py2

as you’ll notice, input() is called before the process of building the function and binding it to the name bar is made.


Answer 6

I used to think that creating the objects at runtime would be the better approach. I’m less certain now, since you do lose some useful features, though it may be worth it regardless simply to prevent newbie confusion. The disadvantages of doing so are:

1. Performance

def foo(arg=something_expensive_to_compute()):
    ...

If call-time evaluation is used, then the expensive function is called every time your function is used without an argument. You’d either pay an expensive price on each call, or need to manually cache the value externally, polluting your namespace and adding verbosity.

2. Forcing bound parameters

A useful trick is to bind parameters of a lambda to the current binding of a variable when the lambda is created. For example:

funcs = [ lambda i=i: i for i in range(10)]

This returns a list of functions that return 0,1,2,3… respectively. If the behaviour is changed, they will instead bind i to the call-time value of i, so you would get a list of functions that all returned 9.

The only other way to implement this would be to create a further closure with i bound, i.e.:

def make_func(i): return lambda: i
funcs = [make_func(i) for i in range(10)]
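
To make the difference concrete, here is a small sketch comparing a plain closure with the default-argument trick. The variable names are illustrative only, and the range is shortened to 3 to keep the output small:

```python
# A plain closure looks up i when it is called, after the loop has finished.
late_bound = [lambda: i for i in range(3)]

# A default argument is evaluated when each lambda is created.
early_bound = [lambda i=i: i for i in range(3)]

late_results = [f() for f in late_bound]
early_results = [f() for f in early_bound]

print(late_results)   # [2, 2, 2]
print(early_results)  # [0, 1, 2]
```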

3. Introspection

Consider the code:

def foo(a='test', b=100, c=[]):
   print a,b,c

We can get information about the arguments and defaults using the inspect module:

>>> inspect.getargspec(foo)
(['a', 'b', 'c'], None, None, ('test', 100, []))

This information is very useful for things like document generation, metaprogramming, decorators etc.

Now, suppose the behaviour of defaults could be changed so that this is the equivalent of:

_undefined = object()  # sentinel value

def foo(a=_undefined, b=_undefined, c=_undefined):
    if a is _undefined: a='test'
    if b is _undefined: b=100
    if c is _undefined: c=[]

However, we’ve lost the ability to introspect, and see what the default arguments are. Because the objects haven’t been constructed, we can’t ever get hold of them without actually calling the function. The best we could do is to store off the source code and return that as a string.


Answer 7

5 points in defense of Python

  1. Simplicity: The behavior is simple in the following sense: Most people fall into this trap only once, not several times.

  2. Consistency: Python always passes objects, not names. The default parameter is, obviously, part of the function heading (not the function body). It therefore ought to be evaluated at module load time (and only at module load time, unless nested), not at function call time.

  3. Usefulness: As Fredrik Lundh points out in his explanation of “Default Parameter Values in Python”, the current behavior can be quite useful for advanced programming. (Use sparingly.)

  4. Sufficient documentation: In the most basic Python documentation, the tutorial, the issue is loudly announced as an “Important warning” in the first subsection of Section “More on Defining Functions”. The warning even uses boldface, which is rarely applied outside of headings. RTFM: Read the fine manual.

  5. Meta-learning: Falling into the trap is actually a very helpful moment (at least if you are a reflective learner), because you will subsequently better understand the point “Consistency” above and that will teach you a great deal about Python.


Answer 8

This behavior is easily explained by:

  1. function (class etc.) declaration is executed only once, creating all default value objects
  2. everything is passed by reference

So:

def x(a=0, b=[], c=[], d=0):
    a = a + 1
    b = b + [1]
    c.append(1)
    print a, b, c

  1. a doesn’t change – each assignment creates a new int object, and the new object is printed
  2. b doesn’t change – a new list is built from the default value and printed
  3. c changes – the operation is performed on the same object, which is then printed
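
A Python 3 sketch of the same function makes the three cases checkable. It returns values instead of printing them, drops the unused parameter d, and returns a copy of c so that each result is a snapshot; those changes are choices made for this illustration:

```python
def x(a=0, b=[], c=[]):
    a = a + 1      # rebinds the local name a to a new int object
    b = b + [1]    # builds a new list; the default list is untouched
    c.append(1)    # mutates the default list in place
    return a, b, list(c)  # copy c so each result is a snapshot


first = x()
second = x()

print(first)           # (1, [1], [1])
print(second)          # (1, [1], [1, 1])
print(x.__defaults__)  # (0, [], [1, 1])
```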

Answer 9

What you’re asking is why this:

def func(a=[], b = 2):
    pass

isn’t internally equivalent to this:

def func(a=None, b = None):
    a_default = lambda: []
    b_default = lambda: 2
    def actual_func(a=None, b=None):
        if a is None: a = a_default()
        if b is None: b = b_default()
    return actual_func
func = func()

except for the case of explicitly calling func(None, None), which we’ll ignore.

In other words, instead of evaluating default parameters, why not store each of them, and evaluate them when the function is called?

One answer is probably right there–it would effectively turn every function with default parameters into a closure. Even if it’s all hidden away in the interpreter and not a full-blown closure, the data’s got to be stored somewhere. It’d be slower and use more memory.


Answer 10

1) The so-called problem of “Mutable Default Argument” is in general a special case demonstrating that:
“All functions with this problem also suffer from a similar side-effect problem on the actual parameter.”
That is against the rules of functional programming and usually undesirable, and both problems should be fixed together.

Example:

def foo(a=[]):                 # the same problematic function
    a.append(5)
    return a

>>> somevar = [1, 2]           # an example without a default parameter
>>> foo(somevar)
[1, 2, 5]
>>> somevar
[1, 2, 5]                      # usually expected [1, 2]

Solution: a copy
An absolutely safe solution is to copy or deepcopy the input object first and then do whatever is needed with the copy.

def foo(a=[]):
    a = a[:]     # a copy
    a.append(5)
    return a     # or everything safe by one line: "return a + [5]"

Many builtin mutable types have a copy method, like some_dict.copy() or some_set.copy(), or can be copied easily, like somelist[:] or list(some_list). Every object can also be copied with copy.copy(any_object), or more thoroughly with copy.deepcopy() (the latter is useful if the mutable object is composed of mutable objects). Some objects, like the “file” object, are fundamentally based on side effects and cannot be meaningfully reproduced by copying.
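
The difference between the two kinds of copy only shows up with nested mutable objects. A minimal sketch, with variable names chosen for illustration:

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = copy.copy(nested)    # new outer list, same inner lists
deep = copy.deepcopy(nested)   # new outer list and new inner lists

nested[0].append(99)           # mutate an inner list of the original

print(shallow[0])  # [1, 2, 99]: the shallow copy shares the inner list
print(deep[0])     # [1, 2]: the deep copy is fully independent
```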

Example problem for a similar SO question

class Test(object):            # the original problematic class
  def __init__(self, var1=[]):
    self._var1 = var1

somevar = [1, 2]               # an example without a default parameter
t1 = Test(somevar)
t2 = Test(somevar)
t1._var1.append([1])
print somevar                  # [1, 2, [1]] but usually expected [1, 2]
print t2._var1                 # [1, 2, [1]] but usually expected [1, 2]

Neither should the input object be saved in any public attribute of an instance returned by this function. (This assumes that, by convention, private attributes of an instance should not be modified from outside the class or its subclasses; i.e. _var1 is a private attribute.)

Conclusion:
Input parameter objects shouldn’t be modified in place (mutated), nor should they be bound into an object returned by the function. (That is, if we prefer programming without side effects, which is strongly recommended; see the Wiki article on “side effect”. The first two paragraphs are relevant in this context.)

2)
Only if the side effect on the actual parameter is required but unwanted on the default parameter is the useful solution def ...(var1=None): if var1 is None: var1 = []

3) In some cases the mutable behavior of default parameters is useful.


Answer 11

This actually has nothing to do with default values, other than that it often comes up as an unexpected behaviour when you write functions with mutable default values.

>>> def foo(a):
    a.append(5)
    print a

>>> a  = [5]
>>> foo(a)
[5, 5]
>>> foo(a)
[5, 5, 5]
>>> foo(a)
[5, 5, 5, 5]
>>> foo(a)
[5, 5, 5, 5, 5]

No default values in sight in this code, but you get exactly the same problem.

The problem is that foo is modifying a mutable variable passed in from the caller, when the caller doesn’t expect this. Code like this would be fine if the function was called something like append_5; then the caller would be calling the function in order to modify the value they pass in, and the behaviour would be expected. But such a function would be very unlikely to take a default argument, and probably wouldn’t return the list (since the caller already has a reference to that list; the one it just passed in).

Your original foo, with a default argument, shouldn’t be modifying a whether it was explicitly passed in or got the default value. Your code should leave mutable arguments alone unless it is clear from the context/name/documentation that the arguments are supposed to be modified. Using mutable values passed in as arguments as local temporaries is an extremely bad idea, whether we’re in Python or not and whether there are default arguments involved or not.

If you need to destructively manipulate a local temporary in the course of computing something, and you need to start your manipulation from an argument value, you need to make a copy.


Answer 12

This is already a busy topic, but from what I read here, the following helped me realize how it works internally:

def bar(a=[]):
     print id(a)
     a = a + [1]
     print id(a)
     return a

>>> bar()
4484370232
4484524224
[1]
>>> bar()
4484370232
4484524152
[1]
>>> bar()
4484370232 # Never change, this is 'class property' of the function
4484523720 # Always a new object 
[1]
>>> id(bar.func_defaults[0])
4484370232

Answer 13

It’s a performance optimization. As a result of this functionality, which of these two function calls do you think is faster?

def print_tuple(some_tuple=(1,2,3)):
    print some_tuple

print_tuple()        #1
print_tuple((1,2,3)) #2

I’ll give you a hint. Here’s the disassembly (see http://docs.python.org/library/dis.html):

#1

 0 LOAD_GLOBAL              0 (print_tuple)
 3 CALL_FUNCTION            0
 6 POP_TOP
 7 LOAD_CONST               0 (None)
10 RETURN_VALUE

#2

 0 LOAD_GLOBAL              0 (print_tuple)
 3 LOAD_CONST               4 ((1, 2, 3))
 6 CALL_FUNCTION            1
 9 POP_TOP
10 LOAD_CONST               0 (None)
13 RETURN_VALUE

I doubt the observed behavior has a practical use (who really used static variables in C without breeding bugs?)

As you can see, there is a performance benefit when using immutable default arguments. This can make a difference if it’s a frequently called function or the default argument takes a long time to construct. Also, bear in mind that Python isn’t C. In C you have constants that are pretty much free. In Python you don’t have this benefit.


Answer 14

Python: The Mutable Default Argument

Default arguments get evaluated when the def statement is executed and the function object is created. Each time the function uses them, across any number of calls, they are and remain the same object.

When they are mutable and get mutated (for example, by adding an element to a list), they remain mutated on consecutive calls.

They stay mutated because they are the same object each time.

Equivalent code:

Since the list is bound to the function when the function object is compiled and instantiated, this:

def foo(mutable_default_argument=[]): # make a list the default argument
    """function that uses a list"""

is almost exactly equivalent to this:

_a_list = [] # create a list in the globals

def foo(mutable_default_argument=_a_list): # make it the default argument
    """function that uses a list"""

del _a_list # remove globals name binding
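
A short sketch to check this equivalence in Python 3. The function body is an assumption added so there is something to observe, and it returns a copy so each result is a snapshot:

```python
_a_list = []  # create a list in the globals


def foo(mutable_default_argument=_a_list):
    """function that uses a list"""
    mutable_default_argument.append(1)
    return list(mutable_default_argument)  # return a snapshot


# The stored default is the very object that was bound to _a_list.
same_object = foo.__defaults__[0] is _a_list
del _a_list  # the function object keeps its own reference

first = foo()
second = foo()

print(same_object)    # True
print(first, second)  # [1] [1, 1]
```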

Demonstration

Here’s a demonstration – you can verify that they are the same object each time they are referenced by

  • seeing that the list is created before the function has finished compiling to a function object,
  • observing that the id is the same each time the list is referenced,
  • observing that the list stays changed when the function that uses it is called a second time,
  • observing the order in which the output is printed from the source (which I conveniently numbered for you):

example.py

print('1. Global scope being evaluated')

def create_list():
    '''noisily create a list for usage as a kwarg'''
    l = []
    print('3. list being created and returned, id: ' + str(id(l)))
    return l

print('2. example_function about to be compiled to an object')

def example_function(default_kwarg1=create_list()):
    print('appending "a" in default default_kwarg1')
    default_kwarg1.append("a")
    print('list with id: ' + str(id(default_kwarg1)) + 
          ' - is now: ' + repr(default_kwarg1))

print('4. example_function compiled: ' + repr(example_function))


if __name__ == '__main__':
    print('5. calling example_function twice!:')
    example_function()
    example_function()

and running it with python example.py:

1. Global scope being evaluated
2. example_function about to be compiled to an object
3. list being created and returned, id: 140502758808032
4. example_function compiled: <function example_function at 0x7fc9590905f0>
5. calling example_function twice!:
appending "a" in default default_kwarg1
list with id: 140502758808032 - is now: ['a']
appending "a" in default default_kwarg1
list with id: 140502758808032 - is now: ['a', 'a']

Does this violate the principle of “Least Astonishment”?

This order of execution is frequently confusing to new users of Python. If you understand the Python execution model, then it becomes quite expected.

The usual instruction to new Python users:

But this is why the usual instruction to new users is to create their default arguments like this instead:

def example_function_2(default_kwarg=None):
    if default_kwarg is None:
        default_kwarg = []

This uses the None singleton as a sentinel object to tell the function whether or not we’ve gotten an argument other than the default. If we get no argument, then we actually want to use a new empty list, [], as the default.

As the tutorial section on control flow says:

If you don’t want the default to be shared between subsequent calls, you can write the function like this instead:

def f(a, L=None):
    if L is None:
        L = []
    L.append(a)
    return L

Answer 15

The shortest answer would probably be “definition is execution”, therefore the whole argument makes no strict sense. As a more contrived example, you may cite this:

def a(): return []

def b(x=a()):
    print x

Hopefully it’s enough to show that not executing the default argument expressions at the execution time of the def statement isn’t easy or doesn’t make sense, or both.

I agree it’s a gotcha when you try to use default constructors, though.


Answer 16

A simple workaround using None

>>> def bar(b, data=None):
...     data = data or []
...     data.append(b)
...     return data
... 
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3)
[3]
>>> bar(3, [34])
[34, 3]
>>> bar(3, [34])
[34, 3]
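
One caveat worth knowing about the data or [] shortcut: it also replaces an empty list that the caller passed on purpose, because an empty list is falsy. The sketch below contrasts it with an explicit None test. The names bar_or and bar_is_none are made up for this comparison:

```python
def bar_or(b, data=None):
    data = data or []      # also replaces an empty list passed by the caller
    data.append(b)
    return data


def bar_is_none(b, data=None):
    if data is None:       # only replaces a missing argument
        data = []
    data.append(b)
    return data


mine_or = []
bar_or(3, mine_or)
print(mine_or)       # []: the caller's empty list was silently swapped out

mine_is_none = []
bar_is_none(3, mine_is_none)
print(mine_is_none)  # [3]: the caller's list was used
```

Whether that matters depends on whether callers expect their own list to be updated.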

Answer 17

This behavior is not surprising if you take the following into consideration:

  1. The behavior of read-only class attributes upon assignment attempts, and
  2. Functions are objects (explained well in the accepted answer).

The role of (2) has been covered extensively in this thread. (1) is likely the astonishment causing factor, as this behavior is not “intuitive” when coming from other languages.

(1) is described in the Python tutorial on classes. In an attempt to assign a value to a read-only class attribute:

…all variables found outside of the innermost scope are read-only (an attempt to write to such a variable will simply create a new local variable in the innermost scope, leaving the identically named outer variable unchanged).

Look back to the original example and consider the above points:

def foo(a=[]):
    a.append(5)
    return a

Here foo is an object and a is an attribute of foo (available at foo.func_defaults[0] in Python 2, or foo.__defaults__[0] in Python 3). Since a is a list, a is mutable and is thus a read-write attribute of foo. It is initialized to the empty list as specified by the signature when the function is instantiated, and is available for reading and writing as long as the function object exists.

Calling foo without overriding a default uses that default’s value from foo.func_defaults. In this case, foo.func_defaults[0] is used for a within the function object’s code scope. Changes to a change foo.func_defaults[0], which is part of the foo object and persists between executions of the code in foo.

Now, compare this to the example from the documentation on emulating the default argument behavior of other languages, such that the function signature defaults are used every time the function is executed:

def foo(a, L=None):
    if L is None:
        L = []
    L.append(a)
    return L

Taking (1) and (2) into account, one can see why this accomplishes the desired behavior:

  • When the foo function object is instantiated, foo.func_defaults[0] is set to None, an immutable object.
  • When the function is executed with defaults (with no parameter specified for L in the function call), foo.func_defaults[0] (None) is available in the local scope as L.
  • Upon L = [], the assignment cannot succeed at foo.func_defaults[0], because that attribute is read-only.
  • Per (1), a new local variable also named L is created in the local scope and used for the remainder of the function call. foo.func_defaults[0] thus remains unchanged for future invocations of foo.
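
The bullet points above can be checked with a short sketch. It uses __defaults__, the Python 3 name for the tuple of stored defaults; the variable names before, after, r1 and r2 are illustrative:

```python
def foo(a, L=None):
    if L is None:
        L = []   # rebinds the local name L only; the stored default stays None
    L.append(a)
    return L


before = foo.__defaults__
r1 = foo(1)
r2 = foo(2)
after = foo.__defaults__

print(before, after)  # (None,) (None,)
print(r1, r2)         # [1] [2]
```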

Answer 18

I am going to demonstrate an alternative structure to pass a default list value to a function (it works equally well with dictionaries).

As others have extensively commented, the list parameter is bound to the function when it is defined as opposed to when it is executed. Because lists and dictionaries are mutable, any alteration to this parameter will affect other calls to this function. As a result, subsequent calls to the function will receive this shared list which may have been altered by any other calls to the function. Worse yet, two callers may be using this function’s shared list at the same time, each oblivious to the changes made by the other.

Wrong Method (probably…):

def foo(list_arg=[5]):
    return list_arg

a = foo()
a.append(6)
>>> a
[5, 6]

b = foo()
b.append(7)
# The value of 6 appended to variable 'a' is now part of the list held by 'b'.
>>> b
[5, 6, 7]  

# Although 'a' is expecting to receive 6 (the last element it appended to the list),
# it actually receives the last element appended to the shared list.
# It thus receives the value 7 previously appended by 'b'.
>>> a.pop()             
7

You can verify that they are one and the same object by using id:

>>> id(a)
5347866528

>>> id(b)
5347866528

Per Brett Slatkin’s “Effective Python: 59 Specific Ways to Write Better Python”, Item 20: Use None and Docstrings to specify dynamic default arguments (p. 48)

The convention for achieving the desired result in Python is to provide a default value of None and to document the actual behaviour in the docstring.

This implementation ensures that each call to the function either receives the default list or else the list passed to the function.

Preferred Method:

def foo(list_arg=None):
   """
   :param list_arg:  A list of input values.
                     If not provided, a new list containing the default value 5 is used.
   """
   if list_arg is None:  # an explicitly passed empty list is kept as is
       list_arg = [5]
   return list_arg

a = foo()
a.append(6)
>>> a
[5, 6]

b = foo()
b.append(7)
>>> b
[5, 7]

c = foo([10])
c.append(11)
>>> c
[10, 11]

There may be legitimate use cases for the ‘Wrong Method’ whereby the programmer intended the default list parameter to be shared, but this is more likely the exception than the rule.


Answer 19

The solutions here are:

  1. Use None as your default value (or a nonce object), and switch on that to create your values at runtime; or
  2. Use a lambda as your default parameter, and call it within a try block to get the default value (this is the sort of thing that lambda abstraction is for).

The second option is nice because users of the function can pass in a callable, which may already exist (such as a type).
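
A minimal sketch of the second option, assuming a hypothetical function append_five that accepts either a ready-made list or a zero-argument callable that builds one:

```python
def append_five(a=lambda: []):
    try:
        a = a()      # a callable (including the default lambda): build the value now
    except TypeError:
        pass         # not callable: treat a as the value itself
    a.append(5)
    return a


print(append_five())        # [5]
print(append_five())        # [5]: the lambda built a fresh list again
print(append_five([1, 2]))  # [1, 2, 5]
print(append_five(list))    # [5]: an existing type works as the factory
```

One limitation of this sketch: a TypeError raised inside the callable itself would also be swallowed by the try block. Testing with callable(a) before calling is an alternative that avoids this.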


Answer 20

When we do this:

def foo(a=[]):
    ...

… we assign the argument a to an unnamed list, if the caller does not pass the value of a.

To make things simpler for this discussion, let’s temporarily give the unnamed list a name. How about pavlo ?

def foo(a=pavlo):
   ...

At any time, if the caller doesn’t tell us what a is, we reuse pavlo.

If pavlo is mutable (modifiable), and foo ends up modifying it, we notice the effect the next time foo is called without specifying a.

So this is what you see (Remember, pavlo is initialized to []):

 >>> foo()
 [5]

Now, pavlo is [5].

Calling foo() again modifies pavlo again:

>>> foo()
[5, 5]

Specifying a when calling foo() ensures pavlo is not touched.

>>> ivan = [1, 2, 3, 4]
>>> foo(a=ivan)
[1, 2, 3, 4, 5]
>>> ivan
[1, 2, 3, 4, 5]

So, pavlo is still [5, 5].

>>> foo()
[5, 5, 5]
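
The whole walkthrough can be reproduced as a small script. This is only a sketch: the body of foo is elided above, so the version here (append 5, return the list) is an assumption chosen to match the outputs shown.

```python
pavlo = []            # the "unnamed" default list, given a name for the demo


def foo(a=pavlo):     # the default is bound to pavlo once, when def runs
    a.append(5)       # assumed body, consistent with the outputs above
    return a


first = foo()                         # modifies pavlo
assert first is pavlo and pavlo == [5]

foo()                                 # modifies pavlo again
assert pavlo == [5, 5]

ivan = [1, 2, 3, 4]
foo(a=ivan)                           # pavlo is not touched this time
assert ivan == [1, 2, 3, 4, 5]
assert pavlo == [5, 5]

assert foo() == [5, 5, 5]             # back to the shared default
assert foo.__defaults__[0] is pavlo   # the function object holds the reference
```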

Answer 21

I sometimes exploit this behavior as an alternative to the following pattern:

singleton = None

def use_singleton():
    global singleton

    if singleton is None:
        singleton = _make_singleton()

    return singleton.use_me()

If singleton is only used by use_singleton, I like the following pattern as a replacement:

# _make_singleton() is called only once when the def is executed
def use_singleton(singleton=_make_singleton()):
    return singleton.use_me()

I’ve used this for instantiating client classes that access external resources, and also for creating dicts or lists for memoization.

Since I don’t think this pattern is well known, I do put a short comment in to guard against future misunderstandings.
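
As a sketch of the memoization use mentioned above (the function `fib` and the parameter name `_cache` are illustrative, not taken from any particular codebase):

```python
def fib(n, _cache={0: 0, 1: 1}):
    # _cache is created once, when the def statement is executed, and is
    # deliberately shared between calls; callers should not pass it.
    if n not in _cache:
        _cache[n] = fib(n - 1) + fib(n - 2)
    return _cache[n]
```

Because every call sees the same dict, each value is computed only once, and the cache stays private to the function instead of living in a module-level global.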


Answer 22

You can get round this by replacing the object (and therefore the tie with the scope):

def foo(a=[]):
    a = list(a)
    a.append(5)
    return a

Ugly, but it works.


Answer 23

It may be true that:

  1. Someone is using every language/library feature, and
  2. Switching the behavior here would be ill-advised, but

it is entirely consistent to hold both of the points above and still make a third:

  3. It is a confusing feature, and an unfortunate one to have in Python.

The other answers, or at least some of them, either make points 1 and 2 but not 3, or make point 3 and downplay points 1 and 2. But all three are true.

It may be true that switching horses in midstream here would be asking for significant breakage, and that there could be more problems created by changing Python to intuitively handle Stefano’s opening snippet. And it may be true that someone who knew Python internals well could explain a minefield of consequences. However,

The existing behavior is not Pythonic, and Python is successful because very little about the language violates the principle of least astonishment anywhere near this badly. It is a real problem, whether or not it would be wise to uproot it. It is a design flaw. If the argument is that you understand the language much better by tracing out the behavior, I can say that C++ does all of this and more; you learn a lot by navigating, for instance, subtle pointer errors. But this is not Pythonic: people who care about Python enough to persevere in the face of this behavior are people who are drawn to the language because Python has far fewer surprises than other languages. Dabblers and the curious become Pythonistas when they are astonished at how little time it takes to get something working, not because of a design fl... I mean, hidden logic puzzle, that cuts against the intuitions of programmers who are drawn to Python because it Just Works.


Answer 24

This is not a design flaw. Anyone who trips over this is doing something wrong.

There are 3 cases I see where you might run into this problem:

  1. You intend to modify the argument as a side effect of the function. In this case it never makes sense to have a default argument. The only exception is when you’re abusing the argument list to have function attributes, e.g. cache={}, and you wouldn’t be expected to call the function with an actual argument at all.
  2. You intend to leave the argument unmodified, but you accidentally did modify it. That’s a bug, fix it.
  3. You intend to modify the argument for use inside the function, but didn’t expect the modification to be viewable outside of the function. In that case you need to make a copy of the argument, whether it was the default or not! Python is not a call-by-value language so it doesn’t make the copy for you, you need to be explicit about it.

The example in the question could fall into category 1 or 3. It’s odd that it both modifies the passed list and returns it; you should pick one or the other.
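
Case 3 can be sketched like this; `with_zero_sorted` is a hypothetical function, chosen only to show the explicit copy:

```python
def with_zero_sorted(values=()):
    # Copy first: the caller's object (or the default) is never modified.
    working = list(values)
    working.append(0)
    working.sort()
    return working


data = [3, 1]
result = with_zero_sorted(data)   # result is a new list; data is unchanged
```

Since the function never mutates its argument, an immutable default such as an empty tuple is enough, and the question of a shared mutable default does not arise.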


Answer 25

This “bug” gave me a lot of overtime work hours! But I’m beginning to see a potential use for it (though I would still have preferred the default to be evaluated at execution time).

I’m gonna give you what I see as a useful example.

def example(errors=[]):
    # statements
    # Something went wrong
    mistake = True
    if mistake:
        tryToFixIt(errors)
        # Didn't work.. let's try again
        tryToFixItAnotherway(errors)
        # This time it worked
    return errors

def tryToFixIt(err):
    err.append('Attempt to fix it')

def tryToFixItAnotherway(err):
    err.append('Attempt to fix it by another way')

def main():
    for item in range(2):
        errors = example()
    print('\n'.join(errors))

main()

prints the following

Attempt to fix it
Attempt to fix it by another way
Attempt to fix it
Attempt to fix it by another way

Answer 26

Just change the function to be:

def notastonishinganymore(a = []): 
    '''The name is just a joke :)'''
    a = a[:]
    a.append(5)
    return a

Answer 27

I think the answer to this question lies in how Python passes data to parameters (by value or by reference), not in mutability or in how Python handles the “def” statement.

A brief introduction. First, there are two kinds of data types in Python: simple elementary types, such as numbers, and objects. Second, when passing data to parameters, Python passes elementary types by value, i.e., it makes a local copy of the value in a local variable, but passes objects by reference, i.e., as pointers to the object.

Granting the two points above, let’s explain what happens in the Python code. The behavior arises only because objects are passed by reference; it has nothing to do with mutable versus immutable, or arguably with the fact that the “def” statement is executed only once, when the function is defined.

[] is an object, so Python passes a reference to [] to a; that is, a is only a pointer to [], which lives in memory as an object. There is only one copy of [], but possibly many references to it. On the first foo(), the append method changes the list [] to [1]. Note that there is still only one copy of the list object, and that object is now [1]. When the second foo() runs, what the effbot webpage says (that items is not evaluated any more) is wrong: a is evaluated to be the list object, although the object’s content is now [1]. This is the effect of passing by reference! The result of foo(3) can be derived in the same way.

To further validate my answer, let’s take a look at two more pieces of code.

====== No. 2 ========

def foo(x, items=None):
    if items is None:
        items = []
    items.append(x)
    return items

foo(1)  #return [1]
foo(2)  #return [2]
foo(3)  #return [3]

[] is an object, and so is None (the former is mutable and the latter immutable, but mutability has nothing to do with the question). None lives somewhere in memory; we know it is there, and there is only one copy of it. So every time foo is invoked, items is evaluated (contrary to the answers that say it is evaluated only once) to be None, or more precisely the reference (address) of None. Then, inside foo, items is changed to [], i.e., it points to another object with a different address.

====== No. 3 =======

def foo(x, items=[]):
    items.append(x)
    return items

foo(1)    # returns [1]
foo(2,[]) # returns [2]
foo(3)    # returns [1,3]

The invocation of foo(1) makes items point to a list object [] at some address, say 11111111. The body of foo then changes the content of the list to [1], but the address does not change; it is still 11111111. Then comes foo(2,[]). Although the [] in foo(2,[]) has the same content as the default [] had when foo(1) was called, their addresses are different! Since we provide the argument explicitly, items has to take the address of this new [], say 2222222, and foo returns it after making its change. Now foo(3) is executed. Since only x is provided, items has to take its default value again. What is the default value? It was set when the foo function was defined: the list object located at 11111111. So items is evaluated to be the list at address 11111111, which holds one element, 1. The list located at 2222222 also holds one element, 2, but items no longer points to it. Consequently, appending 3 makes items [1,3].
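
The address bookkeeping in this walkthrough can be checked with identity comparisons instead of made-up addresses. A small sketch, where `default_list` is just a local name for the object stored on the function:

```python
def foo(x, items=[]):
    items.append(x)
    return items


default_list = foo.__defaults__[0]   # the list created when def was executed

first = foo(1)        # uses the default list
second = foo(2, [])   # uses a brand-new list supplied by the caller
third = foo(3)        # uses the default list again

assert first is default_list
assert second is not default_list
assert third is default_list
assert second == [2]
assert third == [1, 3]
```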

From the above explanations, we can see that the effbot webpage recommended in the accepted answer fails to give a relevant answer to this question. What is more, I think one point on the effbot webpage is wrong. I think the code regarding UI.Button is correct:

for i in range(10):
    def callback():
        print "clicked button", i
    UI.Button("button %s" % i, callback)

Each button can hold a distinct callback function, which will display a different value of i. I can provide an example to show this:

x=[]
for i in range(10):
    def callback():
        print(i)
    x.append(callback) 

If we execute x[7]() we’ll get 7 as expected, and x[9]() will give 9, another value of i.
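
It is worth running that last example before relying on it. In the sketch below (a variant in which the callbacks return i instead of printing it, so the results can be compared), every callback looks up i when it is called, by which time the loop has finished and i is 9. Binding i as a default argument, which is evaluated when each def statement runs, is one way to get a distinct value per callback. The list names `late` and `early` are illustrative.

```python
late = []
for i in range(10):
    def callback():
        return i          # i is looked up when callback() is called
    late.append(callback)

early = []
for i in range(10):
    def callback(i=i):    # default evaluated now, once per def statement
        return i
    early.append(callback)

print(late[7](), late[9]())    # 9 9
print(early[7](), early[9]())  # 7 9
```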


Answer 28

TLDR: Define-time defaults are consistent and strictly more expressive.


Defining a function affects two scopes: the defining scope containing the function, and the execution scope contained by the function. While it is pretty clear how blocks map to scopes, the question is where def <name>(<args=defaults>): belongs:

...                           # defining scope
def name(parameter=default):  # ???
    ...                       # execution scope

The def name part must evaluate in the defining scope – we want name to be available there, after all. Evaluating the function only inside itself would make it inaccessible.

Since parameter is a constant name, we can “evaluate” it at the same time as def name. This also has the advantage that it produces the function with a known signature, name(parameter=...):, instead of a bare name(...):.

Now, when to evaluate default?

Consistency already says “at definition”: everything else in def <name>(<args=defaults>): is best evaluated at definition as well. Delaying parts of it would be the astonishing choice.

The two choices are not equivalent, either: If default is evaluated at definition time, it can still affect execution time. If default is evaluated at execution time, it cannot affect definition time. Choosing “at definition” allows expressing both cases, while choosing “at execution” can express only one:

def name(parameter=defined):  # set default at definition time
    ...

def name(parameter=default):     # delay default until execution time
    parameter = default if parameter is None else parameter
    ...
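
A runnable sketch of the two cases, using a counter so the evaluation time is visible. The helper `next_id` and the functions `eager` and `lazy` are hypothetical names for illustration:

```python
import itertools

_counter = itertools.count()


def next_id():
    """Return 0, 1, 2, ... on successive calls."""
    return next(_counter)


def eager(ident=next_id()):   # next_id() runs once, when def is executed
    return ident


def lazy(ident=None):         # None marks "not given"
    if ident is None:
        ident = next_id()     # next_id() runs on every call that omits ident
    return ident
```

Here eager() keeps returning the value computed at definition time, while each lazy() call draws a new one. With definition-time defaults both behaviors can be written; with execution-time defaults the first could not be expressed in the signature.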

Answer 29

Every other answer explains why this is actually a nice and desired behavior, or why you shouldn’t be needing this anyway. Mine is for those stubborn ones who want to exercise their right to bend the language to their will, not the other way around.

We will “fix” this behavior with a decorator that will copy the default value instead of reusing the same instance for each positional argument left at its default value.

import inspect
from copy import copy

def sanify(function):
    def wrapper(*a, **kw):
        # store the default values
        defaults = inspect.getfullargspec(function).defaults  # Python 3; on Python 2 use inspect.getargspec
        # construct a new argument list
        new_args = []
        for i, arg in enumerate(defaults):
            # allow passing positional arguments
            if i < len(a):
                new_args.append(a[i])
            else:
                # copy the value
                new_args.append(copy(arg))
        return function(*new_args, **kw)
    return wrapper

Now let’s redefine our function using this decorator:

@sanify
def foo(a=[]):
    a.append(5)
    return a

foo() # '[5]'
foo() # '[5]' -- as desired

This is particularly neat for functions that take multiple arguments. Compare:

# the 'correct' approach
def bar(a=None, b=None, c=None):
    if a is None:
        a = []
    if b is None:
        b = []
    if c is None:
        c = []
    # finally do the actual work

with

# the nasty decorator hack
@sanify
def bar(a=[], b=[], c=[]):
    # wow, works right out of the box!

It’s important to note that the above solution breaks if you try to use keyword args, like so:

foo(a=[4])

The decorator could be adjusted to allow for that, but we leave this as an exercise for the reader ;)
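
For readers who would rather not do the exercise, here is one possible adjustment, offered as a sketch and not as the original author's solution. It assumes Python 3 and uses inspect.signature to find which parameters the caller left out; the name `sanify_kw` is made up.

```python
import functools
import inspect
from copy import copy


def sanify_kw(function):
    """Copy each default the caller did not override, keyword or positional."""
    signature = inspect.signature(function)

    @functools.wraps(function)
    def wrapper(*args, **kwargs):
        bound = signature.bind(*args, **kwargs)
        for name, param in signature.parameters.items():
            has_default = param.default is not inspect.Parameter.empty
            if name not in bound.arguments and has_default:
                # Shallow copy, as in the decorator above.
                bound.arguments[name] = copy(param.default)
        return function(*bound.args, **bound.kwargs)

    return wrapper


@sanify_kw
def foo(a=[]):
    a.append(5)
    return a
```

With this version foo() returns [5] on every call, and foo(a=[4]) returns [4, 5] instead of breaking.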