
Does Python have an equivalent to Java Class.forName()?

Question: Does Python have an equivalent to Java Class.forName()?

I have the need to take a string argument and create an object of the class named in that string in Python. In Java, I would use Class.forName().newInstance(). Is there an equivalent in Python?


Thanks for the responses. To answer those who want to know what I’m doing: I want to use a command line argument as the class name, and instantiate it. I’m actually programming in Jython and instantiating Java classes, hence the Java-ness of the question. getattr() works great. Thanks much.


Answer 0

Reflection in Python is a lot easier and far more flexible than it is in Java.

I recommend reading this tutorial

There’s no direct function (that I know of) which takes a fully qualified class name and returns the class, however you have all the pieces needed to build that, and you can connect them together.

One bit of advice though: don’t try to program in Java style when you’re in Python.

If you can explain what it is that you’re trying to do, maybe we can help you find a more Pythonic way of doing it.

Here’s a function that does what you want:

def get_class( kls ):
    parts = kls.split('.')
    module = ".".join(parts[:-1])
    m = __import__( module )
    for comp in parts[1:]:
        m = getattr(m, comp)            
    return m

You can use the return value of this function as if it were the class itself.

Here’s a usage example:

>>> D = get_class("datetime.datetime")
>>> D
<type 'datetime.datetime'>
>>> D.now()
datetime.datetime(2009, 1, 17, 2, 15, 58, 883000)
>>> a = D( 2010, 4, 22 )
>>> a
datetime.datetime(2010, 4, 22, 0, 0)
>>> 

How does that work?

We’re using __import__ to import the module that holds the class, which requires that we first extract the module name from the fully qualified name. Then we import the module:

m = __import__( module )

In this case, m will only refer to the top-level module.

For example, if your class lives in the foo.baz module, then m will be the module foo. We can easily obtain a reference to foo.baz using getattr( m, 'baz' ).
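
A quick check of this behaviour, using the standard-library package xml.dom.minidom as a stand-in for the hypothetical foo.baz (the choice of package is only for illustration):

```python
import importlib

# __import__ with a dotted name imports every package along the path,
# but returns the top-level package, not the leaf module.
top = __import__("xml.dom.minidom")
print(top.__name__)        # xml

# The leaf module is reachable by walking attributes from the top.
leaf = getattr(getattr(top, "dom"), "minidom")
print(leaf.__name__)       # xml.dom.minidom

# importlib.import_module returns the leaf module directly.
same_leaf = importlib.import_module("xml.dom.minidom")
print(leaf is same_leaf)   # True
```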

To get from the top-level module to the class, we have to call getattr repeatedly on the remaining parts of the class name.

Say, for example, your class name is foo.baz.bar.Model; then we do this:

m = __import__( "foo.baz.bar" ) #m is package foo
m = getattr( m, "baz" ) #m is package baz
m = getattr( m, "bar" ) #m is module bar
m = getattr( m, "Model" ) #m is class Model

This is what’s happening in this loop:

for comp in parts[1:]:
    m = getattr(m, comp)    

At the end of the loop, m will be a reference to the class. This means that m is actually the class itself, so you can do, for instance:

a = m() #instantiate a new instance of the class    
b = m( arg1, arg2 ) # pass arguments to the constructor

Answer 1

Assuming the class is in your scope:

globals()['classname'](args, to, constructor)

Otherwise:

getattr(someModule, 'classname')(args, to, constructor)

Edit: Note, you can’t give a name like ‘foo.bar’ to getattr. You’ll need to split it by . and call getattr() on each piece left-to-right. This will handle that:

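from functools import reduce  # builtin on Python 2; this import is required on Python 3
# globals()[module] assumes the top-level module 'foo' is already imported in this scope
module, rest = 'foo.bar.baz'.split('.', 1)
fooBar = reduce(lambda a, b: getattr(a, b), rest.split('.'), globals()[module])
someVar = fooBar(args, to, constructor)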
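
For reference, a self-contained sketch of the same left-to-right walk that runs on Python 3, where reduce lives in functools. The helper name resolve_attr and the use of the os module are illustrative assumptions, not part of the answer above:

```python
import os
from functools import reduce


def resolve_attr(root, dotted_path):
    """Follow a dotted attribute path starting from an already-imported object."""
    return reduce(getattr, dotted_path.split("."), root)


join = resolve_attr(os, "path.join")
print(join is os.path.join)  # True
```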

Answer 2

def import_class_from_string(path):
    from importlib import import_module
    module_path, _, class_name = path.rpartition('.')
    mod = import_module(module_path)
    klass = getattr(mod, class_name)
    return klass

Usage

In [59]: raise import_class_from_string('google.appengine.runtime.apiproxy_errors.DeadlineExceededError')()
---------------------------------------------------------------------------
DeadlineExceededError                     Traceback (most recent call last)
<ipython-input-59-b4e59d809b2f> in <module>()
----> 1 raise import_class_from_string('google.appengine.runtime.apiproxy_errors.DeadlineExceededError')()

DeadlineExceededError: 
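
The traceback above depends on the App Engine SDK being installed. As a sketch that needs only the standard library, the same helper can be exercised like this (collections.OrderedDict is just a convenient example class):

```python
from importlib import import_module


def import_class_from_string(path):
    # Split "package.module.ClassName" at the last dot.
    module_path, _, class_name = path.rpartition('.')
    mod = import_module(module_path)
    return getattr(mod, class_name)


OrderedDict = import_class_from_string('collections.OrderedDict')
d = OrderedDict(a=1)
print(type(d).__name__)  # OrderedDict
```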

Answer 3

Yet another implementation.

def import_class(class_string):
    """Returns class object specified by a string.

    Args:
        class_string: The string representing a class.

    Raises:
        ValueError if module part of the class is not specified.
    """
    module_name, _, class_name = class_string.rpartition('.')
    if module_name == '':
        raise ValueError('Class name must contain module part.')
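    # level=-1 (try a relative import, then an absolute one) exists only in
    # Python 2; Python 3 rejects negative levels, so pass 0 there.
    return getattr(
        __import__(module_name, globals(), locals(), [class_name], -1),
        class_name)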
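
A possible Python 3 adaptation, shown here as a sketch rather than the author’s code, passes level=0 (absolute import). The non-empty fromlist is what makes __import__ return the leaf module instead of the top-level package:

```python
def import_class(class_string):
    """Return the class object named by a dotted string (Python 3 sketch)."""
    module_name, _, class_name = class_string.rpartition('.')
    if module_name == '':
        raise ValueError('Class name must contain module part.')
    # level=0 requests an absolute import; a non-empty fromlist makes
    # __import__ return the leaf module rather than the top-level package.
    module = __import__(module_name, globals(), locals(), [class_name], 0)
    return getattr(module, class_name)


Mapping = import_class('collections.abc.Mapping')
print(issubclass(dict, Mapping))  # True
```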

Answer 4

It seems you’re approaching this from the middle instead of the beginning. What are you really trying to do? Finding the class associated with a given string is a means to an end.

If you clarify your problem, which might require your own mental refactoring, a better solution may present itself.

For instance: Are you trying to load a saved object based on its type name and a set of parameters? Python spells this unpickling and you should look at the pickle module. And even though the unpickling process does exactly what you describe, you don’t have to worry about how it works internally:

>>> class A(object):
...   def __init__(self, v):
...     self.v = v
...   def __reduce__(self):
...     return (self.__class__, (self.v,))
>>> a = A("example")
>>> import pickle
>>> b = pickle.loads(pickle.dumps(a))
>>> a.v, b.v
('example', 'example')
>>> a is b
False

Answer 5

This is found in the Python standard library, as unittest.TestLoader.loadTestsFromName. Unfortunately the method goes on to do additional test-related activities, but this first half looks reusable. I’ve edited it to remove the test-related functionality:

def get_object(name):
    """Retrieve a python object, given its dotted.name."""
    parts = name.split('.')
    parts_copy = parts[:]
    while parts_copy:
        try:
            module = __import__('.'.join(parts_copy))
            break
        except ImportError:
            del parts_copy[-1]
            if not parts_copy: raise
    parts = parts[1:]

    obj = module
    for part in parts:
        parent, obj = obj, getattr(obj, part)

    return obj
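
A sketch of how the fallback plays out on Python 3, where ModuleNotFoundError is a subclass of ImportError: for a name such as json.decoder.JSONDecoder, importing the full string fails, so the last component is dropped until json.decoder imports, and the rest is resolved with getattr. The function below is a condensed restatement for illustration, with resolve_dotted as a made-up name:

```python
def resolve_dotted(name):
    """Import the longest importable prefix of `name`, then walk attributes."""
    parts = name.split('.')
    prefix = parts[:]
    while prefix:
        try:
            # Returns the top-level package even for a dotted prefix.
            obj = __import__('.'.join(prefix))
            break
        except ImportError:
            prefix.pop()
            if not prefix:
                raise
    for part in parts[1:]:
        obj = getattr(obj, part)
    return obj


cls = resolve_dotted('json.decoder.JSONDecoder')
print(cls.__name__)  # JSONDecoder
```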

Answer 6

I needed to get objects for all existing classes in my_package. So I import all necessary classes into my_package’s __init__.py.

So my directory structure is like this:

/my_package
    - __init__.py
    - module1.py
    - module2.py
    - module3.py

And my __init__.py looks like this:

from .module1 import ClassA
from .module2 import ClassB

Then I create a function like this:

import inspect

def get_classes_from_module_name(module_name):
    # Instantiate every class found in the module's namespace.
    return [_cls() for _, _cls in inspect.getmembers(__import__(module_name), inspect.isclass)]

Where module_name = 'my_package'

inspect doc: https://docs.python.org/3/library/inspect.html#inspect.getmembers
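
Two caveats with this approach: inspect.getmembers also returns classes that the package merely imported from elsewhere, and _cls() only works for classes whose constructors take no arguments. A hedged sketch that lists classes without instantiating them and keeps only those defined in the package or one of its submodules, using the standard-library fractions module as a stand-in for my_package (classes_defined_in is a hypothetical helper name):

```python
import importlib
import inspect


def classes_defined_in(module_name):
    """Return classes defined in the named module or in its submodules."""
    module = importlib.import_module(module_name)
    name = module.__name__
    return [
        cls
        for _, cls in inspect.getmembers(module, inspect.isclass)
        if cls.__module__ == name or cls.__module__.startswith(name + '.')
    ]


names = [cls.__name__ for cls in classes_defined_in('fractions')]
print('Fraction' in names)  # True
```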


Why is [] faster than list()?

Question: Why is [] faster than list()?

I recently compared the processing speeds of [] and list() and was surprised to discover that [] runs more than three times faster than list(). I ran the same test with {} and dict() and the results were practically identical: [] and {} both took around 0.128sec / million cycles, while list() and dict() took roughly 0.428sec / million cycles each.

Why is this? Do [] and {} (and probably () and '', too) immediately pass back a copy of some empty stock literal, while their explicitly named counterparts (list(), dict(), tuple(), str()) fully go about creating an object, whether or not they actually have elements?

I have no idea how these two methods differ but I’d love to find out. I couldn’t find an answer in the docs or on SO, and searching for empty brackets turned out to be more problematic than I’d expected.

I got my timing results by calling timeit.timeit("[]") and timeit.timeit("list()"), and timeit.timeit("{}") and timeit.timeit("dict()"), to compare lists and dictionaries, respectively. I’m running Python 2.7.9.
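
The measurement can be reproduced with a short script along these lines. Absolute numbers depend on the machine and Python version, so no particular output is claimed:

```python
import timeit

# Each statement is executed one million times (timeit's default).
statements = ["[]", "list()", "{}", "dict()"]
timings = {stmt: timeit.timeit(stmt, number=1000000) for stmt in statements}

for stmt in statements:
    print("%-7s %.3f s per million" % (stmt, timings[stmt]))
```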

I recently discovered “Why is if True slower than if 1?” that compares the performance of if True to if 1 and seems to touch on a similar literal-versus-global scenario; perhaps it’s worth considering as well.


Answer 0

Because [] and {} are literal syntax. Python can create bytecode just to create the list or dictionary objects:

>>> import dis
>>> dis.dis(compile('[]', '', 'eval'))
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
>>> dis.dis(compile('{}', '', 'eval'))
  1           0 BUILD_MAP                0
              3 RETURN_VALUE        

list() and dict() are separate objects. Their names need to be resolved, the stack has to be involved to push the arguments, the frame has to be stored to retrieve later, and a call has to be made. That all takes more time.

For the empty case, that means you have at the very least a LOAD_NAME (which has to search through the global namespace as well as the __builtin__ module) followed by a CALL_FUNCTION, which has to preserve the current frame:

>>> dis.dis(compile('list()', '', 'eval'))
  1           0 LOAD_NAME                0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
>>> dis.dis(compile('dict()', '', 'eval'))
  1           0 LOAD_NAME                0 (dict)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        

You can time the name lookup separately with timeit:

>>> import timeit
>>> timeit.timeit('list', number=10**7)
0.30749011039733887
>>> timeit.timeit('dict', number=10**7)
0.4215109348297119

The time discrepancy there is probably a dictionary hash collision. Subtract those times from the times for calling those objects, and compare the result against the times for using literals:

>>> timeit.timeit('[]', number=10**7)
0.30478692054748535
>>> timeit.timeit('{}', number=10**7)
0.31482696533203125
>>> timeit.timeit('list()', number=10**7)
0.9991960525512695
>>> timeit.timeit('dict()', number=10**7)
1.0200958251953125

So having to call the object takes an additional 1.00 - 0.31 - 0.30 == 0.39 seconds per 10 million calls.

You can avoid the global lookup cost by aliasing the global names as locals (using a timeit setup, everything you bind to a name is a local):

>>> timeit.timeit('_list', '_list = list', number=10**7)
0.1866450309753418
>>> timeit.timeit('_dict', '_dict = dict', number=10**7)
0.19016098976135254
>>> timeit.timeit('_list()', '_list = list', number=10**7)
0.841480016708374
>>> timeit.timeit('_dict()', '_dict = dict', number=10**7)
0.7233691215515137

But you can never overcome that CALL_FUNCTION cost.
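
The bytecode comparison can also be done programmatically. The sketch below assumes Python 3.4 or later (for dis.get_instructions). Exact opcode names vary between CPython versions (newer releases use CALL rather than CALL_FUNCTION, for instance), so it only checks for the presence of a call-like opcode:

```python
import dis


def opnames(source):
    """Return the opcode names produced for an expression."""
    code = compile(source, '<demo>', 'eval')
    return [ins.opname for ins in dis.get_instructions(code)]


literal_ops = opnames('[]')
call_ops = opnames('list()')

print('BUILD_LIST' in literal_ops)                  # True
print(any('CALL' in name for name in literal_ops))  # False
print(any('CALL' in name for name in call_ops))     # True
```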


Answer 1

list() requires a global lookup and a function call but [] compiles to a single instruction. See:

Python 2.7.3
>>> import dis
>>> print dis.dis(lambda: list())
  1           0 LOAD_GLOBAL              0 (list)
              3 CALL_FUNCTION            0
              6 RETURN_VALUE        
None
>>> print dis.dis(lambda: [])
  1           0 BUILD_LIST               0
              3 RETURN_VALUE        
None

Answer 2

Because list is a function that converts, say, a string to a list object, while [] creates a list right off the bat. Try this (it might make more sense to you):

x = "wham bam"
a = list(x)
>>> a
["w", "h", "a", "m", ...]

While

y = ["wham bam"]
>>> y
["wham bam"]

Gives you an actual list containing whatever you put in it.
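
The same comparison as a script that runs as-is:

```python
x = "wham bam"

chars = list(x)   # iterates over the string, one element per character
wrapped = [x]     # a one-element list holding the string itself

print(len(chars))    # 8
print(len(wrapped))  # 1
```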


Answer 3

The answers here are great, to the point, and fully cover this question. I’ll go one step further down from the bytecode for those interested. I’m using the most recent repo of CPython; older versions behave similarly in this regard, though there may be slight differences.

Here’s a breakdown of the execution for each of these: BUILD_LIST for [] and CALL_FUNCTION for list().


The BUILD_LIST instruction:

You should just view the horror:

PyObject *list =  PyList_New(oparg);
if (list == NULL)
    goto error;
while (--oparg >= 0) {
    PyObject *item = POP();
    PyList_SET_ITEM(list, oparg, item);
}
PUSH(list);
DISPATCH();

Terribly convoluted, I know. This is how simple it is:

  • Create a new list with PyList_New (this mainly allocates the memory for a new list object), oparg signalling the number of arguments on the stack. Straight to the point.
  • Check that nothing went wrong with if (list==NULL).
  • Add any arguments (in our case this isn’t executed) located on the stack with PyList_SET_ITEM (a macro).

No wonder it is fast! It’s custom-made for creating new lists, nothing else :-)

The CALL_FUNCTION instruction:

Here’s the first thing you see when you peek at the code handling CALL_FUNCTION:

PyObject **sp, *res;
sp = stack_pointer;
res = call_function(&sp, oparg, NULL);
stack_pointer = sp;
PUSH(res);
if (res == NULL) {
    goto error;
}
DISPATCH();

Looks pretty harmless, right? Well, no, unfortunately not. call_function is not a straightforward guy that will call the function immediately; it can’t. Instead, it grabs the object from the stack, grabs all arguments off the stack, and then switches based on the type of the object.

We’re calling the list type, so the argument passed in to call_function is PyList_Type. CPython now has to call a generic function that handles any callable object, named _PyObject_FastCallKeywords; yay, more function calls.

This function again makes some checks for certain function types (for reasons I cannot understand) and then, after creating a dict for kwargs if required, goes on to call _PyObject_FastCallDict.

_PyObject_FastCallDict finally gets us somewhere! After performing even more checks, it grabs the tp_call slot from the type of the type we’ve passed in, that is, it grabs type.tp_call. It then proceeds to create a tuple out of the arguments passed in with _PyStack_AsTuple and, finally, a call can be made!

tp_call, which matches type.__call__, takes over and finally creates the list object. It calls the list’s __new__, which corresponds to PyType_GenericNew, and allocates memory for it with PyType_GenericAlloc. This is actually the part where it catches up with PyList_New, finally. All the previous steps are necessary to handle objects in a generic fashion.

In the end, type_call calls list.__init__ and initializes the list with any available arguments, then we go on returning back the way we came. :-)
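
Seen from Python rather than C, the last two steps correspond roughly to the following. This is only an analogy for what type.__call__ does, not the actual C code path:

```python
# Step 1: allocate an empty list object (what __new__ does).
obj = list.__new__(list)
print(obj)  # []

# Step 2: initialise it from the arguments (what __init__ does).
list.__init__(obj, "abc")
print(obj)  # ['a', 'b', 'c']

# type.__call__ performs both steps, which is what list("abc") triggers.
print(type.__call__(list, "abc") == obj)  # True
```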

Finally, remember the LOAD_NAME; that’s another guy that contributes here.


It’s easy to see that, when dealing with our input, Python generally has to jump through hoops in order to actually find the appropriate C function to do the job. It doesn’t have the courtesy of immediately calling it because it’s dynamic: someone might mask list (and boy, do many people do that), so another path must be taken.

This is where list() loses much: The exploring Python needs to do to find out what the heck it should do.

Literal syntax, on the other hand, means exactly one thing; it cannot be changed and always behaves in a pre-determined way.

Footnote: All function names are subject to change from one release to the other. The point still stands and most likely will stand in any future versions: it’s the dynamic look-up that slows things down.


Answer 4

Why is [] faster than list()?

The biggest reason is that Python treats list() just like a user-defined function, which means you can intercept it by aliasing something else to list and do something different (like use your own subclassed list or perhaps a deque).

It immediately creates a new instance of a builtin list with [].

My explanation seeks to give you the intuition for this.

Explanation

[] is commonly known as literal syntax.

In the grammar, this is referred to as a “list display”. From the docs:

A list display is a possibly empty series of expressions enclosed in square brackets:

list_display ::=  "[" [starred_list | comprehension] "]"

A list display yields a new list object, the contents being specified by either a list of expressions or a comprehension. When a comma-separated list of expressions is supplied, its elements are evaluated from left to right and placed into the list object in that order. When a comprehension is supplied, the list is constructed from the elements resulting from the comprehension.

In short, this means that a builtin object of type list is created.

There is no circumventing this – which means Python can do it as quickly as it may.

On the other hand, a call to list() can be intercepted, so it is not guaranteed to create a builtin list with the builtin list constructor.

For example, say we want our lists to be created noisily:

class List(list):
    def __init__(self, iterable=None):
        if iterable is None:
            super().__init__()
        else:
            super().__init__(iterable)
        print('List initialized.')

We could then intercept the name list on the module level global scope, and then when we create a list, we actually create our subtyped list:

>>> list = List
>>> a_list = list()
List initialized.
>>> type(a_list)
<class '__main__.List'>

Similarly we could remove it from the global namespace

del list

and put it in the builtin namespace:

import builtins
builtins.list = List

And now:

>>> list_0 = list()
List initialized.
>>> type(list_0)
<class '__main__.List'>

And note that the list display creates a list unconditionally:

>>> list_1 = []
>>> type(list_1)
<class 'list'>

We probably only do this temporarily, so let’s undo our changes – first remove the new List object from the builtins:

>>> del builtins.list
>>> builtins.list
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'builtins' has no attribute 'list'
>>> list()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'list' is not defined

Oh, no, we lost track of the original.

Not to worry, we can still get list – it’s the type of a list literal:

>>> builtins.list = type([])
>>> list()
[]
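
The same point can be shown without touching builtins at all. This sketch (NoisyList and the namespace dictionary are illustrative names) runs two statements in a namespace where list has been rebound; only the call is affected:

```python
class NoisyList(list):
    """A list subclass standing in for any replacement of the name `list`."""


# Execute code in a namespace in which the global name `list` is shadowed.
namespace = {"list": NoisyList}
exec("from_call = list()\nfrom_display = []", namespace)

print(type(namespace["from_call"]).__name__)     # NoisyList
print(type(namespace["from_display"]).__name__)  # list
```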

So…

Why is [] faster than list()?

As we’ve seen – we can overwrite list – but we can’t intercept the creation of the literal type. When we use list we have to do the lookups to see if anything is there.

Then we have to call whatever callable we have looked up. From the grammar:

A call calls a callable object (e.g., a function) with a possibly empty series of arguments:

call                 ::=  primary "(" [argument_list [","] | comprehension] ")"

We can see that it does the same thing for any name, not just list:

>>> import dis
>>> dis.dis('list()')
  1           0 LOAD_NAME                0 (list)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE
>>> dis.dis('doesnotexist()')
  1           0 LOAD_NAME                0 (doesnotexist)
              2 CALL_FUNCTION            0
              4 RETURN_VALUE

For [] there is no function call at the Python bytecode level:

>>> dis.dis('[]')
  1           0 BUILD_LIST               0
              2 RETURN_VALUE

It simply goes straight to building the list without any lookups or calls at the bytecode level.

Conclusion

We have demonstrated that list can be intercepted with user code using the scoping rules, and that list() looks for a callable and then calls it.

Whereas [] is a list display, or a literal, and thus avoids the name lookup and function call.