标签归档:initialization

在python中从字典设置属性

问题:在python中从字典设置属性

是否可以通过python中的字典创建对象,使得每个键都是该对象的属性?

像这样:

 d = { 'name': 'Oscar', 'lastName': 'Reyes', 'age':32 }

 e = Employee(d) 
 print e.name # Oscar 
 print e.age + 10 # 42 

我认为这与该问题几乎完全相反:来自对象字段的Python字典

Is it possible to create an object from a dictionary in python in such a way that each key is an attribute of that object?

Something like this:

 d = { 'name': 'Oscar', 'lastName': 'Reyes', 'age':32 }

 e = Employee(d) 
 print e.name # Oscar 
 print e.age + 10 # 42 

I think it would be pretty much the inverse of this question: Python dictionary from an object’s fields


回答 0

当然,是这样的:

class Employee(object):
    def __init__(self, initial_data):
        for key in initial_data:
            setattr(self, key, initial_data[key])

更新资料

正如布伦特·纳什(Brent Nash)所建议的,您还可以通过允许使用关键字参数来使其更加灵活:

class Employee(object):
    def __init__(self, *initial_data, **kwargs):
        for dictionary in initial_data:
            for key in dictionary:
                setattr(self, key, dictionary[key])
        for key in kwargs:
            setattr(self, key, kwargs[key])

然后您可以这样称呼它:

e = Employee({"name": "abc", "age": 32})

或像这样:

e = Employee(name="abc", age=32)

甚至像这样:

employee_template = {"role": "minion"}
e = Employee(employee_template, name="abc", age=32)

Sure, something like this:

class Employee(object):
    def __init__(self, initial_data):
        for key in initial_data:
            setattr(self, key, initial_data[key])

Update

As Brent Nash suggests, you can make this more flexible by allowing keyword arguments as well:

class Employee(object):
    def __init__(self, *initial_data, **kwargs):
        for dictionary in initial_data:
            for key in dictionary:
                setattr(self, key, dictionary[key])
        for key in kwargs:
            setattr(self, key, kwargs[key])

Then you can call it like this:

e = Employee({"name": "abc", "age": 32})

or like this:

e = Employee(name="abc", age=32)

or even like this:

employee_template = {"role": "minion"}
e = Employee(employee_template, name="abc", age=32)

回答 1

以这种方式设置属性几乎肯定不是解决问题的最佳方法。要么:

  1. 您知道所有字段都应该提前。在这种情况下,您可以显式设置所有属性。这看起来像

    class Employee(object):
        def __init__(self, name, last_name, age):
            self.name = name
            self.last_name = last_name
            self.age = age
    
    d = {'name': 'Oscar', 'last_name': 'Reyes', 'age':32 }
    e = Employee(**d) 
    
    print e.name # Oscar 
    print e.age + 10 # 42 

    要么

  2. 您不知道所有字段都应该提前。在这种情况下,您应该将数据存储为dict,而不是污染对象命名空间。这些属性用于静态访问。这种情况看起来像

    class Employee(object):
        def __init__(self, data):
            self.data = data
    
    d = {'name': 'Oscar', 'last_name': 'Reyes', 'age':32 }
    e = Employee(d) 
    
    print e.data['name'] # Oscar 
    print e.data['age'] + 10 # 42 

与情况1基本等效的另一种解决方案是使用collections.namedtuple。有关如何实现的信息,请参见van的答案。

Setting attributes in this way is almost certainly not the best way to solve a problem. Either:

  1. You know what all the fields should be ahead of time. In that case, you can set all the attributes explicitly. This would look like

    class Employee(object):
        def __init__(self, name, last_name, age):
            self.name = name
            self.last_name = last_name
            self.age = age
    
    d = {'name': 'Oscar', 'last_name': 'Reyes', 'age':32 }
    e = Employee(**d) 
    
    print e.name # Oscar 
    print e.age + 10 # 42 
    

    or

  2. You don’t know what all the fields should be ahead of time. In this case, you should store the data as a dict instead of polluting an objects namespace. Attributes are for static access. This case would look like

    class Employee(object):
        def __init__(self, data):
            self.data = data
    
    d = {'name': 'Oscar', 'last_name': 'Reyes', 'age':32 }
    e = Employee(d) 
    
    print e.data['name'] # Oscar 
    print e.data['age'] + 10 # 42 
    

Another solution that is basically equivalent to case 1 is to use a collections.namedtuple. See van’s answer for how to implement that.


回答 2

您可以使用访问对象的属性__dict__,并对其调用update方法:

>>> class Employee(object):
...     def __init__(self, _dict):
...         self.__dict__.update(_dict)
... 


>>> dict = { 'name': 'Oscar', 'lastName': 'Reyes', 'age':32 }

>>> e = Employee(dict)

>>> e.name
'Oscar'

>>> e.age
32

You can access the attributes of an object with __dict__, and call the update method on it:

>>> class Employee(object):
...     def __init__(self, _dict):
...         self.__dict__.update(_dict)
... 


>>> dict = { 'name': 'Oscar', 'lastName': 'Reyes', 'age':32 }

>>> e = Employee(dict)

>>> e.name
'Oscar'

>>> e.age
32

回答 3

为什么不只使用属性名称作为字典的键?

class StructMyDict(dict):

     def __getattr__(self, name):
         try:
             return self[name]
         except KeyError as e:
             raise AttributeError(e)

     def __setattr__(self, name, value):
         self[name] = value

您可以使用命名参数,元组列表,字典或单独的属性分配进行初始化,例如:

nautical = StructMyDict(left = "Port", right = "Starboard") # named args

nautical2 = StructMyDict({"left":"Port","right":"Starboard"}) # dictionary

nautical3 = StructMyDict([("left","Port"),("right","Starboard")]) # tuples list

nautical4 = StructMyDict()  # fields TBD
nautical4.left = "Port"
nautical4.right = "Starboard"

for x in [nautical, nautical2, nautical3, nautical4]:
    print "%s <--> %s" % (x.left,x.right)

或者,您可以为未知值返回None,而不是引发属性错误。(web2py存储类中使用的一个技巧)

Why not just use attribute names as keys to a dictionary?

class StructMyDict(dict):

     def __getattr__(self, name):
         try:
             return self[name]
         except KeyError as e:
             raise AttributeError(e)

     def __setattr__(self, name, value):
         self[name] = value

You can initialize with named arguments, a list of tuples, or a dictionary, or individual attribute assignments, e.g.:

nautical = StructMyDict(left = "Port", right = "Starboard") # named args

nautical2 = StructMyDict({"left":"Port","right":"Starboard"}) # dictionary

nautical3 = StructMyDict([("left","Port"),("right","Starboard")]) # tuples list

nautical4 = StructMyDict()  # fields TBD
nautical4.left = "Port"
nautical4.right = "Starboard"

for x in [nautical, nautical2, nautical3, nautical4]:
    print "%s <--> %s" % (x.left,x.right)

Alternatively, instead of raising the attribute error, you can return None for unknown values. (A trick used in the web2py storage class)


回答 4

我认为,settattr如果您确实需要支持,那么使用答案是可行的方法dict

但是,如果Employeeobject只是可以使用点语法(.name)而不是dict语法(['name'])访问的结构,则可以使用namedtuple,如下所示:

from collections import namedtuple

Employee = namedtuple('Employee', 'name age')
e = Employee('noname01', 6)
print e
#>> Employee(name='noname01', age=6)

# create Employee from dictionary
d = {'name': 'noname02', 'age': 7}
e = Employee(**d)
print e
#>> Employee(name='noname02', age=7)
print e._asdict()
#>> {'age': 7, 'name': 'noname02'}

您确实具有_asdict()将所有属性作为字典访问的方法,但是以后只能在构造过程中才能添加其他属性。

I think that answer using settattr are the way to go if you really need to support dict.

But if Employee object is just a structure which you can access with dot syntax (.name) instead of dict syntax (['name']), you can use namedtuple like this:

from collections import namedtuple

Employee = namedtuple('Employee', 'name age')
e = Employee('noname01', 6)
print e
#>> Employee(name='noname01', age=6)

# create Employee from dictionary
d = {'name': 'noname02', 'age': 7}
e = Employee(**d)
print e
#>> Employee(name='noname02', age=7)
print e._asdict()
#>> {'age': 7, 'name': 'noname02'}

You do have _asdict() method to access all properties as dictionary, but you cannot add additional attributes later, only during the construction.


回答 5

例如说

class A():
    def __init__(self):
        self.x=7
        self.y=8
        self.z="name"

如果您想一次设置属性

d = {'x':100,'y':300,'z':"blah"}
a = A()
a.__dict__.update(d)

say for example

class A():
    def __init__(self):
        self.x=7
        self.y=8
        self.z="name"

if you want to set the attributes at once

d = {'x':100,'y':300,'z':"blah"}
a = A()
a.__dict__.update(d)

回答 6

与使用dict类似,您可以像这样使用kwargs:

class Person:
   def __init__(self, **kwargs):
       self.properties = kwargs

   def get_property(self, key):
       return self.properties.get(key, None)

   def main():
       timmy = Person(color = 'red')
       print(timmy.get_property('color')) #prints 'red'

similar to using a dict, you could just use kwargs like so:

class Person:
   def __init__(self, **kwargs):
       self.properties = kwargs

   def get_property(self, key):
       return self.properties.get(key, None)

   def main():
       timmy = Person(color = 'red')
       print(timmy.get_property('color')) #prints 'red'

Python-创建具有初始容量的列表

问题:Python-创建具有初始容量的列表

像这样的代码经常发生:

l = []
while foo:
    #baz
    l.append(bar)
    #qux

如果您要将数千个元素添加到列表中,这真的很慢,因为必须不断调整列表的大小以适应新元素。

在Java中,您可以创建具有初始容量的ArrayList。如果您知道列表的大小,这将大大提高效率。

我知道这样的代码通常可以重构为列表理解。但是,如果for / while循环非常复杂,则这是不可行的。我们的Python程序员有什么对等的地方吗?

Code like this often happens:

l = []
while foo:
    #baz
    l.append(bar)
    #qux

This is really slow if you’re about to append thousands of elements to your list, as the list will have to be constantly resized to fit the new elements.

In Java, you can create an ArrayList with an initial capacity. If you have some idea how big your list will be, this will be a lot more efficient.

I understand that code like this can often be re-factored into a list comprehension. If the for/while loop is very complicated, though, this is unfeasible. Is there any equivalent for us Python programmers?


回答 0

def doAppend( size=10000 ):
    result = []
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result.append(message)
    return result

def doAllocate( size=10000 ):
    result=size*[None]
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result[i]= message
    return result

结果。(评估每个功能144次并平均持续时间)

simple append 0.0102
pre-allocate  0.0098

结论。没关系。

过早的优化是万恶之源。

def doAppend( size=10000 ):
    result = []
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result.append(message)
    return result

def doAllocate( size=10000 ):
    result=size*[None]
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result[i]= message
    return result

Results. (evaluate each function 144 times and average the duration)

simple append 0.0102
pre-allocate  0.0098

Conclusion. It barely matters.

Premature optimization is the root of all evil.


回答 1

Python列表没有内置的预分配。如果您确实需要列出清单,并且需要避免附加的开销(并且您应该验证这样做),则可以执行以下操作:

l = [None] * 1000 # Make a list of 1000 None's
for i in xrange(1000):
    # baz
    l[i] = bar
    # qux

也许您可以通过使用生成器来避免使用此列表:

def my_things():
    while foo:
        #baz
        yield bar
        #qux

for thing in my_things():
    # do something with thing

这样,列表并不会全部存储在内存中,而只是根据需要生成。

Python lists have no built-in pre-allocation. If you really need to make a list, and need to avoid the overhead of appending (and you should verify that you do), you can do this:

l = [None] * 1000 # Make a list of 1000 None's
for i in xrange(1000):
    # baz
    l[i] = bar
    # qux

Perhaps you could avoid the list by using a generator instead:

def my_things():
    while foo:
        #baz
        yield bar
        #qux

for thing in my_things():
    # do something with thing

This way, the list isn’t every stored all in memory at all, merely generated as needed.


回答 2

短版:使用

pre_allocated_list = [None] * size

预先分配一个列表(也就是说,能够解决列表的“大小”元素,而不是通过追加逐步形成列表)。即使在大型列表上,此操作也非常快。分配新对象,这些对象以后将分配给列表元素,将花费更长的时间,并且在性能方面将成为程序中的瓶颈。

长版:

我认为应该考虑初始化时间。由于在python中所有内容都是引用,因此将每个元素设置为None还是某个字符串都没关系-两种方式都只是引用。如果要为每个要引用的元素创建新对象,则将花费更长的时间。

对于Python 3.2:

import time
import copy

def print_timing (func):
  def wrapper (*arg):
    t1 = time.time ()
    res = func (*arg)
    t2 = time.time ()
    print ("{} took {} ms".format (func.__name__, (t2 - t1) * 1000.0))
    return res

  return wrapper

@print_timing
def prealloc_array (size, init = None, cp = True, cpmethod=copy.deepcopy, cpargs=(), use_num = False):
  result = [None] * size
  if init is not None:
    if cp:
      for i in range (size):
          result[i] = init
    else:
      if use_num:
        for i in range (size):
            result[i] = cpmethod (i)
      else:
        for i in range (size):
            result[i] = cpmethod (cpargs)
  return result

@print_timing
def prealloc_array_by_appending (size):
  result = []
  for i in range (size):
    result.append (None)
  return result

@print_timing
def prealloc_array_by_extending (size):
  result = []
  none_list = [None]
  for i in range (size):
    result.extend (none_list)
  return result

def main ():
  n = 1000000
  x = prealloc_array_by_appending(n)
  y = prealloc_array_by_extending(n)
  a = prealloc_array(n, None)
  b = prealloc_array(n, "content", True)
  c = prealloc_array(n, "content", False, "some object {}".format, ("blah"), False)
  d = prealloc_array(n, "content", False, "some object {}".format, None, True)
  e = prealloc_array(n, "content", False, copy.deepcopy, "a", False)
  f = prealloc_array(n, "content", False, copy.deepcopy, (), False)
  g = prealloc_array(n, "content", False, copy.deepcopy, [], False)

  print ("x[5] = {}".format (x[5]))
  print ("y[5] = {}".format (y[5]))
  print ("a[5] = {}".format (a[5]))
  print ("b[5] = {}".format (b[5]))
  print ("c[5] = {}".format (c[5]))
  print ("d[5] = {}".format (d[5]))
  print ("e[5] = {}".format (e[5]))
  print ("f[5] = {}".format (f[5]))
  print ("g[5] = {}".format (g[5]))

if __name__ == '__main__':
  main()

评价:

prealloc_array_by_appending took 118.00003051757812 ms
prealloc_array_by_extending took 102.99992561340332 ms
prealloc_array took 3.000020980834961 ms
prealloc_array took 49.00002479553223 ms
prealloc_array took 316.9999122619629 ms
prealloc_array took 473.00004959106445 ms
prealloc_array took 1677.9999732971191 ms
prealloc_array took 2729.999780654907 ms
prealloc_array took 3001.999855041504 ms
x[5] = None
y[5] = None
a[5] = None
b[5] = content
c[5] = some object blah
d[5] = some object 5
e[5] = a
f[5] = []
g[5] = ()

如您所见,仅列出对同一None对象的大量引用就花费很少的时间。

前置或扩展花费的时间更长(我没有平均任何东西,但是运行几次后,我可以告诉您扩展和附加花费的时间大致相同)。

为每个元素分配新对象-这是最耗时的时间。S.Lott的答案就是这样做-每次都格式化一个新字符串。这不是严格要求的-如果要预分配一些空间,只需列出一个None,然后随意将数据分配给list元素。不管是在创建列表时还是在创建列表之后生成数据,用哪种方式生成数据都比添加/扩展列表花费更多的时间。但是,如果您想要一个人口稀少的列表,那么从“无”列表开始肯定会更快。

Short version: use

pre_allocated_list = [None] * size

to pre-allocate a list (that is, to be able to address ‘size’ elements of the list instead of gradually forming the list by appending). This operation is VERY fast, even on big lists. Allocating new objects that will be later assigned to list elements will take MUCH longer and will be THE bottleneck in your program, performance-wise.

Long version:

I think that initialization time should be taken into account. Since in python everything is a reference, it doesn’t matter whether you set each element into None or some string – either way it’s only a reference. Though it will take longer if you want to create new object for each element to reference.

For Python 3.2:

import time
import copy

def print_timing (func):
  def wrapper (*arg):
    t1 = time.time ()
    res = func (*arg)
    t2 = time.time ()
    print ("{} took {} ms".format (func.__name__, (t2 - t1) * 1000.0))
    return res

  return wrapper

@print_timing
def prealloc_array (size, init = None, cp = True, cpmethod=copy.deepcopy, cpargs=(), use_num = False):
  result = [None] * size
  if init is not None:
    if cp:
      for i in range (size):
          result[i] = init
    else:
      if use_num:
        for i in range (size):
            result[i] = cpmethod (i)
      else:
        for i in range (size):
            result[i] = cpmethod (cpargs)
  return result

@print_timing
def prealloc_array_by_appending (size):
  result = []
  for i in range (size):
    result.append (None)
  return result

@print_timing
def prealloc_array_by_extending (size):
  result = []
  none_list = [None]
  for i in range (size):
    result.extend (none_list)
  return result

def main ():
  n = 1000000
  x = prealloc_array_by_appending(n)
  y = prealloc_array_by_extending(n)
  a = prealloc_array(n, None)
  b = prealloc_array(n, "content", True)
  c = prealloc_array(n, "content", False, "some object {}".format, ("blah"), False)
  d = prealloc_array(n, "content", False, "some object {}".format, None, True)
  e = prealloc_array(n, "content", False, copy.deepcopy, "a", False)
  f = prealloc_array(n, "content", False, copy.deepcopy, (), False)
  g = prealloc_array(n, "content", False, copy.deepcopy, [], False)

  print ("x[5] = {}".format (x[5]))
  print ("y[5] = {}".format (y[5]))
  print ("a[5] = {}".format (a[5]))
  print ("b[5] = {}".format (b[5]))
  print ("c[5] = {}".format (c[5]))
  print ("d[5] = {}".format (d[5]))
  print ("e[5] = {}".format (e[5]))
  print ("f[5] = {}".format (f[5]))
  print ("g[5] = {}".format (g[5]))

if __name__ == '__main__':
  main()

Evaluation:

prealloc_array_by_appending took 118.00003051757812 ms
prealloc_array_by_extending took 102.99992561340332 ms
prealloc_array took 3.000020980834961 ms
prealloc_array took 49.00002479553223 ms
prealloc_array took 316.9999122619629 ms
prealloc_array took 473.00004959106445 ms
prealloc_array took 1677.9999732971191 ms
prealloc_array took 2729.999780654907 ms
prealloc_array took 3001.999855041504 ms
x[5] = None
y[5] = None
a[5] = None
b[5] = content
c[5] = some object blah
d[5] = some object 5
e[5] = a
f[5] = []
g[5] = ()

As you can see, just making a big list of references to the same None object takes very little time.

Prepending or extending takes longer (i didn’t average anything, but after running this a few times i can tell you that extending and appending take roughly the same time).

Allocating new object for each element – that is what takes the most time. And S.Lott’s answer does that – formats a new string every time. Which is not strictly required – if you want to pre-allocate some space, just make a list of None, then assign data to list elements at will. Either way it takes more time to generate data than to append/extend a list, whether you generate it while creating the list, or after that. But if you want a sparsely-populated list, then starting with a list of None is definitely faster.


回答 3

使用Python的方式是:

x = [None] * numElements

或您希望预先弹出的任何默认值,例如

bottles = [Beer()] * 99
sea = [Fish()] * many
vegetarianPizzas = [None] * peopleOrderingPizzaNotQuiche

[编辑:买者自负[Beer()] * 99语法创建一个 Beer,然后填充与99次的引用数组相同的单个实例]

Python的默认方法可能非常有效,尽管随着您增加元素数量,效率会下降。

比较

import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        end = time.time()
        secs = end - self.start
        msecs = secs * 1000  # millisecs
        print('%fms' % msecs)

Elements   = 100000
Iterations = 144

print('Elements: %d, Iterations: %d' % (Elements, Iterations))


def doAppend():
    result = []
    i = 0
    while i < Elements:
        result.append(i)
        i += 1

def doAllocate():
    result = [None] * Elements
    i = 0
    while i < Elements:
        result[i] = i
        i += 1

def doGenerator():
    return list(i for i in range(Elements))


def test(name, fn):
    print("%s: " % name, end="")
    with Timer() as t:
        x = 0
        while x < Iterations:
            fn()
            x += 1


test('doAppend', doAppend)
test('doAllocate', doAllocate)
test('doGenerator', doGenerator)

#include <vector>
typedef std::vector<unsigned int> Vec;

static const unsigned int Elements = 100000;
static const unsigned int Iterations = 144;

void doAppend()
{
    Vec v;
    for (unsigned int i = 0; i < Elements; ++i) {
        v.push_back(i);
    }
}

void doReserve()
{
    Vec v;
    v.reserve(Elements);
    for (unsigned int i = 0; i < Elements; ++i) {
        v.push_back(i);
    }
}

void doAllocate()
{
    Vec v;
    v.resize(Elements);
    for (unsigned int i = 0; i < Elements; ++i) {
        v[i] = i;
    }
}

#include <iostream>
#include <chrono>
using namespace std;

void test(const char* name, void(*fn)(void))
{
    cout << name << ": ";

    auto start = chrono::high_resolution_clock::now();
    for (unsigned int i = 0; i < Iterations; ++i) {
        fn();
    }
    auto end = chrono::high_resolution_clock::now();

    auto elapsed = end - start;
    cout << chrono::duration<double, milli>(elapsed).count() << "ms\n";
}

int main()
{
    cout << "Elements: " << Elements << ", Iterations: " << Iterations << '\n';

    test("doAppend", doAppend);
    test("doReserve", doReserve);
    test("doAllocate", doAllocate);
}

在我的Windows 7 i7上,64位Python提供

Elements: 100000, Iterations: 144
doAppend: 3587.204933ms
doAllocate: 2701.154947ms
doGenerator: 1721.098185ms

尽管C ++提供(使用MSVC构建,64位,启用了优化)

Elements: 100000, Iterations: 144
doAppend: 74.0042ms
doReserve: 27.0015ms
doAllocate: 5.0003ms

C ++调试版本生成:

Elements: 100000, Iterations: 144
doAppend: 2166.12ms
doReserve: 2082.12ms
doAllocate: 273.016ms

这里的要点是,使用Python可以将性能提高7-8%,并且如果您认为自己正在编写高性能的应用程序(或者正在编写用于Web服务的内容或其他内容),那么这是不容小at的,但是您可能需要重新考虑您选择的语言。

另外,这里的Python代码并不是真正的Python代码。在此切换到真正的Pythonesque代码可提供更好的性能:

import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        end = time.time()
        secs = end - self.start
        msecs = secs * 1000  # millisecs
        print('%fms' % msecs)

Elements   = 100000
Iterations = 144

print('Elements: %d, Iterations: %d' % (Elements, Iterations))


def doAppend():
    for x in range(Iterations):
        result = []
        for i in range(Elements):
            result.append(i)

def doAllocate():
    for x in range(Iterations):
        result = [None] * Elements
        for i in range(Elements):
            result[i] = i

def doGenerator():
    for x in range(Iterations):
        result = list(i for i in range(Elements))


def test(name, fn):
    print("%s: " % name, end="")
    with Timer() as t:
        fn()


test('doAppend', doAppend)
test('doAllocate', doAllocate)
test('doGenerator', doGenerator)

这使

Elements: 100000, Iterations: 144
doAppend: 2153.122902ms
doAllocate: 1346.076965ms
doGenerator: 1614.092112ms

(在32位中,doGenerator比doAllocate做得更好)。

在这里,doAppend和doAllocate之间的差距明显更大。

显然,这里的差异仅适用于您执行多次以上的操作,或者在负载较重的系统上执行的操作,这些数据将按数量级进行扩展,或者正在处理更大的清单。

这里的重点是:用pythonic方式获得最佳性能。

但是,如果您担心常规的高级性能,则Python是错误的语言。最根本的问题是,由于装饰器等(https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Data_Aggregation#Data_Aggregation)等Python功能,Python函数调用传统上比其他语言慢300倍。

The Pythonic way for this is:

x = [None] * numElements

or whatever default value you wish to prepop with, e.g.

bottles = [Beer()] * 99
sea = [Fish()] * many
vegetarianPizzas = [None] * peopleOrderingPizzaNotQuiche

[EDIT: Caveat Emptor The [Beer()] * 99 syntax creates one Beer and then populates an array with 99 references to the same single instance]

Python’s default approach can be pretty efficient, although that efficiency decays as you increase the number of elements.

Compare

import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        end = time.time()
        secs = end - self.start
        msecs = secs * 1000  # millisecs
        print('%fms' % msecs)

Elements   = 100000
Iterations = 144

print('Elements: %d, Iterations: %d' % (Elements, Iterations))


def doAppend():
    result = []
    i = 0
    while i < Elements:
        result.append(i)
        i += 1

def doAllocate():
    result = [None] * Elements
    i = 0
    while i < Elements:
        result[i] = i
        i += 1

def doGenerator():
    return list(i for i in range(Elements))


def test(name, fn):
    print("%s: " % name, end="")
    with Timer() as t:
        x = 0
        while x < Iterations:
            fn()
            x += 1


test('doAppend', doAppend)
test('doAllocate', doAllocate)
test('doGenerator', doGenerator)

with

#include <vector>
typedef std::vector<unsigned int> Vec;

static const unsigned int Elements = 100000;
static const unsigned int Iterations = 144;

void doAppend()
{
    Vec v;
    for (unsigned int i = 0; i < Elements; ++i) {
        v.push_back(i);
    }
}

void doReserve()
{
    Vec v;
    v.reserve(Elements);
    for (unsigned int i = 0; i < Elements; ++i) {
        v.push_back(i);
    }
}

void doAllocate()
{
    Vec v;
    v.resize(Elements);
    for (unsigned int i = 0; i < Elements; ++i) {
        v[i] = i;
    }
}

#include <iostream>
#include <chrono>
using namespace std;

void test(const char* name, void(*fn)(void))
{
    cout << name << ": ";

    auto start = chrono::high_resolution_clock::now();
    for (unsigned int i = 0; i < Iterations; ++i) {
        fn();
    }
    auto end = chrono::high_resolution_clock::now();

    auto elapsed = end - start;
    cout << chrono::duration<double, milli>(elapsed).count() << "ms\n";
}

int main()
{
    cout << "Elements: " << Elements << ", Iterations: " << Iterations << '\n';

    test("doAppend", doAppend);
    test("doReserve", doReserve);
    test("doAllocate", doAllocate);
}

On my Windows 7 i7, 64-bit Python gives

Elements: 100000, Iterations: 144
doAppend: 3587.204933ms
doAllocate: 2701.154947ms
doGenerator: 1721.098185ms

While C++ gives (built with MSVC, 64-bit, Optimizations enabled)

Elements: 100000, Iterations: 144
doAppend: 74.0042ms
doReserve: 27.0015ms
doAllocate: 5.0003ms

C++ debug build produces:

Elements: 100000, Iterations: 144
doAppend: 2166.12ms
doReserve: 2082.12ms
doAllocate: 273.016ms

The point here is that with Python you can achieve a 7-8% performance improvement, and if you think you’re writing a high-performance app (or if you’re writing something that is used in a web service or something) then that isn’t to be sniffed at, but you may need to rethink your choice of language.

Also, the Python code here isn’t really Python code. Switching to truly Pythonesque code here gives better performance:

import time

class Timer(object):
    def __enter__(self):
        self.start = time.time()
        return self

    def __exit__(self, *args):
        end = time.time()
        secs = end - self.start
        msecs = secs * 1000  # millisecs
        print('%fms' % msecs)

Elements   = 100000
Iterations = 144

print('Elements: %d, Iterations: %d' % (Elements, Iterations))


def doAppend():
    for x in range(Iterations):
        result = []
        for i in range(Elements):
            result.append(i)

def doAllocate():
    for x in range(Iterations):
        result = [None] * Elements
        for i in range(Elements):
            result[i] = i

def doGenerator():
    for x in range(Iterations):
        result = list(i for i in range(Elements))


def test(name, fn):
    print("%s: " % name, end="")
    with Timer() as t:
        fn()


test('doAppend', doAppend)
test('doAllocate', doAllocate)
test('doGenerator', doGenerator)

Which gives

Elements: 100000, Iterations: 144
doAppend: 2153.122902ms
doAllocate: 1346.076965ms
doGenerator: 1614.092112ms

(in 32-bit doGenerator does better than doAllocate).

Here the gap between doAppend and doAllocate is significantly larger.

Obviously, the differences here really only apply if you are doing this more than a handful of times or if you are doing this on a heavily loaded system where those numbers are going to get scaled out by orders of magnitude, or if you are dealing with considerably larger lists.

The point here: Do it the pythonic way for the best performance.

But if you are worrying about general, high-level performance, Python is the wrong language. The most fundamental problem being that Python function calls has traditionally been upto 300x slower than other languages due to Python features like decorators etc (https://wiki.python.org/moin/PythonSpeed/PerformanceTips#Data_Aggregation#Data_Aggregation).


回答 4

正如其他人提到的那样,最简单的方法是预先植入带有NoneType对象的列表。

话虽这么说,您应该先了解Python列表的实际工作方式,然后再决定是否需要这样做。在列表的CPython实现中,始终在开销空间的情况下创建底层数组,并且数组的大小逐渐变大( 4, 8, 16, 25, 35, 46, 58, 72, 88, 106, 126, 148, 173, 201, 233, 269, 309, 354, 405, 462, 526, 598, 679, 771, 874, 990, 1120, etc),因此几乎不经常调整列表的大小。

由于这种行为,大多数 list.append()功能O(1)对于追加而言都是复杂的,仅当跨越这些边界之一时才具有增加的复杂度,此时复杂度将为O(n)。这种行为导致S. Lott的答案中执行时间的最小增加。

资料来源:http : //www.laurentluce.com/posts/python-list-implementation/

As others have mentioned, the simplest way to pre-seed a list with NoneType objects.

That being said, you should understand the way Python lists actually work before deciding this is necessary. In the CPython implementation of a list, the underlying array is always created with overhead room, in progressively larger sizes ( 4, 8, 16, 25, 35, 46, 58, 72, 88, 106, 126, 148, 173, 201, 233, 269, 309, 354, 405, 462, 526, 598, 679, 771, 874, 990, 1120, etc), so that resizing the list does not happen nearly so often.

Because of this behavior, most list.append() functions are O(1) complexity for appends, only having increased complexity when crossing one of these boundaries, at which point the complexity will be O(n). This behavior is what leads to the minimal increase in execution time in S. Lott’s answer.

Source: http://www.laurentluce.com/posts/python-list-implementation/


回答 5

我运行@ s.lott的代码,并通过预分配产生了10%的相同性能提升。使用生成器尝试了@jeremy的想法,并且能够比doAllocate更好地看到gen的性能。对于我的项目,10%的改善很重要,因此感谢大家,因为这对我们有很大帮助。

def doAppend( size=10000 ):
    result = []
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result.append(message)
    return result

def doAllocate( size=10000 ):
    result=size*[None]
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result[i]= message
    return result

def doGen( size=10000 ):
    return list("some unique object %d" % ( i, ) for i in xrange(size))

size=1000
@print_timing
def testAppend():
    for i in xrange(size):
        doAppend()

@print_timing
def testAlloc():
    for i in xrange(size):
        doAllocate()

@print_timing
def testGen():
    for i in xrange(size):
        doGen()


testAppend()
testAlloc()
testGen()

testAppend took 14440.000ms
testAlloc took 13580.000ms
testGen took 13430.000ms

i ran @s.lott’s code and produced the same 10% perf increase by pre-allocating. tried @jeremy’s idea using a generator and was able to see the perf of the gen better than that of the doAllocate. For my proj the 10% improvement matters, so thanks to everyone as this helps a bunch.

def doAppend( size=10000 ):
    result = []
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result.append(message)
    return result

def doAllocate( size=10000 ):
    result=size*[None]
    for i in range(size):
        message= "some unique object %d" % ( i, )
        result[i]= message
    return result

def doGen( size=10000 ):
    return list("some unique object %d" % ( i, ) for i in xrange(size))

size=1000
@print_timing
def testAppend():
    for i in xrange(size):
        doAppend()

@print_timing
def testAlloc():
    for i in xrange(size):
        doAllocate()

@print_timing
def testGen():
    for i in xrange(size):
        doGen()


testAppend()
testAlloc()
testGen()

testAppend took 14440.000ms
testAlloc took 13580.000ms
testGen took 13430.000ms

回答 6

如果您使用的numpy具有更多类似C的数组,则会引起对Python中预分配的担忧。在这种情况下,预分配问题与数据的形状和默认值有关。

如果要在大量列表上进行数值计算并需要性能,请考虑使用numpy。

Concerns about pre-allocation in Python arise if you’re working with numpy, which has more C-like arrays. In this instance, pre-allocation concerns are about the shape of the data and the default value.

Consider numpy if you’re doing numerical computation on massive lists and want performance.


回答 7

对于某些应用程序,词典可能是您所需要的。例如,在find_totient方法中,我发现使用字典更方便,因为我没有零索引。

def totient(n):
    totient = 0

    if n == 1:
        totient = 1
    else:
        for i in range(1, n):
            if math.gcd(i, n) == 1:
                totient += 1
    return totient

def find_totients(max):
    totients = dict()
    for i in range(1,max+1):
        totients[i] = totient(i)

    print('Totients:')
    for i in range(1,max+1):
        print(i,totients[i])

这个问题也可以通过预先分配的列表来解决:

def find_totients(max):
    totients = None*(max+1)
    for i in range(1,max+1):
        totients[i] = totient(i)

    print('Totients:')
    for i in range(1,max+1):
        print(i,totients[i])

我觉得这不是那么优雅,而且容易出错,因为我存储的None会在我不小心使用错误的情况下引发异常,并且因为我需要考虑地图可以避免的边缘情况。

确实,字典的效率不高,但是正如其他人所评论的那样,速度的微小差异并不总会带来重大的维护风险。

For some applications, a dictionary may be what you are looking for. For example, in the find_totient method, I found it more convenient to use a dictionary since I didn’t have a zero index.

def totient(n):
    totient = 0

    if n == 1:
        totient = 1
    else:
        for i in range(1, n):
            if math.gcd(i, n) == 1:
                totient += 1
    return totient

def find_totients(max):
    totients = dict()
    for i in range(1,max+1):
        totients[i] = totient(i)

    print('Totients:')
    for i in range(1,max+1):
        print(i,totients[i])

This problem could also be solved with a preallocated list:

def find_totients(max):
    totients = None*(max+1)
    for i in range(1,max+1):
        totients[i] = totient(i)

    print('Totients:')
    for i in range(1,max+1):
        print(i,totients[i])

I feel that this is not as elegant and prone to bugs because I’m storing None which could throw an exception if I accidentally use them wrong, and because I need to think about edge cases that the map lets me avoid.

It’s true the dictionary won’t be as efficient, but as others have commented, small differences in speed are not always worth significant maintenance hazards.


回答 8

据我了解,python列表已经与ArrayLists非常相似。但是,如果您想调整这些参数,我会在网上发现这篇帖子可能很有趣(基本上,只需创建自己的ScalableList扩展名即可):

http://mail.python.org/pipermail/python-list/2000-May/035082.html

From what I understand, python lists are already quite similar to ArrayLists. But if you want to tweak those parameters I found this post on the net that may be interesting (basically, just create your own ScalableList extension):

http://mail.python.org/pipermail/python-list/2000-May/035082.html