标签归档:Python

在Python中,“。append()”和“ + = []”之间有什么区别?

问题:在Python中,“。append()”和“ + = []”之间有什么区别?

之间有什么区别?

some_list1 = []
some_list1.append("something")

some_list2 = []
some_list2 += ["something"]

What is the difference between:

some_list1 = []
some_list1.append("something")

and

some_list2 = []
some_list2 += ["something"]

回答 0

对于您而言,唯一的区别是性能:append是两倍的速度。

Python 3.0 (r30:67507, Dec  3 2008, 20:14:27) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.Timer('s.append("something")', 's = []').timeit()
0.20177424499999999
>>> timeit.Timer('s += ["something"]', 's = []').timeit()
0.41192320500000079

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.Timer('s.append("something")', 's = []').timeit()
0.23079359499999999
>>> timeit.Timer('s += ["something"]', 's = []').timeit()
0.44208112500000141

通常情况下,append会将一个项目添加到列表中,而+=将右侧列表的所有元素复制到左侧列表中。

更新:性能分析

比较字节码,我们可以假设appendversion在LOAD_ATTR+ CALL_FUNCTION和+ = version-中浪费了周期BUILD_LIST。显然BUILD_LIST大于LOAD_ATTR+ CALL_FUNCTION

>>> import dis
>>> dis.dis(compile("s = []; s.append('spam')", '', 'exec'))
  1           0 BUILD_LIST               0
              3 STORE_NAME               0 (s)
              6 LOAD_NAME                0 (s)
              9 LOAD_ATTR                1 (append)
             12 LOAD_CONST               0 ('spam')
             15 CALL_FUNCTION            1
             18 POP_TOP
             19 LOAD_CONST               1 (None)
             22 RETURN_VALUE
>>> dis.dis(compile("s = []; s += ['spam']", '', 'exec'))
  1           0 BUILD_LIST               0
              3 STORE_NAME               0 (s)
              6 LOAD_NAME                0 (s)
              9 LOAD_CONST               0 ('spam')
             12 BUILD_LIST               1
             15 INPLACE_ADD
             16 STORE_NAME               0 (s)
             19 LOAD_CONST               1 (None)
             22 RETURN_VALUE

我们可以通过减少LOAD_ATTR开销来进一步提高性能:

>>> timeit.Timer('a("something")', 's = []; a = s.append').timeit()
0.15924410999923566

For your case the only difference is performance: append is twice as fast.

Python 3.0 (r30:67507, Dec  3 2008, 20:14:27) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.Timer('s.append("something")', 's = []').timeit()
0.20177424499999999
>>> timeit.Timer('s += ["something"]', 's = []').timeit()
0.41192320500000079

Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import timeit
>>> timeit.Timer('s.append("something")', 's = []').timeit()
0.23079359499999999
>>> timeit.Timer('s += ["something"]', 's = []').timeit()
0.44208112500000141

In general case append will add one item to the list, while += will copy all elements of right-hand-side list into the left-hand-side list.

Update: perf analysis

Comparing bytecodes we can assume that append version wastes cycles in LOAD_ATTR + CALL_FUNCTION, and += version — in BUILD_LIST. Apparently BUILD_LIST outweighs LOAD_ATTR + CALL_FUNCTION.

>>> import dis
>>> dis.dis(compile("s = []; s.append('spam')", '', 'exec'))
  1           0 BUILD_LIST               0
              3 STORE_NAME               0 (s)
              6 LOAD_NAME                0 (s)
              9 LOAD_ATTR                1 (append)
             12 LOAD_CONST               0 ('spam')
             15 CALL_FUNCTION            1
             18 POP_TOP
             19 LOAD_CONST               1 (None)
             22 RETURN_VALUE
>>> dis.dis(compile("s = []; s += ['spam']", '', 'exec'))
  1           0 BUILD_LIST               0
              3 STORE_NAME               0 (s)
              6 LOAD_NAME                0 (s)
              9 LOAD_CONST               0 ('spam')
             12 BUILD_LIST               1
             15 INPLACE_ADD
             16 STORE_NAME               0 (s)
             19 LOAD_CONST               1 (None)
             22 RETURN_VALUE

We can improve performance even more by removing LOAD_ATTR overhead:

>>> timeit.Timer('a("something")', 's = []; a = s.append').timeit()
0.15924410999923566

回答 1

在您给出的示例中,append和之间在输出方面没有区别+=。但是append和之间+(这是最初询问的问题)之间存在区别。

>>> a = []
>>> id(a)
11814312
>>> a.append("hello")
>>> id(a)
11814312

>>> b = []
>>> id(b)
11828720
>>> c = b + ["hello"]
>>> id(c)
11833752
>>> b += ["hello"]
>>> id(b)
11828720

如您所见,append+=具有相同的结果;他们将项目添加到列表中,而不生成新列表。使用+添加两个列表并生成一个新列表。

In the example you gave, there is no difference, in terms of output, between append and +=. But there is a difference between append and + (which the question originally asked about).

>>> a = []
>>> id(a)
11814312
>>> a.append("hello")
>>> id(a)
11814312

>>> b = []
>>> id(b)
11828720
>>> c = b + ["hello"]
>>> id(c)
11833752
>>> b += ["hello"]
>>> id(b)
11828720

As you can see, append and += have the same result; they add the item to the list, without producing a new list. Using + adds the two lists and produces a new list.


回答 2

>>> a=[]
>>> a.append([1,2])
>>> a
[[1, 2]]
>>> a=[]
>>> a+=[1,2]
>>> a
[1, 2]

看到append将单个元素添加到列表,可以是任何元素。+=[]加入列表。

>>> a=[]
>>> a.append([1,2])
>>> a
[[1, 2]]
>>> a=[]
>>> a+=[1,2]
>>> a
[1, 2]

See that append adds a single element to the list, which may be anything. +=[] joins the lists.


回答 3

+ =是一个分配。使用它时,您实际上是在说“ some_list2 = some_list2 + [‘something’]”。分配涉及重新绑定,因此:

l= []

def a1(x):
    l.append(x) # works

def a2(x):
    l= l+[x] # assign to l, makes l local
             # so attempt to read l for addition gives UnboundLocalError

def a3(x):
    l+= [x]  # fails for the same reason

+ =运算符通常还应该像list + list通常那样创建一个新的列表对象:

>>> l1= []
>>> l2= l1

>>> l1.append('x')
>>> l1 is l2
True

>>> l1= l1+['x']
>>> l1 is l2
False

但是实际上:

>>> l2= l1
>>> l1+= ['x']
>>> l1 is l2
True

这是因为Python列表实现了__iadd __()来使+ =扩展分配短路,而调用list.extend()。(这有点奇怪:它通常按照您的意思进行,但出于令人困惑的原因。)

通常,如果要添加/扩展现有列表,并且希望保留对同一列表的引用(而不是创建新列表),则最好明确并坚持使用append()/ extend()方法。

+= is an assignment. When you use it you’re really saying ‘some_list2= some_list2+[‘something’]’. Assignments involve rebinding, so:

l= []

def a1(x):
    l.append(x) # works

def a2(x):
    l= l+[x] # assign to l, makes l local
             # so attempt to read l for addition gives UnboundLocalError

def a3(x):
    l+= [x]  # fails for the same reason

The += operator should also normally create a new list object like list+list normally does:

>>> l1= []
>>> l2= l1

>>> l1.append('x')
>>> l1 is l2
True

>>> l1= l1+['x']
>>> l1 is l2
False

However in reality:

>>> l2= l1
>>> l1+= ['x']
>>> l1 is l2
True

This is because Python lists implement __iadd__() to make a += augmented assignment short-circuit and call list.extend() instead. (It’s a bit of a strange wart this: it usually does what you meant, but for confusing reasons.)

In general, if you’re appending/extended an existing list, and you want to keep the reference to the same list (instead of making a new one), it’s best to be explicit and stick with the append()/extend() methods.


回答 4

 some_list2 += ["something"]

实际上是

 some_list2.extend(["something"])

对于一个值,没有区别。文档指出:

s.append(x) 与… s[len(s):len(s)] = [x]
s.extend(x) 相同s[len(s):len(s)] = x

因此显然s.append(x)s.extend([x])

 some_list2 += ["something"]

is actually

 some_list2.extend(["something"])

for one value, there is no difference. Documentation states, that:

s.append(x) same as s[len(s):len(s)] = [x]
s.extend(x) same as s[len(s):len(s)] = x

Thus obviously s.append(x) is same as s.extend([x])


回答 5

区别在于,连接将使结果列表变平,而附加将使级别保持完整:

因此,例如:

myList = [ ]
listA = [1,2,3]
listB = ["a","b","c"]

使用append,最终得到一个列表列表:

>> myList.append(listA)
>> myList.append(listB)
>> myList
[[1,2,3],['a',b','c']]

而是使用串联,最终得到一个平面列表:

>> myList += listA + listB
>> myList
[1,2,3,"a","b","c"]

The difference is that concatenate will flatten the resulting list, whereas append will keep the levels intact:

So for example with:

myList = [ ]
listA = [1,2,3]
listB = ["a","b","c"]

Using append, you end up with a list of lists:

>> myList.append(listA)
>> myList.append(listB)
>> myList
[[1,2,3],['a',b','c']]

Using concatenate instead, you end up with a flat list:

>> myList += listA + listB
>> myList
[1,2,3,"a","b","c"]

回答 6

这里的性能测试不正确:

  1. 您不应只运行一次配置文件。
  2. 如果比较append与+ = []的次数,则应将append声明为局部函数。
  3. 时间结果在不同的python版本上是不同的:64位和32位

例如

timeit.Timer(’for xrange(100)中的i:app(i)’,’s = []; app = s.append’)。timeit()

好的测试可以在这里找到:http : //markandclick.com/1/post/2012/01/python-list-append-vs.html

The performance tests here are not correct:

  1. You shouldn’t run the profile only once.
  2. If comparing append vs. += [] number of times you should declare append as a local function.
  3. time results are different on different python versions: 64 and 32 bit

e.g.

timeit.Timer(‘for i in xrange(100): app(i)’, ‘s = [] ; app = s.append’).timeit()

good tests can be found here: http://markandclick.com/1/post/2012/01/python-list-append-vs.html


回答 7

除了其他答案中描述的方面之外,当您尝试构建列表列表时,append和+ []的行为也非常不同。

>>> list1=[[1,2],[3,4]]
>>> list2=[5,6]
>>> list3=list1+list2
>>> list3
[[1, 2], [3, 4], 5, 6]
>>> list1.append(list2)
>>> list1
[[1, 2], [3, 4], [5, 6]]

list1 + [‘5’,’6’]将“ 5”和“ 6”添加到list1作为单独的元素。list1.append([‘5’,’6’])将列表[‘5’,’6’]作为单个元素添加到list1。

In addition to the aspects described in the other answers, append and +[] have very different behaviors when you’re trying to build a list of lists.

>>> list1=[[1,2],[3,4]]
>>> list2=[5,6]
>>> list3=list1+list2
>>> list3
[[1, 2], [3, 4], 5, 6]
>>> list1.append(list2)
>>> list1
[[1, 2], [3, 4], [5, 6]]

list1+[‘5′,’6’] adds ‘5’ and ‘6’ to the list1 as individual elements. list1.append([‘5′,’6’]) adds the list [‘5′,’6’] to the list1 as a single element.


回答 8

其他答案中提到的重新绑定行为在某些情况下确实很重要:

>>> a = ([],[])
>>> a[0].append(1)
>>> a
([1], [])
>>> a[1] += [1]
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

这是因为即使对象已就地突变,增强分配也会始终重新绑定。这里的绑定恰好是a[1] = *mutated list*,不适用于元组。

The rebinding behaviour mentioned in other answers does matter in certain circumstances:

>>> a = ([],[])
>>> a[0].append(1)
>>> a
([1], [])
>>> a[1] += [1]
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
TypeError: 'tuple' object does not support item assignment

That’s because augmented assignment always rebinds, even if the object was mutated in-place. The rebinding here happens to be a[1] = *mutated list*, which doesn’t work for tuples.


回答 9

让我们先举一个例子

list1=[1,2,3,4]
list2=list1     (that means they points to same object)

if we do 
list1=list1+[5]    it will create a new object of list
print(list1)       output [1,2,3,4,5] 
print(list2)       output [1,2,3,4]

but if we append  then 
list1.append(5)     no new object of list created
print(list1)       output [1,2,3,4,5] 
print(list2)       output [1,2,3,4,5]

extend(list) also do the same work as append it just append a list instead of a 
single variable 

let’s take an example first

list1=[1,2,3,4]
list2=list1     (that means they points to same object)

if we do 
list1=list1+[5]    it will create a new object of list
print(list1)       output [1,2,3,4,5] 
print(list2)       output [1,2,3,4]

but if we append  then 
list1.append(5)     no new object of list created
print(list1)       output [1,2,3,4,5] 
print(list2)       output [1,2,3,4,5]

extend(list) also do the same work as append it just append a list instead of a 
single variable 

回答 10

append()方法将单个项目添加到现有列表中

some_list1 = []
some_list1.append("something")

因此,这里some_list1将被修改。

更新:

而使用+组合现有列表中列表的元素(多个元素),类似于扩展(由Flux纠正)。

some_list2 = []
some_list2 += ["something"]

因此,这里some_list2和[“ something”]是结合在一起的两个列表。

The append() method adds a single item to the existing list

some_list1 = []
some_list1.append("something")

So here the some_list1 will get modified.

Updated:

Whereas using + to combine the elements of lists (more than one element) in the existing list similar to the extend (as corrected by Flux).

some_list2 = []
some_list2 += ["something"]

So here the some_list2 and [“something”] are the two lists that are combined.


回答 11

“ +” 不会改变列表

.append()改变旧列表

“+” does not mutate the list

.append() mutates the old list


Python鼻子导入错误

问题:Python鼻子导入错误

我似乎无法获得鼻子测试框架来识别文件结构中测试脚本下的模块。我设置了最简单的示例来演示该问题。我会在下面解释。

这是包文件的结构:

./__init__.py
./foo.py
./tests
   ./__init__.py
   ./test_foo.py

foo.py包含:

def dumb_true():
    return True

tests / test_foo.py包含:

import foo

def test_foo():
    assert foo.dumb_true()

两个init .py文件均为空

如果我nosetests -vv在主目录(foo.py所在的目录)中运行,则会得到:

Failure: ImportError (No module named foo) ... ERROR

======================================================================
ERROR: Failure: ImportError (No module named foo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python/site-packages/nose-0.11.1-py2.6.egg/nose/loader.py", line 379, in loadTestsFromName
    addr.filename, addr.module)
  File "/usr/lib/python/site-packages/nose-0.11.1-py2.6.egg/nose/importer.py", line 39, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/usr/lib/python/site-packages/nose-0.11.1-py2.6.egg/nose/importer.py", line 86, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/user/nose_testing/tests/test_foo.py", line 1, in <module>
    import foo
ImportError: No module named foo

----------------------------------------------------------------------
Ran 1 test in 0.002s

FAILED (errors=1)

当我从tests /目录中运行时,出现相同的错误。根据文档和我发现的示例,nose应该将所有父包都添加到路径以及调用它的目录中,但是在我看来,这似乎没有发生。

我正在使用Python 2.6.2运行Ubuntu 8.04。如果重要的话,我已经手动构建并安装了鼻子(不使用setup_tools)。

I can’t seem to get the nose testing framework to recognize modules beneath my test script in the file structure. I’ve set up the simplest example that demonstrates the problem. I’ll explain it below.

Here’s the the package file structure:

./__init__.py
./foo.py
./tests
   ./__init__.py
   ./test_foo.py

foo.py contains:

def dumb_true():
    return True

tests/test_foo.py contains:

import foo

def test_foo():
    assert foo.dumb_true()

Both init.py files are empty

If I run nosetests -vv in the main directory (where foo.py is), I get:

Failure: ImportError (No module named foo) ... ERROR

======================================================================
ERROR: Failure: ImportError (No module named foo)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/usr/lib/python/site-packages/nose-0.11.1-py2.6.egg/nose/loader.py", line 379, in loadTestsFromName
    addr.filename, addr.module)
  File "/usr/lib/python/site-packages/nose-0.11.1-py2.6.egg/nose/importer.py", line 39, in importFromPath
    return self.importFromDir(dir_path, fqname)
  File "/usr/lib/python/site-packages/nose-0.11.1-py2.6.egg/nose/importer.py", line 86, in importFromDir
    mod = load_module(part_fqname, fh, filename, desc)
  File "/home/user/nose_testing/tests/test_foo.py", line 1, in <module>
    import foo
ImportError: No module named foo

----------------------------------------------------------------------
Ran 1 test in 0.002s

FAILED (errors=1)

I get the same error when I run from inside the tests/ directory. According to the documentation and an example I found, nose is supposed to add all parent packages to the path as well as the directory from which it is called, but this doesn’t seem to be happening in my case.

I’m running Ubuntu 8.04 with Python 2.6.2. I’ve built and installed nose manually (not with setup_tools) if that matters.


回答 0

__init__.py的顶级目录中有一个。这使其成为一个包装。如果将其删除,则nosetests应该可以工作。

如果不删除它,则必须将其更改importimport dir.foo,这dir是目录名。

You’ve got an __init__.py in your top level directory. That makes it a package. If you remove it, your nosetests should work.

If you don’t remove it, you’ll have to change your import to import dir.foo, where dir is the name of your directory.


回答 1

您在虚拟环境中吗?就我而言,nosetests是中的/usr/bin/nosetests,正在使用/usr/bin/python。virtualenv中的软件包肯定不会在系统路径中。以下解决了此问题:

source myvirtualenv/activate
pip install nose
which nosetests
/home/me/myvirtualenv/bin/nosetests

Are you in a virtualenv? In my case, nosetests was the one in /usr/bin/nosetests, which was using /usr/bin/python. The packages in the virtualenv definitely won’t be in the system path. The following fixed this:

source myvirtualenv/activate
pip install nose
which nosetests
/home/me/myvirtualenv/bin/nosetests

回答 2

对于那些以后会发现此问题的人:如果__init__.py我的tests目录中没有文件,则会出现导入错误。

我的目录结构是这样的:

./tests/
  ./test_some_random_stuff.py

如果我进行了鼻子测试:

nosetests -w tests

它会给ImportError其他人看到的。如果我添加一个空白__init__.py文件,则可以正常工作:

./tests/
  ./__init__.py
  ./test_some_random_stuff.py

To those of you finding this question later on: I get the import error if I don’t have an __init__.py file in my tests directory.

My directory structure was like this:

./tests/
  ./test_some_random_stuff.py

If I ran nosetests:

nosetests -w tests

It would give the ImportError that everyone else is seeing. If I add a blank __init__.py file it works just fine:

./tests/
  ./__init__.py
  ./test_some_random_stuff.py

回答 3

另一个潜在的问题似乎是目录树中的连字符/破折号。我最近通过重命名目录固定鼻子导入错误的问题sub-dirsub_dir

Another potential problem appears to be hyphens/dashes in the directory tree. I recently fixed a nose ImportError issue by renaming a directory from sub-dir to sub_dir.


回答 4

当然,如果在导入的模块中存在语法错误,则会导致此错误。对我来说,当我备份一个tests文件时,这个问题浮出水面,该文件的路径与modules / tests.bak.py类似,位于与tests.py相同的目录中。另外,要处理Django应用程序中的init包/模块问题,可以运行以下命令(在bash / OSX shell中)以确保周围没有任何init .pyc文件:

find . -name '*.pyc' -delete

Of course if you have a syntax error in the module being imported that will cause this. For me the problem reared its head when I had a backup of a tests file with a path like module/tests.bak.py in the same directory as tests.py. Also, to deal with the init package/module problem in a Django app, you can run the following (in a bash/OSX shell) to make sure you don’t have any init.pyc files lying around:

find . -name '*.pyc' -delete

回答 5

我收到此错误消息是因为我nosetests从错误的目录运行命令。

傻了,但是发生了。

I got this error message because I run the nosetests command from the wrong directory.

Silly, but happens.


回答 6

我只是碰到另一件事可能导致这个问题:以形式命名测试testname.test.py。多余的.东西会使鼻子感到困惑,并导致它导入它不应该导入的东西。我想很明显,使用非常规的测试命名约定会破坏事情,但是我认为可能值得注意。

I just ran into one more thing that might cause this issue: naming of tests in the form testname.test.py. That extra . confounds nose and leads to it importing things it should not. I suppose it may be obvious that using unconventional test naming conventions will break things, but I thought it might be worth noting.


回答 7

例如,下面的目录结构,如果你想运行nosetestsm1m2m3以测试某些功能n.py,你应该使用from m2.m3 import ntest.py

m1
└── m2
    ├── __init__.py
    └── m3
        ├── __init__.py
        ├── n.py
        └── test
            └── test.py

For example, with the following directory structure, if you want to run nosetests in m1, m2 or m3 to test some functions in n.py, you should use from m2.m3 import n in test.py.

m1
└── m2
    ├── __init__.py
    └── m3
        ├── __init__.py
        ├── n.py
        └── test
            └── test.py

回答 8

只需完成一个问题:如果您正在为这样的结构而苦苦挣扎:

project
├── m1
    ├── __init__.py
    ├── foo1.py
    └──m2
       ├── __init__.py
       └── foo2.py

└── test
     ├── __init__.py
     └── test.py

也许您想从项目外部的路径运行测试,将项目路径包含在PYTHONPATH中。

export PYTHONPATH=$PYTHONPATH:$HOME/path/to/project

将其粘贴到您的.profile中。如果您处于虚拟环境中,请将其粘贴到venv根目录中的Activate中

Just to complete the question: If you’re struggling with structure like this:

project
├── m1
├    ├── __init__.py
├    ├── foo1.py
├    └──m2
├       ├── __init__.py
├       └── foo2.py
├
└── test
     ├── __init__.py
     └── test.py

And maybe you want to run test from a path outside the project, include your project path inside your PYTHONPATH.

export PYTHONPATH=$PYTHONPATH:$HOME/path/to/project

paste it inside your .profile. If you’re under a virtual environment, paste it inside the activate in your venv root


如何在Python中将日志记录配置为syslog?

问题:如何在Python中将日志记录配置为syslog?

我无法理解Python的logging模块。我的需求非常简单:我只想将所有内容记录到syslog中。阅读文档后,我想到了这个简单的测试脚本:

import logging
import logging.handlers

my_logger = logging.getLogger('MyLogger')
my_logger.setLevel(logging.DEBUG)

handler = logging.handlers.SysLogHandler()

my_logger.addHandler(handler)

my_logger.debug('this is debug')
my_logger.critical('this is critical')

但是此脚本不会在syslog中产生任何日志记录。怎么了?

I can’t get my head around Python’s logging module. My needs are very simple: I just want to log everything to syslog. After reading documentation I came up with this simple test script:

import logging
import logging.handlers

my_logger = logging.getLogger('MyLogger')
my_logger.setLevel(logging.DEBUG)

handler = logging.handlers.SysLogHandler()

my_logger.addHandler(handler)

my_logger.debug('this is debug')
my_logger.critical('this is critical')

But this script does not produce any log records in syslog. What’s wrong?


回答 0

将行更改为此:

handler = SysLogHandler(address='/dev/log')

这对我有用

import logging
import logging.handlers

my_logger = logging.getLogger('MyLogger')
my_logger.setLevel(logging.DEBUG)

handler = logging.handlers.SysLogHandler(address = '/dev/log')

my_logger.addHandler(handler)

my_logger.debug('this is debug')
my_logger.critical('this is critical')

Change the line to this:

handler = SysLogHandler(address='/dev/log')

This works for me

import logging
import logging.handlers

my_logger = logging.getLogger('MyLogger')
my_logger.setLevel(logging.DEBUG)

handler = logging.handlers.SysLogHandler(address = '/dev/log')

my_logger.addHandler(handler)

my_logger.debug('this is debug')
my_logger.critical('this is critical')

回答 1

无论是通过TCP堆栈登录到/ dev / log还是localhost,都应始终使用本地主机进行日志记录。这允许完全符合RFC且功能强大的系统日志记录守护程序来处理syslog。这消除了远程守护程序起作用的需要,并提供了syslog守护程序的增强功能,例如rsyslog和syslog-ng。SMTP也遵循相同的原则。只需将其交给本地SMTP软件即可。在这种情况下,请使用“程序模式”而不是守护程序,但这是相同的想法。让功能更强大的软件来处理它。使用TCP代替UDP进行syslog重试,排队,本地后台处理成为可能。您也可以按原样与代码分开[重新]配置这些守护程序。

为您的应用程序保存代码,让其他软件协同工作。

You should always use the local host for logging, whether to /dev/log or localhost through the TCP stack. This allows the fully RFC compliant and featureful system logging daemon to handle syslog. This eliminates the need for the remote daemon to be functional and provides the enhanced capabilities of syslog daemon’s such as rsyslog and syslog-ng for instance. The same philosophy goes for SMTP. Just hand it to the local SMTP software. In this case use ‘program mode’ not the daemon, but it’s the same idea. Let the more capable software handle it. Retrying, queuing, local spooling, using TCP instead of UDP for syslog and so forth become possible. You can also [re-]configure those daemons separately from your code as it should be.

Save your coding for your application, let other software do it’s job in concert.


回答 2

我发现syslog模块可以很容易地获得您描述的基本日志记录行为:

import syslog
syslog.syslog("This is a test message")
syslog.syslog(syslog.LOG_INFO, "Test message at INFO priority")

您还可以做其他事情,但是就我所知,即使只是前两行也可以满足您的要求。

I found the syslog module to make it quite easy to get the basic logging behavior you describe:

import syslog
syslog.syslog("This is a test message")
syslog.syslog(syslog.LOG_INFO, "Test message at INFO priority")

There are other things you could do, too, but even just the first two lines of that will get you what you’ve asked for as I understand it.


回答 3

从这里和其他地方将东西拼凑在一起,这就是我想出的在unbuntu 12.04和centOS6上运行的结果

创建一个/etc/rsyslog.d/以.conf结尾的文件,并添加以下文本

local6.*        /var/log/my-logfile

重新启动rsyslog,重新加载似乎不适用于新的日志文件。也许它只重新加载现有的conf文件?

sudo restart rsyslog

然后,您可以使用此测试程序来确保它确实有效。

import logging, sys
from logging import config

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(levelname)s %(module)s P%(process)d T%(thread)d %(message)s'
            },
        },
    'handlers': {
        'stdout': {
            'class': 'logging.StreamHandler',
            'stream': sys.stdout,
            'formatter': 'verbose',
            },
        'sys-logger6': {
            'class': 'logging.handlers.SysLogHandler',
            'address': '/dev/log',
            'facility': "local6",
            'formatter': 'verbose',
            },
        },
    'loggers': {
        'my-logger': {
            'handlers': ['sys-logger6','stdout'],
            'level': logging.DEBUG,
            'propagate': True,
            },
        }
    }

config.dictConfig(LOGGING)


logger = logging.getLogger("my-logger")

logger.debug("Debug")
logger.info("Info")
logger.warn("Warn")
logger.error("Error")
logger.critical("Critical")

Piecing things together from here and other places, this is what I came up with that works on unbuntu 12.04 and centOS6

Create an file in /etc/rsyslog.d/ that ends in .conf and add the following text

local6.*        /var/log/my-logfile

Restart rsyslog, reloading did NOT seem to work for the new log files. Maybe it only reloads existing conf files?

sudo restart rsyslog

Then you can use this test program to make sure it actually works.

import logging, sys
from logging import config

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'verbose': {
            'format': '%(levelname)s %(module)s P%(process)d T%(thread)d %(message)s'
            },
        },
    'handlers': {
        'stdout': {
            'class': 'logging.StreamHandler',
            'stream': sys.stdout,
            'formatter': 'verbose',
            },
        'sys-logger6': {
            'class': 'logging.handlers.SysLogHandler',
            'address': '/dev/log',
            'facility': "local6",
            'formatter': 'verbose',
            },
        },
    'loggers': {
        'my-logger': {
            'handlers': ['sys-logger6','stdout'],
            'level': logging.DEBUG,
            'propagate': True,
            },
        }
    }

config.dictConfig(LOGGING)


logger = logging.getLogger("my-logger")

logger.debug("Debug")
logger.info("Info")
logger.warn("Warn")
logger.error("Error")
logger.critical("Critical")

回答 4

我添加了一些额外的注释,以防万一,它对任何人都有帮助,因为我发现此交换很有用,但需要一些额外的信息才能使其正常工作。

要使用SysLogHandler登录到特定工具,您需要指定工具值。例如,您已定义:

local3.* /var/log/mylog

在syslog中,那么您将要使用:

handler = logging.handlers.SysLogHandler(address = ('localhost',514), facility=19)

并且还需要让syslog侦听UDP以使用localhost而不是/ dev / log。

I add a little extra comment just in case it helps anyone because I found this exchange useful but needed this little extra bit of info to get it all working.

To log to a specific facility using SysLogHandler you need to specify the facility value. Say for example that you have defined:

local3.* /var/log/mylog

in syslog, then you’ll want to use:

handler = logging.handlers.SysLogHandler(address = ('localhost',514), facility=19)

and you also need to have syslog listening on UDP to use localhost instead of /dev/log.


回答 5

是否已将syslog.conf设置为处理工具=用户?

您可以使用工具参数设置python记录器使用的工具,如下所示:

handler = logging.handlers.SysLogHandler(facility=SysLogHandler.LOG_DAEMON)

Is your syslog.conf set up to handle facility=user?

You can set the facility used by the python logger with the facility argument, something like this:

handler = logging.handlers.SysLogHandler(facility=SysLogHandler.LOG_DAEMON)

回答 6

import syslog
syslog.openlog(ident="LOG_IDENTIFIER",logoption=syslog.LOG_PID, facility=syslog.LOG_LOCAL0)
syslog.syslog('Log processing initiated...')

上面的脚本将使用我们的自定义“ LOG_IDENTIFIER”登录到LOCAL0工具…您可以将LOCAL [0-7]用于本地目的。

import syslog
syslog.openlog(ident="LOG_IDENTIFIER",logoption=syslog.LOG_PID, facility=syslog.LOG_LOCAL0)
syslog.syslog('Log processing initiated...')

the above script will log to LOCAL0 facility with our custom “LOG_IDENTIFIER”… you can use LOCAL[0-7] for local purpose.


回答 7

来自https://github.com/luismartingil/per.scripts/tree/master/python_syslog

#!/usr/bin/python
# -*- coding: utf-8 -*-

'''
Implements a new handler for the logging module which uses the pure syslog python module.

@author:  Luis Martin Gil
@year: 2013
'''
import logging
import syslog

class SysLogLibHandler(logging.Handler):
    """A logging handler that emits messages to syslog.syslog."""
    FACILITY = [syslog.LOG_LOCAL0,
                syslog.LOG_LOCAL1,
                syslog.LOG_LOCAL2,
                syslog.LOG_LOCAL3,
                syslog.LOG_LOCAL4,
                syslog.LOG_LOCAL5,
                syslog.LOG_LOCAL6,
                syslog.LOG_LOCAL7]
    def __init__(self, n):
        """ Pre. (0 <= n <= 7) """
        try:
            syslog.openlog(logoption=syslog.LOG_PID, facility=self.FACILITY[n])
        except Exception , err:
            try:
                syslog.openlog(syslog.LOG_PID, self.FACILITY[n])
            except Exception, err:
                try:
                    syslog.openlog('my_ident', syslog.LOG_PID, self.FACILITY[n])
                except:
                    raise
        # We got it
        logging.Handler.__init__(self)

    def emit(self, record):
        syslog.syslog(self.format(record))

if __name__ == '__main__':
    """ Lets play with the log class. """
    # Some variables we need
    _id = 'myproj_v2.0'
    logStr = 'debug'
    logFacilityLocalN = 1

    # Defines a logging level and logging format based on a given string key.
    LOG_ATTR = {'debug': (logging.DEBUG,
                          _id + ' %(levelname)-9s %(name)-15s %(threadName)-14s +%(lineno)-4d %(message)s'),
                'info': (logging.INFO,
                         _id + ' %(levelname)-9s %(message)s'),
                'warning': (logging.WARNING,
                            _id + ' %(levelname)-9s %(message)s'),
                'error': (logging.ERROR,
                          _id + ' %(levelname)-9s %(message)s'),
                'critical': (logging.CRITICAL,
                             _id + ' %(levelname)-9s %(message)s')}
    loglevel, logformat = LOG_ATTR[logStr]

    # Configuring the logger
    logger = logging.getLogger()
    logger.setLevel(loglevel)

    # Clearing previous logs
    logger.handlers = []

    # Setting formaters and adding handlers.
    formatter = logging.Formatter(logformat)
    handlers = []
    handlers.append(SysLogLibHandler(logFacilityLocalN))
    for h in handlers:
        h.setFormatter(formatter)
        logger.addHandler(h)

    # Yep!
    logging.debug('test debug')
    logging.info('test info')
    logging.warning('test warning')
    logging.error('test error')
    logging.critical('test critical')

From https://github.com/luismartingil/per.scripts/tree/master/python_syslog

#!/usr/bin/python
# -*- coding: utf-8 -*-

'''
Implements a new handler for the logging module which uses the pure syslog python module.

@author:  Luis Martin Gil
@year: 2013
'''
import logging
import syslog

class SysLogLibHandler(logging.Handler):
    """A logging handler that emits messages to syslog.syslog."""
    FACILITY = [syslog.LOG_LOCAL0,
                syslog.LOG_LOCAL1,
                syslog.LOG_LOCAL2,
                syslog.LOG_LOCAL3,
                syslog.LOG_LOCAL4,
                syslog.LOG_LOCAL5,
                syslog.LOG_LOCAL6,
                syslog.LOG_LOCAL7]
    def __init__(self, n):
        """ Pre. (0 <= n <= 7) """
        try:
            syslog.openlog(logoption=syslog.LOG_PID, facility=self.FACILITY[n])
        except Exception , err:
            try:
                syslog.openlog(syslog.LOG_PID, self.FACILITY[n])
            except Exception, err:
                try:
                    syslog.openlog('my_ident', syslog.LOG_PID, self.FACILITY[n])
                except:
                    raise
        # We got it
        logging.Handler.__init__(self)

    def emit(self, record):
        syslog.syslog(self.format(record))

if __name__ == '__main__':
    """ Lets play with the log class. """
    # Some variables we need
    _id = 'myproj_v2.0'
    logStr = 'debug'
    logFacilityLocalN = 1

    # Defines a logging level and logging format based on a given string key.
    LOG_ATTR = {'debug': (logging.DEBUG,
                          _id + ' %(levelname)-9s %(name)-15s %(threadName)-14s +%(lineno)-4d %(message)s'),
                'info': (logging.INFO,
                         _id + ' %(levelname)-9s %(message)s'),
                'warning': (logging.WARNING,
                            _id + ' %(levelname)-9s %(message)s'),
                'error': (logging.ERROR,
                          _id + ' %(levelname)-9s %(message)s'),
                'critical': (logging.CRITICAL,
                             _id + ' %(levelname)-9s %(message)s')}
    loglevel, logformat = LOG_ATTR[logStr]

    # Configuring the logger
    logger = logging.getLogger()
    logger.setLevel(loglevel)

    # Clearing previous logs
    logger.handlers = []

    # Setting formaters and adding handlers.
    formatter = logging.Formatter(logformat)
    handlers = []
    handlers.append(SysLogLibHandler(logFacilityLocalN))
    for h in handlers:
        h.setFormatter(formatter)
        logger.addHandler(h)

    # Yep!
    logging.debug('test debug')
    logging.info('test info')
    logging.warning('test warning')
    logging.error('test error')
    logging.critical('test critical')

回答 8

这是推荐用于3.2及更高版本的yaml dictConfig方式。

在日志中cfg.yml

version: 1
disable_existing_loggers: true

formatters:
    default:
        format: "[%(process)d] %(name)s(%(funcName)s:%(lineno)s) - %(levelname)s: %(message)s"

handlers:
    syslog:
        class: logging.handlers.SysLogHandler
        level: DEBUG
        formatter: default
        address: /dev/log
        facility: local0

    rotating_file:
        class: logging.handlers.RotatingFileHandler
        level: DEBUG
        formatter: default
        filename: rotating.log
        maxBytes: 10485760 # 10MB
        backupCount: 20
        encoding: utf8

root:
    level: DEBUG
    handlers: [syslog, rotating_file]
    propogate: yes

loggers:
    main:
        level: DEBUG
        handlers: [syslog, rotating_file]
        propogate: yes

使用以下命令加载配置:

log_config = yaml.safe_load(open('cfg.yml'))
logging.config.dictConfig(log_config)

配置了系统日志和直接文件。请注意,/dev/log是特定于操作系统的。

Here’s the yaml dictConfig way recommended for 3.2 & later.

In log cfg.yml:

version: 1
disable_existing_loggers: true

formatters:
    default:
        format: "[%(process)d] %(name)s(%(funcName)s:%(lineno)s) - %(levelname)s: %(message)s"

handlers:
    syslog:
        class: logging.handlers.SysLogHandler
        level: DEBUG
        formatter: default
        address: /dev/log
        facility: local0

    rotating_file:
        class: logging.handlers.RotatingFileHandler
        level: DEBUG
        formatter: default
        filename: rotating.log
        maxBytes: 10485760 # 10MB
        backupCount: 20
        encoding: utf8

root:
    level: DEBUG
    handlers: [syslog, rotating_file]
    propogate: yes

loggers:
    main:
        level: DEBUG
        handlers: [syslog, rotating_file]
        propogate: yes

Load the config using:

log_config = yaml.safe_load(open('cfg.yml'))
logging.config.dictConfig(log_config)

Configured both syslog & a direct file. Note that the /dev/log is OS specific.


回答 9

我将其修复在笔记本上。rsyslog服务未侦听套接字服务。

我在/etc/rsyslog.conf文件中配置此行, 并解决了问题:

$SystemLogSocketName /dev/log

I fix it on my notebook. The rsyslog service did not listen on socket service.

I config this line bellow in /etc/rsyslog.conf file and solved the problem:

$SystemLogSocketName /dev/log


回答 10

您还可以添加文件处理程序或旋转文件处理程序,以将日志发送到本地文件:http : //docs.python.org/2/library/logging.handlers.html

You can also add a file handler or rotating file handler to send your logs to a local file: http://docs.python.org/2/library/logging.handlers.html


存在相同名称的模块时从内置库导入

问题:存在相同名称的模块时从内置库导入

情况:-我的project_folder中有一个名为Calendar的模块-我想使用Python库中的内置Calendar类-当我从日历导入Calendar中使用时,它抱怨,因为它试图从我的模块中加载。

我进行了几次搜索,但似乎找不到解决我问题的方法。

有任何想法而不必重命名我的模块吗?

Situation: – There is a module in my project_folder called calendar – I would like to use the built-in Calendar class from the Python libraries – When I use from calendar import Calendar it complains because it’s trying to load from my module.

I’ve done a few searches and I can’t seem to find a solution to my problem.

Any ideas without having to rename my module?


回答 0

公认的解决方案包含一种现已弃用的方法。

这里的importlib文档为直接从python> = 3.5的文件路径中加载模块的更合适方法提供了一个很好的示例:

import importlib.util
import sys

# For illustrative purposes.
import tokenize
file_path = tokenize.__file__  # returns "/path/to/tokenize.py"
module_name = tokenize.__name__  # returns "tokenize"

spec = importlib.util.spec_from_file_location(module_name, file_path)
module = importlib.util.module_from_spec(spec)
sys.modules[module_name] = module
spec.loader.exec_module(module)

因此,您可以从路径加载任何.py文件,并将模块名称设置为所需的名称。所以只要调整module_name为您希望模块在导入时使用的任何自定义名称即可。

要加载程序包而不是单个文件,file_path应为程序包根目录的路径__init__.py

The accepted solution contains a now-deprecated approach.

The importlib documentation here gives a good example of the more appropriate way to load a module directly from a file path for python >= 3.5:

import importlib.util
import sys

# For illustrative purposes.
import tokenize
file_path = tokenize.__file__  # returns "/path/to/tokenize.py"
module_name = tokenize.__name__  # returns "tokenize"

spec = importlib.util.spec_from_file_location(module_name, file_path)
module = importlib.util.module_from_spec(spec)
sys.modules[module_name] = module
spec.loader.exec_module(module)

So, you can load any .py file from a path and set the module name to be whatever you want. So just adjust the module_name to be whatever custom name you’d like the module to have upon importing.

To load a package instead of a single file, file_path should be the path to the package’s root __init__.py


回答 1

无需更改模块名称。相反,您可以使用absolute_import更改导入行为。例如,以stem / socket.py导入套接字模块,如下所示:

from __future__ import absolute_import
import socket

这仅适用于Python 2.5及更高版本;这是Python 3.0及更高版本中的默认行为。Pylint会抱怨该代码,但这是完全有效的。

Changing the name of your module is not necessary. Rather, you can use absolute_import to change the importing behavior. For example with stem/socket.py I import the socket module as follows:

from __future__ import absolute_import
import socket

This only works with Python 2.5 and above; it’s enabling behavior that is the default in Python 3.0 and higher. Pylint will complain about the code but it’s perfectly valid.


回答 2

实际上,解决这个问题很容易,但是实现总是有些脆弱,因为它取决于python导入机制的内部,并且在将来的版本中可能会发生变化。

(以下代码显示了如何加载本地和非本地模块以及它们如何共存)

def import_non_local(name, custom_name=None):
    import imp, sys

    custom_name = custom_name or name

    f, pathname, desc = imp.find_module(name, sys.path[1:])
    module = imp.load_module(custom_name, f, pathname, desc)
    f.close()

    return module

# Import non-local module, use a custom name to differentiate it from local
# This name is only used internally for identifying the module. We decide
# the name in the local scope by assigning it to the variable calendar.
calendar = import_non_local('calendar','std_calendar')

# import local module normally, as calendar_local
import calendar as calendar_local

print calendar.Calendar
print calendar_local

如果可能的话,最好的解决方案是避免使用与标准库或内置模块名称相同的名称来命名模块。

Actually, solving this is rather easy, but the implementation will always be a bit fragile, because it depends python import mechanism’s internals and they are subject to change in future versions.

(the following code shows how to load both local and non-local modules and how they may coexist)

def import_non_local(name, custom_name=None):
    import imp, sys

    custom_name = custom_name or name

    f, pathname, desc = imp.find_module(name, sys.path[1:])
    module = imp.load_module(custom_name, f, pathname, desc)
    f.close()

    return module

# Import non-local module, use a custom name to differentiate it from local
# This name is only used internally for identifying the module. We decide
# the name in the local scope by assigning it to the variable calendar.
calendar = import_non_local('calendar','std_calendar')

# import local module normally, as calendar_local
import calendar as calendar_local

print calendar.Calendar
print calendar_local

The best solution, if possible, is to avoid naming your modules with the same name as standard-library or built-in module names.


回答 3

解决此问题的唯一方法是自己劫持内部进口设备。这并不容易,而且充满了危险。您应不惜一切代价避免使用圣杯形的信标,因为其危险性很高。

重命名您的模块。

如果您想学习如何劫持内部导入机制,可以在这里找到如何执行此操作的方法:

有时有充分的理由陷入这种危险。您给出的原因不在其中。重命名您的模块。

如果您选择危险的路径,将会遇到的一个问题是,当您加载模块时,它会以“正式名称”结尾,这样Python就可以避免再次解析该模块的内容。可以在中找到模块的“正式名称”到模块对象本身的映射sys.modules

这意味着,如果您 import calendar在某个地方,那么导入的任何模块都将被视为具有官方名称的模块,calendar并且所有其他尝试到import calendar任何其他地方的模块(包括在主Python库的其他代码中)都将获得该日历。

可能可以使用 Python 2.x中 imputil模块模块导致从某些路径加载的模块以不同于sys.modulesfirst或类似的方式查找要导入的模块。但这是一件非常繁琐的事情,而且无论如何在Python 3.x中都无法使用。

您可以做的一件非常丑陋和可怕的事情,不涉及挂钩导入机制。您可能不应该这样做,但是可能会起作用。它将您的calendar模块变成系统日历模块和日历模块的混合体。感谢Boaz Yaniv提供的功能框架。将其放在calendar.py文件的开头:

import sys

def copy_in_standard_module_symbols(name, local_module):
    import imp

    for i in range(0, 100):
        random_name = 'random_name_%d' % (i,)
        if random_name not in sys.modules:
            break
        else:
            random_name = None
    if random_name is None:
        raise RuntimeError("Couldn't manufacture an unused module name.")
    f, pathname, desc = imp.find_module(name, sys.path[1:])
    module = imp.load_module(random_name, f, pathname, desc)
    f.close()
    del sys.modules[random_name]
    for key in module.__dict__:
        if not hasattr(local_module, key):
            setattr(local_module, key, getattr(module, key))

copy_in_standard_module_symbols('calendar', sys.modules[copy_in_standard_module_symbols.__module__])

The only way to solve this problem is to hijack the internal import machinery yourself. This is not easy, and fraught with peril. You should avoid the grail shaped beacon at all costs because the peril is too perilous.

Rename your module instead.

If you want to learn how to hijack the internal import machinery, here is where you would go about finding out how to do this:

There are sometimes good reasons to get into this peril. The reason you give is not among them. Rename your module.

If you take the perilous path, one problem you will encounter is that when you load a module it ends up with an ‘official name’ so that Python can avoid ever having to parse the contents of that module ever again. A mapping of the ‘official name’ of a module to the module object itself can be found in sys.modules.

This means that if you import calendar in one place, whatever module is imported will be thought of as the module with the official name calendar and all other attempts to import calendar anywhere else, including in other code that’s part of the main Python library, will get that calendar.

It might be possible to design a customer importer using the imputil module in Python 2.x that caused modules loaded from certain paths to look up the modules they were importing in something other than sys.modules first or something like that. But that’s an extremely hairy thing to be doing, and it won’t work in Python 3.x anyway.

There is an extremely ugly and horrible thing you can do that does not involve hooking the import mechanism. This is something you should probably not do, but it will likely work. It turns your calendar module into a hybrid of the system calendar module and your calendar module. Thanks to Boaz Yaniv for the skeleton of the function I use. Put this at the beginning of your calendar.py file:

import sys

def copy_in_standard_module_symbols(name, local_module):
    import imp

    for i in range(0, 100):
        random_name = 'random_name_%d' % (i,)
        if random_name not in sys.modules:
            break
        else:
            random_name = None
    if random_name is None:
        raise RuntimeError("Couldn't manufacture an unused module name.")
    f, pathname, desc = imp.find_module(name, sys.path[1:])
    module = imp.load_module(random_name, f, pathname, desc)
    f.close()
    del sys.modules[random_name]
    for key in module.__dict__:
        if not hasattr(local_module, key):
            setattr(local_module, key, getattr(module, key))

copy_in_standard_module_symbols('calendar', sys.modules[copy_in_standard_module_symbols.__module__])

回答 4

我想提供我的版本,该版本是Boaz Yaniv和Omnifarious解决方案的结合。它将导入模块的系统版本,与先前的答案有两个主要区别:

  • 支持“点”表示法,例如。包模块
  • 是系统模块上import语句的直接替代品,这意味着您只需要替换该行,并且如果已经对模块进行了调用,它们将照常工作

将其放在可访问的位置,以便您可以调用它(我的__init__.py文件中有我的名称):

class SysModule(object):
    pass

def import_non_local(name, local_module=None, path=None, full_name=None, accessor=SysModule()):
    import imp, sys, os

    path = path or sys.path[1:]
    if isinstance(path, basestring):
        path = [path]

    if '.' in name:
        package_name = name.split('.')[0]
        f, pathname, desc = imp.find_module(package_name, path)
        if pathname not in __path__:
            __path__.insert(0, pathname)
        imp.load_module(package_name, f, pathname, desc)
        v = import_non_local('.'.join(name.split('.')[1:]), None, pathname, name, SysModule())
        setattr(accessor, package_name, v)
        if local_module:
            for key in accessor.__dict__.keys():
                setattr(local_module, key, getattr(accessor, key))
        return accessor
    try:
        f, pathname, desc = imp.find_module(name, path)
        if pathname not in __path__:
            __path__.insert(0, pathname)
        module = imp.load_module(name, f, pathname, desc)
        setattr(accessor, name, module)
        if local_module:
            for key in accessor.__dict__.keys():
                setattr(local_module, key, getattr(accessor, key))
            return module
        return accessor
    finally:
        try:
            if f:
                f.close()
        except:
            pass

我想导入mysql.connection,但我已经有一个本地软件包mysql(官方mysql实用程序)。因此,要从系统mysql包中获取连接器,我将其替换为:

import mysql.connector

有了这个:

import sys
from mysql.utilities import import_non_local         # where I put the above function (mysql/utilities/__init__.py)
import_non_local('mysql.connector', sys.modules[__name__])

结果

# This unmodified line further down in the file now works just fine because mysql.connector has actually become part of the namespace
self.db_conn = mysql.connector.connect(**parameters)

I’d like to offer my version, which is a combination of Boaz Yaniv’s and Omnifarious’s solution. It will import the system version of a module, with two main differences from the previous answers:

  • Supports the ‘dot’ notation, eg. package.module
  • Is a drop-in replacement for the import statement on system modules, meaning you just have to replace that one line and if there are already calls being made to the module they will work as-is

Put this somewhere accessible so you can call it (I have mine in my __init__.py file):

class SysModule(object):
    pass

def import_non_local(name, local_module=None, path=None, full_name=None, accessor=SysModule()):
    import imp, sys, os

    path = path or sys.path[1:]
    if isinstance(path, basestring):
        path = [path]

    if '.' in name:
        package_name = name.split('.')[0]
        f, pathname, desc = imp.find_module(package_name, path)
        if pathname not in __path__:
            __path__.insert(0, pathname)
        imp.load_module(package_name, f, pathname, desc)
        v = import_non_local('.'.join(name.split('.')[1:]), None, pathname, name, SysModule())
        setattr(accessor, package_name, v)
        if local_module:
            for key in accessor.__dict__.keys():
                setattr(local_module, key, getattr(accessor, key))
        return accessor
    try:
        f, pathname, desc = imp.find_module(name, path)
        if pathname not in __path__:
            __path__.insert(0, pathname)
        module = imp.load_module(name, f, pathname, desc)
        setattr(accessor, name, module)
        if local_module:
            for key in accessor.__dict__.keys():
                setattr(local_module, key, getattr(accessor, key))
            return module
        return accessor
    finally:
        try:
            if f:
                f.close()
        except:
            pass

Example

I wanted to import mysql.connection, but I had a local package already called mysql (the official mysql utilities). So to get the connector from the system mysql package, I replaced this:

import mysql.connector

With this:

import sys
from mysql.utilities import import_non_local         # where I put the above function (mysql/utilities/__init__.py)
import_non_local('mysql.connector', sys.modules[__name__])

Result

# This unmodified line further down in the file now works just fine because mysql.connector has actually become part of the namespace
self.db_conn = mysql.connector.connect(**parameters)

回答 5

更改导入路径:

import sys
save_path = sys.path[:]
sys.path.remove('')
import calendar
sys.path = save_path

Change the import path:

import sys
save_path = sys.path[:]
sys.path.remove('')
import calendar
sys.path = save_path

CSV新行字符出现在未引用字段错误

问题:CSV新行字符出现在未引用字段错误

以下代码一直工作到今天,当我从Windows机器导入并出现此错误时:

在不带引号的字段中看到换行符-您是否需要在通用换行模式下打开文件?

import csv

class CSV:


    def __init__(self, file=None):
        self.file = file

    def read_file(self):
        data = []
        file_read = csv.reader(self.file)
        for row in file_read:
            data.append(row)
        return data

    def get_row_count(self):
        return len(self.read_file())

    def get_column_count(self):
        new_data = self.read_file()
        return len(new_data[0])

    def get_data(self, rows=1):
        data = self.read_file()

        return data[:rows]

如何解决此问题?

def upload_configurator(request, id=None):
    """
    A view that allows the user to configurator the uploaded CSV.
    """
    upload = Upload.objects.get(id=id)
    csvobject = CSV(upload.filepath)

    upload.num_records = csvobject.get_row_count()
    upload.num_columns = csvobject.get_column_count()
    upload.save()

    form = ConfiguratorForm()

    row_count = csvobject.get_row_count()
    colum_count = csvobject.get_column_count()
    first_row = csvobject.get_data(rows=1)
    first_two_rows = csvobject.get_data(rows=5)

the following code worked until today when I imported from a Windows machine and got this error:

new-line character seen in unquoted field – do you need to open the file in universal-newline mode?

import csv

class CSV:


    def __init__(self, file=None):
        self.file = file

    def read_file(self):
        data = []
        file_read = csv.reader(self.file)
        for row in file_read:
            data.append(row)
        return data

    def get_row_count(self):
        return len(self.read_file())

    def get_column_count(self):
        new_data = self.read_file()
        return len(new_data[0])

    def get_data(self, rows=1):
        data = self.read_file()

        return data[:rows]

How can I fix this issue?

def upload_configurator(request, id=None):
    """
    A view that allows the user to configurator the uploaded CSV.
    """
    upload = Upload.objects.get(id=id)
    csvobject = CSV(upload.filepath)

    upload.num_records = csvobject.get_row_count()
    upload.num_columns = csvobject.get_column_count()
    upload.save()

    form = ConfiguratorForm()

    row_count = csvobject.get_row_count()
    colum_count = csvobject.get_column_count()
    first_row = csvobject.get_data(rows=1)
    first_two_rows = csvobject.get_data(rows=5)

回答 0

最好先查看csv文件本身,但这可能对您有用,请尝试一下,替换:

file_read = csv.reader(self.file)

与:

file_read = csv.reader(self.file, dialect=csv.excel_tab)

或者,使用打开文件universal newline mode并将其传递给csv.reader,例如:

reader = csv.reader(open(self.file, 'rU'), dialect=csv.excel_tab)

或者,splitlines()像这样使用:

def read_file(self):
    with open(self.file, 'r') as f:
        data = [row for row in csv.reader(f.read().splitlines())]
    return data

It’ll be good to see the csv file itself, but this might work for you, give it a try, replace:

file_read = csv.reader(self.file)

with:

file_read = csv.reader(self.file, dialect=csv.excel_tab)

Or, open a file with universal newline mode and pass it to csv.reader, like:

reader = csv.reader(open(self.file, 'rU'), dialect=csv.excel_tab)

Or, use splitlines(), like this:

def read_file(self):
    with open(self.file, 'r') as f:
        data = [row for row in csv.reader(f.read().splitlines())]
    return data

回答 1

我意识到这是一篇过时的文章,但是遇到了同样的问题,但没有找到正确的答案,因此我将尝试一下

Python错误:

_csv.Error: new-line character seen in unquoted field

试图读取Macintosh(OS X之前的格式)的CSV文件引起的。这些是使用CR作为行尾的文本文件。如果使用MS Office,请确保选择纯CSV格式或CSV(MS-DOS)不要使用CSV(Macintosh)作为另存为类型。

我首选的EOL版本是LF(Unix / Linux / Apple),但我不认为MS Office提供了以这种格式保存的选项。

I realize this is an old post, but I ran into the same problem and don’t see the correct answer so I will give it a try

Python Error:

_csv.Error: new-line character seen in unquoted field

Caused by trying to read Macintosh (pre OS X formatted) CSV files. These are text files that use CR for end of line. If using MS Office make sure you select either plain CSV format or CSV (MS-DOS). Do not use CSV (Macintosh) as save-as type.

My preferred EOL version would be LF (Unix/Linux/Apple), but I don’t think MS Office provides the option to save in this format.


回答 2

对于Mac OS X,请以“ Windows逗号分隔(.csv)”格式保存CSV文件。

For Mac OS X, save your CSV file in “Windows Comma Separated (.csv)” format.


回答 3

如果您在Mac上遇到了这种情况(就像对我一样):

  1. 将文件另存为 CSV (MS-DOS Comma-Separated)
  2. 运行以下脚本

    with open(csv_filename, 'rU') as csvfile:
        csvreader = csv.reader(csvfile)
        for row in csvreader:
            print ', '.join(row)
    

If this happens to you on mac (as it did to me):

  1. Save the file as CSV (MS-DOS Comma-Separated)
  2. Run the following script

    with open(csv_filename, 'rU') as csvfile:
        csvreader = csv.reader(csvfile)
        for row in csvreader:
            print ', '.join(row)
    

回答 4

尝试先dos2unix在Windows导入的文件上运行

Try to run dos2unix on your windows imported files first


回答 5

这是我遇到的错误。我已将.csv文件保存在MAC OSX中。

保存时,将其另存为“ Windows逗号分隔值(.csv)”,此问题已解决。

This is an error that I faced. I had saved .csv file in MAC OSX.

While saving, save it as “Windows Comma Separated Values (.csv)” which resolved the issue.


回答 6

这在OSX上对我有用。

# allow variable to opened as files
from io import StringIO

# library to map other strange (accented) characters back into UTF-8
from unidecode import unidecode

# cleanse input file with Windows formating to plain UTF-8 string
with open(filename, 'rb') as fID:
    uncleansedBytes = fID.read()
    # decode the file using the correct encoding scheme
    # (probably this old windows one) 
    uncleansedText = uncleansedBytes.decode('Windows-1252')

    # replace carriage-returns with new-lines
    cleansedText = uncleansedText.replace('\r', '\n')

    # map any other non UTF-8 characters into UTF-8
    asciiText = unidecode(cleansedText)

# read each line of the csv file and store as an array of dicts, 
# use first line as field names for each dict. 
reader = csv.DictReader(StringIO(cleansedText))
for line_entry in reader:
    # do something with your read data 

This worked for me on OSX.

# allow variable to opened as files
from io import StringIO

# library to map other strange (accented) characters back into UTF-8
from unidecode import unidecode

# cleanse input file with Windows formating to plain UTF-8 string
with open(filename, 'rb') as fID:
    uncleansedBytes = fID.read()
    # decode the file using the correct encoding scheme
    # (probably this old windows one) 
    uncleansedText = uncleansedBytes.decode('Windows-1252')

    # replace carriage-returns with new-lines
    cleansedText = uncleansedText.replace('\r', '\n')

    # map any other non UTF-8 characters into UTF-8
    asciiText = unidecode(cleansedText)

# read each line of the csv file and store as an array of dicts, 
# use first line as field names for each dict. 
reader = csv.DictReader(StringIO(cleansedText))
for line_entry in reader:
    # do something with your read data 

回答 7

我知道这个问题已经回答了很长时间,但并不能解决我的问题。由于其他一些复杂性,我正在使用DictReader和StringIO进行csv读取。通过显式替换定界符,我能够更简单地解决问题:

with urllib.request.urlopen(q) as response:
    raw_data = response.read()
    encoding = response.info().get_content_charset('utf8') 
    data = raw_data.decode(encoding)
    if '\r\n' not in data:
        # proably a windows delimited thing...try to update it
        data = data.replace('\r', '\r\n')

对于庞大的CSV文件来说可能并不合理,但对于我的用例来说效果很好。

I know this has been answered for quite some time but not solve my problem. I am using DictReader and StringIO for my csv reading due to some other complications. I was able to solve problem more simply by replacing delimiters explicitly:

with urllib.request.urlopen(q) as response:
    raw_data = response.read()
    encoding = response.info().get_content_charset('utf8') 
    data = raw_data.decode(encoding)
    if '\r\n' not in data:
        # proably a windows delimited thing...try to update it
        data = data.replace('\r', '\r\n')

Might not be reasonable for enormous CSV files, but worked well for my use case.


回答 8

替代快速解决方案:我遇到了同样的错误。我在lubuntu机器上的GNUMERIC中重新打开了“奇怪的” csv文件,并将该文件导出为csv文件。这解决了问题。

Alternative and fast solution : I faced the same error. I reopened the “wierd” csv file in GNUMERIC on my lubuntu machine and exported the file as csv file. This corrected the issue.


Python中存在可变的命名元组吗?

问题:Python中存在可变的命名元组吗?

任何人都可以修改namedtuple或提供替代类,以使其适用于可变对象吗?

主要是为了提高可读性,我想要执行类似于namedtuple的操作:

from Camelot import namedgroup

Point = namedgroup('Point', ['x', 'y'])
p = Point(0, 0)
p.x = 10

>>> p
Point(x=10, y=0)

>>> p.x *= 10
Point(x=100, y=0)

腌制所得物体必须是可能的。并且根据命名元组的特征,在表示对象时输出的顺序必须与构造对象时参数列表的顺序相匹配。

Can anyone amend namedtuple or provide an alternative class so that it works for mutable objects?

Primarily for readability, I would like something similar to namedtuple that does this:

from Camelot import namedgroup

Point = namedgroup('Point', ['x', 'y'])
p = Point(0, 0)
p.x = 10

>>> p
Point(x=10, y=0)

>>> p.x *= 10
Point(x=100, y=0)

It must be possible to pickle the resulting object. And per the characteristics of named tuple, the ordering of the output when represented must match the order of the parameter list when constructing the object.


回答 0

还有就是一个可变的替代方案collections.namedtuplerecordclass

它具有与API相同的API和内存占用量,namedtuple并且支持分配(它也应该更快)。例如:

from recordclass import recordclass

Point = recordclass('Point', 'x y')

>>> p = Point(1, 2)
>>> p
Point(x=1, y=2)
>>> print(p.x, p.y)
1 2
>>> p.x += 2; p.y += 3; print(p)
Point(x=3, y=5)

对于python 3.6及更高版本recordclass(从0.5开始)支持typehints:

from recordclass import recordclass, RecordClass

class Point(RecordClass):
   x: int
   y: int

>>> Point.__annotations__
{'x':int, 'y':int}
>>> p = Point(1, 2)
>>> p
Point(x=1, y=2)
>>> print(p.x, p.y)
1 2
>>> p.x += 2; p.y += 3; print(p)
Point(x=3, y=5)

有一个更完整的示例(还包括性能比较)。

由于0.9 recordclass库提供了另一个变体- recordclass.structclass工厂功能。它可以产生类,其实例比__slots__基于实例的实例占用更少的内存。这对于具有属性值的实例非常重要,该属性值不打算具有参考周期。如果您需要创建数百万个实例,则可能有助于减少内存使用。这是一个说明性的例子

There is a mutable alternative to collections.namedtuplerecordclass.

It has the same API and memory footprint as namedtuple and it supports assignments (It should be faster as well). For example:

from recordclass import recordclass

Point = recordclass('Point', 'x y')

>>> p = Point(1, 2)
>>> p
Point(x=1, y=2)
>>> print(p.x, p.y)
1 2
>>> p.x += 2; p.y += 3; print(p)
Point(x=3, y=5)

For python 3.6 and higher recordclass (since 0.5) support typehints:

from recordclass import recordclass, RecordClass

class Point(RecordClass):
   x: int
   y: int

>>> Point.__annotations__
{'x':int, 'y':int}
>>> p = Point(1, 2)
>>> p
Point(x=1, y=2)
>>> print(p.x, p.y)
1 2
>>> p.x += 2; p.y += 3; print(p)
Point(x=3, y=5)

There is a more complete example (it also includes performance comparisons).

Since 0.9 recordclass library provides another variant — recordclass.structclass factory function. It can produce classes, whose instances occupy less memory than __slots__-based instances. This is can be important for the instances with attribute values, which has not intended to have reference cycles. It may help reduce memory usage if you need to create millions of instances. Here is an illustrative example.


回答 1

types.SimpleNamespace在Python 3.3中引入,并支持所要求的要求。

from types import SimpleNamespace
t = SimpleNamespace(foo='bar')
t.ham = 'spam'
print(t)
namespace(foo='bar', ham='spam')
print(t.foo)
'bar'
import pickle
with open('/tmp/pickle', 'wb') as f:
    pickle.dump(t, f)

types.SimpleNamespace was introduced in Python 3.3 and supports the requested requirements.

from types import SimpleNamespace
t = SimpleNamespace(foo='bar')
t.ham = 'spam'
print(t)
namespace(foo='bar', ham='spam')
print(t.foo)
'bar'
import pickle
with open('/tmp/pickle', 'wb') as f:
    pickle.dump(t, f)

回答 2

作为此任务的一种非常Pythonic的替代方法,从Python-3.7开始,您可以使用 dataclasses不仅行为可变的模块,NamedTuple因为它们使用常规的类定义,而且还支持其他类功能。

从PEP-0557:

尽管它们使用了非常不同的机制,但是可以将数据类视为“具有默认值的可变命名元组”。因为数据类使用普通的类定义语法,所以您可以自由使用继承,元类,文档字符串,用户定义的方法,类工厂和其他Python类功能。

提供了一个类装饰器,该类装饰器检查类定义中具有类型注释的变量,如PEP 526 “变量注释的语法”中所定义。在本文档中,此类变量称为字段。装饰器使用这些字段将生成的方法定义添加到类中,以支持实例初始化,repr,比较方法以及(可选)规范部分中描述的其他方法。这样的类称为数据类,但该类实际上没有什么特别的:装饰器将生成的方法添加到该类中并返回给定的相同类。

PEP-0557中引入了此功能,您可以在提供的文档链接上详细了解它。

例:

In [20]: from dataclasses import dataclass

In [21]: @dataclass
    ...: class InventoryItem:
    ...:     '''Class for keeping track of an item in inventory.'''
    ...:     name: str
    ...:     unit_price: float
    ...:     quantity_on_hand: int = 0
    ...: 
    ...:     def total_cost(self) -> float:
    ...:         return self.unit_price * self.quantity_on_hand
    ...:    

演示:

In [23]: II = InventoryItem('bisc', 2000)

In [24]: II
Out[24]: InventoryItem(name='bisc', unit_price=2000, quantity_on_hand=0)

In [25]: II.name = 'choco'

In [26]: II.name
Out[26]: 'choco'

In [27]: 

In [27]: II.unit_price *= 3

In [28]: II.unit_price
Out[28]: 6000

In [29]: II
Out[29]: InventoryItem(name='choco', unit_price=6000, quantity_on_hand=0)

As a very Pythonic alternative for this task, since Python-3.7, you can use dataclasses module that not only behaves like a mutable NamedTuple because they use normal class definitions they also support other classes features.

From PEP-0557:

Although they use a very different mechanism, Data Classes can be thought of as “mutable namedtuples with defaults”. Because Data Classes use normal class definition syntax, you are free to use inheritance, metaclasses, docstrings, user-defined methods, class factories, and other Python class features.

A class decorator is provided which inspects a class definition for variables with type annotations as defined in PEP 526, “Syntax for Variable Annotations”. In this document, such variables are called fields. Using these fields, the decorator adds generated method definitions to the class to support instance initialization, a repr, comparison methods, and optionally other methods as described in the Specification section. Such a class is called a Data Class, but there’s really nothing special about the class: the decorator adds generated methods to the class and returns the same class it was given.

This feature is introduced in PEP-0557 that you can read about it in more details on provided documentation link.

Example:

In [20]: from dataclasses import dataclass

In [21]: @dataclass
    ...: class InventoryItem:
    ...:     '''Class for keeping track of an item in inventory.'''
    ...:     name: str
    ...:     unit_price: float
    ...:     quantity_on_hand: int = 0
    ...: 
    ...:     def total_cost(self) -> float:
    ...:         return self.unit_price * self.quantity_on_hand
    ...:    

Demo:

In [23]: II = InventoryItem('bisc', 2000)

In [24]: II
Out[24]: InventoryItem(name='bisc', unit_price=2000, quantity_on_hand=0)

In [25]: II.name = 'choco'

In [26]: II.name
Out[26]: 'choco'

In [27]: 

In [27]: II.unit_price *= 3

In [28]: II.unit_price
Out[28]: 6000

In [29]: II
Out[29]: InventoryItem(name='choco', unit_price=6000, quantity_on_hand=0)

回答 3

截至2016 1月11日,最新的namedlist 1.7通过了Python 2.7和Python 3.5的所有测试它是纯python实现,recordclassC是C扩展。当然,是否需要C扩展名取决于您的要求。

您的测试(也请参见下面的注释):

from __future__ import print_function
import pickle
import sys
from namedlist import namedlist

Point = namedlist('Point', 'x y')
p = Point(x=1, y=2)

print('1. Mutation of field values')
p.x *= 10
p.y += 10
print('p: {}, {}\n'.format(p.x, p.y))

print('2. String')
print('p: {}\n'.format(p))

print('3. Representation')
print(repr(p), '\n')

print('4. Sizeof')
print('size of p:', sys.getsizeof(p), '\n')

print('5. Access by name of field')
print('p: {}, {}\n'.format(p.x, p.y))

print('6. Access by index')
print('p: {}, {}\n'.format(p[0], p[1]))

print('7. Iterative unpacking')
x, y = p
print('p: {}, {}\n'.format(x, y))

print('8. Iteration')
print('p: {}\n'.format([v for v in p]))

print('9. Ordered Dict')
print('p: {}\n'.format(p._asdict()))

print('10. Inplace replacement (update?)')
p._update(x=100, y=200)
print('p: {}\n'.format(p))

print('11. Pickle and Unpickle')
pickled = pickle.dumps(p)
unpickled = pickle.loads(pickled)
assert p == unpickled
print('Pickled successfully\n')

print('12. Fields\n')
print('p: {}\n'.format(p._fields))

print('13. Slots')
print('p: {}\n'.format(p.__slots__))

在Python 2.7上输出

1.字段值的突变  
p:10、12

2.字符串  
p:点(x = 10,y = 12)

3.陈述  
点(x = 10,y = 12) 

4. Sizeof  
p的大小:64 

5.按字段名称访问  
p:10、12

6.按索引访问  
p:10、12

7.迭代拆包  
p:10、12

8.迭代  
p:[10、12]

9.有序词典  
p:OrderedDict([['x',10),('y',12)])

10.就地更换(更新?)  
p:点(x = 100,y = 200)

11.泡菜和腌菜  
腌制成功

12.领域  
p:('x','y')

13.插槽  
p:('x','y')

与Python 3.5的唯一区别是namedlist变得更小,大小为56(Python 2.7报告64)。

请注意,我已将您的测试10更改为就地更换。namedlist_replace()哪些做了浅拷贝的方法,这使我感觉良好,因为namedtuple在标准库的工作方式。更改_replace()方法的语义会造成混乱。我认为该_update()方法应用于就地更新。还是我无法理解您的测试10的意图?

The latest namedlist 1.7 passes all of your tests with both Python 2.7 and Python 3.5 as of Jan 11, 2016. It is a pure python implementation whereas the recordclass is a C extension. Of course, it depends on your requirements whether a C extension is preferred or not.

Your tests (but also see the note below):

from __future__ import print_function
import pickle
import sys
from namedlist import namedlist

Point = namedlist('Point', 'x y')
p = Point(x=1, y=2)

print('1. Mutation of field values')
p.x *= 10
p.y += 10
print('p: {}, {}\n'.format(p.x, p.y))

print('2. String')
print('p: {}\n'.format(p))

print('3. Representation')
print(repr(p), '\n')

print('4. Sizeof')
print('size of p:', sys.getsizeof(p), '\n')

print('5. Access by name of field')
print('p: {}, {}\n'.format(p.x, p.y))

print('6. Access by index')
print('p: {}, {}\n'.format(p[0], p[1]))

print('7. Iterative unpacking')
x, y = p
print('p: {}, {}\n'.format(x, y))

print('8. Iteration')
print('p: {}\n'.format([v for v in p]))

print('9. Ordered Dict')
print('p: {}\n'.format(p._asdict()))

print('10. Inplace replacement (update?)')
p._update(x=100, y=200)
print('p: {}\n'.format(p))

print('11. Pickle and Unpickle')
pickled = pickle.dumps(p)
unpickled = pickle.loads(pickled)
assert p == unpickled
print('Pickled successfully\n')

print('12. Fields\n')
print('p: {}\n'.format(p._fields))

print('13. Slots')
print('p: {}\n'.format(p.__slots__))

Output on Python 2.7

1. Mutation of field values  
p: 10, 12

2. String  
p: Point(x=10, y=12)

3. Representation  
Point(x=10, y=12) 

4. Sizeof  
size of p: 64 

5. Access by name of field  
p: 10, 12

6. Access by index  
p: 10, 12

7. Iterative unpacking  
p: 10, 12

8. Iteration  
p: [10, 12]

9. Ordered Dict  
p: OrderedDict([('x', 10), ('y', 12)])

10. Inplace replacement (update?)  
p: Point(x=100, y=200)

11. Pickle and Unpickle  
Pickled successfully

12. Fields  
p: ('x', 'y')

13. Slots  
p: ('x', 'y')

The only difference with Python 3.5 is that the namedlist has become smaller, the size is 56 (Python 2.7 reports 64).

Note that I have changed your test 10 for in-place replacement. The namedlist has a _replace() method which does a shallow copy, and that makes perfect sense to me because the namedtuple in the standard library behaves the same way. Changing the semantics of the _replace() method would be confusing. In my opinion the _update() method should be used for in-place updates. Or maybe I failed to understand the intent of your test 10?


回答 4

看来这个问题的答案是否定的。

下面的内容非常接近,但从技术上讲并不是可变的。这将创建一个namedtuple()具有更新的x值的新实例:

Point = namedtuple('Point', ['x', 'y'])
p = Point(0, 0)
p = p._replace(x=10) 

另一方面,您可以使用创建一个简单的类__slots__,该类应该可以很好地用于频繁更新类实例属性:

class Point:
    __slots__ = ['x', 'y']
    def __init__(self, x, y):
        self.x = x
        self.y = y

为了补充这个答案,我认为__slots__在这里很好用,因为当您创建许多类实例时,它的内存使用效率很高。唯一的缺点是您不能创建新的类属性。

这是一个说明内存效率的相关线程 -Dictionary vs Object-效率更高,为什么?

该线程答案中引用的内容非常简洁地解释了为什么__slots__内存效率更高-Python插槽

It seems like the answer to this question is no.

Below is pretty close, but it’s not technically mutable. This is creating a new namedtuple() instance with an updated x value:

Point = namedtuple('Point', ['x', 'y'])
p = Point(0, 0)
p = p._replace(x=10) 

On the other hand, you can create a simple class using __slots__ that should work well for frequently updating class instance attributes:

class Point:
    __slots__ = ['x', 'y']
    def __init__(self, x, y):
        self.x = x
        self.y = y

To add to this answer, I think __slots__ is good use here because it’s memory efficient when you create lots of class instances. The only downside is that you can’t create new class attributes.

Here’s one relevant thread that illustrates the memory efficiency – Dictionary vs Object – which is more efficient and why?

The quoted content in the answer of this thread is a very succinct explanation why __slots__ is more memory efficient – Python slots


回答 5

以下是适用于Python 3的良好解决方案:最小类使用__slots__Sequence抽象基类;不会执行类似的错误检测,但它可以工作,并且其行为基本上类似于可变元组(类型检查除外)。

from collections import Sequence

class NamedMutableSequence(Sequence):
    __slots__ = ()

    def __init__(self, *a, **kw):
        slots = self.__slots__
        for k in slots:
            setattr(self, k, kw.get(k))

        if a:
            for k, v in zip(slots, a):
                setattr(self, k, v)

    def __str__(self):
        clsname = self.__class__.__name__
        values = ', '.join('%s=%r' % (k, getattr(self, k))
                           for k in self.__slots__)
        return '%s(%s)' % (clsname, values)

    __repr__ = __str__

    def __getitem__(self, item):
        return getattr(self, self.__slots__[item])

    def __setitem__(self, item, value):
        return setattr(self, self.__slots__[item], value)

    def __len__(self):
        return len(self.__slots__)

class Point(NamedMutableSequence):
    __slots__ = ('x', 'y')

例:

>>> p = Point(0, 0)
>>> p.x = 10
>>> p
Point(x=10, y=0)
>>> p.x *= 10
>>> p
Point(x=100, y=0)

如果需要,您也可以使用一种方法来创建类(尽管使用显式类更为透明):

def namedgroup(name, members):
    if isinstance(members, str):
        members = members.split()
    members = tuple(members)
    return type(name, (NamedMutableSequence,), {'__slots__': members})

例:

>>> Point = namedgroup('Point', ['x', 'y'])
>>> Point(6, 42)
Point(x=6, y=42)

在Python 2中,您需要稍作调整-如果您从继承Sequence,则该类将具有__dict____slots__将停止工作。

Python 2中的解决方案是不继承Sequence,而是继承object。如果isinstance(Point, Sequence) == True需要,您需要将NamedMutableSequence作为基本类注册 到Sequence

Sequence.register(NamedMutableSequence)

The following is a good solution for Python 3: A minimal class using __slots__ and Sequence abstract base class; does not do fancy error detection or such, but it works, and behaves mostly like a mutable tuple (except for typecheck).

from collections import Sequence

class NamedMutableSequence(Sequence):
    __slots__ = ()

    def __init__(self, *a, **kw):
        slots = self.__slots__
        for k in slots:
            setattr(self, k, kw.get(k))

        if a:
            for k, v in zip(slots, a):
                setattr(self, k, v)

    def __str__(self):
        clsname = self.__class__.__name__
        values = ', '.join('%s=%r' % (k, getattr(self, k))
                           for k in self.__slots__)
        return '%s(%s)' % (clsname, values)

    __repr__ = __str__

    def __getitem__(self, item):
        return getattr(self, self.__slots__[item])

    def __setitem__(self, item, value):
        return setattr(self, self.__slots__[item], value)

    def __len__(self):
        return len(self.__slots__)

class Point(NamedMutableSequence):
    __slots__ = ('x', 'y')

Example:

>>> p = Point(0, 0)
>>> p.x = 10
>>> p
Point(x=10, y=0)
>>> p.x *= 10
>>> p
Point(x=100, y=0)

If you want, you can have a method to create the class too (though using an explicit class is more transparent):

def namedgroup(name, members):
    if isinstance(members, str):
        members = members.split()
    members = tuple(members)
    return type(name, (NamedMutableSequence,), {'__slots__': members})

Example:

>>> Point = namedgroup('Point', ['x', 'y'])
>>> Point(6, 42)
Point(x=6, y=42)

In Python 2 you need to adjust it slightly – if you inherit from Sequence, the class will have a __dict__ and the __slots__ will stop from working.

The solution in Python 2 is to not inherit from Sequence, but object. If isinstance(Point, Sequence) == True is desired, you need to register the NamedMutableSequence as a base class to Sequence:

Sequence.register(NamedMutableSequence)

回答 6

让我们通过动态类型创建来实现这一点:

import copy
def namedgroup(typename, fieldnames):

    def init(self, **kwargs): 
        attrs = {k: None for k in self._attrs_}
        for k in kwargs:
            if k in self._attrs_:
                attrs[k] = kwargs[k]
            else:
                raise AttributeError('Invalid Field')
        self.__dict__.update(attrs)

    def getattribute(self, attr):
        if attr.startswith("_") or attr in self._attrs_:
            return object.__getattribute__(self, attr)
        else:
            raise AttributeError('Invalid Field')

    def setattr(self, attr, value):
        if attr in self._attrs_:
            object.__setattr__(self, attr, value)
        else:
            raise AttributeError('Invalid Field')

    def rep(self):
         d = ["{}={}".format(v,self.__dict__[v]) for v in self._attrs_]
         return self._typename_ + '(' + ', '.join(d) + ')'

    def iterate(self):
        for x in self._attrs_:
            yield self.__dict__[x]
        raise StopIteration()

    def setitem(self, *args, **kwargs):
        return self.__dict__.__setitem__(*args, **kwargs)

    def getitem(self, *args, **kwargs):
        return self.__dict__.__getitem__(*args, **kwargs)

    attrs = {"__init__": init,
                "__setattr__": setattr,
                "__getattribute__": getattribute,
                "_attrs_": copy.deepcopy(fieldnames),
                "_typename_": str(typename),
                "__str__": rep,
                "__repr__": rep,
                "__len__": lambda self: len(fieldnames),
                "__iter__": iterate,
                "__setitem__": setitem,
                "__getitem__": getitem,
                }

    return type(typename, (object,), attrs)

这将在允许操作继续之前检查属性以查看它们是否有效。

那么,这是可腌制的吗?是(且仅当您执行以下操作时):

>>> import pickle
>>> Point = namedgroup("Point", ["x", "y"])
>>> p = Point(x=100, y=200)
>>> p2 = pickle.loads(pickle.dumps(p))
>>> p2.x
100
>>> p2.y
200
>>> id(p) != id(p2)
True

该定义必须在您的命名空间中,并且必须存在足够长的时间,以便pickle可以找到它。因此,如果您将其定义在包中,则应该可以使用。

Point = namedgroup("Point", ["x", "y"])

如果您执行以下操作,或者将定义设为临时定义,则Pickle将失败(例如,函数结束时超出范围):

some_point = namedgroup("Point", ["x", "y"])

是的,它确实保留了类型创建中列出的字段的顺序。

Let’s implement this with dynamic type creation:

import copy
def namedgroup(typename, fieldnames):

    def init(self, **kwargs): 
        attrs = {k: None for k in self._attrs_}
        for k in kwargs:
            if k in self._attrs_:
                attrs[k] = kwargs[k]
            else:
                raise AttributeError('Invalid Field')
        self.__dict__.update(attrs)

    def getattribute(self, attr):
        if attr.startswith("_") or attr in self._attrs_:
            return object.__getattribute__(self, attr)
        else:
            raise AttributeError('Invalid Field')

    def setattr(self, attr, value):
        if attr in self._attrs_:
            object.__setattr__(self, attr, value)
        else:
            raise AttributeError('Invalid Field')

    def rep(self):
         d = ["{}={}".format(v,self.__dict__[v]) for v in self._attrs_]
         return self._typename_ + '(' + ', '.join(d) + ')'

    def iterate(self):
        for x in self._attrs_:
            yield self.__dict__[x]
        raise StopIteration()

    def setitem(self, *args, **kwargs):
        return self.__dict__.__setitem__(*args, **kwargs)

    def getitem(self, *args, **kwargs):
        return self.__dict__.__getitem__(*args, **kwargs)

    attrs = {"__init__": init,
                "__setattr__": setattr,
                "__getattribute__": getattribute,
                "_attrs_": copy.deepcopy(fieldnames),
                "_typename_": str(typename),
                "__str__": rep,
                "__repr__": rep,
                "__len__": lambda self: len(fieldnames),
                "__iter__": iterate,
                "__setitem__": setitem,
                "__getitem__": getitem,
                }

    return type(typename, (object,), attrs)

This checks the attributes to see if they are valid before allowing the operation to continue.

So is this pickleable? Yes if (and only if) you do the following:

>>> import pickle
>>> Point = namedgroup("Point", ["x", "y"])
>>> p = Point(x=100, y=200)
>>> p2 = pickle.loads(pickle.dumps(p))
>>> p2.x
100
>>> p2.y
200
>>> id(p) != id(p2)
True

The definition has to be in your namespace, and must exist long enough for pickle to find it. So if you define this to be in your package, it should work.

Point = namedgroup("Point", ["x", "y"])

Pickle will fail if you do the following, or make the definition temporary (goes out of scope when the function ends, say):

some_point = namedgroup("Point", ["x", "y"])

And yes, it does preserve the order of the fields listed in the type creation.


回答 7

根据定义,元组是不可变的。

但是,您可以创建一个字典子类,在其中可以使用点符号访问属性。

In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:class AttrDict(dict):
:
:    def __getattr__(self, name):
:        return self[name]
:
:    def __setattr__(self, name, value):
:        self[name] = value
:--

In [2]: test = AttrDict()

In [3]: test.a = 1

In [4]: test.b = True

In [5]: test
Out[5]: {'a': 1, 'b': True}

Tuples are by definition immutable.

You can however make a dictionary subclass where you can access the attributes with dot-notation;

In [1]: %cpaste
Pasting code; enter '--' alone on the line to stop or use Ctrl-D.
:class AttrDict(dict):
:
:    def __getattr__(self, name):
:        return self[name]
:
:    def __setattr__(self, name, value):
:        self[name] = value
:--

In [2]: test = AttrDict()

In [3]: test.a = 1

In [4]: test.b = True

In [5]: test
Out[5]: {'a': 1, 'b': True}

回答 8

如果您想要与namedtuples类似的行为但可变,请尝试namedlist

注意,为了可变,它不能是元组。

If you want similar behavior as namedtuples but mutable try namedlist

Note that in order to be mutable it cannot be a tuple.


回答 9

如果性能并不重要,则可以使用如下愚蠢的方法:

from collection import namedtuple

Point = namedtuple('Point', 'x y z')
mutable_z = Point(1,2,[3])

Provided performance is of little importance, one could use a silly hack like:

from collection import namedtuple

Point = namedtuple('Point', 'x y z')
mutable_z = Point(1,2,[3])

Django-DB-Migrations:无法更改表,因为它具有未决的触发事件

问题:Django-DB-Migrations:无法更改表,因为它具有未决的触发事件

我想从TextField中删除null = True:

-    footer=models.TextField(null=True, blank=True)
+    footer=models.TextField(blank=True, default='')

我创建了一个架构迁移:

manage.py schemamigration fooapp --auto

由于某些页脚列包含,NULL因此error在运行迁移时会得到以下信息:

django.db.utils.IntegrityError:“页脚”列包含空值

我将其添加到架构迁移中:

    for sender in orm['fooapp.EmailSender'].objects.filter(footer=None):
        sender.footer=''
        sender.save()

现在我得到:

django.db.utils.DatabaseError: cannot ALTER TABLE "fooapp_emailsender" because it has pending trigger events

怎么了?

I want to remove null=True from a TextField:

-    footer=models.TextField(null=True, blank=True)
+    footer=models.TextField(blank=True, default='')

I created a schema migration:

manage.py schemamigration fooapp --auto

Since some footer columns contain NULL I get this error if I run the migration:

django.db.utils.IntegrityError: column “footer” contains null values

I added this to the schema migration:

    for sender in orm['fooapp.EmailSender'].objects.filter(footer=None):
        sender.footer=''
        sender.save()

Now I get:

django.db.utils.DatabaseError: cannot ALTER TABLE "fooapp_emailsender" because it has pending trigger events

What is wrong?


回答 0

造成这种情况的另一个原因可能是因为您尝试将一列设置为NOT NULL实际上已经具有NULL值的时间。

Another reason for this maybe because you try to set a column to NOT NULL when it actually already has NULL values.


回答 1

每次迁移都在事务内部。在PostgreSQL中,您不得在一个事务中更新表然后更改表模式。

您需要拆分数据迁移和架构迁移。首先使用以下代码创建数据迁移:

 for sender in orm['fooapp.EmailSender'].objects.filter(footer=None):
    sender.footer=''
    sender.save()

然后创建架构迁移:

manage.py schemamigration fooapp --auto

现在,您有两个事务,并且应该在两个步骤中进行迁移。

Every migration is inside a transaction. In PostgreSQL you must not update the table and then alter the table schema in one transaction.

You need to split the data migration and the schema migration. First create the data migration with this code:

 for sender in orm['fooapp.EmailSender'].objects.filter(footer=None):
    sender.footer=''
    sender.save()

Then create the schema migration:

manage.py schemamigration fooapp --auto

Now you have two transactions and the migration in two steps should work.


回答 2

刚刚遇到这个问题。您还可以在模式迁移中使用db.start_transaction()和db.commit_transaction()将数据更改与模式更改分开。可能不那么干净,无法进行单独的数据迁移,但是在我的情况下,我需要架构,数据,然后再进行另一种架构迁移,因此我决定一次完成所有操作。

Have just hit this problem. You can also use db.start_transaction() and db.commit_transaction() in the schema migration to separate data changes from schema changes. Probably not so clean as to have a separate data migration but in my case I would need schema, data, and then another schema migration so I decided to do it all at once.


回答 3

在操作中,我将SET约束:

operations = [
    migrations.RunSQL('SET CONSTRAINTS ALL IMMEDIATE;'),
    migrations.RunPython(migration_func),
    migrations.RunSQL('SET CONSTRAINTS ALL DEFERRED;'),
]

At the operations I put SET CONSTRAINTS:

operations = [
    migrations.RunSQL('SET CONSTRAINTS ALL IMMEDIATE;'),
    migrations.RunPython(migration_func),
    migrations.RunSQL('SET CONSTRAINTS ALL DEFERRED;'),
]

回答 4

您正在更改列架构。该页脚列不能再包含空白值。该列中很可能已经有空白值存储在数据库中。Django将使用migration命令将数据库中的空白行从空白更新为现在的默认值。Django尝试更新页脚列为空值的行,并在看起来相同的同时更改架构(我不确定)。

问题是您无法更改试图同时更新其值的列模式。

一种解决方案是删除迁移文件以更新架构。然后,运行脚本以将所有这些值更新为默认值。然后重新运行迁移以更新架构。这样,更新已完成。Django迁移仅更改架构。

You are altering the column schema. That footer column can no longer contain a blank value. There are most likely blank values already stored in the DB for that column. Django is going to update those blank rows in your DB from blank to the now default value with the migrate command. Django tries to update the rows where footer column has a blank value and change the schema at the same time it seems (I’m not sure).

The problem is you can’t alter the same column schema you are trying to update the values for at the same time.

One solution would be to delete the migrations file updating the schema. Then, run a script to update all those values to your default value. Then re-run the migration to update the schema. This way, the update is already done. Django migration is only altering the schema.


像C#中的StringBuilder这样的Python字符串类?

问题:像C#中的StringBuilder这样的Python字符串类?

Python中是否像StringBuilderC#中一样有一些字符串类?

Is there some string class in Python like StringBuilder in C#?


回答 0

没有一对一的关联。对于非常好的文章,请参见Python中的高效字符串连接

使用Python编程语言构建长字符串有时会导致运行速度非常慢。在本文中,我研究了各种字符串连接方法的计算性能。

There is no one-to-one correlation. For a really good article please see Efficient String Concatenation in Python:

Building long strings in the Python progamming language can sometimes result in very slow running code. In this article I investigate the computational performance of various string concatenation methods.


回答 1

我使用了Oliver Crow的代码(由Andrew Hare给出的链接),并对其进行了一些修改以适应Python 2.7.3。(通过使用timeit包)。我在个人计算机Lenovo T61、6GB RAM,Debian GNU / Linux 6.0.6(挤压)上运行。

这是10,000次迭代的结果:

方法1:0.0538418292999秒
处理大小4800 kb
方法2:0.22602891922秒
处理大小4960 kb
method3:0.0605459213257秒
处理大小4980 kb
method4:0.0544030666351秒
处理大小5536 kb
method5:0.0551080703735秒
处理大小5272 kb
method6:0.0542731285095秒
处理大小5512 kb

并且进行了5,000,000次迭代(方法2被忽略了,因为它运行得太慢了,就像永远一样):

方法1:5.88603997231秒
处理大小37976 kb
方法3:8.40748500824秒
处理大小38024 kb
方法4:7.96380496025秒
程序大小321968 kb
方法5:8.03666186333秒
处理大小71720 kb
方法6:6.68192911148秒
处理大小38240 kb

很明显,Python的人在优化字符串连接方面做得非常出色,正如Hoare所说:“过早的优化是万恶之源” :-)

I have used the code of Oliver Crow (link given by Andrew Hare) and adapted it a bit to tailor Python 2.7.3. (by using timeit package). I ran on my personal computer, Lenovo T61, 6GB RAM, Debian GNU/Linux 6.0.6 (squeeze).

Here is the result for 10,000 iterations:

method1:  0.0538418292999 secs
process size 4800 kb
method2:  0.22602891922 secs
process size 4960 kb
method3:  0.0605459213257 secs
process size 4980 kb
method4:  0.0544030666351 secs
process size 5536 kb
method5:  0.0551080703735 secs
process size 5272 kb
method6:  0.0542731285095 secs
process size 5512 kb

and for 5,000,000 iterations (method 2 was ignored because it ran tooo slowly, like forever):

method1:  5.88603997231 secs
process size 37976 kb
method3:  8.40748500824 secs
process size 38024 kb
method4:  7.96380496025 secs
process size 321968 kb
method5:  8.03666186333 secs
process size 71720 kb
method6:  6.68192911148 secs
process size 38240 kb

It is quite obvious that Python guys have done pretty great job to optimize string concatenation, and as Hoare said: “premature optimization is the root of all evil” :-)


回答 2

依靠编译器优化是脆弱的。接受的答案中链接的基准和Antoine-tran给出的数字不可信。安德鲁·黑尔(Andrew Hare)错误地repr在其方法中包含了一个调用。这会平均降低所有方法的速度,但会掩盖构建字符串的实际代价。

使用join。它非常快速且更强大。

$ ipython3
Python 3.5.1 (default, Mar  2 2016, 03:38:02) 
IPython 4.1.2 -- An enhanced Interactive Python.

In [1]: values = [str(num) for num in range(int(1e3))]

In [2]: %%timeit
   ...: ''.join(values)
   ...: 
100000 loops, best of 3: 7.37 µs per loop

In [3]: %%timeit
   ...: result = ''
   ...: for value in values:
   ...:     result += value
   ...: 
10000 loops, best of 3: 82.8 µs per loop

In [4]: import io

In [5]: %%timeit
   ...: writer = io.StringIO()
   ...: for value in values:
   ...:     writer.write(value)
   ...: writer.getvalue()
   ...: 
10000 loops, best of 3: 81.8 µs per loop

Relying on compiler optimizations is fragile. The benchmarks linked in the accepted answer and numbers given by Antoine-tran are not to be trusted. Andrew Hare makes the mistake of including a call to repr in his methods. That slows all the methods equally but obscures the real penalty in constructing the string.

Use join. It’s very fast and more robust.

$ ipython3
Python 3.5.1 (default, Mar  2 2016, 03:38:02) 
IPython 4.1.2 -- An enhanced Interactive Python.

In [1]: values = [str(num) for num in range(int(1e3))]

In [2]: %%timeit
   ...: ''.join(values)
   ...: 
100000 loops, best of 3: 7.37 µs per loop

In [3]: %%timeit
   ...: result = ''
   ...: for value in values:
   ...:     result += value
   ...: 
10000 loops, best of 3: 82.8 µs per loop

In [4]: import io

In [5]: %%timeit
   ...: writer = io.StringIO()
   ...: for value in values:
   ...:     writer.write(value)
   ...: writer.getvalue()
   ...: 
10000 loops, best of 3: 81.8 µs per loop

回答 3

Python具有满足类似目的的几件事:

  • 从片段构建大字符串的一种常用方法是增长字符串列表,并在完成后将其加入。这是一个常用的Python习惯用法。
    • 要构建包含格式化数据的字符串,您需要单独进行格式化。
  • 为了在字符级别插入和删除,您将保留一个长度为一的字符串列表。(要通过字符串进行此操作,list(your_string)您可以调用。您也可以UserString.MutableString为此使用a 。
  • (c)StringIO.StringIO 对于原本会占用文件的内容很有用,但对于一般的字符串构建则没什么用。

Python has several things that fulfill similar purposes:

  • One common way to build large strings from pieces is to grow a list of strings and join it when you are done. This is a frequently-used Python idiom.
    • To build strings incorporating data with formatting, you would do the formatting separately.
  • For insertion and deletion at a character level, you would keep a list of length-one strings. (To make this from a string, you’d call list(your_string). You could also use a UserString.MutableString for this.
  • (c)StringIO.StringIO is useful for things that would otherwise take a file, but less so for general string building.

回答 4

从上面使用方法5(伪文件),我们可以获得非常好的性能和灵活性

from cStringIO import StringIO

class StringBuilder:
     _file_str = None

     def __init__(self):
         self._file_str = StringIO()

     def Append(self, str):
         self._file_str.write(str)

     def __str__(self):
         return self._file_str.getvalue()

现在使用它

sb = StringBuilder()

sb.Append("Hello\n")
sb.Append("World")

print sb

Using method 5 from above (The Pseudo File) we can get very good perf and flexibility

from cStringIO import StringIO

class StringBuilder:
     _file_str = None

     def __init__(self):
         self._file_str = StringIO()

     def Append(self, str):
         self._file_str.write(str)

     def __str__(self):
         return self._file_str.getvalue()

now using it

sb = StringBuilder()

sb.Append("Hello\n")
sb.Append("World")

print sb

回答 5

您可以尝试StringIOcStringIO


回答 6

没有显式的类似物-我认为您应该使用字符串串联(可能如前所述进行优化)或第三方类(我怀疑它们效率更高)-python中的列表是动态类型的,因此无法快速工作char []用于缓冲区(我假设)。由于许多语言中的字符串固有特性(不可变性),类似Stringbuilder的类不是过早的优化-允许进行许多优化(例如,为片段/子字符串引用相同的缓冲区)。类似于Stringbuilder / stringbuffer / stringstream的类的工作比连接字符串(产生许多仍需要分配和垃圾回收的小型临时对象)甚至字符串格式的类似于printf的工具要快得多,不需要解释格式化模式的开销,这对于很多格式调用。

There is no explicit analogue – i think you are expected to use string concatenations(likely optimized as said before) or third-party class(i doubt that they are a lot more efficient – lists in python are dynamic-typed so no fast-working char[] for buffer as i assume). Stringbuilder-like classes are not premature optimization because of innate feature of strings in many languages(immutability) – that allows many optimizations(for example, referencing same buffer for slices/substrings). Stringbuilder/stringbuffer/stringstream-like classes work a lot faster than concatenating strings(producing many small temporary objects that still need allocations and garbage collection) and even string formatting printf-like tools, not needing of interpreting formatting pattern overhead that is pretty consuming for a lot of format calls.


回答 7

如果您在这里寻找Python中的快速字符串连接方法,则不需要特殊的StringBuilder类。简单的串联也可以正常工作,而不会降低C#中的性能。

resultString = ""

resultString += "Append 1"
resultString += "Append 2"

有关性能结果,请参见Antoine-tran的答案

In case you are here looking for a fast string concatenation method in Python, then you do not need a special StringBuilder class. Simple concatenation works just as well without the performance penalty seen in C#.

resultString = ""

resultString += "Append 1"
resultString += "Append 2"

See Antoine-tran’s answer for performance results


Python可以测试列表中多个值的成员资格吗?

问题:Python可以测试列表中多个值的成员资格吗?

我想测试列表中是否有两个或多个值具有成员资格,但结果却出乎意料:

>>> 'a','b' in ['b', 'a', 'foo', 'bar']
('a', True)

那么,Python可以一次在列表中测试多个值的成员资格吗?结果是什么意思?

I want to test if two or more values have membership on a list, but I’m getting an unexpected result:

>>> 'a','b' in ['b', 'a', 'foo', 'bar']
('a', True)

So, Can Python test the membership of multiple values at once in a list? What does that result mean?


回答 0

这可以满足您的要求,并且几乎可以在所有情况下使用:

>>> all(x in ['b', 'a', 'foo', 'bar'] for x in ['a', 'b'])
True

该表达式'a','b' in ['b', 'a', 'foo', 'bar']无法按预期工作,因为Python将其解释为元组:

>>> 'a', 'b'
('a', 'b')
>>> 'a', 5 + 2
('a', 7)
>>> 'a', 'x' in 'xerxes'
('a', True)

其他选择

还有其他执行此测试的方法,但是它们不适用于许多不同种类的输入。正如Kabie指出的那样,您可以使用集合解决此问题…

>>> set(['a', 'b']).issubset(set(['a', 'b', 'foo', 'bar']))
True
>>> {'a', 'b'} <= {'a', 'b', 'foo', 'bar'}
True

…有时:

>>> {'a', ['b']} <= {'a', ['b'], 'foo', 'bar'}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

只能使用可哈希元素创建集。但是生成器表达式all(x in container for x in items)几乎可以处理任何容器类型。唯一的要求是container可重复使用(即不是生成器)。items可以是任何可迭代的。

>>> container = [['b'], 'a', 'foo', 'bar']
>>> items = (i for i in ('a', ['b']))
>>> all(x in [['b'], 'a', 'foo', 'bar'] for x in items)
True

速度测试

在许多情况下,子集测试会比快all,但差异并不令人震惊-除非问题无关紧要,因为集合不是一个选择,除非。仅将列表转换为集合是为了进行这样的测试并不总是值得为此感到麻烦。而且,将生成器转换为集合有时会非常浪费,从而使程序运行速度降低了多个数量级。

这里有一些基准用于说明。最大的区别来当两个containeritems都比较小。在那种情况下,子集方法要快一个数量级:

>>> smallset = set(range(10))
>>> smallsubset = set(range(5))
>>> %timeit smallset >= smallsubset
110 ns ± 0.702 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> %timeit all(x in smallset for x in smallsubset)
951 ns ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

这看起来有很大的不同。但只要container是一组,all就可以在更大的规模上完美使用:

>>> bigset = set(range(100000))
>>> bigsubset = set(range(50000))
>>> %timeit bigset >= bigsubset
1.14 ms ± 13.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit all(x in bigset for x in bigsubset)
5.96 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

使用子集测试仍然更快,但在这种规模下仅提高了约5倍。速度的提高归功于Python的快速c支持实现set,但是两种情况下的基本算法都是相同的。

如果items由于其他原因已经将您的信息存储在列表中,那么在使用子集测试方法之前,您必须将它们转换为集合。然后加速降到大约2.5倍:

>>> %timeit bigset >= set(bigsubseq)
2.1 ms ± 49.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

而且,如果您container是一个序列,并且需要首先进行转换,那么加速会更小:

>>> %timeit set(bigseq) >= set(bigsubseq)
4.36 ms ± 31.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

我们唯一灾难性地得到缓慢结果的时间是当我们container按顺序离开时:

>>> %timeit all(x in bigseq for x in bigsubseq)
184 ms ± 994 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

当然,只有在必须的情况下,我们才会这样做。如果其中的所有项目bigseq都是可哈希的,那么我们将改为:

>>> %timeit bigset = set(bigseq); all(x in bigset for x in bigsubseq)
7.24 ms ± 78 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

这仅比替代方法(set(bigseq) >= set(bigsubseq),位于4.36以上的时间)快1.66倍。

因此,子集测试通常更快,但幅度并不惊人。另一方面,让我们看看何时all更快。如果items有一千万个值长,并且可能具有不存在的值container怎么办?

>>> %timeit hugeiter = (x * 10 for bss in [bigsubseq] * 2000 for x in bss); set(bigset) >= set(hugeiter)
13.1 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit hugeiter = (x * 10 for bss in [bigsubseq] * 2000 for x in bss); all(x in bigset for x in hugeiter)
2.33 ms ± 65.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

在这种情况下,将生成器转换为生成器组非常浪费。该set构造具有消耗整个生成器。但是的捷径all确保了仅消耗一小部分生成器,因此它比子集测试快四个数量级

诚然,这是一个极端的例子。但正如它所显示的,您不能假设一种方法在所有情况下都更快。

结果

在大多数情况下container,至少当其所有元素都可哈希化时,转换为集合才是值得的。这是因为infor集为O(1),而insequence为O(n)。

另一方面,有时仅使用子集测试是值得的。如果您的测试项目已经存储在集中,则绝对可以这样做。否则,all只会慢一点,并且不需要任何其他存储。它也可以与大型项目生成器一起使用,在这种情况下有时可以大大提高速度。

This does what you want, and will work in nearly all cases:

>>> all(x in ['b', 'a', 'foo', 'bar'] for x in ['a', 'b'])
True

The expression 'a','b' in ['b', 'a', 'foo', 'bar'] doesn’t work as expected because Python interprets it as a tuple:

>>> 'a', 'b'
('a', 'b')
>>> 'a', 5 + 2
('a', 7)
>>> 'a', 'x' in 'xerxes'
('a', True)

Other Options

There are other ways to execute this test, but they won’t work for as many different kinds of inputs. As Kabie points out, you can solve this problem using sets…

>>> set(['a', 'b']).issubset(set(['a', 'b', 'foo', 'bar']))
True
>>> {'a', 'b'} <= {'a', 'b', 'foo', 'bar'}
True

…sometimes:

>>> {'a', ['b']} <= {'a', ['b'], 'foo', 'bar'}
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'list'

Sets can only be created with hashable elements. But the generator expression all(x in container for x in items) can handle almost any container type. The only requirement is that container be re-iterable (i.e. not a generator). items can be any iterable at all.

>>> container = [['b'], 'a', 'foo', 'bar']
>>> items = (i for i in ('a', ['b']))
>>> all(x in [['b'], 'a', 'foo', 'bar'] for x in items)
True

Speed Tests

In many cases, the subset test will be faster than all, but the difference isn’t shocking — except when the question is irrelevant because sets aren’t an option. Converting lists to sets just for the purpose of a test like this won’t always be worth the trouble. And converting generators to sets can sometimes be incredibly wasteful, slowing programs down by many orders of magnitude.

Here are a few benchmarks for illustration. The biggest difference comes when both container and items are relatively small. In that case, the subset approach is about an order of magnitude faster:

>>> smallset = set(range(10))
>>> smallsubset = set(range(5))
>>> %timeit smallset >= smallsubset
110 ns ± 0.702 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each)
>>> %timeit all(x in smallset for x in smallsubset)
951 ns ± 11.5 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

This looks like a big difference. But as long as container is a set, all is still perfectly usable at vastly larger scales:

>>> bigset = set(range(100000))
>>> bigsubset = set(range(50000))
>>> %timeit bigset >= bigsubset
1.14 ms ± 13.9 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
>>> %timeit all(x in bigset for x in bigsubset)
5.96 ms ± 37 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Using subset testing is still faster, but only by about 5x at this scale. The speed boost is due to Python’s fast c-backed implementation of set, but the fundamental algorithm is the same in both cases.

If your items are already stored in a list for other reasons, then you’ll have to convert them to a set before using the subset test approach. Then the speedup drops to about 2.5x:

>>> %timeit bigset >= set(bigsubseq)
2.1 ms ± 49.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

And if your container is a sequence, and needs to be converted first, then the speedup is even smaller:

>>> %timeit set(bigseq) >= set(bigsubseq)
4.36 ms ± 31.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

The only time we get disastrously slow results is when we leave container as a sequence:

>>> %timeit all(x in bigseq for x in bigsubseq)
184 ms ± 994 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

And of course, we’ll only do that if we must. If all the items in bigseq are hashable, then we’ll do this instead:

>>> %timeit bigset = set(bigseq); all(x in bigset for x in bigsubseq)
7.24 ms ± 78 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

That’s just 1.66x faster than the alternative (set(bigseq) >= set(bigsubseq), timed above at 4.36).

So subset testing is generally faster, but not by an incredible margin. On the other hand, let’s look at when all is faster. What if items is ten-million values long, and is likely to have values that aren’t in container?

>>> %timeit hugeiter = (x * 10 for bss in [bigsubseq] * 2000 for x in bss); set(bigset) >= set(hugeiter)
13.1 s ± 167 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
>>> %timeit hugeiter = (x * 10 for bss in [bigsubseq] * 2000 for x in bss); all(x in bigset for x in hugeiter)
2.33 ms ± 65.2 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Converting the generator into a set turns out to be incredibly wasteful in this case. The set constructor has to consume the entire generator. But the short-circuiting behavior of all ensures that only a small portion of the generator needs to be consumed, so it’s faster than a subset test by four orders of magnitude.

This is an extreme example, admittedly. But as it shows, you can’t assume that one approach or the other will be faster in all cases.

The Upshot

Most of the time, converting container to a set is worth it, at least if all its elements are hashable. That’s because in for sets is O(1), while in for sequences is O(n).

On the other hand, using subset testing is probably only worth it sometimes. Definitely do it if your test items are already stored in a set. Otherwise, all is only a little slower, and doesn’t require any additional storage. It can also be used with large generators of items, and sometimes provides a massive speedup in that case.


回答 1

另一种方法是:

>>> set(['a','b']).issubset( ['b','a','foo','bar'] )
True

Another way to do it:

>>> set(['a','b']).issubset( ['b','a','foo','bar'] )
True

回答 2

我敢肯定,in它具有更高的优先级,,因此您的语句被解释为'a', ('b' in ['b' ...]),然后'a', True由于该'b'值在数组中而被求值。

请参阅先前的答案以了解如何做自己想做的事情。

I’m pretty sure in is having higher precedence than , so your statement is being interpreted as 'a', ('b' in ['b' ...]), which then evaluates to 'a', True since 'b' is in the array.

See previous answer for how to do what you want.


回答 3

如果您要检查所有输入匹配项

>>> all(x in ['b', 'a', 'foo', 'bar'] for x in ['a', 'b'])

如果您想检查至少一场比赛

>>> any(x in ['b', 'a', 'foo', 'bar'] for x in ['a', 'b'])

If you want to check all of your input matches,

>>> all(x in ['b', 'a', 'foo', 'bar'] for x in ['a', 'b'])

if you want to check at least one match,

>>> any(x in ['b', 'a', 'foo', 'bar'] for x in ['a', 'b'])

回答 4

Python解析器将该语句评估为元组,其中第一个值为'a',第二个值为表达式'b' in ['b', 'a', 'foo', 'bar'](其值为True)。

您可以编写一个简单的函数来完成您想要的事情,但是:

def all_in(candidates, sequence):
    for element in candidates:
        if element not in sequence:
            return False
    return True

并这样称呼:

>>> all_in(('a', 'b'), ['b', 'a', 'foo', 'bar'])
True

The Python parser evaluated that statement as a tuple, where the first value was 'a', and the second value is the expression 'b' in ['b', 'a', 'foo', 'bar'] (which evaluates to True).

You can write a simple function do do what you want, though:

def all_in(candidates, sequence):
    for element in candidates:
        if element not in sequence:
            return False
    return True

And call it like:

>>> all_in(('a', 'b'), ['b', 'a', 'foo', 'bar'])
True

回答 5

[x for x in ['a','b'] if x in ['b', 'a', 'foo', 'bar']]

我认为这比选择的答案更好的原因是,您确实不需要调用“ all()”函数。在IF语句中,空列表的值为False,非空列表的值为True。

if [x for x in ['a','b'] if x in ['b', 'a', 'foo', 'bar']]:
    ...Do something...

例:

>>> [x for x in ['a','b'] if x in ['b', 'a', 'foo', 'bar']]
['a', 'b']
>>> [x for x in ['G','F'] if x in ['b', 'a', 'foo', 'bar']]
[]
[x for x in ['a','b'] if x in ['b', 'a', 'foo', 'bar']]

The reason I think this is better than the chosen answer is that you really don’t need to call the ‘all()’ function. Empty list evaluates to False in IF statements, non-empty list evaluates to True.

if [x for x in ['a','b'] if x in ['b', 'a', 'foo', 'bar']]:
    ...Do something...

Example:

>>> [x for x in ['a','b'] if x in ['b', 'a', 'foo', 'bar']]
['a', 'b']
>>> [x for x in ['G','F'] if x in ['b', 'a', 'foo', 'bar']]
[]

回答 6

我想说,我们甚至可以将那些方括号排除在外。

array = ['b', 'a', 'foo', 'bar']
all([i in array for i in 'a', 'b'])

I would say we can even leave those square brackets out.

array = ['b', 'a', 'foo', 'bar']
all([i in array for i in 'a', 'b'])

回答 7

这里给出的两个答案都不会处理重复的元素。例如,如果要测试[1,2,2]是否是[1,2,3,4]的子列表,则两者都将返回True。那可能就是您的意思,但是我只是想澄清一下。如果要为[1,2,3,4]中的[1,2,2]返回false,则需要对两个列表进行排序,并在每个列表上检查带有移动索引的每个项目。for循环稍微复杂一点。

Both of the answers presented here will not handle repeated elements. For example, if you are testing whether [1,2,2] is a sublist of [1,2,3,4], both will return True. That may be what you mean to do, but I just wanted to clarify. If you want to return false for [1,2,2] in [1,2,3,4], you would need to sort both lists and check each item with a moving index on each list. Just a slightly more complicated for loop.


回答 8

没有lambdas,怎么能成为pythonic!..不能被认真对待..但是这种方式也适用:

orig_array = [ ..... ]
test_array = [ ... ]

filter(lambda x:x in test_array, orig_array) == test_array

如果要测试数组中是否包含任何值,请省略结尾部分:

filter(lambda x:x in test_array, orig_array)

how can you be pythonic without lambdas! .. not to be taken seriously .. but this way works too:

orig_array = [ ..... ]
test_array = [ ... ]

filter(lambda x:x in test_array, orig_array) == test_array

leave out the end part if you want to test if any of the values are in the array:

filter(lambda x:x in test_array, orig_array)

回答 9

这是我的做法:

A = ['a','b','c']
B = ['c']
logic = [(x in B) for x in A]
if True in logic:
    do something

Here’s how I did it:

A = ['a','b','c']
B = ['c']
logic = [(x in B) for x in A]
if True in logic:
    do something

获取熊猫应用函数中的行的索引

问题:获取熊猫应用函数中的行的索引

我正在尝试在整个DataFramePandas中应用的函数中访问行的索引。我有这样的事情:

df = pandas.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
>>> df
   a  b  c
0  1  2  3
1  4  5  6

我将定义一个函数来访问给定行的元素

def rowFunc(row):
    return row['a'] + row['b'] * row['c']

我可以这样应用它:

df['d'] = df.apply(rowFunc, axis=1)
>>> df
   a  b  c   d
0  1  2  3   7
1  4  5  6  34

太棒了!现在,如果我想将索引合并到函数中怎么办?DataFrame在添加之前,该行中任何给定行的索引都d将是Index([u'a', u'b', u'c', u'd'], dtype='object'),但是我想要0和1。所以我不能只访问row.index

我知道我可以在存储索引的表中创建一个临时列,但是我想知道它是否存储在行对象的某个地方。

I am trying to access the index of a row in a function applied across an entire DataFrame in Pandas. I have something like this:

df = pandas.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
>>> df
   a  b  c
0  1  2  3
1  4  5  6

and I’ll define a function that access elements with a given row

def rowFunc(row):
    return row['a'] + row['b'] * row['c']

I can apply it like so:

df['d'] = df.apply(rowFunc, axis=1)
>>> df
   a  b  c   d
0  1  2  3   7
1  4  5  6  34

Awesome! Now what if I want to incorporate the index into my function? The index of any given row in this DataFrame before adding d would be Index([u'a', u'b', u'c', u'd'], dtype='object'), but I want the 0 and 1. So I can’t just access row.index.

I know I could create a temporary column in the table where I store the index, but I’m wondering if it is stored in the row object somewhere.


回答 0

在这种情况下,要访问索引,请访问name属性:

In [182]:

df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
def rowFunc(row):
    return row['a'] + row['b'] * row['c']

def rowIndex(row):
    return row.name
df['d'] = df.apply(rowFunc, axis=1)
df['rowIndex'] = df.apply(rowIndex, axis=1)
df
Out[182]:
   a  b  c   d  rowIndex
0  1  2  3   7         0
1  4  5  6  34         1

请注意,如果这确实是您要尝试执行的操作,则可以使用以下命令并且速度更快:

In [198]:

df['d'] = df['a'] + df['b'] * df['c']
df
Out[198]:
   a  b  c   d
0  1  2  3   7
1  4  5  6  34

In [199]:

%timeit df['a'] + df['b'] * df['c']
%timeit df.apply(rowIndex, axis=1)
10000 loops, best of 3: 163 µs per loop
1000 loops, best of 3: 286 µs per loop

编辑

3年后再看这个问题,您可以这样做:

In[15]:
df['d'],df['rowIndex'] = df['a'] + df['b'] * df['c'], df.index
df

Out[15]: 
   a  b  c   d  rowIndex
0  1  2  3   7         0
1  4  5  6  34         1

但是假设它并不那么简单,无论您rowFunc实际上在做什么,您都应该使用向量化函数,然后针对df索引使用它们:

In[16]:
df['newCol'] = df['a'] + df['b'] + df['c'] + df.index
df

Out[16]: 
   a  b  c   d  rowIndex  newCol
0  1  2  3   7         0       6
1  4  5  6  34         1      16

To access the index in this case you access the name attribute:

In [182]:

df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
def rowFunc(row):
    return row['a'] + row['b'] * row['c']

def rowIndex(row):
    return row.name
df['d'] = df.apply(rowFunc, axis=1)
df['rowIndex'] = df.apply(rowIndex, axis=1)
df
Out[182]:
   a  b  c   d  rowIndex
0  1  2  3   7         0
1  4  5  6  34         1

Note that if this is really what you are trying to do that the following works and is much faster:

In [198]:

df['d'] = df['a'] + df['b'] * df['c']
df
Out[198]:
   a  b  c   d
0  1  2  3   7
1  4  5  6  34

In [199]:

%timeit df['a'] + df['b'] * df['c']
%timeit df.apply(rowIndex, axis=1)
10000 loops, best of 3: 163 µs per loop
1000 loops, best of 3: 286 µs per loop

EDIT

Looking at this question 3+ years later, you could just do:

In[15]:
df['d'],df['rowIndex'] = df['a'] + df['b'] * df['c'], df.index
df

Out[15]: 
   a  b  c   d  rowIndex
0  1  2  3   7         0
1  4  5  6  34         1

but assuming it isn’t as trivial as this, whatever your rowFunc is really doing, you should look to use the vectorised functions, and then use them against the df index:

In[16]:
df['newCol'] = df['a'] + df['b'] + df['c'] + df.index
df

Out[16]: 
   a  b  c   d  rowIndex  newCol
0  1  2  3   7         0       6
1  4  5  6  34         1      16

回答 1

要么:

1.与row.name内线apply(..., axis=1)通话:

df = pandas.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'], index=['x','y'])

   a  b  c
x  1  2  3
y  4  5  6

df.apply(lambda row: row.name, axis=1)

x    x
y    y

2.与iterrows()(较慢)

DataFrame.iterrows()允许您遍历行并访问其索引:

for idx, row in df.iterrows():
    ...

Either:

1. with row.name inside the apply(..., axis=1) call:

df = pandas.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'], index=['x','y'])

   a  b  c
x  1  2  3
y  4  5  6

df.apply(lambda row: row.name, axis=1)

x    x
y    y

2. with iterrows() (slower)

DataFrame.iterrows() allows you to iterate over rows, and access their index:

for idx, row in df.iterrows():
    ...

回答 2

要回答原始问题:是的,您可以在中访问行的索引值apply()。它在键下可用,name并且需要您指定axis=1(因为lambda处理行的列而不是列的行)。

工作示例(熊猫0.23.4):

>>> import pandas as pd
>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
>>> df.set_index('a', inplace=True)
>>> df
   b  c
a      
1  2  3
4  5  6
>>> df['index_x10'] = df.apply(lambda row: 10*row.name, axis=1)
>>> df
   b  c  index_x10
a                 
1  2  3         10
4  5  6         40

To answer the original question: yes, you can access the index value of a row in apply(). It is available under the key name and requires that you specify axis=1 (because the lambda processes the columns of a row and not the rows of a column).

Working example (pandas 0.23.4):

>>> import pandas as pd
>>> df = pd.DataFrame([[1,2,3],[4,5,6]], columns=['a','b','c'])
>>> df.set_index('a', inplace=True)
>>> df
   b  c
a      
1  2  3
4  5  6
>>> df['index_x10'] = df.apply(lambda row: 10*row.name, axis=1)
>>> df
   b  c  index_x10
a                 
1  2  3         10
4  5  6         40