标签归档:Python

创建重复N次的单个项目的列表

问题:创建重复N次的单个项目的列表

我想创建一系列长度不一的列表。每个列表将包含相同的元素e,重复n次数(其中n=列表的长度)。

如何创建列表,而不[e for number in xrange(n)]对每个列表使用列表理解?

I want to create a series of lists, all of varying lengths. Each list will contain the same element e, repeated n times (where n = length of the list).

How do I create the lists, without using a list comprehension [e for number in xrange(n)] for each list?


回答 0

您还可以编写:

[e] * n

您应该注意,例如,如果e是一个空列表,您将得到一个具有n个指向同一列表的引用的列表,而不是n个独立的空列表。

性能测试

乍看之下,似乎是重复是创建一个具有n个相同的元素列表的最快方法:

>>> timeit.timeit('itertools.repeat(0, 10)', 'import itertools', number = 1000000)
0.37095273281943264
>>> timeit.timeit('[0] * 10', 'import itertools', number = 1000000)
0.5577236771712819

但是等等-这不是一个公平的测试…

>>> itertools.repeat(0, 10)
repeat(0, 10)  # Not a list!!!

该函数itertools.repeat实际上并没有创建列表,它只是创建一个对象,您可以根据需要使用该对象来创建列表!让我们再试一次,但转换为列表:

>>> timeit.timeit('list(itertools.repeat(0, 10))', 'import itertools', number = 1000000)
1.7508119747063233

因此,如果您想要列表,请使用[e] * n。如果要延迟生成元素,请使用repeat

You can also write:

[e] * n

You should note that if e is for example an empty list you get a list with n references to the same list, not n independent empty lists.

Performance testing

At first glance it seems that repeat is the fastest way to create a list with n identical elements:

>>> timeit.timeit('itertools.repeat(0, 10)', 'import itertools', number = 1000000)
0.37095273281943264
>>> timeit.timeit('[0] * 10', 'import itertools', number = 1000000)
0.5577236771712819

But wait – it’s not a fair test…

>>> itertools.repeat(0, 10)
repeat(0, 10)  # Not a list!!!

The function itertools.repeat doesn’t actually create the list, it just creates an object that can be used to create a list if you wish! Let’s try that again, but converting to a list:

>>> timeit.timeit('list(itertools.repeat(0, 10))', 'import itertools', number = 1000000)
1.7508119747063233

So if you want a list, use [e] * n. If you want to generate the elements lazily, use repeat.


回答 1

>>> [5] * 4
[5, 5, 5, 5]

当重复的项目是列表时,请当心。该列表将不会被克隆:所有元素都将引用同一列表!

>>> x=[5]
>>> y=[x] * 4
>>> y
[[5], [5], [5], [5]]
>>> y[0][0] = 6
>>> y
[[6], [6], [6], [6]]
>>> [5] * 4
[5, 5, 5, 5]

Be careful when the item being repeated is a list. The list will not be cloned: all the elements will refer to the same list!

>>> x=[5]
>>> y=[x] * 4
>>> y
[[5], [5], [5], [5]]
>>> y[0][0] = 6
>>> y
[[6], [6], [6], [6]]

回答 2

在Python中创建重复n次的单项列表

不变物品

对于不可变的项目,例如“无”,布尔值,整数,浮点数,字符串,元组或Frozensets,可以这样进行:

[e] * 4

请注意,这最好仅与列表中的不可变项(字符串,元组,frozensets)一起使用,因为它们都指向内存中同一位置的同一项。当我必须构建一个包含所有字符串的架构的表时,我会经常使用它,这样就不必提供高度冗余的一对一映射。

schema = ['string'] * len(columns)

可变项

我已经使用Python很长时间了,而且从未见过用可变实例执行上述操作的用例。相反,要获取可变的空列表,集合或字典,您应该执行以下操作:

list_of_lists = [[] for _ in columns]

在这种情况下,下划线只是一个简单的变量名。

如果只有号码,那将是:

list_of_lists = [[] for _ in range(4)]

_不是真的很特别,但你的编码环境风格检查可能会抱怨,如果你不打算使用的变量和使用的其他任何名称。


对可变项使用不可变方法的注意事项:

当心使用可变对象,当您更改其中一个对象时,它们都会更改,因为它们都是同一对象:

foo = [[]] * 4
foo[0].append('x')

foo现在返回:

[['x'], ['x'], ['x'], ['x']]

但是对于不可变的对象,您可以使其起作用,因为您可以更改引用,而不是对象:

>>> l = [0] * 4
>>> l[0] += 1
>>> l
[1, 0, 0, 0]

>>> l = [frozenset()] * 4
>>> l[0] |= set('abc')
>>> l
[frozenset(['a', 'c', 'b']), frozenset([]), frozenset([]), frozenset([])]

但同样,可变对象对此没有好处,因为就地操作会更改对象,而不是引用:

l = [set()] * 4
>>> l[0] |= set('abc')    
>>> l
[set(['a', 'c', 'b']), set(['a', 'c', 'b']), set(['a', 'c', 'b']), set(['a', 'c', 'b'])]

Create List of Single Item Repeated n Times in Python

Immutable items

For immutable items, like None, bools, ints, floats, strings, tuples, or frozensets, you can do it like this:

[e] * 4

Note that this is best only used with immutable items (strings, tuples, frozensets, ) in the list, because they all point to the same item in the same place in memory. I use this frequently when I have to build a table with a schema of all strings, so that I don’t have to give a highly redundant one to one mapping.

schema = ['string'] * len(columns)

Mutable items

I’ve used Python for a long time now, and I have never seen a use-case where I would do the above with a mutable instance. Instead, to get, say, a mutable empty list, set, or dict, you should do something like this:

list_of_lists = [[] for _ in columns]

The underscore is simply a throwaway variable name in this context.

If you only have the number, that would be:

list_of_lists = [[] for _ in range(4)]

The _ is not really special, but your coding environment style checker will probably complain if you don’t intend to use the variable and use any other name.


Caveats for using the immutable method with mutable items:

Beware doing this with mutable objects, when you change one of them, they all change because they’re all the same object:

foo = [[]] * 4
foo[0].append('x')

foo now returns:

[['x'], ['x'], ['x'], ['x']]

But with immutable objects, you can make it work because you change the reference, not the object:

>>> l = [0] * 4
>>> l[0] += 1
>>> l
[1, 0, 0, 0]

>>> l = [frozenset()] * 4
>>> l[0] |= set('abc')
>>> l
[frozenset(['a', 'c', 'b']), frozenset([]), frozenset([]), frozenset([])]

But again, mutable objects are no good for this, because in-place operations change the object, not the reference:

l = [set()] * 4
>>> l[0] |= set('abc')    
>>> l
[set(['a', 'c', 'b']), set(['a', 'c', 'b']), set(['a', 'c', 'b']), set(['a', 'c', 'b'])]

回答 3

Itertools具有此功能:

import itertools
it = itertools.repeat(e,n)

当然可以itertools为您提供迭代器,而不是列表。[e] * n为您提供了一个列表,但是根据您对这些序列的处理方式,itertools变体可能会更加有效。

Itertools has a function just for that:

import itertools
it = itertools.repeat(e,n)

Of course itertools gives you a iterator instead of a list. [e] * n gives you a list, but, depending on what you will do with those sequences, the itertools variant can be much more efficient.


回答 4

正如其他人指出的那样,对可变对象使用*运算符会重复引用,因此,如果更改一个,则会全部更改。如果要创建可变对象的独立实例,则xrange语法是执行此操作的最Python方式。如果您有一个从未使用过的命名变量而感到困扰,则可以使用匿名下划线变量。

[e for _ in xrange(n)]

As others have pointed out, using the * operator for a mutable object duplicates references, so if you change one you change them all. If you want to create independent instances of a mutable object, your xrange syntax is the most Pythonic way to do this. If you are bothered by having a named variable that is never used, you can use the anonymous underscore variable.

[e for _ in xrange(n)]

回答 5

[e] * n

应该管用

[e] * n

should work


从Python调用C / C ++?

问题:从Python调用C / C ++?

构造与C或C ++库的Python绑定的最快方法是什么?

(如果这很重要,我正在使用Windows。)

What would be the quickest way to construct a Python binding to a C or C++ library?

(I am using Windows if this matters.)


回答 0

您应该看看Boost.Python。以下是他们网站上的简短介绍:

Boost Python库是用于连接Python和C ++的框架。它使您可以快速而无缝地将C ++类的函数和对象暴露给Python,反之亦然,而无需使用特殊工具-仅使用C ++编译器即可。它旨在非侵入性地包装C ++接口,因此您不必为了包装而完全更改C ++代码,从而使Boost.Python成为将第三方库公开给Python的理想选择。该库对高级元编程技术的使用简化了用户的语法,因此包装代码具有一种声明性接口定义语言(IDL)的外观。

You should have a look at Boost.Python. Here is the short introduction taken from their website:

The Boost Python Library is a framework for interfacing Python and C++. It allows you to quickly and seamlessly expose C++ classes functions and objects to Python, and vice-versa, using no special tools — just your C++ compiler. It is designed to wrap C++ interfaces non-intrusively, so that you should not have to change the C++ code at all in order to wrap it, making Boost.Python ideal for exposing 3rd-party libraries to Python. The library’s use of advanced metaprogramming techniques simplifies its syntax for users, so that wrapping code takes on the look of a kind of declarative interface definition language (IDL).


回答 1

ctypes模块是标准库的一部分,因此比swig更稳定和更易于使用,而swig总是会给我带来麻烦

使用ctypes时,您需要满足对python的任何编译时依赖性,并且绑定将对任何具有ctypes的python起作用,而不仅仅是针对ctypes的python。

假设您要在一个名为foo.cpp的文件中讨论一个简单的C ++示例类:

#include <iostream>

class Foo{
    public:
        void bar(){
            std::cout << "Hello" << std::endl;
        }
};

由于ctypes只能与C函数对话,因此您需要提供将其声明为extern“ C”的那些函数

extern "C" {
    Foo* Foo_new(){ return new Foo(); }
    void Foo_bar(Foo* foo){ foo->bar(); }
}

接下来,您必须将其编译为共享库

g++ -c -fPIC foo.cpp -o foo.o
g++ -shared -Wl,-soname,libfoo.so -o libfoo.so  foo.o

最后,您必须编写python包装器(例如,在fooWrapper.py中)

from ctypes import cdll
lib = cdll.LoadLibrary('./libfoo.so')

class Foo(object):
    def __init__(self):
        self.obj = lib.Foo_new()

    def bar(self):
        lib.Foo_bar(self.obj)

一旦有了,您可以像这样称呼它

f = Foo()
f.bar() #and you will see "Hello" on the screen

ctypes module is part of the standard library, and therefore is more stable and widely available than swig, which always tended to give me problems.

With ctypes, you need to satisfy any compile time dependency on python, and your binding will work on any python that has ctypes, not just the one it was compiled against.

Suppose you have a simple C++ example class you want to talk to in a file called foo.cpp:

#include <iostream>

class Foo{
    public:
        void bar(){
            std::cout << "Hello" << std::endl;
        }
};

Since ctypes can only talk to C functions, you need to provide those declaring them as extern “C”

extern "C" {
    Foo* Foo_new(){ return new Foo(); }
    void Foo_bar(Foo* foo){ foo->bar(); }
}

Next you have to compile this to a shared library

g++ -c -fPIC foo.cpp -o foo.o
g++ -shared -Wl,-soname,libfoo.so -o libfoo.so  foo.o

And finally you have to write your python wrapper (e.g. in fooWrapper.py)

from ctypes import cdll
lib = cdll.LoadLibrary('./libfoo.so')

class Foo(object):
    def __init__(self):
        self.obj = lib.Foo_new()

    def bar(self):
        lib.Foo_bar(self.obj)

Once you have that you can call it like

f = Foo()
f.bar() #and you will see "Hello" on the screen

回答 2

最快的方法是使用SWIG

来自SWIG 教程的示例

/* File : example.c */
int fact(int n) {
    if (n <= 1) return 1;
    else return n*fact(n-1);
}

接口文件:

/* example.i */
%module example
%{
/* Put header files here or function declarations like below */
extern int fact(int n);
%}

extern int fact(int n);

在Unix上构建Python模块:

swig -python example.i
gcc -fPIC -c example.c example_wrap.c -I/usr/local/include/python2.7
gcc -shared example.o example_wrap.o -o _example.so

用法:

>>> import example
>>> example.fact(5)
120

请注意,您必须具有python-dev。同样在某些系统中,python头文件会根据您的安装方式位于/usr/include/python2.7中。

从教程中:

SWIG是一个相当完整的C ++编译器,几乎支持所有语言功能。这包括预处理,指针,类,继承,甚至C ++模板。SWIG还可以用于以目标语言将结构和类打包为代理类,从而以非常自然的方式公开基础功能。

The quickest way to do this is using SWIG.

Example from SWIG tutorial:

/* File : example.c */
int fact(int n) {
    if (n <= 1) return 1;
    else return n*fact(n-1);
}

Interface file:

/* example.i */
%module example
%{
/* Put header files here or function declarations like below */
extern int fact(int n);
%}

extern int fact(int n);

Building a Python module on Unix:

swig -python example.i
gcc -fPIC -c example.c example_wrap.c -I/usr/local/include/python2.7
gcc -shared example.o example_wrap.o -o _example.so

Usage:

>>> import example
>>> example.fact(5)
120

Note that you have to have python-dev. Also in some systems python header files will be in /usr/include/python2.7 based on the way you have installed it.

From the tutorial:

SWIG is a fairly complete C++ compiler with support for nearly every language feature. This includes preprocessing, pointers, classes, inheritance, and even C++ templates. SWIG can also be used to package structures and classes into proxy classes in the target language — exposing the underlying functionality in a very natural manner.


回答 3

我从这一页的Python <-> C ++绑定开始了我的旅程,目的是链接高级数据类型(带有Python列表的多维STL向量):-)

尝试过基于ctypesboost.python的解决方案(并且不是软件工程师),当需要高级数据类型绑定时,我发现它们很复杂,而对于这种情况,我发现SWIG更加简单。

因此,该示例使用了SWIG,并且已经在Linux中进行了测试(但是SWIG可用,并且在Windows中也被广泛使用)。

目的是使C ++函数可用于Python,该函数采用2D STL向量形式的矩阵并返回每一行的平均值(作为1D STL向量)。

C ++中的代码(“ code.cpp”)如下:

#include <vector>
#include "code.h"

using namespace std;

vector<double> average (vector< vector<double> > i_matrix) {

  // Compute average of each row..
  vector <double> averages;
  for (int r = 0; r < i_matrix.size(); r++){
    double rsum = 0.0;
    double ncols= i_matrix[r].size();
    for (int c = 0; c< i_matrix[r].size(); c++){
      rsum += i_matrix[r][c];
    }
    averages.push_back(rsum/ncols);
  }
  return averages;
}

等效的标头(“ code.h”)为:

#ifndef _code
#define _code

#include <vector>

std::vector<double> average (std::vector< std::vector<double> > i_matrix);

#endif

我们首先编译C ++代码以创建目标文件:

g++ -c -fPIC code.cpp

然后,我们为C ++函数定义一个SWIG接口定义文件(“ code.i”)。

%module code
%{
#include "code.h"
%}
%include "std_vector.i"
namespace std {

  /* On a side note, the names VecDouble and VecVecdouble can be changed, but the order of first the inner vector matters! */
  %template(VecDouble) vector<double>;
  %template(VecVecdouble) vector< vector<double> >;
}

%include "code.h"

使用SWIG,我们从SWIG接口定义文件生成C ++接口源代码。

swig -c++ -python code.i

最后,我们编译生成的C ++接口源文件,并将所有内容链接在一起,以生成可由Python直接导入的共享库(“ _”很重要):

g++ -c -fPIC code_wrap.cxx  -I/usr/include/python2.7 -I/usr/lib/python2.7
g++ -shared -Wl,-soname,_code.so -o _code.so code.o code_wrap.o

现在,我们可以在Python脚本中使用该函数:

#!/usr/bin/env python

import code
a= [[3,5,7],[8,10,12]]
print a
b = code.average(a)
print "Assignment done"
print a
print b

I started my journey in the Python <-> C++ binding from this page, with the objective of linking high level data types (multidimensional STL vectors with Python lists) :-)

Having tried the solutions based on both ctypes and boost.python (and not being a software engineer) I have found them complex when high level datatypes binding is required, while I have found SWIG much more simple for such cases.

This example uses therefore SWIG, and it has been tested in Linux (but SWIG is available and is widely used in Windows too).

The objective is to make a C++ function available to Python that takes a matrix in form of a 2D STL vector and returns an average of each row (as a 1D STL vector).

The code in C++ (“code.cpp”) is as follow:

#include <vector>
#include "code.h"

using namespace std;

vector<double> average (vector< vector<double> > i_matrix) {

  // Compute average of each row..
  vector <double> averages;
  for (int r = 0; r < i_matrix.size(); r++){
    double rsum = 0.0;
    double ncols= i_matrix[r].size();
    for (int c = 0; c< i_matrix[r].size(); c++){
      rsum += i_matrix[r][c];
    }
    averages.push_back(rsum/ncols);
  }
  return averages;
}

The equivalent header (“code.h”) is:

#ifndef _code
#define _code

#include <vector>

std::vector<double> average (std::vector< std::vector<double> > i_matrix);

#endif

We first compile the C++ code to create an object file:

g++ -c -fPIC code.cpp

We then define a SWIG interface definition file (“code.i”) for our C++ functions.

%module code
%{
#include "code.h"
%}
%include "std_vector.i"
namespace std {

  /* On a side note, the names VecDouble and VecVecdouble can be changed, but the order of first the inner vector matters! */
  %template(VecDouble) vector<double>;
  %template(VecVecdouble) vector< vector<double> >;
}

%include "code.h"

Using SWIG, we generate a C++ interface source code from the SWIG interface definition file..

swig -c++ -python code.i

We finally compile the generated C++ interface source file and link everything together to generate a shared library that is directly importable by Python (the “_” matters):

g++ -c -fPIC code_wrap.cxx  -I/usr/include/python2.7 -I/usr/lib/python2.7
g++ -shared -Wl,-soname,_code.so -o _code.so code.o code_wrap.o

We can now use the function in Python scripts:

#!/usr/bin/env python

import code
a= [[3,5,7],[8,10,12]]
print a
b = code.average(a)
print "Assignment done"
print a
print b

回答 4

还有pybind11,就像Boost.Python的轻量级版本,并且与所有现代C ++编译器兼容:

https://pybind11.readthedocs.io/en/latest/

There is also pybind11, which is like a lightweight version of Boost.Python and compatible with all modern C++ compilers:

https://pybind11.readthedocs.io/en/latest/


回答 5

检查出pyrexCython。它们是类似于Python的语言,用于C / C ++和Python之间的接口。

Check out pyrex or Cython. They’re Python-like languages for interfacing between C/C++ and Python.


回答 6

对于现代C ++,请使用cppyy: http

它基于Cling / Clang / LLVM的C ++解释器。绑定是在运行时执行的,不需要其他中间语言。感谢Clang,它支持C ++ 17。

使用pip安装它:

    $ pip install cppyy

对于小型项目,只需加载相关的库和您感兴趣的标头。例如,从ctypes示例中获取代码就是该线程,但分为标头和代码部分:

    $ cat foo.h
    class Foo {
    public:
        void bar();
    };

    $ cat foo.cpp
    #include "foo.h"
    #include <iostream>

    void Foo::bar() { std::cout << "Hello" << std::endl; }

编译:

    $ g++ -c -fPIC foo.cpp -o foo.o
    $ g++ -shared -Wl,-soname,libfoo.so -o libfoo.so  foo.o

并使用它:

    $ python
    >>> import cppyy
    >>> cppyy.include("foo.h")
    >>> cppyy.load_library("foo")
    >>> from cppyy.gbl import Foo
    >>> f = Foo()
    >>> f.bar()
    Hello
    >>>

大型项目通过自动加载准备的反射信息和cmake片段来创建它们而受支持,因此安装包的用户可以简单地运行:

    $ python
    >>> import cppyy
    >>> f = cppyy.gbl.Foo()
    >>> f.bar()
    Hello
    >>>

多亏了LLVM,高级功能才得以实现,例如自动模板实例化。继续示例:

    >>> v = cppyy.gbl.std.vector[cppyy.gbl.Foo]()
    >>> v.push_back(f)
    >>> len(v)
    1
    >>> v[0].bar()
    Hello
    >>>

注意:我是cppyy的作者。

For modern C++, use cppyy: http://cppyy.readthedocs.io/en/latest/

It’s based on Cling, the C++ interpreter for Clang/LLVM. Bindings are at run-time and no additional intermediate language is necessary. Thanks to Clang, it supports C++17.

Install it using pip:

    $ pip install cppyy

For small projects, simply load the relevant library and the headers that you are interested in. E.g. take the code from the ctypes example is this thread, but split in header and code sections:

    $ cat foo.h
    class Foo {
    public:
        void bar();
    };

    $ cat foo.cpp
    #include "foo.h"
    #include <iostream>

    void Foo::bar() { std::cout << "Hello" << std::endl; }

Compile it:

    $ g++ -c -fPIC foo.cpp -o foo.o
    $ g++ -shared -Wl,-soname,libfoo.so -o libfoo.so  foo.o

and use it:

    $ python
    >>> import cppyy
    >>> cppyy.include("foo.h")
    >>> cppyy.load_library("foo")
    >>> from cppyy.gbl import Foo
    >>> f = Foo()
    >>> f.bar()
    Hello
    >>>

Large projects are supported with auto-loading of prepared reflection information and the cmake fragments to create them, so that users of installed packages can simply run:

    $ python
    >>> import cppyy
    >>> f = cppyy.gbl.Foo()
    >>> f.bar()
    Hello
    >>>

Thanks to LLVM, advanced features are possible, such as automatic template instantiation. To continue the example:

    >>> v = cppyy.gbl.std.vector[cppyy.gbl.Foo]()
    >>> v.push_back(f)
    >>> len(v)
    1
    >>> v[0].bar()
    Hello
    >>>

Note: I’m the author of cppyy.


回答 7

本文声称Python是科学家的全部需要,基本上说:首先用Python制作一切原型。然后,当您需要加快零件速度时,请使用SWIG并将其转换为C。

This paper, claiming Python to be all a scientist needs, basically says: First prototype everything in Python. Then when you need to speed a part up, use SWIG and translate this part to C.


回答 8

我从未使用过它,但是我听说过有关ctypes的好东西。如果您要在C ++中使用它,请确保通过来避开名称修饰extern "C"感谢弗洛里安·博斯(FlorianBösch)的评论。

I’ve never used it but I’ve heard good things about ctypes. If you’re trying to use it with C++, be sure to evade name mangling via extern "C". Thanks for the comment, Florian Bösch.


回答 9

我认为cffi for python是一个选择。

目的是从Python调用C代码。您应该能够在不学习第三语言的情况下进行操作:每种选择都要求您学习他们自己的语言(Cython,SWIG)或API(ctypes)。因此,我们尝试假设您了解Python和C,并尽量减少了您需要学习的API附加位。

http://cffi.readthedocs.org/en/release-0.7/

I think cffi for python can be an option.

The goal is to call C code from Python. You should be able to do so without learning a 3rd language: every alternative requires you to learn their own language (Cython, SWIG) or API (ctypes). So we tried to assume that you know Python and C and minimize the extra bits of API that you need to learn.

http://cffi.readthedocs.org/en/release-0.7/


回答 10

问题是,如果我理解正确的话,如何从Python调用C函数。然后最好的选择是Ctypes(BTW可在所有Python变体中移植)。

>>> from ctypes import *
>>> libc = cdll.msvcrt
>>> print libc.time(None)
1438069008
>>> printf = libc.printf
>>> printf("Hello, %s\n", "World!")
Hello, World!
14
>>> printf("%d bottles of beer\n", 42)
42 bottles of beer
19

有关详细指南,您可能需要参考我的博客文章

The question is how to call a C function from Python, if I understood correctly. Then the best bet are Ctypes (BTW portable across all variants of Python).

>>> from ctypes import *
>>> libc = cdll.msvcrt
>>> print libc.time(None)
1438069008
>>> printf = libc.printf
>>> printf("Hello, %s\n", "World!")
Hello, World!
14
>>> printf("%d bottles of beer\n", 42)
42 bottles of beer
19

For a detailed guide you may want to refer to my blog article.


回答 11

一份正式的Python文档包含有关使用C / C ++扩展Python的详细信息。即使不使用SWIG,它也非常简单,并且在Windows上也能很好地工作。

One of the official Python documents contains details on extending Python using C/C++. Even without the use of SWIG, it’s quite straightforward and works perfectly well on Windows.


回答 12

除非您期望编写Java包装程序,否则Cython绝对是必经之路,在这种情况下,SWIG可能更可取。

我建议使用 runcython命令行实用程序,它使使用Cython的过程非常容易。如果您需要将结构化数据传递给C ++,请查看Google的protobuf库,它非常方便。

这是我使用这两种工具的最小示例:

https://github.com/nicodjimenez/python2cpp

希望它可以是一个有用的起点。

Cython is definitely the way to go, unless you anticipate writing Java wrappers, in which case SWIG may be preferable.

I recommend using the runcython command line utility, it makes the process of using Cython extremely easy. If you need to pass structured data to C++, take a look at Google’s protobuf library, it’s very convenient.

Here is a minimal examples I made that uses both tools:

https://github.com/nicodjimenez/python2cpp

Hope it can be a useful starting point.


回答 13

首先,您应该确定自己的特定目的。上面提到有关扩展和嵌入Python解释器的官方Python文档,我可以添加一个很好的二进制扩展概述。用例可分为3类:

  • 加速器模块:运行速度比CPython中运行的等效纯Python代码更快。
  • 包装模块:将现有的C接口公开给Python代码。
  • 低级系统访问:访问CPython运行时,操作系统或底层硬件的低级功能。

为了给其他感兴趣的人提供更广阔的视野,并且由于您的最初问题有点含糊(“对C或C ++库”),我认为此信息可能对您很有趣。在上面的链接上,您可以了解使用二进制扩展名及其替代方法的缺点。

除了建议的其他答案外,如果您需要加速器模块,还可以尝试Numba。它的工作原理是“通过在导入时间,运行时或静态(使用附带的pycc工具)使用LLVM编译器基础结构生成优化的机器代码”。

First you should decide what is your particular purpose. The official Python documentation on extending and embedding the Python interpreter was mentioned above, I can add a good overview of binary extensions. The use cases can be divided into 3 categories:

  • accelerator modules: to run faster than the equivalent pure Python code runs in CPython.
  • wrapper modules: to expose existing C interfaces to Python code.
  • low level system access: to access lower level features of the CPython runtime, the operating system, or the underlying hardware.

In order to give some broader perspective for other interested and since your initial question is a bit vague (“to a C or C++ library”) I think this information might be interesting to you. On the link above you can read on disadvantages of using binary extensions and its alternatives.

Apart from the other answers suggested, if you want an accelerator module, you can try Numba. It works “by generating optimized machine code using the LLVM compiler infrastructure at import time, runtime, or statically (using the included pycc tool)”.


回答 14

我喜欢cppyy,它很容易用C ++代码扩展Python,并在需要时大大提高了性能。

它功能强大且坦率地说非常易于使用,

这是一个示例,说明如何创建numpy数组并将其传递给C ++中的类成员函数。

import cppyy
cppyy.add_include_path("include")
cppyy.include('mylib/Buffer.h')


s = cppyy.gbl.buffer.Buffer()
numpy_array = np.empty(32000, np.float64)
s.get_numpy_array(numpy_array.data, numpy_array.size)

在C ++中:

struct Buffer {
  void get_numpy_array(int beam, double *ad, int size) {
    // fill the array
  }
}

您还可以非常轻松地(使用CMake)创建Python模块,这样您就可以避免一直重新编译C ++代码。

I love cppyy, it makes it very easy to extend Python with C++ code, dramatically increasing performance when needed.

It is powerful and frankly very simple to use,

here it is an example of how you can create a numpy array and pass it to a class member function in C++.

import cppyy
cppyy.add_include_path("include")
cppyy.include('mylib/Buffer.h')


s = cppyy.gbl.buffer.Buffer()
numpy_array = np.empty(32000, np.float64)
s.get_numpy_array(numpy_array.data, numpy_array.size)

in C++ :

struct Buffer {
  void get_numpy_array(int beam, double *ad, int size) {
    // fill the array
  }
}

You can also create a Python module very easily (with CMake), this way you will avoid recompile the C++ code all the times.


回答 15

pybind11最小可运行示例

pybind11之前在https://stackoverflow.com/a/38542539/895245中提到过,但是我想在这里给出一个具体的用法示例以及有关实现的更多讨论。

总而言之,我强烈推荐pybind11,因为它确实很容易使用:您只需包含一个标头,然后pybind11使用模板魔术检查要公开给Python的C ++类并透明地进行。

这种模板魔术的缺点是,它会立即减慢编译速度,从而会给使用pybind11的任何文件增加几秒钟的时间,例如,请参见对此问题进行的调查PyTorch同意

这是一个最小的可运行示例,使您了解pybind11的出色程度:

class_test.cpp

#include <string>

#include <pybind11/pybind11.h>

struct ClassTest {
    ClassTest(const std::string &name) : name(name) { }
    void setName(const std::string &name_) { name = name_; }
    const std::string &getName() const { return name; }
    std::string name;
};

namespace py = pybind11;

PYBIND11_PLUGIN(class_test) {
    py::module m("my_module", "pybind11 example plugin");
    py::class_<ClassTest>(m, "ClassTest")
        .def(py::init<const std::string &>())
        .def("setName", &ClassTest::setName)
        .def("getName", &ClassTest::getName)
        .def_readwrite("name", &ClassTest::name);
    return m.ptr();
}

class_test_main.py

#!/usr/bin/env python3

import class_test

my_class_test = class_test.ClassTest("abc");
print(my_class_test.getName())
my_class_test.setName("012")
print(my_class_test.getName())
assert(my_class_test.getName() == my_class_test.name)

编译并运行:

#!/usr/bin/env bash
set -eux
g++ `python3-config --cflags` -shared -std=c++11 -fPIC class_test.cpp \
  -o class_test`python3-config --extension-suffix` `python3-config --libs`
./class_test_main.py

此示例说明pybind11如何使您轻松地将ClassTestC ++类公开给Python!编译会产生一个名为的文件class_test.cpython-36m-x86_64-linux-gnu.so,该文件会class_test_main.py自动作为文件的定义点class_test本地定义的模块。

也许只有在您尝试使用本机Python API手动执行相同操作时,才会意识到它的强大程度,例如,请参见以下示例,该示例包含大约10倍的代码:https : //github.com /cirosantilli/python-cheat/blob/4f676f62e87810582ad53b2fb426b74eae52aad5/py_from_c/pure.c在该示例上,您可以看到C代码如何痛苦地,明确地定义Python类及其所包含的所有信息(成员,方法,其他信息)。元数据…)。也可以看看:

pybind11声称与https://stackoverflow.com/a/145436/895245Boost.Python所提到的相似,但因其摆脱了Boost项目中的膨胀而变得更加微不足道:

pybind11是一个轻量级的仅标头的库,它公开了Python中的C ++类型,反之亦然,主要是创建现有C ++代码的Python绑定。它的目标和语法类似于David Abrahams出色的Boost.Python库:通过使用编译时自省来推断类型信息,从而最大程度地减少了传统扩展模块中的样板代码。

Boost.Python的主要问题以及创建类似项目的原因是Boost。Boost是庞大而复杂的实用程序套件,可与几乎所有现有的C ++编译器一起使用。这种兼容性有其代价:奥秘的模板技巧和变通办法对于支持最早的和最新的编译器标本是必需的。现在,与C ++ 11兼容的编译器已广泛可用,这种繁琐的机制已变得过大且不必要。

可以将此库视为Boost.Python的小型独立版本,其中剥离了与绑定生成无关的所有内容。没有注释,核心头文件仅需要约4K行代码,并依赖于Python(2.7或3.x,或PyPy2.7> = 5.7)和C ++标准库。由于某些新的C ++ 11语言功能(特别是:元组,lambda函数和可变参数模板),因此可以实现这种紧凑的实现。自创建以来,该库在许多方面都超越了Boost.Python,从而在许多常见情况下大大简化了绑定代码。

pybind11还是当前Microsoft Python C绑定文档中突出显示的唯一非本地替代方法,网址为:https ://docs.microsoft.com/zh-cn/visualstudio/python/working-with-c-cpp-python-in- visual-studio?view = vs-2019存档)。

已在Ubuntu 18.04,pybind11 2.0.1,Python 3.6.8,GCC 7.4.0上进行了测试。

pybind11 minimal runnable example

pybind11 was previously mentioned at https://stackoverflow.com/a/38542539/895245 but I would like to give here a concrete usage example and some further discussion about implementation.

All and all, I highly recommend pybind11 because it is really easy to use: you just include a header and then pybind11 uses template magic to inspect the C++ class you want to expose to Python and does that transparently.

The downside of this template magic is that it slows down compilation immediately adding a few seconds to any file that uses pybind11, see for example the investigation done on this issue. PyTorch agrees.

Here is a minimal runnable example to give you a feel of how awesome pybind11 is:

class_test.cpp

#include <string>

#include <pybind11/pybind11.h>

struct ClassTest {
    ClassTest(const std::string &name) : name(name) { }
    void setName(const std::string &name_) { name = name_; }
    const std::string &getName() const { return name; }
    std::string name;
};

namespace py = pybind11;

PYBIND11_PLUGIN(class_test) {
    py::module m("my_module", "pybind11 example plugin");
    py::class_<ClassTest>(m, "ClassTest")
        .def(py::init<const std::string &>())
        .def("setName", &ClassTest::setName)
        .def("getName", &ClassTest::getName)
        .def_readwrite("name", &ClassTest::name);
    return m.ptr();
}

class_test_main.py

#!/usr/bin/env python3

import class_test

my_class_test = class_test.ClassTest("abc");
print(my_class_test.getName())
my_class_test.setName("012")
print(my_class_test.getName())
assert(my_class_test.getName() == my_class_test.name)

Compile and run:

#!/usr/bin/env bash
set -eux
g++ `python3-config --cflags` -shared -std=c++11 -fPIC class_test.cpp \
  -o class_test`python3-config --extension-suffix` `python3-config --libs`
./class_test_main.py

This example shows how pybind11 allows you to effortlessly expose the ClassTest C++ class to Python! Compilation produces a file named class_test.cpython-36m-x86_64-linux-gnu.so which class_test_main.py automatically picks up as the definition point for the class_test natively defined module.

Perhaps the realization of how awesome this is only sinks in if you try to do the same thing by hand with the native Python API, see for example this example of doing that, which has about 10x more code: https://github.com/cirosantilli/python-cheat/blob/4f676f62e87810582ad53b2fb426b74eae52aad5/py_from_c/pure.c On that example you can see how the C code has to painfully and explicitly define the Python class bit by bit with all the information it contains (members, methods, further metadata…). See also:

pybind11 claims to be similar to Boost.Python which was mentioned at https://stackoverflow.com/a/145436/895245 but more minimal because it is freed from the bloat of being inside the Boost project:

pybind11 is a lightweight header-only library that exposes C++ types in Python and vice versa, mainly to create Python bindings of existing C++ code. Its goals and syntax are similar to the excellent Boost.Python library by David Abrahams: to minimize boilerplate code in traditional extension modules by inferring type information using compile-time introspection.

The main issue with Boost.Python—and the reason for creating such a similar project—is Boost. Boost is an enormously large and complex suite of utility libraries that works with almost every C++ compiler in existence. This compatibility has its cost: arcane template tricks and workarounds are necessary to support the oldest and buggiest of compiler specimens. Now that C++11-compatible compilers are widely available, this heavy machinery has become an excessively large and unnecessary dependency.

Think of this library as a tiny self-contained version of Boost.Python with everything stripped away that isn’t relevant for binding generation. Without comments, the core header files only require ~4K lines of code and depend on Python (2.7 or 3.x, or PyPy2.7 >= 5.7) and the C++ standard library. This compact implementation was possible thanks to some of the new C++11 language features (specifically: tuples, lambda functions and variadic templates). Since its creation, this library has grown beyond Boost.Python in many ways, leading to dramatically simpler binding code in many common situations.

pybind11 is also the only non-native alternative hightlighted by the current Microsoft Python C binding documentation at: https://docs.microsoft.com/en-us/visualstudio/python/working-with-c-cpp-python-in-visual-studio?view=vs-2019 (archive).

Tested on Ubuntu 18.04, pybind11 2.0.1, Python 3.6.8, GCC 7.4.0.


python异常消息捕获

问题:python异常消息捕获

import ftplib
import urllib2
import os
import logging
logger = logging.getLogger('ftpuploader')
hdlr = logging.FileHandler('ftplog.log')
formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
hdlr.setFormatter(formatter)
logger.addHandler(hdlr)
logger.setLevel(logging.INFO)
FTPADDR = "some ftp address"

def upload_to_ftp(con, filepath):
    try:
        f = open(filepath,'rb')                # file to send
        con.storbinary('STOR '+ filepath, f)         # Send the file
        f.close()                                # Close file and FTP
        logger.info('File successfully uploaded to '+ FTPADDR)
    except, e:
        logger.error('Failed to upload to ftp: '+ str(e))

这似乎不起作用,出现语法错误,将所有类型的异常记录到文件中的正确方法是什么

import ftplib
import urllib2
import os
import logging
logger = logging.getLogger('ftpuploader')
hdlr = logging.FileHandler('ftplog.log')
formatter = logging.Formatter('%(asctime)s %(levelname)s %(message)s')
hdlr.setFormatter(formatter)
logger.addHandler(hdlr)
logger.setLevel(logging.INFO)
FTPADDR = "some ftp address"

def upload_to_ftp(con, filepath):
    try:
        f = open(filepath,'rb')                # file to send
        con.storbinary('STOR '+ filepath, f)         # Send the file
        f.close()                                # Close file and FTP
        logger.info('File successfully uploaded to '+ FTPADDR)
    except, e:
        logger.error('Failed to upload to ftp: '+ str(e))

This doesn’t seem to work, I get syntax error, what is the proper way of doing this for logging all kind of exceptions to a file


回答 0

您必须定义要捕获的异常类型。所以写except Exception, e:的,而不是except, e:一个普通的异常(即无论如何都会被记录)。

其他可能性是通过这种方式编写您的整个try / except代码:

try:
    with open(filepath,'rb') as f:
        con.storbinary('STOR '+ filepath, f)
    logger.info('File successfully uploaded to '+ FTPADDR)
except Exception, e: # work on python 2.x
    logger.error('Failed to upload to ftp: '+ str(e))

在Python 3.x和现代版本的Python 2.x中,使用except Exception as e代替except Exception, e

try:
    with open(filepath,'rb') as f:
        con.storbinary('STOR '+ filepath, f)
    logger.info('File successfully uploaded to '+ FTPADDR)
except Exception as e: # work on python 3.x
    logger.error('Failed to upload to ftp: '+ str(e))

You have to define which type of exception you want to catch. So write except Exception, e: instead of except, e: for a general exception (that will be logged anyway).

Other possibility is to write your whole try/except code this way:

try:
    with open(filepath,'rb') as f:
        con.storbinary('STOR '+ filepath, f)
    logger.info('File successfully uploaded to '+ FTPADDR)
except Exception, e: # work on python 2.x
    logger.error('Failed to upload to ftp: '+ str(e))

in Python 3.x and modern versions of Python 2.x use except Exception as e instead of except Exception, e:

try:
    with open(filepath,'rb') as f:
        con.storbinary('STOR '+ filepath, f)
    logger.info('File successfully uploaded to '+ FTPADDR)
except Exception as e: # work on python 3.x
    logger.error('Failed to upload to ftp: '+ str(e))

回答 1

python 3不再支持该语法。请改用以下内容。

try:
    do_something()
except BaseException as e:
    logger.error('Failed to do something: ' + str(e))

The syntax is no longer supported in python 3. Use the following instead.

try:
    do_something()
except BaseException as e:
    logger.error('Failed to do something: ' + str(e))

回答 2

将其更新为更简单的记录器(适用于python 2和3)。您不需要回溯模块。

import logging

logger = logging.Logger('catch_all')

def catchEverythingInLog():
    try:
        ... do something ...
    except Exception as e:
        logger.error(e, exc_info=True)
        ... exception handling ...

现在这是旧方法(尽管仍然有效):

import sys, traceback

def catchEverything():
    try:
        ... some operation(s) ...
    except:
        exc_type, exc_value, exc_traceback = sys.exc_info()
        ... exception handling ...

exc_value是错误消息。

Updating this to something simpler for logger (works for both python 2 and 3). You do not need traceback module.

import logging

logger = logging.Logger('catch_all')

def catchEverythingInLog():
    try:
        ... do something ...
    except Exception as e:
        logger.error(e, exc_info=True)
        ... exception handling ...

This is now the old way (though still works):

import sys, traceback

def catchEverything():
    try:
        ... some operation(s) ...
    except:
        exc_type, exc_value, exc_traceback = sys.exc_info()
        ... exception handling ...

exc_value is the error message.


回答 3

在某些情况下,您可以使用e.messagee.messages ..但是,并非在所有情况下都有效。无论如何,使用str(e)更安全

try:
  ...
except Exception as e:
  print(e.message)

There are some cases where you can use the e.message or e.messages.. But it does not work in all cases. Anyway the more safe is to use the str(e)

try:
  ...
except Exception as e:
  print(e.message)

回答 4

如果您需要错误类,错误消息和堆栈跟踪(或其中一些),请使用sys.exec_info()

带有某些格式的最少工作代码:

import sys
import traceback

try:
    ans = 1/0
except BaseException as ex:
    # Get current system exception
    ex_type, ex_value, ex_traceback = sys.exc_info()

    # Extract unformatter stack traces as tuples
    trace_back = traceback.extract_tb(ex_traceback)

    # Format stacktrace
    stack_trace = list()

    for trace in trace_back:
        stack_trace.append("File : %s , Line : %d, Func.Name : %s, Message : %s" % (trace[0], trace[1], trace[2], trace[3]))

    print("Exception type : %s " % ex_type.__name__)
    print("Exception message : %s" %ex_value)
    print("Stack trace : %s" %stack_trace)

给出以下输出:

Exception type : ZeroDivisionError
Exception message : division by zero
Stack trace : ['File : .\\test.py , Line : 5, Func.Name : <module>, Message : ans = 1/0']

函数sys.exc_info()为您提供有关最新异常的详细信息。返回的元组(type, value, traceback)

traceback是回溯对象的实例。您可以使用提供的方法来格式化跟踪。在追溯文档中可以找到更多内容

If you want the error class, error message and stack trace (or some of those), use sys.exec_info().

Minimal working code with some formatting:

import sys
import traceback

try:
    ans = 1/0
except BaseException as ex:
    # Get current system exception
    ex_type, ex_value, ex_traceback = sys.exc_info()

    # Extract unformatter stack traces as tuples
    trace_back = traceback.extract_tb(ex_traceback)

    # Format stacktrace
    stack_trace = list()

    for trace in trace_back:
        stack_trace.append("File : %s , Line : %d, Func.Name : %s, Message : %s" % (trace[0], trace[1], trace[2], trace[3]))

    print("Exception type : %s " % ex_type.__name__)
    print("Exception message : %s" %ex_value)
    print("Stack trace : %s" %stack_trace)

Which gives the following output:

Exception type : ZeroDivisionError
Exception message : division by zero
Stack trace : ['File : .\\test.py , Line : 5, Func.Name : <module>, Message : ans = 1/0']

The function sys.exc_info() gives you details about the most recent exception. It returns a tuple of (type, value, traceback).

traceback is an instance of traceback object. You can format the trace with the methods provided. More can be found in the traceback documentation .


回答 5

您可以使用logger.exception("msg")traceback记录异常:

try:
    #your code
except Exception as e:
    logger.exception('Failed: ' + str(e))

You can use logger.exception("msg") for logging exception with traceback:

try:
    #your code
except Exception as e:
    logger.exception('Failed: ' + str(e))

回答 6

在python 3.6之后,您可以使用格式化的字符串文字。干净利落!(https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep498

try
 ...
except Exception as e:
    logger.error(f"Failed to upload to ftp: {e}")

After python 3.6, you can use formatted string literal. It’s neat! (https://docs.python.org/3/whatsnew/3.6.html#whatsnew36-pep498)

try
 ...
except Exception as e:
    logger.error(f"Failed to upload to ftp: {e}")

回答 7

您可以尝试明确指定BaseException类型。但是,这只会捕获BaseException的派生类。尽管这包括所有实现提供的异常,但也可能会引发任意旧式类。

try:
  do_something()
except BaseException, e:
  logger.error('Failed to do something: ' + str(e))

You can try specifying the BaseException type explicitly. However, this will only catch derivatives of BaseException. While this includes all implementation-provided exceptions, it is also possibly to raise arbitrary old-style classes.

try:
  do_something()
except BaseException, e:
  logger.error('Failed to do something: ' + str(e))

回答 8

使用str(ex)打印执行

try:
   #your code
except ex:
   print(str(ex))

Use str(ex) to print execption

try:
   #your code
except ex:
   print(str(ex))

回答 9

对于未来的奋斗者,在python 3.8.2(可能还有之前的几个版本)中,语法为

except Attribute as e:
    print(e)

for the future strugglers, in python 3.8.2(and maybe a few versions before that), the syntax is

except Attribute as e:
    print(e)

回答 10

使用str(e)repr(e)表示异常,您将无法获得实际的堆栈跟踪,因此查找异常在哪里没有帮助。

阅读其他答案和日志记录包doc之后,以下两种方法可以很好地打印实际的堆栈跟踪信息,以便于调试:

logger.debug()与参数一起使用exc_info

try:
    # my code
exception SomeError as e:
    logger.debug(e, exc_info=True)

采用 logger.exception()

或者我们可以直接使用它logger.exception()来打印异常。

try:
    # my code
exception SomeError as e:
    logger.exception(e)

Using str(e) or repr(e) to represent the exception, you won’t get the actual stack trace, so it is not helpful to find where the exception is.

After reading other answers and the logging package doc, the following two ways works great to print the actual stack trace for easier debugging:

use logger.debug() with parameter exc_info

try:
    # my code
exception SomeError as e:
    logger.debug(e, exc_info=True)

use logger.exception()

or we can directly use logger.exception() to print the exception.

try:
    # my code
exception SomeError as e:
    logger.exception(e)

设置预定的工作?

问题:设置预定的工作?

我一直在使用Django开发Web应用程序,并且很好奇是否有一种方法可以安排作业定期运行。

基本上,我只想遍历数据库并自动定期进行一些计算/更新,但是我似乎找不到任何有关此操作的文档。

有人知道如何设置吗?

需要说明的是:我知道我可以为此设置cron工作,但我很好奇Django中是否有某些功能可以提供此功能。我希望人们能够自己部署此应用程序,而无需进行大量配置(最好为零)。

我已经考虑过通过简单地检查自从上次将请求发送到站点以来是否应该运行作业来“追溯地”触发这些操作,但是我希望有一些清洁的方法。

I’ve been working on a web app using Django, and I’m curious if there is a way to schedule a job to run periodically.

Basically I just want to run through the database and make some calculations/updates on an automatic, regular basis, but I can’t seem to find any documentation on doing this.

Does anyone know how to set this up?

To clarify: I know I can set up a cron job to do this, but I’m curious if there is some feature in Django that provides this functionality. I’d like people to be able to deploy this app themselves without having to do much config (preferably zero).

I’ve considered triggering these actions “retroactively” by simply checking if a job should have been run since the last time a request was sent to the site, but I’m hoping for something a bit cleaner.


回答 0

我采用的一种解决方案是这样做:

1)创建一个自定义管理命令,例如

python manage.py my_cool_command

2)使用cron(在Linux上)或at在要求的时间(在Windows上)运行我的命令。

这是一个简单的解决方案,不需要安装沉重的AMQP堆栈。但是,使用其他答案中提到的诸如Celery之类的东西有很好的优势。特别是,使用Celery很好,不必将应用程序逻辑散布到crontab文件中。但是,cron解决方案非常适合中小型应用程序,并且您不需要太多外部依赖项。

编辑:

在更高版本的Windows中,at不建议在Windows 8,Server 2012及更高版本中使用该命令。您可以使用schtasks.exe相同的用途。

****更新****这是django doc 的新链接,用于编写自定义管理命令

One solution that I have employed is to do this:

1) Create a custom management command, e.g.

python manage.py my_cool_command

2) Use cron (on Linux) or at (on Windows) to run my command at the required times.

This is a simple solution that doesn’t require installing a heavy AMQP stack. However there are nice advantages to using something like Celery, mentioned in the other answers. In particular, with Celery it is nice to not have to spread your application logic out into crontab files. However the cron solution works quite nicely for a small to medium sized application and where you don’t want a lot of external dependencies.

EDIT:

In later version of windows the at command is deprecated for Windows 8, Server 2012 and above. You can use schtasks.exe for same use.

**** UPDATE **** This the new link of django doc for writing the custom management command


回答 1

Celery是基于AMQP(RabbitMQ)构建的分布式任务队列。它还以cron类的方式处理周期性任务(请参阅周期性任务)。根据您的应用程序,可能值得一试。

用django(docs)设置Celery非常容易,并且在停机的情况下,定期任务实际上会跳过错过的任务。如果任务失败,Celery还具有内置的重试机制。

Celery is a distributed task queue, built on AMQP (RabbitMQ). It also handles periodic tasks in a cron-like fashion (see periodic tasks). Depending on your app, it might be worth a gander.

Celery is pretty easy to set up with django (docs), and periodic tasks will actually skip missed tasks in case of a downtime. Celery also has built-in retry mechanisms, in case a task fails.


回答 2

我们已经开源了我认为是结构化应用程序的源代码。Brian的解决方案也暗指。我们希望收到任何/所有反馈!

https://github.com/tivix/django-cron

它带有一个管理命令:

./manage.py runcrons

做到了。每个cron都被建模为一个类(因此其所有OO),并且每个cron都以不同的频率运行,并且我们确保相同cron类型不会并行运行(以防万一cron自身花费的时间比其频率更长!)

We’ve open-sourced what I think is a structured app. that Brian’s solution above alludes too. We would love any / all feedback!

https://github.com/tivix/django-cron

It comes with one management command:

./manage.py runcrons

That does the job. Each cron is modeled as a class (so its all OO) and each cron runs at a different frequency and we make sure the same cron type doesn’t run in parallel (in case crons themselves take longer time to run than their frequency!)


回答 3

如果您使用的是标准POSIX操作系统,请使用cron

如果您使用的是Windows,请

编写Django管理命令以

  1. 找出他们使用的平台。

  2. 为您的用户执行适当的“ AT”命令,为您的用户更新crontab。

If you’re using a standard POSIX OS, you use cron.

If you’re using Windows, you use at.

Write a Django management command to

  1. Figure out what platform they’re on.

  2. Either execute the appropriate “AT” command for your users, or update the crontab for your users.


回答 4

有趣的新可插拔Django应用:django-chronograph

您只需要添加一个用作计时器的cron条目,即可在脚本中运行一个非常漂亮的Django管理界面。

Interesting new pluggable Django app: django-chronograph

You only have to add one cron entry which acts as a timer, and you have a very nice Django admin interface into the scripts to run.


回答 5

看一下Django Poor Man’s Cron,这是一个Django应用,它利用垃圾邮件搜索引擎,搜索引擎索引机器人等以大致固定的时间间隔运行计划的任务

请参阅:http : //code.google.com/p/django-poormanscron/

Look at Django Poor Man’s Cron which is a Django app that makes use of spambots, search engine indexing robots and alike to run scheduled tasks in approximately regular intervals

See: http://code.google.com/p/django-poormanscron/


回答 6

布赖恩·尼尔(Brian Neal)建议通过cron运行管理命令效果很好,但是如果您正在寻找更强大的功能(但还不如Celery(Celery)那么细腻),我可以考虑一下Kronos这样的库:

# app/cron.py

import kronos

@kronos.register('0 * * * *')
def task():
    pass

Brian Neal’s suggestion of running management commands via cron works well, but if you’re looking for something a little more robust (yet not as elaborate as Celery) I’d look into a library like Kronos:

# app/cron.py

import kronos

@kronos.register('0 * * * *')
def task():
    pass

回答 7

RabbitMQ和Celery比Cron具有更多的功能和任务处理功能。如果任务失败不是问题,并且您认为您将在下一个调用中处理损坏的任务,那么Cron就足够了。

Celery & AMQP将让您处理损坏的任务,并且它将由另一位工作人员再次执行(Celery工作人员侦听要处理的下一个任务),直到到达任务的max_retries属性为止。您甚至可以在发生故障时调用任务,例如记录故障,或在发生故障后向管理员发送电子邮件max_retries

而且,当您需要扩展应用程序时,您可以分发Celery和AMQP服务器。

RabbitMQ and Celery have more features and task handling capabilities than Cron. If task failure isn’t an issue, and you think you will handle broken tasks in the next call, then Cron is sufficient.

Celery & AMQP will let you handle the broken task, and it will get executed again by another worker (Celery workers listen for the next task to work on), until the task’s max_retries attribute is reached. You can even invoke tasks on failure, like logging the failure, or sending an email to the admin once the max_retries has been reached.

And you can distribute Celery and AMQP servers when you need to scale your application.


回答 8

我之前有完全相同的要求,最终使用APScheduler用户指南)解决了这一要求

它使调度作业变得非常简单,并使它独立于某些代码的基于请求的执行。以下是一个简单的示例。

from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()
job = None

def tick():
    print('One tick!')\

def start_job():
    global job
    job = scheduler.add_job(tick, 'interval', seconds=3600)
    try:
        scheduler.start()
    except:
        pass

希望这对某人有帮助!

I had exactly the same requirement a while ago, and ended up solving it using APScheduler (User Guide)

It makes scheduling jobs super simple, and keeps it independent for from request-based execution of some code. Following is a simple example.

from apscheduler.schedulers.background import BackgroundScheduler

scheduler = BackgroundScheduler()
job = None

def tick():
    print('One tick!')\

def start_job():
    global job
    job = scheduler.add_job(tick, 'interval', seconds=3600)
    try:
        scheduler.start()
    except:
        pass

Hope this helps somebody!


回答 9

我个人使用cron,但是django-extensionsJobs Scheduling部分看起来很有趣。

I personally use cron, but the Jobs Scheduling parts of django-extensions looks interesting.


回答 10

尽管不是Django的一部分,但Airflow是一个较新的项目(截至2016年),对任务管理很有用。

Airflow是一个工作流自动化和调度系统,可用于创作和管理数据管道。基于Web的UI为开发人员提供了一系列用于管理和查看这些管道的选项。

Airflow用Python编写,并使用Flask构建。

Airflow由Airbnb的Maxime Beauchemin创建,并于2015年春季开源。它于2016年冬季加入Apache Software Foundation的孵化计划。这是Git项目页面和一些其他背景信息

Although not part of Django, Airflow is a more recent project (as of 2016) that is useful for task management.

Airflow is a workflow automation and scheduling system that can be used to author and manage data pipelines. A web-based UI provides the developer with a range of options for managing and viewing these pipelines.

Airflow is written in Python and is built using Flask.

Airflow was created by Maxime Beauchemin at Airbnb and open sourced in the spring of 2015. It joined the Apache Software Foundation’s incubation program in the winter of 2016. Here is the Git project page and some addition background information.


回答 11

将以下内容放在cron.py文件的顶部:

#!/usr/bin/python
import os, sys
sys.path.append('/path/to/') # the parent directory of the project
sys.path.append('/path/to/project') # these lines only needed if not on path
os.environ['DJANGO_SETTINGS_MODULE'] = 'myproj.settings'

# imports and code below

Put the following at the top of your cron.py file:

#!/usr/bin/python
import os, sys
sys.path.append('/path/to/') # the parent directory of the project
sys.path.append('/path/to/project') # these lines only needed if not on path
os.environ['DJANGO_SETTINGS_MODULE'] = 'myproj.settings'

# imports and code below

回答 12

我只是想到了这个相当简单的解决方案:

  1. 定义一个视图函数do_work(req,param),就像在其他任何视图中一样,通过URL映射,返回HttpResponse等。
  2. 根据您的时间偏好设置(或在Windows中使用AT或计划任务)设置cron作业,该作业运行curl http:// localhost / your / mapped / url?param = value

您可以添加参数,但只需将参数添加到URL。

跟我说你们的想法。

[更新]我现在正在使用来自django-extensions的 runjob命令,而不是curl。

我的cron看起来像这样:

@hourly python /path/to/project/manage.py runjobs hourly

…等等,每天,每月等。您也可以将其设置为运行特定作业。

我发现它更易于管理和清洁。不需要将URL映射到视图。只需定义您的工作类别和crontab即可。

I just thought about this rather simple solution:

  1. Define a view function do_work(req, param) like you would with any other view, with URL mapping, return a HttpResponse and so on.
  2. Set up a cron job with your timing preferences (or using AT or Scheduled Tasks in Windows) which runs curl http://localhost/your/mapped/url?param=value.

You can add parameters but just adding parameters to the URL.

Tell me what you guys think.

[Update] I’m now using runjob command from django-extensions instead of curl.

My cron looks something like this:

@hourly python /path/to/project/manage.py runjobs hourly

… and so on for daily, monthly, etc’. You can also set it up to run a specific job.

I find it more managable and a cleaner. Doesn’t require mapping a URL to a view. Just define your job class and crontab and you’re set.


回答 13

在代码部分之后,我可以写任何东西,就像我的views.py :)

#######################################
import os,sys
sys.path.append('/home/administrator/development/store')
os.environ['DJANGO_SETTINGS_MODULE']='store.settings'
from django.core.management impor setup_environ
from store import settings
setup_environ(settings)
#######################################

来自 http://www.cotellese.net/2007/09/27/running-external-scripts-against-django-models/

after the part of code,I can write anything just like my views.py :)

#######################################
import os,sys
sys.path.append('/home/administrator/development/store')
os.environ['DJANGO_SETTINGS_MODULE']='store.settings'
from django.core.management impor setup_environ
from store import settings
setup_environ(settings)
#######################################

from http://www.cotellese.net/2007/09/27/running-external-scripts-against-django-models/


回答 14

您绝对应该检查django-q!它不需要任何额外的配置,并且很可能具有处理商业项目中任何生产问题所需的一切。

它是积极开发的,并且与django,django ORM,mongo,redis很好地集成在一起。这是我的配置:

# django-q
# -------------------------------------------------------------------------
# See: http://django-q.readthedocs.io/en/latest/configure.html
Q_CLUSTER = {
    # Match recommended settings from docs.
    'name': 'DjangoORM',
    'workers': 4,
    'queue_limit': 50,
    'bulk': 10,
    'orm': 'default',

# Custom Settings
# ---------------
# Limit the amount of successful tasks saved to Django.
'save_limit': 10000,

# See https://github.com/Koed00/django-q/issues/110.
'catch_up': False,

# Number of seconds a worker can spend on a task before it's terminated.
'timeout': 60 * 5,

# Number of seconds a broker will wait for a cluster to finish a task before presenting it again. This needs to be
# longer than `timeout`, otherwise the same task will be processed multiple times.
'retry': 60 * 6,

# Whether to force all async() calls to be run with sync=True (making them synchronous).
'sync': False,

# Redirect worker exceptions directly to Sentry error reporter.
'error_reporter': {
    'sentry': RAVEN_CONFIG,
},
}

You should definitely check out django-q! It requires no additional configuration and has quite possibly everything needed to handle any production issues on commercial projects.

It’s actively developed and integrates very well with django, django ORM, mongo, redis. Here is my configuration:

# django-q
# -------------------------------------------------------------------------
# See: http://django-q.readthedocs.io/en/latest/configure.html
Q_CLUSTER = {
    # Match recommended settings from docs.
    'name': 'DjangoORM',
    'workers': 4,
    'queue_limit': 50,
    'bulk': 10,
    'orm': 'default',

# Custom Settings
# ---------------
# Limit the amount of successful tasks saved to Django.
'save_limit': 10000,

# See https://github.com/Koed00/django-q/issues/110.
'catch_up': False,

# Number of seconds a worker can spend on a task before it's terminated.
'timeout': 60 * 5,

# Number of seconds a broker will wait for a cluster to finish a task before presenting it again. This needs to be
# longer than `timeout`, otherwise the same task will be processed multiple times.
'retry': 60 * 6,

# Whether to force all async() calls to be run with sync=True (making them synchronous).
'sync': False,

# Redirect worker exceptions directly to Sentry error reporter.
'error_reporter': {
    'sentry': RAVEN_CONFIG,
},
}

回答 15

用于计划程序作业的Django APScheduler。Advanced Python Scheduler(APScheduler)是一个Python库,可让您安排Python代码稍后执行,一次或定期执行。您可以根据需要随时添加或删除旧作业。

注意:我是这个图书馆的作者

安装APScheduler

pip install apscheduler

查看文件功能调用

文件名:scheduler_jobs.py

def FirstCronTest():
    print("")
    print("I am executed..!")

配置调度程序

制作execute.py文件并添加以下代码

from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()

您的书面函数在这里,调度程序函数写在scheduler_jobs中

import scheduler_jobs 

scheduler.add_job(scheduler_jobs.FirstCronTest, 'interval', seconds=10)
scheduler.start()

链接文件以执行

现在,在Url文件底部添加以下行

import execute

Django APScheduler for Scheduler Jobs. Advanced Python Scheduler (APScheduler) is a Python library that lets you schedule your Python code to be executed later, either just once or periodically. You can add new jobs or remove old ones on the fly as you please.

note: I’m the author of this library

Install APScheduler

pip install apscheduler

View file function to call

file name: scheduler_jobs.py

def FirstCronTest():
    print("")
    print("I am executed..!")

Configuring the scheduler

make execute.py file and add the below codes

from apscheduler.schedulers.background import BackgroundScheduler
scheduler = BackgroundScheduler()

Your written functions Here, the scheduler functions are written in scheduler_jobs

import scheduler_jobs 

scheduler.add_job(scheduler_jobs.FirstCronTest, 'interval', seconds=10)
scheduler.start()

Link the File for Execution

Now, add the below line in the bottom of Url file

import execute

回答 16

我今天对你的问题也有类似的看法。

我不想让它通过服务器cron来处理(最后,大多数库只是cron助手)。

因此,我创建了一个调度模块并将其附加到init

这不是最好的方法,但是它可以帮助我将所有代码都放在一个地方,并且其执行与主应用程序有关。

I had something similar with your problem today.

I didn’t wanted to have it handled by the server trhough cron (and most of the libs were just cron helpers in the end).

So i’ve created a scheduling module and attached it to the init .

It’s not the best approach, but it helps me to have all the code in a single place and with its execution related to the main app.


回答 17

是的,上面的方法很棒。我尝试了其中一些。最后,我发现了这样的方法:

    from threading import Timer

    def sync():

        do something...

        sync_timer = Timer(self.interval, sync, ())
        sync_timer.start()

就像递归一样。

好的,我希望这种方法可以满足您的要求。:)

Yes, the method above is so great. And I tried some of them. At last, I found a method like this:

    from threading import Timer

    def sync():

        do something...

        sync_timer = Timer(self.interval, sync, ())
        sync_timer.start()

Just like Recursive.

Ok, I hope this method can meet your requirement. :)


回答 18

与Celery相比,更现代的解决方案是Django Q:https//django-q.readthedocs.io/en/latest/index.html

它具有出色的文档,并且很容易理解。缺少Windows支持,因为Windows不支持流程分支。但是,如果您使用Windows for Linux子系统创建开发环境,则效果很好。

A more modern solution (compared to Celery) is Django Q: https://django-q.readthedocs.io/en/latest/index.html

It has great documentation and is easy to grok. Windows support is lacking, because Windows does not support process forking. But it works fine if you create your dev environment using the Windows for Linux Subsystem.


回答 19

我用Celery做我的定期任务。首先,您需要按以下步骤安装它:

pip install django-celery

不要忘记注册django-celery设置,然后您可以执行以下操作:

from celery import task
from celery.decorators import periodic_task
from celery.task.schedules import crontab
from celery.utils.log import get_task_logger
@periodic_task(run_every=crontab(minute="0", hour="23"))
def do_every_midnight():
 #your code

I use celery to create my periodical tasks. First you need to install it as follows:

pip install django-celery

Don’t forget to register django-celery in your settings and then you could do something like this:

from celery import task
from celery.decorators import periodic_task
from celery.task.schedules import crontab
from celery.utils.log import get_task_logger
@periodic_task(run_every=crontab(minute="0", hour="23"))
def do_every_midnight():
 #your code

回答 20

我不确定这对任何人都有用,因为我必须提供系统的其他用户来计划作业,而又不让他们访问实际的服务器(Windows)任务计划程序,因此我创建了这个可重用的应用程序。

请注意,用户可以访问服务器上的一个共享文件夹,可以在其中创建所需的command / task / .bat文件。然后可以使用此应用安排此任务。

应用名称为 Django_Windows_Scheduler

屏幕截图:

I am not sure will this be useful for anyone, since I had to provide other users of the system to schedule the jobs, without giving them access to the actual server(windows) Task Scheduler, I created this reusable app.

Please note users have access to one shared folder on server where they can create required command/task/.bat file. This task then can be scheduled using this app.

App name is Django_Windows_Scheduler

ScreenShot:


回答 21

如果您想要比Celery可靠的产品,请尝试构建在AWS SQS / SNS之上的TaskHawk

请参阅:http : //taskhawk.readthedocs.io

If you want something more reliable than Celery, try TaskHawk which is built on top of AWS SQS/SNS.

Refer: http://taskhawk.readthedocs.io


回答 22

对于简单的dockerized项目,我真的看不到任何现有的合适答案。

因此,我写了一个非常准系统的解决方案,不需要外部库或触发器,它们可以独立运行。无需外部os-cron,就可以在每种环境下工作。

它通过添加中间件来工作: middleware.py

import threading

def should_run(name, seconds_interval):
    from application.models import CronJob
    from django.utils.timezone import now

    try:
        c = CronJob.objects.get(name=name)
    except CronJob.DoesNotExist:
        CronJob(name=name, last_ran=now()).save()
        return True

    if (now() - c.last_ran).total_seconds() >= seconds_interval:
        c.last_ran = now()
        c.save()
        return True

    return False


class CronTask:
    def __init__(self, name, seconds_interval, function):
        self.name = name
        self.seconds_interval = seconds_interval
        self.function = function


def cron_worker(*_):
    if not should_run("main", 60):
        return

    # customize this part:
    from application.models import Event
    tasks = [
        CronTask("events", 60 * 30, Event.clean_stale_objects),
        # ...
    ]

    for task in tasks:
        if should_run(task.name, task.seconds_interval):
            task.function()


def cron_middleware(get_response):

    def middleware(request):
        response = get_response(request)
        threading.Thread(target=cron_worker).start()
        return response

    return middleware

models/cron.py

from django.db import models


class CronJob(models.Model):
    name = models.CharField(max_length=10, primary_key=True)
    last_ran = models.DateTimeField()

settings.py

MIDDLEWARE = [
    ...
    'application.middleware.cron_middleware',
    ...
]

For simple dockerized projects, I could not really see any existing answer fit.

So I wrote a very barebones solution without the need of external libraries or triggers, which runs on its own. No external os-cron needed, should work in every environment.

It works by adding a middleware: middleware.py

import threading

def should_run(name, seconds_interval):
    from application.models import CronJob
    from django.utils.timezone import now

    try:
        c = CronJob.objects.get(name=name)
    except CronJob.DoesNotExist:
        CronJob(name=name, last_ran=now()).save()
        return True

    if (now() - c.last_ran).total_seconds() >= seconds_interval:
        c.last_ran = now()
        c.save()
        return True

    return False


class CronTask:
    def __init__(self, name, seconds_interval, function):
        self.name = name
        self.seconds_interval = seconds_interval
        self.function = function


def cron_worker(*_):
    if not should_run("main", 60):
        return

    # customize this part:
    from application.models import Event
    tasks = [
        CronTask("events", 60 * 30, Event.clean_stale_objects),
        # ...
    ]

    for task in tasks:
        if should_run(task.name, task.seconds_interval):
            task.function()


def cron_middleware(get_response):

    def middleware(request):
        response = get_response(request)
        threading.Thread(target=cron_worker).start()
        return response

    return middleware

models/cron.py:

from django.db import models


class CronJob(models.Model):
    name = models.CharField(max_length=10, primary_key=True)
    last_ran = models.DateTimeField()

settings.py:

MIDDLEWARE = [
    ...
    'application.middleware.cron_middleware',
    ...
]

回答 23

简单的方法是编写一个自定义的shell命令(请参阅Django文档)并在Linux上使用cronjob执行它。但是,我强烈建议您使用像RabbitMQ这样的消息代理以及Celery。也许你可以看看这个教程

Simple way is to write a custom shell command see Django Documentation and execute it using a cronjob on linux. However i would highly recommend using a message broker like RabbitMQ coupled with celery. Maybe you can have a look at this Tutorial


使用Python迭代字符串中的每个字符

问题:使用Python迭代字符串中的每个字符

在C ++中,我可以std::string像这样迭代:

std::string str = "Hello World!";

for (int i = 0; i < str.length(); ++i)
{
    std::cout << str[i] << std::endl;
}

如何在Python中遍历字符串?

In C++, I can iterate over an std::string like this:

std::string str = "Hello World!";

for (int i = 0; i < str.length(); ++i)
{
    std::cout << str[i] << std::endl;
}

How do I iterate over a string in Python?


回答 0

正如约翰内斯指出的那样,

for c in "string":
    #do something with c

您可以使用struct迭代python中的几乎所有内容for loop

例如,open("file.txt")返回文件对象(并打开文件),对其进行迭代,然后对该文件中的行进行迭代

with open(filename) as f:
    for line in f:
        # do something with line

如果那看起来像魔术,那还算不错,但是它背后的想法真的很简单。

有一个简单的迭代器协议可以应用于任何对象,以使for循环在其上起作用。

只需实现定义一个next()方法的迭代器,然后__iter__在类上实现一个方法使其可迭代即可。(__iter__当然,应返回一个迭代器对象,即定义的对象next()

参阅官方文件

As Johannes pointed out,

for c in "string":
    #do something with c

You can iterate pretty much anything in python using the for loop construct,

for example, open("file.txt") returns a file object (and opens the file), iterating over it iterates over lines in that file

with open(filename) as f:
    for line in f:
        # do something with line

If that seems like magic, well it kinda is, but the idea behind it is really simple.

There’s a simple iterator protocol that can be applied to any kind of object to make the for loop work on it.

Simply implement an iterator that defines a next() method, and implement an __iter__ method on a class to make it iterable. (the __iter__ of course, should return an iterator object, that is, an object that defines next())

See official documentation


回答 1

如果在遍历字符串时需要访问索引,请使用enumerate()

>>> for i, c in enumerate('test'):
...     print i, c
... 
0 t
1 e
2 s
3 t

If you need access to the index as you iterate through the string, use enumerate():

>>> for i, c in enumerate('test'):
...     print i, c
... 
0 t
1 e
2 s
3 t

回答 2

更简单:

for c in "test":
    print c

Even easier:

for c in "test":
    print c

回答 3

只是为了做出更全面的回答,如果您确实想将方钉强行塞入圆孔中,则对字符串进行迭代的C方法可以在Python中应用。

i = 0
while i < len(str):
    print str[i]
    i += 1

但是话又说回来,当字符串具有固有的可迭代性时,为什么要这样做呢?

for i in str:
    print i

Just to make a more comprehensive answer, the C way of iterating over a string can apply in Python, if you really wanna force a square peg into a round hole.

i = 0
while i < len(str):
    print str[i]
    i += 1

But then again, why do that when strings are inherently iterable?

for i in str:
    print i

回答 4

好吧,您也可以像这样做一些有趣的事情,并通过使用for循环来完成您的工作

#suppose you have variable name
name = "Mr.Suryaa"
for index in range ( len ( name ) ):
    print ( name[index] ) #just like c and c++ 

答案是

先生 。苏里亚

但是,由于range()创建的是序列值的列表,因此您可以直接使用名称

for e in name:
    print(e)

这也可以产生相同的结果,并且看起来更好,并且可以与列表,元组和字典之类的任何序列一起使用。

我们曾经使用过“内置函数”(Python社区中的BIF)

1)range()-range()BIF用于创建索引示例

for i in range ( 5 ) :
can produce 0 , 1 , 2 , 3 , 4

2)len()-len()BIF用于找出给定字符串的长度

Well you can also do something interesting like this and do your job by using for loop

#suppose you have variable name
name = "Mr.Suryaa"
for index in range ( len ( name ) ):
    print ( name[index] ) #just like c and c++ 

Answer is

M r . S u r y a a

However since range() create a list of the values which is sequence thus you can directly use the name

for e in name:
    print(e)

This also produces the same result and also looks better and works with any sequence like list, tuple, and dictionary.

We have used tow Built in Functions ( BIFs in Python Community )

1) range() – range() BIF is used to create indexes Example

for i in range ( 5 ) :
can produce 0 , 1 , 2 , 3 , 4

2) len() – len() BIF is used to find out the length of given string


回答 5

如果您想使用一种更实用的方法遍历字符串(可能以某种方式进行转换),则可以将字符串拆分为字符,对每个函数应用一个函数,然后将所得的字符列表重新组合为字符串。

字符串本质上是一个字符列表,因此“ map”将遍历字符串-作为第二个参数-将函数-第一个参数应用于每个参数。

例如,这里我使用一种简单的lambda方法,因为我要做的只是对字符的微不足道的修改:在这里,增加每个字符的值:

>>> ''.join(map(lambda x: chr(ord(x)+1), "HAL"))
'IBM'

或更一般而言:

>>> ''.join(map(my_function, my_string))

其中my_function接受一个char值并返回一个char值。

If you would like to use a more functional approach to iterating over a string (perhaps to transform it somehow), you can split the string into characters, apply a function to each one, then join the resulting list of characters back into a string.

A string is inherently a list of characters, hence ‘map’ will iterate over the string – as second argument – applying the function – the first argument – to each one.

For example, here I use a simple lambda approach since all I want to do is a trivial modification to the character: here, to increment each character value:

>>> ''.join(map(lambda x: chr(ord(x)+1), "HAL"))
'IBM'

or more generally:

>>> ''.join(map(my_function, my_string))

where my_function takes a char value and returns a char value.


回答 6

这里使用几个答案rangexrange通常会更好,因为它返回生成器,而不是完全实例化的列表。在内存和/或长度可变的可迭代项可能成为问题的情况下,它xrange是优越的。

Several answers here use range. xrange is generally better as it returns a generator, rather than a fully-instantiated list. Where memory and or iterables of widely-varying lengths can be an issue, xrange is superior.


回答 7

如果您曾经在需要的情况下运行get the next char of the word using __next__(),请记住创建一个string_iterator并对其进行迭代,而不要迭代original string (it does not have the __next__() method)

在此示例中,当我找到一个char =时,[我一直在寻找下一个单词,但没有找到],所以我需要使用__next__

这里的字符串的for循环将无济于事

myString = "'string' 4 '['RP0', 'LC0']' '[3, 4]' '[3, '4']'"
processedInput = ""
word_iterator = myString.__iter__()
for idx, char in enumerate(word_iterator):
    if char == "'":
        continue

    processedInput+=char

    if char == '[':
        next_char=word_iterator.__next__()
        while(next_char != "]"):
          processedInput+=next_char
          next_char=word_iterator.__next__()
        else:
          processedInput+=next_char

If you ever run in a situation where you need to get the next char of the word using __next__(), remember to create a string_iterator and iterate over it and not the original string (it does not have the __next__() method)

In this example, when I find a char = [ I keep looking into the next word while I don’t find ], so I need to use __next__

here a for loop over the string wouldn’t help

myString = "'string' 4 '['RP0', 'LC0']' '[3, 4]' '[3, '4']'"
processedInput = ""
word_iterator = myString.__iter__()
for idx, char in enumerate(word_iterator):
    if char == "'":
        continue

    processedInput+=char

    if char == '[':
        next_char=word_iterator.__next__()
        while(next_char != "]"):
          processedInput+=next_char
          next_char=word_iterator.__next__()
        else:
          processedInput+=next_char

列表是否包含简短的包含功能?

问题:列表是否包含简短的包含功能?

我看到人们正在使用any另一个列表来查看列表中是否存在某项,但是有一种快速的方法吗?

if list.contains(myItem):
    # do something

I see people are using any to gather another list to see if an item exists in a list, but is there a quick way to just do?:

if list.contains(myItem):
    # do something

回答 0

您可以使用以下语法:

if myItem in list:
    # do something

同样,逆运算符:

if myItem not in list:
    # do something

它适用于列表,元组,集合和字典(检查键)。

请注意,这是列表和元组中的O(n)操作,而集合和字典中是O(1)操作。

You can use this syntax:

if myItem in list:
    # do something

Also, inverse operator:

if myItem not in list:
    # do something

It’s work fine for lists, tuples, sets and dicts (check keys).

Note that this is an O(n) operation in lists and tuples, but an O(1) operation in sets and dicts.


回答 1

除了别人说过的话,您可能还想知道什么in是调用list.__contains__方法,您可以在编写的任何类上定义该方法,并且可以非常方便地全面使用python。  

愚蠢的用途可能是:

>>> class ContainsEverything:
    def __init__(self):
        return None
    def __contains__(self, *elem, **k):
        return True


>>> a = ContainsEverything()
>>> 3 in a
True
>>> a in a
True
>>> False in a
True
>>> False not in a
False
>>>         

In addition to what other have said, you may also be interested to know that what in does is to call the list.__contains__ method, that you can define on any class you write and can get extremely handy to use python at his full extent.  

A dumb use may be:

>>> class ContainsEverything:
    def __init__(self):
        return None
    def __contains__(self, *elem, **k):
        return True


>>> a = ContainsEverything()
>>> 3 in a
True
>>> a in a
True
>>> False in a
True
>>> False not in a
False
>>>         

回答 2

我最近想出了这条衬垫,用于获取True列表中是否包含任何数量的项目,或者该列表中不包含任何项目或False根本不包含任何项目。使用next(...)会给它提供默认的返回值(False),这意味着它的运行速度应比运行整个列表理解的速度快得多。

list_does_contain = next((True for item in list_to_test if item == test_item), False)

I came up with this one liner recently for getting True if a list contains any number of occurrences of an item, or False if it contains no occurrences or nothing at all. Using next(...) gives this a default return value (False) and means it should run significantly faster than running the whole list comprehension.

list_does_contain = next((True for item in list_to_test if item == test_item), False)


回答 3

如果该项目不存在,则list方法index将返回,-1如果该项目存在,则将返回该项目在列表中的索引。或者,if您可以在语句中执行以下操作:

if myItem in list:
    #do things

您还可以使用以下if语句检查元素是否不在列表中:

if myItem not in list:
    #do things

The list method index will return -1 if the item is not present, and will return the index of the item in the list if it is present. Alternatively in an if statement you can do the following:

if myItem in list:
    #do things

You can also check if an element is not in a list with the following if statement:

if myItem not in list:
    #do things

“ is”运算符对整数的行为异常

问题:“ is”运算符对整数的行为异常

为什么以下内容在Python中表现异常?

>>> a = 256
>>> b = 256
>>> a is b
True           # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False          # What happened here? Why is this False?
>>> 257 is 257
True           # Yet the literal numbers compare properly

我正在使用Python 2.5.2。尝试使用某些不同版本的Python,Python 2.3.3似乎在99到100之间显示了上述行为。

基于以上所述,我可以假设Python是内部实现的,因此“小”整数的存储方式与大整数的存储方式不同,并且is运算符可以分辨出这种差异。为什么要泄漏抽象?当我事先不知道它们是否为数字时,比较两个任意对象以查看它们是否相同的更好方法是什么?

Why does the following behave unexpectedly in Python?

>>> a = 256
>>> b = 256
>>> a is b
True           # This is an expected result
>>> a = 257
>>> b = 257
>>> a is b
False          # What happened here? Why is this False?
>>> 257 is 257
True           # Yet the literal numbers compare properly

I am using Python 2.5.2. Trying some different versions of Python, it appears that Python 2.3.3 shows the above behaviour between 99 and 100.

Based on the above, I can hypothesize that Python is internally implemented such that “small” integers are stored in a different way than larger integers and the is operator can tell the difference. Why the leaky abstraction? What is a better way of comparing two arbitrary objects to see whether they are the same when I don’t know in advance whether they are numbers or not?


回答 0

看看这个:

>>> a = 256
>>> b = 256
>>> id(a)
9987148
>>> id(b)
9987148
>>> a = 257
>>> b = 257
>>> id(a)
11662816
>>> id(b)
11662828

这是我在Python 2文档“普通整数对象”中发现的内容(对于Python 3也是一样):

当前的实现为-5到256之间的所有整数保留一个整数对象数组,当您在该范围内创建int时,实际上实际上是返回对现有对象的引用。因此应该可以更改1的值。我怀疑在这种情况下Python的行为是不确定的。:-)

Take a look at this:

>>> a = 256
>>> b = 256
>>> id(a)
9987148
>>> id(b)
9987148
>>> a = 257
>>> b = 257
>>> id(a)
11662816
>>> id(b)
11662828

Here’s what I found in the Python 2 documentation, “Plain Integer Objects” (It’s the same for Python 3):

The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)


回答 1

Python的“ is”运算符在使用整数时表现异常吗?

总结-让我强调一下:不要is用于比较整数。

这不是您应该有任何期望的行为。

相反,分别使用==!=比较相等和不平等。例如:

>>> a = 1000
>>> a == 1000       # Test integers like this,
True
>>> a != 5000       # or this!
True
>>> a is 1000       # Don't do this! - Don't use `is` to test integers!!
False

说明

要知道这一点,您需要了解以下内容。

首先,该怎么is办?它是一个比较运算符。从文档中

运算符isis not测试对象标识:x is y当且仅当x和y是同一对象时才为true。x is not y产生反真值。

因此,以下内容是等效的。

>>> a is b
>>> id(a) == id(b)

文档中

id 返回对象的“身份”。这是一个整数(或长整数),在该对象的生存期内,此整数保证是唯一且恒定的。具有不重叠生存期的两个对象可能具有相同的id()值。

请注意,CPython(Python的参考实现)中对象的ID是内存中的位置这一事实是实现细节。Python的其他实现(例如Jython或IronPython)可以轻松地使用的不同实现id

那么用例是is什么呢? PEP8描述

与单例之类的比较None应始终使用isis not,而不应使用相等运算符。

问题

您询问并陈述以下问题(带有代码):

为什么以下内容在Python中表现异常?

>>> a = 256
>>> b = 256
>>> a is b
True           # This is an expected result

不是预期的结果。为什么会这样?这仅意味着256两者a和引用的整数值是整数b的相同实例。整数在Python中是不可变的,因此它们不能更改。这对任何代码都没有影响。不应期望。这仅仅是一个实现细节。

但是也许我们应该为每次声明一个等于256的值而在内存中没有新的单独实例感到高兴。

>>> a = 257
>>> b = 257
>>> a is b
False          # What happened here? Why is this False?

看起来我们现在有两个单独的整数实例,它们的值257在内存中。由于整数是不可变的,因此会浪费内存。希望我们不要浪费很多。我们可能不是。但是不能保证这种行为。

>>> 257 is 257
True           # Yet the literal numbers compare properly

好吧,这看起来好像您的Python特定实现正在尝试变得聪明,除非必须这样做,否则不会在内存中创建冗余值的整数。您似乎表明您正在使用Python的引用实现,即CPython。对CPython有好处。

如果CPython可以在全球范围内做到这一点甚至更好,如果它可以便宜地做到这一点(因为查找会花费一定的成本),也许还有另一种实现方式。

但是对于对代码的影响,您不必在乎整数是否是整数的特定实例。您只需要关心该实例的值是什么,就可以使用普通的比较运算符,即==

是什么is

is检查id两个对象的相同。在CPython中,id是内存中的位置,但是在另一个实现中,它可能是其他一些唯一标识的数字。要用代码重新声明:

>>> a is b

是相同的

>>> id(a) == id(b)

那我们为什么要使用is呢?

相对于说,这是一个非常快速的检查,检查两个很长的字符串的值是否相等。但是由于它适用于对象的唯一性,因此我们的用例有限。实际上,我们主要是想用它来检查None,这是一个单例(内存中一个地方存在的唯一实例)。如果有可能将其他单例合并is,我们可以创建其他单例,我们可能会与进行检查,但这相对较少。这是一个示例(将在Python 2和3中运行),例如

SENTINEL_SINGLETON = object() # this will only be created one time.

def foo(keyword_argument=None):
    if keyword_argument is None:
        print('no argument given to foo')
    bar()
    bar(keyword_argument)
    bar('baz')

def bar(keyword_argument=SENTINEL_SINGLETON):
    # SENTINEL_SINGLETON tells us if we were not passed anything
    # as None is a legitimate potential argument we could get.
    if keyword_argument is SENTINEL_SINGLETON:
        print('no argument given to bar')
    else:
        print('argument to bar: {0}'.format(keyword_argument))

foo()

哪些打印:

no argument given to foo
no argument given to bar
argument to bar: None
argument to bar: baz

因此,我们看到,使用is和和哨兵,我们可以区分何时bar不带参数调用和何时带调用None。这些是主要的用例的is-不要没有用它来测试整数,字符串,元组,或者其他喜欢这些东西的平等。

Python’s “is” operator behaves unexpectedly with integers?

In summary – let me emphasize: Do not use is to compare integers.

This isn’t behavior you should have any expectations about.

Instead, use == and != to compare for equality and inequality, respectively. For example:

>>> a = 1000
>>> a == 1000       # Test integers like this,
True
>>> a != 5000       # or this!
True
>>> a is 1000       # Don't do this! - Don't use `is` to test integers!!
False

Explanation

To know this, you need to know the following.

First, what does is do? It is a comparison operator. From the documentation:

The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value.

And so the following are equivalent.

>>> a is b
>>> id(a) == id(b)

From the documentation:

id Return the “identity” of an object. This is an integer (or long integer) which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same id() value.

Note that the fact that the id of an object in CPython (the reference implementation of Python) is the location in memory is an implementation detail. Other implementations of Python (such as Jython or IronPython) could easily have a different implementation for id.

So what is the use-case for is? PEP8 describes:

Comparisons to singletons like None should always be done with is or is not, never the equality operators.

The Question

You ask, and state, the following question (with code):

Why does the following behave unexpectedly in Python?

>>> a = 256
>>> b = 256
>>> a is b
True           # This is an expected result

It is not an expected result. Why is it expected? It only means that the integers valued at 256 referenced by both a and b are the same instance of integer. Integers are immutable in Python, thus they cannot change. This should have no impact on any code. It should not be expected. It is merely an implementation detail.

But perhaps we should be glad that there is not a new separate instance in memory every time we state a value equals 256.

>>> a = 257
>>> b = 257
>>> a is b
False          # What happened here? Why is this False?

Looks like we now have two separate instances of integers with the value of 257 in memory. Since integers are immutable, this wastes memory. Let’s hope we’re not wasting a lot of it. We’re probably not. But this behavior is not guaranteed.

>>> 257 is 257
True           # Yet the literal numbers compare properly

Well, this looks like your particular implementation of Python is trying to be smart and not creating redundantly valued integers in memory unless it has to. You seem to indicate you are using the referent implementation of Python, which is CPython. Good for CPython.

It might be even better if CPython could do this globally, if it could do so cheaply (as there would a cost in the lookup), perhaps another implementation might.

But as for impact on code, you should not care if an integer is a particular instance of an integer. You should only care what the value of that instance is, and you would use the normal comparison operators for that, i.e. ==.

What is does

is checks that the id of two objects are the same. In CPython, the id is the location in memory, but it could be some other uniquely identifying number in another implementation. To restate this with code:

>>> a is b

is the same as

>>> id(a) == id(b)

Why would we want to use is then?

This can be a very fast check relative to say, checking if two very long strings are equal in value. But since it applies to the uniqueness of the object, we thus have limited use-cases for it. In fact, we mostly want to use it to check for None, which is a singleton (a sole instance existing in one place in memory). We might create other singletons if there is potential to conflate them, which we might check with is, but these are relatively rare. Here’s an example (will work in Python 2 and 3) e.g.

SENTINEL_SINGLETON = object() # this will only be created one time.

def foo(keyword_argument=None):
    if keyword_argument is None:
        print('no argument given to foo')
    bar()
    bar(keyword_argument)
    bar('baz')

def bar(keyword_argument=SENTINEL_SINGLETON):
    # SENTINEL_SINGLETON tells us if we were not passed anything
    # as None is a legitimate potential argument we could get.
    if keyword_argument is SENTINEL_SINGLETON:
        print('no argument given to bar')
    else:
        print('argument to bar: {0}'.format(keyword_argument))

foo()

Which prints:

no argument given to foo
no argument given to bar
argument to bar: None
argument to bar: baz

And so we see, with is and a sentinel, we are able to differentiate between when bar is called with no arguments and when it is called with None. These are the primary use-cases for is – do not use it to test for equality of integers, strings, tuples, or other things like these.


回答 2

这取决于您是否要看两个事物是否相等或相同的对象。

is检查它们是否是相同的对象,而不仅仅是相等。小整数可能指向相同的内存位置以提高空间效率

In [29]: a = 3
In [30]: b = 3
In [31]: id(a)
Out[31]: 500729144
In [32]: id(b)
Out[32]: 500729144

您应该==用来比较任意对象的相等性。您可以使用__eq____ne__属性指定行为。

It depends on whether you’re looking to see if 2 things are equal, or the same object.

is checks to see if they are the same object, not just equal. The small ints are probably pointing to the same memory location for space efficiency

In [29]: a = 3
In [30]: b = 3
In [31]: id(a)
Out[31]: 500729144
In [32]: id(b)
Out[32]: 500729144

You should use == to compare equality of arbitrary objects. You can specify the behavior with the __eq__, and __ne__ attributes.


回答 3

我来晚了,但是,您想从中获得答案吗?我将尝试以介绍性的方式对此进行说明,以便更多的人可以跟进。


关于CPython的一件好事是您实际上可以看到其来源。我将使用3.5版本的链接,但是找到相应的2.x链接是微不足道的。

在CPython中,用于创建新对象的C-API函数intPyLong_FromLong(long v)。此功能的说明是:

当前的实现为-5到256之间的所有整数保留一个整数对象数组,当您在该范围内创建int时,实际上实际上是返回对现有对象的引用。因此应该可以更改1的值。我怀疑在这种情况下Python的行为是不确定的。:-)

(我的斜体)

不了解您,但我看到了并想:让我们找到那个数组!

如果您还不熟悉实现CPython的C代码,则应 ; 一切都井井有条,可读性强。对于我们而言,我们需要在看Objects子目录中的主源代码目录树

PyLong_FromLong处理long对象,因此不难推断我们需要窥视内部longobject.c。看完内部,您可能会觉得事情很混乱。它们是,但不要担心,我们正在寻找的功能在第230行令人不寒而栗,等待我们检查出来。这是一个很小的函数,因此主体(不包括声明)可以轻松粘贴到此处:

PyObject *
PyLong_FromLong(long ival)
{
    // omitting declarations

    CHECK_SMALL_INT(ival);

    if (ival < 0) {
        /* negate: cant write this as abs_ival = -ival since that
           invokes undefined behaviour when ival is LONG_MIN */
        abs_ival = 0U-(unsigned long)ival;
        sign = -1;
    }
    else {
        abs_ival = (unsigned long)ival;
    }

    /* Fast path for single-digit ints */
    if (!(abs_ival >> PyLong_SHIFT)) {
        v = _PyLong_New(1);
        if (v) {
            Py_SIZE(v) = sign;
            v->ob_digit[0] = Py_SAFE_DOWNCAST(
                abs_ival, unsigned long, digit);
        }
        return (PyObject*)v; 
}

现在,我们不是C 主代码-haxxorz,但我们也不傻,我们可以看到这CHECK_SMALL_INT(ival);一切诱人地窥视着我们。我们可以理解,这与此有关。让我们来看看:

#define CHECK_SMALL_INT(ival) \
    do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
        return get_small_int((sdigit)ival); \
    } while(0)

因此,get_small_int如果值ival满足条件,则它是一个调用函数的宏:

if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS)

那么什么是NSMALLNEGINTSNSMALLPOSINTS?宏!他们在这里

#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS           257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS           5
#endif

所以我们的条件是if (-5 <= ival && ival < 257)通话get_small_int

接下来,让我们看一下get_small_int它的所有荣耀(好吧,我们只看它的身体,因为那是有趣的地方):

PyObject *v;
assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);

好的,声明一个PyObject,断言先前的条件成立并执行赋值:

v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];

small_ints看起来很像我们一直在寻找的那个数组,它是!我们只要阅读该死的文档,我们就永远知道!

/* Small integers are preallocated in this array so that they
   can be shared.
   The integers that are preallocated are those in the range
   -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

是的,这是我们的家伙。当您要int在该范围内创建一个新[NSMALLNEGINTS, NSMALLPOSINTS)对象时,您只需返回对已预先分配的现有对象的引用。

由于引用引用的是同一对象,因此id()直接发布或检查其上的身份is将返回完全相同的内容。

但是,什么时候分配它们?

_PyLong_Init Python 初始化期间,将很乐意进入for循环为您执行此操作:

for (ival = -NSMALLNEGINTS; ival <  NSMALLPOSINTS; ival++, v++) {

查看源代码以阅读循环体!

希望我的解释使您现在对C的认识清楚(很明显是故意的)。


但是,257 is 257?这是怎么回事?

这实际上更容易解释,我已经尝试过这样做;这是由于Python将这个交互式语句作为一个单独的块执行:

>>> 257 is 257

在编译此语句期间,CPython将看到您有两个匹配的文字,并将使用相同的PyLongObject表示形式257。如果您自己进行编译并检查其内容,则可以看到以下内容:

>>> codeObj = compile("257 is 257", "blah!", "exec")
>>> codeObj.co_consts
(257, None)

当CPython进行操作时,现在将要加载完全相同的对象:

>>> import dis
>>> dis.dis(codeObj)
  1           0 LOAD_CONST               0 (257)   # dis
              3 LOAD_CONST               0 (257)   # dis again
              6 COMPARE_OP               8 (is)

所以is会回来的True

I’m late but, you want some source with your answer? I’ll try and word this in an introductory manner so more folks can follow along.


A good thing about CPython is that you can actually see the source for this. I’m going to use links for the 3.5 release, but finding the corresponding 2.x ones is trivial.

In CPython, the C-API function that handles creating a new int object is PyLong_FromLong(long v). The description for this function is:

The current implementation keeps an array of integer objects for all integers between -5 and 256, when you create an int in that range you actually just get back a reference to the existing object. So it should be possible to change the value of 1. I suspect the behaviour of Python in this case is undefined. :-)

(My italics)

Don’t know about you but I see this and think: Let’s find that array!

If you haven’t fiddled with the C code implementing CPython you should; everything is pretty organized and readable. For our case, we need to look in the Objects subdirectory of the main source code directory tree.

PyLong_FromLong deals with long objects so it shouldn’t be hard to deduce that we need to peek inside longobject.c. After looking inside you might think things are chaotic; they are, but fear not, the function we’re looking for is chilling at line 230 waiting for us to check it out. It’s a smallish function so the main body (excluding declarations) is easily pasted here:

PyObject *
PyLong_FromLong(long ival)
{
    // omitting declarations

    CHECK_SMALL_INT(ival);

    if (ival < 0) {
        /* negate: cant write this as abs_ival = -ival since that
           invokes undefined behaviour when ival is LONG_MIN */
        abs_ival = 0U-(unsigned long)ival;
        sign = -1;
    }
    else {
        abs_ival = (unsigned long)ival;
    }

    /* Fast path for single-digit ints */
    if (!(abs_ival >> PyLong_SHIFT)) {
        v = _PyLong_New(1);
        if (v) {
            Py_SIZE(v) = sign;
            v->ob_digit[0] = Py_SAFE_DOWNCAST(
                abs_ival, unsigned long, digit);
        }
        return (PyObject*)v; 
}

Now, we’re no C master-code-haxxorz but we’re also not dumb, we can see that CHECK_SMALL_INT(ival); peeking at us all seductively; we can understand it has something to do with this. Let’s check it out:

#define CHECK_SMALL_INT(ival) \
    do if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS) { \
        return get_small_int((sdigit)ival); \
    } while(0)

So it’s a macro that calls function get_small_int if the value ival satisfies the condition:

if (-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS)

So what are NSMALLNEGINTS and NSMALLPOSINTS? Macros! Here they are:

#ifndef NSMALLPOSINTS
#define NSMALLPOSINTS           257
#endif
#ifndef NSMALLNEGINTS
#define NSMALLNEGINTS           5
#endif

So our condition is if (-5 <= ival && ival < 257) call get_small_int.

Next let’s look at get_small_int in all its glory (well, we’ll just look at its body because that’s where the interesting things are):

PyObject *v;
assert(-NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS);
v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];
Py_INCREF(v);

Okay, declare a PyObject, assert that the previous condition holds and execute the assignment:

v = (PyObject *)&small_ints[ival + NSMALLNEGINTS];

small_ints looks a lot like that array we’ve been searching for, and it is! We could’ve just read the damn documentation and we would’ve know all along!:

/* Small integers are preallocated in this array so that they
   can be shared.
   The integers that are preallocated are those in the range
   -NSMALLNEGINTS (inclusive) to NSMALLPOSINTS (not inclusive).
*/
static PyLongObject small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

So yup, this is our guy. When you want to create a new int in the range [NSMALLNEGINTS, NSMALLPOSINTS) you’ll just get back a reference to an already existing object that has been preallocated.

Since the reference refers to the same object, issuing id() directly or checking for identity with is on it will return exactly the same thing.

But, when are they allocated??

During initialization in _PyLong_Init Python will gladly enter in a for loop do do this for you:

for (ival = -NSMALLNEGINTS; ival <  NSMALLPOSINTS; ival++, v++) {

Check out the source to read the loop body!

I hope my explanation has made you C things clearly now (pun obviously intented).


But, 257 is 257? What’s up?

This is actually easier to explain, and I have attempted to do so already; it’s due to the fact that Python will execute this interactive statement as a single block:

>>> 257 is 257

During complilation of this statement, CPython will see that you have two matching literals and will use the same PyLongObject representing 257. You can see this if you do the compilation yourself and examine its contents:

>>> codeObj = compile("257 is 257", "blah!", "exec")
>>> codeObj.co_consts
(257, None)

When CPython does the operation, it’s now just going to load the exact same object:

>>> import dis
>>> dis.dis(codeObj)
  1           0 LOAD_CONST               0 (257)   # dis
              3 LOAD_CONST               0 (257)   # dis again
              6 COMPARE_OP               8 (is)

So is will return True.


回答 4

您可以检入源文件intobject.c,Python会缓存小整数以提高效率。每次创建对小整数的引用时,都是在引用缓存的小整数,而不是新对象。257不是一个小整数,因此它被计算为另一个对象。

最好==用于此目的。

As you can check in source file intobject.c, Python caches small integers for efficiency. Every time you create a reference to a small integer, you are referring the cached small integer, not a new object. 257 is not an small integer, so it is calculated as a different object.

It is better to use == for that purpose.


回答 5

我认为您的假设是正确的。实验id(对象的身份):

In [1]: id(255)
Out[1]: 146349024

In [2]: id(255)
Out[2]: 146349024

In [3]: id(257)
Out[3]: 146802752

In [4]: id(257)
Out[4]: 148993740

In [5]: a=255

In [6]: b=255

In [7]: c=257

In [8]: d=257

In [9]: id(a), id(b), id(c), id(d)
Out[9]: (146349024, 146349024, 146783024, 146804020)

看来数字<= 255被当作​​文字,而上面的任何东西都被不同地对待!

I think your hypotheses is correct. Experiment with id (identity of object):

In [1]: id(255)
Out[1]: 146349024

In [2]: id(255)
Out[2]: 146349024

In [3]: id(257)
Out[3]: 146802752

In [4]: id(257)
Out[4]: 148993740

In [5]: a=255

In [6]: b=255

In [7]: c=257

In [8]: d=257

In [9]: id(a), id(b), id(c), id(d)
Out[9]: (146349024, 146349024, 146783024, 146804020)

It appears that numbers <= 255 are treated as literals and anything above is treated differently!


回答 6

对于整数,字符串或日期时间之类的不可变值对象,对象标识并不是特别有用。最好考虑平等。身份本质上是值对象的实现细节-由于它们是不可变的,因此对同一个对象或多个对象具有多个引用之间没有有效的区别。

For immutable value objects, like ints, strings or datetimes, object identity is not especially useful. It’s better to think about equality. Identity is essentially an implementation detail for value objects – since they’re immutable, there’s no effective difference between having multiple refs to the same object or multiple objects.


回答 7

现有答案中都没有指出另一个问题。允许Python合并任何两个不可变的值,并且预先创建的小int值不是发生这种情况的唯一方法。永远不能保证 Python实现会做到这一点,但他们所做的不仅仅只是小整数。


一方面,还有一些其他预先创建的值,例如empty tuplestrbytes和一些短字符串(在CPython 3.6中,这是256个单字符Latin-1字符串)。例如:

>>> a = ()
>>> b = ()
>>> a is b
True

而且,即使是非预先创建的值也可以相同。考虑以下示例:

>>> c = 257
>>> d = 257
>>> c is d
False
>>> e, f = 258, 258
>>> e is f
True

这不限于int值:

>>> g, h = 42.23e100, 42.23e100
>>> g is h
True

显然,CPython没有为预先创建float42.23e100。那么,这是怎么回事?

CPython的编译器将合并一些已知不变类型等的恒定值intfloatstrbytes,在相同的编译单元。对于一个模块,整个模块是一个编译单元,但是在交互式解释器中,每个语句都是一个单独的编译单元。由于cd是在单独的语句中定义的,因此不会合并它们的值。由于ef是在同一条语句中定义的,因此将合并它们的值。


您可以通过分解字节码来查看发生了什么。尝试定义一个执行该操作的函数,e, f = 128, 128然后对其进行调用dis.dis,您将看到只有一个常数值(128, 128)

>>> def f(): i, j = 258, 258
>>> dis.dis(f)
  1           0 LOAD_CONST               2 ((128, 128))
              2 UNPACK_SEQUENCE          2
              4 STORE_FAST               0 (i)
              6 STORE_FAST               1 (j)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
>>> f.__code__.co_consts
(None, 128, (128, 128))
>>> id(f.__code__.co_consts[1], f.__code__.co_consts[2][0], f.__code__.co_consts[2][1])
4305296480, 4305296480, 4305296480

您可能会注意到,128即使字节码实际上并未使用编译器,编译器也已将其存储为常量,这使您了解了CPython编译器所做的优化很少。这意味着(非空)元组实际上不会最终合并:

>>> k, l = (1, 2), (1, 2)
>>> k is l
False

把在一个函数,dis它,看看co_consts-there是一个12两个(1, 2)共享相同的元组12,但不相同,并且((1, 2), (1, 2))具有两个不同的元组相等的元组。


CPython还有另外一个优化:字符串实习。与编译器常量折叠不同,这不限于源代码文字:

>>> m = 'abc'
>>> n = 'abc'
>>> m is n
True

另一方面,它仅限于内部存储类型“ ascii compact”,“ compact”或“ legacy ready”str类型和字符串,并且在许多情况下,只有“ ascii compact”会被嵌入。


无论如何,不​​同实现之间,同一实现的版本之间,甚至同一实现的同一副本上运行相同代码的时间之间,关于值必须是,可能是或不能不同的规则有所不同。 。

有趣的是值得学习一个特定Python的规则。但是在代码中不值得依赖它们。唯一安全的规则是:

  • 不要编写假定两个相等但分别创建的不可变值相同的代码(不要使用x is y,请使用x == y
  • 不要编写假定两个相等但分别创建的不可变值不同的代码(不要使用x is not y,请使用x != y

或者,换句话说,仅用于is测试已记录的单例(如None)或仅在代码中的一个位置创建的单例(如_sentinel = object()成语)。

There’s another issue that isn’t pointed out in any of the existing answers. Python is allowed to merge any two immutable values, and pre-created small int values are not the only way this can happen. A Python implementation is never guaranteed to do this, but they all do it for more than just small ints.


For one thing, there are some other pre-created values, such as the empty tuple, str, and bytes, and some short strings (in CPython 3.6, it’s the 256 single-character Latin-1 strings). For example:

>>> a = ()
>>> b = ()
>>> a is b
True

But also, even non-pre-created values can be identical. Consider these examples:

>>> c = 257
>>> d = 257
>>> c is d
False
>>> e, f = 258, 258
>>> e is f
True

And this isn’t limited to int values:

>>> g, h = 42.23e100, 42.23e100
>>> g is h
True

Obviously, CPython doesn’t come with a pre-created float value for 42.23e100. So, what’s going on here?

The CPython compiler will merge constant values of some known-immutable types like int, float, str, bytes, in the same compilation unit. For a module, the whole module is a compilation unit, but at the interactive interpreter, each statement is a separate compilation unit. Since c and d are defined in separate statements, their values aren’t merged. Since e and f are defined in the same statement, their values are merged.


You can see what’s going on by disassembling the bytecode. Try defining a function that does e, f = 128, 128 and then calling dis.dis on it, and you’ll see that there’s a single constant value (128, 128)

>>> def f(): i, j = 258, 258
>>> dis.dis(f)
  1           0 LOAD_CONST               2 ((128, 128))
              2 UNPACK_SEQUENCE          2
              4 STORE_FAST               0 (i)
              6 STORE_FAST               1 (j)
              8 LOAD_CONST               0 (None)
             10 RETURN_VALUE
>>> f.__code__.co_consts
(None, 128, (128, 128))
>>> id(f.__code__.co_consts[1], f.__code__.co_consts[2][0], f.__code__.co_consts[2][1])
4305296480, 4305296480, 4305296480

You may notice that the compiler has stored 128 as a constant even though it’s not actually used by the bytecode, which gives you an idea of how little optimization CPython’s compiler does. Which means that (non-empty) tuples actually don’t end up merged:

>>> k, l = (1, 2), (1, 2)
>>> k is l
False

Put that in a function, dis it, and look at the co_consts—there’s a 1 and a 2, two (1, 2) tuples that share the same 1 and 2 but are not identical, and a ((1, 2), (1, 2)) tuple that has the two distinct equal tuples.


There’s one more optimization that CPython does: string interning. Unlike compiler constant folding, this isn’t restricted to source code literals:

>>> m = 'abc'
>>> n = 'abc'
>>> m is n
True

On the other hand, it is limited to the str type, and to strings of internal storage kind “ascii compact”, “compact”, or “legacy ready”, and in many cases only “ascii compact” will get interned.


At any rate, the rules for what values must be, might be, or cannot be distinct vary from implementation to implementation, and between versions of the same implementation, and maybe even between runs of the same code on the same copy of the same implementation.

It can be worth learning the rules for one specific Python for the fun of it. But it’s not worth relying on them in your code. The only safe rule is:

  • Do not write code that assumes two equal but separately-created immutable values are identical (don’t use x is y, use x == y)
  • Do not write code that assumes two equal but separately-created immutable values are distinct (don’t use x is not y, use x != y)

Or, in other words, only use is to test for the documented singletons (like None) or that are only created in one place in the code (like the _sentinel = object() idiom).


回答 8

is 身份相等运算符(功能类似于id(a) == id(b));只是两个相等的数字不一定是同一对象。出于性能原因,一些小整数正好会被记住,因此它们往往是相同的(因为它们是不可变的,因此可以这样做)。

===另一方面,PHP的运算符被描述为检查相等性和类型:x == y and type(x) == type(y)根据Paulo Freitas的评论。这足以满足通用数,但不同于以荒谬方式is定义的类__eq__

class Unequal:
    def __eq__(self, other):
        return False

对于“内置”类,PHP显然允许相同的东西(我指的是在C级实现,而不是在PHP中实现)。计时器对象可能有点荒谬,它每次用作数字时,其值都不同。相当为什么要模拟Visual Basic,Now而不是显示它是带有time.time()我不知道。

Greg Hewgill(OP)发表了一条澄清的评论:“我的目标是比较对象标识,而不是价值相等。除了数字,我希望对象标识与价值相等相同。”

这将有另一个答案,因为我们必须将事物归类为数字,以选择是否与==或进行比较isCPython定义数字协议,包括PyNumber_Check,但这不能从Python本身访问。

我们可以尝试使用isinstance所有已知的数字类型,但这不可避免地是不完整的。类型模块包含一个StringTypes列表,但没有NumberTypes。从Python 2.6开始,内置数字类具有基类numbers.Number,但存在相同的问题:

import numpy, numbers
assert not issubclass(numpy.int16,numbers.Number)
assert issubclass(int,numbers.Number)

顺便说一句,NumPy将产生低数字的单独实例。

我实际上不知道这个问题的答案。我想从理论上讲可以使用ctypes进行调用PyNumber_Check,但是即使该函数也已经受到参数,并且肯定不是可移植的。我们只需要对我们目前要测试的内容有所保留。

最后,此问题源于Python最初没有类型树,其谓词如Scheme number?Haskell的 类型类 Numis检查对象身份,而不是值相等。PHP的历史也很悠久,===显然is在PHP5中的对象上起作用,而在PHP4中没有。这就是跨语言(包括一种语言的版本)之间转移的越来越大的痛苦。

is is the identity equality operator (functioning like id(a) == id(b)); it’s just that two equal numbers aren’t necessarily the same object. For performance reasons some small integers happen to be memoized so they will tend to be the same (this can be done since they are immutable).

PHP’s === operator, on the other hand, is described as checking equality and type: x == y and type(x) == type(y) as per Paulo Freitas’ comment. This will suffice for common numbers, but differ from is for classes that define __eq__ in an absurd manner:

class Unequal:
    def __eq__(self, other):
        return False

PHP apparently allows the same thing for “built-in” classes (which I take to mean implemented at C level, not in PHP). A slightly less absurd use might be a timer object, which has a different value every time it’s used as a number. Quite why you’d want to emulate Visual Basic’s Now instead of showing that it is an evaluation with time.time() I don’t know.

Greg Hewgill (OP) made one clarifying comment “My goal is to compare object identity, rather than equality of value. Except for numbers, where I want to treat object identity the same as equality of value.”

This would have yet another answer, as we have to categorize things as numbers or not, to select whether we compare with == or is. CPython defines the number protocol, including PyNumber_Check, but this is not accessible from Python itself.

We could try to use isinstance with all the number types we know of, but this would inevitably be incomplete. The types module contains a StringTypes list but no NumberTypes. Since Python 2.6, the built in number classes have a base class numbers.Number, but it has the same problem:

import numpy, numbers
assert not issubclass(numpy.int16,numbers.Number)
assert issubclass(int,numbers.Number)

By the way, NumPy will produce separate instances of low numbers.

I don’t actually know an answer to this variant of the question. I suppose one could theoretically use ctypes to call PyNumber_Check, but even that function has been debated, and it’s certainly not portable. We’ll just have to be less particular about what we test for now.

In the end, this issue stems from Python not originally having a type tree with predicates like Scheme’s number?, or Haskell’s type class Num. is checks object identity, not value equality. PHP has a colorful history as well, where === apparently behaves as is only on objects in PHP5, but not PHP4. Such are the growing pains of moving across languages (including versions of one).


回答 9

字符串也会发生这种情况:

>>> s = b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)

现在一切似乎都很好。

>>> s = 'somestr'
>>> b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)

这也是预期的。

>>> s1 = b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, True, 4555308080, 4555308080)

>>> s1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, False, 4555308176, 4555308272)

现在那是出乎意料的。

It also happens with strings:

>>> s = b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)

Now everything seems fine.

>>> s = 'somestr'
>>> b = 'somestr'
>>> s == b, s is b, id(s), id(b)
(True, True, 4555519392, 4555519392)

That’s expected too.

>>> s1 = b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, True, 4555308080, 4555308080)

>>> s1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> b1 = 'somestrdaasd ad ad asd as dasddsg,dlfg ,;dflg, dfg a'
>>> s1 == b1, s1 is b1, id(s1), id(b1)
(True, False, 4555308176, 4555308272)

Now that’s unexpected.


回答 10

Python 3.8的新增功能:Python行为的变化

现在,当将身份检查(和 )与某些类型的文字(例如字符串,整数)一起使用时,编译器会生成SyntaxWarning。这些通常在CPython中偶然地起作用,但是语言规范无法保证。该警告建议用户改用相等性测试( 和)。isis not==!=

What’s New In Python 3.8: Changes in Python behavior:

The compiler now produces a SyntaxWarning when identity checks (is and is not) are used with certain types of literals (e.g. strings, ints). These can often work by accident in CPython, but are not guaranteed by the language spec. The warning advises users to use equality tests (== and !=) instead.


如何用空格填充Python字符串?

问题:如何用空格填充Python字符串?

我想用空格填充字符串。我知道以下内容适用于零:

>>> print  "'%06d'"%4
'000004'

但是,当我想要这个怎么办?:

'hi    '

当然,我可以测量字符串长度并这样做str+" "*leftover,但我想用最短的方法。

I want to fill out a string with spaces. I know that the following works for zero’s:

>>> print  "'%06d'"%4
'000004'

But what should I do when I want this?:

'hi    '

of course I can measure string length and do str+" "*leftover, but I’d like the shortest way.


回答 0

您可以使用str.ljust(width[, fillchar])

返回长度为width的左对齐字符串。使用指定的fillchar(默认为空格)填充。如果width小于,则返回原始字符串len(s)

>>> 'hi'.ljust(10)
'hi        '

You can do this with str.ljust(width[, fillchar]):

Return the string left justified in a string of length width. Padding is done using the specified fillchar (default is a space). The original string is returned if width is less than len(s).

>>> 'hi'.ljust(10)
'hi        '

回答 1

为了即使在格式化复杂的字符串时也可以使用灵活的方法,您可能应该使用string-formatting mini-language,无论使用哪种str.format()方法

>>> '{0: <16} StackOverflow!'.format('Hi')  # Python >=2.6
'Hi               StackOverflow!'

F-串

>>> f'{"Hi": <16} StackOverflow!'  # Python >= 3.6
'Hi               StackOverflow!'

For a flexible method that works even when formatting complicated string, you probably should use the string-formatting mini-language, using either the str.format() method

>>> '{0: <16} StackOverflow!'.format('Hi')  # Python >=2.6
'Hi               StackOverflow!'

of f-strings

>>> f'{"Hi": <16} StackOverflow!'  # Python >= 3.6
'Hi               StackOverflow!'

回答 2

新的(ish)字符串格式方法使您可以使用嵌套关键字参数来做一些有趣的事情。最简单的情况:

>>> '{message: <16}'.format(message='Hi')
'Hi             '

如果要16作为变量传递:

>>> '{message: <{width}}'.format(message='Hi', width=16)
'Hi              '

如果要为整个工具包和kaboodle传递变量,请执行以下操作

'{message:{fill}{align}{width}}'.format(
   message='Hi',
   fill=' ',
   align='<',
   width=16,
)

结果(您猜对了):

'Hi              '

The new(ish) string format method lets you do some fun stuff with nested keyword arguments. The simplest case:

>>> '{message: <16}'.format(message='Hi')
'Hi             '

If you want to pass in 16 as a variable:

>>> '{message: <{width}}'.format(message='Hi', width=16)
'Hi              '

If you want to pass in variables for the whole kit and kaboodle:

'{message:{fill}{align}{width}}'.format(
   message='Hi',
   fill=' ',
   align='<',
   width=16,
)

Which results in (you guessed it):

'Hi              '

回答 3

您可以尝试以下方法:

print "'%-100s'" % 'hi'

You can try this:

print "'%-100s'" % 'hi'

回答 4

正确的方法是使用官方文档中所述的Python格式语法

对于这种情况,它将简单地是:
'{:10}'.format('hi')
哪个输出:
'hi '

说明:

format_spec ::=  [[fill]align][sign][#][0][width][,][.precision][type]
fill        ::=  <any character>
align       ::=  "<" | ">" | "=" | "^"
sign        ::=  "+" | "-" | " "
width       ::=  integer
precision   ::=  integer
type        ::=  "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"

您几乎需要知道的全部都在那里^。

更新:从python 3.6开始,使用文字字符串插值更加方便!

foo = 'foobar'
print(f'{foo:10} is great!')
# foobar     is great!

Correct way of doing this would be to use Python’s format syntax as described in the official documentation

For this case it would simply be:
'{:10}'.format('hi')
which outputs:
'hi '

Explanation:

format_spec ::=  [[fill]align][sign][#][0][width][,][.precision][type]
fill        ::=  <any character>
align       ::=  "<" | ">" | "=" | "^"
sign        ::=  "+" | "-" | " "
width       ::=  integer
precision   ::=  integer
type        ::=  "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%"

Pretty much all you need to know is there ^.

Update: as of python 3.6 it’s even more convenient with literal string interpolation!

foo = 'foobar'
print(f'{foo:10} is great!')
# foobar     is great!

回答 5

用途str.ljust()

>>> 'Hi'.ljust(6)
'Hi    '

您还应该考虑string.zfill()str.ljust()以及str.center()用于字符串格式化。这些可以链接起来并指定“ fill ”字符,因此:

>>> ('3'.zfill(8) + 'blind'.rjust(8) + 'mice'.ljust(8, '.')).center(40)
'        00000003   blindmice....        '

这些字符串格式化操作的优势在于可以在Python v2和v3中使用。

看一下pydoc str某个时间:里面有很多好东西。

Use str.ljust():

>>> 'Hi'.ljust(6)
'Hi    '

You should also consider string.zfill(), str.ljust() and str.center() for string formatting. These can be chained and have the ‘fill‘ character specified, thus:

>>> ('3'.zfill(8) + 'blind'.rjust(8) + 'mice'.ljust(8, '.')).center(40)
'        00000003   blindmice....        '

These string formatting operations have the advantage of working in Python v2 and v3.

Take a look at pydoc str sometime: there’s a wealth of good stuff in there.


回答 6

从Python 3.6开始,您可以执行

>>> strng = 'hi'
>>> f'{strng: <10}'

文字字符串插值

或者,如果您的填充大小在变量中,例如这样(感谢@Matt M.!):

>>> to_pad = 10
>>> f'{strng: <{to_pad}}'

As of Python 3.6 you can just do

>>> strng = 'hi'
>>> f'{strng: <10}'

with literal string interpolation.

Or, if your padding size is in a variable, like this (thanks @Matt M.!):

>>> to_pad = 10
>>> f'{strng: <{to_pad}}'

回答 7

您还可以将字符串居中

'{0: ^20}'.format('nice')

you can also center your string:

'{0: ^20}'.format('nice')

回答 8

对字符串使用Python 2.7的迷你格式

'{0: <8}'.format('123')

左对齐,并用”字符填充到8个字符。

Use Python 2.7’s mini formatting for strings:

'{0: <8}'.format('123')

This left aligns, and pads to 8 characters with the ‘ ‘ character.


回答 9

只需删除0,它将增加空间:

>>> print  "'%6d'"%4

Just remove the 0 and it will add space instead:

>>> print  "'%6d'"%4

回答 10

使用切片会不会更pythonic?

例如,要在字符串的右边填充空格,直到其长度为10个字符:

>>> x = "string"    
>>> (x + " " * 10)[:10]   
'string    '

要在其左侧填充空格,直到其长度为15个字符:

>>> (" " * 15 + x)[-15:]
'         string'

当然,它需要知道要填充多长时间,但是并不需要测量开始的字符串的长度。

Wouldn’t it be more pythonic to use slicing?

For example, to pad a string with spaces on the right until it’s 10 characters long:

>>> x = "string"    
>>> (x + " " * 10)[:10]   
'string    '

To pad it with spaces on the left until it’s 15 characters long:

>>> (" " * 15 + x)[-15:]
'         string'

It requires knowing how long you want to pad to, of course, but it doesn’t require measuring the length of the string you’re starting with.


回答 11

一个很好的技巧来代替各种打印格式:

(1)在右边加空格:

('hi' + '        ')[:8]

(2)在左前导零处填充:

('0000' + str(2))[-4:]

A nice trick to use in place of the various print formats:

(1) Pad with spaces to the right:

('hi' + '        ')[:8]

(2) Pad with leading zeros on the left:

('0000' + str(2))[-4:]

回答 12

您可以使用列表理解来做到这一点,这也会使您对空格的数量有所了解,并且只能是一个内衬。

"hello" + " ".join([" " for x in range(1,10)])
output --> 'hello                 '

You could do it using list comprehension, this’d give you an idea about the number of spaces too and would be a one liner.

"hello" + " ".join([" " for x in range(1,10)])
output --> 'hello                 '

Python文件的常见标头格式是什么?

问题:Python文件的常见标头格式是什么?

在有关Python编码准则的文档中,我遇到了以下Python源文件的头格式:

#!/usr/bin/env python

"""Foobar.py: Description of what foobar does."""

__author__      = "Barack Obama"
__copyright__   = "Copyright 2009, Planet Earth"

这是Python世界中标头的标准格式吗?我还可以在标题中添加哪些其他字段/信息?Python专家分享了有关好的Python源标头的指南:-)

I came across the following header format for Python source files in a document about Python coding guidelines:

#!/usr/bin/env python

"""Foobar.py: Description of what foobar does."""

__author__      = "Barack Obama"
__copyright__   = "Copyright 2009, Planet Earth"

Is this the standard format of headers in the Python world? What other fields/information can I put in the header? Python gurus share your guidelines for good Python source headers :-)


回答 0

它是Foobar模块的所有元数据。

第一个是docstring模块的,已经在Peter的答案中进行了解释。

如何组织模块(源文件)?(存档)

每个文件的第一行应为#!/usr/bin/env python这样就可以将文件作为脚本运行,例如在CGI上下文中隐式调用解释器。

接下来应该是带有说明的文档字符串。如果描述很长,那么第一行应该是一个简短的摘要,该摘要应具有其自身的意义,并以换行符分隔其余部分。

所有代码(包括import语句)都应遵循docstring。否则,解释器将无法识别该文档字符串,并且您将无法在交互式会话中访问该文档字符串(例如,通过obj.__doc__)或使用自动化工具生成文档时。

首先导入内置模块,然后导入第三方模块,然后再导入路径和您自己的模块。特别是,模块的路径和名称的添加可能会快速更改:将它们放在一个位置可以使它们更容易找到。

接下来应该是作者信息。此信息应遵循以下格式:

__author__ = "Rob Knight, Gavin Huttley, and Peter Maxwell"
__copyright__ = "Copyright 2007, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell", "Gavin Huttley",
                    "Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.0.1"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

状态通常应为“原型”,“开发”或“生产”之一。__maintainer__应该是将修复错误并进行改进(如果导入)的人。__credits__不同之处在于__author__,这些__credits__人报告了错误修复,提出了建议等,但实际上并未编写代码。

在这里你有更多的信息,上市__author____authors____contact____copyright____license____deprecated____date____version__作为公认的元数据。

Its all metadata for the Foobar module.

The first one is the docstring of the module, that is already explained in Peter’s answer.

How do I organize my modules (source files)? (Archive)

The first line of each file shoud be #!/usr/bin/env python. This makes it possible to run the file as a script invoking the interpreter implicitly, e.g. in a CGI context.

Next should be the docstring with a description. If the description is long, the first line should be a short summary that makes sense on its own, separated from the rest by a newline.

All code, including import statements, should follow the docstring. Otherwise, the docstring will not be recognized by the interpreter, and you will not have access to it in interactive sessions (i.e. through obj.__doc__) or when generating documentation with automated tools.

Import built-in modules first, followed by third-party modules, followed by any changes to the path and your own modules. Especially, additions to the path and names of your modules are likely to change rapidly: keeping them in one place makes them easier to find.

Next should be authorship information. This information should follow this format:

__author__ = "Rob Knight, Gavin Huttley, and Peter Maxwell"
__copyright__ = "Copyright 2007, The Cogent Project"
__credits__ = ["Rob Knight", "Peter Maxwell", "Gavin Huttley",
                    "Matthew Wakefield"]
__license__ = "GPL"
__version__ = "1.0.1"
__maintainer__ = "Rob Knight"
__email__ = "rob@spot.colorado.edu"
__status__ = "Production"

Status should typically be one of “Prototype”, “Development”, or “Production”. __maintainer__ should be the person who will fix bugs and make improvements if imported. __credits__ differs from __author__ in that __credits__ includes people who reported bug fixes, made suggestions, etc. but did not actually write the code.

Here you have more information, listing __author__, __authors__, __contact__, __copyright__, __license__, __deprecated__, __date__ and __version__ as recognized metadata.


回答 1

我强烈赞成使用最少的文件头,我的意思是:

  • #!如果是可执行脚本,则使用hashbang(行)
  • 模块文档字符串
  • 导入,以标准方式分组,例如:
  import os    # standard library
  import sys

  import requests  # 3rd party packages

  import mypackage.mymodule  # local source
  import mypackage.myothermodule  

即。三组进口,它们之间有一个空白行。在每个组中,对进口进行排序。最后一组,从本地来源进口,可以是如图所示的绝对进口,也可以是明确的相对进口。

其他所有内容都浪费时间,占用视觉空间,并且会产生误导。

如果您有法律免责声明或许可信息,它将进入一个单独的文件中。它不需要感染每个源代码文件。您的版权应该是其中的一部分。人们应该可以在您的网站上找到它LICENSE文件中,而不是随机的源代码。

源代码控件已经维护了诸如作者身份和日期之类的元数据。无需在文件本身中添加更详细,错误且过时的相同信息版本。

我不认为每个人都需要将任何其他数据放入其所有源文件中。您可能有这样做的特定要求,但是根据定义,这种情况仅适用于您。它们在“为所有人推荐的通用标题”中没有位置。

I strongly favour minimal file headers, by which I mean just:

  • The hashbang (#! line) if this is an executable script
  • Module docstring
  • Imports, grouped in the standard way, eg:
  import os    # standard library
  import sys

  import requests  # 3rd party packages

  import mypackage.mymodule  # local source
  import mypackage.myothermodule  

ie. three groups of imports, with a single blank line between them. Within each group, imports are sorted. The final group, imports from local source, can either be absolute imports as shown, or explicit relative imports.

Everything else is a waste of time, visual space, and is actively misleading.

If you have legal disclaimers or licencing info, it goes into a separate file. It does not need to infect every source code file. Your copyright should be part of this. People should be able to find it in your LICENSE file, not random source code.

Metadata such as authorship and dates is already maintained by your source control. There is no need to add a less-detailed, erroneous, and out-of-date version of the same info in the file itself.

I don’t believe there is any other data that everyone needs to put into all their source files. You may have some particular requirement to do so, but such things apply, by definition, only to you. They have no place in “general headers recommended for everyone”.


回答 2

上面的答案确实很完整,但是如果您想要一个快速且脏的标题来复制粘贴,请使用以下命令:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""Module documentation goes here
   and here
   and ...
"""

为什么这是一个好人:

  • 第一行适用于* nix用户。它将在用户路径中选择Python解释器,因此将自动选择用户首选的解释器。
  • 第二个是文件编码。如今,每个文件都必须具有关联的编码。UTF-8可以在任何地方使用。只是遗留项目会使用其他编码。
  • 和一个非常简单的文档。它可以填充多行。

也可以看看: https //www.python.org/dev/peps/pep-0263/

如果仅在每个文件中编写一个类,则甚至不需要文档(它会放在类doc中)。

The answers above are really complete, but if you want a quick and dirty header to copy’n paste, use this:

#!/usr/bin/env python
# -*- coding: utf-8 -*-

"""Module documentation goes here
   and here
   and ...
"""

Why this is a good one:

  • The first line is for *nix users. It will choose the Python interpreter in the user path, so will automatically choose the user preferred interpreter.
  • The second one is the file encoding. Nowadays every file must have a encoding associated. UTF-8 will work everywhere. Just legacy projects would use other encoding.
  • And a very simple documentation. It can fill multiple lines.

See also: https://www.python.org/dev/peps/pep-0263/

If you just write a class in each file, you don’t even need the documentation (it would go inside the class doc).


回答 3

如果使用的是非ASCII字符集,另请参阅PEP 263

抽象

该PEP建议引入一种语法,以声明Python源文件的编码。然后,Python解析器将使用编码信息使用给定的编码来解释文件。最值得注意的是,这增强了源代码中Unicode文字的解释,并使得可以在例如Unicode的编辑器中直接使用UTF-8编写Unicode文字。

问题

在Python 2.1中,只能使用基于Latin-1的编码“ unicode-escape”编写Unicode文字。这使得编程环境对在许多亚洲国家这样的非拉丁1语言环境中生活和工作的Python用户而言相当不利。程序员可以使用最喜欢的编码来编写其8位字符串,但是绑定到Unicode文字的“ unicode-escape”编码。

拟议的解决方案

我建议通过在文件顶部使用特殊注释来声明编码,从而使每个源文件都可以看到和更改Python源代码。

为了使Python知道此编码声明,在处理Python源代码数据方面需要进行一些概念上的更改。

定义编码

如果未提供其他编码提示,Python将默认使用ASCII作为标准编码。

要定义源代码编码,必须将魔术注释作为源文件的第一行或第二行放置在源文件中,例如:

      # coding=<encoding name>

或(使用流行的编辑器认可的格式)

      #!/usr/bin/python
      # -*- coding: <encoding name> -*-

要么

      #!/usr/bin/python
      # vim: set fileencoding=<encoding name> :

Also see PEP 263 if you are using a non-ascii characterset

Abstract

This PEP proposes to introduce a syntax to declare the encoding of a Python source file. The encoding information is then used by the Python parser to interpret the file using the given encoding. Most notably this enhances the interpretation of Unicode literals in the source code and makes it possible to write Unicode literals using e.g. UTF-8 directly in an Unicode aware editor.

Problem

In Python 2.1, Unicode literals can only be written using the Latin-1 based encoding “unicode-escape”. This makes the programming environment rather unfriendly to Python users who live and work in non-Latin-1 locales such as many of the Asian countries. Programmers can write their 8-bit strings using the favorite encoding, but are bound to the “unicode-escape” encoding for Unicode literals.

Proposed Solution

I propose to make the Python source code encoding both visible and changeable on a per-source file basis by using a special comment at the top of the file to declare the encoding.

To make Python aware of this encoding declaration a number of concept changes are necessary with respect to the handling of Python source code data.

Defining the Encoding

Python will default to ASCII as standard encoding if no other encoding hints are given.

To define a source code encoding, a magic comment must be placed into the source files either as first or second line in the file, such as:

      # coding=<encoding name>

or (using formats recognized by popular editors)

      #!/usr/bin/python
      # -*- coding: <encoding name> -*-

or

      #!/usr/bin/python
      # vim: set fileencoding=<encoding name> :


我可以强迫pip重新安装当前版本吗?

问题:我可以强迫pip重新安装当前版本吗?

我遇到过以下情况:当前版本的软件包似乎无法正常工作,需要重新安装。但是,pip install -U不要触摸已经是最新的软件包。我看到了如何通过先卸载(使用pip uninstall)然后安装来强制进行重新安装,但是有没有办法在一个步骤中简单地将“更新”强制为名义上的当前版本?

I’ve come across situations where a current version of a package seems not to be working and requires reinstallation. But pip install -U won’t touch a package that is already up-to-date. I see how to force a reinstallation by first uninstalling (with pip uninstall) and then installing, but is there a way to simply force an “update” to a nominally current version in a single step?


回答 0

pip install --upgrade --force-reinstall <package>

升级时,请重新安装所有软件包,即使它们已经是最新的。

pip install -I <package>
pip install --ignore-installed <package>

忽略已安装的软件包(改为重新安装)。

pip install --upgrade --force-reinstall <package>

When upgrading, reinstall all packages even if they are already up-to-date.

pip install -I <package>
pip install --ignore-installed <package>

Ignore the installed packages (reinstalling instead).


回答 1

您可能希望拥有所有三个选项:--upgrade--force-reinstall确保重新安装,同时--no-deps避免重新安装依赖项。

$ sudo pip install --upgrade --no-deps --force-reinstall <packagename>

否则,您可能会遇到pip开始重新编译Numpy或其他大型软件包的问题。

You might want to have all three options: --upgrade and --force-reinstall ensures reinstallation, while --no-deps avoids reinstalling dependencies.

$ sudo pip install --upgrade --no-deps --force-reinstall <packagename>

Otherwise you might run into the problem that pip starts to recompile Numpy or other large packages.


回答 2

如果要重新安装Requirements.txt文件中指定的软件包而不进行升级,那么只需重新安装Requirements.txt文件中指定的特定版本:

pip install -r requirements.txt --ignore-installed

If you want to reinstall packages specified in a requirements.txt file, without upgrading, so just reinstall the specific versions specified in the requirements.txt file:

pip install -r requirements.txt --ignore-installed

回答 3

--force-reinstall

似乎没有强制使用python2.7和pip-1.5重新安装

我不得不用

--no-deps --ignore-installed
--force-reinstall

doesn’t appear to force reinstall using python2.7 with pip-1.5

I’ve had to use

--no-deps --ignore-installed

回答 4

如果您的文本文件包含大量软件包,则需要添加-r标志

pip install --upgrade --no-deps --force-reinstall -r requirements.txt

If you have a text file with loads of packages you need to add the -r flag

pip install --upgrade --no-deps --force-reinstall -r requirements.txt

回答 5

如果您需要强制重新安装pip本身,则可以执行以下操作:

python -m pip install --upgrade --force-reinstall pip

In the case you need to force the reinstallation of pip itself you can do:

python -m pip install --upgrade --force-reinstall pip

回答 6

sudo pip3 install --upgrade --force-reinstall --no-deps --no-cache-dir <package-name>==<package-version>

一些相关的答案:

点安装选项“忽略安装”和“强制重新安装”之间的区别

sudo pip3 install --upgrade --force-reinstall --no-deps --no-cache-dir <package-name>==<package-version>

Some relevant answers:

Difference between pip install options “ignore-installed” and “force-reinstall”