Tag archive: Python

What is the purpose and use of **kwargs?

Question: What is the purpose and use of **kwargs?

What are the uses for **kwargs in Python?

I know you can do an objects.filter on a table and pass in a **kwargs argument.  

Can I also do this for specifying time deltas i.e. timedelta(hours = time1)?

How exactly does it work? Is it classed as ‘unpacking’, like a, b = 1, 2?


Answer 0

You can use **kwargs to let your functions take an arbitrary number of keyword arguments (“kwargs” means “keyword arguments”):

>>> def print_keyword_args(**kwargs):
...     # kwargs is a dict of the keyword args passed to the function
...     for key, value in kwargs.items():
...         print("%s = %s" % (key, value))
... 
>>> print_keyword_args(first_name="John", last_name="Doe")
first_name = John
last_name = Doe

You can also use the **kwargs syntax when calling functions, by constructing a dictionary of keyword arguments and unpacking it into the call:

>>> kwargs = {'first_name': 'Bobby', 'last_name': 'Smith'}
>>> print_keyword_args(**kwargs)
first_name = Bobby
last_name = Smith

The Python Tutorial contains a good explanation of how it works, along with some nice examples.

Update: the code above is written for Python 3. On Python 2, use kwargs.iteritems() instead of kwargs.items().
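On the timedelta part of the question: timedelta accepts keyword arguments such as hours, so the same packing and unpacking applies to it. The following is only a small sketch; the helper name make_delta and the value of time1 are made up for illustration.

```python
from datetime import timedelta

def make_delta(**kwargs):
    # Forward whatever keyword arguments were given straight to timedelta.
    return timedelta(**kwargs)

time1 = 3  # illustrative value
delta_a = timedelta(hours=time1)           # ordinary keyword argument
delta_b = make_delta(hours=time1)          # collected into kwargs, then unpacked
delta_c = timedelta(**{"hours": time1})    # dict unpacked at the call site

print(delta_a == delta_b == delta_c)
```

All three calls build the same timedelta; the only difference is where the name hours is spelled out.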


Answer 1

Unpacking dictionaries

** unpacks dictionaries.

This

func(a=1, b=2, c=3)

is the same as

args = {'a': 1, 'b': 2, 'c':3}
func(**args)

It’s useful if you have to construct parameters:

args = {'name': person.name}
if hasattr(person, "address"):
    args["address"] = person.address
func(**args)  # either expanded to func(name=person.name) or
              #                    func(name=person.name, address=person.address)

Packing parameters of a function

def setstyle(**styles):
    for key, value in styles.items():      # styles is a regular dictionary (use iteritems() on Python 2)
        setattr(someobject, key, value)

This lets you use the function like this:

setstyle(color="red", bold=False)
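The "construct the parameters first" pattern can be tried out with a self-contained sketch. The Person class and the describe function below are hypothetical stand-ins for person and func in the snippets above.

```python
class Person:
    """Hypothetical record type; 'address' is only set when it is known."""
    def __init__(self, name, address=None):
        self.name = name
        if address is not None:
            self.address = address

def describe(name, address="unknown"):
    # Stand-in for func(): it only formats its keyword arguments.
    return "%s lives at %s" % (name, address)

def describe_person(person):
    args = {"name": person.name}
    if hasattr(person, "address"):
        args["address"] = person.address
    # Expands to describe(name=...) or describe(name=..., address=...).
    return describe(**args)

print(describe_person(Person("Ann")))
print(describe_person(Person("Bob", "1 Main St")))
```

When the address is missing, the key is simply left out of the dictionary and the default of describe applies.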

Answer 2

kwargs is just a dictionary that is added to the parameters.

A dictionary contains key-value pairs, and those are the kwargs. OK, that is the how.

The what-for is not so simple.

For example (very hypothetical) you have an interface that just calls other routines to do the job:

def myDo(what, where, why):
   if what == 'swim':
      doSwim(where, why)
   elif what == 'walk':
      doWalk(where, why)
   ...

Now you get a new method “drive”:

elif what == 'drive':
   doDrive(where, why, vehicle)

But wait a minute, there is a new parameter “vehicle” — you did not know it before. Now you must add it to the signature of the myDo-function.

Here you can bring kwargs into play: you just add **kwargs to the signature:

def myDo(what, where, why, **kwargs):
   if what == 'drive':
      doDrive(where, why, **kwargs)
   elif what == 'swim':
      doSwim(where, why, **kwargs)

This way you don’t need to change the signature of your interface function every time some of your called routines might change.

This is just one example of where you might find kwargs helpful.
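A runnable sketch of the same idea, using a lookup table instead of the if/elif chain. The routine names and the vehicle default are invented for the example.

```python
def do_swim(where, why):
    return "swim in %s because %s" % (where, why)

def do_drive(where, why, vehicle="car"):
    return "drive a %s to %s because %s" % (vehicle, where, why)

ROUTINES = {"swim": do_swim, "drive": do_drive}

def my_do(what, where, why, **kwargs):
    # Extra keyword arguments are passed through untouched, so my_do
    # does not need to know which options each routine supports.
    return ROUTINES[what](where, why, **kwargs)

print(my_do("swim", "the lake", "it is hot"))
print(my_do("drive", "work", "it is far", vehicle="van"))
```

The pass-through is not a free-for-all: my_do("swim", ..., vehicle="van") still raises a TypeError, because do_swim itself does not accept vehicle.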


Answer 3

On the basis that a good sample is sometimes better than a long discourse I will write two functions using all python variable argument passing facilities (both positional and named arguments). You should easily be able to see what it does by yourself:

def f(a = 0, *args, **kwargs):
    print("Received by f(a, *args, **kwargs)")
    print("=> f(a=%s, args=%s, kwargs=%s)" % (a, args, kwargs))
    print("Calling g(10, 11, 12, *args, d = 13, e = 14, **kwargs)")
    g(10, 11, 12, *args, d = 13, e = 14, **kwargs)

def g(f, g = 0, *args, **kwargs):
    print("Received by g(f, g = 0, *args, **kwargs)")
    print("=> g(f=%s, g=%s, args=%s, kwargs=%s)" % (f, g, args, kwargs))

print("Calling f(1, 2, 3, 4, b = 5, c = 6)")
f(1, 2, 3, 4, b = 5, c = 6)

And here is the output (on Python 3.7+, where dicts keep insertion order):

Calling f(1, 2, 3, 4, b = 5, c = 6)
Received by f(a, *args, **kwargs)
=> f(a=1, args=(2, 3, 4), kwargs={'b': 5, 'c': 6})
Calling g(10, 11, 12, *args, d = 13, e = 14, **kwargs)
Received by g(f, g = 0, *args, **kwargs)
=> g(f=10, g=11, args=(12, 2, 3, 4), kwargs={'d': 13, 'e': 14, 'b': 5, 'c': 6})

Answer 4

Motive: *args and **kwargs serve as placeholders for the arguments that need to be passed to a function call.

Using *args and **kwargs to call a function

def args_kwargs_test(arg1, arg2, arg3):
    print("arg1:", arg1)
    print("arg2:", arg2)
    print("arg3:", arg3)

Now we’ll use *args to call the above defined function

#args can either be a "list" or "tuple"
>>> args = ("two", 3, 5)  
>>> args_kwargs_test(*args)

result:

arg1: two
arg2: 3
arg3: 5


Now, using **kwargs to call the same function

#keyword argument "kwargs" has to be a dictionary
>>> kwargs = {"arg3":3, "arg2":'two', "arg1":5}
>>> args_kwargs_test(**kwargs)

result:

arg1: 5
arg2: two
arg3: 3

Bottom line: *args has no intelligence; it simply maps the passed values onto the parameters in left-to-right order, while **kwargs matches each value to the parameter with the same name.


Answer 5

  • kwargs in **kwargs is just a variable name. You can just as well write **anyVariableName.
  • kwargs stands for “keyword arguments”. I feel they would be better called “named arguments”, since they are simply arguments passed along with names. (I don’t see much significance in the word “keyword” here: a keyword usually means a word reserved by the programming language that cannot be used as a variable name, and nothing like that is happening with kwargs.) So we give the names param1 and param2 to two values passed to the function, as in func(param1="val1", param2="val2"), instead of passing only the values, as in func(val1, val2). I therefore think **kwargs is best described as an “arbitrary number of named arguments”, because we can pass any number of them if func has the signature func(**kwargs).

That said, let me explain “named arguments” first and then the “arbitrary number of named arguments”, kwargs.

Named arguments

  • named args should follow positional args
  • order of named args is not important
  • Example

    def function1(param1,param2="arg2",param3="arg3"):
        print("\n"+str(param1)+" "+str(param2)+" "+str(param3)+"\n")
    
    function1(1)                      #1 arg2 arg3   #1 positional arg
    function1(param1=1)               #1 arg2 arg3   #1 named arg
    function1(1,param2=2)             #1 2 arg3      #1 positional arg, 1 named arg
    function1(param1=1,param2=2)      #1 2 arg3      #2 named args       
    function1(param2=2, param1=1)     #1 2 arg3      #2 named args out of order
    function1(1, param3=3, param2=2)  #1 2 3         #
    
    #function1()                      #invalid: required argument missing
    #function1(param2=2,1)            #invalid: SyntaxError: positional argument follows keyword argument
    #function1(1,param1=11)           #invalid: TypeError: function1() got multiple values for argument 'param1'
    #function1(param4=4)              #invalid: TypeError: function1() got an unexpected keyword argument 'param4'
    

Arbitrary number of named arguments kwargs

  • Sequence of function parameters:
    1. positional parameters
    2. formal parameter capturing arbitrary number of arguments (prefixed with *)
    3. named formal parameters
    4. formal parameter capturing arbitrary number of named parameters (prefixed with **)
  • Example

    def function2(param1, *tupleParams, param2, param3, **dictionaryParams):
        print("param1: "+ param1)
        print("param2: "+ param2)
        print("param3: "+ param3)
        print("custom tuple params","-"*10)
        for p in tupleParams:
            print(str(p) + ",")
        print("custom named params","-"*10)
        for k,v in dictionaryParams.items():
            print(str(k)+":"+str(v))
    
    function2("arg1",
              "custom param1",
              "custom param2",
              "custom param3",
              param3="arg3",
              param2="arg2", 
              customNamedParam1 = "val1",
              customNamedParam2 = "val2"
              )
    
    # Output
    #
    #param1: arg1
    #param2: arg2
    #param3: arg3
    #custom tuple params ----------
    #custom param1,
    #custom param2,
    #custom param3,
    #custom named params ----------
    #customNamedParam1:val1
    #customNamedParam2:val2
    

Passing tuple and dict variables for custom args

To finish it up, let me also note that we can pass

  • “formal parameter capturing arbitrary number of arguments” as tuple variable and
  • “formal parameter capturing arbitrary number of named parameters” as dict variable

Thus the same above call can be made as follows:

tupleCustomArgs = ("custom param1", "custom param2", "custom param3")
dictCustomNamedArgs = {"customNamedParam1":"val1", "customNamedParam2":"val2"}

function2("arg1",
      *tupleCustomArgs,    #note *
      param3="arg3",
      param2="arg2", 
      **dictCustomNamedArgs     #note **
      )

Finally, note the * and ** in the function calls above. If we omit them, we get unexpected results.

Omitting * in tuple args:

function2("arg1",
      tupleCustomArgs,   #omitting *
      param3="arg3",
      param2="arg2", 
      **dictCustomNamedArgs
      )

prints

param1: arg1
param2: arg2
param3: arg3
custom tuple params ----------
('custom param1', 'custom param2', 'custom param3'),
custom named params ----------
customNamedParam1:val1
customNamedParam2:val2

Above tuple ('custom param1', 'custom param2', 'custom param3') is printed as is.

Omitting dict args:

function2("arg1",
      *tupleCustomArgs,   
      param3="arg3",
      param2="arg2", 
      dictCustomNamedArgs   #omitting **
      )

gives

dictCustomNamedArgs
         ^
SyntaxError: positional argument follows keyword argument
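The effect of leaving out ** can be reproduced in a few lines. This is a sketch with a hypothetical takes_anything function; compile() is used only so that the syntax error can be caught and printed instead of aborting the script.

```python
def takes_anything(*args, **kwargs):
    return args, kwargs

options = {"x": 1}

# Without **, the dict is one ordinary positional argument.
print(takes_anything(options))
# With **, its items become keyword arguments.
print(takes_anything(**options))

# A positional argument placed after a keyword argument is rejected
# when the code is compiled, before anything runs.
try:
    compile("takes_anything(a=1, options)", "<example>", "eval")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```

The first call puts the whole dictionary into args; the second one leaves args empty and fills kwargs.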

Answer 6

As an addition, you can also mix different ways of usage when calling kwargs functions:

def test(**kwargs):
    print(kwargs['a'])
    print(kwargs['b'])
    print(kwargs['c'])


args = { 'b': 2, 'c': 3}

test( a=1, **args )

gives this output:

1
2
3

Note that in a function definition **kwargs has to be the last parameter. In a call, Python 2 also requires the ** unpacking to come last.
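One thing to watch when mixing explicit keyword arguments with an unpacked dictionary is a name that appears in both. A small sketch (the function returns its kwargs instead of printing them, purely to make the result easy to inspect):

```python
def test(**kwargs):
    return kwargs

args = {"b": 2, "c": 3}

# Explicit keyword plus unpacked dict: the two sources are merged.
print(test(a=1, **args))

# The same name supplied twice is an error at call time.
try:
    test(b=0, **args)
except TypeError as exc:
    print("TypeError:", exc)
```

If an explicit value should win over the dictionary, merge the dictionaries yourself first, for example test(**{**args, "b": 0}).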


Answer 7

**kwargs is syntactic sugar for receiving named arguments as a dictionary (in a function definition), or for passing a dictionary as named arguments (in a function call).


Answer 8

Here’s a simple function that serves to explain the usage:

def print_wrap(arg1, *args, **kwargs):
    print(arg1)
    print(args)
    print(kwargs)
    print(arg1, *args, **kwargs)

Any arguments that are not specified in the function definition will be put in the args tuple or the kwargs dict, depending on whether they are passed positionally or by keyword:

>>> print_wrap('one', 'two', 'three', end='blah', sep='--')
one
('two', 'three')
{'end': 'blah', 'sep': '--'}
one--two--threeblah

If you pass a keyword argument that the wrapped function (here print) does not accept, an error will be raised:

>>> print_wrap('blah', dead_arg='anything')
TypeError: 'dead_arg' is an invalid keyword argument for this function

Answer 9

Here is an example that I hope is helpful:

#! /usr/bin/env python
#
def g( **kwargs) :
  print ( "In g ready to print kwargs" )
  print ( kwargs )
  print ( "in g, calling f")
  f ( **kwargs )
  print ( "In g, after returning from f")

def f( **kwargs ) :
  print ( "in f, printing kwargs")
  print ( kwargs )
  print ( "In f, after printing kwargs")


g( a="red", b=5, c="Nassau")

g( q="purple", w="W", c="Charlie", d=[4, 3, 6] )

When you run the program, you get:

$ python kwargs_demo.py 
In g ready to print kwargs
{'a': 'red', 'c': 'Nassau', 'b': 5}
in g, calling f
in f, printing kwargs
{'a': 'red', 'c': 'Nassau', 'b': 5}
In f, after printing kwargs
In g, after returning from f
In g ready to print kwargs
{'q': 'purple', 'c': 'Charlie', 'd': [4, 3, 6], 'w': 'W'}
in g, calling f
in f, printing kwargs
{'q': 'purple', 'c': 'Charlie', 'd': [4, 3, 6], 'w': 'W'}
In f, after printing kwargs
In g, after returning from f

The key takeaway here is that the variable number of named arguments in the call translates into a dictionary in the function.


Answer 10

Here is a simple example for understanding Python unpacking:

>>> def f(*args, **kwargs):
...     print('args', args, 'kwargs', kwargs)
...

Example:

>>> f(1, 2)
args (1, 2) kwargs {}   # positional arguments are collected into the args tuple
>>> f(a = 1, b = 2)
args () kwargs {'a': 1, 'b': 2}   # args is an empty tuple; named arguments are collected into the kwargs dict

Answer 11

In Java, you overload constructors so that a class can be created from different sets of input parameters. In Python, you can use keyword arguments with default values to provide similar behavior.

Java example: https://beginnersbook.com/2013/05/constructor-overloading/

Python example:

class Robot():
    # name is an arg and color is a kwarg
    def __init__(self,name, color='red'):
        self.name = name
        self.color = color

red_robot = Robot('Bob')
blue_robot = Robot('Bob', color='blue')

print("I am a {color} robot named {name}.".format(color=red_robot.color, name=red_robot.name))
print("I am a {color} robot named {name}.".format(color=blue_robot.color, name=blue_robot.name))

I am a red robot named Bob.
I am a blue robot named Bob.

Just another way to think about it.


Answer 12

Keyword Arguments are often shortened to kwargs in Python. In computer programming,

keyword arguments refer to a computer language’s support for function calls that clearly state the name of each parameter within the function call.

The two asterisks before the parameter name, as in **kwargs, are used when one doesn’t know how many keyword arguments will be passed into the function. When that’s the case, they are called arbitrary (or wildcard) keyword arguments.

One example of this is Django’s receiver functions.

def my_callback(sender, **kwargs):
    print("Request finished!")

Notice that the function takes a sender argument, along with wildcard keyword arguments (**kwargs); all signal handlers must take these arguments. All signals send keyword arguments, and may change those keyword arguments at any time. In the case of request_finished, it’s documented as sending no arguments, which means we might be tempted to write our signal handling as my_callback(sender).

This would be wrong – in fact, Django will throw an error if you do so. That’s because at any point arguments could get added to the signal and your receiver must be able to handle those new arguments.

Note that it doesn’t have to be called kwargs, but it needs to have ** (the name kwargs is a convention).
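Why a receiver without **kwargs breaks can be shown without Django at all. The send function below is a hypothetical stand-in for a signal dispatcher, not Django's API; it only mimics the idea that a signal may carry keyword arguments the receiver did not anticipate.

```python
def send(receivers, sender, **named):
    # Hypothetical dispatcher: every receiver gets the sender plus
    # whatever keyword arguments the signal happens to carry.
    return [receiver(sender, **named) for receiver in receivers]

def strict_callback(sender):
    return "strict saw %s" % sender

def tolerant_callback(sender, **kwargs):
    return "tolerant saw %s with %s" % (sender, sorted(kwargs))

print(send([tolerant_callback], "app", environ={}))

try:
    send([strict_callback], "app", environ={})
except TypeError as exc:
    print("strict receiver failed:", exc)
```

The tolerant receiver keeps working when a new keyword argument is added to the signal; the strict one fails as soon as that happens.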


Usage of __slots__?

Question: Usage of __slots__?

What is the purpose of __slots__ in Python — especially with respect to when I would want to use it, and when not?


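Answer 0

In Python, what is the purpose of __slots__, and in which cases should one avoid it?

TLDR:

The special attribute __slots__ allows you to explicitly state which instance attributes you expect your object instances to have, with the expected results:

  1. faster attribute access.
  2. space savings in memory.

The space savings come from

  1. Storing value references in slots instead of __dict__.
  2. Denying __dict__ and __weakref__ creation if parent classes deny them and you declare __slots__.

Quick Caveats

A small caveat: you should only declare a particular slot one time in an inheritance tree. For example: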

class Base:
    __slots__ = 'foo', 'bar'

class Right(Base):
    __slots__ = 'baz', 

class Wrong(Base):
    __slots__ = 'foo', 'bar', 'baz'        # redundant foo and bar

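Python doesn't object when you get this wrong (it probably should). Problems might not otherwise manifest, but your objects will take up more space than they otherwise should. Python 3.8: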

>>> from sys import getsizeof
>>> getsizeof(Right()), getsizeof(Wrong())
(56, 72)

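This is because Base's slot descriptor refers to a slot that is separate from the one used by Wrong's descriptor. This shouldn't usually come up, but it could: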

>>> w = Wrong()
>>> w.foo = 'foo'
>>> Base.foo.__get__(w)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: foo
>>> Wrong.foo.__get__(w)
'foo'
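A redundant declaration can be detected by walking the MRO. The helper below, redeclared_slots, is not part of any library; it is a sketch that assumes CPython and only looks at names listed directly in each class's __slots__.

```python
from sys import getsizeof

class Base:
    __slots__ = ('foo', 'bar')

class Right(Base):
    __slots__ = ('baz',)

class Wrong(Base):
    __slots__ = ('foo', 'bar', 'baz')   # foo and bar are declared a second time

def redeclared_slots(cls):
    """Return slot names that are declared more than once along cls's MRO."""
    seen, duplicates = set(), []
    for klass in reversed(cls.__mro__):
        slots = klass.__dict__.get('__slots__', ())
        if isinstance(slots, str):      # a bare string declares a single slot
            slots = (slots,)
        for name in slots:
            if name in seen:
                duplicates.append(name)
            seen.add(name)
    return duplicates

for cls in (Right, Wrong):
    print(cls.__name__, redeclared_slots(cls), getsizeof(cls()))
```

The exact sizes depend on the interpreter version, but on CPython the Wrong instance is the larger of the two because it carries two unused slots.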

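The biggest caveat is for multiple inheritance: multiple “parent classes with nonempty slots” cannot be combined.

To accommodate this restriction, follow best practices: factor out an abstraction for all but one (or all) of the parents, which their concrete classes and your new concrete class will both inherit from, and give the abstraction(s) empty slots (just like the abstract base classes in the standard library).

See the section on multiple inheritance below for an example.

Requirements:

  • To have attributes named in __slots__ actually stored in slots instead of a __dict__, a class must inherit from object.

  • To prevent the creation of a __dict__, you must inherit from object, all classes in the inheritance chain must declare __slots__, and none of them can have a '__dict__' entry.

There are a lot of details if you wish to keep reading.

Why use __slots__: Faster attribute access.

The creator of Python, Guido van Rossum, states that he actually created __slots__ for faster attribute access.

It is trivial to demonstrate measurably faster access: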

import timeit

class Foo(object): __slots__ = 'foo',

class Bar(object): pass

slotted = Foo()
not_slotted = Bar()

def get_set_delete_fn(obj):
    def get_set_delete():
        obj.foo = 'foo'
        obj.foo
        del obj.foo
    return get_set_delete

>>> min(timeit.repeat(get_set_delete_fn(slotted)))
0.2846834529991611
>>> min(timeit.repeat(get_set_delete_fn(not_slotted)))
0.3664822799983085

The slotted access is almost 30% faster in Python 3.5 on Ubuntu.

>>> 0.3664822799983085 / 0.2846834529991611
1.2873325658284342

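In Python 2 on Windows I have measured it about 15% faster.

Why use __slots__: Memory Savings

Another purpose of __slots__ is to reduce the space in memory that each object instance takes up.

My own contribution to the documentation clearly states the reasons behind this:

The space saved over using __dict__ can be significant.

SQLAlchemy attributes a lot of memory savings to __slots__.

To verify this, using the Anaconda distribution of Python 2.7 on Ubuntu Linux, with guppy.hpy (aka heapy) and sys.getsizeof, the size of a class instance without __slots__ declared, and nothing else, is 64 bytes. That does not include the __dict__. Thanks to Python's lazy evaluation again, the __dict__ is apparently not called into existence until it is referenced, but classes without data are usually useless. When called into existence, the __dict__ attribute is a minimum of 280 bytes additionally.

In contrast, a class instance with __slots__ declared to be () (no data) is only 16 bytes, 56 bytes in total with one item in slots, and 64 with two.

For 64-bit Python, I illustrate the memory consumption in bytes in Python 2.7 and 3.6, for __slots__ and __dict__ (no slots defined), at each point where the dict grows in 3.6 (except for 0, 1, and 2 attributes):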

       Python 2.7             Python 3.6
attrs  __slots__  __dict__*   __slots__  __dict__* | *(no slots defined)
none   16         56 + 272    16         56 + 112  | if __dict__ referenced
one    48         56 + 272    48         56 + 112
two    56         56 + 272    56         56 + 112
six    88         56 + 1040   88         56 + 152
11     128        56 + 1040   128        56 + 240
22     216        56 + 3344   216        56 + 408     
43     384        56 + 3344   384        56 + 752

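So, in spite of smaller dicts in Python 3, we see how nicely __slots__ scales for instances to save us memory, and that is a major reason you would want to use __slots__.

Just for completeness of my notes, note that there is a one-time cost per slot in the class's namespace of 64 bytes in Python 2, and 72 bytes in Python 3, because slots use data descriptors like properties, called “members”.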

>>> Foo.foo
<member 'foo' of 'Foo' objects>
>>> type(Foo.foo)
<class 'member_descriptor'>
>>> getsizeof(Foo.foo)
72
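The per-instance saving can be checked on whatever interpreter is at hand. This is a rough sketch assuming CPython; sys.getsizeof is shallow, so the size of the instance __dict__ is added by hand, and the class names are only illustrative.

```python
from sys import getsizeof

class Slotted:
    __slots__ = ('a', 'b')
    def __init__(self):
        self.a, self.b = 1, 2

class Plain:
    def __init__(self):
        self.a, self.b = 1, 2

slotted, plain = Slotted(), Plain()

slotted_size = getsizeof(slotted)
# The plain instance keeps its attributes in a separate dict object.
plain_size = getsizeof(plain) + getsizeof(plain.__dict__)

print("slotted:", slotted_size, "bytes")
print("plain:  ", plain_size, "bytes (instance + __dict__)")
```

The absolute numbers differ between Python versions, so they are not reproduced here; the relation between the two is what matters.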

Demonstration of __slots__:

To deny the creation of a __dict__, you must subclass object:

class Base(object): 
    __slots__ = ()

Now:

>>> b = Base()
>>> b.a = 'a'
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    b.a = 'a'
AttributeError: 'Base' object has no attribute 'a'

Or subclass another class that defines __slots__:

class Child(Base):
    __slots__ = ('a',)

And now:

c = Child()
c.a = 'a'

But:

>>> c.b = 'b'
Traceback (most recent call last):
  File "<pyshell#42>", line 1, in <module>
    c.b = 'b'
AttributeError: 'Child' object has no attribute 'b'

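To allow __dict__ creation while subclassing slotted objects, just add '__dict__' to the __slots__ (note that slots are ordered, and you shouldn't repeat slots that are already in parent classes):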

class SlottedWithDict(Child): 
    __slots__ = ('__dict__', 'b')

swd = SlottedWithDict()
swd.a = 'a'
swd.b = 'b'
swd.c = 'c'

>>> swd.__dict__
{'c': 'c'}

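Or you don't even need to declare __slots__ in your subclass; you will still use the slots from the parents, but the creation of a __dict__ is not restricted: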

class NoSlots(Child): pass
ns = NoSlots()
ns.a = 'a'
ns.b = 'b'

And:

>>> ns.__dict__
{'b': 'b'}

However, __slots__ may cause problems for multiple inheritance:

class BaseA(object): 
    __slots__ = ('a',)

class BaseB(object): 
    __slots__ = ('b',)

Because creating a child class from two parents that both have non-empty slots fails:

>>> class Child(BaseA, BaseB): __slots__ = ()
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    class Child(BaseA, BaseB): __slots__ = ()
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict

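If you run into this problem, you could just remove __slots__ from the parents, or, if you have control of the parents, give them empty slots or refactor to abstractions: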

from abc import ABC

class AbstractA(ABC):
    __slots__ = ()

class BaseA(AbstractA): 
    __slots__ = ('a',)

class AbstractB(ABC):
    __slots__ = ()

class BaseB(AbstractB): 
    __slots__ = ('b',)

class Child(AbstractA, AbstractB): 
    __slots__ = ('a', 'b')

c = Child() # no problem!

Add '__dict__' to __slots__ to get dynamic assignment:

class Foo(object):
    __slots__ = 'bar', 'baz', '__dict__'

And now:

>>> foo = Foo()
>>> foo.boink = 'boink'

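So with '__dict__' in slots we lose some of the size benefits, with the upside of having dynamic assignment and still having slots for the names we do expect.

When you inherit from an object that isn't slotted, you get the same sort of semantics when you use __slots__: names that are in __slots__ point to slotted values, while any other values are put in the instance's __dict__.

Avoiding __slots__ because you want to be able to add attributes on the fly is actually not a good reason: just add "__dict__" to your __slots__ if this is required.

You can similarly add __weakref__ to __slots__ explicitly if you need that feature.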
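A short sketch of the __weakref__ case, with two made-up classes: one that lists '__weakref__' in its slots and one that does not.

```python
import weakref

class NoWeakref:
    __slots__ = ('value',)

class WithWeakref:
    __slots__ = ('value', '__weakref__')

target = WithWeakref()
ref = weakref.ref(target)       # works: the class has a __weakref__ slot
print(ref() is target)

try:
    weakref.ref(NoWeakref())    # no __weakref__ slot, so no weak references
except TypeError as exc:
    print("TypeError:", exc)
```

Anything that relies on weak references to instances, such as weakref.WeakValueDictionary, needs the slot to be present.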

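Set to an empty tuple when subclassing a namedtuple:

The namedtuple builtin makes immutable instances that are very lightweight (essentially, the size of tuples), but to keep those benefits when you subclass them, you need to declare __slots__ yourself: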

from collections import namedtuple
class MyNT(namedtuple('MyNT', 'bar baz')):
    """MyNT is an immutable and lightweight object"""
    __slots__ = ()

Usage:

>>> nt = MyNT('bar', 'baz')
>>> nt.bar
'bar'
>>> nt.baz
'baz'

And trying to assign an unexpected attribute raises an AttributeError because we have prevented the creation of __dict__:

>>> nt.quux = 'quux'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyNT' object has no attribute 'quux'

You can allow __dict__ creation by leaving off __slots__ = (), but you can't use non-empty __slots__ with subtypes of tuple.
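Both halves of that statement can be seen in a small sketch. The class names WithDict, WithoutDict and Broken are made up for the example.

```python
from collections import namedtuple

class WithDict(namedtuple('WithDict', 'bar baz')):
    pass                          # no __slots__, so instances get a __dict__

class WithoutDict(namedtuple('WithoutDict', 'bar baz')):
    __slots__ = ()

a = WithDict('bar', 'baz')
a.quux = 'quux'                   # allowed, stored in a.__dict__
print(a.__dict__)

b = WithoutDict('bar', 'baz')
print(hasattr(b, '__dict__'))

try:
    class Broken(namedtuple('Broken', 'bar baz')):
        __slots__ = ('quux',)     # non-empty slots on a tuple subtype
except TypeError as exc:
    print("TypeError:", exc)
```

Leaving off __slots__ quietly brings back the per-instance __dict__, which is the cost the empty tuple is there to avoid.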

Biggest Caveat: Multiple inheritance

Even when non-empty slots are the same for multiple parents, they cannot be used together:

class Foo(object): 
    __slots__ = 'foo', 'bar'
class Bar(object):
    __slots__ = 'foo', 'bar' # alas, would work if empty, i.e. ()

>>> class Baz(Foo, Bar): pass
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict

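Using an empty __slots__ in the parent seems to provide the most flexibility, allowing the child to choose to prevent or allow (by adding '__dict__' to get dynamic assignment, see the section above) the creation of a __dict__: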

class Foo(object): __slots__ = ()
class Bar(object): __slots__ = ()
class Baz(Foo, Bar): __slots__ = ('foo', 'bar')
b = Baz()
b.foo, b.bar = 'foo', 'bar'

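You don't have to have slots, so if you add them and remove them later, it shouldn't cause any problems.

Going out on a limb here: if you're composing mixins or using abstract base classes, which aren't intended to be instantiated, an empty __slots__ in those parents seems to be the best way to go in terms of flexibility for subclassers.

To demonstrate, first, let's create a class with code we'd like to use under multiple inheritance: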

class AbstractBase:
    __slots__ = ()
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __repr__(self):
        return f'{type(self).__name__}({repr(self.a)}, {repr(self.b)})'

We could use the above directly by inheriting and declaring the expected slots:

class Foo(AbstractBase):
    __slots__ = 'a', 'b'

But we don't care about that; it's trivial single inheritance. We need another class we might also inherit from, maybe with a noisy attribute:

class AbstractBaseC:
    __slots__ = ()
    @property
    def c(self):
        print('getting c!')
        return self._c
    @c.setter
    def c(self, arg):
        print('setting c!')
        self._c = arg

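Now, if both bases had nonempty slots, we couldn't do the below. (In fact, if we wanted, we could have given AbstractBase the nonempty slots a and b and left them out of the declaration below; leaving them in would be wrong):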

class Concretion(AbstractBase, AbstractBaseC):
    __slots__ = 'a b _c'.split()

And now we have functionality from both via multiple inheritance, and can still deny __dict__ and __weakref__ instantiation:

>>> c = Concretion('a', 'b')
>>> c.c = c
setting c!
>>> c.c
getting c!
Concretion('a', 'b')
>>> c.d = 'd'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Concretion' object has no attribute 'd'

其他避免插槽的情况:

  • __class__除非插槽布局相同,否则要在不具有它们的另一个类(并且不能添加它们)上执行分配时,请避免使用它们。(我对了解谁在做什么以及为什么这样做很感兴趣。)
  • 如果您想将诸如long,tuple或str之类的可变长度内建子类化,并想为其添加属性,请避免使用它们。
  • 如果您坚持通过实例变量的类属性提供默认值,请避免使用它们。

您也许可以从__slots__ 文档的其余部分(最新的3.7 dev文档)中找出更多的警告,我最近做出了很大的贡献。

对其他答案的批评

当前的最佳答案引用了过时的信息,而且非常容易波动,并且在某些重要方面未达到要求。

不要“仅__slots__在实例化许多对象时使用”

我引用:

__slots__如果要实例化大量(数百个,数千个)同一类的对象,则需要使用。”

例如,来自collections模块的抽象基类未实例化,但__slots__已为其声明。

为什么?

如果用户希望拒绝__dict____weakref__创建,则这些内容在父类中必须不可用。

__slots__ 创建接口或混入时有助于重用。

的确,许多Python用户并不是为可重用性而编写的,但是当您这样做时,可以选择拒绝不必要的空间使用是很有价值的。

__slots__ 不会破坏酸洗

腌制开槽的物体时,您可能会发现它带有误导性的抱怨TypeError

>>> pickle.loads(pickle.dumps(f))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled

这实际上是不正确的。此消息来自最早的协议,这是默认协议。您可以使用-1参数选择最新的协议。在Python 2.7中为2(在2.3中引入),在3.6中为4

>>> pickle.loads(pickle.dumps(f, -1))
<__main__.Foo object at 0x1129C770>

在Python 2.7中:

>>> pickle.loads(pickle.dumps(f, 2))
<__main__.Foo object at 0x1129C770>

在Python 3.6中

>>> pickle.loads(pickle.dumps(f, 4))
<__main__.Foo object at 0x1129C770>

所以我会牢记这一点,因为这是一个已解决的问题。

评论(至2016年10月2日)被接受

第一段是一半简短的解释,一半是预测的。这是真正回答问题的唯一部分

正确的用法__slots__是节省对象空间。静态结构不允许创建后添加任何内容,而不是具有允许随时向对象添加属性的动态命令。这样可以为使用插槽的每个对象节省一个指令的开销

后半部分是一厢情愿的想法,并且超出了预期:

尽管有时这是一个有用的优化,但是如果Python解释器足够动态,则仅在实际向对象添加内容时才需要dict,就完全没有必要了。

Python实际上做了类似的事情,只在__dict__访问时创建,但是创建许多没有数据的对象是相当荒谬的。

第二段过分简化,错过了避免的实际原因__slots__。以下不是避免使用插槽的真正原因(出于实际原因,请参阅上面我的回答的其余部分。):

它们以一种可被控制怪胎和静态类型临时表滥用的方式更改具有插槽的对象的行为。

然后,它继续讨论了使用Python实现该有害目标的其他方法,而不是讨论与之相关的任何方法__slots__

第三段是更多的如意算盘。答案者甚至根本没有写过这些杂乱的内容,而是为该网站的批评者弹药。

内存使用证据

创建一些普通对象和带槽对象:

>>> class Foo(object): pass
>>> class Bar(object): __slots__ = ()

实例化其中的一百万:

>>> foos = [Foo() for f in xrange(1000000)]
>>> bars = [Bar() for b in xrange(1000000)]

检查guppy.hpy().heap()

>>> guppy.hpy().heap()
Partition of a set of 2028259 objects. Total size = 99763360 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 1000000  49 64000000  64  64000000  64 __main__.Foo
     1     169   0 16281480  16  80281480  80 list
     2 1000000  49 16000000  16  96281480  97 __main__.Bar
     3   12284   1   987472   1  97268952  97 str
...

访问常规对象及其对象,__dict__然后再次检查:

>>> for f in foos:
...     f.__dict__
>>> guppy.hpy().heap()
Partition of a set of 3028258 objects. Total size = 379763480 bytes.
 Index  Count   %      Size    % Cumulative  % Kind (class / dict of class)
     0 1000000  33 280000000  74 280000000  74 dict of __main__.Foo
     1 1000000  33  64000000  17 344000000  91 __main__.Foo
     2     169   0  16281480   4 360281480  95 list
     3 1000000  33  16000000   4 376281480  99 __main__.Bar
     4   12284   0    987472   0 377268952  99 str
...

这与Python历史一致,来自Python 2.2中的Unifying类型和类。

如果您将内置类型作为子类,则多余的空间会自动添加到实例中以容纳__dict____weakrefs__。(__dict__尽管直到使用完,它才会被初始化,因此您不必担心空字典为您创建的每个实例所占用的空间。)如果不需要此多余的空间,则可以在短语中添加“ __slots__ = []”你的班。

In Python, what is the purpose of __slots__ and what are the cases one should avoid this?

TLDR:

The special attribute __slots__ allows you to explicitly state which instance attributes you expect your object instances to have, with the expected results:

  1. faster attribute access.
  2. space savings in memory.

The space savings is from

  1. Storing value references in slots instead of __dict__.
  2. Denying __dict__ and __weakref__ creation if parent classes deny them and you declare __slots__.

Quick Caveats

Small caveat, you should only declare a particular slot one time in an inheritance tree. For example:

class Base:
    __slots__ = 'foo', 'bar'

class Right(Base):
    __slots__ = 'baz', 

class Wrong(Base):
    __slots__ = 'foo', 'bar', 'baz'        # redundant foo and bar

Python doesn’t object when you get this wrong (it probably should), and the problem may not otherwise show up, but your objects will take up more space than they should. In Python 3.8:

>>> from sys import getsizeof
>>> getsizeof(Right()), getsizeof(Wrong())
(56, 72)

This is because Base’s slot descriptor uses a slot separate from Wrong’s. This shouldn’t usually come up, but it could:

>>> w = Wrong()
>>> w.foo = 'foo'
>>> Base.foo.__get__(w)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: foo
>>> Wrong.foo.__get__(w)
'foo'

The biggest caveat is for multiple inheritance – multiple “parent classes with nonempty slots” cannot be combined.

To accommodate this restriction, follow best practices: factor out an abstraction for every parent, or for all but one, that both the parent’s concrete class and your new concrete class will inherit from, and give the abstraction(s) empty slots (just like abstract base classes in the standard library).

See section on multiple inheritance below for an example.

Requirements:

  • For attributes named in __slots__ to actually be stored in slots instead of a __dict__, a class must inherit from object (in Python 3, every class does).

  • To prevent the creation of a __dict__, you must inherit from object and all classes in the inheritance must declare __slots__ and none of them can have a '__dict__' entry.

There are a lot of details if you wish to keep reading.

Why use __slots__: Faster attribute access.

The creator of Python, Guido van Rossum, states that he actually created __slots__ for faster attribute access.

It is trivial to demonstrate measurably significant faster access:

import timeit

class Foo(object): __slots__ = 'foo',

class Bar(object): pass

slotted = Foo()
not_slotted = Bar()

def get_set_delete_fn(obj):
    def get_set_delete():
        obj.foo = 'foo'
        obj.foo
        del obj.foo
    return get_set_delete

and

>>> min(timeit.repeat(get_set_delete_fn(slotted)))
0.2846834529991611
>>> min(timeit.repeat(get_set_delete_fn(not_slotted)))
0.3664822799983085

The slotted access is almost 30% faster in Python 3.5 on Ubuntu.

>>> 0.3664822799983085 / 0.2846834529991611
1.2873325658284342

In Python 2 on Windows I have measured it about 15% faster.

Why use __slots__: Memory Savings

Another purpose of __slots__ is to reduce the space in memory that each object instance takes up.

My own contribution to the documentation clearly states the reasons behind this:

The space saved over using __dict__ can be significant.

SQLAlchemy attributes a lot of memory savings to __slots__.

To verify this, using the Anaconda distribution of Python 2.7 on Ubuntu Linux, with guppy.hpy (aka heapy) and sys.getsizeof, the size of a class instance without __slots__ declared, and nothing else, is 64 bytes. That does not include the __dict__. Thank you Python for lazy evaluation again, the __dict__ is apparently not called into existence until it is referenced, but classes without data are usually useless. When called into existence, the __dict__ attribute is a minimum of 280 bytes additionally.

In contrast, a class instance with __slots__ declared to be () (no data) is only 16 bytes, and 56 total bytes with one item in slots, 64 with two.

For 64 bit Python, I illustrate the memory consumption in bytes in Python 2.7 and 3.6, for __slots__ and __dict__ (no slots defined) for each point where the dict grows in 3.6 (except for 0, 1, and 2 attributes):

       Python 2.7             Python 3.6
attrs  __slots__  __dict__*   __slots__  __dict__* | *(no slots defined)
none   16         56 + 272†   16         56 + 112† | †if __dict__ referenced
one    48         56 + 272    48         56 + 112
two    56         56 + 272    56         56 + 112
six    88         56 + 1040   88         56 + 152
11     128        56 + 1040   128        56 + 240
22     216        56 + 3344   216        56 + 408     
43     384        56 + 3344   384        56 + 752

So, in spite of smaller dicts in Python 3, we see how nicely __slots__ scale for instances to save us memory, and that is a major reason you would want to use __slots__.
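The figures in the table depend on the interpreter version and platform. As a minimal sketch for taking the same kind of measurement yourself, assuming CPython 3 and two hypothetical classes (WithSlots and WithDict are illustration names, not from the answer), you can compare sys.getsizeof results directly:

```python
import sys

class WithSlots:
    """Hypothetical class storing two attributes in slots."""
    __slots__ = ('a', 'b')

    def __init__(self, a, b):
        self.a = a
        self.b = b

class WithDict:
    """Same attributes, but stored in a per-instance __dict__."""
    def __init__(self, a, b):
        self.a = a
        self.b = b

slotted = WithSlots(1, 2)
plain = WithDict(1, 2)

slotted_total = sys.getsizeof(slotted)
# getsizeof is shallow, so the instance __dict__ must be added by hand.
plain_total = sys.getsizeof(plain) + sys.getsizeof(plain.__dict__)

print('slotted:', slotted_total, 'bytes')
print('with __dict__:', plain_total, 'bytes')
```

The exact byte counts will differ from the table above on other versions; only the direction of the difference is expected to hold.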

Just for completeness of my notes, note that there is a one-time cost per slot in the class’s namespace of 64 bytes in Python 2, and 72 bytes in Python 3, because slots use data descriptors like properties, called “members”.

>>> Foo.foo
<member 'foo' of 'Foo' objects>
>>> type(Foo.foo)
<class 'member_descriptor'>
>>> getsizeof(Foo.foo)
72

Demonstration of __slots__:

To deny the creation of a __dict__, you must subclass object:

class Base(object): 
    __slots__ = ()

now:

>>> b = Base()
>>> b.a = 'a'
Traceback (most recent call last):
  File "<pyshell#38>", line 1, in <module>
    b.a = 'a'
AttributeError: 'Base' object has no attribute 'a'

Or subclass another class that defines __slots__

class Child(Base):
    __slots__ = ('a',)

and now:

c = Child()
c.a = 'a'

but:

>>> c.b = 'b'
Traceback (most recent call last):
  File "<pyshell#42>", line 1, in <module>
    c.b = 'b'
AttributeError: 'Child' object has no attribute 'b'

To allow __dict__ creation while subclassing slotted objects, just add '__dict__' to the __slots__ (note that slots are ordered, and you shouldn’t repeat slots that are already in parent classes):

class SlottedWithDict(Child): 
    __slots__ = ('__dict__', 'b')

swd = SlottedWithDict()
swd.a = 'a'
swd.b = 'b'
swd.c = 'c'

and

>>> swd.__dict__
{'c': 'c'}

Or you don’t even need to declare __slots__ in your subclass, and you will still use slots from the parents, but not restrict the creation of a __dict__:

class NoSlots(Child): pass
ns = NoSlots()
ns.a = 'a'
ns.b = 'b'

And:

>>> ns.__dict__
{'b': 'b'}

However, __slots__ may cause problems for multiple inheritance:

class BaseA(object): 
    __slots__ = ('a',)

class BaseB(object): 
    __slots__ = ('b',)

Because creating a child class from parents with both non-empty slots fails:

>>> class Child(BaseA, BaseB): __slots__ = ()
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    class Child(BaseA, BaseB): __slots__ = ()
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict

If you run into this problem, you could just remove __slots__ from the parents, or if you have control of the parents, give them empty slots, or refactor to abstractions:

from abc import ABC

class AbstractA(ABC):
    __slots__ = ()

class BaseA(AbstractA): 
    __slots__ = ('a',)

class AbstractB(ABC):
    __slots__ = ()

class BaseB(AbstractB): 
    __slots__ = ('b',)

class Child(AbstractA, AbstractB): 
    __slots__ = ('a', 'b')

c = Child() # no problem!

Add '__dict__' to __slots__ to get dynamic assignment:

class Foo(object):
    __slots__ = 'bar', 'baz', '__dict__'

and now:

>>> foo = Foo()
>>> foo.boink = 'boink'

So with '__dict__' in slots we lose some of the size benefits with the upside of having dynamic assignment and still having slots for the names we do expect.

When you inherit from an object that isn’t slotted, you get the same sort of semantics when you use __slots__ – names that are in __slots__ point to slotted values, while any other values are put in the instance’s __dict__.
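A minimal sketch of that mixed behaviour, assuming Python 3 and two hypothetical classes (Plain and Mixed are illustration names only):

```python
class Plain:
    """Not slotted, so its instances carry a __dict__."""
    pass

class Mixed(Plain):
    """Declares a slot, but inherits the __dict__ from Plain."""
    __slots__ = ('a',)

m = Mixed()
m.a = 'stored in a slot'
m.b = 'stored in __dict__'

# Only the name that is not in __slots__ shows up in the instance dict.
print(m.__dict__)
# The slotted name is served by a descriptor on the class.
print(type(Mixed.__dict__['a']).__name__)
```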

Avoiding __slots__ because you want to be able to add attributes on the fly is actually not a good reason – just add "__dict__" to your __slots__ if this is required.

You can similarly add __weakref__ to __slots__ explicitly if you need that feature.
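As a short sketch of the __weakref__ case, assuming Python 3 (the class names NoWeakref and WithWeakref are made up for illustration): a slotted class without a '__weakref__' slot cannot be weakly referenced, while one that declares it can.

```python
import weakref

class NoWeakref:
    __slots__ = ('value',)

class WithWeakref:
    __slots__ = ('value', '__weakref__')

try:
    weakref.ref(NoWeakref())
    weakref_failed = False
except TypeError:
    # No __weakref__ slot and no __dict__: weak references are rejected.
    weakref_failed = True

target = WithWeakref()
ref = weakref.ref(target)   # works because '__weakref__' is declared

print(weakref_failed, ref() is target)
```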

Set to empty tuple when subclassing a namedtuple:

The namedtuple factory from collections makes immutable instances that are very lightweight (essentially the size of tuples), but to keep that benefit in a subclass you need to declare empty __slots__ yourself:

from collections import namedtuple
class MyNT(namedtuple('MyNT', 'bar baz')):
    """MyNT is an immutable and lightweight object"""
    __slots__ = ()

usage:

>>> nt = MyNT('bar', 'baz')
>>> nt.bar
'bar'
>>> nt.baz
'baz'

And trying to assign an unexpected attribute raises an AttributeError because we have prevented the creation of __dict__:

>>> nt.quux = 'quux'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'MyNT' object has no attribute 'quux'

You can allow __dict__ creation by leaving off __slots__ = (), but you can’t use non-empty __slots__ with subtypes of tuple.

Biggest Caveat: Multiple inheritance

Even when non-empty slots are the same for multiple parents, they cannot be used together:

class Foo(object): 
    __slots__ = 'foo', 'bar'
class Bar(object):
    __slots__ = 'foo', 'bar' # alas, would work if empty, i.e. ()

>>> class Baz(Foo, Bar): pass
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: Error when calling the metaclass bases
    multiple bases have instance lay-out conflict

Using an empty __slots__ in the parent seems to provide the most flexibility, allowing the child to choose to prevent or allow (by adding '__dict__' to get dynamic assignment, see section above) the creation of a __dict__:

class Foo(object): __slots__ = ()
class Bar(object): __slots__ = ()
class Baz(Foo, Bar): __slots__ = ('foo', 'bar')
b = Baz()
b.foo, b.bar = 'foo', 'bar'

You don’t have to have slots – so if you add them, and remove them later, it shouldn’t cause any problems.

Going out on a limb here: If you’re composing mixins or using abstract base classes, which aren’t intended to be instantiated, an empty __slots__ in those parents seems to be the best way to go in terms of flexibility for subclassers.

To demonstrate, first, let’s create a class with code we’d like to use under multiple inheritance

class AbstractBase:
    __slots__ = ()
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def __repr__(self):
        return f'{type(self).__name__}({repr(self.a)}, {repr(self.b)})'

We could use the above directly by inheriting and declaring the expected slots:

class Foo(AbstractBase):
    __slots__ = 'a', 'b'

But we don’t care about that; it’s trivial single inheritance. We need another class we might also inherit from, maybe with a noisy attribute:

class AbstractBaseC:
    __slots__ = ()
    @property
    def c(self):
        print('getting c!')
        return self._c
    @c.setter
    def c(self, arg):
        print('setting c!')
        self._c = arg

Now if both bases had nonempty slots, we couldn’t do the below. (In fact, if we wanted, we could have given AbstractBase nonempty slots a and b, and left them out of the below declaration – leaving them in would be wrong):

class Concretion(AbstractBase, AbstractBaseC):
    __slots__ = 'a b _c'.split()

And now we have functionality from both via multiple inheritance, and can still deny __dict__ and __weakref__ instantiation:

>>> c = Concretion('a', 'b')
>>> c.c = c
setting c!
>>> c.c
getting c!
Concretion('a', 'b')
>>> c.d = 'd'
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Concretion' object has no attribute 'd'

Other cases to avoid slots:

  • Avoid them when you want to perform __class__ assignment with another class that doesn’t have them (and you can’t add them) unless the slot layouts are identical. (I am very interested in learning who is doing this and why.)
  • Avoid them if you want to subclass variable-length builtins like long (int in Python 3), tuple, or str, and you want to add attributes to them.
  • Avoid them if you insist on providing default values via class attributes for instance variables.
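Two of the cases above can be reproduced in a few lines. This is a sketch assuming CPython 3, with hypothetical class names (Defaulted, TaggedTuple, DefaultedOk); the errors are raised when the class is created, not when it is used:

```python
# A class-level default collides with the slot descriptor of the same name.
try:
    class Defaulted:
        __slots__ = ('x',)
        x = 0
except ValueError as exc:
    default_error = str(exc)

# Non-empty slots are rejected on a subclass of a variable-length builtin.
try:
    class TaggedTuple(tuple):
        __slots__ = ('tag',)
except TypeError as exc:
    tuple_error = str(exc)

# One possible workaround for defaults: assign them in __init__ instead.
class DefaultedOk:
    __slots__ = ('x',)

    def __init__(self, x=0):
        self.x = x

print(default_error)
print(tuple_error)
print(DefaultedOk().x)
```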

You may be able to tease out further caveats from the rest of the __slots__ documentation (the 3.7 dev docs are the most current), which I have made significant recent contributions to.

Critiques of other answers

The current top answers cite outdated information and are quite hand-wavy and miss the mark in some important ways.

Do not “only use __slots__ when instantiating lots of objects”

I quote:

“You would want to use __slots__ if you are going to instantiate a lot (hundreds, thousands) of objects of the same class.”

Abstract Base Classes, for example, from the collections module, are not instantiated, yet __slots__ are declared for them.

Why?

If a user wishes to deny __dict__ or __weakref__ creation, those things must not be available in the parent classes.

__slots__ contributes to reusability when creating interfaces or mixins.

It is true that many Python users aren’t writing for reusability, but when you are, having the option to deny unnecessary space usage is valuable.
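To make the point about the standard library concrete, here is a small sketch assuming Python 3. Bag is a hypothetical class invented for this example; the ABCs come from collections.abc. Because each ABC declares empty __slots__, a subclass combining several of them can still deny __dict__ creation:

```python
from collections.abc import Container, Iterable, Sized

class Bag(Sized, Iterable, Container):
    """Hypothetical container combining three standard-library ABCs."""
    __slots__ = ('_items',)

    def __init__(self, items):
        self._items = list(items)

    def __len__(self):
        return len(self._items)

    def __iter__(self):
        return iter(self._items)

    def __contains__(self, item):
        return item in self._items

bag = Bag([1, 2, 3])
print(Sized.__slots__)            # the ABC itself declares empty slots
print(hasattr(bag, '__dict__'))   # whether a per-instance dict exists
```

If any one of the parents had omitted __slots__, Bag instances would have received a __dict__ regardless of what Bag declared.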

__slots__ doesn’t break pickling

When pickling a slotted object, you may find it complains with a misleading TypeError:

>>> pickle.loads(pickle.dumps(f))
TypeError: a class that defines __slots__ without defining __getstate__ cannot be pickled

This is actually incorrect. This message comes from the oldest protocol, which is the default in Python 2. You can select the latest protocol with the -1 argument. In Python 2.7 this would be 2 (which was introduced in 2.3), and in 3.6 it is 4.

>>> pickle.loads(pickle.dumps(f, -1))
<__main__.Foo object at 0x1129C770>

in Python 2.7:

>>> pickle.loads(pickle.dumps(f, 2))
<__main__.Foo object at 0x1129C770>

in Python 3.6

>>> pickle.loads(pickle.dumps(f, 4))
<__main__.Foo object at 0x1129C770>

So I would keep this in mind, as it is a solved problem.

Critique of the (until Oct 2, 2016) accepted answer

The first paragraph is half short explanation, half predictive. Here’s the only part that actually answers the question:

The proper use of __slots__ is to save space in objects. Instead of having a dynamic dict that allows adding attributes to objects at anytime, there is a static structure which does not allow additions after creation. This saves the overhead of one dict for every object that uses slots

The second half is wishful thinking, and off the mark:

While this is sometimes a useful optimization, it would be completely unnecessary if the Python interpreter was dynamic enough so that it would only require the dict when there actually were additions to the object.

Python actually does something similar to this, only creating the __dict__ when it is accessed, but creating lots of objects with no data is fairly ridiculous.

The second paragraph oversimplifies and misses actual reasons to avoid __slots__. The below is not a real reason to avoid slots (for actual reasons, see the rest of my answer above.):

They change the behavior of the objects that have slots in a way that can be abused by control freaks and static typing weenies.

It then goes on to discuss other ways of accomplishing that perverse goal with Python, not discussing anything to do with __slots__.

The third paragraph is more wishful thinking. Together it is mostly off-the-mark content that the answerer didn’t even author and contributes to ammunition for critics of the site.

Memory usage evidence

Create some normal objects and slotted objects:

>>> class Foo(object): pass
>>> class Bar(object): __slots__ = ()

Instantiate a million of them:

>>> foos = [Foo() for f in xrange(1000000)]
>>> bars = [Bar() for b in xrange(1000000)]

Inspect with guppy.hpy().heap():

>>> guppy.hpy().heap()
Partition of a set of 2028259 objects. Total size = 99763360 bytes.
 Index  Count   %     Size   % Cumulative  % Kind (class / dict of class)
     0 1000000  49 64000000  64  64000000  64 __main__.Foo
     1     169   0 16281480  16  80281480  80 list
     2 1000000  49 16000000  16  96281480  97 __main__.Bar
     3   12284   1   987472   1  97268952  97 str
...

Access the regular objects and their __dict__ and inspect again:

>>> for f in foos:
...     f.__dict__
>>> guppy.hpy().heap()
Partition of a set of 3028258 objects. Total size = 379763480 bytes.
 Index  Count   %      Size    % Cumulative  % Kind (class / dict of class)
     0 1000000  33 280000000  74 280000000  74 dict of __main__.Foo
     1 1000000  33  64000000  17 344000000  91 __main__.Foo
     2     169   0  16281480   4 360281480  95 list
     3 1000000  33  16000000   4 376281480  99 __main__.Bar
     4   12284   0    987472   0 377268952  99 str
...

This is consistent with the history of Python, from Unifying types and classes in Python 2.2

If you subclass a built-in type, extra space is automatically added to the instances to accomodate __dict__ and __weakrefs__. (The __dict__ is not initialized until you use it though, so you shouldn’t worry about the space occupied by an empty dictionary for each instance you create.) If you don’t need this extra space, you can add the phrase “__slots__ = []” to your class.


回答 1

引用雅各布·哈伦Jacob Hallen)的话

正确的用法__slots__是节省对象空间。静态结构不允许创建后添加任何内容,而不是具有允许随时向对象添加属性的动态命令。[这种使用__slots__消除了每个对象一个字典的开销。]尽管这有时是有用的优化,但是如果Python解释器足够动态,以至仅在实际添加了dict时才需要该字典,则完全没有必要。宾语。

不幸的是,插槽有副作用。它们以一种可被控制怪胎和静态类型临时表滥用的方式更改具有插槽的对象的行为。这是不好的,因为控制怪胎应该滥用元类,而静态类型之间应该滥用装饰器,因为在Python中,应该只有一种明显的方法。

使CPython足够智能来处理节省的空间__slots__是一项艰巨的任务,这可能就是为什么它不在P3k更改列表中的原因(至今)。

Quoting Jacob Hallen:

The proper use of __slots__ is to save space in objects. Instead of having a dynamic dict that allows adding attributes to objects at anytime, there is a static structure which does not allow additions after creation. [This use of __slots__ eliminates the overhead of one dict for every object.] While this is sometimes a useful optimization, it would be completely unnecessary if the Python interpreter was dynamic enough so that it would only require the dict when there actually were additions to the object.

Unfortunately there is a side effect to slots. They change the behavior of the objects that have slots in a way that can be abused by control freaks and static typing weenies. This is bad, because the control freaks should be abusing the metaclasses and the static typing weenies should be abusing decorators, since in Python, there should be only one obvious way of doing something.

Making CPython smart enough to handle saving space without __slots__ is a major undertaking, which is probably why it is not on the list of changes for P3k (yet).


回答 2

你会想使用__slots__,如果你要实例化同一个类的对象很多(几百,几千)。__slots__仅作为内存优化工具存在。

不建议将其__slots__用于约束属性创建。

__slots__使用默认的(最旧的)泡菜协议将不能使用酸洗对象。有必要指定一个更高的版本。

python的其他一些自省功能也可能受到不利影响。

You would want to use __slots__ if you are going to instantiate a lot (hundreds, thousands) of objects of the same class. __slots__ only exists as a memory optimization tool.

It’s highly discouraged to use __slots__ for constraining attribute creation.

Pickling objects with __slots__ won’t work with the default (oldest) pickle protocol; it’s necessary to specify a later version.

Some other introspection features of python may also be adversely affected.


回答 3

每个python对象都有一个__dict__属性,该属性是包含所有其他属性的字典。例如,当您键入self.attrpython时实际上正在执行self.__dict__['attr']。您可以想象使用字典存储属性会花费一些额外的空间和时间来访问它。

但是,当您使用 __slots__,为该类创建的任何对象将没有__dict__属性。相反,所有属性访问都直接通过指针进行。

因此,如果要使用C样式结构而不是完整的类,则可以使用它__slots__来压缩对象的大小并减少属性访问时间。一个很好的例子是一个包含属性x&y的Point类。如果您有很多要点,可以尝试使用__slots__以节省一些内存。

Each Python object has a __dict__ attribute, which is a dictionary containing all of its other attributes. For example, when you type self.attr, Python is in effect doing self.__dict__['attr']. As you can imagine, using a dictionary to store attributes takes some extra space and time to access.

However, when you use __slots__, any object created for that class won’t have a __dict__ attribute. Instead, all attribute access is done directly via pointers.

So if you want a C-style structure rather than a full-fledged class, you can use __slots__ to compact the size of the objects and reduce attribute access time. A good example is a Point class containing attributes x and y. If you are going to have a lot of points, you can try using __slots__ in order to conserve some memory.
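The Point class mentioned above might look like the following. This is only a sketch under the assumption of Python 3; the constructor signature is a guess at what the author had in mind:

```python
class Point:
    """Struct-like class: exactly two attributes, no per-instance dict."""
    __slots__ = ('x', 'y')

    def __init__(self, x, y):
        self.x = x
        self.y = y

# Many small objects is the situation where the saving adds up.
points = [Point(i, i * 2) for i in range(1000)]

try:
    points[0].z = 0          # not declared in __slots__
    z_allowed = True
except AttributeError:
    z_allowed = False

print(points[10].x, points[10].y, z_allowed)
```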


回答 4

除了其他答案,这是使用的示例__slots__

>>> class Test(object):   #Must be new-style class!
...  __slots__ = ['x', 'y']
... 
>>> pt = Test()
>>> dir(pt)
['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__', 
 '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', 
 '__repr__', '__setattr__', '__slots__', '__str__', 'x', 'y']
>>> pt.x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: x
>>> pt.x = 1
>>> pt.x
1
>>> pt.z = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Test' object has no attribute 'z'
>>> pt.__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Test' object has no attribute '__dict__'
>>> pt.__slots__
['x', 'y']

因此,要实现__slots__,只需要多花一行(如果还没有,请将您的类变成新样式的类)。这样,您可以将这些类的内存占用量减少5倍,而在必要时以及必须编写自定义的pickle代码的代价是。

In addition to the other answers, here is an example of using __slots__:

>>> class Test(object):   #Must be new-style class!
...  __slots__ = ['x', 'y']
... 
>>> pt = Test()
>>> dir(pt)
['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__', 
 '__init__', '__module__', '__new__', '__reduce__', '__reduce_ex__', 
 '__repr__', '__setattr__', '__slots__', '__str__', 'x', 'y']
>>> pt.x
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: x
>>> pt.x = 1
>>> pt.x
1
>>> pt.z = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Test' object has no attribute 'z'
>>> pt.__dict__
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'Test' object has no attribute '__dict__'
>>> pt.__slots__
['x', 'y']

So, to implement __slots__, it only takes an extra line (and making your class a new-style class if it isn’t already). This way you can reduce the memory footprint of those classes 5-fold, at the expense of having to write custom pickle code, if and when that becomes necessary.


回答 5

插槽对于库调用非常有用,以消除进行函数调用时的“命名方法分派”。SWIG 文档中提到了这一点。对于想要减少使用插槽的通常称为函数的函数开销的高性能库,速度要快得多。

现在,这可能与OP问题没有直接关系。它与构建扩展有关,而不是与在对象上使用slot语法有关。但这确实有助于完整了解插槽的使用情况以及它们背后的一些原因。

Slots are very useful for library calls, because they eliminate the “named method dispatch” when making function calls. This is mentioned in the SWIG documentation. For high-performance libraries that want to reduce the overhead of commonly called functions, using slots is much faster.

Now, this may not be directly related to the OP’s question. It has more to do with building extensions than with using the slots syntax on an object. But it does help complete the picture of how slots are used and some of the reasoning behind them.


回答 6

类实例的属性具有3个属性:实例,属性名称和属性值。

常规属性访问中,实例充当字典,而属性名称充当该字典中查找值的键。

instance(attribute)->值

__slots__访问,属性名称充当字典,实例充当字典中查找值的键。

属性(实例)->值

flyweight模式中,属性的名称充当字典,而值充当该字典中查找实例的键。

attribute(value)->实例

An attribute of a class instance has 3 properties: the instance, the name of the attribute, and the value of the attribute.

In regular attribute access, the instance acts as a dictionary and the name of the attribute acts as the key in that dictionary looking up value.

instance(attribute) –> value

In __slots__ access, the name of the attribute acts as the dictionary and the instance acts as the key in the dictionary looking up value.

attribute(instance) –> value

In flyweight pattern, the name of the attribute acts as the dictionary and the value acts as the key in that dictionary looking up the instance.

attribute(value) –> instance
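The first two models above can be made visible in plain Python. The sketch below assumes CPython 3 and uses two hypothetical classes (Regular and Slotted); it does not attempt the flyweight case:

```python
class Regular:
    pass

class Slotted:
    __slots__ = ('x',)

r = Regular()
r.x = 3
s = Slotted()
s.x = 3

# Regular access: the instance's dict is indexed by the attribute name.
regular_value = vars(r)['x']

# Slotted access: the class-level descriptor for the attribute is asked
# for the value belonging to this particular instance.
descriptor = Slotted.__dict__['x']
slotted_value = descriptor.__get__(s, Slotted)

print(regular_value, slotted_value, type(descriptor).__name__)
```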


回答 7

一个非常简单的__slot__属性示例。

问题:无 __slots__

如果__slot__我的Class中没有属性,则可以向对象添加新属性。

class Test:
    pass

obj1=Test()
obj2=Test()

print(obj1.__dict__)  #--> {}
obj1.x=12
print(obj1.__dict__)  # --> {'x': 12}
obj1.y=20
print(obj1.__dict__)  # --> {'x': 12, 'y': 20}

obj2.x=99
print(obj2.__dict__)  # --> {'x': 99}

如果您看上面的示例,可以看到obj1obj2都有自己的xy属性,而python还dict为每个对象创建了一个属性(obj1obj2)。

假设我的类Test是否有成千上万个此类对象?dict为每个对象创建一个附加属性将导致我的代码中大量开销(内存,计算能力等)。

解决方案: __slots__

现在,在下面的示例中,我的类Test包含__slots__属性。现在,我无法向对象添加新属性(attribute除外x),而python不再创建dict属性。这消除了每个对象的开销,如果您有很多对象,开销可能会变得很大。

class Test:
    __slots__ = ("x",)  # trailing comma makes this a one-element tuple

obj1 = Test()
obj2 = Test()
obj1.x = 12
print(obj1.x)  # --> 12
obj2.x = 99
print(obj2.x)  # --> 99

obj1.y = 28  # --> AttributeError: 'Test' object has no attribute 'y'

A very simple example of the __slots__ attribute.

Problem: Without __slots__

If I don’t have a __slots__ attribute in my class, I can add new attributes to my objects.

class Test:
    pass

obj1=Test()
obj2=Test()

print(obj1.__dict__)  #--> {}
obj1.x=12
print(obj1.__dict__)  # --> {'x': 12}
obj1.y=20
print(obj1.__dict__)  # --> {'x': 12, 'y': 20}

obj2.x=99
print(obj2.__dict__)  # --> {'x': 99}

If you look at the example above, you can see that obj1 and obj2 each have their own attributes (x and y on obj1, x on obj2), and Python has also created a __dict__ attribute for each object.

Suppose my class Test has thousands of such objects. Creating an additional __dict__ attribute for each object will cause a lot of overhead (memory, computing power, etc.) in my code.

Solution: With __slots__

Now, in the following example, my class Test contains a __slots__ attribute. I can no longer add new attributes to my objects (except attribute x), and Python doesn’t create a __dict__ attribute anymore. This eliminates overhead for each object, which can become significant if you have many objects.

class Test:
    __slots__ = ("x",)  # trailing comma makes this a one-element tuple

obj1 = Test()
obj2 = Test()
obj1.x = 12
print(obj1.x)  # --> 12
obj2.x = 99
print(obj2.x)  # --> 99

obj1.y = 28  # --> AttributeError: 'Test' object has no attribute 'y'

回答 8

另一个晦涩的用法__slots__是从ProxyTypes包(以前是PEAK项目的一部分)中向对象代理添加属性。它ObjectWrapper允许您代理另一个对象,但拦截与代理对象的所有交互。它不是很常用(并且没有Python 3支持),但是我们已经使用它来实现基于龙卷风的异步实现周围的线程安全的阻塞包装,该龙卷风使用线程安全来反弹通过ioloop对代理对象的所有访问concurrent.Future对象以同步并返回结果。

默认情况下,对代理对象的任何属性访问都将为您提供代理对象的结果。如果需要在代理对象上添加属性,__slots__则可以使用。

from peak.util.proxies import ObjectWrapper

class Original(object):
    def __init__(self):
        self.name = 'The Original'

class ProxyOriginal(ObjectWrapper):

    __slots__ = ['proxy_name']

    def __init__(self, subject, proxy_name):
        # proxy_info attribute is added directly to the
        # Original instance, not the ProxyOriginal instance
        self.proxy_info = 'You are proxied by {}'.format(proxy_name)

        # proxy_name added to ProxyOriginal instance, since it is
        # defined in __slots__
        self.proxy_name = proxy_name

        super(ProxyOriginal, self).__init__(subject)

if __name__ == "__main__":
    original = Original()
    proxy = ProxyOriginal(original, 'Proxy Overlord')

    # Both statements print "The Original"
    print "original.name: ", original.name
    print "proxy.name: ", proxy.name

    # Both statements below print 
    # "You are proxied by Proxy Overlord", since the ProxyOriginal
    # __init__ sets it to the original object 
    print "original.proxy_info: ", original.proxy_info
    print "proxy.proxy_info: ", proxy.proxy_info

    # prints "Proxy Overlord"
    print "proxy.proxy_name: ", proxy.proxy_name
    # Raises AttributeError since proxy_name is only set on 
    # the proxy object
    print "original.proxy_name: ", original.proxy_name

Another somewhat obscure use of __slots__ is to add attributes to an object proxy from the ProxyTypes package, formerly part of the PEAK project. Its ObjectWrapper allows you to proxy another object, but intercept all interactions with the proxied object. It is not very commonly used (and no Python 3 support), but we have used it to implement a thread-safe blocking wrapper around an async implementation based on tornado that bounces all access to the proxied object through the ioloop, using thread-safe concurrent.Future objects to synchronise and return results.

By default any attribute access to the proxy object will give you the result from the proxied object. If you need to add an attribute on the proxy object, __slots__ can be used.

from peak.util.proxies import ObjectWrapper

class Original(object):
    def __init__(self):
        self.name = 'The Original'

class ProxyOriginal(ObjectWrapper):

    __slots__ = ['proxy_name']

    def __init__(self, subject, proxy_name):
        # proxy_info attribute is added directly to the
        # Original instance, not the ProxyOriginal instance
        self.proxy_info = 'You are proxied by {}'.format(proxy_name)

        # proxy_name added to ProxyOriginal instance, since it is
        # defined in __slots__
        self.proxy_name = proxy_name

        super(ProxyOriginal, self).__init__(subject)

if __name__ == "__main__":
    original = Original()
    proxy = ProxyOriginal(original, 'Proxy Overlord')

    # Both statements print "The Original"
    print "original.name: ", original.name
    print "proxy.name: ", proxy.name

    # Both statements below print 
    # "You are proxied by Proxy Overlord", since the ProxyOriginal
    # __init__ sets it to the original object 
    print "original.proxy_info: ", original.proxy_info
    print "proxy.proxy_info: ", proxy.proxy_info

    # prints "Proxy Overlord"
    print "proxy.proxy_name: ", proxy.proxy_name
    # Raises AttributeError since proxy_name is only set on 
    # the proxy object
    print "original.proxy_name: ", original.proxy_name

Answer 9

You have — essentially — no use for __slots__.

When you think you might need __slots__, what you actually want is the Lightweight or Flyweight design pattern. These are cases where you no longer want to use pure Python objects. Instead, you want a Python object-like wrapper around an array, struct, or numpy array.

class Flyweight(object):

    def get(self, theData, index):
        return theData[index]

    def set(self, theData, index, value):
        theData[index] = value

The class-like wrapper has no attributes — it just provides methods that act on the underlying data. The methods can be reduced to class methods. Indeed, it could be reduced to just functions operating on the underlying array of data.
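
To make the idea concrete, here is a minimal sketch of how such a wrapper might be used. The array module and the names temperatures and accessor are assumptions chosen for illustration; they are not part of the original answer.

```python
from array import array


class Flyweight(object):
    """Stateless wrapper: it stores nothing and only knows how to read and write the data."""

    def get(self, the_data, index):
        return the_data[index]

    def set(self, the_data, index, value):
        the_data[index] = value


# One compact typed array holds every value (hypothetical sample data).
temperatures = array("d", [20.5, 21.0, 19.8])

accessor = Flyweight()
accessor.set(temperatures, 1, 22.5)
second_reading = accessor.get(temperatures, 1)
print(second_reading)  # 22.5
```

A single accessor instance serves any number of arrays, which is why the methods could just as well be plain functions.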


Answer 10

The original question was about general use cases, not only about memory. So it should be mentioned here that you also get better performance when instantiating large numbers of objects, which matters, for example, when parsing large documents into objects or loading them from a database.

Here is a comparison of creating object trees with a million entries, with and without slots. For reference, the timings for trees built from plain dicts are included as well (Python 2.7.10 on OS X):

********** RUN 1 **********
1.96036410332 <class 'css_tree_select.element.Element'>
3.02922606468 <class 'css_tree_select.element.ElementNoSlots'>
2.90828204155 dict
********** RUN 2 **********
1.77050495148 <class 'css_tree_select.element.Element'>
3.10655999184 <class 'css_tree_select.element.ElementNoSlots'>
2.84120798111 dict
********** RUN 3 **********
1.84069895744 <class 'css_tree_select.element.Element'>
3.21540498734 <class 'css_tree_select.element.ElementNoSlots'>
2.59615707397 dict
********** RUN 4 **********
1.75041103363 <class 'css_tree_select.element.Element'>
3.17366290092 <class 'css_tree_select.element.ElementNoSlots'>
2.70941114426 dict

Test classes (identical, apart from slots):

class Element(object):
    __slots__ = ['_typ', 'id', 'parent', 'childs']
    def __init__(self, typ, id, parent=None):
        self._typ = typ
        self.id = id
        self.childs = []
        if parent:
            self.parent = parent
            parent.childs.append(self)

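class ElementNoSlots(object):
    # Identical to Element, but without the __slots__ declaration
    def __init__(self, typ, id, parent=None):
        self._typ = typ
        self.id = id
        self.childs = []
        if parent:
            self.parent = parent
            parent.childs.append(self)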

Test code, verbose mode:

import time

na, nb, nc = 100, 100, 100
for i in (1, 2, 3, 4):
    print '*' * 10, 'RUN', i, '*' * 10
    # tree with slot and no slot:
    for cls in Element, ElementNoSlots:
        t1 = time.time()
        root = cls('root', 'root')
        for i in xrange(na):
            ela = cls(typ='a', id=i, parent=root)
            for j in xrange(nb):
                elb = cls(typ='b', id=(i, j), parent=ela)
                for k in xrange(nc):
                    elc = cls(typ='c', id=(i, j, k), parent=elb)
        to =  time.time() - t1
        print to, cls
        del root

    # ref: tree with dicts only:
    t1 = time.time()
    droot = {'childs': []}
    for i in xrange(na):
        ela = {'typ': 'a', 'id': i, 'childs': []}
        droot['childs'].append(ela)
        for j in xrange(nb):
            elb = {'typ': 'b', 'id': (i, j), 'childs': []}
            ela['childs'].append(elb)
            for k in xrange(nc):
                elc = {'typ': 'c', 'id': (i, j, k), 'childs': []}
                elb['childs'].append(elc)
    td = time.time() - t1
    print td, 'dict'
    del droot
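
For readers on Python 3, a much smaller version of the same comparison can be sketched as follows. The class names PointWithSlots and PointWithDict are made up for this example, and the timings depend on the machine and interpreter, so no expected numbers are given.

```python
import timeit


class PointWithSlots:
    __slots__ = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointWithDict:
    def __init__(self, x, y):
        self.x = x
        self.y = y


slotted = PointWithSlots(1, 2)
plain = PointWithDict(1, 2)

# A slotted instance has no per-instance __dict__.
has_dict_slotted = hasattr(slotted, "__dict__")
has_dict_plain = hasattr(plain, "__dict__")
print(has_dict_slotted, has_dict_plain)  # False True

# Instantiation timings; results vary from run to run.
for cls in (PointWithSlots, PointWithDict):
    seconds = timeit.timeit(lambda: cls(1, 2), number=100000)
    print(cls.__name__, seconds)
```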

How do I subtract a day from a date?

Question: How do I subtract a day from a date?

I have a Python datetime.datetime object. What is the best way to subtract one day?


Answer 0

You can use a timedelta object:

from datetime import datetime, timedelta

d = datetime.today() - timedelta(days=days_to_subtract)
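
A reproducible variant of the same idea, as a small sketch: the snippet above leaves days_to_subtract undefined, so a value of 1 is assumed here, and a fixed start date replaces datetime.today() so the result is always the same.

```python
from datetime import datetime, timedelta

days_to_subtract = 1  # assumed value; the answer does not define this name

start = datetime(2024, 3, 1, 9, 30)  # fixed date instead of datetime.today()
d = start - timedelta(days=days_to_subtract)
print(d)  # 2024-02-29 09:30:00
```

Month and leap-year boundaries are handled by the datetime arithmetic itself.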

Answer 1

Subtract datetime.timedelta(days=1)


Answer 2

If your Python datetime object is timezone-aware, then you should be careful to avoid errors around DST transitions (or changes in UTC offset for other reasons):

from datetime import datetime, timedelta
from tzlocal import get_localzone # pip install tzlocal

DAY = timedelta(1)
local_tz = get_localzone()   # get local timezone
now = datetime.now(local_tz) # get timezone-aware datetime object
day_ago = local_tz.normalize(now - DAY) # exactly 24 hours ago, time may differ
naive = now.replace(tzinfo=None) - DAY # same time
yesterday = local_tz.localize(naive, is_dst=None) # but elapsed hours may differ

In general, day_ago and yesterday may differ if UTC offset for the local timezone has changed in the last day.

For example, daylight saving time/summer time ends on Sun 2-Nov-2014 at 02:00:00 A.M. in the America/Los_Angeles timezone, so if:

import pytz # pip install pytz

local_tz = pytz.timezone('America/Los_Angeles')
now = local_tz.localize(datetime(2014, 11, 2, 10), is_dst=None)
# 2014-11-02 10:00:00 PST-0800

then day_ago and yesterday differ:

  • day_ago is exactly 24 hours ago (relative to now) but at 11 am, not at 10 am as now
  • yesterday is yesterday at 10 am but it is 25 hours ago (relative to now), not 24 hours.
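
The 24-hour versus 25-hour difference can be reproduced with the standard library alone. This is only a sketch: the two UTC offsets are hard-coded as an assumption for this one date, whereas real code should take them from a timezone database as in the snippets above.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets, hard-coded for illustration only.
PDT = timezone(timedelta(hours=-7), "PDT")
PST = timezone(timedelta(hours=-8), "PST")

now = datetime(2014, 11, 2, 10, tzinfo=PST)

# Exactly 24 elapsed hours earlier, shown in the offset that applied on that day.
day_ago = (now - timedelta(hours=24)).astimezone(PDT)

# The same wall-clock time on the previous calendar day.
yesterday = datetime(2014, 11, 1, 10, tzinfo=PDT)

print(day_ago.isoformat())    # 2014-11-01T11:00:00-07:00
print(yesterday.isoformat())  # 2014-11-01T10:00:00-07:00
print(now - day_ago)          # 1 day, 0:00:00
print(now - yesterday)        # 1 day, 1:00:00
```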

The pendulum module handles it automatically:

>>> import pendulum  # $ pip install pendulum

>>> now = pendulum.create(2014, 11, 2, 10, tz='America/Los_Angeles')
>>> day_ago = now.subtract(hours=24)  # exactly 24 hours ago
>>> yesterday = now.subtract(days=1)  # yesterday at 10 am but it is 25 hours ago

>>> (now - day_ago).in_hours()
24
>>> (now - yesterday).in_hours()
25

>>> now
<Pendulum [2014-11-02T10:00:00-08:00]>
>>> day_ago
<Pendulum [2014-11-01T11:00:00-07:00]>
>>> yesterday
<Pendulum [2014-11-01T10:00:00-07:00]>

Answer 3

Just to elaborate on an alternative method and a use case where it is helpful:

  • Subtract 1 day from current datetime:
from datetime import datetime, timedelta
print datetime.now() + timedelta(days=-1)  # Here, I am adding a negative timedelta
  • Useful if you want to add 5 days and subtract 5 hours from the current datetime, i.e. what is the datetime 5 days from now, less 5 hours?
from datetime import datetime, timedelta
print datetime.now() + timedelta(days=5, hours=-5)

It can similarly be used with other parameters, e.g. seconds, weeks, etc.
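
Here is the same arithmetic as a Python 3 sketch. A fixed base datetime is assumed instead of datetime.now() so that the printed values are reproducible.

```python
from datetime import datetime, timedelta

base = datetime(2024, 1, 10, 12, 0)  # fixed stand-in for datetime.now()

one_day_back = base + timedelta(days=-1)
print(one_day_back)  # 2024-01-09 12:00:00

# Mixed signs are normalized into one duration: 5 days minus 5 hours.
offset = timedelta(days=5, hours=-5)
print(offset)         # 4 days, 19:00:00
print(base + offset)  # 2024-01-15 07:00:00
```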


Answer 4

Another nice function I like to use when I want to compute, e.g., the first/last day of the last month or other relative timedeltas:

The relativedelta function from dateutil (a powerful extension to the datetime lib)

import datetime as dt
from dateutil.relativedelta import relativedelta
# get the first day of this month and the last day of last month
today = dt.date.today()
first_day_this_month = dt.date(day=1, month=today.month, year=today.year)
last_day_last_month = first_day_this_month - relativedelta(days=1)
print (first_day_this_month, last_day_last_month)

>2015-03-01 2015-02-28

Answer 5

There is also the great arrow module:

import arrow
utc = arrow.utcnow()
utc_yesterday = utc.shift(days=-1)
print(utc, '\n', utc_yesterday)

Output:

2017-04-06T11:17:34.431397+00:00 
 2017-04-05T11:17:34.431397+00:00

Import a module from a relative path

Question: Import a module from a relative path

How do I import a Python module given its relative path?

For example, if dirFoo contains Foo.py and dirBar, and dirBar contains Bar.py, how do I import Bar.py into Foo.py?

Here’s a visual representation:

dirFoo\
    Foo.py
    dirBar\
        Bar.py

Foo wishes to include Bar, but restructuring the folder hierarchy is not an option.


Answer 0

Assuming that both your directories are real Python packages (they have an __init__.py file inside them), here is a safe solution for including modules relative to the location of the script.

I assume that you want to do this because you need to include a set of modules with your script. I use this in production in several products, and it works in many special scenarios, such as scripts called from another directory or executed with python execute instead of opening a new interpreter.

 import os, sys, inspect
 # realpath() will make your script run, even if you symlink it :)
 cmd_folder = os.path.realpath(os.path.abspath(os.path.split(inspect.getfile( inspect.currentframe() ))[0]))
 if cmd_folder not in sys.path:
     sys.path.insert(0, cmd_folder)

 # Use this if you want to include modules from a subfolder
 cmd_subfolder = os.path.realpath(os.path.abspath(os.path.join(os.path.split(inspect.getfile( inspect.currentframe() ))[0],"subfolder")))
 if cmd_subfolder not in sys.path:
     sys.path.insert(0, cmd_subfolder)

 # Info:
 # cmd_folder = os.path.dirname(os.path.abspath(__file__)) # DO NOT USE __file__ !!!
 # __file__ fails if the script is called in different ways on Windows.
 # __file__ fails if someone does os.chdir() before.
 # sys.argv[0] also fails, because it doesn't always contain the path.

As a bonus, this approach does let you force Python to use your module instead of the ones installed on the system.

Warning! I don’t really know what happens when the current module is inside an egg file. It probably fails too.


Answer 1

Be sure that dirBar has the __init__.py file — this makes a directory into a Python package.


Answer 2

You could also add the subdirectory to your Python path so that it imports as a normal script.

import sys
sys.path.insert(0, <path to dirBar>)
import Bar

Answer 3

import os
import sys
lib_path = os.path.abspath(os.path.join(__file__, '..', '..', '..', 'lib'))
sys.path.append(lib_path)

import mymodule

Answer 4

Importing a .py file from a different folder takes only a couple of simple steps.

Let’s say you have a directory like:

lib/abc.py

Then put an empty file in the lib folder, named

__init__.py

And then use

from lib.abc import <Your Module name>

Keep the __init__.py file in every folder of the hierarchy of the import module.
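
The steps above can be tried end to end with a throwaway directory. This is only a sketch: demo_lib and greetings.py are hypothetical names standing in for lib and abc.py, and the files are created in a temporary folder so nothing on disk is assumed.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical layout: <root>/demo_lib/__init__.py and <root>/demo_lib/greetings.py
root = tempfile.mkdtemp()
package_dir = os.path.join(root, "demo_lib")
os.mkdir(package_dir)

# The empty __init__.py marks demo_lib as a regular package.
open(os.path.join(package_dir, "__init__.py"), "w").close()
with open(os.path.join(package_dir, "greetings.py"), "w") as f:
    f.write("def hello():\n    return 'hello from demo_lib'\n")

sys.path.insert(0, root)       # the parent folder of the package must be on sys.path
importlib.invalidate_caches()  # make sure the freshly created files are noticed
from demo_lib.greetings import hello

message = hello()
print(message)  # hello from demo_lib
```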


Answer 5

If you structure your project this way:

src\
  __init__.py
  main.py
  dirFoo\
    __init__.py
    Foo.py
  dirBar\
    __init__.py
    Bar.py

Then from Foo.py you should be able to do:

import dirFoo.Foo

Or:

from dirFoo.Foo import FooObject

Per Tom’s comment, this does require that the src folder is accessible either via site_packages or your search path. Also, as he mentions, __init__.py is implicitly imported when you first import a module in that package/directory. Typically __init__.py is simply an empty file.


Answer 6

The easiest method is to use sys.path.append().

However, you may also be interested in the imp module. It provides access to internal import functions.

import imp

# mod_name is the filename without the .py/.pyc extension
py_mod = imp.load_source(mod_name,filename_path) # Loads .py file
py_mod = imp.load_compiled(mod_name,filename_path) # Loads .pyc file 

This can be used to load modules dynamically when you don’t know a module’s name.

I’ve used this in the past to create a plugin type interface to an application, where the user would write a script with application specific functions, and just drop their script in a specific directory.

Also, these functions may be useful:

imp.find_module(name[, path])
imp.load_module(name, file, pathname, description)
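
Note that imp is deprecated in Python 3 and was removed in Python 3.12. A rough Python 3 counterpart of imp.load_source can be sketched with importlib.util; the plugin file name and its run() function below are hypothetical and are written to a temporary directory purely for the demonstration.

```python
import importlib.util
import os
import tempfile

# Hypothetical plugin file created on the fly.
plugin_dir = tempfile.mkdtemp()
plugin_path = os.path.join(plugin_dir, "sample_plugin.py")
with open(plugin_path, "w") as f:
    f.write("def run():\n    return 'plugin ran'\n")

# Build a module spec from the file path, create the module, then execute it.
spec = importlib.util.spec_from_file_location("sample_plugin", plugin_path)
py_mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(py_mod)

result = py_mod.run()
print(result)  # plugin ran
```

The module is not registered in sys.modules by this sequence, which is usually what a plugin loader wants.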

Answer 7

This is the relevant PEP:

http://www.python.org/dev/peps/pep-0328/

In particular, presuming dirFoo is a directory up from dirBar…

In dirFoo\Foo.py:

from .dirBar import Bar

Answer 8

The easiest way, without any modification to your script, is to set the PYTHONPATH environment variable, because sys.path is initialized from these locations:

  1. The directory containing the input script (or the current directory).
  2. PYTHONPATH (a list of directory names, with the same syntax as the shell variable PATH).
  3. The installation-dependent default.

Just run:

export PYTHONPATH=/absolute/path/to/your/module

Your sys.path will then contain the path above, as shown below:

print sys.path

['', '/absolute/path/to/your/module', '/usr/lib/python2.7', '/usr/lib/python2.7/plat-linux2', '/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old', '/usr/lib/python2.7/lib-dynload', '/usr/local/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages', '/usr/lib/python2.7/dist-packages/PIL', '/usr/lib/python2.7/dist-packages/gst-0.10', '/usr/lib/python2.7/dist-packages/gtk-2.0', '/usr/lib/pymodules/python2.7', '/usr/lib/python2.7/dist-packages/ubuntu-sso-client', '/usr/lib/python2.7/dist-packages/ubuntuone-client', '/usr/lib/python2.7/dist-packages/ubuntuone-control-panel', '/usr/lib/python2.7/dist-packages/ubuntuone-couch', '/usr/lib/python2.7/dist-packages/ubuntuone-installer', '/usr/lib/python2.7/dist-packages/ubuntuone-storage-protocol']
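
The effect can also be checked from Python itself. In this sketch a temporary directory stands in for /absolute/path/to/your/module, and a child interpreter is started with PYTHONPATH set, which is what the export line does for programs launched from the shell.

```python
import os
import subprocess
import sys
import tempfile

extra_dir = tempfile.mkdtemp()  # stand-in for /absolute/path/to/your/module

# Copy the current environment and add PYTHONPATH for the child process only.
env = dict(os.environ, PYTHONPATH=extra_dir)
output = subprocess.check_output(
    [sys.executable, "-c", "import sys; print('\\n'.join(sys.path))"],
    env=env,
)
child_sys_path = output.decode().splitlines()
print(extra_dir in child_sys_path)  # True
```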

Answer 9

In my opinion the best choice is to put __init__.py in the folder and import the file with

from dirBar.Bar import *

Using sys.path.append() is not recommended, because something might go wrong if you use the same file name as an existing Python package. I haven’t tested that, but it would be ambiguous.


Answer 10

The quick-and-dirty way for Linux users

If you are just tinkering around and don’t care about deployment issues, you can use a symbolic link (assuming your filesystem supports it) to make the module or package directly visible in the folder of the requesting module.

ln -s (path)/module_name.py

or

ln -s (path)/package_name

Note: A “module” is any file with a .py extension and a “package” is any folder that contains the file __init__.py (which can be an empty file). From a usage standpoint, modules and packages are identical — both expose their contained “definitions and statements” as requested via the import command.

See: http://docs.python.org/2/tutorial/modules.html


Answer 11

from .dirBar import Bar

instead of:

from dirBar import Bar

just in case another dirBar is installed elsewhere, which could confuse anyone reading Foo.py.


Answer 12

To import Bar.py into Foo.py in this case, I’d first turn these folders into Python packages like so:

dirFoo\
    __init__.py
    Foo.py
    dirBar\
        __init__.py
        Bar.py

Then I would do it like this in Foo.py:

from .dirBar import Bar

if I wanted the namespacing to look like Bar.whatever, or

from . import dirBar

if I wanted the namespacing to look like dirBar.Bar.whatever. This second case is useful if you have more modules under the dirBar package.
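
The first form can be tried out without touching a real project. The sketch below builds the layout in a temporary folder and runs Foo as a module; whatever() is a hypothetical function standing in for the real contents of Bar.py.

```python
import os
import subprocess
import sys
import tempfile

root = tempfile.mkdtemp()
dir_foo = os.path.join(root, "dirFoo")
dir_bar = os.path.join(dir_foo, "dirBar")
os.makedirs(dir_bar)

files = {
    os.path.join(dir_foo, "__init__.py"): "",
    os.path.join(dir_bar, "__init__.py"): "",
    # whatever() is a made-up function for the demonstration.
    os.path.join(dir_bar, "Bar.py"): "def whatever():\n    return 'Bar.whatever called'\n",
    os.path.join(dir_foo, "Foo.py"): "from .dirBar import Bar\nprint(Bar.whatever())\n",
}
for path, source in files.items():
    with open(path, "w") as f:
        f.write(source)

# Relative imports need the package context, so Foo is run with -m, not by file path.
output = subprocess.check_output([sys.executable, "-m", "dirFoo.Foo"], cwd=root)
printed = output.decode().strip()
print(printed)  # Bar.whatever called
```

Running python dirFoo/Foo.py directly would fail instead, because a script started by path has no parent package for the leading dot to refer to.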


Answer 13

Add an __init__.py file:

dirFoo\
    Foo.py
    dirBar\
        __init__.py
        Bar.py

Then add this code to the start of Foo.py:

import sys
sys.path.append('dirBar')
import Bar

Answer 14

Relative sys.path example:

# /lib/my_module.py
# /src/test.py


import os
import sys

if __name__ == '__main__' and __package__ is None:
    sys.path.append(os.path.abspath(os.path.join(os.path.dirname(__file__), '../lib')))
import my_module

Based on this answer.


Answer 15

Well, as you mention, you usually want access to a folder of modules located relative to where your main script runs, so that you can simply import them.

Solution:

My script is D:/Books/MyBooks.py, and some modules (like oldies.py) live in the subdirectory D:/Books/includes, which I need to import from:

import sys,site
site.addsitedir(sys.path[0] + '\\includes')
print (sys.path)  # Just verify it is there
import oldies

Place a print('done') in oldies.py to verify that everything works. This approach always works because, by Python's definition of sys.path as initialized upon program startup, the first item of this list, path[0], is the directory containing the script that was used to invoke the Python interpreter.

If the script directory is not available (e.g. if the interpreter is invoked interactively or if the script is read from standard input), path[0] is the empty string, which directs Python to search modules in the current directory first. Notice that the script directory is inserted before the entries inserted as a result of PYTHONPATH.
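
A portable variant of the same approach can be sketched like this. A temporary directory stands in for D:/Books, oldies_demo is a hypothetical module name, and os.path.join replaces the hard-coded backslash so the code is not tied to Windows.

```python
import os
import site
import tempfile

script_dir = tempfile.mkdtemp()  # stand-in for the folder that holds MyBooks.py
includes_dir = os.path.join(script_dir, "includes")
os.mkdir(includes_dir)

# Hypothetical module placed in the includes folder.
with open(os.path.join(includes_dir, "oldies_demo.py"), "w") as f:
    f.write("TITLE = 'done'\n")

# addsitedir() puts the folder on sys.path (and processes any .pth files in it).
site.addsitedir(includes_dir)
import oldies_demo

title = oldies_demo.TITLE
print(title)  # done
```

In a real script, sys.path[0] would take the place of script_dir, as in the answer.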


Answer 16

You can simply use: from Desktop.filename import something

Example:

Given a file named test.py in the directory Users/user/Desktop, the following imports everything from it.

The code:

from Desktop.test import *

But make sure you create an empty file called “__init__.py” in that directory.


Answer 17

Another solution would be to install the py-require package and then use the following in Foo.py

import require
Bar = require('./dirBar/Bar')

Answer 18

Here’s a way to import a file from one level above, using the relative path.

Basically, just move the working directory up a level (or any relative location), add that to your path, then move the working directory back where it started.

# To import from one level above:
import os
import sys

cwd = os.getcwd()
os.chdir("..")
parent_path = os.getcwd()  # the directory one level above the original working directory
sys.path.append(parent_path)
os.chdir(cwd)
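
The same path can be computed without changing the working directory at all, which avoids side effects if something fails between the two chdir calls. A minimal sketch, where parent_path is a name chosen for this example:

```python
import os
import sys

# Resolve "one level above the current working directory" without calling chdir.
parent_path = os.path.abspath(os.path.join(os.getcwd(), os.pardir))
if parent_path not in sys.path:
    sys.path.append(parent_path)
print(parent_path)
```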

Answer 19

I’m not very experienced with Python, so if anything I say here is wrong, just tell me. Suppose your file hierarchy is arranged like this:

project\
    module_1.py 
    module_2.py

module_1.py defines a function called func_1(), and module_2.py contains:

from module_1 import func_1

def func_2():
    func_1()

if __name__ == '__main__':
    func_2()

If you run python module_2.py from the command line, it runs whatever func_1() defines. That is the usual way to import a file at the same level. But if you write from .module_1 import func_1 in module_2.py, the Python interpreter will say No module named '__main__.module_1'; '__main__' is not a package. To fix this, keep that change, move both modules into a package, and add a third module as the caller that runs module_2.py.

project\
    package_1\
        module_1.py
        module_2.py
    main.py

main.py:

from package_1.module_2 import func_2

def func_3():
    func_2()

if __name__ == '__main__':
    func_3()

The reason for adding a . before module_1 in module_2.py is that without it, running main.py makes the Python interpreter say No module named 'module_1'. That is a little surprising, since module_1.py sits right beside module_2.py. Now let func_1() in module_1.py do something:

def func_1():
    print(__name__)

Here __name__ is the name under which module_1 was imported. Keep the . before module_1 and run main.py: it prints package_1.module_1, not module_1. This shows that module names are resolved from the level of main.py, and the . says that module_1 is at the same level as module_2.py itself. Without the dot, Python looks for module_1 at the same level as main.py; it can find package_1 there, but not what is “under” it.

Now let’s make it a bit more complicated. You have a config.ini, and a module that defines a function to read it, at the same level as main.py.

project\
    package_1\
        module_1.py
        module_2.py
    config.py
    config.ini
    main.py

For some unavoidable reason you have to call it from module_2.py, so module_2.py has to import from the level above. module_2.py:

 from .. import config
 pass

Two dots mean importing from the level above (three dots go one level higher still, and so on). Now run main.py, and the interpreter will say: ValueError: attempted relative import beyond top-level package. The “top-level package” here is determined by main.py: everything is resolved from the directory that holds main.py, so package_1 is the top-level package. config.py sits beside main.py at the same level; it isn’t “under” anything that main.py imports as a package, so it lies beyond the top-level package. The simplest way to fix this is:

project\
    package_1\
        module_1.py
        module_2.py
    config.py
    config.ini
main.py

I think that coincides with the principle of arranging a project’s file hierarchy: put modules with different functions in different folders, leave only a top-level caller outside, and then you can import however you want.


回答 20

这也可行,并且比使用该sys模块的任何事情都要简单得多:

with open("C:/yourpath/foobar.py") as f:
    exec(f.read())  # exec runs statements; eval only accepts a single expression

This also works, and is much simpler than anything with the sys module:

with open("C:/yourpath/foobar.py") as f:
    exec(f.read())  # exec runs statements; eval only accepts a single expression
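
If the goal is to use the names defined in that file as a module, one alternative to reading and executing its text is the standard importlib machinery. This is only a minimal sketch, assuming Python 3.5+; load_module_from_path, foobar and greet() are illustrative names that do not come from the answer above:

```python
import importlib.util
import os
import tempfile


def load_module_from_path(module_name, file_path):
    """Load a .py file as a module object without touching sys.path."""
    spec = importlib.util.spec_from_file_location(module_name, file_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Demo with a throwaway file so the sketch is self-contained.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "foobar.py")
    with open(demo_path, "w") as f:
        f.write("def greet():\n    return 'hello from foobar'\n")
    foobar = load_module_from_path("foobar", demo_path)

print(foobar.greet())
```

Unlike exec(), this keeps the loaded names inside their own module namespace instead of injecting them into the caller's.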

回答 21

称我过于谨慎,但我想让我的便携式计算机更加便携,因为假设文件始终位于每台计算机上的同一位置是不安全的。我个人的代码首先查找文件路径。我使用Linux,所以我的看起来像这样:

import os, sys
from subprocess import Popen, PIPE
try:
    path = Popen("find / -name 'file' -type f", shell=True, stdout=PIPE).stdout.read().splitlines()[0]
    if not sys.path.__contains__(path):
        sys.path.append(path)
except IndexError:
    raise RuntimeError("You must have FILE to run this program!")

当然,除非您计划将它们打包在一起。但是,在这种情况下,您实际上并不需要两个单独的文件。

Call me overly cautious, but I like to make mine more portable because it’s unsafe to assume that files will always be in the same place on every computer. Personally I have the code look up the file path first. I use Linux so mine would look like this:

import os, sys
from subprocess import Popen, PIPE
try:
    # find prints the full path of every match; keep the first one
    found = Popen("find / -name 'file' -type f", shell=True, stdout=PIPE).stdout.read().decode().splitlines()[0]
    path = os.path.dirname(found)  # sys.path entries must be directories, not files
    if path not in sys.path:
        sys.path.append(path)
except IndexError:
    raise RuntimeError("You must have FILE to run this program!")

That is of course unless you plan to package these together. But if that’s the case you don’t really need two separate files anyway.


How do I check file size in Python?

Question: How do I check file size in Python?

我在Windows中编写Python脚本。我想根据文件大小做一些事情。例如,如果大小大于0,我将向某人发送电子邮件,否则继续其他操作。

如何检查文件大小?

I am writing a Python script in Windows. I want to do something based on the file size. For example, if the size is greater than 0, I will send an email to somebody, otherwise continue to other things.

How do I check the file size?


回答 0

您需要由返回的对象st_size属性。您可以使用(Python 3.4+)来获取它:os.statpathlib

>>> from pathlib import Path
>>> Path('somefile.txt').stat()
os.stat_result(st_mode=33188, st_ino=6419862, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=1564, st_atime=1584299303, st_mtime=1584299400, st_ctime=1584299400)
>>> Path('somefile.txt').stat().st_size
1564

或使用os.stat

>>> import os
>>> os.stat('somefile.txt')
os.stat_result(st_mode=33188, st_ino=6419862, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=1564, st_atime=1584299303, st_mtime=1584299400, st_ctime=1584299400)
>>> os.stat('somefile.txt').st_size
1564

输出以字节为单位。

You need the st_size property of the object returned by os.stat. You can get it by either using pathlib (Python 3.4+):

>>> from pathlib import Path
>>> Path('somefile.txt').stat()
os.stat_result(st_mode=33188, st_ino=6419862, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=1564, st_atime=1584299303, st_mtime=1584299400, st_ctime=1584299400)
>>> Path('somefile.txt').stat().st_size
1564

or using os.stat:

>>> import os
>>> os.stat('somefile.txt')
os.stat_result(st_mode=33188, st_ino=6419862, st_dev=16777220, st_nlink=1, st_uid=501, st_gid=20, st_size=1564, st_atime=1584299303, st_mtime=1584299400, st_ctime=1584299400)
>>> os.stat('somefile.txt').st_size
1564

Output is in bytes.
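
To see that both routes report the same number, here is a small self-contained sketch. It assumes Python 3.6+ (so that os.stat accepts a Path) and writes a throwaway 1564-byte file; the variable names are made up for illustration:

```python
import os
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "somefile.txt"
    sample.write_bytes(b"x" * 1564)  # create a file of known size

    size_via_pathlib = sample.stat().st_size
    size_via_os_stat = os.stat(sample).st_size
    size_via_getsize = os.path.getsize(sample)

print(size_via_pathlib, size_via_os_stat, size_via_getsize)
```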


回答 1

使用os.path.getsize

>>> import os
>>> b = os.path.getsize("/path/isa_005.mp3")
>>> b
2071611

输出以字节为单位。

Using os.path.getsize:

>>> import os
>>> b = os.path.getsize("/path/isa_005.mp3")
>>> b
2071611

The output is in bytes.


回答 2

其他答案适用于实际文件,但是如果您需要适用于“类文件的对象”的文件,请尝试以下操作:

# f is a file-like object. 
f.seek(0, os.SEEK_END)
size = f.tell()

在我有限的测试中,它适用于真实文件和StringIO。(Python 2.7.3。)当然,“类文件对象” API并不是严格的接口,但是API文档建议类文件对象应支持seek()tell()

编辑

这与之间的另一个区别os.stat()是,stat()即使您没有读取权限,也可以文件。显然,除非您具有阅读许可,否则搜索/讲述方法将无法工作。

编辑2

在乔纳森的建议下,这是一个偏执的版本。(以上版本将文件指针留在文件的末尾,因此,如果您尝试从文件中读取文件,则将返回零字节!)

# f is a file-like object. 
old_file_position = f.tell()
f.seek(0, os.SEEK_END)
size = f.tell()
f.seek(old_file_position, os.SEEK_SET)

The other answers work for real files, but if you need something that works for “file-like objects”, try this:

import os

# f is a file-like object. 
f.seek(0, os.SEEK_END)
size = f.tell()

It works for real files and StringIO’s, in my limited testing. (Python 2.7.3.) The “file-like object” API isn’t really a rigorous interface, of course, but the API documentation suggests that file-like objects should support seek() and tell().

Edit

Another difference between this and os.stat() is that you can stat() a file even if you don’t have permission to read it. Obviously the seek/tell approach won’t work unless you have read permission.

Edit 2

At Jonathon’s suggestion, here’s a paranoid version. (The version above leaves the file pointer at the end of the file, so if you were to try to read from the file, you’d get zero bytes back!)

# f is a file-like object. 
old_file_position = f.tell()
f.seek(0, os.SEEK_END)
size = f.tell()
f.seek(old_file_position, os.SEEK_SET)
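
The same idea can be wrapped in a small helper. This is a sketch, assuming a seekable binary stream (for text-mode files tell() returns an opaque cookie rather than a byte count); stream_size is a hypothetical name:

```python
import io
import os


def stream_size(f):
    """Return the size of a seekable file-like object, restoring its position."""
    old_position = f.tell()
    try:
        f.seek(0, os.SEEK_END)   # jump to the end
        return f.tell()          # offset at the end == size in bytes
    finally:
        f.seek(old_position, os.SEEK_SET)  # put the pointer back


buf = io.BytesIO(b"hello world")
buf.read(5)                 # move the pointer away from the start
print(stream_size(buf))     # size of the buffer in bytes
print(buf.tell())           # pointer is back where it was
```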

回答 3

import os


def convert_bytes(num):
    """
    this function will convert bytes to MB.... GB... etc
    """
    for x in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if num < 1024.0:
            return "%3.1f %s" % (num, x)
        num /= 1024.0


def file_size(file_path):
    """
    this function will return the file size
    """
    if os.path.isfile(file_path):
        file_info = os.stat(file_path)
        return convert_bytes(file_info.st_size)


# Lets check the file size of MS Paint exe 
# or you can use any file path
file_path = r"C:\Windows\System32\mspaint.exe"
print file_size(file_path)

结果:

6.1 MB

import os


def convert_bytes(num):
    """
    Convert a size in bytes to a human-readable string (KB, MB, GB, ...).
    """
    for x in ['bytes', 'KB', 'MB', 'GB', 'TB']:
        if num < 1024.0:
            return "%3.1f %s" % (num, x)
        num /= 1024.0
    # Still >= 1024 TB after the loop: report the value in PB
    return "%3.1f %s" % (num, 'PB')


def file_size(file_path):
    """
    Return the size of the file as a human-readable string,
    or None if file_path is not a regular file.
    """
    if os.path.isfile(file_path):
        file_info = os.stat(file_path)
        return convert_bytes(file_info.st_size)
    return None


# Let's check the file size of the MS Paint exe,
# or you can use any file path
file_path = r"C:\Windows\System32\mspaint.exe"
print(file_size(file_path))

Result:

6.1 MB

回答 4

使用pathlib在Python 3.4中添加或在PyPI上提供的反向端口):

from pathlib import Path
file = Path() / 'doc.txt'  # or Path('./doc.txt')
size = file.stat().st_size

实际上,这只是一个接口os.stat,但是使用pathlib提供了一种访问其他文件相关操作的简便方法。

Using pathlib (added in Python 3.4 or a backport available on PyPI):

from pathlib import Path
file = Path() / 'doc.txt'  # or Path('./doc.txt')
size = file.stat().st_size

This is really only an interface around os.stat, but using pathlib provides an easy way to access other file related operations.


回答 5

bitshift如果要从转换bytes为任何其他单位,有一个技巧。如果您进行右移,则10基本上是按一个顺序(多个)进行移位。

例: 5GB are 5368709120 bytes

print (5368709120 >> 10)  # 5242880 kilobytes (kB)
print (5368709120 >> 20 ) # 5120 megabytes (MB)
print (5368709120 >> 30 ) # 5 gigabytes (GB)

There is a bit-shift trick I use when I want to convert from bytes to another unit. Each right shift by 10 bits divides by 1024 (2**10), so it moves the value up one unit (bytes to KB, KB to MB, and so on), discarding any remainder.

Example: 5GB are 5368709120 bytes

print (5368709120 >> 10)  # 5242880 kilobytes (kB)
print (5368709120 >> 20 ) # 5120 megabytes (MB)
print (5368709120 >> 30 ) # 5 gigabytes (GB)
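
As a rough sketch of the same trick in reusable form (shift_units is a made-up helper name), note that a shift is integer division, so fractional units are truncated:

```python
def shift_units(num_bytes):
    """Convert a byte count to whole KB/MB/GB using right shifts."""
    return {
        "KB": num_bytes >> 10,  # divide by 1024
        "MB": num_bytes >> 20,  # divide by 1024**2
        "GB": num_bytes >> 30,  # divide by 1024**3
    }


print(shift_units(5368709120))
print(1536 >> 10)  # 1.5 KB is truncated to 1
```

If you need the fractional part, plain division (num_bytes / 1024) is the safer choice.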

回答 6

严格遵循这个问题,Python代码(+伪代码)将是:

import os
file_path = r"<path to your file>"
if os.stat(file_path).st_size > 0:
    <send an email to somebody>
else:
    <continue to other things>

Strictly sticking to the question, the Python code (+ pseudo-code) would be:

import os
file_path = r"<path to your file>"
if os.stat(file_path).st_size > 0:
    <send an email to somebody>
else:
    <continue to other things>

回答 7

#Get file size , print it , process it...
#Os.stat will provide the file size in (.st_size) property. 
#The file size will be shown in bytes.

import os

fsize=os.stat('filepath')
print('size:' + fsize.st_size.__str__())

#check if the file size is less than 10 MB

if fsize.st_size < 10000000:
    process it ....

# Get the file size, print it, process it...
# os.stat() exposes the file size through the st_size attribute.
# The size is reported in bytes.

import os

fsize = os.stat('filepath')
print('size: ' + str(fsize.st_size))

# check if the file size is less than 10 MB

if fsize.st_size < 10000000:
    pass  # process it ....

回答 8

我们有两个选择都包括导入os模块

1)作为os.stat()函数导入os返回一个对象,该对象包含许多标头,包括文件创建时间和上次修改时间等。其中st_size()给出文件的确切大小。

os.stat(“文件名”).st_size()

2)import os在此,我们必须提供确切的文件路径(绝对路径),而不是相对路径。

os.path.getsize(“文件路径”)

We have two options. Both require importing the os module.

1) os.stat() returns an object with many fields, including the file’s creation time, last-modified time, and so on. Among them, st_size (an attribute, not a method) gives the size of the file in bytes.

import os
os.stat("filename").st_size

2) os.path.getsize() returns the size directly. The path may be absolute or relative; a relative path is resolved against the current working directory.

import os
os.path.getsize("path of file")


How do I get the path of a module?

Question: How do I get the path of a module?

我想检测模块是否已更改。现在,使用inotify很简单,您只需要知道要从中获取通知的目录即可。

如何在python中检索模块的路径?

I want to detect whether module has changed. Now, using inotify is simple, you just need to know the directory you want to get notifications from.

How do I retrieve a module’s path in python?


回答 0

import a_module
print(a_module.__file__)

实际上,至少在Mac OS X上,将为您提供已加载的.pyc文件的路径。因此,我想您可以这样做:

import os
path = os.path.abspath(a_module.__file__)

您也可以尝试:

path = os.path.dirname(a_module.__file__)

获取模块的目录。

import a_module
print(a_module.__file__)

Will actually give you the path to the .pyc file that was loaded, at least on Mac OS X. So I guess you can do:

import os
path = os.path.abspath(a_module.__file__)

You can also try:

path = os.path.dirname(a_module.__file__)

To get the module’s directory.
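
Put together, a minimal sketch might look like the following. It uses the standard-library json package purely as a stand-in for a_module (an assumption made for illustration), and assumes Python 3, where __file__ normally points at the .py source rather than the .pyc:

```python
import os
import json  # stand-in for a_module; any imported pure-Python module works

module_file = os.path.abspath(json.__file__)  # full path to the loaded file
module_dir = os.path.dirname(module_file)     # directory you could watch for changes

print(module_file)
print(module_dir)
```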


回答 1

inspectpython中有模块。

官方文件

检查模块提供了几个有用的功能,以帮助获取有关活动对象的信息,例如模块,类,方法,函数,回溯,框架对象和代码对象。例如,它可以帮助您检查类的内容,检索方法的源代码,提取函数的参数列表并设置其格式或获取显示详细回溯所需的所有信息。

例:

>>> import os
>>> import inspect
>>> inspect.getfile(os)
'/usr/lib64/python2.7/os.pyc'
>>> inspect.getfile(inspect)
'/usr/lib64/python2.7/inspect.pyc'
>>> os.path.dirname(inspect.getfile(inspect))
'/usr/lib64/python2.7'

There is inspect module in python.

Official documentation

The inspect module provides several useful functions to help get information about live objects such as modules, classes, methods, functions, tracebacks, frame objects, and code objects. For example, it can help you examine the contents of a class, retrieve the source code of a method, extract and format the argument list for a function, or get all the information you need to display a detailed traceback.

Example:

>>> import os
>>> import inspect
>>> inspect.getfile(os)
'/usr/lib64/python2.7/os.pyc'
>>> inspect.getfile(inspect)
'/usr/lib64/python2.7/inspect.pyc'
>>> os.path.dirname(inspect.getfile(inspect))
'/usr/lib64/python2.7'

回答 2

正如其他答案所说的那样,最好的方法是使用__file__(在下面再次演示)。但是,有一个重要的警告,__file__如果您单独运行模块(例如,作为__main__),则该警告不存在。

例如,假设您有两个文件(两个文件都在PYTHONPATH上):

#/path1/foo.py
import bar
print(bar.__file__)

#/path2/bar.py
import os
print(os.getcwd())
print(__file__)

运行foo.py将给出输出:

/path1        # "import bar" causes the line "print(os.getcwd())" to run
/path2/bar.py # then "print(__file__)" runs
/path2/bar.py # then the import statement finishes and "print(bar.__file__)" runs

但是,如果尝试单独运行bar.py,则会得到:

/path2                              # "print(os.getcwd())" still works fine
Traceback (most recent call last):  # but __file__ doesn't exist if bar.py is running as main
  File "/path2/bar.py", line 3, in <module>
    print(__file__)
NameError: name '__file__' is not defined 

希望这可以帮助。在测试其他解决方案时,这一警告使我花费了大量时间和困惑。

As the other answers have said, the best way to do this is with __file__ (demonstrated again below). However, there is an important caveat, which is that __file__ does NOT exist if you are running the module on its own (i.e. as __main__).

For example, say you have two files (both of which are on your PYTHONPATH):

#/path1/foo.py
import bar
print(bar.__file__)

and

#/path2/bar.py
import os
print(os.getcwd())
print(__file__)

Running foo.py will give the output:

/path1        # "import bar" causes the line "print(os.getcwd())" to run
/path2/bar.py # then "print(__file__)" runs
/path2/bar.py # then the import statement finishes and "print(bar.__file__)" runs

HOWEVER if you try to run bar.py on its own, you will get:

/path2                              # "print(os.getcwd())" still works fine
Traceback (most recent call last):  # but __file__ doesn't exist if bar.py is running as main
  File "/path2/bar.py", line 3, in <module>
    print(__file__)
NameError: name '__file__' is not defined 

Hope this helps. This caveat cost me a lot of time and confusion while testing the other solutions presented.


回答 3

我还将尝试解决此问题的一些变体:

  1. 查找被调用脚本的路径
  2. 查找当前正在执行的脚本的路径
  3. 查找被调用脚本的目录

(其中一些问题已在SO上提出,但已作为重复内容关闭并在此处重定向。)

使用注意事项 __file__

对于已导入的模块:

import something
something.__file__ 

将返回模块的绝对路径。但是,鉴于以下脚本foo.py:

#foo.py
print '__file__', __file__

用“ python foo.py”调用它只会返回“ foo.py”。如果添加shebang:

#!/usr/bin/python 
#foo.py
print '__file__', __file__

并使用./foo.py调用它,它将返回’./foo.py’。从另一个目录中调用它(例如,将foo.py放在目录栏中),然后调用

python bar/foo.py

或添加一个Shebang并直接执行文件:

bar/foo.py

将返回“ bar / foo.py”(相对路径)。

查找目录

现在从那里获取目录,os.path.dirname(__file__)也可能很棘手。至少在我的系统上,如果从与文件相同的目录中调用它,它将返回一个空字符串。例如

# foo.py
import os
print '__file__ is:', __file__
print 'os.path.dirname(__file__) is:', os.path.dirname(__file__)

将输出:

__file__ is: foo.py
os.path.dirname(__file__) is: 

换句话说,它返回一个空字符串,因此如果要将其用于当前文件(与导入模块的文件相对),这似乎并不可靠。为了解决这个问题,您可以将其包装在对abspath的调用中:

# foo.py
import os
print 'os.path.abspath(__file__) is:', os.path.abspath(__file__)
print 'os.path.dirname(os.path.abspath(__file__)) is:', os.path.dirname(os.path.abspath(__file__))

输出类似:

os.path.abspath(__file__) is: /home/user/bar/foo.py
os.path.dirname(os.path.abspath(__file__)) is: /home/user/bar

请注意,abspath()不会解析符号链接。如果要执行此操作,请改用realpath()。例如,使符号链接file_import_testing_link指向file_import_testing.py,其内容如下:

import os
print 'abspath(__file__)',os.path.abspath(__file__)
print 'realpath(__file__)',os.path.realpath(__file__)

执行将打印绝对路径,例如:

abspath(__file__) /home/user/file_test_link
realpath(__file__) /home/user/file_test.py

file_import_testing_link-> file_import_testing.py

使用检查

@SummerBreeze提到使用检查模块。

对于导入的模块,这似乎很好用,也很简洁:

import os
import inspect
print 'inspect.getfile(os) is:', inspect.getfile(os)

听话地返回绝对路径。但是,为了找到当前正在执行的脚本的路径,我没有找到使用它的方法。

I will try tackling a few variations on this question as well:

  1. finding the path of the called script
  2. finding the path of the currently executing script
  3. finding the directory of the called script

(Some of these questions have been asked on SO, but have been closed as duplicates and redirected here.)

Caveats of Using __file__

For a module that you have imported:

import something
something.__file__ 

will return the absolute path of the module. However, given the following script foo.py:

#foo.py
print '__file__', __file__

Calling it with ‘python foo.py’ will simply return ‘foo.py’. If you add a shebang:

#!/usr/bin/python 
#foo.py
print '__file__', __file__

and call it using ./foo.py, it will return ‘./foo.py’. Calling it from a different directory (e.g. put foo.py in directory bar), then calling either

python bar/foo.py

or adding a shebang and executing the file directly:

bar/foo.py

will return ‘bar/foo.py’ (the relative path).

Finding the directory

Going from there to the directory, os.path.dirname(__file__) can also be tricky. At least on my system, it returns an empty string if you call it from the same directory as the file. For example:

# foo.py
import os
print '__file__ is:', __file__
print 'os.path.dirname(__file__) is:', os.path.dirname(__file__)

will output:

__file__ is: foo.py
os.path.dirname(__file__) is: 

In other words, it returns an empty string, so this does not seem reliable if you want to use it for the current file (as opposed to the file of an imported module). To get around this, you can wrap it in a call to abspath:

# foo.py
import os
print 'os.path.abspath(__file__) is:', os.path.abspath(__file__)
print 'os.path.dirname(os.path.abspath(__file__)) is:', os.path.dirname(os.path.abspath(__file__))

which outputs something like:

os.path.abspath(__file__) is: /home/user/bar/foo.py
os.path.dirname(os.path.abspath(__file__)) is: /home/user/bar

Note that abspath() does NOT resolve symlinks. If you want to do this, use realpath() instead. For example, making a symlink file_import_testing_link pointing to file_import_testing.py, with the following content:

import os
print 'abspath(__file__)',os.path.abspath(__file__)
print 'realpath(__file__)',os.path.realpath(__file__)

executing will print absolute paths something like:

abspath(__file__) /home/user/file_test_link
realpath(__file__) /home/user/file_test.py

file_import_testing_link -> file_import_testing.py

Using inspect

@SummerBreeze mentions using the inspect module.

This seems to work well, and is quite concise, for imported modules:

import os
import inspect
print 'inspect.getfile(os) is:', inspect.getfile(os)

obediently returns the absolute path. However for finding the path of the currently executing script, I did not see a way to use it.


回答 4

我不明白为什么没有人在谈论这个,但是对我来说,最简单的解决方案是使用imp.find_module(“ modulename”)在这里的文档):

import imp
imp.find_module("os")

它给出一个元组,路径在第二个位置:

(<open file '/usr/lib/python2.7/os.py', mode 'U' at 0x7f44528d7540>,
'/usr/lib/python2.7/os.py',
('.py', 'U', 1))

与“检查”方法相比,此方法的优势在于您无需导入模块即可使其工作,并且可以在输入中使用字符串。例如,在检查另一个脚本中调用的模块时很有用。

编辑

在python3中,importlib模块应该执行以下操作:

的文档importlib.util.find_spec

返回指定模块的规格。

首先,检查sys.modules以查看模块是否已经导入。如果是这样,则为sys.modules [name]。规格返回。如果恰好将其设置为“无”,则引发ValueError。如果该模块不在sys.modules中,则在sys.meta_path中搜索一个合适的规范,并为发现者提供’path’的值。如果找不到规范,则不返回任何内容。

如果名称是子模块的名称(包含点),则将自动导入父模块。

名称和包参数与importlib.import_module()相同。换句话说,相对的模块名称(带有前导点)起作用。

I don’t get why no one is talking about this, but to me the simplest solution is using imp.find_module("modulename") (documentation here). Note that imp has been deprecated since Python 3.4 and was removed in Python 3.12:

import imp
imp.find_module("os")

It gives a tuple with the path in second position:

(<open file '/usr/lib/python2.7/os.py', mode 'U' at 0x7f44528d7540>,
'/usr/lib/python2.7/os.py',
('.py', 'U', 1))

The advantage of this method over the “inspect” one is that you don’t need to import the module to make it work, and you can use a string in input. Useful when checking modules called in another script for example.

EDIT:

In Python 3, the importlib module should do the job:

Doc of importlib.util.find_spec:

Return the spec for the specified module.

First, sys.modules is checked to see if the module was already imported. If so, then sys.modules[name].spec is returned. If that happens to be set to None, then ValueError is raised. If the module is not in sys.modules, then sys.meta_path is searched for a suitable spec with the value of ‘path’ given to the finders. None is returned if no spec could be found.

If the name is for submodule (contains a dot), the parent module is automatically imported.

The name and package arguments work the same as importlib.import_module(). In other words, relative module names (with leading dots) work.
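
A short sketch of how find_spec could be used for this, assuming Python 3.4+; json is just an example target and the missing-module name is invented:

```python
import importlib.util

# Look up a module's location from its name given as a string.
spec = importlib.util.find_spec("json")
print(spec.origin)                      # path of the file the module is loaded from
print(spec.submodule_search_locations)  # package directory list (None for plain modules)

# For a top-level name that cannot be found, find_spec returns None.
missing = importlib.util.find_spec("no_such_module_hopefully")
print(missing)
```

For a dotted name such as "package.sub", the parent package is imported as a side effect, as the quoted documentation says.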


回答 5

这是微不足道的。

每个模块都有一个__file__变量,显示当前位置的相对路径。

因此,获取模块通知目录的方法很简单:

os.path.dirname(__file__)

This was trivial.

Each module has a __file__ variable holding the path it was loaded from (this may be relative to the current working directory).

Therefore, getting the directory of the module, so you can watch it for notifications, is as simple as:

os.path.dirname(__file__)

回答 6

import os
path = os.path.abspath(__file__)
dir_path = os.path.dirname(path)

import os
path = os.path.abspath(__file__)
dir_path = os.path.dirname(path)

回答 7

import module
print module.__path__

程序包支持另一个特殊属性__path__。它被初始化为一个列表,其中包含__init__.py执行该文件中的代码之前包含软件包目录的目录的名称。这个变量可以修改;这样做会影响以后对包中包含的模块和子包的搜索。

尽管通常不需要此功能,但可以使用它扩展软件包中的模块集。

资源

import module
print module.__path__

Packages support one more special attribute, __path__. This is initialized to be a list containing the name of the directory holding the package’s __init__.py before the code in that file is executed. This variable can be modified; doing so affects future searches for modules and subpackages contained in the package.

While this feature is not often needed, it can be used to extend the set of modules found in a package.

Source


回答 8

命令行实用程序

您可以将其调整为命令行实用程序,

python-which <package name>


创建 /usr/local/bin/python-which

#!/usr/bin/env python

import importlib
import os
import sys

args = sys.argv[1:]
if len(args) > 0:
    module = importlib.import_module(args[0])
    print os.path.dirname(module.__file__)

使它可执行

sudo chmod +x /usr/local/bin/python-which

Command Line Utility

You can tweak it to a command line utility,

python-which <package name>


Create /usr/local/bin/python-which

#!/usr/bin/env python

import importlib
import os
import sys

args = sys.argv[1:]
if len(args) > 0:
    module = importlib.import_module(args[0])
    print(os.path.dirname(module.__file__))

Make it executable

sudo chmod +x /usr/local/bin/python-which

回答 9

因此,我花了大量时间尝试使用py2exe来执行此操作。问题是要获取脚本的基本文件夹,而不管它是作为python脚本还是作为py2exe可执行文件运行。不管是从当前文件夹,另一个文件夹还是从系统路径运行(这是最困难的),它都可以正常运行。

最终,我使用了这种方法,使用sys.frozen作为在py2exe中运行的指标:

import os,sys
if hasattr(sys,'frozen'): # only when running in py2exe this exists
    base = sys.prefix
else: # otherwise this is a regular python script
    base = os.path.dirname(os.path.realpath(__file__))

So I spent a fair amount of time trying to do this with py2exe. The problem was to get the base folder of the script whether it was being run as a Python script or as a py2exe executable, and to have it work whether it was being run from the current folder, another folder, or (this was the hardest) from the system’s path.

Eventually I used this approach, using sys.frozen as an indicator of running in py2exe:

import os,sys
if hasattr(sys,'frozen'): # only when running in py2exe this exists
    base = sys.prefix
else: # otherwise this is a regular python script
    base = os.path.dirname(os.path.realpath(__file__))

回答 10

您可以导入模块,然后点击其名称,然后获取其完整路径

>>> import os
>>> os
<module 'os' from 'C:\\Users\\Hassan Ashraf\\AppData\\Local\\Programs\\Python\\Python36-32\\lib\\os.py'>
>>>

You can just import your module, then type its name at the interactive prompt, and you’ll get its full path:

>>> import os
>>> os
<module 'os' from 'C:\\Users\\Hassan Ashraf\\AppData\\Local\\Programs\\Python\\Python36-32\\lib\\os.py'>
>>>

回答 11

如果要从包的任何模块中检索包的根路径,则可以进行以下工作(在Python 3.6上测试):

from . import __path__ as ROOT_PATH
print(ROOT_PATH)

__init__.py路径也可以通过使用__file__代替。

希望这可以帮助!

If you want to retrieve the package’s root path from any of its modules, the following works (tested on Python 3.6):

from . import __path__ as ROOT_PATH
print(ROOT_PATH)

The main __init__.py path can also be referenced by using __file__ instead.

Hope this helps!


回答 12

如果使用的唯一警告__file__是当前相对目录为空(即,当脚本从脚本所在的同一目录运行时),那么一个简单的解决方案是:

import os.path
mydir = os.path.dirname(__file__) or '.'
full  = os.path.abspath(mydir)
print __file__, mydir, full

结果:

$ python teste.py 
teste.py . /home/user/work/teste

诀窍是在or '.'dirname()调用。它将dir设置为.,表示当前目录并且是任何与路径相关的函数的有效目录。

因此,abspath()并不是真正需要使用。但是,如果仍然使用它,就不需要技巧了:abspath()接受空白路径并将其正确解释为当前目录。

If the only caveat of using __file__ is when current, relative directory is blank (ie, when running as a script from the same directory where the script is), then a trivial solution is:

import os.path
mydir = os.path.dirname(__file__) or '.'
full  = os.path.abspath(mydir)
print __file__, mydir, full

And the result:

$ python teste.py 
teste.py . /home/user/work/teste

The trick is in or '.' after the dirname() call. It sets the dir as ., which means current directory and is a valid directory for any path-related function.

Thus, using abspath() is not truly needed. But if you use it anyway, the trick is not needed: abspath() accepts blank paths and properly interprets it as the current directory.


回答 13

我想为一个常见的场景(在Python 3中)做出贡献,并探索一些实现它的方法。

内置函数open()接受相对路径或绝对路径作为其第一个参数。但是,相对路径被视为相对于当前工作目录的相对路径,因此建议将绝对路径传递给文件。

简而言之,如果使用以下代码运行脚本文件,则不能保证example.txt将在脚本文件所在的目录中创建该文件:

with open('example.txt', 'w'):
    pass

要修复此代码,我们需要获取脚本的路径并将其设为绝对路径。为了确保路径是绝对的,我们只需使用os.path.realpath()函数。要获取脚本的路径,有几个常用函数可以返回各种路径结果:

  • os.getcwd()
  • os.path.realpath('example.txt')
  • sys.argv[0]
  • __file__

os.getcwd()os.path.realpath()这两个函数都基于当前工作目录返回路径结果。通常不是我们想要的。sys.argv列表的第一个元素是根脚本(运行的脚本)的路径,无论您是在根脚本本身还是在其任何模块中调用列表。在某些情况下可能会派上用场。该__file__变量包含从它被称为模块的路径。


以下代码example.txt在脚本所在的目录中正确创建了一个文件:

filedir = os.path.dirname(os.path.realpath(__file__))
filepath = os.path.join(filedir, 'example.txt')

with open(filepath, 'w'):
    pass

I’d like to contribute with one common scenario (in Python 3) and explore a few approaches to it.

The built-in function open() accepts either relative or absolute path as its first argument. The relative path is treated as relative to the current working directory though so it is recommended to pass the absolute path to the file.

Simply said, if you run a script file with the following code, it is not guaranteed that the example.txt file will be created in the same directory where the script file is located:

with open('example.txt', 'w'):
    pass

To fix this code we need to get the path to the script and make it absolute. To ensure the path to be absolute we simply use the os.path.realpath() function. To get the path to the script there are several common functions that return various path results:

  • os.getcwd()
  • os.path.realpath('example.txt')
  • sys.argv[0]
  • __file__

Both functions os.getcwd() and os.path.realpath() return path results based on the current working directory. Generally not what we want. The first element of the sys.argv list is the path of the root script (the script you run) regardless of whether you call the list in the root script itself or in any of its modules. It might come handy in some situations. The __file__ variable contains path of the module from which it has been called.


The following code correctly creates a file example.txt in the same directory where the script is located:

import os

filedir = os.path.dirname(os.path.realpath(__file__))
filepath = os.path.join(filedir, 'example.txt')

with open(filepath, 'w'):
    pass

回答 14

如果您想从脚本中知道绝对路径,可以使用Path对象:

from pathlib import Path

print(Path().absolute())
print(Path().resolve('.'))
print(Path().cwd())

cwd()方法

返回代表当前目录的新路径对象(由os.getcwd()返回)

resolve()方法

使路径绝对,解决任何符号链接。返回一个新的路径对象:

If you would like to know the absolute path from within your script, you can use a Path object:

from pathlib import Path

print(Path().absolute())
print(Path().resolve())  # resolve() takes no path argument; it resolves the Path it is called on
print(Path.cwd())

cwd() method

Return a new path object representing the current directory (as returned by os.getcwd())

resolve() method

Make the path absolute, resolving any symlinks. A new path object is returned:
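
Note that Path() with no arguments refers to the current working directory, not to the script. To get the directory of a particular file, a sketch along these lines may help; containing_dir is a hypothetical helper, and inside a real script you would typically pass __file__ to it:

```python
from pathlib import Path


def containing_dir(file_path):
    """Absolute, symlink-resolved directory that contains file_path."""
    return Path(file_path).resolve().parent


print(Path.cwd())                     # current working directory
print(containing_dir("example.txt"))  # a relative path is resolved against the cwd
# In a script file: containing_dir(__file__) gives the script's own directory.
```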


回答 15

从python包的模块内部,我必须引用与包位于同一目录中的文件。例如

some_dir/
  maincli.py
  top_package/
    __init__.py
    level_one_a/
      __init__.py
      my_lib_a.py
      level_two/
        __init__.py
        hello_world.py
    level_one_b/
      __init__.py
      my_lib_b.py

因此,在上面,我必须从my_lib_a.py模块调用maincli.py,因为知道top_package和maincli.py在同一目录中。这是我到达maincli.py的路径:

import sys
import os
import imp


class ConfigurationException(Exception):
    pass


# inside of my_lib_a.py
def get_maincli_path():
    maincli_path = os.path.abspath(imp.find_module('maincli')[1])
    # top_package = __package__.split('.')[0]
    # mod = sys.modules.get(top_package)
    # modfile = mod.__file__
    # pkg_in_dir = os.path.dirname(os.path.dirname(os.path.abspath(modfile)))
    # maincli_path = os.path.join(pkg_in_dir, 'maincli.py')

    if not os.path.exists(maincli_path):
        err_msg = 'This script expects that "maincli.py" be installed to the '\
        'same directory: "{0}"'.format(maincli_path)
        raise ConfigurationException(err_msg)

    return maincli_path

基于PlasmaBinturong的发布,我修改了代码。

From within a module of a Python package, I had to refer to a file that resided in the same directory as the package. For example:

some_dir/
  maincli.py
  top_package/
    __init__.py
    level_one_a/
      __init__.py
      my_lib_a.py
      level_two/
        __init__.py
        hello_world.py
    level_one_b/
      __init__.py
      my_lib_b.py

So in above I had to call maincli.py from my_lib_a.py module knowing that top_package and maincli.py are in the same directory. Here’s how I get the path to maincli.py:

import sys
import os
import imp


class ConfigurationException(Exception):
    pass


# inside of my_lib_a.py
def get_maincli_path():
    maincli_path = os.path.abspath(imp.find_module('maincli')[1])
    # top_package = __package__.split('.')[0]
    # mod = sys.modules.get(top_package)
    # modfile = mod.__file__
    # pkg_in_dir = os.path.dirname(os.path.dirname(os.path.abspath(modfile)))
    # maincli_path = os.path.join(pkg_in_dir, 'maincli.py')

    if not os.path.exists(maincli_path):
        err_msg = 'This script expects that "maincli.py" be installed to the '\
        'same directory: "{0}"'.format(maincli_path)
        raise ConfigurationException(err_msg)

    return maincli_path

Based on posting by PlasmaBinturong I modified the code.


回答 16

如果您希望在“程序”中动态执行此操作,请尝试以下代码:
我的意思是,您可能不知道要对其“硬编码”的模块的确切名称。它可能是从列表中选择的,或者可能当前未运行以使用__file__。

(我知道,它将在Python 3中不起作用)

global modpath
modname = 'os' #This can be any module name on the fly
#Create a file called "modname.py"
f=open("modname.py","w")
f.write("import "+modname+"\n")
f.write("modpath = "+modname+"\n")
f.close()
#Call the file with execfile()
execfile('modname.py')
print modpath
<module 'os' from 'C:\Python27\lib\os.pyc'>

我试图摆脱“全局”问题,但发现无法正常工作的情况,我认为“ execfile()”可以在Python 3中进行仿真,因为它在程序中,因此可以轻松地放入方法或模块中重用。

If you wish to do this dynamically in a “program” try this code:
My point is, you may not know the exact name of the module to “hardcode” it. It may be selected from a list or may not be currently running to use __file__.

(I know, it will not work in Python 3)

global modpath
modname = 'os' #This can be any module name on the fly
#Create a file called "modname.py"
f=open("modname.py","w")
f.write("import "+modname+"\n")
f.write("modpath = "+modname+"\n")
f.close()
#Call the file with execfile()
execfile('modname.py')
print modpath
<module 'os' from 'C:\Python27\lib\os.pyc'>

I tried to get rid of the “global” issue but found cases where it did not work. I think execfile() can be emulated in Python 3. Since this is in a program, it can easily be put in a method or module for reuse.
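
For Python 3, one possible way to get the same effect without writing a temporary file is importlib.import_module, which takes the module name as a string. This is a sketch, not the answer's own method; module_path is a made-up helper and json is only an example name:

```python
import importlib


def module_path(modname):
    """Import a module by name and return its file path, or None if it has none."""
    module = importlib.import_module(modname)
    # Built-in modules such as sys have no __file__ attribute.
    return getattr(module, "__file__", None)


modname = "json"  # this can be any module name chosen on the fly
print(module_path(modname))
print(module_path("sys"))
```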


回答 17

如果您使用pip进行安装,则“ pip show”效果很好(“位置”)

$ pip show detectron2

Name: detectron2
Version: 0.1
Summary: Detectron2 is FAIR next-generation research platform for object detection and segmentation.
Home-page: https://github.com/facebookresearch/detectron2
Author: FAIR
Author-email: None
License: UNKNOWN
Location: /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages
Requires: yacs, tabulate, tqdm, pydot, tensorboard, Pillow, termcolor, future, cloudpickle, matplotlib, fvcore

If you installed it using pip, “pip show” works great (‘Location’)

$ pip show detectron2

Name: detectron2
Version: 0.1
Summary: Detectron2 is FAIR next-generation research platform for object detection and segmentation.
Home-page: https://github.com/facebookresearch/detectron2
Author: FAIR
Author-email: None
License: UNKNOWN
Location: /home/ubuntu/anaconda3/envs/pytorch_p36/lib/python3.6/site-packages
Requires: yacs, tabulate, tqdm, pydot, tensorboard, Pillow, termcolor, future, cloudpickle, matplotlib, fvcore

回答 18

Here is a quick bash script in case it’s useful to anyone. I just want to be able to set an environment variable so that I can pushd to the code.

#!/bin/bash
module=${1:?"I need a module name"}

python << EOI
import $module
import os
print os.path.dirname($module.__file__)
EOI

Shell example:

[root@sri-4625-0004 ~]# export LXML=$(get_python_path.sh lxml)
[root@sri-4625-0004 ~]# echo $LXML
/usr/lib64/python2.7/site-packages/lxml
[root@sri-4625-0004 ~]#

How to print a number with commas as thousands separators?

Question: How to print a number with commas as thousands separators?

I am trying to print an integer in Python 2.6.1 with commas as thousands separators. For example, I want to show the number 1234567 as 1,234,567. How would I go about doing this? I have seen many examples on Google, but I am looking for the simplest practical way.

It does not need to be locale-specific to decide between periods and commas. I would prefer something as simple as reasonably possible.


Answer 0

Locale unaware

'{:,}'.format(value)  # For Python ≥2.7
f'{value:,}'  # For Python ≥3.6

Locale aware

import locale
locale.setlocale(locale.LC_ALL, '')  # Use '' for auto, or force e.g. to 'en_US.UTF-8'

'{:n}'.format(value)  # For Python ≥2.7
f'{value:n}'  # For Python ≥3.6

Reference

Per Format Specification Mini-Language,

The ',' option signals the use of a comma for a thousands separator. For a locale aware separator, use the 'n' integer presentation type instead.


Answer 1

I got this to work:

>>> import locale
>>> locale.setlocale(locale.LC_ALL, 'en_US')
'en_US'
>>> locale.format("%d", 1255000, grouping=True)
'1,255,000'

Sure, you don’t need internationalization support, but it’s clear, concise, and uses a built-in library.

P.S. That “%d” is the usual %-style formatter. You can have only one formatter, but it can be whatever you need in terms of field width and precision settings.

P.P.S. If you can’t get locale to work, I’d suggest a modified version of Mark’s answer:

def intWithCommas(x):
    if type(x) not in [type(0), type(0L)]:
        raise TypeError("Parameter must be an integer.")
    if x < 0:
        return '-' + intWithCommas(-x)
    result = ''
    while x >= 1000:
        x, r = divmod(x, 1000)
        result = ",%03d%s" % (r, result)
    return "%d%s" % (x, result)

Recursion is useful for the negative case, but one recursion per comma seems a bit excessive to me.


Answer 2

For inefficiency and unreadability it’s hard to beat:

>>> import itertools
>>> s = '-1234567'
>>> ','.join(["%s%s%s" % (x[0], x[1] or '', x[2] or '') for x in itertools.izip_longest(s[::-1][::3], s[::-1][1::3], s[::-1][2::3])])[::-1].replace('-,','-')

Answer 3

Here is the locale grouping code after removing irrelevant parts and cleaning it up a little:

(The following only works for integers)

def group(number):
    s = '%d' % number
    groups = []
    while s and s[-1].isdigit():
        groups.append(s[-3:])
        s = s[:-3]
    return s + ','.join(reversed(groups))

>>> group(-23432432434.34)
'-23,432,432,434'

There are already some good answers in here. I just want to add this for future reference. Python 2.7 adds a format specifier for the thousands separator. According to the Python docs, it works like this:

>>> '{:20,.2f}'.format(f)
'18,446,744,073,709,551,616.00'

In Python 3.1 you can do the same thing like this:

>>> format(1234567, ',d')
'1,234,567'
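
The first example above uses a variable f that is never defined. As a hedged reconstruction, the sketch below assumes f is 2**64 stored as a float, which reproduces the output shown; that value is an assumption, not something stated in the answer.

```python
# Assumption: f is 2**64 as a float, which matches the output quoted above.
f = 2.0 ** 64

print('{:20,.2f}'.format(f))   # minimum width 20, comma grouping, 2 decimals
print(format(1234567, ',d'))   # built-in format() uses the same mini-language
```

The width of 20 has no visible effect here because the formatted number is already longer than 20 characters.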

Answer 4

I’m surprised that no one has mentioned that you can do this with f-strings in Python 3.6, as easily as this:

>>> num = 10000000
>>> print(f"{num:,}")
10,000,000

… where the part after the colon is the format specifier. The comma is the separator character you want; f"{num:_}" uses underscores instead of commas.

This is equivalent to using format(num, ",") in older versions of Python 3.
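
A short side-by-side sketch of the three spellings mentioned above; the variable names are chosen here purely for illustration.

```python
num = 10000000

with_commas = f"{num:,}"        # comma as the grouping character
with_underscores = f"{num:_}"   # underscore grouping (Python 3.6+)
via_builtin = format(num, ",")  # same result as the first f-string

print(with_commas, with_underscores, via_builtin)
```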


Answer 5

Here’s a one-line regex replacement:

re.sub(r"(\d)(?=(\d{3})+(?!\d))", r"\1,", "%d" % val)

Works only for integral outputs:

import re
val = 1234567890
re.sub(r"(\d)(?=(\d{3})+(?!\d))", r"\1,", "%d" % val)
# Returns: '1,234,567,890'

val = 1234567890.1234567890
re.sub(r"(\d)(?=(\d{3})+(?!\d))", r"\1,", "%d" % val)
# Returns: '1,234,567,890'

Or, for floats with no more than three decimal digits, change the format specifier to %.3f:

re.sub(r"(\d)(?=(\d{3})+(?!\d))", r"\1,", "%.3f" % val)
# Returns: '1,234,567,890.123'

NB: This doesn’t work correctly with more than three decimal digits, as it will attempt to group the decimal part:

re.sub(r"(\d)(?=(\d{3})+(?!\d))", r"\1,", "%.5f" % val)
# Returns: '1,234,567,890.12,346'

How it works

Let’s break it down:

import re

val = 1234567890

pattern = re.compile(r"""
    (\d)           # Find one digit...
    (?=            # that is followed by...
        (\d{3})+   # one or more groups of three digits...
        (?!\d)     # which are not followed by any more digits.
    )
    """, re.VERBOSE)   # re.VERBOSE permits the whitespace and comments above

# Replace that one digit by itself, followed by a comma, and continue
# looking for more matches later in the string.
# (re.sub() replaces all matches it finds in the input)
repl = r"\1,"

# Format the value as a decimal integer string to begin with
string = "%d" % val

re.sub(pattern, repl, string)
# Returns: '1,234,567,890'

Answer 6

This is what I do for floats. Although, honestly, I’m not sure which versions it works for – I’m using 2.7:

my_number = 4385893.382939491

my_string = '{:0,.2f}'.format(my_number)

Returns: 4,385,893.38

Update: I recently had an issue with this format (couldn’t tell you the exact reason), but was able to fix it by dropping the 0:

my_string = '{:,.2f}'.format(my_number)

Answer 7

You can also use '{:n}'.format( value ) for a locale-aware representation. I think this is the simplest way to get a locale-based solution.

For more information, search for “thousands” in the Python documentation.

For currency, you can use locale.currency, setting the flag grouping:

Code

import locale

locale.setlocale( locale.LC_ALL, '' )
locale.currency( 1234567.89, grouping = True )

Output

'Portuguese_Brazil.1252'
'R$ 1.234.567,89'

Answer 8

Slightly expanding the answer of Ian Schneider:

If you want to use a custom thousands separator, the simplest solution is:

'{:,}'.format(value).replace(',', your_custom_thousands_separator)

Examples

'{:,.2f}'.format(123456789.012345).replace(',', ' ')

If you want the German representation like this, it gets a bit more complicated:

('{:,.2f}'.format(123456789.012345)
          .replace(',', ' ')  # 'save' the thousands separators 
          .replace('.', ',')  # dot to comma
          .replace(' ', '.')) # thousand separators to dot
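
The three-step swap above can also be done in a single pass. The following is only a sketch: format_number and its parameters are hypothetical names, and it relies on str.translate to map both separators at once so that neither overwrites the other.

```python
def format_number(value, thousands=',', decimal='.', places=2):
    """Format value with custom thousands and decimal separators."""
    text = '{:,.{p}f}'.format(value, p=places)  # e.g. '123,456,789.01'
    # Map both separators in one pass; no temporary placeholder is needed.
    return text.translate(str.maketrans({',': thousands, '.': decimal}))


print(format_number(123456789.012345, thousands='.', decimal=','))
print(format_number(123456789.012345, thousands=' '))
```

This avoids having to pick a placeholder character that is guaranteed not to appear in the formatted number.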

Answer 9

I’m sure there must be a standard library function for this, but it was fun to try to write it myself using recursion so here’s what I came up with:

def intToStringWithCommas(x):
    if type(x) is not int and type(x) is not long:
        raise TypeError("Not an integer!")
    if x < 0:
        return '-' + intToStringWithCommas(-x)
    elif x < 1000:
        return str(x)
    else:
        return intToStringWithCommas(x / 1000) + ',' + '%03d' % (x % 1000)

Having said that, if someone else does find a standard way to do it, you should use that instead.
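
The function above targets Python 2: it refers to long, and it relies on / performing integer division. A possible Python 3 adaptation, offered as a sketch (int_to_string_with_commas is a renamed variant, not the author's code), uses divmod so the quotient stays an integer:

```python
def int_to_string_with_commas(x):
    # bool is a subclass of int, so reject it explicitly
    if not isinstance(x, int) or isinstance(x, bool):
        raise TypeError("Not an integer!")
    if x < 0:
        return '-' + int_to_string_with_commas(-x)
    if x < 1000:
        return str(x)
    # divmod returns the integer quotient and the last three digits
    head, tail = divmod(x, 1000)
    return int_to_string_with_commas(head) + ',' + '%03d' % tail


print(int_to_string_with_commas(1234567))
```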


Answer 10

I reworked this from the comments on ActiveState recipe 498181:

import re
def thous(x, sep=',', dot='.'):
    num, _, frac = str(x).partition(dot)
    num = re.sub(r'(\d{3})(?=\d)', r'\1'+sep, num[::-1])[::-1]
    if frac:
        num += dot + frac
    return num

It uses a regular-expression lookahead, (?=\d), to make sure that only groups of three digits that have a digit ‘after’ them get a comma. I say ‘after’ because the string is reversed at this point.

[::-1] just reverses a string.
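
To see the reversal trick in action, here is the same function (copied from above so the snippet runs on its own) with a few sample calls. The sample values are chosen here for illustration.

```python
import re


def thous(x, sep=',', dot='.'):
    num, _, frac = str(x).partition(dot)
    # Work on the reversed integer part so groups are counted from the right
    num = re.sub(r'(\d{3})(?=\d)', r'\1' + sep, num[::-1])[::-1]
    if frac:
        num += dot + frac
    return num


print(thous(1234567.891))
print(thous(-1234))
print(thous(1234567, sep=' '))
```

The minus sign needs no special handling: in the reversed string it comes last, so the lookahead for a following digit fails and no separator is placed next to it.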


Answer 11

The accepted answer is fine, but I actually prefer format(number,','). Easier for me to interpret and remember.

https://docs.python.org/3/library/functions.html#format


Answer 12

Python 3

Integers (without decimal):

"{:,d}".format(1234567)

Floats (with decimal):

"{:,.2f}".format(1234567)

where the number before f specifies the number of decimal places.

Bonus

Quick-and-dirty starter function for the Indian lakhs/crores numbering system (12,34,567):

https://stackoverflow.com/a/44832241/4928578
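
As a rough illustration of that grouping (this is not the code behind the link, which is not reproduced here), the sketch below keeps the last three digits together and groups the remaining digits in pairs. The name indian_grouping is hypothetical, and only integers are handled.

```python
def indian_grouping(n):
    """Group an integer as in the lakh/crore system, e.g. 12,34,567."""
    sign = '-' if n < 0 else ''
    digits = str(abs(n))
    if len(digits) <= 3:
        return sign + digits
    head, tail = digits[:-3], digits[-3:]  # the last three digits stay together
    pairs = []
    while head:                            # peel off pairs from the right
        pairs.append(head[-2:])
        head = head[:-2]
    return sign + ','.join(reversed(pairs)) + ',' + tail


print(indian_grouping(1234567))
```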


Answer 13

From Python 2.7 (the first 2.x release to support the ',' format option) you can do this:

def format_builtin(n):
    return format(n, ',')

For Python versions < 2.7, and just for your information, here are two manual solutions. They truncate floats to ints, but negative numbers work correctly:

def format_number_using_lists(number):
    string = '%d' % number
    result_list = list(string)
    indexes = range(len(string))
    for index in indexes[::-3][1:]:
        if result_list[index] != '-':
            result_list.insert(index+1, ',')
    return ''.join(result_list)

A few things to notice here:

  • this line: string = '%d' % number neatly converts a number to a string; it supports negatives and drops the fractional part of floats, turning them into ints;
  • the slice indexes[::-3] returns every third index starting from the end, so I used another slice, [1:], to drop the very last index, because no comma is needed after the last digit;
  • the conditional if result_list[index] != '-' supports negative numbers: it avoids inserting a comma after the minus sign.

And a more hardcore version:

def format_number_using_generators_and_list_comprehensions(number):
    string = '%d' % number
    generator = reversed( 
        [
            value+',' if (index!=0 and value!='-' and index%3==0) else value
            for index,value in enumerate(reversed(string))
        ]
    )
    return ''.join(generator)

Answer 14

I am a Python beginner, but an experienced programmer. I have Python 3.5, so I can just use the comma, but this is nonetheless an interesting programming exercise. Consider the case of an unsigned integer. The most readable Python program for adding thousands separators appears to be:

def add_commas(instr):
    out = [instr[0]]
    for i in range(1, len(instr)):
        if (len(instr) - i) % 3 == 0:
            out.append(',')
        out.append(instr[i])
    return ''.join(out)

It is also possible to use a list comprehension:

def add_commas(instr):
    rng = reversed(range(1, len(instr) + (len(instr) - 1)//3 + 1))
    out = [',' if j%4 == 0 else instr[-(j - j//4)] for j in rng]
    return ''.join(out)

This is shorter, and could be a one liner, but you will have to do some mental gymnastics to understand why it works. In both cases we get:

for i in range(1, 11):
    instr = '1234567890'[:i]
    print(instr, add_commas(instr))
1 1
12 12
123 123
1234 1,234
12345 12,345
123456 123,456
1234567 1,234,567
12345678 12,345,678
123456789 123,456,789
1234567890 1,234,567,890

The first version is the more sensible choice, if you want the program to be understood.


Answer 15

Here’s one that works for floats too:

def float2comma(f):
    s = str(abs(f)) # Convert to a string
    decimalposition = s.find(".") # Look for decimal point
    if decimalposition == -1:
        decimalposition = len(s) # If no decimal, then just work from the end
    out = "" 
    for i in range(decimalposition+1, len(s)): # do the decimal
        if not (i-decimalposition-1) % 3 and i-decimalposition-1: out = out+","
        out = out+s[i]      
    if len(out):
        out = "."+out # add the decimal point if necessary
    for i in range(decimalposition-1,-1,-1): # working backwards from decimal point
        if not (decimalposition-i-1) % 3 and decimalposition-i-1: out = ","+out
        out = s[i]+out      
    if f < 0:
        out = "-"+out
    return out

Usage Example:

>>> float2comma(10000.1111)
'10,000.111,1'
>>> float2comma(656565.122)
'656,565.122'
>>> float2comma(-656565.122)
'-656,565.122'

Answer 16

One liner for Python 2.5+ and Python 3 (positive int only):

''.join(reversed([x + (',' if i and not i % 3 else '') for i, x in enumerate(reversed(str(1234567)))]))

Answer 17

Universal solution

I found some issues with the dot separator in the previous top-voted answers, so I designed a universal solution where you can use whatever you want as the thousands separator without modifying the locale. I know it’s not the most elegant solution, but it gets the job done. Feel free to improve it!

def format_integer(number, thousand_separator='.'):
    def reverse(string):
        string = "".join(reversed(string))
        return string

    s = reverse(str(number))
    count = 0
    result = ''
    for char in s:
        count = count + 1
        if count % 3 == 0:
            if len(s) == count:
                result = char + result
            else:
                result = thousand_separator + char + result
        else:
            result = char + result
    return result


print(format_integer(50))
# 50
print(format_integer(500))
# 500
print(format_integer(50000))
# 50.000
print(format_integer(50000000))
# 50.000.000

Answer 18

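This formats money, including the commas:

def format_money(money, presym='$', postsym=''):
    fmt = '%0.2f' % money
    dot = fmt.find('.')     # index of the decimal point
    ret = []
    if money < 0 :
        ret.append('(')     # negatives are shown in parentheses
        p0 = 1              # skip the minus sign
    else :
        p0 = 0
    ret.append(presym)
    # The leading group has 1 to 3 digits; "or 3" avoids an empty first group
    # (and a stray leading comma) when the digit count is a multiple of 3
    p1 = ((dot-p0) % 3 or 3) + p0
    while True :
        ret.append(fmt[p0:p1])
        if p1 == dot : break
        ret.append(',')
        p0 = p1
        p1 += 3
    ret.append(fmt[dot:])   # decimals
    ret.append(postsym)
    if money < 0 : ret.append(')')
    return ''.join(ret)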

Answer 19

I have a python 2 and python 3 version of this code. I know that the question was asked for python 2 but now (8 years later lol) people will probably be using python 3.

Python 3 Code:

import random
number = str(random.randint(1, 10000000))
comma_placement = 4
print('The original number is: {}. '.format(number))
while True:
    if len(number) % 3 == 0:
        for i in range(0, len(number) // 3 - 1):
            number = number[0:len(number) - comma_placement + 1] + ',' + number[len(number) - comma_placement + 1:]
            comma_placement = comma_placement + 4
    else:
        for i in range(0, len(number) // 3):
            number = number[0:len(number) - comma_placement + 1] + ',' + number[len(number) - comma_placement + 1:]
            comma_placement = comma_placement + 4  # move past the comma just inserted
    break
print('The new and improved number is: {}'.format(number))        


Python 2 Code: (Edit. The python 2 code isn’t working. I am thinking that the syntax is different).

import random
number = str(random.randint(1, 10000000))
comma_placement = 4
print 'The original number is: %s.' % (number)
while True:
    if len(number) % 3 == 0:
        for i in range(0, len(number) // 3 - 1):
            number = number[0:len(number) - comma_placement + 1] + ',' + number[len(number) - comma_placement + 1:]
            comma_placement = comma_placement + 4
    else:
        for i in range(0, len(number) // 3):
            number = number[0:len(number) - comma_placement + 1] + ',' + number[len(number) - comma_placement + 1:]
    break
print 'The new and improved number is: %s.' % (number) 

Answer 20

from __future__ import with_statement
from contextlib import contextmanager
import re,time

re_first_num = re.compile(r"\d")
def intcomma_noregex(value):
    end_offset, start_digit, period = len(value),re_first_num.search(value).start(),value.rfind('.')
    if period == -1:
        period=end_offset
    segments,_from_index,leftover = [],0,(period-start_digit) % 3
    for _index in xrange(start_digit+3 if not leftover else start_digit+leftover,period,3):
        segments.append(value[_from_index:_index])
        _from_index=_index
    if not segments:
        return value
    segments.append(value[_from_index:])
    return ','.join(segments)

def intcomma_noregex_reversed(value):
    end_offset, start_digit, period = len(value),re_first_num.search(value).start(),value.rfind('.')
    if period == -1:
        period=end_offset
    _from_index,segments = end_offset,[]
    for _index in xrange(period-3,start_digit,-3):
        segments.append(value[_index:_from_index])
        _from_index=_index
    if not segments:
        return value
    segments.append(value[:_from_index])
    return ','.join(reversed(segments))

re_3digits = re.compile(r'(?<=\d)\d{3}(?!\d)')
def intcomma(value):
    segments,last_endoffset=[],len(value)
    while last_endoffset > 3:
        digit_group = re_3digits.search(value,0,last_endoffset)
        if not digit_group:
            break
        segments.append(value[digit_group.start():last_endoffset])
        last_endoffset=digit_group.start()
    if not segments:
        return value
    if last_endoffset:
        segments.append(value[:last_endoffset])
    return ','.join(reversed(segments))

def intcomma_recurs(value):
    """
    Converts an integer to a string containing commas every three digits.
    For example, 3000 becomes '3,000' and 45000 becomes '45,000'.
    """
    new = re.sub("^(-?\d+)(\d{3})", '\g<1>,\g<2>', str(value))
    if value == new:
        return new
    else:
        return intcomma(new)

@contextmanager
def timed(save_time_func):
    begin=time.time()
    try:
        yield
    finally:
        save_time_func(time.time()-begin)

def testset_xsimple(func):
    func('5')

def testset_simple(func):
    func('567')

def testset_onecomma(func):
    func('567890')

def testset_complex(func):
    func('-1234567.024')

def testset_average(func):
    func('-1234567.024')
    func('567')
    func('5674')

if __name__ == '__main__':
    print 'Test results:'
    for test_data in ('5','567','1234','1234.56','-253892.045'):
        for func in (intcomma,intcomma_noregex,intcomma_noregex_reversed,intcomma_recurs):
            print func.__name__,test_data,func(test_data)
    times=[]
    def overhead(x):
        pass
    for test_run in xrange(1,4):
        for func in (intcomma,intcomma_noregex,intcomma_noregex_reversed,intcomma_recurs,overhead):
            for testset in (testset_xsimple,testset_simple,testset_onecomma,testset_complex,testset_average):
                for x in xrange(1000): # prime the test
                    testset(func)
                with timed(lambda x:times.append(((test_run,func,testset),x))):
                    for x in xrange(50000):
                        testset(func)
    for (test_run,func,testset),_delta in times:
        print test_run,func.__name__,testset.__name__,_delta

Here are the test results:

intcomma 5 5
intcomma_noregex 5 5
intcomma_noregex_reversed 5 5
intcomma_recurs 5 5
intcomma 567 567
intcomma_noregex 567 567
intcomma_noregex_reversed 567 567
intcomma_recurs 567 567
intcomma 1234 1,234
intcomma_noregex 1234 1,234
intcomma_noregex_reversed 1234 1,234
intcomma_recurs 1234 1,234
intcomma 1234.56 1,234.56
intcomma_noregex 1234.56 1,234.56
intcomma_noregex_reversed 1234.56 1,234.56
intcomma_recurs 1234.56 1,234.56
intcomma -253892.045 -253,892.045
intcomma_noregex -253892.045 -253,892.045
intcomma_noregex_reversed -253892.045 -253,892.045
intcomma_recurs -253892.045 -253,892.045
1 intcomma testset_xsimple 0.0410001277924
1 intcomma testset_simple 0.0369999408722
1 intcomma testset_onecomma 0.213000059128
1 intcomma testset_complex 0.296000003815
1 intcomma testset_average 0.503000020981
1 intcomma_noregex testset_xsimple 0.134000062943
1 intcomma_noregex testset_simple 0.134999990463
1 intcomma_noregex testset_onecomma 0.190999984741
1 intcomma_noregex testset_complex 0.209000110626
1 intcomma_noregex testset_average 0.513000011444
1 intcomma_noregex_reversed testset_xsimple 0.124000072479
1 intcomma_noregex_reversed testset_simple 0.12700009346
1 intcomma_noregex_reversed testset_onecomma 0.230000019073
1 intcomma_noregex_reversed testset_complex 0.236999988556
1 intcomma_noregex_reversed testset_average 0.56299996376
1 intcomma_recurs testset_xsimple 0.348000049591
1 intcomma_recurs testset_simple 0.34600019455
1 intcomma_recurs testset_onecomma 0.625
1 intcomma_recurs testset_complex 0.773999929428
1 intcomma_recurs testset_average 1.6890001297
1 overhead testset_xsimple 0.0179998874664
1 overhead testset_simple 0.0190000534058
1 overhead testset_onecomma 0.0190000534058
1 overhead testset_complex 0.0190000534058
1 overhead testset_average 0.0309998989105
2 intcomma testset_xsimple 0.0360000133514
2 intcomma testset_simple 0.0369999408722
2 intcomma testset_onecomma 0.207999944687
2 intcomma testset_complex 0.302000045776
2 intcomma testset_average 0.523000001907
2 intcomma_noregex testset_xsimple 0.139999866486
2 intcomma_noregex testset_simple 0.141000032425
2 intcomma_noregex testset_onecomma 0.203999996185
2 intcomma_noregex testset_complex 0.200999975204
2 intcomma_noregex testset_average 0.523000001907
2 intcomma_noregex_reversed testset_xsimple 0.130000114441
2 intcomma_noregex_reversed testset_simple 0.129999876022
2 intcomma_noregex_reversed testset_onecomma 0.236000061035
2 intcomma_noregex_reversed testset_complex 0.241999864578
2 intcomma_noregex_reversed testset_average 0.582999944687
2 intcomma_recurs testset_xsimple 0.351000070572
2 intcomma_recurs testset_simple 0.352999925613
2 intcomma_recurs testset_onecomma 0.648999929428
2 intcomma_recurs testset_complex 0.808000087738
2 intcomma_recurs testset_average 1.81900000572
2 overhead testset_xsimple 0.0189998149872
2 overhead testset_simple 0.0189998149872
2 overhead testset_onecomma 0.0190000534058
2 overhead testset_complex 0.0179998874664
2 overhead testset_average 0.0299999713898
3 intcomma testset_xsimple 0.0360000133514
3 intcomma testset_simple 0.0360000133514
3 intcomma testset_onecomma 0.210000038147
3 intcomma testset_complex 0.305999994278
3 intcomma testset_average 0.493000030518
3 intcomma_noregex testset_xsimple 0.131999969482
3 intcomma_noregex testset_simple 0.136000156403
3 intcomma_noregex testset_onecomma 0.192999839783
3 intcomma_noregex testset_complex 0.202000141144
3 intcomma_noregex testset_average 0.509999990463
3 intcomma_noregex_reversed testset_xsimple 0.125999927521
3 intcomma_noregex_reversed testset_simple 0.126999855042
3 intcomma_noregex_reversed testset_onecomma 0.235999822617
3 intcomma_noregex_reversed testset_complex 0.243000030518
3 intcomma_noregex_reversed testset_average 0.56200003624
3 intcomma_recurs testset_xsimple 0.337000131607
3 intcomma_recurs testset_simple 0.342000007629
3 intcomma_recurs testset_onecomma 0.609999895096
3 intcomma_recurs testset_complex 0.75
3 intcomma_recurs testset_average 1.68300008774
3 overhead testset_xsimple 0.0189998149872
3 overhead testset_simple 0.018000125885
3 overhead testset_onecomma 0.018000125885
3 overhead testset_complex 0.0179998874664
3 overhead testset_average 0.0299999713898

I’m using python 2.5 so I don’t have access to the built-in formatting.

I looked at Django’s intcomma (intcomma_recurs in the code below) and realized it’s inefficient: it’s recursive, and compiling the regex on every call is not a good thing either. This is not necessarily an ‘issue’, as Django isn’t really THAT focused on this kind of low-level performance. Also, I was expecting a factor of 10 difference in performance, but it’s only about 3 times slower.

Out of curiosity I implemented a few versions of intcomma to see what the performance advantages are when using regex. My test data shows a slight advantage for regex on this task, but surprisingly not much at all.

I also was pleased to see what I suspected: using the reverse xrange approach is unnecessary in the no-regex case, but it does make the code look slightly better at the cost of ~10% performance.

Also, I assume what you’re passing in is a string that looks somewhat like a number. Results are undefined otherwise.

from __future__ import with_statement
from contextlib import contextmanager
import re,time

re_first_num = re.compile(r"\d")
def intcomma_noregex(value):
    end_offset, start_digit, period = len(value),re_first_num.search(value).start(),value.rfind('.')
    if period == -1:
        period=end_offset
    segments,_from_index,leftover = [],0,(period-start_digit) % 3
    for _index in xrange(start_digit+3 if not leftover else start_digit+leftover,period,3):
        segments.append(value[_from_index:_index])
        _from_index=_index
    if not segments:
        return value
    segments.append(value[_from_index:])
    return ','.join(segments)

def intcomma_noregex_reversed(value):
    end_offset, start_digit, period = len(value),re_first_num.search(value).start(),value.rfind('.')
    if period == -1:
        period=end_offset
    _from_index,segments = end_offset,[]
    for _index in xrange(period-3,start_digit,-3):
        segments.append(value[_index:_from_index])
        _from_index=_index
    if not segments:
        return value
    segments.append(value[:_from_index])
    return ','.join(reversed(segments))

re_3digits = re.compile(r'(?<=\d)\d{3}(?!\d)')
def intcomma(value):
    segments,last_endoffset=[],len(value)
    while last_endoffset > 3:
        digit_group = re_3digits.search(value,0,last_endoffset)
        if not digit_group:
            break
        segments.append(value[digit_group.start():last_endoffset])
        last_endoffset=digit_group.start()
    if not segments:
        return value
    if last_endoffset:
        segments.append(value[:last_endoffset])
    return ','.join(reversed(segments))

def intcomma_recurs(value):
    """
    Converts an integer to a string containing commas every three digits.
    For example, 3000 becomes '3,000' and 45000 becomes '45,000'.
    """
    new = re.sub("^(-?\d+)(\d{3})", '\g<1>,\g<2>', str(value))
    if value == new:
        return new
    else:
        return intcomma(new)

@contextmanager
def timed(save_time_func):
    begin=time.time()
    try:
        yield
    finally:
        save_time_func(time.time()-begin)

def testset_xsimple(func):
    func('5')

def testset_simple(func):
    func('567')

def testset_onecomma(func):
    func('567890')

def testset_complex(func):
    func('-1234567.024')

def testset_average(func):
    func('-1234567.024')
    func('567')
    func('5674')

if __name__ == '__main__':
    print 'Test results:'
    for test_data in ('5','567','1234','1234.56','-253892.045'):
        for func in (intcomma,intcomma_noregex,intcomma_noregex_reversed,intcomma_recurs):
            print func.__name__,test_data,func(test_data)
    times=[]
    def overhead(x):
        pass
    for test_run in xrange(1,4):
        for func in (intcomma,intcomma_noregex,intcomma_noregex_reversed,intcomma_recurs,overhead):
            for testset in (testset_xsimple,testset_simple,testset_onecomma,testset_complex,testset_average):
                for x in xrange(1000): # prime the test
                    testset(func)
                with timed(lambda x:times.append(((test_run,func,testset),x))):
                    for x in xrange(50000):
                        testset(func)
    for (test_run,func,testset),_delta in times:
        print test_run,func.__name__,testset.__name__,_delta

And here are the test results:

intcomma 5 5
intcomma_noregex 5 5
intcomma_noregex_reversed 5 5
intcomma_recurs 5 5
intcomma 567 567
intcomma_noregex 567 567
intcomma_noregex_reversed 567 567
intcomma_recurs 567 567
intcomma 1234 1,234
intcomma_noregex 1234 1,234
intcomma_noregex_reversed 1234 1,234
intcomma_recurs 1234 1,234
intcomma 1234.56 1,234.56
intcomma_noregex 1234.56 1,234.56
intcomma_noregex_reversed 1234.56 1,234.56
intcomma_recurs 1234.56 1,234.56
intcomma -253892.045 -253,892.045
intcomma_noregex -253892.045 -253,892.045
intcomma_noregex_reversed -253892.045 -253,892.045
intcomma_recurs -253892.045 -253,892.045
1 intcomma testset_xsimple 0.0410001277924
1 intcomma testset_simple 0.0369999408722
1 intcomma testset_onecomma 0.213000059128
1 intcomma testset_complex 0.296000003815
1 intcomma testset_average 0.503000020981
1 intcomma_noregex testset_xsimple 0.134000062943
1 intcomma_noregex testset_simple 0.134999990463
1 intcomma_noregex testset_onecomma 0.190999984741
1 intcomma_noregex testset_complex 0.209000110626
1 intcomma_noregex testset_average 0.513000011444
1 intcomma_noregex_reversed testset_xsimple 0.124000072479
1 intcomma_noregex_reversed testset_simple 0.12700009346
1 intcomma_noregex_reversed testset_onecomma 0.230000019073
1 intcomma_noregex_reversed testset_complex 0.236999988556
1 intcomma_noregex_reversed testset_average 0.56299996376
1 intcomma_recurs testset_xsimple 0.348000049591
1 intcomma_recurs testset_simple 0.34600019455
1 intcomma_recurs testset_onecomma 0.625
1 intcomma_recurs testset_complex 0.773999929428
1 intcomma_recurs testset_average 1.6890001297
1 overhead testset_xsimple 0.0179998874664
1 overhead testset_simple 0.0190000534058
1 overhead testset_onecomma 0.0190000534058
1 overhead testset_complex 0.0190000534058
1 overhead testset_average 0.0309998989105
2 intcomma testset_xsimple 0.0360000133514
2 intcomma testset_simple 0.0369999408722
2 intcomma testset_onecomma 0.207999944687
2 intcomma testset_complex 0.302000045776
2 intcomma testset_average 0.523000001907
2 intcomma_noregex testset_xsimple 0.139999866486
2 intcomma_noregex testset_simple 0.141000032425
2 intcomma_noregex testset_onecomma 0.203999996185
2 intcomma_noregex testset_complex 0.200999975204
2 intcomma_noregex testset_average 0.523000001907
2 intcomma_noregex_reversed testset_xsimple 0.130000114441
2 intcomma_noregex_reversed testset_simple 0.129999876022
2 intcomma_noregex_reversed testset_onecomma 0.236000061035
2 intcomma_noregex_reversed testset_complex 0.241999864578
2 intcomma_noregex_reversed testset_average 0.582999944687
2 intcomma_recurs testset_xsimple 0.351000070572
2 intcomma_recurs testset_simple 0.352999925613
2 intcomma_recurs testset_onecomma 0.648999929428
2 intcomma_recurs testset_complex 0.808000087738
2 intcomma_recurs testset_average 1.81900000572
2 overhead testset_xsimple 0.0189998149872
2 overhead testset_simple 0.0189998149872
2 overhead testset_onecomma 0.0190000534058
2 overhead testset_complex 0.0179998874664
2 overhead testset_average 0.0299999713898
3 intcomma testset_xsimple 0.0360000133514
3 intcomma testset_simple 0.0360000133514
3 intcomma testset_onecomma 0.210000038147
3 intcomma testset_complex 0.305999994278
3 intcomma testset_average 0.493000030518
3 intcomma_noregex testset_xsimple 0.131999969482
3 intcomma_noregex testset_simple 0.136000156403
3 intcomma_noregex testset_onecomma 0.192999839783
3 intcomma_noregex testset_complex 0.202000141144
3 intcomma_noregex testset_average 0.509999990463
3 intcomma_noregex_reversed testset_xsimple 0.125999927521
3 intcomma_noregex_reversed testset_simple 0.126999855042
3 intcomma_noregex_reversed testset_onecomma 0.235999822617
3 intcomma_noregex_reversed testset_complex 0.243000030518
3 intcomma_noregex_reversed testset_average 0.56200003624
3 intcomma_recurs testset_xsimple 0.337000131607
3 intcomma_recurs testset_simple 0.342000007629
3 intcomma_recurs testset_onecomma 0.609999895096
3 intcomma_recurs testset_complex 0.75
3 intcomma_recurs testset_average 1.68300008774
3 overhead testset_xsimple 0.0189998149872
3 overhead testset_simple 0.018000125885
3 overhead testset_onecomma 0.018000125885
3 overhead testset_complex 0.0179998874664
3 overhead testset_average 0.0299999713898

Answer 21

This is baked into Python per PEP 378: https://www.python.org/dev/peps/pep-0378/

Just use format(1000, ',d') to show an integer with a thousands separator.

There are more formats described in the PEP, have at it.
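
As a small illustration of the format specifier mentioned here, the sketch below applies the same "," option to an integer, a float and an f-string. The variable names are made up for the example, and the f-string line assumes Python 3.6 or later.

```python
# The "," option of the format mini-language (PEP 378) inserts thousands separators.
value = 1234567

as_int = format(value, ',d')             # integer with separators
as_float = format(1234567.891, ',.2f')   # also works with a precision
as_fstring = f"{value:,}"                # same option inside an f-string (3.6+)

print(as_int, as_float, as_fstring)
```

The separator is always a comma with this option; locale-aware grouping needs the "n" format type instead.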


Answer 22

Here is another variant using a generator function that works for integers:

def ncomma(num):
    def _helper(num):
        # assert isinstance(numstr, basestring)
        numstr = '%d' % num
        for ii, digit in enumerate(reversed(numstr)):
            if ii and ii % 3 == 0 and digit.isdigit():
                yield ','
            yield digit

    return ''.join(reversed([n for n in _helper(num)]))

And here’s a test:

>>> for i in (0, 99, 999, 9999, 999999, 1000000, -1, -111, -1111, -111111, -1000000):
...     print i, ncomma(i)
... 
0 0
99 99
999 999
9999 9,999
999999 999,999
1000000 1,000,000
-1 -1
-111 -111
-1111 -1,111
-111111 -111,111
-1000000 -1,000,000

Answer 23

Just subclass long (or float, or whatever). This is highly practical, because this way you can still use your numbers in math ops (and therefore existing code), but they will all print nicely in your terminal.

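>>> class number(long):

        def __repr__(self):
            s = str(self)
            # Keep only the digits; the sign is added back at the end.
            l = [x for x in s if x in '1234567890']
            # Step over the digits (len(l), not len(s)-1) so that
            # 7-digit positive numbers also get both commas.
            for x in reversed(range(len(l))[::3]):
                l.insert(-x, ',')
            l = ''.join(l[1:])  # drop the leading comma from insert(-0, ',')
            return ('-'+l if self < 0 else l)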

>>> number(-100000)
-100,000
>>> number(-100)
-100
>>> number(-12345)
-12,345
>>> number(928374)
928,374
>>> 345
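
The class above relies on Python 2's long type. For Python 3, a minimal sketch of the same idea might look like the following; the class name Number is hypothetical, and it leans on the built-in "," format option rather than inserting commas by hand.

```python
class Number(int):
    """int subclass whose repr shows thousands separators (illustrative name)."""

    def __repr__(self):
        # Convert to a plain int first so format() cannot recurse into __repr__.
        return format(int(self), ',d')


print(repr(Number(-100000)))
print(repr(Number(1234567)))
print(Number(928374) + 1)  # arithmetic still works, but yields a plain int
```

Note that results of arithmetic are ordinary ints, so they print without separators unless wrapped in Number again.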

Answer 24

Italy:

>>> import locale
>>> locale.setlocale(locale.LC_ALL,"")
'Italian_Italy.1252'
>>> f"{1000:n}"
'1.000'

Answer 25

For floats:

float(filter(lambda x: x!=',', '1,234.52'))
# returns 1234.52

For ints:

int(filter(lambda x: x!=',', '1,234'))
# returns 1234
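
The filter() calls above work on Python 2, where filtering a str returns a str. On Python 3 filter() returns an iterator, so float(filter(...)) fails. A sketch that behaves the same on both versions, using str.replace (the helper names are made up for the example):

```python
def parse_float(text):
    # Strip the thousands separators, then convert.
    return float(text.replace(',', ''))


def parse_int(text):
    return int(text.replace(',', ''))


print(parse_float('1,234.52'))
print(parse_int('1,234'))
```

This assumes the comma is only ever a thousands separator; locales that use a comma as the decimal mark need locale.atof instead.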

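What are the differences between the urllib, urllib2, urllib3 and requests modules?

Question: What are the differences between the urllib, urllib2, urllib3 and requests modules?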

In Python, what are the differences between the urllib, urllib2, urllib3 and requests modules? Why are there three? They seem to do the same thing…


Answer 0

I know it’s been said already, but I’d highly recommend the requests Python package.

If you’ve used languages other than Python, you probably think urllib and urllib2 are easy to use, need little code, and are highly capable; that’s how I used to think. But the requests package is so unbelievably useful and concise that everyone should be using it.

First, it supports a fully restful API, and is as easy as:

import requests

resp = requests.get('http://www.mywebsite.com/user')
resp = requests.post('http://www.mywebsite.com/user')
resp = requests.put('http://www.mywebsite.com/user/put')
resp = requests.delete('http://www.mywebsite.com/user/delete')

Whether it is a GET or a POST, you never have to encode parameters again; it simply takes a dictionary as an argument and is good to go:

userdata = {"firstname": "John", "lastname": "Doe", "password": "jdoe123"}
resp = requests.post('http://www.mywebsite.com/user', data=userdata)

Plus it even has a built-in JSON decoder (again, I know json.loads() isn’t a lot more to write, but this sure is convenient):

resp.json()

Or if your response data is just text, use:

resp.text

This is just the tip of the iceberg. This is the list of features from the requests site:

  • International Domains and URLs
  • Keep-Alive & Connection Pooling
  • Sessions with Cookie Persistence
  • Browser-style SSL Verification
  • Basic/Digest Authentication
  • Elegant Key/Value Cookies
  • Automatic Decompression
  • Unicode Response Bodies
  • Multipart File Uploads
  • Connection Timeouts
  • .netrc support
  • Python 2.6—3.4
  • Thread-safe.

Answer 1

urllib2 provides some extra functionality, namely the urlopen() function can allow you to specify headers (normally you’d have had to use httplib in the past, which is far more verbose.) More importantly though, urllib2 provides the Request class, which allows for a more declarative approach to doing a request:

import urllib
from urllib2 import Request, urlopen

r = Request(url='http://www.mysite.com')
r.add_header('User-Agent', 'awesome fetcher')
r.add_data(urllib.urlencode({'foo': 'bar'}))
response = urlopen(r)

Note that urlencode() is only in urllib, not urllib2.

There are also handlers for implementing more advanced URL support in urllib2. The short answer is, unless you’re working with legacy code, you probably want to use the URL opener from urllib2, but you still need to import urllib for some of the utility functions.

Bonus answer: With Google App Engine, you can use any of httplib, urllib or urllib2, but all of them are just wrappers for Google’s URL Fetch API. That is, you are still subject to the same limitations such as ports, protocols, and the length of the response allowed. You can use the core of the libraries as you would expect for retrieving HTTP URLs, though.
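
For readers on Python 3, where urllib2 was folded into urllib.request, a rough equivalent of the Request example above is sketched here. It only builds the request and does not send it, and the URL is the same placeholder used in the answer.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Request.add_data() was removed in Python 3.4, so the body is passed
# as bytes through the `data` argument instead.
body = urlencode({'foo': 'bar'}).encode('ascii')
req = Request(url='http://www.mysite.com', data=body)
req.add_header('User-Agent', 'awesome fetcher')

print(req.get_method())  # a request that carries data defaults to POST
print(req.data)
# urllib.request.urlopen(req) would send it; left out to avoid network access.
```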


Answer 2

urllib and urllib2 are both Python modules that do URL request related stuff but offer different functionalities.

1) urllib2 can accept a Request object to set the headers for a URL request, urllib accepts only a URL.

2) urllib provides the urlencode method which is used for the generation of GET query strings, urllib2 doesn’t have such a function. This is one of the reasons why urllib is often used along with urllib2.

Requests: Requests is a simple, easy-to-use HTTP library written in Python.

1) Requests encodes the parameters automatically, so you just pass them as simple arguments, unlike urllib, where you need to call urllib.urlencode() to encode the parameters before passing them.

2) It automatically decodes the response into Unicode.

3) Requests also has far more convenient error handling. If your authentication failed, urllib2 would raise a urllib2.URLError, while Requests would return a normal response object, as expected. To see whether the request was successful, check the boolean response.ok.


Answer 3

One considerable difference concerns porting from Python 2 to Python 3. urllib2 does not exist in Python 3; its functionality was moved into urllib (urllib.request and urllib.error). So if you are using it heavily and want to migrate to Python 3 in the future, consider using urllib. However, the 2to3 tool will automatically do most of the work for you.


Answer 4

Just to add to the existing answers, I don’t see anyone mentioning that requests is not part of the standard library. If you are OK with adding dependencies, then requests is fine. However, if you are trying to avoid adding dependencies, urllib is a standard-library module that is already available to you.


Answer 5

I like the urllib.urlencode function, and it doesn’t appear to exist in urllib2.

>>> urllib.urlencode({'abc':'d f', 'def': '-!2'})
'abc=d+f&def=-%212'
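
In Python 3 the same function lives in urllib.parse. A short sketch, reusing the dictionary from the example above and adding parse_qs to show the round trip:

```python
from urllib.parse import parse_qs, urlencode

query = urlencode({'abc': 'd f', 'def': '-!2'})
print(query)

# parse_qs reverses the encoding; every value comes back as a list.
decoded = parse_qs(query)
print(decoded)
```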

Answer 6

To get the content of a url:

try: # Try importing requests first.
    import requests
except ImportError:
    try: # Try importing Python3 urllib
        import urllib.request
    except ImportError: # Now importing Python2 urllib
        import urllib


def get_content(url):
    try:  # Using requests.
        return requests.get(url).content # .content is the body of a requests.models.Response.
    except NameError:
        try: # Using Python3 urllib.
            with urllib.request.urlopen(url) as response:
                return response.read() # response is an http.client.HTTPResponse.
        except AttributeError: # Using Python2 urllib.
            return urllib.urlopen(url).read() # urlopen() returns an instance.

It’s hard to write code that handles the responses across Python 2, Python 3 and the optional requests dependency, because the urlopen() functions and requests.get() return different types:

  • Python3 urllib.request.urlopen() returns an http.client.HTTPResponse
  • Python2 urllib.urlopen(url) returns an instance
  • requests.get(url) returns a requests.models.Response

Answer 7

You should generally use urllib2, since this makes things a bit easier at times by accepting Request objects and will also raise a URLError on protocol errors. With Google App Engine though, you can’t use either. You have to use the URL Fetch API that Google provides in its sandboxed Python environment.


Answer 8

A key point that I find missing in the above answers is that urllib returns an object of type <class 'http.client.HTTPResponse'> whereas requests returns <class 'requests.models.Response'>.

Because of this, the read() method can be used with urllib but not with requests.

P.S.: requests is already so rich in methods that it hardly needs one more such as read() ;>


How to drop rows of Pandas DataFrame whose value in a certain column is NaN

Question: How to drop rows of Pandas DataFrame whose value in a certain column is NaN

I have this DataFrame and want only the records whose EPS column is not NaN:

>>> df
                 STK_ID  EPS  cash
STK_ID RPT_Date                   
601166 20111231  601166  NaN   NaN
600036 20111231  600036  NaN    12
600016 20111231  600016  4.3   NaN
601009 20111231  601009  NaN   NaN
601939 20111231  601939  2.5   NaN
000001 20111231  000001  NaN   NaN

…i.e. something like df.drop(....) to get this resulting dataframe:

                  STK_ID  EPS  cash
STK_ID RPT_Date                   
600016 20111231  600016  4.3   NaN
601939 20111231  601939  2.5   NaN

How do I do that?


Answer 0

Don’t drop, just take the rows where EPS is not NA:

df = df[df['EPS'].notna()]
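
To see the mask at work, here is a small self-contained sketch. The frame is a flattened stand-in for the one in the question (the MultiIndex is left out for brevity), and it assumes pandas 0.21 or later, where Series.notna is available.

```python
import numpy as np
import pandas as pd

# Values copied from the question; the index is simplified.
df = pd.DataFrame({
    'STK_ID': ['601166', '600036', '600016', '601009', '601939', '000001'],
    'EPS': [np.nan, np.nan, 4.3, np.nan, 2.5, np.nan],
    'cash': [np.nan, 12, np.nan, np.nan, np.nan, np.nan],
})

kept = df[df['EPS'].notna()]       # boolean mask: True where EPS is present
same = df.dropna(subset=['EPS'])   # built-in that gives the same rows
print(kept)
```

Rows whose cash is NaN survive, because only the EPS column is tested.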

Answer 1

This question is already resolved, but…

…also consider the solution suggested by Wouter in his original comment. The ability to handle missing data, including dropna(), is built into pandas explicitly. Aside from potentially improved performance over doing it manually, these functions also come with a variety of options which may be useful.

In [24]: df = pd.DataFrame(np.random.randn(10,3))

In [25]: df.iloc[::2,0] = np.nan; df.iloc[::4,1] = np.nan; df.iloc[::3,2] = np.nan;

In [26]: df
Out[26]:
          0         1         2
0       NaN       NaN       NaN
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN

In [27]: df.dropna()     #drop all rows that have any NaN values
Out[27]:
          0         1         2
1  2.677677 -1.466923 -0.750366
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295

In [28]: df.dropna(how='all')     #drop only if ALL columns are NaN
Out[28]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
4       NaN       NaN  0.050742
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
8       NaN       NaN  0.637482
9 -0.310130  0.078891       NaN

In [29]: df.dropna(thresh=2)   #Drop row if it does not have at least two values that are **not** NaN
Out[29]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
5 -1.250970  0.030561 -2.678622
7  0.049896 -0.308003  0.823295
9 -0.310130  0.078891       NaN

In [30]: df.dropna(subset=[1])   #Drop only if NaN in specific column (as asked in the question)
Out[30]:
          0         1         2
1  2.677677 -1.466923 -0.750366
2       NaN  0.798002 -0.906038
3  0.672201  0.964789       NaN
5 -1.250970  0.030561 -2.678622
6       NaN  1.036043       NaN
7  0.049896 -0.308003  0.823295
9 -0.310130  0.078891       NaN

There are also other options (See docs at http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html), including dropping columns instead of rows.

Pretty handy!


Answer 2

I know this has already been answered, but just for the sake of a purely pandas solution to this specific question as opposed to the general description from Aman (which was wonderful) and in case anyone else happens upon this:

import pandas as pd
df = df[pd.notnull(df['EPS'])]

Answer 3

You can use this:

df.dropna(subset=['EPS'], how='all', inplace=True)

Answer 4

Simplest of all solutions:

filtered_df = df[df['EPS'].notnull()]

The above solution is way better than using np.isfinite()


Answer 5

You could use dataframe method notnull or inverse of isnull, or numpy.isnan:

In [332]: df[df.EPS.notnull()]
Out[332]:
   STK_ID  RPT_Date  STK_ID.1  EPS  cash
2  600016  20111231    600016  4.3   NaN
4  601939  20111231    601939  2.5   NaN


In [334]: df[~df.EPS.isnull()]
Out[334]:
   STK_ID  RPT_Date  STK_ID.1  EPS  cash
2  600016  20111231    600016  4.3   NaN
4  601939  20111231    601939  2.5   NaN


In [347]: df[~np.isnan(df.EPS)]
Out[347]:
   STK_ID  RPT_Date  STK_ID.1  EPS  cash
2  600016  20111231    600016  4.3   NaN
4  601939  20111231    601939  2.5   NaN

Answer 6

Simple way:

df.dropna(subset=['EPS'],inplace=True)

Source: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.dropna.html


Answer 7

Yet another solution, which uses the fact that np.nan != np.nan:

In [149]: df.query("EPS == EPS")
Out[149]:
                 STK_ID  EPS  cash
STK_ID RPT_Date
600016 20111231  600016  4.3   NaN
601939 20111231  601939  2.5   NaN
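
The self-comparison trick can be checked on a tiny frame. This is only a sketch with made-up data (two columns, default index), comparing the query result against dropna on the same column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'EPS': [np.nan, 4.3, np.nan, 2.5],
                   'cash': [np.nan, np.nan, 12.0, np.nan]})

# NaN is the only value that is not equal to itself, so "EPS == EPS"
# is False exactly on the rows where EPS is missing.
kept = df.query("EPS == EPS")
print(kept)
```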

Answer 8

Another version:

df[~df['EPS'].isna()]

Answer 9

In datasets with a large number of columns, it’s even better to see how many columns contain null values and how many don’t.

print("No. of columns containing null values")
print(len(df.columns[df.isna().any()]))

print("No. of columns not containing null values")
print(len(df.columns[df.notna().all()]))

print("Total no. of columns in the dataframe")
print(len(df.columns))

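For example, my dataframe contained 82 columns, of which 19 contained at least one null value.

Further, you can also automatically remove columns and rows depending on which has more null values.
Here is code that does this with a simple heuristic: it drops every column whose null count exceeds the number of columns, then drops every remaining row that contains a null: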

df = df.drop(df.columns[df.isna().sum()>len(df.columns)],axis = 1)
df = df.dropna(axis = 0).reset_index(drop=True)

Note: The code above removes all of your null values. If you need the null values, process them beforehand.


Answer 10

It may be added that ‘&’ can be used to combine additional conditions, e.g.

df = df[(df.EPS > 2.0) & (df.EPS <4.0)]

Note that each condition must be wrapped in parentheses, because & binds more tightly than comparison operators such as > and <.
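
Why the parentheses matter can be shown with plain integers. This is only a sketch of Python's operator precedence, not pandas itself. With pandas Series the unparenthesized form typically raises an error rather than silently giving a wrong mask.

```python
x = 5

# Parenthesized: each comparison is evaluated first, then combined with &.
with_parens = (x > 2) & (x < 4)

# Unparenthesized: & binds tighter than > and <, so this parses as the
# chained comparison x > (2 & x) < 4, which is 5 > 0 < 4 here.
without_parens = x > 2 & x < 4

print(with_parens, without_parens)
```

The two expressions disagree for x = 5, which is the kind of surprise the parentheses prevent.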


Answer 11

For some reason none of the previously submitted answers worked for me. This basic solution did:

df = df[df.EPS >= 0]

Though of course that will drop rows with negative numbers, too. If you want to keep those, combine both comparisons in a single mask rather than filtering a second time; NaN fails both tests, so only the NaN rows are dropped:

df = df[(df.EPS >= 0) | (df.EPS <= 0)]

Answer 12

One solution is

df = df[df.isnull().sum(axis=1) <= Cutoff_value]

Another way is

df = df.dropna(thresh=(df.shape[1] - Cutoff_value))

Here Cutoff_value is the maximum number of null values allowed in a row. I hope these are useful.
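
The two forms select the same rows because "at most N nulls" is the same condition as "at least (number of columns - N) non-null values", which is what thresh counts. A rough sketch of that equivalence in plain Python, using hypothetical rows with None as the missing value and an assumed cutoff of 1:

```python
# Hypothetical rows; None marks a missing value. cutoff_value is an assumed name.
rows = [
    [1, 2, 3, 4],
    [1, None, 3, 4],
    [None, None, 3, 4],
    [None, None, None, 4],
]
n_columns = 4
cutoff_value = 1


def null_count(row):
    """Count the missing values in one row."""
    return sum(value is None for value in row)


# First form: keep rows with at most cutoff_value nulls.
by_null_count = [row for row in rows if null_count(row) <= cutoff_value]

# Second form: keep rows with at least (n_columns - cutoff_value) non-null values.
by_thresh = [
    row for row in rows
    if (n_columns - null_count(row)) >= (n_columns - cutoff_value)
]

print(by_null_count == by_thresh)
```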


Python `if x is not None` or `if not x is None`?

Question: Python `if x is not None` or `if not x is None`?

I’ve always thought of the if not x is None version as clearer, but Google’s style guide and PEP-8 both use if x is not None. Is there any minor performance difference (I’m assuming not), and is there any case where one really doesn’t fit (making the other a clear winner for my convention)?*

*I’m referring to any singleton, rather than just None.

…to compare singletons like None. Use is or is not.


Answer 0

There’s no performance difference, as they compile to the same bytecode:

Python 2.6.2 (r262:71600, Apr 15 2009, 07:20:39)
>>> import dis
>>> def f(x):
...    return x is not None
...
>>> dis.dis(f)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               0 (None)
              6 COMPARE_OP               9 (is not)
              9 RETURN_VALUE
>>> def g(x):
...   return not x is None
...
>>> dis.dis(g)
  2           0 LOAD_FAST                0 (x)
              3 LOAD_CONST               0 (None)
              6 COMPARE_OP               9 (is not)
              9 RETURN_VALUE

Stylistically, I try to avoid not x is y. Although the compiler will always treat it as not (x is y), a human reader might misunderstand the construct as (not x) is y. If I write x is not y then there is no ambiguity.
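
The listing above comes from Python 2.6. Opcode names, and whether the two forms compile to identical instructions, can differ between CPython versions, so here is a small sketch for checking your own interpreter. The helper name opnames is made up for this example.

```python
import dis


def f(x):
    return x is not None


def g(x):
    return not x is None


def opnames(func):
    """Return the instruction names of func's bytecode (hypothetical helper)."""
    return [ins.opname for ins in dis.get_instructions(func)]


# Inspect what this interpreter generates for each form.
print(opnames(f))
print(opnames(g))

# Whatever the bytecode looks like, the two functions must agree on every input.
same_behaviour = all(f(v) == g(v) for v in (None, 0, [], "a"))
print(same_behaviour)
```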


Answer 1

The convention in both Google’s and Python’s style guides is the best practice:

if x is not None:
    # Do something about x

Using not x can cause unwanted results.

See below:

>>> x = 1
>>> not x
False
>>> x = [1]
>>> not x
False
>>> x = 0
>>> not x
True
>>> x = [0]         # You don't want to fall in this one.
>>> not x
False

You may be interested to see what literals are evaluated to True or False in Python:


Edit for comment below:

I just did some more testing. not x is None doesn’t negate x first and then compare the result to None. In fact, the is operator has higher precedence than not when used that way:

>>> x
[0]
>>> not x is None
True
>>> not (x is None)
True
>>> (not x) is None
False

Therefore, not x is None is just, in my honest opinion, best avoided.


More edit:

I just did more testing and can confirm that bukzor’s comment is correct. (At least, I wasn’t able to prove it otherwise.)

This means if x is not None has exactly the same result as if not x is None. I stand corrected. Thanks bukzor.

However, my answer still stands: Use the conventional if x is not None. :]
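
A place where the difference between a truthiness test and a None test commonly bites is an optional argument whose valid values include 0 or an empty container. A minimal sketch, with hypothetical function names and an assumed default of 30:

```python
def pick_timeout_truthy(timeout=None, default=30):
    # Truthiness test: wrongly discards a legitimate 0.
    return timeout if timeout else default


def pick_timeout(timeout=None, default=30):
    # Identity test: only a missing argument (None) falls back to the default.
    return timeout if timeout is not None else default


print(pick_timeout_truthy(0))  # falls back to the default
print(pick_timeout(0))         # keeps the caller's 0
print(pick_timeout())          # no argument, so the default is used
```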


Answer 2

Code should be written to be understandable to the programmer first, and the compiler or interpreter second. The “is not” construct resembles English more closely than “not is”.


Answer 3

Python if x is not None or if not x is None?

TLDR: The bytecode compiler parses them both to x is not None – so for readability’s sake, use if x is not None.

Readability

We use Python because we value things like human readability, usability, and correctness of various paradigms of programming over performance.

Python optimizes for readability, especially in this context.

Parsing and Compiling the Bytecode

The not binds more weakly than is, so there is no logical difference here. See the documentation:

The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value.

The is not is specifically provided for in the Python grammar as a readability improvement for the language:

comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is'|'is' 'not'

And so it is a unitary element of the grammar as well.

Of course, it is not parsed the same:

>>> import ast
>>> ast.dump(ast.parse('x is not None').body[0].value)
"Compare(left=Name(id='x', ctx=Load()), ops=[IsNot()], comparators=[Name(id='None', ctx=Load())])"
>>> ast.dump(ast.parse('not x is None').body[0].value)
"UnaryOp(op=Not(), operand=Compare(left=Name(id='x', ctx=Load()), ops=[Is()], comparators=[Name(id='None', ctx=Load())]))"

But then the byte compiler will actually translate the not ... is to is not:

>>> import dis
>>> dis.dis(lambda x, y: x is not y)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                1 (y)
              6 COMPARE_OP               9 (is not)
              9 RETURN_VALUE
>>> dis.dis(lambda x, y: not x is y)
  1           0 LOAD_FAST                0 (x)
              3 LOAD_FAST                1 (y)
              6 COMPARE_OP               9 (is not)
              9 RETURN_VALUE

So for the sake of readability and using the language as it was intended, please use is not.

To not use it is not wise.
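
The ast dumps above are from an older Python in which None appears as a Name node. On Python 3, None is represented by a constant node, so the dump text differs, but the structural difference between the two forms can still be checked. A small sketch using only the standard ast module:

```python
import ast

is_not_tree = ast.parse("x is not None", mode="eval").body
not_is_tree = ast.parse("not x is None", mode="eval").body

# `x is not None` is a single comparison using the IsNot operator.
single_compare = (
    isinstance(is_not_tree, ast.Compare)
    and isinstance(is_not_tree.ops[0], ast.IsNot)
)

# `not x is None` is a Not wrapped around an Is comparison.
wrapped_compare = (
    isinstance(not_is_tree, ast.UnaryOp)
    and isinstance(not_is_tree.op, ast.Not)
    and isinstance(not_is_tree.operand, ast.Compare)
    and isinstance(not_is_tree.operand.ops[0], ast.Is)
)

print(single_compare, wrapped_compare)
```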


Answer 4

The answer is simpler than people are making it.

There’s no technical advantage either way, and “x is not y” is what everybody else uses, which makes it the clear winner. It doesn’t matter that it “looks more like English” or not; everyone uses it, which means every user of Python–even Chinese users, whose language Python looks nothing like–will understand it at a glance, where the slightly less common syntax will take a couple extra brain cycles to parse.

Don’t be different just for the sake of being different, at least in this field.


Answer 5

The is not operator is preferred over negating the result of is for stylistic reasons. “if x is not None:” reads just like English, but “if not x is None:” requires understanding of the operator precedence and does not read like English.

If there is a performance difference my money is on is not, but this almost certainly isn’t the motivation for the decision to prefer that technique. It would obviously be implementation-dependent. Since is isn’t overridable, it should be easy to optimise out any distinction anyhow.
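
The point that is cannot be overridden is easy to see next to ==, which can be. A short sketch with a hypothetical class whose __eq__ claims equality with everything:

```python
class AlwaysEqual:
    """Hypothetical class whose == claims equality with any object."""

    def __eq__(self, other):
        return True


obj = AlwaysEqual()

# == dispatches to __eq__, so the class can make this comparison say anything.
equal_to_none = (obj == None)

# `is` compares object identity and has no hook a class could customize.
identical_to_none = obj is None

print(equal_to_none, identical_to_none)
```

This is also why the style guides recommend is and is not for comparisons against singletons such as None.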


Answer 6

Personally, I use

if not (x is None):

which is understood immediately without ambiguity by every programmer, even those not expert in the Python syntax.


Answer 7

if not x is None is more similar to other programming languages, but if x is not None definitely sounds clearer (and is more grammatically correct in English) to me.

That said, it seems like more of a personal preference to me.


Answer 8

I prefer the more readable form x is not y over having to think about operator precedence in order to end up with readable code.