Tag archive: null

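How to select rows with one or more nulls from a pandas DataFrame without listing columns explicitly?

Question: How to select rows with one or more nulls from a pandas DataFrame without listing columns explicitly?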

I have a dataframe with ~300K rows and ~40 columns. I want to find out if any rows contain null values – and put these ‘null’-rows into a separate dataframe so that I could explore them easily.

I can create a mask explicitly:

mask = False
for col in df.columns: 
    mask = mask | df[col].isnull()
dfnulls = df[mask]

Or I can do something like:

df.ix[df.index[(df.T == np.nan).sum() > 1]]

Is there a more elegant way of doing it (locating rows with nulls in them)?


Answer 0

[Updated for modern pandas, where isnull is a method of DataFrames.]

You can use isnull and any to build a boolean Series and use that to index into your frame:

>>> import numpy as np
>>> import pandas as pd
>>> df = pd.DataFrame([range(3), [0, np.nan, 0], [0, 0, np.nan], range(3), range(3)])
>>> df.isnull()
       0      1      2
0  False  False  False
1  False   True  False
2  False  False   True
3  False  False  False
4  False  False  False
>>> df.isnull().any(axis=1)
0    False
1     True
2     True
3    False
4    False
dtype: bool
>>> df[df.isnull().any(axis=1)]
   0   1   2
1  0 NaN   0
2  0   0 NaN

[For older pandas:]

You could use the function isnull instead of the method:

In [56]: df = pd.DataFrame([range(3), [0, np.NaN, 0], [0, 0, np.NaN], range(3), range(3)])

In [57]: df
Out[57]: 
   0   1   2
0  0   1   2
1  0 NaN   0
2  0   0 NaN
3  0   1   2
4  0   1   2

In [58]: pd.isnull(df)
Out[58]: 
       0      1      2
0  False  False  False
1  False   True  False
2  False  False   True
3  False  False  False
4  False  False  False

In [59]: pd.isnull(df).any(axis=1)
Out[59]: 
0    False
1     True
2     True
3    False
4    False

leading to the rather compact:

In [60]: df[pd.isnull(df).any(axis=1)]
Out[60]: 
   0   1   2
1  0 NaN   0
2  0   0 NaN
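
The question also asks for the null rows to end up in a separate data frame. A minimal sketch of that split, assuming a pandas version that provides isna (an alias of isnull); the column names and values here are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with mixed dtypes; None in an object column also counts as missing
df = pd.DataFrame(
    {"a": [1.0, np.nan, 3.0, 4.0], "b": ["x", "y", None, "z"], "c": [10, 20, 30, 40]}
)

has_null = df.isna().any(axis=1)  # True for rows with at least one missing value
df_nulls = df[has_null]           # rows to explore
df_complete = df[~has_null]       # rows without any missing value

print(df_nulls.index.tolist())
print(df_complete.index.tolist())
```

Keeping the boolean Series in a variable lets the same mask produce both halves without recomputing it.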

Answer 1

def nans(df): return df[df.isnull().any(axis=1)]

Then, whenever you need it, you can type:

nans(your_dataframe)

Answer 2

.any() and .all() are great for the extreme cases, but not when you’re looking for a specific number of null values. Here’s an extremely simple way to do what I believe you’re asking. It’s pretty verbose, but functional.

import pandas as pd
import numpy as np

# Some test data frame
df = pd.DataFrame({'num_legs':          [2, 4,      np.nan, 0, np.nan],
                   'num_wings':         [2, 0,      np.nan, 0, 9],
                   'num_specimen_seen': [10, np.nan, 1,     8, np.nan]})

# Helper: counts the NaNs in each row
def row_nan_sums(df):
    sums = []
    for row in df.values:
        count = 0  # named count to avoid shadowing the built-in sum()
        for el in row:
            if el != el:  # NaN is never equal to itself. This is "hacky", but complete.
                count += 1
        sums.append(count)
    return sums

# Returns a list of row positions for rows with k+ NaNs
def query_k_plus_sums(df, k):
    sums = row_nan_sums(df)
    return [i for i, count in enumerate(sums) if count >= k]

# test
print(df)
print(query_k_plus_sums(df, 2))

Output

   num_legs  num_wings  num_specimen_seen
0       2.0        2.0               10.0
1       4.0        0.0                NaN
2       NaN        NaN                1.0
3       0.0        0.0                8.0
4       NaN        9.0                NaN
[2, 4]

Then, if you’re like me and want to clear those rows out, you just write this:

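# drop the rows from the data frame (positions equal labels here because df has the default index)
df.drop(query_k_plus_sums(df, 2), inplace=True)
# Shuffle the rows and reset the index (reset_index alone renumbers the rows; sample(frac=1) only shuffles them)
df = df.sample(frac=1).reset_index(drop=True)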
# print data frame
print(df)

Output:

   num_legs  num_wings  num_specimen_seen
0       4.0        0.0                NaN
1       0.0        0.0                8.0
2       2.0        2.0               10.0
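
The same "k or more NaNs per row" query can also be sketched without explicit loops. This is an alternative to the helper functions above, not the answer author's method: it counts missing values per row with isna().sum(axis=1) and reuses the answer's test data. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "num_legs": [2, 4, np.nan, 0, np.nan],
    "num_wings": [2, 0, np.nan, 0, 9],
    "num_specimen_seen": [10, np.nan, 1, 8, np.nan],
})

k = 2
nan_counts = df.isna().sum(axis=1)          # number of missing values in each row
rows_with_k_plus = df[nan_counts >= k]      # rows with k or more NaNs
cleaned = df[nan_counts < k].reset_index(drop=True)

# dropna(thresh=n) keeps rows with at least n non-missing values,
# so "fewer than k NaNs" translates to n = number_of_columns - k + 1
cleaned_alt = df.dropna(thresh=df.shape[1] - k + 1).reset_index(drop=True)

print(rows_with_k_plus.index.tolist())
print(cleaned)
```

Because the mask is label-aligned, this version does not depend on the frame having the default integer index.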

return, return None, and no return at all?

Question: return, return None, and no return at all?

Consider three functions:

def my_func1():
  print "Hello World"
  return None

def my_func2():
  print "Hello World"
  return

def my_func3():
  print "Hello World"

They all appear to return None. Are there any differences between how the returned value of these functions behave? Are there any reasons to prefer one versus the other?


Answer 0

In terms of actual behavior, there is no difference. They all return None and that’s it. However, there is a time and place for each of these. The following guidelines describe how the different forms should be used (or at least how I was taught they should be used), but they are not absolute rules, so you can mix them up if you feel it is necessary.

Using return None

This signals that the function is indeed meant to return a value for later use, and in this case it returns None. This value None can then be used elsewhere. return None is never used if there are no other possible return values from the function.

In the following example, we return person’s mother if the person given is a human. If it’s not a human, we return None since the person doesn’t have a mother (let’s suppose it’s not an animal or something).

def get_mother(person):
    if is_human(person):
        return person.mother
    else:
        return None

Using return

This is used for the same reason as break in loops. The return value doesn’t matter and you only want to exit the whole function. It’s extremely useful in some places, even though you don’t need it that often.

We’ve got 15 prisoners and we know one of them has a knife. We loop through each prisoner one by one to check if they have a knife. If we hit the person with a knife, we can just exit the function because we know there’s only one knife and no reason to check the rest of the prisoners. If we don’t find the prisoner with a knife, we raise an alert. This could be done in many different ways and using return is probably not even the best way, but it’s just an example to show how to use return for exiting a function.

def find_prisoner_with_knife(prisoners):
    for prisoner in prisoners:
        if "knife" in prisoner.items:
            prisoner.move_to_inquisition()
            return # no need to check rest of the prisoners nor raise an alert
    raise_alert()

Note: You should never do var = find_prisoner_with_knife(), since the return value is not meant to be caught.

Using no return at all

This will also return None, but that value is not meant to be used or caught. It simply means that the function ended successfully. It’s basically the same as return in void functions in languages such as C++ or Java.

In the following example, we set person’s mother’s name and then the function exits after completing successfully.

def set_mother(person, mother):
    if is_human(person):
        person.mother = mother

Note: You should never do var = set_mother(my_person, my_mother), since the return value is not meant to be caught.
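
To make the get_mother and set_mother conventions concrete, here is a small runnable sketch. The Person class is hypothetical (the answer does not define one), and the is_human check is replaced by a simple "is a mother recorded" test for the sake of a self-contained example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    # Hypothetical stand-in for the answer's person objects
    name: str
    mother: Optional["Person"] = None


def get_mother(person: Person) -> Optional[Person]:
    """Value-returning function: None is one of its meaningful results."""
    if person.mother is not None:
        return person.mother
    return None


def set_mother(person: Person, mother: Person) -> None:
    """Procedure-style function: it mutates person, so there is no return statement."""
    person.mother = mother


alice = Person("Alice")
bob = Person("Bob")

before = get_mother(bob)   # capturing the result is expected here
set_mother(bob, alice)     # the result is deliberately not captured
after = get_mother(bob)

print(before)
print(after.name)
```

The caller is expected to handle both outcomes of get_mother, while nothing useful can be done with the result of set_mother.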


Answer 1

Yes, they are all the same.

We can inspect the compiled bytecode to confirm that they’re all doing the exact same thing.

import dis

def f1():
  print "Hello World"
  return None

def f2():
  print "Hello World"
  return

def f3():
  print "Hello World"

dis.dis(f1)
    4   0 LOAD_CONST    1 ('Hello World')
        3 PRINT_ITEM
        4 PRINT_NEWLINE

    5   5 LOAD_CONST    0 (None)
        8 RETURN_VALUE

dis.dis(f2)
    9   0 LOAD_CONST    1 ('Hello World')
        3 PRINT_ITEM
        4 PRINT_NEWLINE

    10  5 LOAD_CONST    0 (None)
        8 RETURN_VALUE

dis.dis(f3)
    14  0 LOAD_CONST    1 ('Hello World')
        3 PRINT_ITEM
        4 PRINT_NEWLINE            
        5 LOAD_CONST    0 (None)
        8 RETURN_VALUE      
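
The listing above comes from Python 2 (note the print statement and the PRINT_ITEM opcode). A rough Python 3 version of the same experiment might look like the sketch below. The exact opcodes differ between interpreter versions, so the sketch only compares the opcode names of the three functions with each other rather than assuming any particular listing; the opnames helper is an illustrative name.

```python
import dis


def f1():
    print("Hello World")
    return None


def f2():
    print("Hello World")
    return


def f3():
    print("Hello World")


# All three calls evaluate to the same None object
results = [f() for f in (f1, f2, f3)]
print(results)


def opnames(func):
    """Return the sequence of opcode names for func (version dependent)."""
    return [ins.opname for ins in dis.get_instructions(func)]


# Reports whether the three functions compile to the same opcode sequence
print(opnames(f1) == opnames(f2) == opnames(f3))
```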

Answer 2

They each return the same singleton None — There is no functional difference.

I think that it is reasonably idiomatic to leave off the return statement unless you need it to break out of the function early (in which case a bare return is more common), or return something other than None. It also makes sense and seems to be idiomatic to write return None when it is in a function that has another path that returns something other than None. Writing return None out explicitly is a visual cue to the reader that there’s another branch which returns something more interesting (and that calling code will probably need to handle both types of return values).

Often in Python, functions which return None are used like void functions in C — Their purpose is generally to operate on the input arguments in place (unless you’re using global data (shudders)). Returning None usually makes it more explicit that the arguments were mutated. This makes it a little more clear why it makes sense to leave off the return statement from a “language conventions” standpoint.

That said, if you’re working in a code base that already has pre-set conventions around these things, I’d definitely follow suit to help the code base stay uniform…


Answer 3

As others have answered, the result is exactly the same: None is returned in all cases.

The difference is stylistic, but please note that PEP 8 asks for consistent use:

Be consistent in return statements. Either all return statements in a function should return an expression, or none of them should. If any return statement returns an expression, any return statements where no value is returned should explicitly state this as return None, and an explicit return statement should be present at the end of the function (if reachable).

Yes:

def foo(x):
    if x >= 0:
        return math.sqrt(x)
    else:
        return None

def bar(x):
    if x < 0:
        return None
    return math.sqrt(x)

No:

def foo(x):
    if x >= 0:
        return math.sqrt(x)

def bar(x):
    if x < 0:
        return
    return math.sqrt(x)

https://www.python.org/dev/peps/pep-0008/#programming-recommendations


Basically, if you ever return a non-None value in a function, it means the return value has meaning and is meant to be caught by callers. So when you return None, it must also be explicit, to convey that None has meaning in this case: it is one of the possible return values.

If you don’t need a return value at all, your function basically works as a procedure instead of a function, so just don’t include the return statement.

If you are writing a procedure-like function and there is an opportunity to return earlier (i.e. you are already done at that point and don’t need to execute the rest of the function), you may use an empty return to signal to the reader that it is just an early finish of execution, and that the None value returned implicitly doesn’t have any meaning and is not meant to be caught (the procedure-like function always returns None anyway).
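
As an illustration of the last case, here is a sketch of a procedure-like function with an early bare return. The function name and the log list are hypothetical; the point is only that no return statement in it carries a value.

```python
def report_first_negative(numbers, log):
    """Procedure-like: appends one message to log; its result is not meant to be used."""
    for n in numbers:
        if n < 0:
            log.append(f"first negative: {n}")
            return  # early exit; the implicit None carries no meaning
    log.append("no negative numbers")


log = []
report_first_negative([3, -1, -5], log)
report_first_negative([1, 2], log)
print(log)
```

Every return here is bare, which keeps the function consistent with the PEP 8 recommendation quoted above.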


Answer 4

In terms of functionality these are all the same. The difference between them is in code readability and style (which is important to consider).


Null object in Python?

Question: Null object in Python?

How do I refer to the null object in Python?


Answer 0

In Python, the ‘null’ object is the singleton None.

The best way to check things for “Noneness” is to use the identity operator, is:

if foo is None:
    ...

Answer 1

None, Python’s null?

There’s no null in Python, instead there’s None. As stated already the most accurate way to test that something has been given None as a value is to use the is identity operator, which tests that two variables refer to the same object.

>>> foo = None
>>> foo is None
True
>>> foo = 'bar' 
>>> foo is None
False

The basics

There is and can only be one None

None is the sole instance of the class NoneType, and any further attempt at instantiating that class will return the same object, which makes None a singleton. Newcomers to Python often see error messages that mention NoneType and wonder what it is. It’s my personal opinion that these messages could simply mention None by name because, as we’ll see shortly, None leaves little room for ambiguity. So if you see a TypeError message saying that NoneType can’t do this or can’t do that, just know that it’s simply the one None being used in a way that it can’t.

Also, None is a built-in constant: as soon as you start Python it’s available everywhere, whether in a module, class, or function. NoneType, by contrast, is not; you’d need to get a reference to it first by querying None for its class.

>>> NoneType
NameError: name 'NoneType' is not defined
>>> type(None)
NoneType

You can check None’s uniqueness with Python’s identity function id(). It returns the unique number assigned to an object; each object has one. If the id of two variables is the same, then they in fact point to the same object.

>>> NoneType = type(None)
>>> id(None)
10748000
>>> my_none = NoneType()
>>> id(my_none)
10748000
>>> another_none = NoneType()
>>> id(another_none)
10748000
>>> def function_that_does_nothing(): pass
>>> return_value = function_that_does_nothing()
>>> id(return_value)
10748000

None cannot be overwritten

In much older versions of Python (before 2.4) it was possible to reassign None, but not anymore. Not even as a class attribute or in the confines of a function.

# In Python 2.7
>>> class SomeClass(object):
...     def my_fnc(self):
...             self.None = 'foo'
SyntaxError: cannot assign to None
>>> def my_fnc():
        None = 'foo'
SyntaxError: cannot assign to None

# In Python 3.5
>>> class SomeClass:
...     def my_fnc(self):
...             self.None = 'foo'
SyntaxError: invalid syntax
>>> def my_fnc():
        None = 'foo'
SyntaxError: cannot assign to keyword

It’s therefore safe to assume that all None references are the same. There’s no “custom” None.

To test for None use the is operator

When writing code you might be tempted to test for Noneness like this:

if value==None:
    pass

Or to test for falsehood like this

if not value:
    pass

You need to understand the implications and why it’s often a good idea to be explicit.

Case 1: testing if a value is None

why do this

value is None

rather than

value==None

The first is equivalent to:

id(value)==id(None)

Whereas the expression value==None is in fact applied like this:

value.__eq__(None)

If the value really is None, then you’ll get what you expected.

>>> nothing = function_that_does_nothing()
>>> nothing.__eq__(None)
True

In most common cases the outcome will be the same, but the __eq__() method opens a door that voids any guarantee of accuracy, since it can be overridden in a class to provide special behavior.

Consider this class.

>>> class Empty(object):
...     def __eq__(self, other):
...         return not other

So you try it on None and it works

>>> empty = Empty()
>>> empty==None
True

But then it also works on the empty string

>>> empty==''
True

And yet

>>> ''==None
False
>>> empty is None
False

Case 2: Using None as a boolean

The following two tests

if value:
    # do something

if not value:
    # do something

are in fact evaluated as

if bool(value):
    # do something

if not bool(value):
    # do something

None is “falsy”, meaning that if cast to a boolean it will return False, and if the not operator is applied to it, the result is True. Note however that this property is not unique to None. In addition to False itself, the property is shared by empty lists, tuples, sets, dicts, strings, as well as 0, and all objects from classes that implement the __bool__() magic method (__nonzero__() in Python 2) to return False.

>>> bool(None)
False
>>> not None
True

>>> bool([])
False
>>> not []
True

>>> class MyFalsey(object):
...     def __bool__(self):
...         return False
>>> f = MyFalsey()
>>> bool(f)
False
>>> not f
True

So when testing for variables in the following way, be extra aware of what you’re including or excluding from the test:

def some_function(value=None):
    if not value:
        value = init_value()

In the above, did you mean to call init_value() when the value is set specifically to None, or did you mean that a value set to 0, the empty string, or an empty list should also trigger the initialization? Like I said, be mindful. As is often the case in Python, explicit is better than implicit.
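
The difference between the two tests can be seen side by side in the sketch below. The init_value body and the names falsy_check and none_check are placeholders invented for this illustration.

```python
def init_value():
    # Hypothetical default factory standing in for the init_value() above
    return "default"


def falsy_check(value=None):
    if not value:  # also triggers for 0, '', [], and other falsy values
        value = init_value()
    return value


def none_check(value=None):
    if value is None:  # triggers only when the value is None
        value = init_value()
    return value


print(falsy_check(0))   # the 0 is replaced by the default
print(none_check(0))    # the 0 is kept
print(none_check())     # nothing passed, so the default is used
```

If 0 or an empty string is legitimate input, the is None form is the one that preserves it.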

None in practice

None used as a signal value

None has a special status in Python. It’s a favorite baseline value because many algorithms treat it as an exceptional value. In such scenarios it can be used as a flag to signal that a condition requires some special handling (such as the setting of a default value).

You can assign None to the keyword arguments of a function and then explicitly test for it.

def my_function(value, param=None):
    if param is None:
        # do something outrageous!
        pass

You can return it as the default when trying to get to an object’s attribute and then explicitly test for it before doing something special.

value = getattr(some_obj, 'some_attribute', None)
if value is None:
    # do something spectacular!
    pass

By default a dictionary’s get() method returns None when trying to access a non-existing key:

>>> some_dict = {}
>>> value = some_dict.get('foo')
>>> value is None
True

If you were to try to access it by using the subscript notation a KeyError would be raised

>>> value = some_dict['foo']
KeyError: 'foo'

Likewise if you attempt to pop a non-existing item

>>> value = some_dict.pop('foo')
KeyError: 'foo'

which you can suppress with a default value that is usually set to None

value = some_dict.pop('foo', None)
if value is None:
    # booom!
    pass

None used as both a flag and valid value

The above described uses of None apply when it is not considered a valid value, but more like a signal to do something special. There are situations however where it sometimes matters to know where None came from because even though it’s used as a signal it could also be part of the data.

When you query an object for its attribute with getattr(some_obj, 'attribute_name', None), getting back None doesn’t tell you whether the attribute you were trying to access was set to None or was altogether absent from the object. The same applies when accessing a key from a dictionary with some_dict.get('some_key'): you don’t know if some_dict['some_key'] is missing or if it’s just set to None. If you need that information, the usual way to handle this is to directly attempt accessing the attribute or key from within a try/except construct:

try:
    # equivalent to getattr() without specifying a default
    # value = getattr(some_obj, 'some_attribute')
    value = some_obj.some_attribute
    # now you handle `None` the data here
    if value is None:
        # do something here because the attribute was set to None
        pass
except AttributeError:
    # we're now handling the exceptional situation from here.
    # We could assign None as a default value if required.
    value = None
    # In addition, since we now know that some_obj doesn't have the
    # attribute 'some_attribute' we could do something about that.
    log_something(some_obj)

Similarly with dict:

try:
    value = some_dict['some_key']
    if value is None:
        # do something here because 'some_key' is set to None
        pass
except KeyError:
    # set a default
    value = None
    # and do something because 'some_key' was missing
    # from the dict.
    log_something(some_dict)

The two examples above show how to handle the object and dictionary cases. What about functions? Same thing, but we use the double-asterisk keyword arguments (**kwargs) to that end:

def my_function(**kwargs):
    try:
        value = kwargs['some_key']
        if value is None:
            # do something because 'some_key' is explicitly
            # set to None
            pass
    except KeyError:
        # we assign the default
        value = None
        # and since it's not coming from the caller, we report that.
        log_something('did not receive "some_key"')
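
To see the difference in practice, here is a runnable variant of the function above. The name describe_some_key and the returned strings are invented for this sketch; they are not part of the original answer.

```python
def describe_some_key(**kwargs):
    """Report whether 'some_key' was omitted, passed as None, or passed with a value."""
    try:
        value = kwargs['some_key']
    except KeyError:
        # The caller did not pass 'some_key' at all.
        return 'missing'
    if value is None:
        # The caller passed some_key=None explicitly.
        return 'explicit None'
    return 'value: {!r}'.format(value)


print(describe_some_key())
print(describe_some_key(some_key=None))
print(describe_some_key(some_key=3))
```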

None used only as a valid value

If you find that your code is littered with the above try/except pattern simply to differentiate between None flags and None data, then just use another test value. There’s a pattern where a value that falls outside the set of valid values is inserted as part of the data in a data structure and is used to control and test special conditions (e.g. boundaries, state, etc). Such a value is called a sentinel and it can be used the way None is used as a signal. It’s trivial to create a sentinel in Python.

undefined = object()

The undefined object above is unique and doesn’t do much of anything that might be of interest to a program; it’s thus an excellent replacement for None as a flag. Some caveats apply; more about that after the code.

With function

def my_function(value, param1=undefined, param2=undefined):
    if param1 is undefined:
        # we know nothing was passed to it, not even None
        log_something('param1 was missing')
        param1 = None


    if param2 is undefined:
        # we got nothing here either
        log_something('param2 was missing')
        param2 = None

With dict

value = some_dict.get('some_key', undefined)
if value is None:
    log_something("'some_key' was set to None")

if value is undefined:
    # we know that the dict didn't have 'some_key'
    log_something("'some_key' was not set at all")
    value = None

With an object

value = getattr(obj, 'some_attribute', undefined) 
if value is None:
    log_something("'obj.some_attribute' was set to None")
if value is undefined:
    # we know that there's no obj.some_attribute
    log_something("no 'some_attribute' set on obj")
    value = None

As I mentioned earlier, custom sentinels come with some caveats. First, they’re not keywords like None, so Python doesn’t protect them. You can overwrite your undefined above at any time, anywhere in the module it’s defined, so be careful how you expose and use them. Next, the instance returned by object() is not a singleton: if you make that call 10 times, you get 10 different objects. Finally, usage of a sentinel is highly idiosyncratic. A sentinel is specific to the library it’s used in, and as such its scope should generally be limited to the library’s internals. It shouldn’t “leak” out. External code should only become aware of it if its purpose is to extend or supplement the library’s API.
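
One possible way to soften these caveats is to give the sentinel its own small class with a readable repr and keep it private to the module. The names _Missing, MISSING and lookup below are hypothetical and chosen only for this sketch.

```python
class _Missing:
    """Type of a module-private sentinel; the leading underscore marks it internal."""

    def __repr__(self):
        return '<MISSING>'


MISSING = _Missing()


def lookup(mapping, key):
    """Return (found, value); found is False only when the key is absent."""
    value = mapping.get(key, MISSING)
    if value is MISSING:
        return False, None
    return True, value


# Each call to object() creates a distinct object, so a sentinel must be
# created once and reused, never re-created at the comparison site.
assert object() is not object()

print(lookup({'a': None}, 'a'))  # key present, value is None
print(lookup({'a': None}, 'b'))  # key absent
```

The sentinel never leaves lookup: callers only see a plain boolean and a value.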


Answer 2

It’s not called null as in other languages, but None. There is always only one instance of this object, so you can check for it with x is None (identity comparison) instead of x == None, if you want.
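
The two checks can disagree when a class customizes equality. A minimal sketch, using a made-up class AlwaysEqual whose __eq__ claims to be equal to everything:

```python
class AlwaysEqual:
    """Hypothetical class whose __eq__ reports equality with any object."""

    def __eq__(self, other):
        return True


x = AlwaysEqual()

# Equality goes through __eq__, so this comparison can be fooled.
equal_to_none = (x == None)

# Identity cannot be overridden: x is simply not the None object.
identical_to_none = (x is None)

print(equal_to_none, identical_to_none)
```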


Answer 3

In Python, to represent the absence of a value, you can use None (the sole instance of NoneType) for objects and "" (or a length of 0) for strings. Therefore:

if yourObject is None:  # or: if yourObject == None:
    ...

if yourString == "":  # or: if len(yourString) == 0:
    ...

Regarding the difference between “==” and “is”: “==” tests equality rather than identity, but for None it is usually sufficient. However, since “is” is defined as the object identity operation and None is a single object, it is more correct to use it rather than “==”. Not sure if there is even a speed difference.
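
If the speed question matters to you, it can be measured with the standard timeit module. This is just a rough sketch; the variable names are mine, and the absolute numbers depend on the machine and interpreter, so none are quoted here.

```python
import timeit

value = None

# Time each comparison over the same number of iterations.
t_is = timeit.timeit('value is None', globals={'value': value}, number=100000)
t_eq = timeit.timeit('value == None', globals={'value': value}, number=100000)

print('is: {:.4f}s   ==: {:.4f}s'.format(t_is, t_eq))
```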

Anyway, you can have a look at:


Answer 4

None is the only value of a special type, NoneType:

>>> type(None)
<class 'NoneType'>

You can check whether a variable refers to None:

>>> variable = None
>>> variable is None
True

More information is available at Python Docs
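
As a small illustration, assuming Python 3, calling the type does not create a second object, and isinstance works as an alternative spelling of the check (though `is None` remains the usual one):

```python
NoneType = type(None)

# Calling NoneType() returns the existing None rather than a new object.
same_object = NoneType() is None

# isinstance gives the same answer as the identity check.
variable = None
via_isinstance = isinstance(variable, NoneType)
via_identity = variable is None

print(same_object, via_isinstance, via_identity)
```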


Answer 5

Per Truth value testing, None directly tests as false, so the simplest expression will suffice:

if not foo:
    ...
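
Keep in mind that other values test as false too, so this check is broader than a None check. A short sketch (the helper names is_falsy and is_none are made up for illustration) shows where the two disagree:

```python
def is_falsy(foo):
    """True for None, but also for 0, '', [] and other empty values."""
    return not foo


def is_none(foo):
    """True only for the None object itself."""
    return foo is None


for candidate in (None, 0, '', [], 'text'):
    print(repr(candidate), is_falsy(candidate), is_none(candidate))
```

Use `if not foo:` when every falsy value should take the same branch, and `if foo is None:` when only None should.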