Python 实用宝典

Question 1

I’m looking for the fastest way to check for the occurrence of NaN (np.nan) in a NumPy array X. np.isnan(X) is out of the question, since it builds a boolean array of shape X.shape, which is potentially gigantic.

I tried np.nan in X, but that seems not to work because np.nan != np.nan. Is there a fast and memory-efficient way to do this at all?

(To those who would ask “how gigantic”: I can’t tell. This is input validation for library code.)

Question 2

Ray’s solution is good. However, on my machine it is about 2.5x faster to use numpy.sum in place of numpy.min:

In [13]: %timeit np.isnan(np.min(x))
1000 loops, best of 3: 244 us per loop

In [14]: %timeit np.isnan(np.sum(x))
10000 loops, best of 3: 97.3 us per loop

Unlike min, sum doesn’t require branching, which on modern hardware tends to be pretty expensive. This is probably the reason why sum is faster.

Edit: The above test was performed with a single NaN right in the middle of the array.

It is interesting to note that min is slower in the presence of NaNs than in their absence. It also seems to get slower as NaNs get closer to the start of the array. On the other hand, sum’s throughput seems constant regardless of whether there are NaNs and where they’re located:

In [40]: x = np.random.rand(100000)

In [41]: %timeit np.isnan(np.min(x))
10000 loops, best of 3: 153 us per loop

In [42]: %timeit np.isnan(np.sum(x))
10000 loops, best of 3: 95.9 us per loop

In [43]: x[50000] = np.nan

In [44]: %timeit np.isnan(np.min(x))
1000 loops, best of 3: 239 us per loop

In [45]: %timeit np.isnan(np.sum(x))
10000 loops, best of 3: 95.8 us per loop

In [46]: x[0] = np.nan

In [47]: %timeit np.isnan(np.min(x))
1000 loops, best of 3: 326 us per loop

In [48]: %timeit np.isnan(np.sum(x))
10000 loops, best of 3: 95.9 us per loop

Question 3

I think np.isnan(np.min(X)) should do what you want.

Question 4

Even though there is an accepted answer, I’d like to demonstrate the following (with Python 2.7.2 and Numpy 1.6.0 on Vista):

In []: x= rand(1e5)
In []: %timeit isnan(x.min())
10000 loops, best of 3: 200 us per loop
In []: %timeit isnan(x.sum())
10000 loops, best of 3: 169 us per loop
In []: %timeit isnan(dot(x, x))
10000 loops, best of 3: 134 us per loop

In []: x[5e4]= NaN
In []: %timeit isnan(x.min())
100 loops, best of 3: 4.47 ms per loop
In []: %timeit isnan(x.sum())
100 loops, best of 3: 6.44 ms per loop
In []: %timeit isnan(dot(x, x))
10000 loops, best of 3: 138 us per loop

Thus, the really efficient way might depend heavily on the operating system. In any case, the dot-based check seems to be the most stable one.

Question 5

There are two general approaches here:

Check each array item for nan and take any.
Apply some cumulative operation that preserves nans (like sum) and check its result.

While the first approach is certainly the cleanest, the heavy optimization of some of the cumulative operations (particularly the ones that are executed in BLAS, like dot) can make those quite fast. Note that dot, like some other BLAS operations, is multithreaded under certain conditions. This explains the difference in speed between different machines.

import numpy
import perfplot


def min(a):
    return numpy.isnan(numpy.min(a))


def sum(a):
    return numpy.isnan(numpy.sum(a))


def dot(a):
    return numpy.isnan(numpy.dot(a, a))


def any(a):
    return numpy.any(numpy.isnan(a))


def einsum(a):
    return numpy.isnan(numpy.einsum("i->", a))


perfplot.show(
    setup=lambda n: numpy.random.rand(n),
    kernels=[min, sum, dot, any, einsum],
    n_range=[2 ** k for k in range(20)],
    logx=True,
    logy=True,
    xlabel="len(a)",
)
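
One caveat with the reduction-based checks compared above: a sum can also turn into NaN when the array holds both +inf and -inf, even though no element is NaN. The sketch below contrasts the two general approaches on such an input. The helper names has_nan_any and has_nan_sum are illustrative only and not part of NumPy.

```python
import numpy as np


def has_nan_any(a):
    # Element-wise test: builds a temporary boolean array the size of `a`.
    return bool(np.isnan(a).any())


def has_nan_sum(a):
    # Reduction test: no temporary array, but inf + (-inf) also yields NaN.
    with np.errstate(invalid="ignore"):
        return bool(np.isnan(np.sum(a)))


clean = np.array([1.0, 2.0, 3.0])
with_nan = np.array([1.0, np.nan, 3.0])
mixed_inf = np.array([np.inf, -np.inf, 1.0])  # no NaN stored in the array

for name, arr in [("clean", clean), ("with_nan", with_nan), ("mixed_inf", mixed_inf)]:
    print(name, has_nan_any(arr), has_nan_sum(arr))
```

If the input may legitimately contain infinities of both signs, the element-wise check (or a min-based reduction) is the safer choice.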

Question 6

Use .any():

if numpy.isnan(myarray).any():
    ...

numpy.isfinite may be better than isnan for checking, because it also rejects infinities:

if not np.isfinite(prop).all():
    ...

Question 7

If you’re comfortable with numba, it lets you write a fast short-circuiting function (one that stops as soon as a NaN is found):

import numba as nb
import math

@nb.njit
def anynan(array):
    array = array.ravel()
    for i in range(array.size):
        if math.isnan(array[i]):
            return True
    return False

If there is no NaN, the function might actually be slower than np.min; I think that’s because np.min is heavily optimized (for example with SIMD instructions) for large arrays:

import numpy as np
array = np.random.random(2000000)

%timeit anynan(array)          # 100 loops, best of 3: 2.21 ms per loop
%timeit np.isnan(array.sum())  # 100 loops, best of 3: 4.45 ms per loop
%timeit np.isnan(array.min())  # 1000 loops, best of 3: 1.64 ms per loop

But if there is a NaN in the array, especially at a low index, then it’s much faster:

array = np.random.random(2000000)
array[100] = np.nan

%timeit anynan(array)          # 1000000 loops, best of 3: 1.93 µs per loop
%timeit np.isnan(array.sum())  # 100 loops, best of 3: 4.57 ms per loop
%timeit np.isnan(array.min())  # 1000 loops, best of 3: 1.65 ms per loop

Similar results may be achieved with Cython or a C extension. These are a bit more complicated (or easily available as bottleneck.anynan) but ultimately do the same as my anynan function.
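
If numba is not an option, a rough approximation of the same short-circuit idea is to scan the array in fixed-size blocks with plain NumPy. This is a swapped-in technique, not what anynan above or bottleneck does internally. The function name anynan_blocked and the default block size are assumptions chosen for illustration.

```python
import numpy as np


def anynan_blocked(array, block_size=65536):
    # Hypothetical helper: scan a flattened array block by block so that the
    # temporary boolean mask never holds more than `block_size` elements.
    flat = np.asarray(array).reshape(-1)  # a view when possible, otherwise a copy
    for start in range(0, flat.size, block_size):
        if np.isnan(flat[start:start + block_size]).any():
            return True  # stop at the first block that contains a NaN
    return False


data = np.random.random(2_000_000)
print(anynan_blocked(data))
data[100] = np.nan
print(anynan_blocked(data))
```

The block size trades memory for per-call overhead: smaller blocks stop earlier and allocate less, but pay for more Python-level loop iterations.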

Question 8

Related to this is the question of how to find the first occurrence of NaN. This is the fastest way to handle that that I know of:

index = next((i for (i,n) in enumerate(iterable) if n!=n), None)
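
For a NumPy array, the same lookup can be vectorized, at the cost of building the full boolean mask that the original question wanted to avoid. The sketch below assumes a 1-D float array. first_nan_index is a hypothetical helper name.

```python
import numpy as np


def first_nan_index(a):
    # Hypothetical helper: index of the first NaN in a 1-D array, or None.
    mask = np.isnan(a)
    if mask.size == 0:
        return None
    idx = int(np.argmax(mask))  # first True; falls back to 0 if there is none
    return idx if mask[idx] else None


print(first_nan_index(np.array([1.0, np.nan, 3.0, np.nan])))
print(first_nan_index(np.array([1.0, 2.0])))
```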

Question 9

>>> (float('inf')+0j)*1
(inf+nanj)

Why? This caused a nasty bug in my code.

Why isn’t 1 the multiplicative identity, giving (inf + 0j)?

Question 10

The 1 is converted to a complex number first, 1 + 0j, which then leads to an inf * 0 multiplication, resulting in a nan.

(inf + 0j) * 1
(inf + 0j) * (1 + 0j)
inf * 1  + inf * 0j  + 0j * 1 + 0j * 0j
#          ^ this is where it comes from
inf  + nan j  + 0j - 0
inf  + nan j
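
The expansion above can be replayed with plain floats, which keeps the result independent of how a given interpreter handles mixed real/complex operands. This is a minimal sketch of the textbook formula. school_product is a hypothetical name, not a CPython function.

```python
import math


def school_product(a, b):
    # Textbook formula for (a.real + a.imag*j) * (b.real + b.imag*j).
    real = a.real * b.real - a.imag * b.imag
    imag = a.real * b.imag + a.imag * b.real
    return complex(real, imag)


inf = float("inf")
z = school_product(complex(inf, 0.0), complex(1.0, 0.0))
print(z.real, z.imag)
print(math.isnan(inf * 0.0))  # the single term that poisons the imaginary part
```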

Question 11

Mechanistically, the accepted answer is, of course, correct, but I would argue that a deeper answer can be given.

First, it is useful to clarify the question as @PeterCordes does in a comment: “Is there a multiplicative identity for complex numbers that does work on inf + 0j?” In other words, is what OP sees a weakness in the computer implementation of complex multiplication, or is there something conceptually unsound with inf+0j?

Short answer:

Using polar coordinates, we can view complex multiplication as a scaling and a rotation. When rotating an infinite “arm”, even by 0 degrees as in the case of multiplying by one, we cannot expect to place its tip with finite precision. So indeed, there is something fundamentally not right with inf+0j, namely that as soon as we are at infinity, a finite offset becomes meaningless.

Long answer:

Background: The “big thing” around which this question revolves is the matter of extending a system of numbers (think reals or complex numbers). One reason one might want to do that is to add some concept of infinity, or to “compactify” if one happens to be a mathematician. There are other reasons, too (https://en.wikipedia.org/wiki/Galois_theory, https://en.wikipedia.org/wiki/Non-standard_analysis), but we are not interested in those here.

One point compactification

The tricky bit about such an extension is, of course, that we want these new numbers to fit into the existing arithmetic. The simplest way is to add a single element at infinity (https://en.wikipedia.org/wiki/Alexandroff_extension) and make it equal anything but zero divided by zero. This works for the reals (https://en.wikipedia.org/wiki/Projectively_extended_real_line) and the complex numbers (https://en.wikipedia.org/wiki/Riemann_sphere).

Other extensions …

While the one point compactification is simple and mathematically sound, “richer” extensions comprising multiple infinities have been sought. The IEEE 754 standard for real floating point numbers has +inf and -inf (https://en.wikipedia.org/wiki/Extended_real_number_line). This looks natural and straightforward, but it already forces us to jump through hoops and invent stuff like -0 (https://en.wikipedia.org/wiki/Signed_zero).

… of the complex plane

What about more-than-one-inf extensions of the complex plane?

In computers, complex numbers are typically implemented by sticking two fp reals together, one for the real and one for the imaginary part. That is perfectly fine as long as everything is finite. As soon as infinities are considered, however, things become tricky.

The complex plane has a natural rotational symmetry, which ties in nicely with complex arithmetic as multiplying the entire plane by e^phij is the same as a phi radian rotation around 0.

That annex G thing

Now, to keep things simple, complex fp simply uses the extensions (+/-inf, nan etc.) of the underlying real number implementation. This choice may seem so natural it isn’t even perceived as a choice, but let’s take a closer look at what it implies. A simple visualization of this extension of the complex plane looks like (I = infinite, f = finite, 0 = 0)

I IIIIIIIII I
             
I fffffffff I
I fffffffff I
I fffffffff I
I fffffffff I
I ffff0ffff I
I fffffffff I
I fffffffff I
I fffffffff I
I fffffffff I
             
I IIIIIIIII I

But since a true complex plane is one that respects complex multiplication, a more informative projection would be

     III    
 I         I  
    fffff    
   fffffff   
  fffffffff  
I fffffffff I
I ffff0ffff I
I fffffffff I
  fffffffff  
   fffffff   
    fffff    
 I         I 
     III

In this projection we see the “uneven distribution” of infinities, which is not only ugly but also the root of problems of the kind OP has suffered: most infinities (those of the forms (+/-inf, finite) and (finite, +/-inf)) are lumped together at the four principal directions, while all other directions are represented by just four infinities (+/-inf, +/-inf). It shouldn’t come as a surprise that extending complex multiplication to this geometry is a nightmare.

Annex G of the C99 spec tries its best to make it work, including bending the rules on how inf and nan interact (essentially inf trumps nan). OP’s problem is sidestepped by not promoting reals and a proposed purely imaginary type to complex, but having the real 1 behave differently from the complex 1 doesn’t strike me as a solution. Tellingly, Annex G stops short of fully specifying what the product of two infinities should be.

Can we do better?

It is tempting to try and fix these problems by choosing a better geometry of infinities. In analogy to the extended real line, we could add one infinity for each direction. This construction is similar to the projective plane but doesn’t lump together opposite directions. Infinities would be represented in polar coordinates as inf x e^{2 omega pi i}, and defining products would be straightforward. In particular, OP’s problem would be solved quite naturally.

But this is where the good news ends. In a way we can be hurled back to square one by requiring, not unreasonably, that our new-style infinities support functions that extract their real or imaginary parts. Addition is another problem: when adding two nonantipodal infinities, we’d have to set the angle to undefined, i.e. nan (one could argue the angle must lie between the two input angles, but there is no simple way of representing that “partial nan-ness”).

Riemann to the rescue

In view of all this maybe the good old one point compactification is the safest thing to do. Maybe the authors of Annex G felt the same when mandating a function cproj that lumps all the infinities together.

Here is a related question answered by people more competent on the subject matter than I am.

Question 12

This is an implementation detail of how complex multiplication is implemented in CPython. Unlike other languages (e.g. C or C++), CPython takes a somewhat simplistic approach:

ints/floats are promoted to complex numbers in multiplication
the simple school-formula is used, which doesn’t provide desired/expected results as soon as infinite numbers are involved:

Py_complex
_Py_c_prod(Py_complex a, Py_complex b)
{
    Py_complex r;
    r.real = a.real*b.real - a.imag*b.imag;
    r.imag = a.real*b.imag + a.imag*b.real;
    return r;
}

One problematic case with the above code would be:

(0.0+1.0*j)*(inf+inf*j) = (0.0*inf-1*inf)+(0.0*inf+1.0*inf)j
                        =  nan + nan*j

However, one would like to have -inf + inf*j as result.

In this respect other languages are not far ahead: complex number multiplication was for a long time not part of the C standard, and was included only in C99 as Annex G, which describes how a complex multiplication should be performed, and it is not as simple as the school formula above! The C++ standard doesn’t specify how complex multiplication should work, so most compiler implementations fall back to the C implementation, which might be C99-conforming (gcc, clang) or not (MSVC).

For the above “problematic” example, C99-compliant implementations (which are more complicated than the school formula) would give (see live) the expected result:

(0.0+1.0*j)*(inf+inf*j) = -inf + inf*j

Even with C99 standard, an unambiguous result is not defined for all inputs and it might be different even for C99-compliant versions.

Another side effect of float not being promoted to complex in C99 is that multiplying inf+0.0j by 1.0 or by 1.0+0.0j can lead to different results (see here live):

(inf+0.0j)*1.0 = inf+0.0j
(inf+0.0j)*(1.0+0.0j) = inf-nanj

The imaginary part being -nan rather than nan (as in CPython) doesn’t play a role here, because all quiet NaNs are equivalent (see this), even though some of them have the sign bit set (and are thus printed with “-”, see this) and some do not.

Getting different results for the real 1.0 and the complex 1.0+0.0j is at least counter-intuitive.

My key take-away from it is: there is nothing simple about “simple” complex number multiplication (or division) and when switching between languages or even compilers one must brace oneself for subtle bugs/differences.

Question 13

A funny definition from Python. If we solved this with pen and paper, I would say the expected result is (inf + 0j), as you pointed out, because we know that we mean the norm of 1, so (float('inf')+0j)*1 should equal (inf+0j):

But that is not the case as you can see… when we run it we get:

>>> complex(float('inf'), 0) * 1
(inf+nanj)

Python understands this *1 as a complex number and not the norm of 1, so it interprets it as *(1+0j), and the error appears when we try to do inf * 0j = nanj, as inf*0 can’t be resolved.

What you actually want to do (assuming 1 is the norm of 1):

Recall that if z = x + iy is a complex number with real part x and imaginary part y, the complex conjugate of z is defined as z* = x − iy, and the absolute value, also called the norm of z, is defined as |z| = sqrt(x^2 + y^2).

Assuming 1 is the norm of 1 we should do something like:

>>> c_num = complex(float('inf'),0)
>>> value = 1
>>> realPart=(c_num.real)*value
>>> imagPart=(c_num.imag)*value
>>> complex(realPart,imagPart)
result: (inf+0j)

Not very intuitive, I know… but sometimes programming languages define things differently from what we are used to in our day-to-day work.

Question 14

I’d like to replace bad values in a column of a dataframe by NaN’s.

mydata = {'x' : [10, 50, 18, 32, 47, 20], 'y' : ['12', '11', 'N/A', '13', '15', 'N/A']}
df = pd.DataFrame(mydata)

df[df.y == 'N/A']['y'] = np.nan

Though, the last line fails and throws a warning because it’s working on a copy of df. So, what’s the correct way to handle this? I’ve seen many solutions with iloc or ix but here, I need to use a boolean condition.

Question 15

Just use replace:

In [106]:
df.replace('N/A', np.nan)

Out[106]:
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN

What you’re trying is called chained indexing: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy

You can use loc to ensure you operate on the original df:

In [108]:
df.loc[df['y'] == 'N/A','y'] = np.nan
df

Out[108]:
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN

Question 16

While using replace seems to solve the problem, I would like to propose an alternative. The real problem with a column that mixes numeric and string values is not getting the strings replaced with np.nan, but making the whole column properly typed. I would bet that the original column is most likely of object type:

Name: y, dtype: object

What you really need is to make it a numeric column (it will have a proper dtype and be considerably faster), with all non-numeric values replaced by NaN.

Thus, good conversion code would be:

df['y'] = pd.to_numeric(df['y'], errors='coerce')

Specify errors='coerce' to force strings that can’t be parsed to a numeric value to become NaN. The column type would then be:

Name: y, dtype: float64
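
As a runnable sketch of the conversion, using the sample data from the question (the exact dtype printed before conversion depends on the pandas version, so no output is shown here):

```python
import pandas as pd

mydata = {'x': [10, 50, 18, 32, 47, 20],
          'y': ['12', '11', 'N/A', '13', '15', 'N/A']}
df = pd.DataFrame(mydata)

print(df['y'].dtype)  # a string-like dtype before conversion
df['y'] = pd.to_numeric(df['y'], errors='coerce')  # unparseable strings -> NaN
print(df['y'].dtype)
print(df['y'].isna().sum())  # number of values that could not be parsed
```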

Question 17

You can use replace:

df['y'] = df['y'].replace({'N/A': np.nan})

Also be aware of the inplace parameter for replace. You can do something like:

df.replace({'N/A': np.nan}, inplace=True)

This will replace all instances in the df without creating a copy.

Similarly, if you run into other types of unknown values such as empty string or None value:

df['y'] = df['y'].replace({'': np.nan})

df['y'] = df['y'].replace({None: np.nan})

Reference: Pandas Latest – Replace

Question 18

df.loc[df.y == 'N/A',['y']] = np.nan

This solves your problem. With the double [] (as in df[df.y == 'N/A']['y']), you are working on a copy of the DataFrame. You have to specify the exact location in a single call to be able to modify it.

Question 19

You can try these snippets.

In [16]: mydata = {'x' : [10, 50, 18, 32, 47, 20], 'y' : ['12', '11', 'N/A', '13', '15', 'N/A']}
In [17]: df = pd.DataFrame(mydata)

In [18]: df.loc[df.y == "N/A", "y"] = np.nan

In [19]: df
Out[19]:
    x    y
0  10   12
1  50   11
2  18  NaN
3  32   13
4  47   15
5  20  NaN

Question 20

As of pandas 1.0.0, you no longer need to use numpy to create null values in your dataframe. Instead you can just use pandas.NA (which is of type pandas._libs.missing.NAType), so it will be treated as null within the dataframe but will not be null outside dataframe context.
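
A minimal sketch of how pandas.NA behaves, assuming pandas 1.0 or later and the nullable "Int64" dtype (chosen here only for illustration):

```python
import pandas as pd

s = pd.Series([1, pd.NA, 3], dtype="Int64")  # nullable integer dtype
print(s.isna().tolist())
print(pd.isna(pd.NA))
print(pd.NA == 1)  # comparisons propagate: the result is pd.NA, not False
```

Because comparisons with pd.NA return pd.NA rather than a boolean, use pd.isna(...) or Series.isna() to test for missing values.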

Question 22

I am working with this Pandas DataFrame in Python.

File    heat    Farheit Temp_Rating
   1    YesQ         75         N/A
   1    NoR         115         N/A
   1    YesA         63         N/A
   1    NoT          83          41
   1    NoY         100          80
   1    YesZ         56          12
   2    YesQ        111         N/A
   2    NoR          60         N/A
   2    YesA         19         N/A
   2    NoT         106          77
   2    NoY          45          21
   2    YesZ         40          54
   3    YesQ         84         N/A
   3    NoR          67         N/A
   3    YesA         94         N/A
   3    NoT          68          39
   3    NoY          63          46
   3    YesZ         34          81

I need to replace all NaNs in the Temp_Rating column with the value from the Farheit column.

This is what I need:

File        heat    Temp_Rating
   1        YesQ             75
   1         NoR            115
   1        YesA             63
   1        YesQ             41
   1         NoR             80
   1        YesA             12
   2        YesQ            111
   2         NoR             60
   2        YesA             19
   2         NoT             77
   2         NoY             21
   2        YesZ             54
   3        YesQ             84
   3         NoR             67
   3        YesA             94
   3         NoT             39
   3         NoY             46
   3        YesZ             81

If I do a Boolean selection, I can pick out only one of these columns at a time. The problem is if I then try to join them, I am not able to do this while preserving the correct order.

How can I only find Temp_Rating rows with the NaNs and replace them with the value in the same row of the Farheit column?

Question 24

Assuming your DataFrame is in df:

df['Temp_Rating'] = df['Temp_Rating'].fillna(df['Farheit'])
del df['Farheit']
df.columns = 'File heat Observations'.split()

First replace any NaN values in Temp_Rating with the corresponding value of df.Farheit. Then delete the 'Farheit' column and rename the remaining columns.
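
A self-contained sketch of the fill step, using the first few rows of the question’s data and assuming the N/A entries are real NaN values rather than strings:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'File': [1, 1, 1, 1],
    'heat': ['YesQ', 'NoR', 'NoT', 'NoY'],
    'Farheit': [75, 115, 83, 100],
    'Temp_Rating': [np.nan, np.nan, 41, 80],
})

# fillna aligns on the index, so each NaN takes the Farheit value of its own row.
df['Temp_Rating'] = df['Temp_Rating'].fillna(df['Farheit'])
df = df.drop(columns='Farheit')
print(df)
```

Assigning the result back, rather than calling fillna with inplace=True on an attribute-accessed column, avoids modifying a temporary copy.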

Question 26

The above mentioned solutions did not work for me. The method I used was:

df.loc[df['foo'].isnull(),'foo'] = df['bar']

Question 28

Another way to solve this problem:

import pandas as pd
import numpy as np

ts_df = pd.DataFrame([[1,"YesQ",75,],[1,"NoR",115,],[1,"NoT",63,13],[2,"YesT",43,71]],columns=['File','heat','Farheit','Temp'])


def fx(x):
    if np.isnan(x['Temp']):
        return x['Farheit']
    else:
        return x['Temp']
print(1,ts_df)
ts_df['Temp']=ts_df.apply(lambda x : fx(x),axis=1)

print(2,ts_df)

returns:

(1,    File  heat  Farheit  Temp                                                                                    
0     1  YesQ       75   NaN                                                                                        
1     1   NoR      115   NaN                                                                                        
2     1   NoT       63  13.0                                                                                        
3     2  YesT       43  71.0)                                                                                       
(2,    File  heat  Farheit   Temp                                                                                   
0     1  YesQ       75   75.0                                                                                       
1     1   NoR      115  115.0
2     1   NoT       63   13.0
3     2  YesT       43   71.0)

Question 30

I have a 2D numpy array. Some of the values in this array are NaN. I want to perform certain operations using this array. For example consider the array:

[[   0.   43.   67.    0.   38.]
 [ 100.   86.   96.  100.   94.]
 [  76.   79.   83.   89.   56.]
 [  88.   NaN   67.   89.   81.]
 [  94.   79.   67.   89.   69.]
 [  88.   79.   58.   72.   63.]
 [  76.   79.   71.   67.   56.]
 [  71.   71.   NaN   56.  100.]]

I am trying to take each row, one at a time, sort it in reversed order to get max 3 values from the row and take their average. The code I tried is:

# nparr is a 2D numpy array
for entry in nparr:
    sortedentry = sorted(entry, reverse=True)
    highest_3_values = sortedentry[:3]
    avg_highest_3 = float(sum(highest_3_values)) / 3

This does not work for rows containing NaN. My question is, is there a quick way to convert all NaN values to zero in the 2D numpy array so that I have no problems with sorting and other things I am trying to do.

Question 32

This should work:

import numpy as np

a = np.array([[1, 2, 3], [0, 3, np.nan]])
where_are_NaNs = np.isnan(a)
a[where_are_NaNs] = 0

In the above case where_are_NaNs is:

In [12]: where_are_NaNs
Out[12]: 
array([[False, False, False],
       [False, False,  True]], dtype=bool)

Question 34

Where A is your 2D array:

import numpy as np
A[np.isnan(A)] = 0

The function isnan produces a bool array indicating where the NaN values are. A boolean array can be used to index an array of the same shape. Think of it like a mask.

Question 36

How about nan_to_num()?
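
As a sketch of how nan_to_num could feed the “average of the three highest values per row” computation from the question (using three of its rows; the vectorized sort is an assumption about what the asker ultimately wants, not part of the original code):

```python
import numpy as np

nparr = np.array([[88.0, np.nan, 67.0, 89.0, 81.0],
                  [71.0, 71.0, np.nan, 56.0, 100.0],
                  [0.0, 43.0, 67.0, 0.0, 38.0]])

cleaned = np.nan_to_num(nparr)           # NaN -> 0.0, returns a new array
top3 = np.sort(cleaned, axis=1)[:, -3:]  # three largest values in each row
avg_highest_3 = top3.mean(axis=1)
print(avg_highest_3)
```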

Question 38

You could use np.where to find where you have NaN:

import numpy as np

a = np.array([[   0,   43,   67,    0,   38],
              [ 100,   86,   96,  100,   94],
              [  76,   79,   83,   89,   56],
              [  88,   np.nan,   67,   89,   81],
              [  94,   79,   67,   89,   69],
              [  88,   79,   58,   72,   63],
              [  76,   79,   71,   67,   56],
              [  71,   71,   np.nan,   56,  100]])

b = np.where(np.isnan(a), 0, a)

In [20]: b
Out[20]: 
array([[   0.,   43.,   67.,    0.,   38.],
       [ 100.,   86.,   96.,  100.,   94.],
       [  76.,   79.,   83.,   89.,   56.],
       [  88.,    0.,   67.,   89.,   81.],
       [  94.,   79.,   67.,   89.,   69.],
       [  88.,   79.,   58.,   72.,   63.],
       [  76.,   79.,   71.,   67.,   56.],
       [  71.,   71.,    0.,   56.,  100.]])

Question 40

A code example for drake’s answer to use nan_to_num:

>>> import numpy as np
>>> A = np.array([[1, 2, 3], [0, 3, np.nan]])
>>> A = np.nan_to_num(A)
>>> A
array([[ 1.,  2.,  3.],
       [ 0.,  3.,  0.]])

Question 42

You can use numpy.nan_to_num :

numpy.nan_to_num(x) : Replace nan with zero and inf with finite numbers.

Example (see doc) :

>>> np.set_printoptions(precision=8)
>>> x = np.array([np.inf, -np.inf, np.nan, -128, 128])
>>> np.nan_to_num(x)
array([  1.79769313e+308,  -1.79769313e+308,   0.00000000e+000,
        -1.28000000e+002,   1.28000000e+002])
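
If the very large finite replacements for infinities are not what you want, nan_to_num also accepts explicit replacement values. The sketch below assumes NumPy 1.17 or later, where the nan, posinf and neginf keywords are available. The replacement values themselves are arbitrary.

```python
import numpy as np

x = np.array([np.inf, -np.inf, np.nan, -128.0, 128.0])
y = np.nan_to_num(x, nan=0.0, posinf=1e6, neginf=-1e6)
print(y)
```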

Question 44

nan is never equal to nan

if z!=z:z=0

so for a 2D array, the same comparison gives a boolean mask:

nparr[nparr != nparr] = 0

Question 46

You can use a lambda function; here is an example for a 1D array:

import numpy as np
a = [np.nan, 2, 3]
list(map(lambda v: 0 if np.isnan(v) else v, a))

This will give you the result:

[0, 2, 3]

Question 48

For your purposes, if all the items are stored as str, you can use sorted as you already do, then check whether the first element is NaN and replace it with '0':

>>> l1 = ['88','NaN','67','89','81']
>>> n = sorted(l1,reverse=True)
>>> n
['NaN', '89', '88', '81', '67']
>>> import math
>>> if math.isnan(float(n[0])):
...     n[0] = '0'
... 
>>> n
['0', '89', '88', '81', '67']

Question 49

Most languages have a NaN constant you can use to assign a variable the value NaN. Can python do this without using numpy?

Question 50

Yes — use math.nan.

>>> from math import nan
>>> print(nan)
nan
>>> print(nan + 2)
nan
>>> nan == nan
False
>>> import math
>>> math.isnan(nan)
True

Before Python 3.5, one could use float("nan") (case insensitive).

Note that checking to see if two things that are NaN are equal to one another will always return false. This is in part because two things that are “not a number” cannot (strictly speaking) be said to be equal to one another — see What is the rationale for all comparisons returning false for IEEE754 NaN values? for more details and information.

Instead, use math.isnan(...) if you need to determine if a value is NaN or not.

Furthermore, the exact semantics of the == operation on NaN value may cause subtle issues when trying to store NaN inside container types such as list or dict (or when using custom container types). See Checking for NaN presence in a container for more details.
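
A short sketch of the container issue: membership tests on built-in containers check identity before equality, so the result depends on whether the very same NaN object is stored.

```python
import math

nan = math.nan
values = [1.0, nan]

print(nan in values)           # same object: found via the identity check
print(float("nan") in values)  # a different NaN object: compared with ==
print(any(math.isnan(v) for v in values))  # reliable regardless of identity
```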

You can also construct NaN numbers using Python’s decimal module:

>>> from decimal import Decimal
>>> b = Decimal('nan')
>>> print(b)
NaN
>>> print(repr(b))
Decimal('NaN')
>>>
>>> Decimal(float('nan'))
Decimal('NaN')
>>> 
>>> import math
>>> math.isnan(b)
True

math.isnan(...) will also work with Decimal objects.

However, you cannot construct NaN numbers in Python’s fractions module:

>>> from fractions import Fraction
>>> Fraction('nan')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python35\lib\fractions.py", line 146, in __new__
    numerator)
ValueError: Invalid literal for Fraction: 'nan'
>>>
>>> Fraction(float('nan'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Python35\lib\fractions.py", line 130, in __new__
    value = Fraction.from_float(numerator)
  File "C:\Python35\lib\fractions.py", line 214, in from_float
    raise ValueError("Cannot convert %r to %s." % (f, cls.__name__))
ValueError: Cannot convert nan to Fraction.

Incidentally, you can also do float('Inf'), Decimal('Inf'), or math.inf (3.5+) to assign infinite numbers. (And also see math.isinf(...))

However, doing Fraction('Inf') or Fraction(float('inf')) isn’t permitted and will raise an exception, just as with NaN.

If you want a quick and easy way to check if a number is neither NaN nor infinite, you can use math.isfinite(...) as of Python 3.2+.
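A short sketch of such a check, for example as input validation, might look like this. The helper name is_usable_number is hypothetical.

```python
import math

def is_usable_number(x):
    """Return True only for values that are neither NaN nor +/-infinity."""
    return math.isfinite(x)

samples = [("one", 1.0), ("nan", math.nan), ("inf", math.inf), ("neg_inf", -math.inf)]
checks = {name: is_usable_number(value) for name, value in samples}

print(checks)
```

Only the entry for "one" passes; NaN and both infinities are rejected by the single call.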

If you want to do similar checks with complex numbers, the cmath module contains a similar set of functions and constants as the math module:

cmath.isnan(...)
cmath.isinf(...)
cmath.isfinite(...) (Python 3.2+)
cmath.nan (Python 3.6+; equivalent to complex(float('nan'), 0.0))
cmath.nanj (Python 3.6+; equivalent to complex(0.0, float('nan')))
cmath.inf (Python 3.6+; equivalent to complex(float('inf'), 0.0))
cmath.infj (Python 3.6+; equivalent to complex(0.0, float('inf')))
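The cmath predicates listed above look at both the real and the imaginary part. A minimal sketch, using illustrative variable names and building the values with complex() so it does not depend on the 3.6+ constants:

```python
import cmath

z_inf = complex(float("inf"), 0.0)   # infinite real part
z_nan = complex(0.0, float("nan"))   # NaN imaginary part
z_ok = complex(1.0, 2.0)

# Each predicate is True if EITHER component matches (isnan, isinf),
# while isfinite requires BOTH components to be finite.
results = {
    "isinf(z_inf)": cmath.isinf(z_inf),
    "isnan(z_nan)": cmath.isnan(z_nan),
    "isfinite(z_inf)": cmath.isfinite(z_inf),
    "isfinite(z_ok)": cmath.isfinite(z_ok),
}
print(results)
```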

Question 51

nan = float('nan')

And now you have the constant, nan.

You can similarly create NaN values for decimal.Decimal:

from decimal import Decimal
dnan = Decimal('nan')

Question 52

Use float("nan"):

>>> float("nan")
nan

Question 53

You can do float('nan') to get NaN.

Question 54

You can get NaN from “inf - inf”, and you can get “inf” from a number greater than 2e308, so I generally used:

>>> inf = 9e999
>>> inf
inf
>>> inf - inf
nan

Question 55

A more consistent (and less opaque) way to generate inf and -inf is to again use float():

>>> positive_inf = float('inf')
>>> positive_inf
inf
>>> negative_inf = float('-inf')
>>> negative_inf
-inf

Note that the size of a float varies depending on the architecture, so it is probably best to avoid using magic numbers like 9e999, even if that is likely to work.

>>> import sys
>>> sys.float_info
sys.float_info(max=1.7976931348623157e+308,
               max_exp=1024, max_10_exp=308,
               min=2.2250738585072014e-308, min_exp=-1021,
               min_10_exp=-307, dig=15, mant_dig=53,
               epsilon=2.220446049250313e-16, radix=2, rounds=1)
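To tie the two answers together, the sketch below derives infinity from the platform's own limit instead of a hard-coded literal, and compares it with float('inf'). The variable names are illustrative.

```python
import math
import sys

largest = sys.float_info.max

# Multiplying past the largest finite float overflows to infinity.
overflowed = largest * 2

# float('inf') expresses the same value without relying on overflow.
explicit_inf = float("inf")

# Subtracting infinity from itself is undefined and yields NaN.
difference = explicit_inf - explicit_inf

print(overflowed == explicit_inf, math.isnan(difference))  # True True
```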

Question 56

Given a pandas dataframe containing possible NaN values scattered here and there:

Question: How do I determine which columns contain NaN values? In particular, can I get a list of the column names containing NaNs?

Question 57

UPDATE: using Pandas 0.22.0

Newer pandas versions provide the methods DataFrame.isna() and DataFrame.notna().

In [71]: df
Out[71]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [72]: df.isna().any()
Out[72]:
a     True
b     True
c    False
dtype: bool

as list of columns:

In [74]: df.columns[df.isna().any()].tolist()
Out[74]: ['a', 'b']

to select those columns (containing at least one NaN value):

In [73]: df.loc[:, df.isna().any()]
Out[73]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0

OLD answer:

Try to use isnull():

In [97]: df
Out[97]:
     a    b  c
0  NaN  7.0  0
1  0.0  NaN  4
2  2.0  NaN  4
3  1.0  7.0  0
4  1.0  3.0  9
5  7.0  4.0  9
6  2.0  6.0  9
7  9.0  6.0  4
8  3.0  0.0  9
9  9.0  0.0  1

In [98]: pd.isnull(df).sum() > 0
Out[98]:
a     True
b     True
c    False
dtype: bool

or, as @root proposed, a clearer version:

In [5]: df.isnull().any()
Out[5]:
a     True
b     True
c    False
dtype: bool

In [7]: df.columns[df.isnull().any()].tolist()
Out[7]: ['a', 'b']

to select a subset – all columns containing at least one NaN value:

In [31]: df.loc[:, df.isnull().any()]
Out[31]:
     a    b
0  NaN  7.0
1  0.0  NaN
2  2.0  NaN
3  1.0  7.0
4  1.0  3.0
5  7.0  4.0
6  2.0  6.0
7  9.0  6.0
8  3.0  0.0
9  9.0  0.0

Question 58

You can use df.isnull().sum(). It lists every column together with the number of NaNs it contains.
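A minimal sketch of this on a small, made-up dataframe (the data and the name cols_with_nan are assumptions for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "a": [np.nan, 0.0, 2.0],
    "b": [7.0, np.nan, np.nan],
    "c": [0, 4, 4],
})

# One count per column; columns without NaN show 0.
nan_counts = df.isnull().sum()

# Keep only the names of the columns whose count is non-zero.
cols_with_nan = nan_counts[nan_counts > 0].index.tolist()

print(nan_counts.to_dict())
print(cols_with_nan)  # ['a', 'b']
```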

Question 59

I had a problem where I had too many columns to inspect visually on the screen, so a short list comprehension that filters and returns the offending columns is:

nan_cols = [i for i in df.columns if df[i].isnull().any()]

if that’s helpful to anyone

Question 60

In datasets with a large number of columns, it is even better to see how many columns contain null values and how many don’t.

print("No. of columns containing null values")
print(len(df.columns[df.isna().any()]))

print("No. of columns not containing null values")
print(len(df.columns[df.notna().all()]))

print("Total no. of columns in the dataframe")
print(len(df.columns))

For example, my dataframe contained 82 columns, of which 19 contained at least one null value.

You can also remove columns and rows automatically, depending on which has more null values. The code below first drops every column whose null count exceeds the number of columns in the dataframe, then drops every remaining row that still contains a null value:

df = df.drop(df.columns[df.isna().sum()>len(df.columns)],axis = 1)
df = df.dropna(axis = 0).reset_index(drop=True)

Note: the above code removes all of your null values. If you need to keep or inspect the null values, process them before running it.
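The cutoff used above compares a null count with the number of columns, which may not suit every dataset. A possible variant with an explicit cutoff is sketched here; the threshold max_null_fraction and the sample data are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],
    "some_missing": [1.0, np.nan, 3.0, 4.0],
    "complete": [1, 2, 3, 4],
})

max_null_fraction = 0.5  # hypothetical cutoff: drop columns with more than 50% nulls

# isna().mean() gives the fraction of nulls per column.
keep = df.isna().mean() <= max_null_fraction
trimmed = df.loc[:, keep]

# Drop the rows that still contain a null in the remaining columns.
cleaned = trimmed.dropna(axis=0).reset_index(drop=True)

print(cleaned.columns.tolist())  # ['some_missing', 'complete']
print(cleaned.shape)             # (3, 2)
```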

Question 61

I use these three lines of code to print the names of the columns that contain at least one null value:

for column in dataframe:
    if dataframe[column].isnull().any():
        print('{0} has {1} null values'.format(column, dataframe[column].isnull().sum()))

Question 62

Both of these should work:

df.isnull().sum()
df.isna().sum()

The DataFrame methods isna() and isnull() are completely identical.

Note: an empty string '' yields False (it is not considered NA).
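A quick sketch of the empty-string behaviour, using a made-up Series. Converting '' to NaN first is one possible way to count empty strings as missing; it is shown here as an option, not as the original answer's recommendation.

```python
import numpy as np
import pandas as pd

s = pd.Series(["x", "", None, np.nan])

# None and NaN are flagged as missing; the empty string is not.
mask = s.isna().tolist()

# Replacing '' with NaN beforehand makes empty strings count as missing too.
mask_with_empty = s.replace("", np.nan).isna().tolist()

print(mask)             # [False, False, True, True]
print(mask_with_empty)  # [False, True, True, True]
```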

Question 63

This worked for me:

1. To get the names of the columns that have at least one null value:

data.columns[data.isnull().any()]

2. To get the null count for each of those columns:

data[data.columns[data.isnull().any()]].isnull().sum()

3. [Optional] To get the percentage of null values in each of those columns:

data[data.columns[data.isnull().any()]].isnull().sum() * 100 / data.shape[0]
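The percentage can also be computed from the mean of the boolean mask, since the mean of True/False values is the fraction of True. A minimal sketch on made-up data:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "a": [np.nan, 0.0, 2.0, 1.0],
    "b": [7.0, np.nan, np.nan, 7.0],
    "c": [0, 4, 4, 0],
})

# Fraction of nulls per column, expressed as a percentage.
null_pct = data.isnull().mean() * 100

# Keep only the columns that actually contain nulls.
null_pct = null_pct[null_pct > 0]

print(null_pct.to_dict())  # {'a': 25.0, 'b': 50.0}
```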

Question 64

Suppose I have a DataFrame with some NaNs:

>>> import pandas as pd
>>> df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])
>>> df
    0   1   2
0   1   2   3
1   4 NaN NaN
2 NaN NaN   9

What I need to do is replace every NaN with the first non-NaN value in the same column above it. It is assumed that the first row will never contain a NaN. So for the previous example the result would be

   0  1  2
0  1  2  3
1  4  2  3
2  4  2  9

I can just loop through the whole DataFrame column-by-column, element-by-element and set the values directly, but is there an easy (optimally a loop-free) way of achieving this?
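One loop-free possibility, offered here as a sketch rather than as the thread's accepted solution, is pandas' forward fill, DataFrame.ffill(), which propagates the last valid value down each column.

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, None, None], [None, None, 9]])

# Forward fill: each NaN takes the nearest non-NaN value above it in its column.
filled = df.ffill()

print(filled.values.tolist())
# [[1.0, 2.0, 3.0], [4.0, 2.0, 3.0], [4.0, 2.0, 9.0]]
```

The values come back as floats because the columns held NaN before filling.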
