Python 实用宝典

Question 1

I want to slice a NumPy nxn array. I want to extract an arbitrary selection of m rows and columns of that array (i.e. without any pattern in the numbers of rows/columns), making it a new, mxm array. For this example let us say the array is 4×4 and I want to extract a 2×2 array from it.

Here is our array:

from numpy import *
x = range(16)
x = reshape(x,(4,4))

print x
[[ 0  1  2  3]
 [ 4  5  6  7]
 [ 8  9 10 11]
 [12 13 14 15]]

The line and columns to remove are the same. The easiest case is when I want to extract a 2×2 submatrix that is at the beginning or at the end, i.e. :

In [33]: x[0:2,0:2]
Out[33]: 
array([[0, 1],
       [4, 5]])

In [34]: x[2:,2:]
Out[34]: 
array([[10, 11],
       [14, 15]])

But what if I need to remove another mixture of rows/columns? What if I need to remove the first and third lines/rows, thus extracting the submatrix [[5,7],[13,15]]? There can be any composition of rows/lines. I read somewhere that I just need to index my array using arrays/lists of indices for both rows and columns, but that doesn’t seem to work:

In [35]: x[[1,3],[1,3]]
Out[35]: array([ 5, 15])

I found one way, which is:

    In [61]: x[[1,3]][:,[1,3]]
Out[61]: 
array([[ 5,  7],
       [13, 15]])

First issue with this is that it is hardly readable, although I can live with that. If someone has a better solution, I’d certainly like to hear it.

Other thing is I read on a forum that indexing arrays with arrays forces NumPy to make a copy of the desired array, thus when treating with large arrays this could become a problem. Why is that so / how does this mechanism work?

Question 2

As Sven mentioned, x[[[0],[2]],[1,3]] will give back the 0 and 2 rows that match with the 1 and 3 columns while x[[0,2],[1,3]] will return the values x[0,1] and x[2,3] in an array.

There is a helpful function for doing the first example I gave, numpy.ix_. You can do the same thing as my first example with x[numpy.ix_([0,2],[1,3])]. This can save you from having to enter in all of those extra brackets.

Question 3

To answer this question, we have to look at how indexing a multidimensional array works in Numpy. Let’s first say you have the array x from your question. The buffer assigned to x will contain 16 ascending integers from 0 to 15. If you access one element, say x[i,j], NumPy has to figure out the memory location of this element relative to the beginning of the buffer. This is done by calculating in effect i*x.shape[1]+j (and multiplying with the size of an int to get an actual memory offset).

If you extract a subarray by basic slicing like y = x[0:2,0:2], the resulting object will share the underlying buffer with x. But what happens if you acces y[i,j]? NumPy can’t use i*y.shape[1]+j to calculate the offset into the array, because the data belonging to y is not consecutive in memory.

NumPy solves this problem by introducing strides. When calculating the memory offset for accessing x[i,j], what is actually calculated is i*x.strides[0]+j*x.strides[1] (and this already includes the factor for the size of an int):

x.strides
(16, 4)

When y is extracted like above, NumPy does not create a new buffer, but it does create a new array object referencing the same buffer (otherwise y would just be equal to x.) The new array object will have a different shape then x and maybe a different starting offset into the buffer, but will share the strides with x (in this case at least):

y.shape
(2,2)
y.strides
(16, 4)

This way, computing the memory offset for y[i,j] will yield the correct result.

But what should NumPy do for something like z=x[[1,3]]? The strides mechanism won’t allow correct indexing if the original buffer is used for z. NumPy theoretically could add some more sophisticated mechanism than the strides, but this would make element access relatively expensive, somehow defying the whole idea of an array. In addition, a view wouldn’t be a really lightweight object anymore.

This is covered in depth in the NumPy documentation on indexing.

Oh, and nearly forgot about your actual question: Here is how to make the indexing with multiple lists work as expected:

x[[[1],[3]],[1,3]]

This is because the index arrays are broadcasted to a common shape. Of course, for this particular example, you can also make do with basic slicing:

x[1::2, 1::2]

Question 4

I don’t think that x[[1,3]][:,[1,3]] is hardly readable. If you want to be more clear on your intent, you can do:

a[[1,3],:][:,[1,3]]

I am not an expert in slicing but typically, if you try to slice into an array and the values are continuous, you get back a view where the stride value is changed.

e.g. In your inputs 33 and 34, although you get a 2×2 array, the stride is 4. Thus, when you index the next row, the pointer moves to the correct position in memory.

Clearly, this mechanism doesn’t carry well into the case of an array of indices. Hence, numpy will have to make the copy. After all, many other matrix math function relies on size, stride and continuous memory allocation.

Question 5

If you want to skip every other row and every other column, then you can do it with basic slicing:

In [49]: x=np.arange(16).reshape((4,4))
In [50]: x[1:4:2,1:4:2]
Out[50]: 
array([[ 5,  7],
       [13, 15]])

This returns a view, not a copy of your array.

In [51]: y=x[1:4:2,1:4:2]

In [52]: y[0,0]=100

In [53]: x   # <---- Notice x[1,1] has changed
Out[53]: 
array([[  0,   1,   2,   3],
       [  4, 100,   6,   7],
       [  8,   9,  10,  11],
       [ 12,  13,  14,  15]])

while z=x[(1,3),:][:,(1,3)] uses advanced indexing and thus returns a copy:

In [58]: x=np.arange(16).reshape((4,4))
In [59]: z=x[(1,3),:][:,(1,3)]

In [60]: z
Out[60]: 
array([[ 5,  7],
       [13, 15]])

In [61]: z[0,0]=0

Note that x is unchanged:

In [62]: x
Out[62]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

If you wish to select arbitrary rows and columns, then you can’t use basic slicing. You’ll have to use advanced indexing, using something like x[rows,:][:,columns], where rows and columns are sequences. This of course is going to give you a copy, not a view, of your original array. This is as one should expect, since a numpy array uses contiguous memory (with constant strides), and there would be no way to generate a view with arbitrary rows and columns (since that would require non-constant strides).

Question 6

With numpy, you can pass a slice for each component of the index – so, your x[0:2,0:2] example above works.

If you just want to evenly skip columns or rows, you can pass slices with three components (i.e. start, stop, step).

Again, for your example above:

>>> x[1:4:2, 1:4:2]
array([[ 5,  7],
       [13, 15]])

Which is basically: slice in the first dimension, with start at index 1, stop when index is equal or greater than 4, and add 2 to the index in each pass. The same for the second dimension. Again: this only works for constant steps.

The syntax you got to do something quite different internally – what x[[1,3]][:,[1,3]] actually does is create a new array including only rows 1 and 3 from the original array (done with the x[[1,3]] part), and then re-slice that – creating a third array – including only columns 1 and 3 of the previous array.

Question 7

I have a similar question here: Writting in sub-ndarray of a ndarray in the most pythonian way. Python 2 .

Following the solution of previous post for your case the solution looks like:

columns_to_keep = [1,3] 
rows_to_keep = [1,3]

An using ix_:

x[np.ix_(rows_to_keep, columns_to_keep)]

Which is:

array([[ 5,  7],
       [13, 15]])

Question 8

I’m not sure how efficient this is but you can use range() to slice in both axis

 x=np.arange(16).reshape((4,4))
 x[range(1,3), :][:,range(1,3)]

Question 9

I have to make a Lagrange polynomial in Python for a project I’m doing. I’m doing a barycentric style one to avoid using an explicit for-loop as opposed to a Newton’s divided difference style one. The problem I have is that I need to catch a division by zero, but Python (or maybe numpy) just makes it a warning instead of a normal exception.

So, what I need to know how to do is to catch this warning as if it were an exception. The related questions to this I found on this site were answered not in the way I needed. Here’s my code:

import numpy as np
import matplotlib.pyplot as plt
import warnings

class Lagrange:
    def __init__(self, xPts, yPts):
        self.xPts = np.array(xPts)
        self.yPts = np.array(yPts)
        self.degree = len(xPts)-1 
        self.weights = np.array([np.product([x_j - x_i for x_j in xPts if x_j != x_i]) for x_i in xPts])

    def __call__(self, x):
        warnings.filterwarnings("error")
        try:
            bigNumerator = np.product(x - self.xPts)
            numerators = np.array([bigNumerator/(x - x_j) for x_j in self.xPts])
            return sum(numerators/self.weights*self.yPts) 
        except Exception, e: # Catch division by 0. Only possible in 'numerators' array
            return yPts[np.where(xPts == x)[0][0]]

L = Lagrange([-1,0,1],[1,0,1]) # Creates quadratic poly L(x) = x^2

L(1) # This should catch an error, then return 1.

When this code is executed, the output I get is:

Warning: divide by zero encountered in int_scalars

That’s the warning I want to catch. It should occur inside the list comprehension.

Question 10

It seems that your configuration is using the print option for numpy.seterr:

>>> import numpy as np
>>> np.array([1])/0   #'warn' mode
__main__:1: RuntimeWarning: divide by zero encountered in divide
array([0])
>>> np.seterr(all='print')
{'over': 'warn', 'divide': 'warn', 'invalid': 'warn', 'under': 'ignore'}
>>> np.array([1])/0   #'print' mode
Warning: divide by zero encountered in divide
array([0])

This means that the warning you see is not a real warning, but it’s just some characters printed to stdout(see the documentation for seterr). If you want to catch it you can:

Use numpy.seterr(all='raise') which will directly raise the exception. This however changes the behaviour of all the operations, so it’s a pretty big change in behaviour.
Use numpy.seterr(all='warn'), which will transform the printed warning in a real warning and you’ll be able to use the above solution to localize this change in behaviour.

Once you actually have a warning, you can use the warnings module to control how the warnings should be treated:

>>> import warnings
>>> 
>>> warnings.filterwarnings('error')
>>> 
>>> try:
...     warnings.warn(Warning())
... except Warning:
...     print 'Warning was raised as an exception!'
... 
Warning was raised as an exception!

Read carefully the documentation for filterwarnings since it allows you to filter only the warning you want and has other options. I’d also consider looking at catch_warnings which is a context manager which automatically resets the original filterwarnings function:

>>> import warnings
>>> with warnings.catch_warnings():
...     warnings.filterwarnings('error')
...     try:
...         warnings.warn(Warning())
...     except Warning: print 'Raised!'
... 
Raised!
>>> try:
...     warnings.warn(Warning())
... except Warning: print 'Not raised!'
... 
__main__:2: Warning:

Question 11

To add a little to @Bakuriu’s answer:

If you already know where the warning is likely to occur then it’s often cleaner to use the numpy.errstate context manager, rather than numpy.seterr which treats all subsequent warnings of the same type the same regardless of where they occur within your code:

import numpy as np

a = np.r_[1.]
with np.errstate(divide='raise'):
    try:
        a / 0   # this gets caught and handled as an exception
    except FloatingPointError:
        print('oh no!')
a / 0           # this prints a RuntimeWarning as usual

Edit:

In my original example I had a = np.r_[0], but apparently there was a change in numpy’s behaviour such that division-by-zero is handled differently in cases where the numerator is all-zeros. For example, in numpy 1.16.4:

all_zeros = np.array([0., 0.])
not_all_zeros = np.array([1., 0.])

with np.errstate(divide='raise'):
    not_all_zeros / 0.  # Raises FloatingPointError

with np.errstate(divide='raise'):
    all_zeros / 0.  # No exception raised

with np.errstate(invalid='raise'):
    all_zeros / 0.  # Raises FloatingPointError

The corresponding warning messages are also different: 1. / 0. is logged as RuntimeWarning: divide by zero encountered in true_divide, whereas 0. / 0. is logged as RuntimeWarning: invalid value encountered in true_divide. I’m not sure why exactly this change was made, but I suspect it has to do with the fact that the result of 0. / 0. is not representable as a number (numpy returns a NaN in this case) whereas 1. / 0. and -1. / 0. return +Inf and -Inf respectively, per the IEE 754 standard.

If you want to catch both types of error you can always pass np.errstate(divide='raise', invalid='raise'), or all='raise' if you want to raise an exception on any kind of floating point error.

Question 12

To elaborate on @Bakuriu’s answer above, I’ve found that this enables me to catch a runtime warning in a similar fashion to how I would catch an error warning, printing out the warning nicely:

import warnings

with warnings.catch_warnings():
    warnings.filterwarnings('error')
    try:
        answer = 1 / 0
    except Warning as e:
        print('error found:', e)

You will probably be able to play around with placing of the warnings.catch_warnings() placement depending on how big of an umbrella you want to cast with catching errors this way.

Question 13

Remove warnings.filterwarnings and add:

numpy.seterr(all='raise')

Question 14

Using Matplotlib, I want to plot a 2D heat map. My data is an n-by-n Numpy array, each with a value between 0 and 1. So for the (i, j) element of this array, I want to plot a square at the (i, j) coordinate in my heat map, whose color is proportional to the element’s value in the array.

How can I do this?

Question 15

The imshow() function with parameters interpolation='nearest' and cmap='hot' should do what you want.

import matplotlib.pyplot as plt
import numpy as np

a = np.random.random((16, 16))
plt.imshow(a, cmap='hot', interpolation='nearest')
plt.show()

Question 16

Seaborn takes care of a lot of the manual work and automatically plots a gradient at the side of the chart etc.

import numpy as np
import seaborn as sns
import matplotlib.pylab as plt

uniform_data = np.random.rand(10, 12)
ax = sns.heatmap(uniform_data, linewidth=0.5)
plt.show()

Or, you can even plot upper / lower left / right triangles of square matrices, for example a correlation matrix which is square and is symmetric, so plotting all values would be redundant anyway.

corr = np.corrcoef(np.random.randn(10, 200))
mask = np.zeros_like(corr)
mask[np.triu_indices_from(mask)] = True
with sns.axes_style("white"):
    ax = sns.heatmap(corr, mask=mask, vmax=.3, square=True,  cmap="YlGnBu")
    plt.show()

Question 17

For a 2d numpy array, simply use imshow() may help you:

import matplotlib.pyplot as plt
import numpy as np


def heatmap2d(arr: np.ndarray):
    plt.imshow(arr, cmap='viridis')
    plt.colorbar()
    plt.show()


test_array = np.arange(100 * 100).reshape(100, 100)
heatmap2d(test_array)

This code produces a continuous heatmap.

You can choose another built-in colormap from here.

Question 18

I would use matplotlib’s pcolor/pcolormesh function since it allows nonuniform spacing of the data.

Example taken from matplotlib:

import matplotlib.pyplot as plt
import numpy as np

# generate 2 2d grids for the x & y bounds
y, x = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))

z = (1 - x / 2. + x ** 5 + y ** 3) * np.exp(-x ** 2 - y ** 2)
# x and y are bounds, so z should be the value *inside* those bounds.
# Therefore, remove the last value from the z array.
z = z[:-1, :-1]
z_min, z_max = -np.abs(z).max(), np.abs(z).max()

fig, ax = plt.subplots()

c = ax.pcolormesh(x, y, z, cmap='RdBu', vmin=z_min, vmax=z_max)
ax.set_title('pcolormesh')
# set the limits of the plot to the limits of the data
ax.axis([x.min(), x.max(), y.min(), y.max()])
fig.colorbar(c, ax=ax)

plt.show()

Question 19

Here’s how to do it from a csv:

import numpy as np
import matplotlib.pyplot as plt
from scipy.interpolate import griddata

# Load data from CSV
dat = np.genfromtxt('dat.xyz', delimiter=' ',skip_header=0)
X_dat = dat[:,0]
Y_dat = dat[:,1]
Z_dat = dat[:,2]

# Convert from pandas dataframes to numpy arrays
X, Y, Z, = np.array([]), np.array([]), np.array([])
for i in range(len(X_dat)):
        X = np.append(X, X_dat[i])
        Y = np.append(Y, Y_dat[i])
        Z = np.append(Z, Z_dat[i])

# create x-y points to be used in heatmap
xi = np.linspace(X.min(), X.max(), 1000)
yi = np.linspace(Y.min(), Y.max(), 1000)

# Z is a matrix of x-y values
zi = griddata((X, Y), Z, (xi[None,:], yi[:,None]), method='cubic')

# I control the range of my colorbar by removing data 
# outside of my range of interest
zmin = 3
zmax = 12
zi[(zi<zmin) | (zi>zmax)] = None

# Create the contour plot
CS = plt.contourf(xi, yi, zi, 15, cmap=plt.cm.rainbow,
                  vmax=zmax, vmin=zmin)
plt.colorbar()  
plt.show()

where dat.xyz is in the form

x1 y1 z1
x2 y2 z2
...

Question 20

In a project using SciPy and NumPy, should I use scipy.pi, numpy.pi, or math.pi?

Question 21

>>> import math
>>> import numpy as np
>>> import scipy
>>> math.pi == np.pi == scipy.pi
True

So it doesn’t matter, they are all the same value.

The only reason all three modules provide a pi value is so if you are using just one of the three modules, you can conveniently have access to pi without having to import another module. They’re not providing different values for pi.

Question 22

One thing to note is that not all libraries will use the same meaning for pi, of course, so it never hurts to know what you’re using. For example, the symbolic math library Sympy’s representation of pi is not the same as math and numpy:

import math
import numpy
import scipy
import sympy

print(math.pi == numpy.pi)
> True
print(math.pi == scipy.pi)
> True
print(math.pi == sympy.pi)
> False

Question 23

After doing some processing on an audio or image array, it needs to be normalized within a range before it can be written back to a file. This can be done like so:

# Normalize audio channels to between -1.0 and +1.0
audio[:,0] = audio[:,0]/abs(audio[:,0]).max()
audio[:,1] = audio[:,1]/abs(audio[:,1]).max()

# Normalize image to between 0 and 255
image = image/(image.max()/255.0)

Is there a less verbose, convenience function way to do this? matplotlib.colors.Normalize() doesn’t seem to be related.

Question 24

audio /= np.max(np.abs(audio),axis=0)
image *= (255.0/image.max())

Using /= and *= allows you to eliminate an intermediate temporary array, thus saving some memory. Multiplication is less expensive than division, so

image *= 255.0/image.max()    # Uses 1 division and image.size multiplications

is marginally faster than

image /= image.max()/255.0    # Uses 1+image.size divisions

Since we are using basic numpy methods here, I think this is about as efficient a solution in numpy as can be.

In-place operations do not change the dtype of the container array. Since the desired normalized values are floats, the audio and image arrays need to have floating-point point dtype before the in-place operations are performed. If they are not already of floating-point dtype, you’ll need to convert them using astype. For example,

image = image.astype('float64')

Question 25

If the array contains both positive and negative data, I’d go with:

import numpy as np

a = np.random.rand(3,2)

# Normalised [0,1]
b = (a - np.min(a))/np.ptp(a)

# Normalised [0,255] as integer: don't forget the parenthesis before astype(int)
c = (255*(a - np.min(a))/np.ptp(a)).astype(int)        

# Normalised [-1,1]
d = 2.*(a - np.min(a))/np.ptp(a)-1

If the array contains nan, one solution could be to just remove them as:

def nan_ptp(a):
    return np.ptp(a[np.isfinite(a)])

b = (a - np.nanmin(a))/nan_ptp(a)

However, depending on the context you might want to treat nan differently. E.g. interpolate the value, replacing in with e.g. 0, or raise an error.

Finally, worth mentioning even if it’s not OP’s question, standardization:

e = (a - np.mean(a)) / np.std(a)

Question 26

You can also rescale using sklearn. The advantages are that you can adjust normalize the standard deviation, in addition to mean-centering the data, and that you can do this on either axis, by features, or by records.

from sklearn.preprocessing import scale
X = scale( X, axis=0, with_mean=True, with_std=True, copy=True )

The keyword arguments axis, with_mean, with_std are self explanatory, and are shown in their default state. The argument copy performs the operation in-place if it is set to False. Documentation here.

Question 27

You can use the “i” (as in idiv, imul..) version, and it doesn’t look half bad:

image /= (image.max()/255.0)

For the other case you can write a function to normalize an n-dimensional array by colums:

def normalize_columns(arr):
    rows, cols = arr.shape
    for col in xrange(cols):
        arr[:,col] /= abs(arr[:,col]).max()

Question 28

You are trying to min-max scale the values of audio between -1 and +1 and image between 0 and 255.

Using sklearn.preprocessing.minmax_scale, should easily solve your problem.

e.g.:

audio_scaled = minmax_scale(audio, feature_range=(-1,1))

and

shape = image.shape
image_scaled = minmax_scale(image.ravel(), feature_range=(0,255)).reshape(shape)

note: Not to be confused with the operation that scales the norm (length) of a vector to a certain value (usually 1), which is also commonly referred to as normalization.

Question 29

A simple solution is using the scalers offered by the sklearn.preprocessing library.

scaler = sk.MinMaxScaler(feature_range=(0, 250))
scaler = scaler.fit(X)
X_scaled = scaler.transform(X)
# Checking reconstruction
X_rec = scaler.inverse_transform(X_scaled)

The error X_rec-X will be zero. You can adjust the feature_range for your needs, or even use a standart scaler sk.StandardScaler()

Question 30

I tried following this, and got the error

TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''

The numpy array I was trying to normalize was an integer array. It seems they deprecated type casting in versions > 1.10, and you have to use numpy.true_divide() to resolve that.

arr = np.array(img)
arr = np.true_divide(arr,[255.0],out=None)

img was an PIL.Image object.

Question 31

Is there a less verbose alternative to this:

for x in xrange(array.shape[0]):
    for y in xrange(array.shape[1]):
        do_stuff(x, y)

I came up with this:

for x, y in itertools.product(map(xrange, array.shape)):
    do_stuff(x, y)

Which saves one indentation, but is still pretty ugly.

I’m hoping for something that looks like this pseudocode:

for x, y in array.indices:
    do_stuff(x, y)

Does anything like that exist?

Question 32

I think you’re looking for the ndenumerate.

>>> a =numpy.array([[1,2],[3,4],[5,6]])
>>> for (x,y), value in numpy.ndenumerate(a):
...  print x,y
... 
0 0
0 1
1 0
1 1
2 0
2 1

Regarding the performance. It is a bit slower than a list comprehension.

X = np.zeros((100, 100, 100))

%timeit list([((i,j,k), X[i,j,k]) for i in range(X.shape[0]) for j in range(X.shape[1]) for k in range(X.shape[2])])
1 loop, best of 3: 376 ms per loop

%timeit list(np.ndenumerate(X))
1 loop, best of 3: 570 ms per loop

If you are worried about the performance you could optimise a bit further by looking at the implementation of ndenumerate, which does 2 things, converting to an array and looping. If you know you have an array, you can call the .coords attribute of the flat iterator.

a = X.flat
%timeit list([(a.coords, x) for x in a.flat])
1 loop, best of 3: 305 ms per loop

Question 33

If you only need the indices, you could try numpy.ndindex:

>>> a = numpy.arange(9).reshape(3, 3)
>>> [(x, y) for x, y in numpy.ndindex(a.shape)]
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2)]

Question 34

see nditer

import numpy as np
Y = np.array([3,4,5,6])
for y in np.nditer(Y, op_flags=['readwrite']):
    y += 3

Y == np.array([6, 7, 8, 9])

y = 3 would not work, use y *= 0 and y += 3 instead.

Question 35

I’m trying to speed up the answer here using Cython. I try to compile the code (after doing the cygwinccompiler.py hack explained here), but get a fatal error: numpy/arrayobject.h: No such file or directory...compilation terminated error. Can anyone tell me if it’s a problem with my code, or some esoteric subtlety with Cython?

Below is my code.

import numpy as np
import scipy as sp
cimport numpy as np
cimport cython

cdef inline np.ndarray[np.int, ndim=1] fbincount(np.ndarray[np.int_t, ndim=1] x):
    cdef int m = np.amax(x)+1
    cdef int n = x.size
    cdef unsigned int i
    cdef np.ndarray[np.int_t, ndim=1] c = np.zeros(m, dtype=np.int)

    for i in xrange(n):
        c[<unsigned int>x[i]] += 1

    return c

cdef packed struct Point:
    np.float64_t f0, f1

@cython.boundscheck(False)
def sparsemaker(np.ndarray[np.float_t, ndim=2] X not None,
                np.ndarray[np.float_t, ndim=2] Y not None,
                np.ndarray[np.float_t, ndim=2] Z not None):

    cdef np.ndarray[np.float64_t, ndim=1] counts, factor
    cdef np.ndarray[np.int_t, ndim=1] row, col, repeats
    cdef np.ndarray[Point] indices

    cdef int x_, y_

    _, row = np.unique(X, return_inverse=True); x_ = _.size
    _, col = np.unique(Y, return_inverse=True); y_ = _.size
    indices = np.rec.fromarrays([row,col])
    _, repeats = np.unique(indices, return_inverse=True)
    counts = 1. / fbincount(repeats)
    Z.flat *= counts.take(repeats)

    return sp.sparse.csr_matrix((Z.flat,(row,col)), shape=(x_, y_)).toarray()

Question 36

In your setup.py, the Extension should have the argument include_dirs=[numpy.get_include()].

Also, you are missing np.import_array() in your code.

—

Example setup.py:

from distutils.core import setup, Extension
from Cython.Build import cythonize
import numpy

setup(
    ext_modules=[
        Extension("my_module", ["my_module.c"],
                  include_dirs=[numpy.get_include()]),
    ],
)

# Or, if you use cythonize() to make the ext_modules list,
# include_dirs can be passed to setup()

setup(
    ext_modules=cythonize("my_module.pyx"),
    include_dirs=[numpy.get_include()]
)

Question 37

For a one-file project like yours, another alternative is to use pyximport. You don’t need to create a setup.py … you don’t need to even open a command line if you use IPython … it’s all very convenient. In your case, try running these commands in IPython or in a normal Python script:

import numpy
import pyximport
pyximport.install(setup_args={"script_args":["--compiler=mingw32"],
                              "include_dirs":numpy.get_include()},
                  reload_support=True)

import my_pyx_module

print my_pyx_module.some_function(...)
...

You may need to edit the compiler of course. This makes import and reload work the same for .pyx files as they work for .py files.

Source: http://wiki.cython.org/InstallingOnWindows

Question 38

The error means that a numpy header file isn’t being found during compilation.

Try doing export CFLAGS=-I/usr/lib/python2.7/site-packages/numpy/core/include/, and then compiling. This is a problem with a few different packages. There’s a bug filed in ArchLinux for the same issue: https://bugs.archlinux.org/task/22326

Question 39

Simple answer

A way simpler way is to add the path to your file distutils.cfg. It’s path behalf of Windows 7 is by default C:\Python27\Lib\distutils\. You just assert the following contents and it should work out:

[build_ext]
include_dirs= C:\Python27\Lib\site-packages\numpy\core\include

Entire config file

To give you an example how the config file could look like, my entire file reads:

[build]
compiler = mingw32

[build_ext]
include_dirs= C:\Python27\Lib\site-packages\numpy\core\include
compiler = mingw32

Question 40

It should be able to do it within cythonize() function as mentioned here, but it doesn’t work beacuse there is a known issue

Question 41

If you are too lazy to write setup files and figure out the path for include directories, try cyper. It can compile your Cython code and set include_dirs for Numpy automatically.

Load your code into a string, then simply run cymodule = cyper.inline(code_string), then your function is available as cymodule.sparsemaker instantaneously. Something like this

code = open(your_pyx_file).read()
cymodule = cyper.inline(code)

cymodule.sparsemaker(...)
# do what you want with your function

You can install cyper via pip install cyper.

Question 42

I have a column in python pandas DataFrame that has boolean True/False values, but for further calculations I need 1/0 representation. Is there a quick pandas/numpy way to do that?

Question 43

A succinct way to convert a single column of boolean values to a column of integers 1 or 0:

df["somecolumn"] = df["somecolumn"].astype(int)

Question 44

Just multiply your Dataframe by 1 (int)

[1]: data = pd.DataFrame([[True, False, True], [False, False, True]])
[2]: print data
          0      1     2
     0   True  False  True
     1   False False  True

[3]: print data*1
         0  1  2
     0   1  0  1
     1   0  0  1

Question 45

True is 1 in Python, and likewise False is 0^*:

>>> True == 1
True
>>> False == 0
True

You should be able to perform any operations you want on them by just treating them as though they were numbers, as they are numbers:

>>> issubclass(bool, int)
True
>>> True * 5
5

So to answer your question, no work necessary – you already have what you are looking for.

^{* Note I use is as an English word, not the Python keyword is – True will not be the same object as any random 1.}

Question 46

You also can do this directly on Frames

In [104]: df = DataFrame(dict(A = True, B = False),index=range(3))

In [105]: df
Out[105]: 
      A      B
0  True  False
1  True  False
2  True  False

In [106]: df.dtypes
Out[106]: 
A    bool
B    bool
dtype: object

In [107]: df.astype(int)
Out[107]: 
   A  B
0  1  0
1  1  0
2  1  0

In [108]: df.astype(int).dtypes
Out[108]: 
A    int64
B    int64
dtype: object

Question 47

You can use a transformation for your data frame:

df = pd.DataFrame(my_data condition)

transforming True/False in 1/0

df = df*1

Question 48

Use Series.view for convert boolean to integers:

df["somecolumn"] = df["somecolumn"].view('i1')

1.准备

2.AutoGrad 计算斜率

3.实现一个逻辑回归模型

1.Numpy Array内存占用更小

2.Numpy Array速度更快、内置计算方法

问题：切片NumPy 2d数组，或者如何从nxn数组（n> m）中提取mxm子矩阵？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

问题：如何捕获像异常一样的numpy警告（不仅用于测试）？

回答 0

回答 1

编辑：

Edit:

回答 2

回答 3

问题：使用Matplotlib绘制2D热图

回答 0

回答 1

回答 2

回答 3

回答 4

问题：我应该使用scipy.pi，numpy.pi还是math.pi？

回答 0

回答 1

问题：如何将NumPy数组标准化到一定范围内？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

问题：遍历一个numpy数组

回答 0

回答 1

回答 2

问题：Cython：“严重错误：numpy / arrayobject.h：没有此类文件或目录”

回答 0

回答 1

回答 2

回答 3

简单的答案

整个配置文件

Simple answer

Entire config file

回答 4

回答 5

问题：如何在Pandas DataFrame中将True / False映射到1/0？

回答 0

回答 1

回答 2

回答 3

回答 4

在1/0中转换真/假

transforming True/False in 1/0

回答 5

有趣好用的Python教程