Tag Archives: numpy

How do I calculate the cumulative normal distribution?

Question: How do I calculate the cumulative normal distribution?

I am looking for a function in Numpy or Scipy (or any rigorous Python library) that will give me the cumulative normal distribution function in Python.


Answer 0

Here’s an example:

>>> from scipy.stats import norm
>>> norm.cdf(1.96)
0.9750021048517795
>>> norm.cdf(-1.96)
0.024997895148220435

In other words, approximately 95% of the standard normal distribution lies within about two standard deviations (±1.96) of its mean of zero.

If you need the inverse CDF:

>>> norm.ppf(norm.cdf(1.96))
array(1.9599999999999991)
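
As a quick check of that 95% figure (a minimal sketch using the same scipy.stats.norm import), the probability mass between -1.96 and 1.96 is:

>>> round(norm.cdf(1.96) - norm.cdf(-1.96), 4)
0.95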

Answer 1

It may be too late to answer the question, but since Google still leads people here, I decided to write my solution here.

As of Python 2.7, the math library includes the error function math.erf(x).

The erf() function can be used to compute traditional statistical functions such as the cumulative standard normal distribution:

from math import erf, sqrt

def phi(x):
    """Cumulative distribution function for the standard normal distribution."""
    return (1.0 + erf(x / sqrt(2.0))) / 2.0

Ref:

https://docs.python.org/2/library/math.html

https://docs.python.org/3/library/math.html

How are the Error Function and Standard Normal distribution function related?
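
A quick sanity check of phi (a sketch, assuming SciPy is available for comparison): it agrees with scipy.stats.norm.cdf to high precision, and is exactly one half at zero:

>>> phi(0)
0.5
>>> from scipy.stats import norm
>>> round(phi(1.96), 12) == round(norm.cdf(1.96), 12)
True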


Answer 2

Adapted from here: http://mail.python.org/pipermail/python-list/2000-June/039873.html

from math import *
def erfcc(x):
    """Complementary error function."""
    z = abs(x)
    t = 1. / (1. + 0.5*z)
    r = t * exp(-z*z-1.26551223+t*(1.00002368+t*(.37409196+
        t*(.09678418+t*(-.18628806+t*(.27886807+
        t*(-1.13520398+t*(1.48851587+t*(-.82215223+
        t*.17087277)))))))))
    if (x >= 0.):
        return r
    else:
        return 2. - r

def ncdf(x):
    return 1. - 0.5*erfcc(x/(2**0.5))
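
As a rough accuracy check (a sketch; this rational approximation of the complementary error function is documented in Numerical Recipes as accurate to roughly 1e-7):

>>> abs(ncdf(1.96) - 0.9750021048517795) < 1e-7
True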

Answer 3

To build upon Unknown’s example, the Python equivalent of the function normdist() implemented in a lot of libraries would be:

# Requires erfcc() from the previous answer and sqrt/pi/exp from math.
def normcdf(x, mu, sigma):
    t = x - mu
    y = 0.5 * erfcc(-t / (sigma * sqrt(2.0)))
    if y > 1.0:
        y = 1.0
    return y

def normpdf(x, mu, sigma):
    u = (x - mu) / abs(sigma)
    y = (1 / (sqrt(2 * pi) * abs(sigma))) * exp(-u * u / 2)
    return y

def normdist(x, mu, sigma, f):
    # f selects the CDF (True) or the PDF (False)
    if f:
        y = normcdf(x, mu, sigma)
    else:
        y = normpdf(x, mu, sigma)
    return y
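
For example (a sketch, assuming erfcc() from the previous answer and the math names are in scope):

>>> round(normdist(1.96, 0, 1, True), 6)   # CDF at 1.96
0.975002
>>> normdist(0, 0, 1, False)               # PDF at the mean, i.e. 1/sqrt(2*pi)
0.3989422804014327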

Answer 4

Starting with Python 3.8, the standard library provides the NormalDist object as part of the statistics module.

It can be used to get the cumulative distribution function (cdf – probability that a random sample X will be less than or equal to x) for a given mean (mu) and standard deviation (sigma):

from statistics import NormalDist

NormalDist(mu=0, sigma=1).cdf(1.96)
# 0.9750021048517796

This can be simplified for the standard normal distribution (mu = 0 and sigma = 1):

NormalDist().cdf(1.96)
# 0.9750021048517796

NormalDist().cdf(-1.96)
# 0.024997895148220428
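
NormalDist also provides the inverse CDF (the quantile function) as inv_cdf (a short addition; inv_cdf is part of the statistics.NormalDist API in Python 3.8+), so the 1.96 value round-trips:

round(NormalDist().inv_cdf(0.975), 2)
# 1.96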

Answer 5

Alex’s answer shows a solution for the standard normal distribution (mean = 0, standard deviation = 1). If you have a normal distribution with mean m and standard deviation s (where std is sqrt(var)) and you want to calculate:

from scipy.stats import norm

# cdf(x < val)
print(norm.cdf(val, m, s))

# cdf(x > val)
print(1 - norm.cdf(val, m, s))

# cdf(v1 < x < v2)
print(norm.cdf(v2, m, s) - norm.cdf(v1, m, s))

Read more about the CDF here, and the scipy implementation of the normal distribution, with many formulas, here.
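
For example (a sketch with made-up numbers: val = 100, a distribution with mean m = 90 and standard deviation s = 10):

from scipy.stats import norm

m, s, val = 90, 10, 100
print(norm.cdf(val, m, s))        # P(x < 100), about 0.841
print(1 - norm.cdf(val, m, s))    # P(x > 100), about 0.159
print(norm.cdf(val, m, s) - norm.cdf(80, m, s))  # P(80 < x < 100), about 0.683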


Answer 6

Taken from above:

>>> from scipy.stats import norm
>>> norm.cdf(1.96)
0.9750021048517795
>>> norm.cdf(-1.96)
0.024997895148220435

For a two-tailed test:

import numpy as np
from scipy.stats import norm

z = 1.96
p_value = 2 * norm.cdf(-np.abs(z))
print(p_value)
# 0.04999579029644087

Answer 7

As simple as this:

import math
def my_cdf(x):
    return 0.5*(1+math.erf(x/math.sqrt(2)))

I found the formula on this page: https://www.danielsoper.com/statcalc/formulas.aspx?id=55
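
Two quick checks (a minimal sketch): the CDF at zero is exactly one half, and the value at 1.96 matches the SciPy result quoted earlier:

>>> my_cdf(0)
0.5
>>> round(my_cdf(1.96), 12)
0.975002104852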


Answer 8

Since Google gives this answer for the search "netlogo pdf", here's the NetLogo version of the above Python code:


    ;; Normal distribution cumulative density function
    to-report normcdf [x mu sigma]
        let t x - mu
        let y 0.5 * erfcc (- t / (sigma * sqrt 2.0))
        if (y > 1.0) [ set y 1.0 ]
        report y
    end

    ;; Normal distribution probability density function
    to-report normpdf [x mu sigma]
        let u (x - mu) / abs sigma
        let y 1 / (sqrt (2 * pi) * abs sigma) * exp (- u * u / 2.0)
        report y
    end

    ;; Complementary error function
    to-report erfcc [x]
        let z abs x
        let t 1.0 / (1.0 + 0.5 * z)
        let r t * exp (- z * z - 1.26551223 + t * (1.00002368 + t * (0.37409196 +
            t * (0.09678418 + t * (-0.18628806 + t * (0.27886807 +
            t * (-1.13520398 + t * (1.48851587 + t * (-0.82215223 +
            t * 0.17087277)))))))))
        ifelse (x >= 0) [ report r ] [ report 2.0 - r ]
    end


Binning a column with Python pandas

Question: Binning a column with Python pandas

I have a DataFrame column with numeric values:

df['percentage'].head()
46.5
44.2
100.0
42.12

I want to see the column as bin counts:

bins = [0, 1, 5, 10, 25, 50, 100]

How can I get the result as bins with their value counts?

[0, 1] bin amount
[1, 5] etc 
[5, 10] etc 
......

Answer 0

You can use pandas.cut:

bins = [0, 1, 5, 10, 25, 50, 100]
df['binned'] = pd.cut(df['percentage'], bins)
print (df)
   percentage     binned
0       46.50   (25, 50]
1       44.20   (25, 50]
2      100.00  (50, 100]
3       42.12   (25, 50]

If you want integer labels instead of interval categories, pass labels:

bins = [0, 1, 5, 10, 25, 50, 100]
labels = [1,2,3,4,5,6]
df['binned'] = pd.cut(df['percentage'], bins=bins, labels=labels)
print (df)
   percentage binned
0       46.50      5
1       44.20      5
2      100.00      6
3       42.12      5

Or numpy.searchsorted:

bins = [0, 1, 5, 10, 25, 50, 100]
df['binned'] = np.searchsorted(bins, df['percentage'].values)
print (df)
   percentage  binned
0       46.50       5
1       44.20       5
2      100.00       6
3       42.12       5

…and then value_counts or groupby and aggregate size:

s = pd.cut(df['percentage'], bins=bins).value_counts()
print (s)
(25, 50]     3
(50, 100]    1
(10, 25]     0
(5, 10]      0
(1, 5]       0
(0, 1]       0
Name: percentage, dtype: int64

s = df.groupby(pd.cut(df['percentage'], bins=bins)).size()
print (s)
percentage
(0, 1]       0
(1, 5]       0
(5, 10]      0
(10, 25]     0
(25, 50]     3
(50, 100]    1
dtype: int64

By default, cut returns a Categorical.

Series methods like Series.value_counts() will use all categories, even if some categories are not present in the data; see operations on categorical data in the pandas docs.
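
If you want the counts listed in bin order rather than sorted by count (value_counts sorts by frequency by default), pass sort=False:

s = pd.cut(df['percentage'], bins=bins).value_counts(sort=False)
print (s)
(0, 1]       0
(1, 5]       0
(5, 10]      0
(10, 25]     0
(25, 50]     3
(50, 100]    1
Name: percentage, dtype: int64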


Answer 1

Using the numba module to speed things up.

On big datasets (> 500k rows), pd.cut can be quite slow for binning data.

I wrote my own function in numba with just-in-time compilation, which is roughly 16x faster:

import numpy as np
from numba import njit

@njit
def cut(arr):
    bins = np.empty(arr.shape[0])
    for idx, x in enumerate(arr):
        if (x >= 0) & (x < 1):
            bins[idx] = 1
        elif (x >= 1) & (x < 5):
            bins[idx] = 2
        elif (x >= 5) & (x < 10):
            bins[idx] = 3
        elif (x >= 10) & (x < 25):
            bins[idx] = 4
        elif (x >= 25) & (x < 50):
            bins[idx] = 5
        elif (x >= 50) & (x < 100):
            bins[idx] = 6
        else:
            bins[idx] = 7

    return bins
cut(df['percentage'].to_numpy())

# array([5., 5., 7., 5.])

Optional: you can also map it to bins as strings:

a = cut(df['percentage'].to_numpy())

conversion_dict = {1: 'bin1',
                   2: 'bin2',
                   3: 'bin3',
                   4: 'bin4',
                   5: 'bin5',
                   6: 'bin6',
                   7: 'bin7'}

bins = list(map(conversion_dict.get, a))

# ['bin5', 'bin5', 'bin7', 'bin5']

Speed comparison:

# create dataframe of 8 million rows for testing
dfbig = pd.concat([df]*2000000, ignore_index=True)

dfbig.shape

# (8000000, 1)
%%timeit
cut(dfbig['percentage'].to_numpy())

# 38 ms ± 616 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
bins = [0, 1, 5, 10, 25, 50, 100]
labels = [1,2,3,4,5,6]
pd.cut(dfbig['percentage'], bins=bins, labels=labels)

# 215 ms ± 9.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Getting the non-flat index from numpy's argmax

Question: Getting the non-flat index from numpy's argmax

I’m trying to get the indices of the maximum element in a Numpy array. This can be done using numpy.argmax. My problem is, that I would like to find the biggest element in the whole array and get the indices of that.

numpy.argmax can be either applied along one axis, which is not what I want, or on the flattened array, which is kind of what I want.

My problem is that using numpy.argmax with axis=None returns the flat index when I want the multi-dimensional index.

I could use divmod to get a non-flat index but this feels ugly. Is there any better way of doing this?


Answer 0

You could use numpy.unravel_index() on the result of numpy.argmax():

>>> a = numpy.random.random((10, 10))
>>> numpy.unravel_index(a.argmax(), a.shape)
(6, 7)
>>> a[6, 7] == a.max()
True

Answer 1

np.where(a==a.max())

returns coordinates of the maximum element(s), but has to parse the array twice.

>>> a = np.array(((3,4,5),(0,1,2)))
>>> np.where(a==a.max())
(array([0]), array([2]))

This, comparing to argmax, returns coordinates of all elements equal to the maximum. argmax returns just one of them (np.ones(5).argmax() returns 0).


Answer 2

To get the non-flat index of all occurrences of the maximum value, you can modify eumiro’s answer slightly by using argwhere instead of where:

np.argwhere(a==a.max())

>>> a = np.array([[1,2,4],[4,3,4]])
>>> np.argwhere(a==a.max())
array([[0, 2],
       [1, 0],
       [1, 2]])

How to pad a numpy array with zeros in Python

Question: How to pad a numpy array with zeros in Python

I want to know how I can pad a 2D numpy array with zeros using python 2.6.6 with numpy version 1.5.0. Sorry! But these are my limitations. Therefore I cannot use np.pad. For example, I want to pad a with zeros such that its shape matches b. The reason why I want to do this is so I can do:

b-a

such that

>>> a
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])
>>> b
array([[ 3.,  3.,  3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.,  3.,  3.]])
>>> c
array([[1, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 0]])

The only way I can think of doing this is appending, but this seems pretty ugly. Is there a cleaner solution, possibly using b.shape?

Edit: Thanks to MSeifert's answer. I had to clean it up a bit, and this is what I got:

def pad(array, reference_shape, offsets):
    """
    array: Array to be padded
    reference_shape: tuple of size of ndarray to create
    offsets: list of offsets (number of elements must be equal to the dimension of the array)
    will throw a ValueError if offsets is too big and the reference_shape cannot handle the offsets
    """

    # Create an array of zeros with the reference shape
    result = np.zeros(reference_shape)
    # Create a list of slices from offset to offset + shape in each dimension
    insertHere = [slice(offsets[dim], offsets[dim] + array.shape[dim]) for dim in range(array.ndim)]
    # Insert the array in the result at the specified offsets
    result[tuple(insertHere)] = array  # index with a tuple of slices (works on all NumPy versions)
    return result

Answer 0

Very simple, you create an array containing zeros using the reference shape:

result = np.zeros(b.shape)
# actually you can also use result = np.zeros_like(b) 
# but that also copies the dtype not only the shape

and then insert the array where you need it:

result[:a.shape[0],:a.shape[1]] = a

and voila you have padded it:

print(result)
array([[ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

You can also make it a bit more general if you define where your upper left element should be inserted

result = np.zeros_like(b)
x_offset = 1  # 0 would be what you wanted
y_offset = 1  # 0 in your case
result[x_offset:a.shape[0]+x_offset,y_offset:a.shape[1]+y_offset] = a
result

array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  1.,  1.,  1.,  1.,  1.]])

But be careful not to use offsets bigger than allowed; for x_offset = 2, for example, this will fail.


If you have an arbitrary number of dimensions you can define a list of slices to insert the original array. I’ve found it interesting to play around a bit and created a padding function that can pad (with offset) an arbitrarily shaped array as long as the array and reference have the same number of dimensions and the offsets are not too big.

def pad(array, reference, offsets):
    """
    array: Array to be padded
    reference: Reference array with the desired shape
    offsets: list of offsets (number of elements must be equal to the dimension of the array)
    """
    # Create an array of zeros with the reference shape
    result = np.zeros(reference.shape)
    # Create a list of slices from offset to offset + shape in each dimension
    insertHere = [slice(offsets[dim], offsets[dim] + array.shape[dim]) for dim in range(array.ndim)]
    # Insert the array in the result at the specified offsets
    result[tuple(insertHere)] = array
    return result

And some test cases:

import numpy as np

# 1 Dimension
a = np.ones(2)
b = np.ones(5)
offset = [3]
pad(a, b, offset)

# 3 Dimensions

a = np.ones((3,3,3))
b = np.ones((5,4,3))
offset = [1,0,0]
pad(a, b, offset)
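
For reference, the 1-D test case produces (a sketch of the expected output, using the fixed function above):

>>> pad(a, b, offset)   # a = np.ones(2), b = np.ones(5), offset = [3]
array([ 0.,  0.,  0.,  1.,  1.])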

Answer 1

NumPy 1.7.0 (when numpy.pad was added) is pretty old now (it was released in 2013), so even though the question asked for a way without using that function, I thought it could be useful to know how this could be achieved using numpy.pad.

It’s actually pretty simple:

>>> import numpy as np
>>> a = np.array([[ 1.,  1.,  1.,  1.,  1.],
...               [ 1.,  1.,  1.,  1.,  1.],
...               [ 1.,  1.,  1.,  1.,  1.]])
>>> np.pad(a, [(0, 1), (0, 1)], mode='constant')
array([[ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

In this case I used that 0 is the default value for mode='constant'. But it could also be specified by passing it in explicitly:

>>> np.pad(a, [(0, 1), (0, 1)], mode='constant', constant_values=0)
array([[ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

Just in case the second argument ([(0, 1), (0, 1)]) seems confusing: each list item (in this case a tuple) corresponds to a dimension, and the items therein represent the padding before (first element) and after (second element). So:

[(0, 1), (0, 1)]
         ^^^^^^------ padding for second dimension
 ^^^^^^-------------- padding for first dimension

  ^------------------ no padding at the beginning of the first axis
     ^--------------- pad with one "value" at the end of the first axis.

In this case the padding for the first and second axis are identical, so one could also just pass in the 2-tuple:

>>> np.pad(a, (0, 1), mode='constant')
array([[ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

In case the padding before and after is identical one could even omit the tuple (not applicable in this case though):

>>> np.pad(a, 1, mode='constant')
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.]])

Or if the padding before and after is identical but differs between axes, you could also omit the second element in the inner tuples:

>>> np.pad(a, [(1, ), (2, )], mode='constant')
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  1.,  1.,  1.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  1.,  1.,  1.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  1.,  1.,  1.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

However, I tend to prefer to always use the explicit form, because it's just too easy to make mistakes (when NumPy's expectations differ from your intentions):

>>> np.pad(a, [1, 2], mode='constant')
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.,  0.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.,  0.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

Here NumPy thinks you wanted to pad every axis with 1 element before and 2 elements after! Even if you intended to pad with 1 element in axis 1 and 2 elements in axis 2.

I used lists of tuples for the padding; note that this is just "my convention", and you could also use lists of lists or tuples of tuples, or even tuples of arrays. NumPy just checks the length of the argument (or whether it has a length at all) and the length of each item (or whether it has one)!
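
Applied to the original question, the per-axis form lets you compute the trailing padding directly from the two shapes (a sketch using the a and b arrays from the question):

>>> pad_width = [(0, b_dim - a_dim) for a_dim, b_dim in zip(a.shape, b.shape)]
>>> np.pad(a, pad_width, mode='constant')
array([[ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])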


Answer 2

I understand that your main problem is that you need to calculate d = b - a but your arrays have different sizes. There is no need for an intermediate padded c.

You can solve this without padding:

import numpy as np

a = np.array([[ 1.,  1.,  1.,  1.,  1.],
              [ 1.,  1.,  1.,  1.,  1.],
              [ 1.,  1.,  1.,  1.,  1.]])

b = np.array([[ 3.,  3.,  3.,  3.,  3.,  3.],
              [ 3.,  3.,  3.,  3.,  3.,  3.],
              [ 3.,  3.,  3.,  3.,  3.,  3.],
              [ 3.,  3.,  3.,  3.,  3.,  3.]])

d = b.copy()
d[:a.shape[0],:a.shape[1]] -=  a

print(d)

Output:

[[ 2.  2.  2.  2.  2.  3.]
 [ 2.  2.  2.  2.  2.  3.]
 [ 2.  2.  2.  2.  2.  3.]
 [ 3.  3.  3.  3.  3.  3.]]

Answer 3

In case you need to add a fence of 1s to an array:

>>> mat = np.zeros((4,4), np.int32)
>>> mat
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
>>> mat[0,:] = mat[:,0] = mat[:,-1] =  mat[-1,:] = 1
>>> mat
array([[1, 1, 1, 1],
       [1, 0, 0, 1],
       [1, 0, 0, 1],
       [1, 1, 1, 1]])
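
The same fence can be built with np.pad (assuming NumPy >= 1.7), by padding a smaller block of zeros with a constant value of 1:

>>> np.pad(np.zeros((2, 2), np.int32), 1, mode='constant', constant_values=1)
array([[1, 1, 1, 1],
       [1, 0, 0, 1],
       [1, 0, 0, 1],
       [1, 1, 1, 1]])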

Answer 4

I know I’m a bit late to this, but in case you wanted to perform relative padding (aka edge padding), here’s how you can implement it. Note that the very first instance of assignment results in zero-padding, so you can use this for both zero-padding and relative padding (this is where you copy the edge values of the original array into the padded array).

import numpy as np

def replicate_padding(arr):
    """Perform replicate padding on a numpy array."""
    new_pad_shape = tuple(np.array(arr.shape) + 2) # 2 indicates the width + height to change, a (512, 512) image --> (514, 514) padded image.
    padded_array = np.zeros(new_pad_shape) #create an array of zeros with new dimensions
    
    # perform replication
    padded_array[1:-1,1:-1] = arr        # result will be zero-pad
    padded_array[0,1:-1] = arr[0]        # perform edge pad for top row
    padded_array[-1, 1:-1] = arr[-1]     # edge pad for bottom row
    padded_array.T[0, 1:-1] = arr.T[0]   # edge pad for first column
    padded_array.T[-1, 1:-1] = arr.T[-1] # edge pad for last column
    
    #at this point, all values except for the 4 corners should have been replicated
    padded_array[0][0] = arr[0][0]     # top left corner
    padded_array[-1][0] = arr[-1][0]   # bottom left corner
    padded_array[0][-1] = arr[0][-1]   # top right corner 
    padded_array[-1][-1] = arr[-1][-1] # bottom right corner

    return padded_array

Complexity Analysis:

The optimal solution for this is numpy's pad method. After averaging over 5 runs, np.pad with relative padding is only 8% faster than the function defined above. This shows that the function above is a fairly optimal method for relative and zero padding.


import time

# My method, replicate_padding (input_image is assumed to be an existing 2-D array)
start = time.time()
padded = replicate_padding(input_image)
end = time.time()
delta0 = end - start

# np.pad with edge padding
start = time.time()
padded = np.pad(input_image, 1, mode='edge')
end = time.time()
delta = end - start


print(delta0) # My output: 0.0008790493011474609
print(delta)  # np output: 0.0008130073547363281
print(100*((delta0-delta)/delta)) # Percent difference: 8.12316715542522%

Windows SciPy install: no Lapack/Blas resources found

Question: Windows SciPy install: no Lapack/Blas resources found

I am trying to install python and a series of packages onto a 64bit windows 7 desktop. I have installed Python 3.4, have Microsoft Visual Studio C++ installed, and have successfully installed numpy, pandas and a few others. I am getting the following error when trying to install scipy;

numpy.distutils.system_info.NotFoundError: no lapack/blas resources found

I am using pip install offline, the install command I am using is;

pip install --no-index --find-links="S:\python\scipy 0.15.0" scipy

I have read the posts on here about requiring a compiler, which, if I understand correctly, is the VS C++ compiler. I am using the 2010 version, as I am using Python 3.4. This has worked for other packages.

Do I have to use the Windows binary, or is there a way I can get pip install to work?

Many thanks for the help


Answer 0

The solution to the absence of BLAS/LAPACK libraries for SciPy installations on Windows 7 64-bit is described here:

http://www.scipy.org/scipylib/building/windows.html

Installing Anaconda is much easier, but you still don’t get Intel MKL or GPU support without paying for it (they are in the MKL Optimizations and Accelerate add-ons for Anaconda – I’m not sure if they use PLASMA and MAGMA either). With MKL optimization, numpy has outperformed IDL on large matrix computations by 10-fold. MATLAB uses the Intel MKL library internally and supports GPU computing, so one might as well use that for the price if they’re a student ($50 for MATLAB + $10 for the Parallel Computing Toolbox). If you get the free trial of Intel Parallel Studio, it comes with the MKL library, as well as C++ and FORTRAN compilers that will come in handy if you want to install BLAS and LAPACK from MKL or ATLAS on Windows:

http://icl.cs.utk.edu/lapack-for-windows/lapack/

Parallel Studio also comes with the Intel MPI library, useful for cluster computing applications and their latest Xeon processors. While the process of building BLAS and LAPACK with MKL optimization is not trivial, the benefits of doing so for Python and R are quite large, as described in this Intel webinar:

https://software.intel.com/en-us/articles/powered-by-mkl-accelerating-numpy-and-scipy-performance-with-intel-mkl-python

Anaconda and Enthought have built businesses out of making this functionality and a few other things easier to deploy. However, it is freely available to those willing to do a little work (and a little learning).

For those who use R, you can now get MKL optimized BLAS and LAPACK for free with R Open from Revolution Analytics.

EDIT: Anaconda Python now ships with MKL optimization, as well as support for a number of other Intel library optimizations through the Intel Python distribution. However, GPU support for Anaconda in the Accelerate library (formerly known as NumbaPro) is still over $10k USD! The best alternatives for that are probably PyCUDA and scikit-cuda, as copperhead (essentially a free version of Anaconda Accelerate) unfortunately ceased development five years ago. It can be found here if anybody wants to pick up where they left off.


Answer 1

The following link should solve all problems with Windows and SciPy; just choose the appropriate download. I was able to pip install the package with no problems. Every other solution I have tried gave me big headaches.

Source: http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy

Command:

 pip install [Local File Location]\[Your specific file such as scipy-0.16.0-cp27-none-win_amd64.whl]

This assumes you have installed the following already:

  1. Install Visual Studio 2015/2013 with Python Tools
    (Is integrated into the setup options on install of 2015)

  2. Install Visual Studio C++ Compiler for Python
    Source: http://www.microsoft.com/en-us/download/details.aspx?id=44266
    File Name: VCForPython27.msi

  3. Install Python Version of choice
    Source: python.org
    File Name (e.g.): python-2.7.10.amd64.msi


Answer 2

My Python version is 2.7.10, on 64-bit Windows 7.

  1. Download scipy-0.18.0-cp27-cp27m-win_amd64.whl from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy
  2. Open cmd
  3. Make sure scipy-0.18.0-cp27-cp27m-win_amd64.whl is in cmd's current directory, then type pip install scipy-0.18.0-cp27-cp27m-win_amd64.whl.

It will be installed successfully.


Answer 3

Sorry to necro, but this is the first Google search result. This is the solution that worked for me:

  1. Download the numpy+mkl wheel from http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy. Use the version that is the same as your Python version (check using python -V). E.g., if your Python is 3.5.2, download the wheel which shows cp35.

  2. Open command prompt and navigate to the folder where you downloaded the wheel. Run the command: pip install [file name of wheel]

  3. Download the SciPy wheel from: http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy (similar to the step above).

  4. As above, pip install [file name of wheel]


Answer 4

This was the order in which I got everything working. The second point is the most important one: SciPy needs Numpy+MKL, not just vanilla Numpy.

  1. Install python 3.5
  2. pip install "file path" (download Numpy+MKL wheel from here http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy)
  3. pip install scipy

Answer 5

If you are working with Windows and Visual Studio 2015

Enter the following commands

  • conda install numpy
  • conda install pandas
  • conda install scipy

Answer 6

My 5 cents: you can just install the entire (pre-compiled) SciPy from https://github.com/scipy/scipy/releases

Good Luck!


Answer 7

Simple and Fast Installation of Scipy in Windows

  1. From http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy download the correct Scipy package for your Python version (e.g. the correct package for python 3.5 and Windows x64 is scipy-0.19.1-cp35-cp35m-win_amd64.whl).
  2. Open cmd inside the directory containing the downloaded Scipy package.
  3. Type pip install <<your-scipy-package-name>> (e.g. pip install scipy-0.19.1-cp35-cp35m-win_amd64.whl).

Answer 8

For python27:

  1. Install numpy + mkl (download link: http://www.lfd.uci.edu/~gohlke/pythonlibs/)
  2. Install scipy (from the same site)

OK!


Answer 9

Intel now provides a Python distribution for Linux / Windows / OS X for free, called “Intel Distribution for Python”.

It's a complete Python distribution (e.g. python.exe is included in the package) which includes some pre-installed modules compiled against Intel's MKL (Math Kernel Library) and thus optimized for faster performance.

The distribution includes the modules NumPy, SciPy, scikit-learn, pandas, matplotlib, Numba, tbb, pyDAAL, Jupyter, and others. The drawback is a bit of lateness in upgrading to more recent versions of Python. For example as of today (1 May 2017) the distribution provides CPython 3.5 while the 3.6 version is already out. But if you don’t need the new features they should be perfectly fine.


Answer 10

I was also getting the same error while installing scikit-fuzzy. I resolved the error as follows:

  1. Install NumPy from a whl file
  2. Install SciPy, again from a whl file

Choose the file according to your Python version, e.g. amd64 for Python 3 and the win32 file for Python 2.7.

  3. Then pip install --user skfuzzy

I hope it will work for you.


Answer 11

Solutions:

  1. As specified in many answers, download NumPy and SciPy whl from http://www.lfd.uci.edu/~gohlke/pythonlibs/ and install with

    pip install <whl_location>
    
  2. Building BLAS/LAPACK from source

  3. Using Miniconda.

Refer:

  1. ScikitLearn Installation
  2. Easiest way to install BLAS and LAPACK for scipy?

Answer 12

Using resources at http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy will solve the problem. However, you should be careful about version compatibility. After trying several times, I finally decided to uninstall Python, then installed a fresh version of Python along with numpy, and then installed scipy; this resolved my problem.


Answer 13

Install Intel's distribution of Python: https://software.intel.com/en-us/intel-distribution-for-python

A good Python distribution should contain these libraries from the start.


Answer 14

Do this; it solved it for me: pip install -U scikit-learn


How can I check whether a numpy array is empty?

Question: How can I check whether a numpy array is empty?

How can I check whether a numpy array is empty or not?

I used the following code, but this fails if the array contains a zero.

if not self.Definition.all():

Is this the solution?

if self.Definition == array( [] ):

Answer 0

You can always take a look at the .size attribute. It is defined as an integer, and is zero (0) when there are no elements in the array:

import numpy as np
a = np.array([])

if a.size == 0:
    # Do something when `a` is empty

Answer 1

http://www.scipy.org/Tentative_NumPy_Tutorial#head-6a1bc005bd80e1b19f812e1e64e0d25d50f99fe2

NumPy’s main object is the homogeneous multidimensional array. In Numpy dimensions are called axes. The number of axes is rank. Numpy’s array class is called ndarray. It is also known by the alias array. The more important attributes of an ndarray object are:

ndarray.ndim
the number of axes (dimensions) of the array. In the Python world, the number of dimensions is referred to as rank.

ndarray.shape
the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the rank, or number of dimensions, ndim.

ndarray.size
the total number of elements of the array. This is equal to the product of the elements of shape.


Answer 2

One caveat, though. Note that np.array(None).size returns 1! This is because a.size is equivalent to np.prod(a.shape), np.array(None).shape is (), and an empty product is 1.

>>> import numpy as np
>>> np.array(None).size
1
>>> np.array(None).shape
()
>>> np.prod(())
1.0

Therefore, I use the following to test if a numpy array has elements:

>>> def elements(array):
...     return array.ndim and array.size

>>> elements(np.array(None))
0
>>> elements(np.array([]))
0
>>> elements(np.zeros((2,3,4)))
24

Answer 3

Why would we want to check if an array is empty? Arrays don't grow or shrink in the same way that lists do. Starting with an 'empty' array and growing it with np.append is a frequent novice error.

Using a list in if alist: hinges on its boolean value:

In [102]: bool([])                                                                       
Out[102]: False
In [103]: bool([1])                                                                      
Out[103]: True

But trying to do the same with an array produces (in version 1.18):

In [104]: bool(np.array([]))                                                             
/usr/local/bin/ipython3:1: DeprecationWarning: The truth value 
   of an empty array is ambiguous. Returning False, but in 
   future this will result in an error. Use `array.size > 0` to 
   check that an array is not empty.
  #!/usr/bin/python3
Out[104]: False

In [105]: bool(np.array([1]))                                                            
Out[105]: True

and bool(np.array([1,2])) produces the infamous ambiguity error.


numpy.where(): detailed step-by-step explanation / examples

Question: numpy.where(): detailed step-by-step explanation / examples

I have trouble properly understanding numpy.where() despite reading the doc, this post and this other post.

Can someone provide step-by-step commented examples with 1D and 2D arrays?


Answer 0

After fiddling around for a while, I figured things out, and am posting them here hoping it will help others.

Intuitively, np.where is like asking “tell me where in this array entries satisfy a given condition”.

>>> a = np.arange(5,10)
>>> np.where(a < 8)       # tell me where in a, entries are < 8
(array([0, 1, 2]),)       # answer: entries indexed by 0, 1, 2

It can also be used to get the entries in the array that satisfy the condition:

>>> a[np.where(a < 8)] 
array([5, 6, 7])          # selects from a entries 0, 1, 2

When a is a 2d array, np.where() returns an array of row idx’s, and an array of col idx’s:

>>> a = np.arange(4,10).reshape(2,3)
array([[4, 5, 6],
       [7, 8, 9]])
>>> np.where(a > 8)
(array([1]), array([2]))

As in the 1d case, we can use np.where() to get entries in the 2d array that satisfy the condition:

>>> a[np.where(a > 8)]   # selects the entries where a > 8
array([9])


Note: when a is 1d, np.where() returns a tuple containing a single array of indices, since a 1d array has no second axis.
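
One more form worth knowing (a short addition; this is standard NumPy API): np.where(condition, x, y) selects elementwise from x where the condition holds and from y elsewhere:

>>> a = np.arange(4,10).reshape(2,3)
>>> np.where(a > 8, a, 0)   # keep entries > 8, zero out the rest
array([[0, 0, 0],
       [0, 0, 9]])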


Answer 1

Here is a little more fun. I’ve found that very often NumPy does exactly what I wish it would do – sometimes it’s faster for me to just try things than it is to read the docs. Actually a mixture of both is best.

I think your answer is fine (and it’s OK to accept it if you like). This is just “extra”.

import numpy as np

a = np.arange(4,10).reshape(2,3)

wh = np.where(a>7)
gt = a>7
x  = np.where(gt)

print("wh: ", wh)
print("gt: ", gt)
print("x:  ", x)

gives:

wh:  (array([1, 1]), array([1, 2]))
gt:  [[False False False]
      [False  True  True]]
x:   (array([1, 1]), array([1, 2]))

… but:

print "a[wh]: ", a[wh]
print "a[gt]  ", a[gt]
print "a[x]:  ", a[x]

gives:

a[wh]:  [8 9]
a[gt]   [8 9]
a[x]:   [8 9]

How do I build a numpy array from a generator?

Question: How do I build a numpy array from a generator?

How can I build a numpy array out of a generator object?

Let me illustrate the problem:

>>> import numpy
>>> def gimme():
...   for x in xrange(10):
...     yield x
...
>>> gimme()
<generator object at 0x28a1758>
>>> list(gimme())
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> numpy.array(xrange(10))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> numpy.array(gimme())
array(<generator object at 0x28a1758>, dtype=object)
>>> numpy.array(list(gimme()))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In this instance, gimme() is the generator whose output I’d like to turn into an array. However, the array constructor does not iterate over the generator, it simply stores the generator itself. The behaviour I desire is that from numpy.array(list(gimme())), but I don’t want to pay the memory overhead of having the intermediate list and the final array in memory at the same time. Is there a more space-efficient way?


回答 0

与python列表不同,numpy数组要求在创建时明确设置其长度。这是必需的,以便可以在内存中连续分配每个项目的空间。连续分配是numpy数组的关键特性:此方法与本机代码实现相结合,使对它们的操作比常规列表执行得快得多。

牢记这一点,从技术上讲,不可能将生成器对象转换为数组,除非您执行以下任一操作:

  1. 可以预测运行时将产生多少个元素:

    my_array = numpy.empty(predict_length())
    for i, el in enumerate(gimme()): my_array[i] = el
  2. 愿意将其元素存储在中间列表中:

    my_array = numpy.array(list(gimme()))
  3. 可以制作两个相同的生成器,遍历第一个生成器以找到总长度,初始化数组,然后再次遍历生成器以查找每个元素:

    length = sum(1 for el in gimme())
    my_array = numpy.empty(length)
    for i, el in enumerate(gimme()): my_array[i] = el

1可能是您要寻找的。2是空间效率低下的,而3是时间效率低下的(您必须两次通过生成器)。

Numpy arrays require their length to be set explicitly at creation time, unlike python lists. This is necessary so that space for each item can be consecutively allocated in memory. Consecutive allocation is the key feature of numpy arrays: this combined with native code implementation let operations on them execute much quicker than regular lists.

Keeping this in mind, it is technically impossible to take a generator object and turn it into an array unless you either:

  1. can predict how many elements it will yield when run:

    my_array = numpy.empty(predict_length())
    for i, el in enumerate(gimme()): my_array[i] = el
    
  2. are willing to store its elements in an intermediate list :

    my_array = numpy.array(list(gimme()))
    
  3. can make two identical generators, run through the first one to find the total length, initialize the array, and then run through the generator again to find each element:

    length = sum(1 for el in gimme())
    my_array = numpy.empty(length)
    for i, el in enumerate(gimme()): my_array[i] = el
    

1 is probably what you’re looking for. 2 is space inefficient, and 3 is time inefficient (you have to go through the generator twice).


回答 1

顺着这个 stackoverflow 结果再 Google 了一下,我发现有一个 numpy.fromiter(data, dtype, count)。默认的 count=-1 会从可迭代对象中取出所有元素。它要求明确设置 dtype。就我而言,这样就可行了:

numpy.fromiter(something.generate(from_this_input), float)

One google behind this stackoverflow result, I found that there is a numpy.fromiter(data, dtype, count). The default count=-1 takes all elements from the iterable. It requires a dtype to be set explicitly. In my case, this worked:

numpy.fromiter(something.generate(from_this_input), float)
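
If the number of elements is known up front, a further refinement (a sketch, assuming the count is exact) is to pass it as count so that fromiter can preallocate the array instead of growing it:

import numpy

# count=10 tells fromiter to preallocate space for exactly 10 elements
squares = numpy.fromiter((x * x for x in range(10)), dtype=float, count=10)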


回答 2

虽然可以用 numpy.fromiter() 从生成器创建一维数组,但可以用 numpy.stack 从生成器创建 N 维数组:

>>> mygen = (np.ones((5, 3)) for _ in range(10))
>>> x = numpy.stack(mygen)
>>> x.shape
(10, 5, 3)

它也适用于一维数组:

>>> numpy.stack(2*i for i in range(10))
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

请注意,numpy.stack 在内部会消耗生成器,并通过 arrays = [asanyarray(arr) for arr in arrays] 创建一个中间列表。具体实现可以在这里找到。

While you can create a 1D array from a generator with numpy.fromiter(), you can create an N-D array from a generator with numpy.stack:

>>> mygen = (np.ones((5, 3)) for _ in range(10))
>>> x = numpy.stack(mygen)
>>> x.shape
(10, 5, 3)

It also works for 1D arrays:

>>> numpy.stack(2*i for i in range(10))
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

Note that numpy.stack is internally consuming the generator and creating an intermediate list with arrays = [asanyarray(arr) for arr in arrays]. The implementation can be found here.

[WARNING] As pointed out by @Joseh Seedy, Numpy 1.16 raises a warning that defeats usage of this function with generators.
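
A minimal workaround sketch for those newer NumPy versions: materialize the generator into a list first (numpy.stack builds such an intermediate list internally anyway, so this costs little extra):

>>> import numpy as np
>>> mygen = (np.ones((5, 3)) for _ in range(10))
>>> x = np.stack(list(mygen))   # explicit list avoids the generator warning
>>> x.shape
(10, 5, 3)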


回答 3

这有点跑题,但如果你的生成器是一个列表推导式,则可以使用 numpy.where 更高效地获得结果(我是在看完这篇文章后,在自己的代码中发现这一点的)

Somewhat tangential, but if your generator is a list comprehension, you can use numpy.where to more effectively get your result (I discovered this in my own code after seeing this post)


回答 4

vstack、hstack 和 dstack 函数可以接受产生多维数组的生成器作为输入。

The vstack, hstack, and dstack functions can take as input generators that yield multi-dimensional arrays.
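
For illustration, a minimal sketch (newer NumPy versions require an explicit sequence, so a list comprehension is used here rather than a bare generator):

>>> import numpy as np
>>> np.vstack([np.arange(3) * i for i in range(4)])
array([[0, 0, 0],
       [0, 1, 2],
       [0, 2, 4],
       [0, 3, 6]])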


将多个列表放入数据框

问题:将多个列表放入数据框

如何获取多个列表并将它们作为不同的列放在python数据框中?我尝试了此解决方案,但遇到了一些麻烦。

尝试1:

  • 有三个列表,并将它们压缩在一起并使用 res = zip(lst1,lst2,lst3)
  • 仅产生一列

尝试2:

percentile_list = pd.DataFrame({'lst1Tite' : [lst1],
                                'lst2Tite' : [lst2],
                                'lst3Tite' : [lst3] }, 
                                columns=['lst1Tite','lst1Tite', 'lst1Tite'])
  • 产生一行3列(按上述方式),或者如果我转置则为3行1列

如何获得100行(每个独立列表的长度)乘3列(三个列表)的熊猫数据框?

How do I take multiple lists and put them as different columns in a python dataframe? I tried this solution but had some trouble.

Attempt 1:

  • Have three lists, and zip them together and use that res = zip(lst1,lst2,lst3)
  • Yields just one column

Attempt 2:

percentile_list = pd.DataFrame({'lst1Tite' : [lst1],
                                'lst2Tite' : [lst2],
                                'lst3Tite' : [lst3] }, 
                                columns=['lst1Tite','lst1Tite', 'lst1Tite'])
  • yields either one row by 3 columns (the way above) or if I transpose it is 3 rows and 1 column

How do I get a 100 row (length of each independent list) by 3 column (three lists) pandas dataframe?


回答 0

我认为您快成功了,请尝试删除 lst 周围多余的方括号(此外,从这样的字典创建数据框时,无需指定列名):

import pandas as pd
lst1 = range(100)
lst2 = range(100)
lst3 = range(100)
percentile_list = pd.DataFrame(
    {'lst1Title': lst1,
     'lst2Title': lst2,
     'lst3Title': lst3
    })

percentile_list
    lst1Title  lst2Title  lst3Title
0          0         0         0
1          1         1         1
2          2         2         2
3          3         3         3
4          4         4         4
5          5         5         5
6          6         6         6
...

如果您需要性能更高的解决方案,可以像第一次尝试那样用 np.column_stack 代替 zip,在此示例中速度大约提高了 2 倍,但我认为这会在一定程度上牺牲可读性:

import numpy as np
percentile_list = pd.DataFrame(np.column_stack([lst1, lst2, lst3]), 
                               columns=['lst1Title', 'lst2Title', 'lst3Title'])

I think you’re almost there, try removing the extra square brackets around the lst‘s (Also you don’t need to specify the column names when you’re creating a dataframe from a dict like this):

import pandas as pd
lst1 = range(100)
lst2 = range(100)
lst3 = range(100)
percentile_list = pd.DataFrame(
    {'lst1Title': lst1,
     'lst2Title': lst2,
     'lst3Title': lst3
    })

percentile_list
    lst1Title  lst2Title  lst3Title
0          0         0         0
1          1         1         1
2          2         2         2
3          3         3         3
4          4         4         4
5          5         5         5
6          6         6         6
...

If you need a more performant solution you can use np.column_stack rather than zip as in your first attempt, this has around a 2x speedup on the example here, however comes at bit of a cost of readability in my opinion:

import numpy as np
percentile_list = pd.DataFrame(np.column_stack([lst1, lst2, lst3]), 
                               columns=['lst1Title', 'lst2Title', 'lst3Title'])
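
A rough sketch for reproducing that speed comparison (timings are machine-dependent; the roughly 2x figure comes from the answer's own test):

import numpy as np
import pandas as pd
from timeit import timeit

lst1, lst2, lst3 = list(range(100)), list(range(100)), list(range(100))

# build the frame 1000 times each way and compare total runtimes
t_zip = timeit(lambda: pd.DataFrame(list(zip(lst1, lst2, lst3))), number=1000)
t_stack = timeit(lambda: pd.DataFrame(np.column_stack([lst1, lst2, lst3])), number=1000)
print(t_zip, t_stack)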

回答 1

在此处补充 Aditya Guru 的答案。无需使用 map,您可以简单地这样做:

pd.DataFrame(list(zip(lst1, lst2, lst3)))

这会将列名设置为 0、1、2。要设置自己的列名,可以将关键字参数 columns 传递给上述方法。

pd.DataFrame(list(zip(lst1, lst2, lst3)),
              columns=['lst1_title','lst2_title', 'lst3_title'])

Adding to Aditya Guru‘s answer here. There is no need of using map. You can do it simply by:

pd.DataFrame(list(zip(lst1, lst2, lst3)))

This will set the column’s names as 0,1,2. To set your own column names, you can pass the keyword argument columns to the method above.

pd.DataFrame(list(zip(lst1, lst2, lst3)),
              columns=['lst1_title','lst2_title', 'lst3_title'])

回答 2

只需补充一下,使用第一种方法也可以这样完成:

pd.DataFrame(list(map(list, zip(lst1,lst2,lst3))))

Just adding that using the first approach it can be done as –

pd.DataFrame(list(map(list, zip(lst1,lst2,lst3))))

回答 3

添加了另一种可扩展的解决方案。

lists = [lst1, lst2, lst3, lst4]
df = pd.concat([pd.Series(x) for x in lists], axis=1)

Adding one more scalable solution.

lists = [lst1, lst2, lst3, lst4]
df = pd.concat([pd.Series(x) for x in lists], axis=1)

回答 4

除了上述答案,我们还可以动态地创建:

df= pd.DataFrame()
list1 = list(range(10))
list2 = list(range(10,20))
df['list1'] = list1
df['list2'] = list2
print(df)

希望能帮助到你 !

Adding to above answers, we can create on the fly

df= pd.DataFrame()
list1 = list(range(10))
list2 = list(range(10,20))
df['list1'] = list1
df['list2'] = list2
print(df)

hope it helps !


回答 5

@oopsi已使用pd.concat(),但未包含列名称。您可以执行以下操作,与接受的答案中的第一个解决方案不同,该操作使您可以控制列顺序(避免使用无序的字典):

import pandas as pd
lst1 = range(100)
lst2 = range(100)
lst3 = range(100)

s1=pd.Series(lst1,name='lst1Title')
s2=pd.Series(lst2,name='lst2Title')
s3=pd.Series(lst3 ,name='lst3Title')
percentile_list = pd.concat([s1,s2,s3], axis=1)

percentile_list
Out[2]: 
    lst1Title  lst2Title  lst3Title
0           0          0          0
1           1          1          1
2           2          2          2
3           3          3          3
4           4          4          4
5           5          5          5
6           6          6          6
7           7          7          7
8           8          8          8
...

@oopsi used pd.concat() but didn’t include the column names. You could do the following, which, unlike the first solution in the accepted answer, gives you control over the column order (avoids dicts, which are unordered):

import pandas as pd
lst1 = range(100)
lst2 = range(100)
lst3 = range(100)

s1=pd.Series(lst1,name='lst1Title')
s2=pd.Series(lst2,name='lst2Title')
s3=pd.Series(lst3 ,name='lst3Title')
percentile_list = pd.concat([s1,s2,s3], axis=1)

percentile_list
Out[2]: 
    lst1Title  lst2Title  lst3Title
0           0          0          0
1           1          1          1
2           2          2          2
3           3          3          3
4           4          4          4
5           5          5          5
6           6          6          6
7           7          7          7
8           8          8          8
...

回答 6

有多种方法可以从多个列表创建数据框。

list1=[1,2,3,4]
list2=[5,6,7,8]
list3=[9,10,11,12]
  1. pd.DataFrame({'list1':list1, 'list2':list2, 'list3':list3})

  2. pd.DataFrame(data=zip(list1,list2,list3),columns=['list1','list2','list3'])

There are several ways to create a dataframe from multiple lists.

list1=[1,2,3,4]
list2=[5,6,7,8]
list3=[9,10,11,12]
  1. pd.DataFrame({'list1':list1, 'list2':list2, 'list3':list3})

  2. pd.DataFrame(data=zip(list1,list2,list3),columns=['list1','list2','list3'])


回答 7

您可以简单地使用以下代码

train_data['labels']= train_data[["LABEL1","LABEL2","LABEL3","LABEL4","LABEL5","LABEL6","LABEL7"]].values.tolist()
train_df = pd.DataFrame(train_data, columns=['text','labels'])

you can simply use the following code

train_data['labels']= train_data[["LABEL1","LABEL2","LABEL3","LABEL4","LABEL5","LABEL6","LABEL7"]].values.tolist()
train_df = pd.DataFrame(train_data, columns=['text','labels'])

如何让PyLint识别numpy成员?

问题:如何让PyLint识别numpy成员?

我在一个 Python 项目上运行 PyLint。PyLint 多次抱怨无法找到 numpy 的成员。如何在不完全跳过成员检查的前提下避免这种情况?

从代码:

import numpy as np

print np.zeros([1, 4])

运行时,我得到了预期的结果:

[[0. 0. 0. 0.]]

但是,pylint给了我这个错误:

E:3,6:模块’numpy’没有’zeros’成员(no-member)

对于版本,我使用的是pylint 1.0.0(星号1.0.1,常见的0.60.0),并尝试使用numpy 1.8.0。

I am running PyLint on a Python project. PyLint makes many complaints about being unable to find numpy members. How can I avoid this while avoiding skipping membership checks?

From the code:

import numpy as np

print np.zeros([1, 4])

Which, when run, gives the expected:

[[ 0. 0. 0. 0.]]

However, pylint gives me this error:

E: 3, 6: Module ‘numpy’ has no ‘zeros’ member (no-member)

For versions, I am using pylint 1.0.0 (astroid 1.0.1, common 0.60.0) and trying to work with numpy 1.8.0 .


回答 0

如果将 Visual Studio Code 与 Don Jayamanne 出色的 Python 扩展一起使用,请添加一条用户设置,将 numpy 列入白名单:

{
    // whitelist numpy to remove lint errors
    "python.linting.pylintArgs": [
        "--extension-pkg-whitelist=numpy"
    ]
}

If using Visual Studio Code with Don Jayamanne’s excellent Python extension, add a user setting to whitelist numpy:

{
    // whitelist numpy to remove lint errors
    "python.linting.pylintArgs": [
        "--extension-pkg-whitelist=numpy"
    ]
}

回答 1

我这里也遇到了同样的问题,即使所有相关软件包都是最新版本(astroid 1.3.2、logilab_common 0.63.2、pylint 1.4.0)。

以下解决方案非常有效:我通过修改 pylintrc 文件,在 [TYPECHECK] 一节中将 numpy 添加到被忽略的模块列表中:

[TYPECHECK]

ignored-modules = numpy

根据报错情况,您可能还需要添加以下行(仍在 [TYPECHECK] 一节中):

ignored-classes = numpy

I had the same issue here, even with the latest versions of all related packages (astroid 1.3.2, logilab_common 0.63.2, pylint 1.4.0).

The following solution worked like a charm: I added numpy to the list of ignored modules by modifying my pylintrc file, in the [TYPECHECK] section:

[TYPECHECK]

ignored-modules = numpy

Depending on the error, you might also need to add the following line (still in the [TYPECHECK] section):

ignored-classes = numpy

回答 2

对于正在处理的一个小numpy项目,我遇到了相同的错误,因此决定忽略numpy模块就可以了。我创建了一个.pylintrc文件:

$ pylint --generate-rcfile > ~/.pylintrc

根据paduwan和j_houg的建议,我修改了以下部分:

[MASTER]

# A comma-separated list of package or module names from where C extensions may
# be loaded. Extensions are loading into the active Python interpreter and may
# run arbitrary code
extension-pkg-whitelist=numpy

[TYPECHECK]

# List of module names for which member attributes should not be checked
# (useful for modules/projects where namespaces are manipulated during runtime
# and thus existing member attributes cannot be deduced by static analysis. It
# supports qualified module names, as well as Unix pattern matching.
ignored-modules=numpy

# List of classes names for which member attributes should not be checked
# (useful for classes with attributes dynamically set). This supports can work
# with qualified names.
ignored-classes=numpy

它“解决”了我的问题。

I was getting the same error for a small numpy project I was working on and decided that ignoring the numpy modules would do just fine. I created a .pylintrc file with:

$ pylint --generate-rcfile > ~/.pylintrc

and following paduwan’s and j_houg’s advice I modified the following sectors:

[MASTER]

# A comma-separated list of package or module names from where C extensions may
# be loaded. Extensions are loading into the active Python interpreter and may
# run arbitrary code
extension-pkg-whitelist=numpy

and

[TYPECHECK]

# List of module names for which member attributes should not be checked
# (useful for modules/projects where namespaces are manipulated during runtime
# and thus existing member attributes cannot be deduced by static analysis. It
# supports qualified module names, as well as Unix pattern matching.
ignored-modules=numpy

# List of classes names for which member attributes should not be checked
# (useful for classes with attributes dynamically set). This supports can work
# with qualified names.
ignored-classes=numpy

and it “fixed” my issue.


回答 3

在最新版本的pylint中,您可以添加--extension-pkg-whitelist=numpy到pylint命令中。他们以不安全的方式解决了早期版本中的此问题。现在,如果希望他们更加仔细地查看标准库之外的软件包,则必须将其明确列入白名单。看这里。

In recent versions of pylint you can add --extension-pkg-whitelist=numpy to your pylint command. They had fixed this problem in an earlier version in an unsafe way. Now if you want them to look more carefully at a package outside of the standard library, you must explicitly whitelist it. See here.
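
For example, a minimal invocation (the module name here is just a placeholder):

$ pylint --extension-pkg-whitelist=numpy your_module.py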


回答 4

由于这是google中的最高结果,它给我的印象是您必须忽略所有文件中的警告:

这个问题实际上已经在上个月的pylint / astroid来源https://bitbucket.org/logilab/astroid/commits/83d78af4866be5818f193360c78185e1008fd29e的来源中得到解决, 但尚未在Ubuntu软件包中。

要获取来源,只需

hg clone https://bitbucket.org/logilab/pylint/
hg clone https://bitbucket.org/logilab/astroid
mkdir logilab && touch logilab/__init__.py
hg clone http://hg.logilab.org/logilab/common logilab/common
cd pylint && python setup.py install

因此,最后一步很可能需要 sudo,当然您还需要安装 mercurial 才能克隆。

Since this is the top result in google and it gave me the impression that you have to ignore those warnings in all files:

The problem has actually been fixed in the sources of pylint/astroid last month https://bitbucket.org/logilab/astroid/commits/83d78af4866be5818f193360c78185e1008fd29e but are not yet in the Ubuntu packages.

To get the sources, just

hg clone https://bitbucket.org/logilab/pylint/
hg clone https://bitbucket.org/logilab/astroid
mkdir logilab && touch logilab/__init__.py
hg clone http://hg.logilab.org/logilab/common logilab/common
cd pylint && python setup.py install

whereby the last step will most likely require a sudo and of course you need mercurial to clone.


回答 5

为了忽略numpy.core属性产生的所有错误,我们现在可以使用:

$ pylint a.py --generated-members=numpy.*

作为另一个解决方案,将此选项添加到 ~/.pylintrc 或 /etc/pylintrc 文件中:

[TYPECHECK]

# List of members which are set dynamically and missed by pylint inference
# system, and so shouldn't trigger E1101 when accessed. Python regular
# expressions are accepted.
generated-members=numpy.*

对于问题中提到的代码,现在这似乎已经多余了,但对其他模块(例如 netifaces 等)仍然有用。

For ignoring all the errors generated by numpy.core‘s attributes, we can now use:

$ pylint a.py --generated-members=numpy.*

As another solution, add this option to ~/.pylintrc or /etc/pylintrc file:

[TYPECHECK]

# List of members which are set dynamically and missed by pylint inference
# system, and so shouldn't trigger E1101 when accessed. Python regular
# expressions are accepted.
generated-members=numpy.*

For the code mentioned in the question this now seems redundant, but it still matters for other modules, e.g. netifaces.


回答 6

如果您不想添加更多配置,请将这段代码添加到您的配置文件中,以代替“whitelist”方案。

{
    "python.linting.pylintArgs": ["--generated-members=numpy.*"]
}

If you don’t want to add more config, please add this code to your config file, instead of ‘whitelist’.

{
    "python.linting.pylintArgs": ["--generated-members=numpy.*"]
}

回答 7

在过去的几年中,有许多关于此的错误报告,即https://bitbucket.org/logilab/pylint/issue/58/false-positive-no-member-on-numpy-imports

我建议只在出现报错的那些行上禁用该检查。

# pylint: disable=E1103
print np.zeros([1, 4])
# pylint: enable=E1103

There have been many different bugs reported about this over the past few years i.e. https://bitbucket.org/logilab/pylint/issue/58/false-positive-no-member-on-numpy-imports

I’d suggest disabling for the lines where the complaints occur.

# pylint: disable=E1103
print np.zeros([1, 4])
# pylint: enable=E1103

回答 8

很可能,它被 numpy 晦涩的方法导入方式搞糊涂了。也就是说,zeros 实际上是 numpy.core.multiarray.zeros,在 numpy 中通过以下语句导入

from .core import *

而它又通过以下语句导入

from .numeric import *

而在 numeric 模块中您会发现

zeros = multiarray.zeros

换作是我,处在 PyLint 的位置上我想我也会困惑!

有关 PyLint 方面的讨论,请参见此 bug。

Probably, it's confused by numpy's abstruse way of importing methods. Namely, zeros is in fact numpy.core.multiarray.zeros, imported in numpy with the statement

from .core import *

in turn imported with

from .numeric import *

and in numeric you’ll find

zeros = multiarray.zeros

I guess I would be confused in place of PyLint!

See this bug for PyLint side of view.
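
A small sketch of how to observe this re-export chain from the REPL (assuming a classic NumPy 1.x layout; the internals have moved around in later releases):

>>> import numpy as np
>>> np.zeros                    # a C builtin, opaque to static analysis
<built-in function zeros>
>>> np.zeros is np.core.multiarray.zeros
True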


回答 9

我必须将其添加到我经常使用numpy的任何文件的顶部。

# To ignore numpy errors:
#     pylint: disable=E1101

以防万一有人在 Eclipse 中使用 Pydev 和 pylint 时遇到麻烦…

I had to add this at the top of any file where I use numpy a lot.

# To ignore numpy errors:
#     pylint: disable=E1101

Just in case someone in eclipse is having trouble with Pydev and pylint…


回答 10

作为对 j_hougs 答案的扩展,您现在可以将有问题的模块添加到 .pylintrc 中的这一行,该行在生成的文件中默认为空:

extension-pkg-whitelist=numpy

您可以通过执行以下操作来生成示例.pylintrc:

pylint --generate-rcfile > .pylintrc

然后编辑提到的行

In Extension to j_hougs answer, you can now add the modules in question to this line in .pylintrc, which is already prepared empty on generation:

extension-pkg-whitelist=numpy

you can generate a sample .pylintrc by doing:

pylint --generate-rcfile > .pylintrc

and then edit the mentioned line
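
After editing, the relevant fragment of the .pylintrc would look like this minimal sketch (mirroring the [MASTER] section shown in an earlier answer):

[MASTER]
extension-pkg-whitelist=numpy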


回答 11

终于在Pylint 1.8.2中解决了这个问题。开箱即用,无需pylintrc调整!

This has finally been resolved in Pylint 1.8.2. Works out of the box, no pylintrc tweaks needed!


回答 12

这是我针对此问题提出的伪解决方案。

#pylint: disable=no-name-in-module
from numpy import array as np_array, transpose as np_transpose, \
      linspace as np_linspace, zeros as np_zeros
from numpy.random import uniform as random_uniform
#pylint: enable=no-name-in-module

然后,在你的代码中,不再以 np.array、np.zeros 等方式调用 numpy 函数,而是写成 np_array、np_zeros 等。与其他答案中提出的方法相比,这种做法的优点是:

  • pylint禁用/启用仅限于代码的一小部分
  • 这意味着您不必用pylint指令将每一个调用numpy函数的行都包围起来。
  • 您没有为整个文件执行pylint禁用错误操作,这可能会掩盖代码的其他问题。

明显的缺点是,您必须显式导入您用到的每个 numpy 函数。该方法还可以进一步扩展:您可以定义自己的模块,例如 numpy_importer,如下所示

""" module: numpy_importer.py
       explicitely import numpy functions while avoiding pylint errors  
"""
#pylint: disable=unused-import
#pylint: disable=no-name-in-module
from numpy import array, transpose, zeros  #add all things you need  
from numpy.random import uniform as random_uniform
#pylint: enable=no-name-in-module

然后,您的应用程序代码可以只导入这个模块(而不是 numpy):

import numpy_importer as np 

并像往常一样使用名称:np.zerosnp.array等等。

这样做的好处是,您将拥有一个一劳永逸地完成所有 numpy 相关导入的模块,之后只需一行就能在任何地方导入它。仍然要注意,numpy_importer 不要导入 numpy 中不存在的名称,因为那些错误不会被 pylint 捕获。

This is the pseudo-solution I have come up with for this problem.

#pylint: disable=no-name-in-module
from numpy import array as np_array, transpose as np_transpose, \
      linspace as np_linspace, zeros as np_zeros
from numpy.random import uniform as random_uniform
#pylint: enable=no-name-in-module

Then, in your code, instead of calling numpy functions as np.array and np.zeros and so on, you would write np_array, np_zeros, etc. Advantages of this approach vs. other approaches suggested in other answers:

  • The pylint disable/enable is restricted to a small region of your code
  • That means that you don’t have to surround every single line that has an invocation of a numpy function with a pylint directive.
  • You are not doing pylint disable of the error for your whole file, which might mask other issues with your code.

The clear disadvantage is that you have to explicitely import every numpy function you use. The approach could be elaborated on further. You could define your own module, call it say, numpy_importer as follows

""" module: numpy_importer.py
       explicitely import numpy functions while avoiding pylint errors  
"""
#pylint: disable=unused-import
#pylint: disable=no-name-in-module
from numpy import array, transpose, zeros  #add all things you need  
from numpy.random import uniform as random_uniform
#pylint: enable=no-name-in-module

Then, your application code could import this module only (instead of numpy) as

import numpy_importer as np 

and use the names as usual: np.zeros, np.array etc.

The advantage of this is that you will have a single module in which all numpy related imports are done once and for all, and then you import it with that single line, wherever you want. Still you have to be careful that numpy_importer does not import names that don't exist in numpy as those errors won't be caught by pylint.


回答 13

我遇到了numpy,scipy,sklearn,nipy等问题,并通过包裹epylint来解决了这个问题,如下所示:

$ cat epylint.py

#!/usr/bin/python

"""
Synopsis: epylint wrapper that filters a bunch of false-positive warnings and errors
Author: DOHMATOB Elvis Dopgima <gmdopp@gmail.com> <elvis.dohmatob@inria.fr>

"""

import os
import sys
import re
from subprocess import Popen, STDOUT, PIPE

NUMPY_HAS_NO_MEMBER = re.compile("Module 'numpy(?:\..+)?' has no '.+' member")
SCIPY_HAS_NO_MEMBER = re.compile("Module 'scipy(?:\..+)?' has no '.+' member")
SCIPY_HAS_NO_MEMBER2 = re.compile("No name '.+' in module 'scipy(?:\..+)?'")
NIPY_HAS_NO_MEMBER = re.compile("Module 'nipy(?:\..+)?' has no '.+' member")
SK_ATTR_DEFINED_OUTSIDE_INIT = re.compile("Attribute '.+_' defined outside __init__")
REL_IMPORT_SHOULD_BE = re.compile("Relative import '.+', should be '.+")
REDEFINING_NAME_FROM_OUTER_SCOPE = re.compile("Redefining name '.+' from outer scope")

if __name__ == "__main__":
    basename = os.path.basename(sys.argv[1])
    for line in Popen(['epylint', sys.argv[1], '--disable=C,R,I'  # filter these warnings
                       ], stdout=PIPE, stderr=STDOUT, universal_newlines=True).stdout:
        if line.startswith("***********"):
            continue
        elif line.startswith("No config file found,"):
            continue
        elif "anomalous-backslash-in-string," in line:
            continue
        if NUMPY_HAS_NO_MEMBER.search(line):
            continue
        if SCIPY_HAS_NO_MEMBER.search(line):
            continue
        if SCIPY_HAS_NO_MEMBER2.search(line):
            continue
        if "Used * or ** magic" in line:
            continue
        if "No module named" in line and "_flymake" in line:
            continue
        if SK_ATTR_DEFINED_OUTSIDE_INIT.search(line):
            continue
        if "Access to a protected member" in line:
            continue
        if REL_IMPORT_SHOULD_BE.search(line):
            continue
        if REDEFINING_NAME_FROM_OUTER_SCOPE.search(line):
            continue
        if NIPY_HAS_NO_MEMBER.search(line):
            continue
        # XXX extend by adding more handles for false-positives here
        else:
            print line,

该脚本只是运行 epylint,然后解析其输出,过滤掉误报的警告和错误。您可以通过添加更多 elif 分支来扩展它。

注意:如果这适用于您,则需要修改 pychechers.sh,使其看起来像这样

#!/bin/bash

epylint.py "$1" 2>/dev/null
pyflakes "$1"
pep8 --ignore=E221,E701,E202 --repeat "$1"
true

(当然,您必须首先使epylint.py可执行)

这是我的.emacs https://github.com/dohmatob/mydotemacs的链接。希望这对某人有用。

I had this problem with numpy, scipy, sklearn, nipy, etc., and I solved it by wrapping epylint like so:

$ cat epylint.py

#!/usr/bin/python

"""
Synopsis: epylint wrapper that filters a bunch of false-positive warnings and errors
Author: DOHMATOB Elvis Dopgima <gmdopp@gmail.com> <elvis.dohmatob@inria.fr>

"""

import os
import sys
import re
from subprocess import Popen, STDOUT, PIPE

NUMPY_HAS_NO_MEMBER = re.compile("Module 'numpy(?:\..+)?' has no '.+' member")
SCIPY_HAS_NO_MEMBER = re.compile("Module 'scipy(?:\..+)?' has no '.+' member")
SCIPY_HAS_NO_MEMBER2 = re.compile("No name '.+' in module 'scipy(?:\..+)?'")
NIPY_HAS_NO_MEMBER = re.compile("Module 'nipy(?:\..+)?' has no '.+' member")
SK_ATTR_DEFINED_OUTSIDE_INIT = re.compile("Attribute '.+_' defined outside __init__")
REL_IMPORT_SHOULD_BE = re.compile("Relative import '.+', should be '.+")
REDEFINING_NAME_FROM_OUTER_SCOPE = re.compile("Redefining name '.+' from outer scope")

if __name__ == "__main__":
    basename = os.path.basename(sys.argv[1])
    for line in Popen(['epylint', sys.argv[1], '--disable=C,R,I'  # filter these warnings
                       ], stdout=PIPE, stderr=STDOUT, universal_newlines=True).stdout:
        if line.startswith("***********"):
            continue
        elif line.startswith("No config file found,"):
            continue
        elif "anomalous-backslash-in-string," in line:
            continue
        if NUMPY_HAS_NO_MEMBER.search(line):
            continue
        if SCIPY_HAS_NO_MEMBER.search(line):
            continue
        if SCIPY_HAS_NO_MEMBER2.search(line):
            continue
        if "Used * or ** magic" in line:
            continue
        if "No module named" in line and "_flymake" in line:
            continue
        if SK_ATTR_DEFINED_OUTSIDE_INIT.search(line):
            continue
        if "Access to a protected member" in line:
            continue
        if REL_IMPORT_SHOULD_BE.search(line):
            continue
        if REDEFINING_NAME_FROM_OUTER_SCOPE.search(line):
            continue
        if NIPY_HAS_NO_MEMBER.search(line):
            continue
        # XXX extend by adding more handles for false-positives here
        else:
            print line,

This script simply runs epylint, then scrapes its output to filter out false-positive warnings and errors. You can extend it by adding more elif cases.

N.B.: If this applies to you, then you'll want to modify your pychechers.sh so it looks like this

#!/bin/bash

epylint.py "$1" 2>/dev/null
pyflakes "$1"
pep8 --ignore=E221,E701,E202 --repeat "$1"
true

(Of course, you have to make epylint.py executable first)

Here is a link to my .emacs https://github.com/dohmatob/mydotemacs. Hope this is useful to someone.


回答 14

这似乎至少在Pylint 1.1.0上有效:

[TYPECHECK]

ignored-classes=numpy

This seems to work on at least Pylint 1.1.0:

[TYPECHECK]

ignored-classes=numpy

回答 15

这个解决方案对我有用

基本步骤:点击左下角的齿轮图标 => Setting => Workspace Setting => Extension => Python Configuration => 单击任意 Settings.json => 将这一条添加到文件中:"python.linting.pylintArgs": ["--extension-pkg-whitelist=numpy"]。我使用的是 VS Code 1.27.2。

This solution worked for me

Basically, select the gear icon from the bottom left => Setting => Workspace Setting => Extension => Python Configuration => click on any Settings.json => add this to the file: "python.linting.pylintArgs": ["--extension-pkg-whitelist=numpy"]. I am using VS Code 1.27.2.


回答 16

我在另一个不同的模块(kivy.properties)上遇到了同样的问题,它是一个包装好的C模块,例如numpy

使用 VSCode V1.38.0 时,被采纳的解决方案会停止该项目的所有 lint 检查。因此,尽管它确实消除了误报的 no-name-in-module,但并没有真正改善局面。

对我来说,最好的解决方法是对有问题的模块使用 --ignored-modules 参数。麻烦的是,通过 python.linting.pylintArgs 传递任何参数都会抹掉 VSCode 的默认设置,所以你还需要重新设置这些默认项。最终我的 settings.json 文件如下:

{
    "python.pythonPath": "C:\\Python\\Python37\\python.exe",
    "python.linting.pylintEnabled": true,
    "python.linting.enabled": true,
    "python.linting.pylintArgs": [
        "--ignored-modules=kivy.properties",
        "--disable=all",
        "--enable=F,E,unreachable,duplicate-key,unnecessary-semicolon,global-variable-not-assigned,unused-variable,binary-op-exception,bad-format-string,anomalous-backslash-in-string,bad-open-mode"
    ]
}

I had the same problem with a different module (kivy.properties) which is a wrapped C module like numpy.

Using VSCode V1.38.0, the accepted solution stopped all linting for the project. So, while it did indeed remove the false-positive no-name-in-module, it didn’t really improve the situation.

The best workaround for me was to use the --ignored-modules argument on the offending module. Trouble is, passing any argument via python.linting.pylintArgs wipes out the default VSCode settings, so you need to re-set those also. That left me with the following settings.json file:

{
    "python.pythonPath": "C:\\Python\\Python37\\python.exe",
    "python.linting.pylintEnabled": true,
    "python.linting.enabled": true,
    "python.linting.pylintArgs": [
        "--ignored-modules=kivy.properties",
        "--disable=all",
        "--enable=F,E,unreachable,duplicate-key,unnecessary-semicolon,global-variable-not-assigned,unused-variable,binary-op-exception,bad-format-string,anomalous-backslash-in-string,bad-open-mode"
    ]
}

回答 17

从前面的答案中复制了一些文字,以总结出有效的方法(至少对我来说:debian-jessie)

  1. 在某些旧版本中,pylint存在一个问题,使其无法与numpy(及其他类似软件包)一起使用。

  2. 现在已经解决了该问题,但是出于安全原因,默认情况下禁用了外部C包(C代码的python接口-例如numpy-)。

  3. 您可以在 ~/.pylintrc 文件中创建一个白名单,以允许 pylint 使用这些软件包。

要运行的基本命令(仅当您的主目录中还没有 .pylintrc 文件时):

$ pylint --generate-rcfile > .pylintrc

然后打开该文件,在 extension-pkg-whitelist= 之后添加所需的软件包(多个软件包用逗号分隔)。在命令行中使用 --extension-pkg-whitelist=numpy 选项可以获得相同的行为。

如果您在 [TYPECHECK] 一节中忽略了某些软件包,这意味着 pylint 将永远不会显示与这些软件包相关的错误。实际上,pylint 不会告诉您有关这些软件包的任何信息。

A little bit of copy paste from the previous answer to summarize what is working (at least for me: debian-jessie)

  1. In some older version of pylint there was a problem preventing it working with numpy (and other similar packages).

  2. Now that problem has been solved but external C packages (python interfaces to C code -like numpy-) are disabled by default for security reasons.

  3. You can create a white list, to allow pylint to use them in the file ~/.pylintrc.

Basic command to run (ONLY if you do not already have a .pylintrc file in your home directory):

$ pylint --generate-rcfile > .pylintrc

Then open the file and add the packages you want after extension-pkg-whitelist=, separated by commas. You can get the same behavior using the option --extension-pkg-whitelist=numpy from the command line.

If you ignore some packages in the [TYPECHECK] section, that means that pylint will never show errors related to those packages. In practice, pylint will not tell you anything about those packages.


回答 18

我一直在为 pylint 制作补丁,以解决 numpy 等库中动态成员的问题。它添加了一个“dynamic-modules”选项,该选项通过实际导入模块来强制在运行时检查成员是否存在。请参阅 logilab/pylint 中的 Issue #413。还有一个 pull request,请参阅其中一条评论里的链接。

I’ve been working on a patch to pylint to solve the issue with dynamic members in libraries such as numpy. It adds a “dynamic-modules” option which forces to check if members exist during runtime by making a real import of the module. See Issue #413 in logilab/pylint. There is also a pull request, see link in one of the comments.


回答 19

快速答案:将Pylint更新为1.7.1(如果使用conda管理软件包,请使用conda-forge提供的Pylint 1.7.1)

我在此处的pylint GitHub中发现了类似的问题,有人回复说更新到1.7.1后一切正常。

A quick answer: update Pylint to 1.7.1 (use conda-forge provided Pylint 1.7.1 if you use conda to manage packages)

I found a similar issue in pylint GitHub here and someone replied everything getting OK after updating to 1.7.1.


回答 20

我不确定这是否是解决方案,但是在VSCode中,我在用户设置中明确编写了启用pylint的代码后,所有模块都被识别。

{
    "python.linting.pep8Enabled": true,
    "python.linting.pylintEnabled": true
}

I’m not sure if this is a solution, but in VSCode once I wrote explicitly in my user settings to enable pylint, all modules were recognized.

{
    "python.linting.pep8Enabled": true,
    "python.linting.pylintEnabled": true
}

回答 21

最近(由于spyder或pylint或?有所更改),我从astropy.constants符号的spyder静态代码分析中得到了E1101错误(“无成员”)。不知道为什么。

对于 Linux 或 Unix 系统(Mac 可能与此类似)上的所有用户,我的简化解决方案是创建一个 /etc/pylintrc,如下所示:

[TYPECHECK]
ignored-modules=astropy.constants

当然,也可以将其放在个人的 $HOME/.pylintrc 文件中。或者,我本可以直接更新现有文件。

Lately (since something changed in spyder or pylint or ?), I have been getting E1101 errors (“no member”) from spyder’s static code analysis on astropy.constants symbols. No idea why.

My simplistic solution for all users on a Linux or Unix system (Mac is probably similar) is to create an /etc/pylintrc as follows:

[TYPECHECK]
ignored-modules=astropy.constants

Of course, this could, instead, be put in a personal $HOME/.pylintrc file. And, I could have updated an existing file.