Tag Archives: numpy

How do I calculate the cumulative normal distribution?

Question: How do I calculate the cumulative normal distribution?

I am looking for a function in Numpy or Scipy (or any rigorous Python library) that will give me the cumulative normal distribution function in Python.


Answer 0

Here’s an example:

>>> from scipy.stats import norm
>>> norm.cdf(1.96)
0.9750021048517795
>>> norm.cdf(-1.96)
0.024997895148220435

In other words, approximately 95% of the standard normal distribution lies within about two standard deviations (±1.96) of its mean of zero.

If you need the inverse CDF:

>>> norm.ppf(norm.cdf(1.96))
array(1.9599999999999991)
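
As a quick check of that 95% figure (a minimal sketch using the same scipy.stats.norm import), the probability mass between -1.96 and 1.96 is:

>>> round(norm.cdf(1.96) - norm.cdf(-1.96), 4)
0.95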

Answer 1

It may be too late to answer the question, but since Google still leads people here, I decided to write my solution here.

As of Python 2.7, the math library includes the error function math.erf(x).

The erf() function can be used to compute traditional statistical functions such as the cumulative standard normal distribution:

from math import erf, sqrt

def phi(x):
    """Cumulative distribution function for the standard normal distribution."""
    return (1.0 + erf(x / sqrt(2.0))) / 2.0

Ref:

https://docs.python.org/2/library/math.html

https://docs.python.org/3/library/math.html

How are the Error Function and Standard Normal distribution function related?
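
A quick sanity check of phi (a sketch, assuming SciPy is available for comparison): it agrees with scipy.stats.norm.cdf to high precision, and is exactly one half at zero:

>>> phi(0)
0.5
>>> from scipy.stats import norm
>>> round(phi(1.96), 12) == round(norm.cdf(1.96), 12)
True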


Answer 2

Adapted from here: http://mail.python.org/pipermail/python-list/2000-June/039873.html

from math import *
def erfcc(x):
    """Complementary error function."""
    z = abs(x)
    t = 1. / (1. + 0.5*z)
    r = t * exp(-z*z-1.26551223+t*(1.00002368+t*(.37409196+
        t*(.09678418+t*(-.18628806+t*(.27886807+
        t*(-1.13520398+t*(1.48851587+t*(-.82215223+
        t*.17087277)))))))))
    if (x >= 0.):
        return r
    else:
        return 2. - r

def ncdf(x):
    return 1. - 0.5*erfcc(x/(2**0.5))
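
As a rough accuracy check (a sketch; this rational approximation of the complementary error function is documented in Numerical Recipes as accurate to roughly 1e-7):

>>> abs(ncdf(1.96) - 0.9750021048517795) < 1e-7
True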

Answer 3

To build upon Unknown’s example, the Python equivalent of the function normdist() implemented in a lot of libraries would be:

# Requires erfcc() from the previous answer and sqrt/pi/exp from math.
def normcdf(x, mu, sigma):
    t = x - mu
    y = 0.5 * erfcc(-t / (sigma * sqrt(2.0)))
    if y > 1.0:
        y = 1.0
    return y

def normpdf(x, mu, sigma):
    u = (x - mu) / abs(sigma)
    y = (1 / (sqrt(2 * pi) * abs(sigma))) * exp(-u * u / 2)
    return y

def normdist(x, mu, sigma, f):
    # f selects the CDF (True) or the PDF (False)
    if f:
        y = normcdf(x, mu, sigma)
    else:
        y = normpdf(x, mu, sigma)
    return y
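
For example (a sketch, assuming erfcc() from the previous answer and the math names are in scope):

>>> round(normdist(1.96, 0, 1, True), 6)   # CDF at 1.96
0.975002
>>> normdist(0, 0, 1, False)               # PDF at the mean, i.e. 1/sqrt(2*pi)
0.3989422804014327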

Answer 4

Starting with Python 3.8, the standard library provides the NormalDist object as part of the statistics module.

It can be used to get the cumulative distribution function (cdf – probability that a random sample X will be less than or equal to x) for a given mean (mu) and standard deviation (sigma):

from statistics import NormalDist

NormalDist(mu=0, sigma=1).cdf(1.96)
# 0.9750021048517796

This can be simplified for the standard normal distribution (mu = 0 and sigma = 1):

NormalDist().cdf(1.96)
# 0.9750021048517796

NormalDist().cdf(-1.96)
# 0.024997895148220428
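
NormalDist also provides the inverse CDF (the quantile function) as inv_cdf (a short addition; inv_cdf is part of the statistics.NormalDist API in Python 3.8+), so the 1.96 value round-trips:

round(NormalDist().inv_cdf(0.975), 2)
# 1.96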

Answer 5

Alex’s answer shows a solution for the standard normal distribution (mean = 0, standard deviation = 1). If you have a normal distribution with mean m and standard deviation s (where std is sqrt(var)) and you want to calculate:

from scipy.stats import norm

# cdf(x < val)
print(norm.cdf(val, m, s))

# cdf(x > val)
print(1 - norm.cdf(val, m, s))

# cdf(v1 < x < v2)
print(norm.cdf(v2, m, s) - norm.cdf(v1, m, s))

Read more about the CDF here, and the scipy implementation of the normal distribution, with many formulas, here.
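
For example (a sketch with made-up numbers: val = 100, a distribution with mean m = 90 and standard deviation s = 10):

from scipy.stats import norm

m, s, val = 90, 10, 100
print(norm.cdf(val, m, s))        # P(x < 100), about 0.841
print(1 - norm.cdf(val, m, s))    # P(x > 100), about 0.159
print(norm.cdf(val, m, s) - norm.cdf(80, m, s))  # P(80 < x < 100), about 0.683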


Answer 6

Taken from above:

>>> from scipy.stats import norm
>>> norm.cdf(1.96)
0.9750021048517795
>>> norm.cdf(-1.96)
0.024997895148220435

For a two-tailed test:

import numpy as np
from scipy.stats import norm

z = 1.96
p_value = 2 * norm.cdf(-np.abs(z))
print(p_value)
# 0.04999579029644087

Answer 7

As simple as this:

import math
def my_cdf(x):
    return 0.5*(1+math.erf(x/math.sqrt(2)))

I found the formula on this page: https://www.danielsoper.com/statcalc/formulas.aspx?id=55
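
Two quick checks (a minimal sketch): the CDF at zero is exactly one half, and the value at 1.96 matches the SciPy result quoted earlier:

>>> my_cdf(0)
0.5
>>> round(my_cdf(1.96), 12)
0.975002104852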


Answer 8

Since Google gives this answer for the search "netlogo pdf", here's the NetLogo version of the above Python code:


    ;; Normal distribution cumulative density function
    to-report normcdf [x mu sigma]
        let t x - mu
        let y 0.5 * erfcc (- t / (sigma * sqrt 2.0))
        if (y > 1.0) [ set y 1.0 ]
        report y
    end

    ;; Normal distribution probability density function
    to-report normpdf [x mu sigma]
        let u (x - mu) / abs sigma
        let y 1 / (sqrt (2 * pi) * abs sigma) * exp (- u * u / 2.0)
        report y
    end

    ;; Complementary error function
    to-report erfcc [x]
        let z abs x
        let t 1.0 / (1.0 + 0.5 * z)
        let r t * exp (- z * z - 1.26551223 + t * (1.00002368 + t * (0.37409196 +
            t * (0.09678418 + t * (-0.18628806 + t * (0.27886807 +
            t * (-1.13520398 + t * (1.48851587 + t * (-0.82215223 +
            t * 0.17087277)))))))))
        ifelse (x >= 0) [ report r ] [ report 2.0 - r ]
    end


Binning a column with Python pandas

Question: Binning a column with Python pandas

I have a DataFrame column with numeric values:

df['percentage'].head()
46.5
44.2
100.0
42.12

I want to see the column as bin counts:

bins = [0, 1, 5, 10, 25, 50, 100]

How can I get the result as bins with their value counts?

[0, 1] bin amount
[1, 5] etc 
[5, 10] etc 
......

Answer 0

You can use pandas.cut:

bins = [0, 1, 5, 10, 25, 50, 100]
df['binned'] = pd.cut(df['percentage'], bins)
print (df)
   percentage     binned
0       46.50   (25, 50]
1       44.20   (25, 50]
2      100.00  (50, 100]
3       42.12   (25, 50]

If you want integer labels instead of interval categories, pass labels:

bins = [0, 1, 5, 10, 25, 50, 100]
labels = [1,2,3,4,5,6]
df['binned'] = pd.cut(df['percentage'], bins=bins, labels=labels)
print (df)
   percentage binned
0       46.50      5
1       44.20      5
2      100.00      6
3       42.12      5

Or numpy.searchsorted:

bins = [0, 1, 5, 10, 25, 50, 100]
df['binned'] = np.searchsorted(bins, df['percentage'].values)
print (df)
   percentage  binned
0       46.50       5
1       44.20       5
2      100.00       6
3       42.12       5

…and then value_counts or groupby and aggregate size:

s = pd.cut(df['percentage'], bins=bins).value_counts()
print (s)
(25, 50]     3
(50, 100]    1
(10, 25]     0
(5, 10]      0
(1, 5]       0
(0, 1]       0
Name: percentage, dtype: int64

s = df.groupby(pd.cut(df['percentage'], bins=bins)).size()
print (s)
percentage
(0, 1]       0
(1, 5]       0
(5, 10]      0
(10, 25]     0
(25, 50]     3
(50, 100]    1
dtype: int64

By default, cut returns a Categorical.

Series methods like Series.value_counts() will use all categories, even if some categories are not present in the data; see operations on categorical data in the pandas docs.
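
If you want the counts listed in bin order rather than sorted by count (value_counts sorts by frequency by default), pass sort=False:

s = pd.cut(df['percentage'], bins=bins).value_counts(sort=False)
print (s)
(0, 1]       0
(1, 5]       0
(5, 10]      0
(10, 25]     0
(25, 50]     3
(50, 100]    1
Name: percentage, dtype: int64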


Answer 1

Using the numba module to speed things up.

On big datasets (> 500k rows), pd.cut can be quite slow for binning data.

I wrote my own function in numba with just-in-time compilation, which is roughly 16x faster:

import numpy as np
from numba import njit

@njit
def cut(arr):
    bins = np.empty(arr.shape[0])
    for idx, x in enumerate(arr):
        if (x >= 0) & (x < 1):
            bins[idx] = 1
        elif (x >= 1) & (x < 5):
            bins[idx] = 2
        elif (x >= 5) & (x < 10):
            bins[idx] = 3
        elif (x >= 10) & (x < 25):
            bins[idx] = 4
        elif (x >= 25) & (x < 50):
            bins[idx] = 5
        elif (x >= 50) & (x < 100):
            bins[idx] = 6
        else:
            bins[idx] = 7

    return bins
cut(df['percentage'].to_numpy())

# array([5., 5., 7., 5.])

Optional: you can also map it to bins as strings:

a = cut(df['percentage'].to_numpy())

conversion_dict = {1: 'bin1',
                   2: 'bin2',
                   3: 'bin3',
                   4: 'bin4',
                   5: 'bin5',
                   6: 'bin6',
                   7: 'bin7'}

bins = list(map(conversion_dict.get, a))

# ['bin5', 'bin5', 'bin7', 'bin5']

Speed comparison:

# create dataframe of 8 million rows for testing
dfbig = pd.concat([df]*2000000, ignore_index=True)

dfbig.shape

# (8000000, 1)
%%timeit
cut(dfbig['percentage'].to_numpy())

# 38 ms ± 616 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
%%timeit
bins = [0, 1, 5, 10, 25, 50, 100]
labels = [1,2,3,4,5,6]
pd.cut(dfbig['percentage'], bins=bins, labels=labels)

# 215 ms ± 9.76 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

Getting the non-flat index from numpy's argmax

Question: Getting the non-flat index from numpy's argmax

I’m trying to get the indices of the maximum element in a Numpy array. This can be done using numpy.argmax. My problem is, that I would like to find the biggest element in the whole array and get the indices of that.

numpy.argmax can be either applied along one axis, which is not what I want, or on the flattened array, which is kind of what I want.

My problem is that using numpy.argmax with axis=None returns the flat index when I want the multi-dimensional index.

I could use divmod to get a non-flat index but this feels ugly. Is there any better way of doing this?


Answer 0

You could use numpy.unravel_index() on the result of numpy.argmax():

>>> a = numpy.random.random((10, 10))
>>> numpy.unravel_index(a.argmax(), a.shape)
(6, 7)
>>> a[6, 7] == a.max()
True

Answer 1

np.where(a==a.max())

returns coordinates of the maximum element(s), but has to parse the array twice.

>>> a = np.array(((3,4,5),(0,1,2)))
>>> np.where(a==a.max())
(array([0]), array([2]))

This, comparing to argmax, returns coordinates of all elements equal to the maximum. argmax returns just one of them (np.ones(5).argmax() returns 0).


Answer 2

To get the non-flat index of all occurrences of the maximum value, you can modify eumiro’s answer slightly by using argwhere instead of where:

np.argwhere(a==a.max())

>>> a = np.array([[1,2,4],[4,3,4]])
>>> np.argwhere(a==a.max())
array([[0, 2],
       [1, 0],
       [1, 2]])

How to pad a numpy array with zeros in Python

Question: How to pad a numpy array with zeros in Python

I want to know how I can pad a 2D numpy array with zeros using python 2.6.6 with numpy version 1.5.0. Sorry! But these are my limitations. Therefore I cannot use np.pad. For example, I want to pad a with zeros such that its shape matches b. The reason why I want to do this is so I can do:

b-a

such that

>>> a
array([[ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.],
       [ 1.,  1.,  1.,  1.,  1.]])
>>> b
array([[ 3.,  3.,  3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.,  3.,  3.],
       [ 3.,  3.,  3.,  3.,  3.,  3.]])
>>> c
array([[1, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 0],
       [1, 1, 1, 1, 1, 0],
       [0, 0, 0, 0, 0, 0]])

The only way I can think of doing this is appending, but this seems pretty ugly. Is there a cleaner solution, possibly using b.shape?

Edit: Thanks to MSeifert's answer. I had to clean it up a bit, and this is what I got:

def pad(array, reference_shape, offsets):
    """
    array: Array to be padded
    reference_shape: tuple of size of ndarray to create
    offsets: list of offsets (number of elements must be equal to the dimension of the array)
    will throw a ValueError if offsets is too big and the reference_shape cannot handle the offsets
    """

    # Create an array of zeros with the reference shape
    result = np.zeros(reference_shape)
    # Create a list of slices from offset to offset + shape in each dimension
    insertHere = [slice(offsets[dim], offsets[dim] + array.shape[dim]) for dim in range(array.ndim)]
    # Insert the array in the result at the specified offsets
    result[tuple(insertHere)] = array  # index with a tuple of slices (works on all NumPy versions)
    return result

Answer 0

Very simple, you create an array containing zeros using the reference shape:

result = np.zeros(b.shape)
# actually you can also use result = np.zeros_like(b) 
# but that also copies the dtype not only the shape

and then insert the array where you need it:

result[:a.shape[0],:a.shape[1]] = a

and voila you have padded it:

print(result)
array([[ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

You can also make it a bit more general if you define where your upper left element should be inserted

result = np.zeros_like(b)
x_offset = 1  # 0 would be what you wanted
y_offset = 1  # 0 in your case
result[x_offset:a.shape[0]+x_offset,y_offset:a.shape[1]+y_offset] = a
result

array([[ 0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  1.,  1.,  1.,  1.,  1.],
       [ 0.,  1.,  1.,  1.,  1.,  1.]])

But be careful not to use offsets bigger than allowed; for x_offset = 2, for example, this will fail.


If you have an arbitrary number of dimensions you can define a list of slices to insert the original array. I’ve found it interesting to play around a bit and created a padding function that can pad (with offset) an arbitrarily shaped array as long as the array and reference have the same number of dimensions and the offsets are not too big.

def pad(array, reference, offsets):
    """
    array: Array to be padded
    reference: Reference array with the desired shape
    offsets: list of offsets (number of elements must be equal to the dimension of the array)
    """
    # Create an array of zeros with the reference shape
    result = np.zeros(reference.shape)
    # Create a list of slices from offset to offset + shape in each dimension
    insertHere = [slice(offsets[dim], offsets[dim] + array.shape[dim]) for dim in range(array.ndim)]
    # Insert the array in the result at the specified offsets
    result[tuple(insertHere)] = array
    return result

And some test cases:

import numpy as np

# 1 Dimension
a = np.ones(2)
b = np.ones(5)
offset = [3]
pad(a, b, offset)

# 3 Dimensions

a = np.ones((3,3,3))
b = np.ones((5,4,3))
offset = [1,0,0]
pad(a, b, offset)
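
For reference, the 1-D test case produces (a sketch of the expected output, using the fixed function above):

>>> pad(a, b, offset)   # a = np.ones(2), b = np.ones(5), offset = [3]
array([ 0.,  0.,  0.,  1.,  1.])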

Answer 1

NumPy 1.7.0 (when numpy.pad was added) is pretty old now (it was released in 2013), so even though the question asked for a way without using that function, I thought it could be useful to know how this could be achieved using numpy.pad.

It’s actually pretty simple:

>>> import numpy as np
>>> a = np.array([[ 1.,  1.,  1.,  1.,  1.],
...               [ 1.,  1.,  1.,  1.,  1.],
...               [ 1.,  1.,  1.,  1.,  1.]])
>>> np.pad(a, [(0, 1), (0, 1)], mode='constant')
array([[ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

In this case I used that 0 is the default value for mode='constant'. But it could also be specified by passing it in explicitly:

>>> np.pad(a, [(0, 1), (0, 1)], mode='constant', constant_values=0)
array([[ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

Just in case the second argument ([(0, 1), (0, 1)]) seems confusing: each list item (in this case a tuple) corresponds to a dimension, and the items therein represent the padding before (first element) and after (second element). So:

[(0, 1), (0, 1)]
         ^^^^^^------ padding for second dimension
 ^^^^^^-------------- padding for first dimension

  ^------------------ no padding at the beginning of the first axis
     ^--------------- pad with one "value" at the end of the first axis.

In this case the padding for the first and second axis are identical, so one could also just pass in the 2-tuple:

>>> np.pad(a, (0, 1), mode='constant')
array([[ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])

In case the padding before and after is identical one could even omit the tuple (not applicable in this case though):

>>> np.pad(a, 1, mode='constant')
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.]])

Or if the padding before and after is identical but differs between axes, you could also omit the second element in the inner tuples:

>>> np.pad(a, [(1, ), (2, )], mode='constant')
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  1.,  1.,  1.,  1.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  1.,  1.,  1.,  1.,  0.,  0.],
       [ 0.,  0.,  1.,  1.,  1.,  1.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

However, I tend to prefer to always use the explicit form, because it's just too easy to make mistakes (when NumPy's expectations differ from your intentions):

>>> np.pad(a, [1, 2], mode='constant')
array([[ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.,  0.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.,  0.,  0.],
       [ 0.,  1.,  1.,  1.,  1.,  1.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.]])

Here NumPy thinks you wanted to pad every axis with 1 element before and 2 elements after! Even if you intended to pad with 1 element in axis 1 and 2 elements in axis 2.

I used lists of tuples for the padding; note that this is just "my convention", and you could also use lists of lists or tuples of tuples, or even tuples of arrays. NumPy just checks the length of the argument (or whether it has a length at all) and the length of each item (or whether it has one)!
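
Applied to the original question, the per-axis form lets you compute the trailing padding directly from the two shapes (a sketch using the a and b arrays from the question):

>>> pad_width = [(0, b_dim - a_dim) for a_dim, b_dim in zip(a.shape, b.shape)]
>>> np.pad(a, pad_width, mode='constant')
array([[ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 1.,  1.,  1.,  1.,  1.,  0.],
       [ 0.,  0.,  0.,  0.,  0.,  0.]])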


Answer 2

I understand that your main problem is that you need to calculate d = b - a but your arrays have different sizes. There is no need for an intermediate padded c.

You can solve this without padding:

import numpy as np

a = np.array([[ 1.,  1.,  1.,  1.,  1.],
              [ 1.,  1.,  1.,  1.,  1.],
              [ 1.,  1.,  1.,  1.,  1.]])

b = np.array([[ 3.,  3.,  3.,  3.,  3.,  3.],
              [ 3.,  3.,  3.,  3.,  3.,  3.],
              [ 3.,  3.,  3.,  3.,  3.,  3.],
              [ 3.,  3.,  3.,  3.,  3.,  3.]])

d = b.copy()
d[:a.shape[0],:a.shape[1]] -=  a

print(d)

Output:

[[ 2.  2.  2.  2.  2.  3.]
 [ 2.  2.  2.  2.  2.  3.]
 [ 2.  2.  2.  2.  2.  3.]
 [ 3.  3.  3.  3.  3.  3.]]

Answer 3

In case you need to add a fence of 1s to an array:

>>> mat = np.zeros((4,4), np.int32)
>>> mat
array([[0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0],
       [0, 0, 0, 0]])
>>> mat[0,:] = mat[:,0] = mat[:,-1] =  mat[-1,:] = 1
>>> mat
array([[1, 1, 1, 1],
       [1, 0, 0, 1],
       [1, 0, 0, 1],
       [1, 1, 1, 1]])
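
The same fence can be built with np.pad (assuming NumPy >= 1.7), by padding a smaller block of zeros with a constant value of 1:

>>> np.pad(np.zeros((2, 2), np.int32), 1, mode='constant', constant_values=1)
array([[1, 1, 1, 1],
       [1, 0, 0, 1],
       [1, 0, 0, 1],
       [1, 1, 1, 1]])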

Answer 4

I know I’m a bit late to this, but in case you wanted to perform relative padding (aka edge padding), here’s how you can implement it. Note that the very first instance of assignment results in zero-padding, so you can use this for both zero-padding and relative padding (this is where you copy the edge values of the original array into the padded array).

import numpy as np

def replicate_padding(arr):
    """Perform replicate padding on a numpy array."""
    new_pad_shape = tuple(np.array(arr.shape) + 2) # 2 indicates the width + height to change, a (512, 512) image --> (514, 514) padded image.
    padded_array = np.zeros(new_pad_shape) #create an array of zeros with new dimensions
    
    # perform replication
    padded_array[1:-1,1:-1] = arr        # result will be zero-pad
    padded_array[0,1:-1] = arr[0]        # perform edge pad for top row
    padded_array[-1, 1:-1] = arr[-1]     # edge pad for bottom row
    padded_array.T[0, 1:-1] = arr.T[0]   # edge pad for first column
    padded_array.T[-1, 1:-1] = arr.T[-1] # edge pad for last column
    
    #at this point, all values except for the 4 corners should have been replicated
    padded_array[0][0] = arr[0][0]     # top left corner
    padded_array[-1][0] = arr[-1][0]   # bottom left corner
    padded_array[0][-1] = arr[0][-1]   # top right corner 
    padded_array[-1][-1] = arr[-1][-1] # bottom right corner

    return padded_array

Complexity Analysis:

The optimal solution for this is numpy's pad method. After averaging over 5 runs, np.pad with relative padding is only 8% faster than the function defined above. This shows that the function above is a fairly optimal method for relative and zero padding.


import time

# My method, replicate_padding (input_image is assumed to be an existing 2-D array)
start = time.time()
padded = replicate_padding(input_image)
end = time.time()
delta0 = end - start

# np.pad with edge padding
start = time.time()
padded = np.pad(input_image, 1, mode='edge')
end = time.time()
delta = end - start


print(delta0) # My output: 0.0008790493011474609
print(delta)  # np output: 0.0008130073547363281
print(100*((delta0-delta)/delta)) # Percent difference: 8.12316715542522%

Windows SciPy install: no Lapack/Blas resources found

Question: Windows SciPy install: no Lapack/Blas resources found

I am trying to install python and a series of packages onto a 64bit windows 7 desktop. I have installed Python 3.4, have Microsoft Visual Studio C++ installed, and have successfully installed numpy, pandas and a few others. I am getting the following error when trying to install scipy;

numpy.distutils.system_info.NotFoundError: no lapack/blas resources found

I am using pip install offline, the install command I am using is;

pip install --no-index --find-links="S:\python\scipy 0.15.0" scipy

I have read the posts on here about requiring a compiler, which, if I understand correctly, is the VS C++ compiler. I am using the 2010 version, as I am using Python 3.4. This has worked for other packages.

Do I have to use the Windows binary, or is there a way I can get pip install to work?

Many thanks for the help


Answer 0

The solution to the absence of BLAS/LAPACK libraries for SciPy installations on Windows 7 64-bit is described here:

http://www.scipy.org/scipylib/building/windows.html

Installing Anaconda is much easier, but you still don’t get Intel MKL or GPU support without paying for it (they are in the MKL Optimizations and Accelerate add-ons for Anaconda – I’m not sure if they use PLASMA and MAGMA either). With MKL optimization, numpy has outperformed IDL on large matrix computations by 10-fold. MATLAB uses the Intel MKL library internally and supports GPU computing, so one might as well use that for the price if they’re a student ($50 for MATLAB + $10 for the Parallel Computing Toolbox). If you get the free trial of Intel Parallel Studio, it comes with the MKL library, as well as C++ and FORTRAN compilers that will come in handy if you want to install BLAS and LAPACK from MKL or ATLAS on Windows:

http://icl.cs.utk.edu/lapack-for-windows/lapack/

Parallel Studio also comes with the Intel MPI library, useful for cluster computing applications and their latest Xeon processors. While the process of building BLAS and LAPACK with MKL optimization is not trivial, the benefits of doing so for Python and R are quite large, as described in this Intel webinar:

https://software.intel.com/en-us/articles/powered-by-mkl-accelerating-numpy-and-scipy-performance-with-intel-mkl-python

Anaconda and Enthought have built businesses out of making this functionality and a few other things easier to deploy. However, it is freely available to those willing to do a little work (and a little learning).

For those who use R, you can now get MKL optimized BLAS and LAPACK for free with R Open from Revolution Analytics.

EDIT: Anaconda Python now ships with MKL optimization, as well as support for a number of other Intel library optimizations through the Intel Python distribution. However, GPU support for Anaconda in the Accelerate library (formerly known as NumbaPro) is still over $10k USD! The best alternatives for that are probably PyCUDA and scikit-cuda, as copperhead (essentially a free version of Anaconda Accelerate) unfortunately ceased development five years ago. It can be found here if anybody wants to pick up where they left off.


Answer 1

The following link should solve all problems with Windows and SciPy; just choose the appropriate download. I was able to pip install the package with no problems. Every other solution I have tried gave me big headaches.

Source: http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy

Command:

 pip install [Local File Location]\[Your specific file such as scipy-0.16.0-cp27-none-win_amd64.whl]

This assumes you have installed the following already:

  1. Install Visual Studio 2015/2013 with Python Tools
    (Is integrated into the setup options on install of 2015)

  2. Install Visual Studio C++ Compiler for Python
    Source: http://www.microsoft.com/en-us/download/details.aspx?id=44266
    File Name: VCForPython27.msi

  3. Install Python Version of choice
    Source: python.org
    File Name (e.g.): python-2.7.10.amd64.msi


Answer 2

My Python version is 2.7.10, on 64-bit Windows 7.

  1. Download scipy-0.18.0-cp27-cp27m-win_amd64.whl from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy
  2. Open cmd
  3. Make sure scipy-0.18.0-cp27-cp27m-win_amd64.whl is in cmd's current directory, then type pip install scipy-0.18.0-cp27-cp27m-win_amd64.whl.

It will be installed successfully.


Answer 3

Sorry to necro, but this is the first Google search result. This is the solution that worked for me:

  1. Download the numpy+mkl wheel from http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy. Use the version that is the same as your Python version (check using python -V). E.g., if your Python is 3.5.2, download the wheel which shows cp35.

  2. Open command prompt and navigate to the folder where you downloaded the wheel. Run the command: pip install [file name of wheel]

  3. Download the SciPy wheel from: http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy (similar to the step above).

  4. As above, pip install [file name of wheel]


Answer 4

This was the order in which I got everything working. The second point is the most important one: SciPy needs Numpy+MKL, not just vanilla Numpy.

  1. Install python 3.5
  2. pip install "file path" (download Numpy+MKL wheel from here http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy)
  3. pip install scipy

Answer 5

If you are working with Windows and Visual Studio 2015

Enter the following commands

  • conda install numpy
  • conda install pandas
  • conda install scipy

Answer 6

My 5 cents: you can just install the entire (pre-compiled) SciPy from https://github.com/scipy/scipy/releases

Good Luck!


Answer 7

Simple and Fast Installation of Scipy in Windows

  1. From http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy download the correct Scipy package for your Python version (e.g. the correct package for python 3.5 and Windows x64 is scipy-0.19.1-cp35-cp35m-win_amd64.whl).
  2. Open cmd inside the directory containing the downloaded Scipy package.
  3. Type pip install <<your-scipy-package-name>> (e.g. pip install scipy-0.19.1-cp35-cp35m-win_amd64.whl).

Answer 8

For python27:

  1. Install numpy + mkl (download link: http://www.lfd.uci.edu/~gohlke/pythonlibs/)
  2. Install scipy (from the same site)

OK!


Answer 9

Intel now provides a Python distribution for Linux / Windows / OS X for free, called “Intel Distribution for Python”.

It's a complete Python distribution (e.g. python.exe is included in the package) which includes some pre-installed modules compiled against Intel's MKL (Math Kernel Library) and thus optimized for faster performance.

The distribution includes the modules NumPy, SciPy, scikit-learn, pandas, matplotlib, Numba, tbb, pyDAAL, Jupyter, and others. The drawback is a bit of lateness in upgrading to more recent versions of Python. For example as of today (1 May 2017) the distribution provides CPython 3.5 while the 3.6 version is already out. But if you don’t need the new features they should be perfectly fine.


Answer 10

I was also getting the same error while installing scikit-fuzzy. I resolved the error as follows:

  1. Install NumPy from a whl file
  2. Install SciPy, again from a whl file

Choose the file according to your Python version, e.g. amd64 for Python 3 and the win32 file for Python 2.7.

  3. Then pip install --user skfuzzy

I hope it will work for you.


Answer 11

Solutions:

  1. As specified in many answers, download NumPy and SciPy whl from http://www.lfd.uci.edu/~gohlke/pythonlibs/ and install with

    pip install <whl_location>
    
  2. Building BLAS/LAPACK from source

  3. Using Miniconda.

Refer:

  1. ScikitLearn Installation
  2. Easiest way to install BLAS and LAPACK for scipy?

Answer 12

Using resources at http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy will solve the problem. However, you should be careful about version compatibility. After trying several times, I finally decided to uninstall Python, then installed a fresh version of Python along with numpy, and then installed scipy; this resolved my problem.


Answer 13

Install Intel's distribution of Python: https://software.intel.com/en-us/intel-distribution-for-python

A good Python distribution should contain these libraries from the start.


Answer 14

Do this; it solved it for me: pip install -U scikit-learn


How can I check whether a numpy array is empty?

Question: How can I check whether a numpy array is empty?

How can I check whether a numpy array is empty or not?

I used the following code, but this fails if the array contains a zero.

if not self.Definition.all():

Is this the solution?

if self.Definition == array( [] ):

Answer 0

You can always take a look at the .size attribute. It is defined as an integer, and is zero (0) when there are no elements in the array:

import numpy as np
a = np.array([])

if a.size == 0:
    # Do something when `a` is empty

Answer 1

http://www.scipy.org/Tentative_NumPy_Tutorial#head-6a1bc005bd80e1b19f812e1e64e0d25d50f99fe2

NumPy’s main object is the homogeneous multidimensional array. In Numpy dimensions are called axes. The number of axes is rank. Numpy’s array class is called ndarray. It is also known by the alias array. The more important attributes of an ndarray object are:

ndarray.ndim
the number of axes (dimensions) of the array. In the Python world, the number of dimensions is referred to as rank.

ndarray.shape
the dimensions of the array. This is a tuple of integers indicating the size of the array in each dimension. For a matrix with n rows and m columns, shape will be (n,m). The length of the shape tuple is therefore the rank, or number of dimensions, ndim.

ndarray.size
the total number of elements of the array. This is equal to the product of the elements of shape.


Answer 2

One caveat, though. Note that np.array(None).size returns 1! This is because a.size is equivalent to np.prod(a.shape), np.array(None).shape is (), and an empty product is 1.

>>> import numpy as np
>>> np.array(None).size
1
>>> np.array(None).shape
()
>>> np.prod(())
1.0

Therefore, I use the following to test if a numpy array has elements:

>>> def elements(array):
...     return array.ndim and array.size

>>> elements(np.array(None))
0
>>> elements(np.array([]))
0
>>> elements(np.zeros((2,3,4)))
24

Answer 3

Why would we want to check if an array is empty? Arrays don't grow or shrink in the same way that lists do. Starting with an 'empty' array and growing it with np.append is a frequent novice error.

Using a list in if alist: hinges on its boolean value:

In [102]: bool([])                                                                       
Out[102]: False
In [103]: bool([1])                                                                      
Out[103]: True

But trying to do the same with an array produces (in version 1.18):

In [104]: bool(np.array([]))                                                             
/usr/local/bin/ipython3:1: DeprecationWarning: The truth value 
   of an empty array is ambiguous. Returning False, but in 
   future this will result in an error. Use `array.size > 0` to 
   check that an array is not empty.
  #!/usr/bin/python3
Out[104]: False

In [105]: bool(np.array([1]))                                                            
Out[105]: True

and bool(np.array([1,2])) produces the infamous ambiguity error.


numpy.where(): detailed step-by-step explanation / examples

Question: numpy.where(): detailed step-by-step explanation / examples

I have trouble properly understanding numpy.where() despite reading the doc, this post and this other post.

Can someone provide step-by-step commented examples with 1D and 2D arrays?


Answer 0

After fiddling around for a while, I figured things out, and am posting them here hoping it will help others.

Intuitively, np.where is like asking “tell me where in this array entries satisfy a given condition”.

>>> a = np.arange(5,10)
>>> np.where(a < 8)       # tell me where in a, entries are < 8
(array([0, 1, 2]),)       # answer: entries indexed by 0, 1, 2

It can also be used to get the entries in the array that satisfy the condition:

>>> a[np.where(a < 8)] 
array([5, 6, 7])          # selects from a entries 0, 1, 2

When a is a 2d array, np.where() returns an array of row idx’s, and an array of col idx’s:

>>> a = np.arange(4,10).reshape(2,3)
array([[4, 5, 6],
       [7, 8, 9]])
>>> np.where(a > 8)
(array([1]), array([2]))

As in the 1d case, we can use np.where() to get entries in the 2d array that satisfy the condition:

>>> a[np.where(a > 8)]   # selects the entries where a > 8
array([9])


Note: when a is 1d, np.where() returns a tuple containing a single array of indices, since a 1d array has no second axis.
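
One more form worth knowing (a short addition; this is standard NumPy API): np.where(condition, x, y) selects elementwise from x where the condition holds and from y elsewhere:

>>> a = np.arange(4,10).reshape(2,3)
>>> np.where(a > 8, a, 0)   # keep entries > 8, zero out the rest
array([[0, 0, 0],
       [0, 0, 9]])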


Answer 1

Here is a little more fun. I’ve found that very often NumPy does exactly what I wish it would do – sometimes it’s faster for me to just try things than it is to read the docs. Actually a mixture of both is best.

I think your answer is fine (and it’s OK to accept it if you like). This is just “extra”.

import numpy as np

a = np.arange(4,10).reshape(2,3)

wh = np.where(a>7)
gt = a>7
x  = np.where(gt)

print("wh: ", wh)
print("gt: ", gt)
print("x:  ", x)

gives:

wh:  (array([1, 1]), array([1, 2]))
gt:  [[False False False]
      [False  True  True]]
x:   (array([1, 1]), array([1, 2]))

… but:

print "a[wh]: ", a[wh]
print "a[gt]  ", a[gt]
print "a[x]:  ", a[x]

gives:

a[wh]:  [8 9]
a[gt]   [8 9]
a[x]:   [8 9]

How do I build a numpy array from a generator?

Question: How do I build a numpy array from a generator?

How can I build a numpy array out of a generator object?

Let me illustrate the problem:

>>> import numpy
>>> def gimme():
...   for x in xrange(10):
...     yield x
...
>>> gimme()
<generator object at 0x28a1758>
>>> list(gimme())
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
>>> numpy.array(xrange(10))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> numpy.array(gimme())
array(<generator object at 0x28a1758>, dtype=object)
>>> numpy.array(list(gimme()))
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In this instance, gimme() is the generator whose output I’d like to turn into an array. However, the array constructor does not iterate over the generator, it simply stores the generator itself. The behaviour I desire is that from numpy.array(list(gimme())), but I don’t want to pay the memory overhead of having the intermediate list and the final array in memory at the same time. Is there a more space-efficient way?


回答 0

与python列表不同,numpy数组要求在创建时明确设置其长度。这是必需的,以便可以在内存中连续分配每个项目的空间。连续分配是numpy数组的关键特性:此方法与本机代码实现相结合,使对它们的操作比常规列表执行得快得多。

牢记这一点,从技术上讲,不可能将生成器对象转换为数组,除非您执行以下任一操作:

  1. 可以预测运行时将产生多少个元素:

    my_array = numpy.empty(predict_length())
    for i, el in enumerate(gimme()): my_array[i] = el
  2. 愿意将其元素存储在中间列表中:

    my_array = numpy.array(list(gimme()))
  3. 可以制作两个相同的生成器,遍历第一个生成器以找到总长度,初始化数组,然后再次遍历生成器以查找每个元素:

    length = sum(1 for el in gimme())
    my_array = numpy.empty(length)
    for i, el in enumerate(gimme()): my_array[i] = el

1可能是您要寻找的。2是空间效率低下的,而3是时间效率低下的(您必须两次通过生成器)。

Numpy arrays require their length to be set explicitly at creation time, unlike python lists. This is necessary so that space for each item can be consecutively allocated in memory. Consecutive allocation is the key feature of numpy arrays: this combined with native code implementation let operations on them execute much quicker than regular lists.

Keeping this in mind, it is technically impossible to take a generator object and turn it into an array unless you either:

  1. can predict how many elements it will yield when run:

    my_array = numpy.empty(predict_length())
    for i, el in enumerate(gimme()): my_array[i] = el
    
  2. are willing to store its elements in an intermediate list :

    my_array = numpy.array(list(gimme()))
    
  3. can make two identical generators, run through the first one to find the total length, initialize the array, and then run through the generator again to find each element:

    length = sum(1 for el in gimme())
    my_array = numpy.empty(length)
    for i, el in enumerate(gimme()): my_array[i] = el
    

1 is probably what you’re looking for. 2 is space inefficient, and 3 is time inefficient (you have to go through the generator twice).


回答 1

顺着这个 stackoverflow 结果再 Google 了一下,我发现有一个 numpy.fromiter(data, dtype, count)。默认的 count=-1 会从可迭代对象中取出所有元素。它要求明确设置 dtype。就我而言,这样就可行了:

numpy.fromiter(something.generate(from_this_input), float)

One google behind this stackoverflow result, I found that there is a numpy.fromiter(data, dtype, count). The default count=-1 takes all elements from the iterable. It requires a dtype to be set explicitly. In my case, this worked:

numpy.fromiter(something.generate(from_this_input), float)
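
If the number of elements is known up front, a further refinement (a sketch, assuming the count is exact) is to pass it as count so that fromiter can preallocate the array instead of growing it:

import numpy

# count=10 tells fromiter to preallocate space for exactly 10 elements
squares = numpy.fromiter((x * x for x in range(10)), dtype=float, count=10)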


回答 2

虽然可以用 numpy.fromiter() 从生成器创建一维数组,但可以用 numpy.stack 从生成器创建 N 维数组:

>>> mygen = (np.ones((5, 3)) for _ in range(10))
>>> x = numpy.stack(mygen)
>>> x.shape
(10, 5, 3)

它也适用于一维数组:

>>> numpy.stack(2*i for i in range(10))
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

请注意,numpy.stack 在内部会消耗生成器,并通过 arrays = [asanyarray(arr) for arr in arrays] 创建一个中间列表。具体实现可以在这里找到。

While you can create a 1D array from a generator with numpy.fromiter(), you can create an N-D array from a generator with numpy.stack:

>>> mygen = (np.ones((5, 3)) for _ in range(10))
>>> x = numpy.stack(mygen)
>>> x.shape
(10, 5, 3)

It also works for 1D arrays:

>>> numpy.stack(2*i for i in range(10))
array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18])

Note that numpy.stack is internally consuming the generator and creating an intermediate list with arrays = [asanyarray(arr) for arr in arrays]. The implementation can be found here.

[WARNING] As pointed out by @Joseh Seedy, Numpy 1.16 raises a warning that defeats usage of this function with generators.
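
A minimal workaround sketch for those newer NumPy versions: materialize the generator into a list first (numpy.stack builds such an intermediate list internally anyway, so this costs little extra):

>>> import numpy as np
>>> mygen = (np.ones((5, 3)) for _ in range(10))
>>> x = np.stack(list(mygen))   # explicit list avoids the generator warning
>>> x.shape
(10, 5, 3)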


回答 3

这有点跑题,但如果你的生成器是一个列表推导式,则可以使用 numpy.where 更高效地获得结果(我是在看完这篇文章后,在自己的代码中发现这一点的)

Somewhat tangential, but if your generator is a list comprehension, you can use numpy.where to more effectively get your result (I discovered this in my own code after seeing this post)


回答 4

vstack、hstack 和 dstack 函数可以接受产生多维数组的生成器作为输入。

The vstack, hstack, and dstack functions can take as input generators that yield multi-dimensional arrays.
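
For illustration, a minimal sketch (newer NumPy versions require an explicit sequence, so a list comprehension is used here rather than a bare generator):

>>> import numpy as np
>>> np.vstack([np.arange(3) * i for i in range(4)])
array([[0, 0, 0],
       [0, 1, 2],
       [0, 2, 4],
       [0, 3, 6]])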


将多个列表放入数据框

问题:将多个列表放入数据框

如何获取多个列表并将它们作为不同的列放在python数据框中?我尝试了此解决方案,但遇到了一些麻烦。

尝试1:

  • 有三个列表,并将它们压缩在一起并使用 res = zip(lst1,lst2,lst3)
  • 仅产生一列

尝试2:

percentile_list = pd.DataFrame({'lst1Tite' : [lst1],
                                'lst2Tite' : [lst2],
                                'lst3Tite' : [lst3] }, 
                                columns=['lst1Tite','lst1Tite', 'lst1Tite'])
  • 产生一行3列(按上述方式),或者如果我转置则为3行1列

如何获得100行(每个独立列表的长度)乘3列(三个列表)的熊猫数据框?

How do I take multiple lists and put them as different columns in a python dataframe? I tried this solution but had some trouble.

Attempt 1:

  • Have three lists, and zip them together and use that res = zip(lst1,lst2,lst3)
  • Yields just one column

Attempt 2:

percentile_list = pd.DataFrame({'lst1Tite' : [lst1],
                                'lst2Tite' : [lst2],
                                'lst3Tite' : [lst3] }, 
                                columns=['lst1Tite','lst1Tite', 'lst1Tite'])
  • yields either one row by 3 columns (the way above) or if I transpose it is 3 rows and 1 column

How do I get a 100 row (length of each independent list) by 3 column (three lists) pandas dataframe?


回答 0

我认为您快成功了,请尝试删除 lst 周围多余的方括号(此外,从这样的字典创建数据框时,无需指定列名):

import pandas as pd
lst1 = range(100)
lst2 = range(100)
lst3 = range(100)
percentile_list = pd.DataFrame(
    {'lst1Title': lst1,
     'lst2Title': lst2,
     'lst3Title': lst3
    })

percentile_list
    lst1Title  lst2Title  lst3Title
0          0         0         0
1          1         1         1
2          2         2         2
3          3         3         3
4          4         4         4
5          5         5         5
6          6         6         6
...

如果您需要性能更高的解决方案,可以像第一次尝试那样用 np.column_stack 代替 zip,在此示例中速度大约提高了 2 倍,但我认为这会在一定程度上牺牲可读性:

import numpy as np
percentile_list = pd.DataFrame(np.column_stack([lst1, lst2, lst3]), 
                               columns=['lst1Title', 'lst2Title', 'lst3Title'])

I think you’re almost there, try removing the extra square brackets around the lst‘s (Also you don’t need to specify the column names when you’re creating a dataframe from a dict like this):

import pandas as pd
lst1 = range(100)
lst2 = range(100)
lst3 = range(100)
percentile_list = pd.DataFrame(
    {'lst1Title': lst1,
     'lst2Title': lst2,
     'lst3Title': lst3
    })

percentile_list
    lst1Title  lst2Title  lst3Title
0          0         0         0
1          1         1         1
2          2         2         2
3          3         3         3
4          4         4         4
5          5         5         5
6          6         6         6
...

If you need a more performant solution you can use np.column_stack rather than zip as in your first attempt, this has around a 2x speedup on the example here, however comes at bit of a cost of readability in my opinion:

import numpy as np
percentile_list = pd.DataFrame(np.column_stack([lst1, lst2, lst3]), 
                               columns=['lst1Title', 'lst2Title', 'lst3Title'])
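
A rough sketch for reproducing that speed comparison (timings are machine-dependent; the roughly 2x figure comes from the answer's own test):

import numpy as np
import pandas as pd
from timeit import timeit

lst1, lst2, lst3 = list(range(100)), list(range(100)), list(range(100))

# build the frame 1000 times each way and compare total runtimes
t_zip = timeit(lambda: pd.DataFrame(list(zip(lst1, lst2, lst3))), number=1000)
t_stack = timeit(lambda: pd.DataFrame(np.column_stack([lst1, lst2, lst3])), number=1000)
print(t_zip, t_stack)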

回答 1

在此处补充 Aditya Guru 的答案。无需使用 map,您可以简单地这样做:

pd.DataFrame(list(zip(lst1, lst2, lst3)))

这会将列名设置为 0、1、2。要设置自己的列名,可以将关键字参数 columns 传递给上述方法。

pd.DataFrame(list(zip(lst1, lst2, lst3)),
              columns=['lst1_title','lst2_title', 'lst3_title'])

Adding to Aditya Guru‘s answer here. There is no need of using map. You can do it simply by:

pd.DataFrame(list(zip(lst1, lst2, lst3)))

This will set the column’s names as 0,1,2. To set your own column names, you can pass the keyword argument columns to the method above.

pd.DataFrame(list(zip(lst1, lst2, lst3)),
              columns=['lst1_title','lst2_title', 'lst3_title'])

回答 2

只需补充一下,使用第一种方法也可以这样完成:

pd.DataFrame(list(map(list, zip(lst1,lst2,lst3))))

Just adding that using the first approach it can be done as –

pd.DataFrame(list(map(list, zip(lst1,lst2,lst3))))

回答 3

添加了另一种可扩展的解决方案。

lists = [lst1, lst2, lst3, lst4]
df = pd.concat([pd.Series(x) for x in lists], axis=1)

Adding one more scalable solution.

lists = [lst1, lst2, lst3, lst4]
df = pd.concat([pd.Series(x) for x in lists], axis=1)

回答 4

除了上述答案,我们还可以动态地创建:

df= pd.DataFrame()
list1 = list(range(10))
list2 = list(range(10,20))
df['list1'] = list1
df['list2'] = list2
print(df)

希望能帮助到你 !

Adding to above answers, we can create on the fly

df= pd.DataFrame()
list1 = list(range(10))
list2 = list(range(10,20))
df['list1'] = list1
df['list2'] = list2
print(df)

hope it helps !


回答 5

@oopsi已使用pd.concat(),但未包含列名称。您可以执行以下操作,与接受的答案中的第一个解决方案不同,该操作使您可以控制列顺序(避免使用无序的字典):

import pandas as pd
lst1 = range(100)
lst2 = range(100)
lst3 = range(100)

s1=pd.Series(lst1,name='lst1Title')
s2=pd.Series(lst2,name='lst2Title')
s3=pd.Series(lst3 ,name='lst3Title')
percentile_list = pd.concat([s1,s2,s3], axis=1)

percentile_list
Out[2]: 
    lst1Title  lst2Title  lst3Title
0           0          0          0
1           1          1          1
2           2          2          2
3           3          3          3
4           4          4          4
5           5          5          5
6           6          6          6
7           7          7          7
8           8          8          8
...

@oopsi used pd.concat() but didn’t include the column names. You could do the following, which, unlike the first solution in the accepted answer, gives you control over the column order (avoids dicts, which are unordered):

import pandas as pd
lst1 = range(100)
lst2 = range(100)
lst3 = range(100)

s1=pd.Series(lst1,name='lst1Title')
s2=pd.Series(lst2,name='lst2Title')
s3=pd.Series(lst3 ,name='lst3Title')
percentile_list = pd.concat([s1,s2,s3], axis=1)

percentile_list
Out[2]: 
    lst1Title  lst2Title  lst3Title
0           0          0          0
1           1          1          1
2           2          2          2
3           3          3          3
4           4          4          4
5           5          5          5
6           6          6          6
7           7          7          7
8           8          8          8
...

回答 6

有多种方法可以从多个列表创建数据框。

list1=[1,2,3,4]
list2=[5,6,7,8]
list3=[9,10,11,12]
  1. pd.DataFrame({'list1':list1, 'list2':list2, 'list3':list3})

  2. pd.DataFrame(data=zip(list1,list2,list3),columns=['list1','list2','list3'])

There are several ways to create a dataframe from multiple lists.

list1=[1,2,3,4]
list2=[5,6,7,8]
list3=[9,10,11,12]
  1. pd.DataFrame({'list1':list1, 'list2':list2, 'list3':list3})

  2. pd.DataFrame(data=zip(list1,list2,list3),columns=['list1','list2','list3'])


回答 7

您可以简单地使用以下代码

train_data['labels']= train_data[["LABEL1","LABEL2","LABEL3","LABEL4","LABEL5","LABEL6","LABEL7"]].values.tolist()
train_df = pd.DataFrame(train_data, columns=['text','labels'])

you can simply use the following code

train_data['labels']= train_data[["LABEL1","LABEL2","LABEL3","LABEL4","LABEL5","LABEL6","LABEL7"]].values.tolist()
train_df = pd.DataFrame(train_data, columns=['text','labels'])

如何让PyLint识别numpy成员?

问题:如何让PyLint识别numpy成员?

我在一个 Python 项目上运行 PyLint。PyLint 多次抱怨无法找到 numpy 的成员。如何在不完全跳过成员检查的前提下避免这种情况?

从代码:

import numpy as np

print np.zeros([1, 4])

运行时,我得到了预期的结果:

[[0. 0. 0. 0.]]

但是,pylint给了我这个错误:

E:3,6:模块’numpy’没有’zeros’成员(no-member)

对于版本,我使用的是pylint 1.0.0(星号1.0.1,常见的0.60.0),并尝试使用numpy 1.8.0。

I am running PyLint on a Python project. PyLint makes many complaints about being unable to find numpy members. How can I avoid this while avoiding skipping membership checks?

From the code:

import numpy as np

print np.zeros([1, 4])

Which, when run, gives the expected:

[[ 0. 0. 0. 0.]]

However, pylint gives me this error:

E: 3, 6: Module ‘numpy’ has no ‘zeros’ member (no-member)

For versions, I am using pylint 1.0.0 (astroid 1.0.1, common 0.60.0) and trying to work with numpy 1.8.0 .


回答 0

如果将 Visual Studio Code 与 Don Jayamanne 出色的 Python 扩展一起使用,请添加一条用户设置,将 numpy 列入白名单:

{
    // whitelist numpy to remove lint errors
    "python.linting.pylintArgs": [
        "--extension-pkg-whitelist=numpy"
    ]
}

If using Visual Studio Code with Don Jayamanne’s excellent Python extension, add a user setting to whitelist numpy:

{
    // whitelist numpy to remove lint errors
    "python.linting.pylintArgs": [
        "--extension-pkg-whitelist=numpy"
    ]
}

回答 1

我这里也遇到了同样的问题,即使所有相关软件包都是最新版本(astroid 1.3.2、logilab_common 0.63.2、pylint 1.4.0)。

以下解决方案非常有效:我通过修改 pylintrc 文件,在 [TYPECHECK] 一节中将 numpy 添加到被忽略的模块列表中:

[TYPECHECK]

ignored-modules = numpy

根据报错情况,您可能还需要添加以下行(仍在 [TYPECHECK] 一节中):

ignored-classes = numpy

I had the same issue here, even with the latest versions of all related packages (astroid 1.3.2, logilab_common 0.63.2, pylint 1.4.0).

The following solution worked like a charm: I added numpy to the list of ignored modules by modifying my pylintrc file, in the [TYPECHECK] section:

[TYPECHECK]

ignored-modules = numpy

Depending on the error, you might also need to add the following line (still in the [TYPECHECK] section):

ignored-classes = numpy

回答 2

对于正在处理的一个小numpy项目,我遇到了相同的错误,因此决定忽略numpy模块就可以了。我创建了一个.pylintrc文件:

$ pylint --generate-rcfile > ~/.pylintrc

根据paduwan和j_houg的建议,我修改了以下部分:

[MASTER]

# A comma-separated list of package or module names from where C extensions may
# be loaded. Extensions are loading into the active Python interpreter and may
# run arbitrary code
extension-pkg-whitelist=numpy

[TYPECHECK]

# List of module names for which member attributes should not be checked
# (useful for modules/projects where namespaces are manipulated during runtime
# and thus existing member attributes cannot be deduced by static analysis. It
# supports qualified module names, as well as Unix pattern matching.
ignored-modules=numpy

# List of classes names for which member attributes should not be checked
# (useful for classes with attributes dynamically set). This supports can work
# with qualified names.
ignored-classes=numpy

它“解决”了我的问题。

I was getting the same error for a small numpy project I was working on and decided that ignoring the numpy modules would do just fine. I created a .pylintrc file with:

$ pylint --generate-rcfile > ~/.pylintrc

and following paduwan’s and j_houg’s advice I modified the following sectors:

[MASTER]

# A comma-separated list of package or module names from where C extensions may
# be loaded. Extensions are loading into the active Python interpreter and may
# run arbitrary code
extension-pkg-whitelist=numpy

and

[TYPECHECK]

# List of module names for which member attributes should not be checked
# (useful for modules/projects where namespaces are manipulated during runtime
# and thus existing member attributes cannot be deduced by static analysis. It
# supports qualified module names, as well as Unix pattern matching.
ignored-modules=numpy

# List of classes names for which member attributes should not be checked
# (useful for classes with attributes dynamically set). This supports can work
# with qualified names.
ignored-classes=numpy

and it “fixed” my issue.


回答 3

在最新版本的pylint中,您可以添加--extension-pkg-whitelist=numpy到pylint命令中。他们以不安全的方式解决了早期版本中的此问题。现在,如果希望他们更加仔细地查看标准库之外的软件包,则必须将其明确列入白名单。看这里。

In recent versions of pylint you can add --extension-pkg-whitelist=numpy to your pylint command. They had fixed this problem in an earlier version in an unsafe way. Now if you want them to look more carefully at a package outside of the standard library, you must explicitly whitelist it. See here.
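
For example, a minimal invocation (the module name here is just a placeholder):

$ pylint --extension-pkg-whitelist=numpy your_module.py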


回答 4

由于这是google中的最高结果,它给我的印象是您必须忽略所有文件中的警告:

这个问题实际上已经在上个月的pylint / astroid来源https://bitbucket.org/logilab/astroid/commits/83d78af4866be5818f193360c78185e1008fd29e的来源中得到解决, 但尚未在Ubuntu软件包中。

要获取来源,只需

hg clone https://bitbucket.org/logilab/pylint/
hg clone https://bitbucket.org/logilab/astroid
mkdir logilab && touch logilab/__init__.py
hg clone http://hg.logilab.org/logilab/common logilab/common
cd pylint && python setup.py install

因此,最后一步很可能需要 sudo,当然您还需要安装 mercurial 才能克隆。

Since this is the top result in google and it gave me the impression that you have to ignore those warnings in all files:

The problem has actually been fixed in the sources of pylint/astroid last month https://bitbucket.org/logilab/astroid/commits/83d78af4866be5818f193360c78185e1008fd29e but are not yet in the Ubuntu packages.

To get the sources, just

hg clone https://bitbucket.org/logilab/pylint/
hg clone https://bitbucket.org/logilab/astroid
mkdir logilab && touch logilab/__init__.py
hg clone http://hg.logilab.org/logilab/common logilab/common
cd pylint && python setup.py install

whereby the last step will most likely require a sudo and of course you need mercurial to clone.


回答 5

为了忽略numpy.core属性产生的所有错误,我们现在可以使用:

$ pylint a.py --generated-members=numpy.*

作为另一个解决方案,将此选项添加到 ~/.pylintrc 或 /etc/pylintrc 文件中:

[TYPECHECK]

# List of members which are set dynamically and missed by pylint inference
# system, and so shouldn't trigger E1101 when accessed. Python regular
# expressions are accepted.
generated-members=numpy.*

对于问题中提到的代码,现在这似乎已经多余了,但对其他模块(例如 netifaces 等)仍然有用。

For ignoring all the errors generated by numpy.core‘s attributes, we can now use:

$ pylint a.py --generated-members=numpy.*

As another solution, add this option to ~/.pylintrc or /etc/pylintrc file:

[TYPECHECK]

# List of members which are set dynamically and missed by pylint inference
# system, and so shouldn't trigger E1101 when accessed. Python regular
# expressions are accepted.
generated-members=numpy.*

For the code mentioned in the question this now seems redundant, but it still matters for other modules, e.g. netifaces.


回答 6

如果您不想添加更多配置,请将这段代码添加到您的配置文件中,以代替“whitelist”方案。

{
    "python.linting.pylintArgs": ["--generated-members=numpy.*"]
}

If you don’t want to add more config, please add this code to your config file, instead of ‘whitelist’.

{
    "python.linting.pylintArgs": ["--generated-members=numpy.*"]
}

回答 7

在过去的几年中,有许多关于此的错误报告,即https://bitbucket.org/logilab/pylint/issue/58/false-positive-no-member-on-numpy-imports

我建议只在出现报错的那些行上禁用该检查。

# pylint: disable=E1103
print np.zeros([1, 4])
# pylint: enable=E1103

There have been many different bugs reported about this over the past few years i.e. https://bitbucket.org/logilab/pylint/issue/58/false-positive-no-member-on-numpy-imports

I’d suggest disabling for the lines where the complaints occur.

# pylint: disable=E1103
print np.zeros([1, 4])
# pylint: enable=E1103

回答 8

很可能,它被 numpy 晦涩的方法导入方式搞糊涂了。也就是说,zeros 实际上是 numpy.core.multiarray.zeros,在 numpy 中通过以下语句导入

from .core import *

而它又通过以下语句导入

from .numeric import *

而在 numeric 模块中您会发现

zeros = multiarray.zeros

换作是我,处在 PyLint 的位置上我想我也会困惑!

有关 PyLint 方面的讨论,请参见此 bug。

Probably, it's confused by numpy's abstruse way of importing methods. Namely, zeros is in fact numpy.core.multiarray.zeros, imported in numpy with the statement

from .core import *

in turn imported with

from .numeric import *

and in numeric you’ll find

zeros = multiarray.zeros

I guess I would be confused in place of PyLint!

See this bug for PyLint side of view.
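
A small sketch of how to observe this re-export chain from the REPL (assuming a classic NumPy 1.x layout; the internals have moved around in later releases):

>>> import numpy as np
>>> np.zeros                    # a C builtin, opaque to static analysis
<built-in function zeros>
>>> np.zeros is np.core.multiarray.zeros
True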


回答 9

我必须将其添加到我经常使用numpy的任何文件的顶部。

# To ignore numpy errors:
#     pylint: disable=E1101

以防万一有人在 Eclipse 中使用 Pydev 和 pylint 时遇到麻烦…

I had to add this at the top of any file where I use numpy a lot.

# To ignore numpy errors:
#     pylint: disable=E1101

Just in case someone in eclipse is having trouble with Pydev and pylint…


回答 10

作为对 j_hougs 答案的扩展,您现在可以将有问题的模块添加到 .pylintrc 中的这一行,该行在生成的文件中默认为空:

extension-pkg-whitelist=numpy

您可以通过执行以下操作来生成示例.pylintrc:

pylint --generate-rcfile > .pylintrc

然后编辑提到的行

In Extension to j_hougs answer, you can now add the modules in question to this line in .pylintrc, which is already prepared empty on generation:

extension-pkg-whitelist=numpy

you can generate a sample .pylintrc by doing:

pylint --generate-rcfile > .pylintrc

and then edit the mentioned line
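
After editing, the relevant fragment of the .pylintrc would look like this minimal sketch (mirroring the [MASTER] section shown in an earlier answer):

[MASTER]
extension-pkg-whitelist=numpy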


回答 11

终于在Pylint 1.8.2中解决了这个问题。开箱即用,无需pylintrc调整!

This has finally been resolved in Pylint 1.8.2. Works out of the box, no pylintrc tweaks needed!


回答 12

这是我针对此问题提出的伪解决方案。

#pylint: disable=no-name-in-module
from numpy import array as np_array, transpose as np_transpose, \
      linspace as np_linspace, zeros as np_zeros
from numpy.random import uniform as random_uniform
#pylint: enable=no-name-in-module

然后,在你的代码中,不再以 np.array、np.zeros 等方式调用 numpy 函数,而是写成 np_array、np_zeros 等。与其他答案中提出的方法相比,这种做法的优点是:

  • pylint禁用/启用仅限于代码的一小部分
  • 这意味着您不必用pylint指令将每一个调用numpy函数的行都包围起来。
  • 您没有为整个文件执行pylint禁用错误操作,这可能会掩盖代码的其他问题。

明显的缺点是,您必须显式导入您用到的每个 numpy 函数。该方法还可以进一步扩展:您可以定义自己的模块,例如 numpy_importer,如下所示

""" module: numpy_importer.py
       explicitely import numpy functions while avoiding pylint errors  
"""
#pylint: disable=unused-import
#pylint: disable=no-name-in-module
from numpy import array, transpose, zeros  #add all things you need  
from numpy.random import uniform as random_uniform
#pylint: enable=no-name-in-module

然后,您的应用程序代码可以只导入这个模块(而不是 numpy):

import numpy_importer as np 

并像往常一样使用名称:np.zerosnp.array等等。

这样做的好处是,您将拥有一个一劳永逸地完成所有 numpy 相关导入的模块,之后只需一行就能在任何地方导入它。仍然要注意,numpy_importer 不要导入 numpy 中不存在的名称,因为那些错误不会被 pylint 捕获。

This is the pseudo-solution I have come up with for this problem.

#pylint: disable=no-name-in-module
from numpy import array as np_array, transpose as np_transpose, \
      linspace as np_linspace, zeros as np_zeros
from numpy.random import uniform as random_uniform
#pylint: enable=no-name-in-module

Then, in your code, instead of calling numpy functions as np.array and np.zeros and so on, you would write np_array, np_zeros, etc. Advantages of this approach vs. other approaches suggested in other answers:

  • The pylint disable/enable is restricted to a small region of your code
  • That means that you don’t have to surround every single line that has an invocation of a numpy function with a pylint directive.
  • You are not doing pylint disable of the error for your whole file, which might mask other issues with your code.

The clear disadvantage is that you have to explicitely import every numpy function you use. The approach could be elaborated on further. You could define your own module, call it say, numpy_importer as follows

""" module: numpy_importer.py
       explicitely import numpy functions while avoiding pylint errors  
"""
#pylint: disable=unused-import
#pylint: disable=no-name-in-module
from numpy import array, transpose, zeros  #add all things you need  
from numpy.random import uniform as random_uniform
#pylint: enable=no-name-in-module

Then, your application code could import this module only (instead of numpy) as

import numpy_importer as np 

and use the names as usual: np.zeros, np.array etc.

The advantage of this is that you will have a single module in which all numpy related imports are done once and for all, and then you import it with that single line, wherever you want. Still you have to be careful that numpy_importer does not import names that don't exist in numpy as those errors won't be caught by pylint.


回答 13

我遇到了numpy,scipy,sklearn,nipy等问题,并通过包裹epylint来解决了这个问题,如下所示:

$ cat epylint.py

#!/usr/bin/python

"""
Synopsis: epylint wrapper that filters a bunch of false-positive warnings and errors
Author: DOHMATOB Elvis Dopgima <gmdopp@gmail.com> <elvis.dohmatob@inria.fr>

"""

import os
import sys
import re
from subprocess import Popen, STDOUT, PIPE

NUMPY_HAS_NO_MEMBER = re.compile("Module 'numpy(?:\..+)?' has no '.+' member")
SCIPY_HAS_NO_MEMBER = re.compile("Module 'scipy(?:\..+)?' has no '.+' member")
SCIPY_HAS_NO_MEMBER2 = re.compile("No name '.+' in module 'scipy(?:\..+)?'")
NIPY_HAS_NO_MEMBER = re.compile("Module 'nipy(?:\..+)?' has no '.+' member")
SK_ATTR_DEFINED_OUTSIDE_INIT = re.compile("Attribute '.+_' defined outside __init__")
REL_IMPORT_SHOULD_BE = re.compile("Relative import '.+', should be '.+")
REDEFINING_NAME_FROM_OUTER_SCOPE = re.compile("Redefining name '.+' from outer scope")

if __name__ == "__main__":
    basename = os.path.basename(sys.argv[1])
    for line in Popen(['epylint', sys.argv[1], '--disable=C,R,I'  # filter these warnings
                       ], stdout=PIPE, stderr=STDOUT, universal_newlines=True).stdout:
        if line.startswith("***********"):
            continue
        elif line.startswith("No config file found,"):
            continue
        elif "anomalous-backslash-in-string," in line:
            continue
        if NUMPY_HAS_NO_MEMBER.search(line):
            continue
        if SCIPY_HAS_NO_MEMBER.search(line):
            continue
        if SCIPY_HAS_NO_MEMBER2.search(line):
            continue
        if "Used * or ** magic" in line:
            continue
        if "No module named" in line and "_flymake" in line:
            continue
        if SK_ATTR_DEFINED_OUTSIDE_INIT.search(line):
            continue
        if "Access to a protected member" in line:
            continue
        if REL_IMPORT_SHOULD_BE.search(line):
            continue
        if REDEFINING_NAME_FROM_OUTER_SCOPE.search(line):
            continue
        if NIPY_HAS_NO_MEMBER.search(line):
            continue
        # XXX extend by adding more handles for false-positives here
        else:
            print line,

该脚本只是运行 epylint,然后解析其输出,过滤掉误报的警告和错误。您可以通过添加更多 elif 分支来扩展它。

注意:如果这适用于您,则需要修改 pychechers.sh,使其看起来像这样

#!/bin/bash

epylint.py "$1" 2>/dev/null
pyflakes "$1"
pep8 --ignore=E221,E701,E202 --repeat "$1"
true

(当然,您必须首先使epylint.py可执行)

这是我的.emacs https://github.com/dohmatob/mydotemacs的链接。希望这对某人有用。

I had this problem with numpy, scipy, sklearn, nipy, etc., and I solved it by wrapping epylint like so:

$ cat epylint.py

#!/usr/bin/python

"""
Synopsis: epylint wrapper that filters a bunch of false-positive warnings and errors
Author: DOHMATOB Elvis Dopgima <gmdopp@gmail.com> <elvis.dohmatob@inria.fr>

"""

import os
import sys
import re
from subprocess import Popen, STDOUT, PIPE

NUMPY_HAS_NO_MEMBER = re.compile("Module 'numpy(?:\..+)?' has no '.+' member")
SCIPY_HAS_NO_MEMBER = re.compile("Module 'scipy(?:\..+)?' has no '.+' member")
SCIPY_HAS_NO_MEMBER2 = re.compile("No name '.+' in module 'scipy(?:\..+)?'")
NIPY_HAS_NO_MEMBER = re.compile("Module 'nipy(?:\..+)?' has no '.+' member")
SK_ATTR_DEFINED_OUTSIDE_INIT = re.compile("Attribute '.+_' defined outside __init__")
REL_IMPORT_SHOULD_BE = re.compile("Relative import '.+', should be '.+")
REDEFINING_NAME_FROM_OUTER_SCOPE = re.compile("Redefining name '.+' from outer scope")

if __name__ == "__main__":
    basename = os.path.basename(sys.argv[1])
    for line in Popen(['epylint', sys.argv[1], '--disable=C,R,I'  # filter these warnings
                       ], stdout=PIPE, stderr=STDOUT, universal_newlines=True).stdout:
        if line.startswith("***********"):
            continue
        elif line.startswith("No config file found,"):
            continue
        elif "anomalous-backslash-in-string," in line:
            continue
        if NUMPY_HAS_NO_MEMBER.search(line):
            continue
        if SCIPY_HAS_NO_MEMBER.search(line):
            continue
        if SCIPY_HAS_NO_MEMBER2.search(line):
            continue
        if "Used * or ** magic" in line:
            continue
        if "No module named" in line and "_flymake" in line:
            continue
        if SK_ATTR_DEFINED_OUTSIDE_INIT.search(line):
            continue
        if "Access to a protected member" in line:
            continue
        if REL_IMPORT_SHOULD_BE.search(line):
            continue
        if REDEFINING_NAME_FROM_OUTER_SCOPE.search(line):
            continue
        if NIPY_HAS_NO_MEMBER.search(line):
            continue
        # XXX extend by adding more handles for false-positives here
        else:
            print line,

This script simply runs epylint, then scrapes its output to filter out false-positive warnings and errors. You can extend it by adding more elif cases.

N.B.: If this applies to you, then you'll want to modify your pychechers.sh so it looks like this

#!/bin/bash

epylint.py "$1" 2>/dev/null
pyflakes "$1"
pep8 --ignore=E221,E701,E202 --repeat "$1"
true

(Of course, you have to make epylint.py executable first)

Here is a link to my .emacs https://github.com/dohmatob/mydotemacs. Hope this is useful to someone.


回答 14

这似乎至少在Pylint 1.1.0上有效:

[TYPECHECK]

ignored-classes=numpy

This seems to work on at least Pylint 1.1.0:

[TYPECHECK]

ignored-classes=numpy

回答 15

这个解决方案对我有用

基本步骤:点击左下角的齿轮图标 => Setting => Workspace Setting => Extension => Python Configuration => 单击任意 Settings.json => 将这一条添加到文件中:"python.linting.pylintArgs": ["--extension-pkg-whitelist=numpy"]。我使用的是 VS Code 1.27.2。

This solution worked for me

Basically, select the gear icon from the bottom left => Setting => Workspace Setting => Extension => Python Configuration => click on any Settings.json => add this to the file: "python.linting.pylintArgs": ["--extension-pkg-whitelist=numpy"]. I am using VS Code 1.27.2.


回答 16

我在另一个不同的模块(kivy.properties)上遇到了同样的问题,它是一个包装好的C模块,例如numpy

使用 VSCode V1.38.0 时,被采纳的解决方案会停止该项目的所有 lint 检查。因此,尽管它确实消除了误报的 no-name-in-module,但并没有真正改善局面。

对我来说,最好的解决方法是对有问题的模块使用 --ignored-modules 参数。麻烦的是,通过 python.linting.pylintArgs 传递任何参数都会抹掉 VSCode 的默认设置,所以你还需要重新设置这些默认项。最终我的 settings.json 文件如下:

{
    "python.pythonPath": "C:\\Python\\Python37\\python.exe",
    "python.linting.pylintEnabled": true,
    "python.linting.enabled": true,
    "python.linting.pylintArgs": [
        "--ignored-modules=kivy.properties",
        "--disable=all",
        "--enable=F,E,unreachable,duplicate-key,unnecessary-semicolon,global-variable-not-assigned,unused-variable,binary-op-exception,bad-format-string,anomalous-backslash-in-string,bad-open-mode"
    ]
}

I had the same problem with a different module (kivy.properties) which is a wrapped C module like numpy.

Using VSCode V1.38.0, the accepted solution stopped all linting for the project. So, while it did indeed remove the false-positive no-name-in-module, it didn’t really improve the situation.

The best workaround for me was to use the --ignored-modules argument on the offending module. Trouble is, passing any argument via python.linting.pylintArgs wipes out the default VSCode settings, so you need to re-set those also. That left me with the following settings.json file:

{
    "python.pythonPath": "C:\\Python\\Python37\\python.exe",
    "python.linting.pylintEnabled": true,
    "python.linting.enabled": true,
    "python.linting.pylintArgs": [
        "--ignored-modules=kivy.properties",
        "--disable=all",
        "--enable=F,E,unreachable,duplicate-key,unnecessary-semicolon,global-variable-not-assigned,unused-variable,binary-op-exception,bad-format-string,anomalous-backslash-in-string,bad-open-mode"
    ]
}

回答 17

从前面的答案中复制了一些文字,以总结出有效的方法(至少对我来说:debian-jessie)

  1. 在某些旧版本中,pylint存在一个问题,使其无法与numpy(及其他类似软件包)一起使用。

  2. 现在已经解决了该问题,但是出于安全原因,默认情况下禁用了外部C包(C代码的python接口-例如numpy-)。

  3. 您可以在 ~/.pylintrc 文件中创建一个白名单,以允许 pylint 使用这些软件包。

要运行的基本命令(仅当您的主目录中还没有 .pylintrc 文件时):

$ pylint --generate-rcfile > .pylintrc

然后打开该文件,在 extension-pkg-whitelist= 之后添加所需的软件包(多个软件包用逗号分隔)。在命令行中使用 --extension-pkg-whitelist=numpy 选项可以获得相同的行为。

如果您在 [TYPECHECK] 一节中忽略了某些软件包,这意味着 pylint 将永远不会显示与这些软件包相关的错误。实际上,pylint 不会告诉您有关这些软件包的任何信息。

A little bit of copy paste from the previous answer to summarize what is working (at least for me: debian-jessie)

  1. In some older version of pylint there was a problem preventing it working with numpy (and other similar packages).

  2. Now that problem has been solved but external C packages (python interfaces to C code -like numpy-) are disabled by default for security reasons.

  3. You can create a white list, to allow pylint to use them in the file ~/.pylintrc.

Basic command to run (ONLY if you do not already have a .pylintrc file in your home directory):

$ pylint --generate-rcfile > .pylintrc

Then open the file and add the packages you want after extension-pkg-whitelist=, separated by commas. You can get the same behavior using the option --extension-pkg-whitelist=numpy from the command line.

If you ignore some packages in the [TYPECHECK] section, that means that pylint will never show errors related to those packages. In practice, pylint will not tell you anything about those packages.


回答 18

我一直在为 pylint 制作补丁,以解决 numpy 等库中动态成员的问题。它添加了一个“dynamic-modules”选项,该选项通过实际导入模块来强制在运行时检查成员是否存在。请参阅 logilab/pylint 中的 Issue #413。还有一个 pull request,请参阅其中一条评论里的链接。

I’ve been working on a patch to pylint to solve the issue with dynamic members in libraries such as numpy. It adds a “dynamic-modules” option which forces to check if members exist during runtime by making a real import of the module. See Issue #413 in logilab/pylint. There is also a pull request, see link in one of the comments.


回答 19

快速答案:将Pylint更新为1.7.1(如果使用conda管理软件包,请使用conda-forge提供的Pylint 1.7.1)

我在此处的pylint GitHub中发现了类似的问题,有人回复说更新到1.7.1后一切正常。

A quick answer: update Pylint to 1.7.1 (use conda-forge provided Pylint 1.7.1 if you use conda to manage packages)

I found a similar issue in pylint GitHub here and someone replied everything getting OK after updating to 1.7.1.


回答 20

我不确定这是否是解决方案,但是在VSCode中,我在用户设置中明确编写了启用pylint的代码后,所有模块都被识别。

{
    "python.linting.pep8Enabled": true,
    "python.linting.pylintEnabled": true
}

I’m not sure if this is a solution, but in VSCode once I wrote explicitly in my user settings to enable pylint, all modules were recognized.

{
    "python.linting.pep8Enabled": true,
    "python.linting.pylintEnabled": true
}

回答 21

最近(由于spyder或pylint或?有所更改),我从astropy.constants符号的spyder静态代码分析中得到了E1101错误(“无成员”)。不知道为什么。

对于 Linux 或 Unix 系统(Mac 可能与此类似)上的所有用户,我的简化解决方案是创建一个 /etc/pylintrc,如下所示:

[TYPECHECK]
ignored-modules=astropy.constants

当然,也可以将其放在个人的 $HOME/.pylintrc 文件中。或者,我本可以直接更新现有文件。

Lately (since something changed in spyder or pylint or ?), I have been getting E1101 errors (“no member”) from spyder’s static code analysis on astropy.constants symbols. No idea why.

My simplistic solution for all users on a Linux or Unix system (Mac is probably similar) is to create an /etc/pylintrc as follows:

[TYPECHECK]
ignored-modules=astropy.constants

Of course, this could, instead, be put in a personal $HOME/.pylintrc file. And, I could have updated an existing file.