Tag Archives: scipy

Does Python SciPy need BLAS?

Question: Does Python SciPy need BLAS?

numpy.distutils.system_info.BlasNotFoundError: 
    Blas (http://www.netlib.org/blas/) libraries not found.
    Directories to search for the libraries can be specified in the
    numpy/distutils/site.cfg file (section [blas]) or by setting
    the BLAS environment variable.

Which tar do I need to download off this site?

I’ve tried the fortrans, but I keep getting this error (after setting the environment variable obviously).


Answer 0

The SciPy webpage used to provide build and installation instructions, but the instructions there now rely on OS binary distributions. To build SciPy (and NumPy) on operating systems without precompiled packages of the required libraries, you must build and then statically link to the Fortran libraries BLAS and LAPACK:

mkdir -p ~/src/
cd ~/src/
wget http://www.netlib.org/blas/blas.tgz
tar xzf blas.tgz
cd BLAS-*

## NOTE: The selected Fortran compiler must be consistent for BLAS, LAPACK, NumPy, and SciPy.
## For GNU compiler on 32-bit systems:
#g77 -O2 -fno-second-underscore -c *.f                     # with g77
#gfortran -O2 -std=legacy -fno-second-underscore -c *.f    # with gfortran
## OR for GNU compiler on 64-bit systems:
#g77 -O3 -m64 -fno-second-underscore -fPIC -c *.f                     # with g77
gfortran -O3 -std=legacy -m64 -fno-second-underscore -fPIC -c *.f    # with gfortran
## OR for Intel compiler:
#ifort -FI -w90 -w95 -cm -O3 -unroll -c *.f

# Continue below irrespective of compiler:
ar r libfblas.a *.o
ranlib libfblas.a
rm -rf *.o
export BLAS=~/src/BLAS-*/libfblas.a

Execute only one of the five g77/gfortran/ifort commands. I have commented out all but the gfortran command that I use. The subsequent LAPACK installation requires a Fortran 90 compiler, and since both installs should use the same Fortran compiler, g77 should not be used for BLAS.

Next, you’ll need to install the LAPACK stuff. The SciPy webpage’s instructions helped me here as well, but I had to modify them to suit my environment:

mkdir -p ~/src
cd ~/src/
wget http://www.netlib.org/lapack/lapack.tgz
tar xzf lapack.tgz
cd lapack-*/
cp INSTALL/make.inc.gfortran make.inc          # On Linux with lapack-3.2.1 or newer
make lapacklib
make clean
export LAPACK=~/src/lapack-*/liblapack.a

Update on 3-Sep-2015: Verified some comments today (thanks to all): before running make lapacklib, edit the make.inc file and add the -fPIC option to the OPTS and NOOPT settings. If you are on a 64-bit architecture or want to compile for one, also add -m64. It is important that BLAS and LAPACK are compiled with these options set to the same values. If you forget -fPIC, SciPy will actually give you an error about missing symbols and will recommend this switch. The specific section of make.inc in my setup looks like this:

FORTRAN  = gfortran 
OPTS     = -O2 -frecursive -fPIC -m64
DRVOPTS  = $(OPTS)
NOOPT    = -O0 -frecursive -fPIC -m64
LOADER   = gfortran

On old machines (e.g. RedHat 5), the installed gfortran might be an older version (e.g. 4.1.2) that does not understand the option -frecursive. In such cases, simply remove it from the make.inc file.

The lapack test target of the Makefile fails in my setup because it cannot find the blas libraries. If you are thorough you can temporarily move the blas library to the specified location to test the lapack. I’m a lazy person, so I trust the devs to have it working and verify only in SciPy.
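
As a quick sanity check once NumPy and SciPy themselves have been built against these libraries, you can print the BLAS/LAPACK configuration they were actually linked with (a small aside, not part of the original instructions):

import numpy as np
import scipy

np.show_config()     # prints the BLAS/LAPACK build configuration for NumPy
scipy.show_config()  # likewise for SciPy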


Answer 1

If you need to use the latest version of SciPy rather than the packaged one, without going through the hassle of building BLAS and LAPACK, you can follow the procedure below.

Install the linear algebra libraries from the repository (for Ubuntu):

sudo apt-get install gfortran libopenblas-dev liblapack-dev

Then install SciPy (after downloading the SciPy source) with python setup.py install, or

pip install scipy

As the case may be.


Answer 2

On Fedora, this works:

 yum install lapack lapack-devel blas blas-devel
 pip install numpy
 pip install scipy

Remember to install 'lapack-devel' and 'blas-devel' in addition to 'blas' and 'lapack'; otherwise, you'll get the error you mentioned or the "numpy.distutils.system_info.LapackNotFoundError" error.


Answer 3

I guess you are talking about installation in Ubuntu. Just use:

apt-get install python-numpy python-scipy

That should take care of compiling the BLAS libraries as well. Otherwise, compiling the BLAS libraries yourself is very difficult.


Answer 4

For Windows users there is a nice binary package by Chris (warning: it’s a pretty large download, 191 MB):


Answer 5

Following the instructions given by ‘cfi’ works for me, although there are a few pieces they left out that you might need:

1) Your lapack directory, after unzipping, may be called lapack-X-Y (some version number), so you can just rename that to LAPACK.

cd ~/src
mv lapack-[tab] LAPACK

2) In that directory, you may need to do:

cd ~/src/LAPACK 
cp lapack_LINUX.a libflapack.a

Answer 6

Try using

sudo apt-get install python3-scipy

Should I use scipy.pi, numpy.pi, or math.pi?

Question: Should I use scipy.pi, numpy.pi, or math.pi?

In a project using SciPy and NumPy, should I use scipy.pi, numpy.pi, or math.pi?


Answer 0

>>> import math
>>> import numpy as np
>>> import scipy
>>> math.pi == np.pi == scipy.pi
True

So it doesn’t matter, they are all the same value.

The only reason all three modules provide a pi value is so if you are using just one of the three modules, you can conveniently have access to pi without having to import another module. They’re not providing different values for pi.


Answer 1

One thing to note is that not all libraries will use the same meaning for pi, of course, so it never hurts to know what you’re using. For example, the symbolic math library Sympy’s representation of pi is not the same as math and numpy:

import math
import numpy
import scipy
import sympy

print(math.pi == numpy.pi)
> True
print(math.pi == scipy.pi)
> True
print(math.pi == sympy.pi)
> False
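
A small addendum: sympy.pi is a symbolic constant rather than a float, but evaluating it numerically gives back the same value:

import math
import sympy

print(sympy.pi)                    # pi  (a symbolic object)
print(float(sympy.pi))             # 3.141592653589793
print(float(sympy.pi) == math.pi)  # True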

Peak-finding algorithm for Python/SciPy

Question: Peak-finding algorithm for Python/SciPy

I can write something myself by finding zero-crossings of the first derivative or something, but it seems like a common-enough function to be included in standard libraries. Anyone know of one?

My particular application is a 2D array, but usually it would be used for finding peaks in FFTs, etc.

Specifically, in these kinds of problems, there are multiple strong peaks, and then lots of smaller “peaks” that are just caused by noise that should be ignored. These are just examples; not my actual data:

1-dimensional peaks:

2-dimensional peaks:

The peak-finding algorithm would find the location of these peaks (not just their values), and ideally would find the true inter-sample peak, not just the index with maximum value, probably using quadratic interpolation or something.

Typically you only care about a few strong peaks, so they’d either be chosen because they’re above a certain threshold, or because they’re the first n peaks of an ordered list, ranked by amplitude.

As I said, I know how to write something like this myself. I’m just asking if there’s a pre-existing function or package that’s known to work well.

Update:

I translated a MATLAB script and it works decently for the 1-D case, but could be better.

Updated update:

sixtenbe created a better version for the 1-D case.


Answer 0

The function scipy.signal.find_peaks, as its name suggests, is useful for this. But to get a good peak extraction, it's important to understand its parameters width, threshold, distance, and above all prominence.

According to my tests and the documentation, the concept of prominence is "the useful concept" for keeping the good peaks and discarding the noisy ones.

What is (topographic) prominence? It is "the minimum height necessary to descend to get from the summit to any higher terrain", as can be seen here:

The idea is:

The higher the prominence, the more “important” the peak is.

Test:

I used a (noisy) frequency-varying sinusoid on purpose because it shows many difficulties. We can see that the width parameter is not very useful here because if you set a minimum width too high, then it won’t be able to track very close peaks in the high frequency part. If you set width too low, you would have many unwanted peaks in the left part of the signal. Same problem with distance. threshold only compares with the direct neighbours, which is not useful here. prominence is the one that gives the best solution. Note that you can combine many of these parameters!

Code:

import numpy as np
import matplotlib.pyplot as plt 
from scipy.signal import find_peaks

x = np.sin(2*np.pi*(2**np.linspace(2,10,1000))*np.arange(1000)/48000) + np.random.normal(0, 1, 1000) * 0.15
peaks, _ = find_peaks(x, distance=20)
peaks2, _ = find_peaks(x, prominence=1)      # BEST!
peaks3, _ = find_peaks(x, width=20)
peaks4, _ = find_peaks(x, threshold=0.4)     # Required vertical distance to its direct neighbouring samples, pretty useless
plt.subplot(2, 2, 1)
plt.plot(peaks, x[peaks], "xr"); plt.plot(x); plt.legend(['distance'])
plt.subplot(2, 2, 2)
plt.plot(peaks2, x[peaks2], "ob"); plt.plot(x); plt.legend(['prominence'])
plt.subplot(2, 2, 3)
plt.plot(peaks3, x[peaks3], "vg"); plt.plot(x); plt.legend(['width'])
plt.subplot(2, 2, 4)
plt.plot(peaks4, x[peaks4], "xk"); plt.plot(x); plt.legend(['threshold'])
plt.show()

Answer 1

I’m looking at a similar problem, and I’ve found that some of the best references come from chemistry (peak finding in mass-spec data). For a good, thorough review of peak-finding algorithms, read this. It is one of the clearest reviews of peak-finding techniques that I’ve run across. (Wavelets are the best for finding peaks of this sort in noisy data.)

It looks like your peaks are clearly defined and aren’t hidden in the noise. That being the case, I’d recommend using smoothed Savitzky-Golay derivatives to find the peaks (if you just differentiate the data above, you’ll have a mess of false positives). This is a very effective technique and is pretty easy to implement (you do need a matrix class with basic operations). If you simply find the zero crossings of the first S-G derivative, I think you’ll be happy.
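
As a minimal sketch of that zero-crossing idea, using scipy.signal.savgol_filter for the smoothed derivative (the test signal below is made up for illustration):

import numpy as np
from scipy.signal import savgol_filter

# Synthetic test signal: two Gaussian peaks plus a little noise
x = np.linspace(0, 10, 500)
y = np.exp(-(x - 3)**2) + 0.6*np.exp(-(x - 7)**2) + np.random.normal(0, 0.02, x.size)

# Smoothed first derivative via a Savitzky-Golay filter
dy = savgol_filter(y, window_length=31, polyorder=3, deriv=1)

# Peaks are where the smoothed derivative crosses zero from + to -
crossings = np.nonzero((dy[:-1] > 0) & (dy[1:] <= 0))[0]
crossings = crossings[y[crossings] > 0.2]  # discard noise-level "peaks"
print(x[crossings])                        # close to 3 and 7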


Answer 2

There is a function in scipy named scipy.signal.find_peaks_cwt which sounds like it is suitable for your needs; however, I don’t have experience with it, so I cannot recommend it.

http://docs.scipy.org/doc/scipy/reference/generated/scipy.signal.find_peaks_cwt.html
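
For completeness, a minimal usage sketch (the widths argument is the range of plausible peak widths in samples; the sine signal here is just a stand-in):

import numpy as np
from scipy import signal

x = np.linspace(0, 6*np.pi, 600)
y = np.sin(x)  # peaks roughly every 2*pi

peak_indices = signal.find_peaks_cwt(y, widths=np.arange(20, 80))
print(peak_indices)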


Answer 3

For those not sure which peak-finding algorithm to use in Python, here is a rapid overview of the alternatives: https://github.com/MonsieurV/py-findpeaks

Wanting an equivalent of the MatLab findpeaks function myself, I’ve found that the detect_peaks function from Marcos Duarte is a good catch.

Pretty easy to use:

import numpy as np
from vector import vector, plot_peaks
from libs import detect_peaks
print('Detect peaks with minimum height and distance filters.')
indexes = detect_peaks.detect_peaks(vector, mph=7, mpd=2)
print('Peaks are: %s' % (indexes))

Which will give you:


Answer 4

Detecting peaks in a spectrum in a reliable way has been studied quite a bit, for example in all the work on sinusoidal modelling for music/audio signals in the ’80s. Look for “Sinusoidal Modeling” in the literature.

If your signals are as clean as the example, a simple “give me something with an amplitude higher than N neighbours” should work reasonably well. If you have noisy signals, a simple but effective way is to look at your peaks in time, to track them: you then detect spectral lines instead of spectral peaks. IOW, you compute the FFT on a sliding window of your signal, to get a set of spectra in time (also called a spectrogram). You then look at the evolution of the spectral peaks in time (i.e. in consecutive windows).


Answer 5

I do not think that what you are looking for is provided by SciPy. In this situation, I would write the code myself.

The spline interpolation and smoothing from scipy.interpolate are quite nice and might be quite helpful in fitting peaks and then finding the location of their maximum.
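
A rough sketch of that idea for a 1-D signal; the order-4 spline is deliberate, since its derivative is then an order-3 spline, the only order for which UnivariateSpline.roots() is implemented (the signal and smoothing factor are made up):

import numpy as np
from scipy.interpolate import UnivariateSpline

x = np.linspace(0, 10, 200)
y = np.exp(-(x - 4.3)**2) + np.random.normal(0, 0.01, x.size)

spline = UnivariateSpline(x, y, k=4, s=0.05)    # smoothing spline of order 4
critical = spline.derivative().roots()          # candidate extrema between samples
peak_x = critical[np.argmax(spline(critical))]  # keep the highest one
print(peak_x)                                   # close to 4.3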


Answer 6

There are standard statistical functions and methods for finding outliers in data, which is probably what you need in the first case. Using derivatives would solve your second. I’m not sure of a method that handles both continuous functions and sampled data, however.


Answer 7

First things first, the definition of “peak” is vague without further specification. For example, for the following series, would you call 5-4-5 one peak or two?

1-2-1-2-1-1-5-4-5-1-1-5-1

In this case, you’ll need at least two thresholds: 1) a high threshold, above which an extreme value can register as a peak; and 2) a low threshold, so that extreme values separated by small values below it become two peaks.

Peak detection is a well-studied topic in Extreme Value Theory literature, also known as “declustering of extreme values”. Its typical applications include identifying hazard events based on continuous readings of environmental variables e.g. analysing wind speed to detect storm events.
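
To make that concrete, here is an illustrative sketch (a hypothetical helper, not a library function) that declusters a series with a high and a low threshold:

def decluster_peaks(series, high, low):
    # A peak is the maximum of a cluster of values >= `high`;
    # a cluster ends once the series drops below `low`.
    peaks = []
    cluster_max = None
    for i, v in enumerate(series):
        if v >= high and (cluster_max is None or v > series[cluster_max]):
            cluster_max = i
        elif v < low and cluster_max is not None:
            peaks.append(cluster_max)  # close the current cluster
            cluster_max = None
    if cluster_max is not None:
        peaks.append(cluster_max)      # close a cluster still open at the end
    return peaks

x = [1, 2, 1, 2, 1, 1, 5, 4, 5, 1, 1, 5, 1]
print(decluster_peaks(x, high=4.5, low=3))    # [6, 11]: 5-4-5 is one peak
print(decluster_peaks(x, high=4.5, low=4.5))  # [6, 8, 11]: 5-4-5 is two peaks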


How to normalize a NumPy array within a certain range?

Question: How to normalize a NumPy array within a certain range?

After doing some processing on an audio or image array, it needs to be normalized within a range before it can be written back to a file. This can be done like so:

# Normalize audio channels to between -1.0 and +1.0
audio[:,0] = audio[:,0]/abs(audio[:,0]).max()
audio[:,1] = audio[:,1]/abs(audio[:,1]).max()

# Normalize image to between 0 and 255
image = image/(image.max()/255.0)

Is there a less verbose, convenience function way to do this? matplotlib.colors.Normalize() doesn’t seem to be related.


Answer 0

audio /= np.max(np.abs(audio),axis=0)
image *= (255.0/image.max())

Using /= and *= allows you to eliminate an intermediate temporary array, thus saving some memory. Multiplication is less expensive than division, so

image *= 255.0/image.max()    # Uses 1 division and image.size multiplications

is marginally faster than

image /= image.max()/255.0    # Uses 1+image.size divisions

Since we are using basic numpy methods here, I think this is about as efficient a solution in numpy as can be.


In-place operations do not change the dtype of the container array. Since the desired normalized values are floats, the audio and image arrays need to have a floating-point dtype before the in-place operations are performed. If they are not already of floating-point dtype, you’ll need to convert them using astype. For example,

image = image.astype('float64')
audio /= np.max(np.abs(audio),axis=0)
image *= (255.0/image.max())

Answer 1

If the array contains both positive and negative data, I’d go with:

import numpy as np

a = np.random.rand(3,2)

# Normalised [0,1]
b = (a - np.min(a))/np.ptp(a)

# Normalised [0,255] as integer: don't forget the parenthesis before astype(int)
c = (255*(a - np.min(a))/np.ptp(a)).astype(int)        

# Normalised [-1,1]
d = 2.*(a - np.min(a))/np.ptp(a)-1

If the array contains nan, one solution could be to just remove them as:

def nan_ptp(a):
    return np.ptp(a[np.isfinite(a)])

b = (a - np.nanmin(a))/nan_ptp(a)

However, depending on the context, you might want to treat nan differently: e.g. interpolate the value, replace it with e.g. 0, or raise an error.

Finally, even though it’s not the OP’s question, standardization is worth mentioning:

e = (a - np.mean(a)) / np.std(a)

Answer 2

You can also rescale using sklearn. The advantages are that you can normalize by the standard deviation in addition to mean-centering the data, and that you can do this on either axis, by features, or by records.

from sklearn.preprocessing import scale
X = scale( X, axis=0, with_mean=True, with_std=True, copy=True )

The keyword arguments axis, with_mean, with_std are self-explanatory, and are shown in their default state. The argument copy performs the operation in-place if it is set to False. Documentation here.


Answer 3

You can use the “i” (as in idiv, imul..) version, and it doesn’t look half bad:

image /= (image.max()/255.0)

For the other case you can write a function to normalize an n-dimensional array by columns:

def normalize_columns(arr):
    rows, cols = arr.shape
    for col in xrange(cols):
        arr[:,col] /= abs(arr[:,col]).max()

Answer 4

You are trying to min-max scale the values of audio between -1 and +1 and image between 0 and 255.

Using sklearn.preprocessing.minmax_scale should easily solve your problem.

e.g.:

from sklearn.preprocessing import minmax_scale

audio_scaled = minmax_scale(audio, feature_range=(-1,1))

and

shape = image.shape
image_scaled = minmax_scale(image.ravel(), feature_range=(0,255)).reshape(shape)

note: Not to be confused with the operation that scales the norm (length) of a vector to a certain value (usually 1), which is also commonly referred to as normalization.
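
A quick illustration of the distinction (scaling a vector to unit norm, as opposed to the min-max range scaling above):

import numpy as np

v = np.array([3.0, 4.0])
unit = v / np.linalg.norm(v)  # scales the vector's length to 1
print(unit)                   # [0.6 0.8]
print(np.linalg.norm(unit))   # 1.0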


Answer 5

A simple solution is using the scalers offered by the sklearn.preprocessing library.

from sklearn import preprocessing as sk

scaler = sk.MinMaxScaler(feature_range=(0, 250))
scaler = scaler.fit(X)
X_scaled = scaler.transform(X)
# Checking reconstruction
X_rec = scaler.inverse_transform(X_scaled)

The error X_rec-X will be zero. You can adjust the feature_range for your needs, or even use a standard scaler, sk.StandardScaler().


Answer 6

I tried following this, and got the error

TypeError: ufunc 'true_divide' output (typecode 'd') could not be coerced to provided output parameter (typecode 'l') according to the casting rule ''same_kind''

The numpy array I was trying to normalize was an integer array. It seems they deprecated type casting in versions > 1.10, and you have to use numpy.true_divide() to resolve that.

arr = np.array(img)
arr = np.true_divide(arr,[255.0],out=None)

img was a PIL.Image object.


Confusion between numpy, scipy, matplotlib and pylab

Question: Confusion between numpy, scipy, matplotlib and pylab

Numpy, scipy, matplotlib, and pylab are common terms among those who use Python for scientific computation.

I just learned a bit about pylab, and I got confused. Whenever I want to import numpy, I can always do:

import numpy as np

I assumed that once I do

from pylab import *

numpy will be imported as well (under the np alias). So basically the second one does more things compared to the first one.

There are a few things I want to ask:

  1. Is it right that pylab is just a wrapper for numpy, scipy and matplotlib?
  2. As np is the numpy alias in pylab, what are the scipy and matplotlib aliases in pylab? (As far as I know, plt is an alias of matplotlib.pyplot, but I don’t know the alias for matplotlib itself.)

Answer 0

  1. No, pylab is part of matplotlib (in matplotlib.pylab) and tries to give you a MATLAB-like environment. matplotlib has a number of dependencies, among them numpy, which it imports under the common alias np. scipy is not a dependency of matplotlib.

  2. If you run ipython --pylab, an automatic import will put all symbols from matplotlib.pylab into the global scope. Like you wrote, numpy gets imported under the np alias. Symbols from matplotlib are available under the mpl alias.


Answer 1

Scipy and numpy are scientific projects whose aim is to bring efficient and fast numeric computing to python.

Matplotlib is the name of the python plotting library.

Pyplot is an interactive API for matplotlib, mostly for use in notebooks like Jupyter. You generally use it like this: import matplotlib.pyplot as plt.

Pylab is the same thing as pyplot, but with extra features (its use is currently discouraged).

  • pylab = pyplot + numpy

See more information here: Matplotlib, Pylab, Pyplot, etc: What’s the difference between these and when to use each?


Answer 2

Since some people (like me) may still be confused about the usage of pylab, given that examples using pylab are still out there on the internet, here is a quote from the official matplotlib FAQ:

pylab is a convenience module that bulk imports matplotlib.pyplot (for plotting) and numpy (for mathematics and working with arrays) in a single name space. Although many examples use pylab, it is no longer recommended.

So, the TL;DR is: do not use pylab, period. Use pyplot and import numpy separately as needed.

Here is the link for further reading and other useful examples.
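
In practice, the explicit imports that replace from pylab import * in a typical plotting script look like this (a minimal example, not from the FAQ itself):

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2*np.pi, 200)
plt.plot(x, np.sin(x))
plt.show()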


Multiple linear regression in Python

Question: Multiple linear regression in Python

I can’t seem to find any python libraries that do multiple regression. The only things I find only do simple regression. I need to regress my dependent variable (y) against several independent variables (x1, x2, x3, etc.).

For example, with this data:

print 'y        x1      x2       x3       x4      x5     x6       x7'
for t in texts:
    print "{:>7.1f}{:>10.2f}{:>9.2f}{:>9.2f}{:>10.2f}{:>7.2f}{:>7.2f}{:>9.2f}" \
   .format(t.y,t.x1,t.x2,t.x3,t.x4,t.x5,t.x6,t.x7)

(output for above:)

      y        x1       x2       x3        x4     x5     x6       x7
   -6.0     -4.95    -5.87    -0.76     14.73   4.02   0.20     0.45
   -5.0     -4.55    -4.52    -0.71     13.74   4.47   0.16     0.50
  -10.0    -10.96   -11.64    -0.98     15.49   4.18   0.19     0.53
   -5.0     -1.08    -3.36     0.75     24.72   4.96   0.16     0.60
   -8.0     -6.52    -7.45    -0.86     16.59   4.29   0.10     0.48
   -3.0     -0.81    -2.36    -0.50     22.44   4.81   0.15     0.53
   -6.0     -7.01    -7.33    -0.33     13.93   4.32   0.21     0.50
   -8.0     -4.46    -7.65    -0.94     11.40   4.43   0.16     0.49
   -8.0    -11.54   -10.03    -1.03     18.18   4.28   0.21     0.55

How would I regress these in python, to get the linear regression formula:

Y = a1x1 + a2x2 + a3x3 + a4x4 + a5x5 + a6x6 + a7x7 + c


Answer 0

sklearn.linear_model.LinearRegression will do it:

from sklearn import linear_model
clf = linear_model.LinearRegression()
clf.fit([[getattr(t, 'x%d' % i) for i in range(1, 8)] for t in texts],
        [t.y for t in texts])

Then clf.coef_ will have the regression coefficients.

sklearn.linear_model also has similar interfaces for doing various kinds of regularization on the regression.
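
To make that concrete, here is a self-contained sketch with made-up data (random arrays stand in for the texts objects above), showing where the coefficients a1..a7 and the constant c end up:

import numpy as np
from sklearn import linear_model

X = np.random.rand(50, 7)                        # 50 samples of x1..x7
y = X @ np.array([1.0, 2, 3, 4, 5, 6, 7]) + 4.0  # known coefficients, intercept 4
clf = linear_model.LinearRegression()
clf.fit(X, y)
print(clf.coef_)       # recovers [1. 2. 3. 4. 5. 6. 7.] (a1..a7)
print(clf.intercept_)  # recovers 4.0 (the constant c)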


Answer 1

Here is a little workaround that I created. I checked it against R and it works correctly.

import numpy as np
import statsmodels.api as sm

y = [1,2,3,4,3,4,5,4,5,5,4,5,4,5,4,5,6,5,4,5,4,3,4]

x = [
     [4,2,3,4,5,4,5,6,7,4,8,9,8,8,6,6,5,5,5,5,5,5,5],
     [4,1,2,3,4,5,6,7,5,8,7,8,7,8,7,8,7,7,7,7,7,6,5],
     [4,1,2,5,6,7,8,9,7,8,7,8,7,7,7,7,7,7,6,6,4,4,4]
     ]

def reg_m(y, x):
    ones = np.ones(len(x[0]))
    X = sm.add_constant(np.column_stack((x[0], ones)))
    for ele in x[1:]:
        X = sm.add_constant(np.column_stack((ele, X)))
    results = sm.OLS(y, X).fit()
    return results

Result:

print reg_m(y, x).summary()

Output:

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       0.535
Model:                            OLS   Adj. R-squared:                  0.461
Method:                 Least Squares   F-statistic:                     7.281
Date:                Tue, 19 Feb 2013   Prob (F-statistic):            0.00191
Time:                        21:51:28   Log-Likelihood:                -26.025
No. Observations:                  23   AIC:                             60.05
Df Residuals:                      19   BIC:                             64.59
Df Model:                           3                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             0.2424      0.139      1.739      0.098        -0.049     0.534
x2             0.2360      0.149      1.587      0.129        -0.075     0.547
x3            -0.0618      0.145     -0.427      0.674        -0.365     0.241
const          1.5704      0.633      2.481      0.023         0.245     2.895

==============================================================================
Omnibus:                        6.904   Durbin-Watson:                   1.905
Prob(Omnibus):                  0.032   Jarque-Bera (JB):                4.708
Skew:                          -0.849   Prob(JB):                       0.0950
Kurtosis:                       4.426   Cond. No.                         38.6

pandas provides a convenient way to run OLS as given in this answer:

Run an OLS regression with Pandas Data Frame


Answer 2

Just to clarify, the example you gave is multiple linear regression, not multivariate linear regression. The difference:

The very simplest case of a single scalar predictor variable x and a single scalar response variable y is known as simple linear regression. The extension to multiple and/or vector-valued predictor variables (denoted with a capital X) is known as multiple linear regression, also known as multivariable linear regression. Nearly all real-world regression models involve multiple predictors, and basic descriptions of linear regression are often phrased in terms of the multiple regression model. Note, however, that in these cases the response variable y is still a scalar. Another term multivariate linear regression refers to cases where y is a vector, i.e., the same as general linear regression. The difference between multivariate linear regression and multivariable linear regression should be emphasized as it causes much confusion and misunderstanding in the literature.

In short:

  • multiple linear regression: the response y is a scalar.
  • multivariate linear regression: the response y is a vector.

(Another source.)


Answer 3

You can use numpy.linalg.lstsq:

import numpy as np

y = np.array([-6, -5, -10, -5, -8, -3, -6, -8, -8])
X = np.array(
    [
        [-4.95, -4.55, -10.96, -1.08, -6.52, -0.81, -7.01, -4.46, -11.54],
        [-5.87, -4.52, -11.64, -3.36, -7.45, -2.36, -7.33, -7.65, -10.03],
        [-0.76, -0.71, -0.98, 0.75, -0.86, -0.50, -0.33, -0.94, -1.03],
        [14.73, 13.74, 15.49, 24.72, 16.59, 22.44, 13.93, 11.40, 18.18],
        [4.02, 4.47, 4.18, 4.96, 4.29, 4.81, 4.32, 4.43, 4.28],
        [0.20, 0.16, 0.19, 0.16, 0.10, 0.15, 0.21, 0.16, 0.21],
        [0.45, 0.50, 0.53, 0.60, 0.48, 0.53, 0.50, 0.49, 0.55],
    ]
)
X = X.T  # transpose so input vectors are along the rows
X = np.c_[X, np.ones(X.shape[0])]  # add bias term
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
print(beta_hat)

Result:

[ -0.49104607   0.83271938   0.0860167    0.1326091    6.85681762  22.98163883 -41.08437805 -19.08085066]

You can see the estimated output with:

print(np.dot(X,beta_hat))

Result:

[ -5.97751163,  -5.06465759, -10.16873217,  -4.96959788,  -7.96356915,  -3.06176313,  -6.01818435,  -7.90878145,  -7.86720264]

Answer 4

Use scipy.optimize.curve_fit. And not only for linear fits.

from scipy.optimize import curve_fit
import scipy

def fn(x, a, b, c):
    return a + b*x[0] + c*x[1]

# y(x0,x1) data:
#    x0=0 1 2
# ___________
# x1=0 |0 1 2
# x1=1 |1 2 3
# x1=2 |2 3 4

x = scipy.array([[0,1,2,0,1,2,0,1,2,],[0,0,0,1,1,1,2,2,2]])
y = scipy.array([0,1,2,1,2,3,2,3,4])
popt, pcov = curve_fit(fn, x, y)
print popt

Answer 5

Once you convert your data to a pandas dataframe (df),

import statsmodels.formula.api as smf
lm = smf.ols(formula='y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7', data=df).fit()
print(lm.params)

The intercept term is included by default.
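
If you don’t want the intercept, patsy’s formula syntax can drop it with a -1 term, e.g. (continuing the snippet above):

lm_no_intercept = smf.ols(formula='y ~ x1 + x2 + x3 + x4 + x5 + x6 + x7 - 1', data=df).fit()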

See this notebook for more examples.


Answer 6

I think this may be the easiest way to finish this work:

from random import random
from pandas import DataFrame
from statsmodels.api import OLS
lr = lambda : [random() for i in range(100)]
x = DataFrame({'x1': lr(), 'x2':lr(), 'x3':lr()})
x['b'] = 1
y = x.x1 + x.x2 * 2 + x.x3 * 3 + 4

print x.head()

         x1        x2        x3  b
0  0.433681  0.946723  0.103422  1
1  0.400423  0.527179  0.131674  1
2  0.992441  0.900678  0.360140  1
3  0.413757  0.099319  0.825181  1
4  0.796491  0.862593  0.193554  1

print y.head()

0    6.637392
1    5.849802
2    7.874218
3    7.087938
4    7.102337
dtype: float64

model = OLS(y, x)
result = model.fit()
print result.summary()

                            OLS Regression Results                            
==============================================================================
Dep. Variable:                      y   R-squared:                       1.000
Model:                            OLS   Adj. R-squared:                  1.000
Method:                 Least Squares   F-statistic:                 5.859e+30
Date:                Wed, 09 Dec 2015   Prob (F-statistic):               0.00
Time:                        15:17:32   Log-Likelihood:                 3224.9
No. Observations:                 100   AIC:                            -6442.
Df Residuals:                      96   BIC:                            -6431.
Df Model:                           3                                         
Covariance Type:            nonrobust                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
x1             1.0000   8.98e-16   1.11e+15      0.000         1.000     1.000
x2             2.0000   8.28e-16   2.41e+15      0.000         2.000     2.000
x3             3.0000   8.34e-16    3.6e+15      0.000         3.000     3.000
b              4.0000   8.51e-16    4.7e+15      0.000         4.000     4.000
==============================================================================
Omnibus:                        7.675   Durbin-Watson:                   1.614
Prob(Omnibus):                  0.022   Jarque-Bera (JB):                3.118
Skew:                           0.045   Prob(JB):                        0.210
Kurtosis:                       2.140   Cond. No.                         6.89
==============================================================================

Answer 7

Multiple Linear Regression can be handled using the sklearn library as referenced above. I’m using the Anaconda install of Python 3.6.

Create your model as follows:

from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X, y)

# display coefficients
print(regressor.coef_)

Answer 8

You can use numpy.linalg.lstsq.


Answer 9

You can use the function below and pass it a DataFrame:

def linear(x, y=None, show=True):
    """
    @param x: pd.DataFrame
    @param y: pd.DataFrame or pd.Series or None
              if None, then use last column of x as y
    @param show: if show regression summary
    """
    import pandas as pd            # needed for pd.concat below
    import statsmodels.api as sm

    xy = sm.add_constant(x if y is None else pd.concat([x, y], axis=1))
    res = sm.OLS(xy.ix[:, -1], xy.ix[:, :-1], missing='drop').fit()

    if show: print res.summary()
    return res

Answer 10

Scikit-learn is a machine learning library for Python which can do this job for you. Just import the sklearn.linear_model module into your script.

Find the code template for Multiple Linear Regression using sklearn in Python:

import numpy as np
import matplotlib.pyplot as plt #to plot visualizations
import pandas as pd

# Importing the dataset
df = pd.read_csv(<Your-dataset-path>)
# Assigning feature and target variables
X = df.iloc[:,:-1]
y = df.iloc[:,-1]

# Use label encoders, if you have any categorical variable
from sklearn.preprocessing import LabelEncoder
labelencoder = LabelEncoder()
X['<column-name>'] = labelencoder.fit_transform(X['<column-name>'])

from sklearn.preprocessing import OneHotEncoder
onehotencoder = OneHotEncoder(categorical_features = ['<index-value>'])
X = onehotencoder.fit_transform(X).toarray()

# Avoiding the dummy variable trap
X = X[:,1:] # Usually done by the algorithm itself

#Spliting the data into test and train set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X,y, random_state = 0, test_size = 0.2)

# Fitting the model
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)

# Predicting the test set results
y_pred = regressor.predict(X_test)

That’s it. You can use this code as a template for implementing Multiple Linear Regression on any dataset. For a better understanding with an example, visit: Linear Regression with an example.


Answer 11

Here is an alternative and basic method:

from patsy import dmatrices
import statsmodels.api as sm

y,x = dmatrices("y_data ~ x_1 + x_2 ", data = my_data)
### y_data is the name of the dependent variable in your data ### 
model_fit = sm.OLS(y,x)
results = model_fit.fit()
print(results.summary())

Instead of sm.OLS you can also use sm.Logit or sm.Probit, etc.


scipy.misc module has no attribute imread?

Question: scipy.misc module has no attribute imread?

I am trying to read an image with scipy. However, it does not accept the scipy.misc.imread part. What could be the cause of this?

>>> import scipy
>>> scipy.misc
<module 'scipy.misc' from 'C:\Python27\lib\site-packages\scipy\misc\__init__.pyc'>
>>> scipy.misc.imread('test.tif')
Traceback (most recent call last):
  File "<pyshell#11>", line 1, in <module>
    scipy.misc.imread('test.tif')
AttributeError: 'module' object has no attribute 'imread'

Answer 0

You need to install Pillow (formerly PIL). From the docs on scipy.misc:

Note that Pillow is not a dependency of SciPy but the image manipulation functions indicated in the list below are not available without it:

imread

After installing Pillow, I was able to access imread as follows:

In [1]: import scipy.misc

In [2]: scipy.misc.imread
Out[2]: <function scipy.misc.pilutil.imread>

Answer 1

imread is deprecated in SciPy 1.0.0, and will be removed in 1.2.0. Use imageio.imread instead.

import imageio
im = imageio.imread('astronaut.png')
im.shape  # im is a numpy array
(512, 512, 3)
imageio.imwrite('imageio:astronaut-gray.jpg', im[:, :, 0])

Answer 2

imread is deprecated and was removed in version 1.2.0! So to solve this issue I had to install version 1.1.0.

pip install scipy==1.1.0

Answer 3

For Python 3, it is best to use imread in matplotlib.pyplot:

from matplotlib.pyplot import imread

Answer 4

In case anyone is encountering the same issue, please uninstall scipy and install scipy==1.1.0:

$ pip uninstall scipy

$ pip install scipy==1.1.0

Answer 5

You need the Python Imaging Library (PIL), but alas! The PIL project seems to have been abandoned. In particular, it hasn’t been ported to Python 3. So if you want PIL functionality in Python 3, you’ll do well to use Pillow, which is the semi-official fork of PIL and appears to be actively developed. Actually, if you need a modern PIL implementation at all, I’d recommend Pillow. It’s as simple as pip install pillow. As it uses the same namespace as PIL, it’s essentially a drop-in replacement.

How “semi-official” is this fork? you may ask. The About page of the Pillow docs say this:

As more time passes since the last PIL release, the likelihood of a new PIL release decreases. However, we’ve yet to hear an official “PIL is dead” announcement. So if you still want to support PIL, please report issues here first, then open corresponding Pillow tickets here.

Please provide a link to the first ticket so we can track the issue(s) upstream.

However, the most recent PIL release on the official PIL site is dated November 15, 2009. I think we can safely proclaim Pillow the successor of PIL after (as of this writing) nearly eight years with no new releases. So even if you don't need Python 3 support, I suggest you eschew the ancient PIL 1.1.6 distribution available on PyPI and just install the fresh, up-to-date, compatible Pillow.


Answer 6

Install the Pillow library with the following command:

pip install pillow

Note that the selected answer above is outdated; see the SciPy docs:

Note that Pillow (https://python-pillow.org/) is not a dependency of SciPy, but the image manipulation functions indicated in the list below are not available without it.


Answer 7

As already answered, misc.imread is deprecated in SciPy 1.0.0 and will be removed in 1.2.0. imageio is one option; it returns an object of type:

<class 'imageio.core.util.Image'>

But instead of imageio, you can use cv2:

import cv2
im = cv2.imread('astronaut.png')

Here im will be of type <class 'numpy.ndarray'>, and plain numpy arrays are faster to compute with.
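
One caveat with this approach: OpenCV loads images in BGR channel order rather than RGB, so a conversion is usually needed before displaying the array with matplotlib. A short sketch:

import cv2

im = cv2.imread('astronaut.png')              # numpy ndarray in BGR channel order
im_rgb = cv2.cvtColor(im, cv2.COLOR_BGR2RGB)  # reorder channels for display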


Answer 8

imread uses the PIL library; if the library is installed, use: "from scipy.ndimage import imread"

Source: http://docs.scipy.org/doc/scipy-0.17.0/reference/generated/scipy.ndimage.imread.html
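
A short sketch of that route; note that scipy.ndimage.imread was deprecated in SciPy 1.0.0 as well, so this assumes an older SciPy with Pillow installed:

from scipy.ndimage import imread  # uses PIL/Pillow under the hood

img = imread('test.tif')
print(img.shape)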


Answer 9

python -m pip install pillow

This worked for me.


Answer 10

You need a Python imaging library (PIL), but PIL alone is no longer enough; you'd better install Pillow. This works well.


Answer 11

Running the following in a Jupyter Notebook, I got a similar error message:

from skimage import data
photo_data = misc.imread('C:/Users/ers.jpg')
type(photo_data)

The 'error' message:

D:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\ipykernel_launcher.py:3: DeprecationWarning: imread is deprecated! imread is deprecated in SciPy 1.0.0, and will be removed in 1.2.0. Use imageio.imread instead. This is separate from the ipykernel package so we can avoid doing imports until

Using the following, I got it solved:

import matplotlib.pyplot
photo_data = matplotlib.pyplot.imread('C:/Users/ers.jpg')
type(photo_data)

Answer 12

I have all the packages required for the image extraction in a Jupyter notebook, but even then it shows me the same error.

[Screenshot: error on Jupyter Notebook]

Reading the above comments, I have installed the required packages. Please do tell if I have missed some packages.

pip3 freeze | grep -i -E "pillow|scipy|scikit-image"
Pillow==5.4.1
scikit-image==0.14.2

scipy==1.2.1

Answer 13

The solution that worked for me in Python 3.6 is the following:

py -m pip install Pillow


How do I create a density plot in matplotlib?

Question: How do I create a density plot in matplotlib?

In R I can create the desired output by doing:

data = c(rep(1.5, 7), rep(2.5, 2), rep(3.5, 8),
         rep(4.5, 3), rep(5.5, 1), rep(6.5, 8))
plot(density(data, bw=0.5))

In python (with matplotlib) the closest I got was with a simple histogram:

import matplotlib.pyplot as plt
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
plt.hist(data, bins=6)
plt.show()

I also tried the normed=True parameter but couldn’t get anything other than trying to fit a gaussian to the histogram.

My latest attempts were around scipy.stats and gaussian_kde, following examples on the web, but I’ve been unsuccessful so far.


Answer 0

Sven has shown how to use the class gaussian_kde from SciPy, but you will notice that it doesn't look quite like what you generated with R. This is because gaussian_kde tries to infer the bandwidth automatically. You can play with the bandwidth by changing the function covariance_factor of the gaussian_kde class. First, here is what you get without changing that function:

[Plot: KDE with the automatically inferred bandwidth]

However, if I use the following code:

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
density = gaussian_kde(data)
xs = np.linspace(0,8,200)
density.covariance_factor = lambda : .25
density._compute_covariance()
plt.plot(xs,density(xs))
plt.show()

I get

[Plot: KDE with the reduced bandwidth]

which is pretty close to what you are getting from R. What have I done? gaussian_kde uses a changeable function, covariance_factor, to calculate its bandwidth. Before changing the function, the value returned by covariance_factor for this data was about .5. Lowering it lowered the bandwidth. I had to call _compute_covariance after changing that function so that all of the factors would be calculated correctly. It isn't an exact correspondence with the bw parameter from R, but hopefully it helps you get going in the right direction.
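
As a side note, newer SciPy releases (0.11 and later) accept a bw_method argument in the gaussian_kde constructor, which achieves the same effect without touching private attributes. A sketch:

import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
density = gaussian_kde(data, bw_method=.25)  # same factor as covariance_factor above
xs = np.linspace(0, 8, 200)
plt.plot(xs, density(xs))
plt.show()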


Answer 1

Five years later, when I Google "how to create a kernel density plot using python", this thread still shows up at the top!

Today, a much easier way to do this is to use seaborn, a package that provides many convenient plotting functions and good style management.

import numpy as np
import seaborn as sns
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
sns.set_style('whitegrid')
sns.kdeplot(np.array(data), bw=0.5)
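
Note that the bw argument was deprecated in later seaborn releases; in seaborn 0.11 and later the equivalent call uses bw_method (a sketch, assuming one of those versions):

import numpy as np
import seaborn as sns

data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
sns.set_style('whitegrid')
sns.kdeplot(np.array(data), bw_method=0.5)  # replaces bw= from older versions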


Answer 2

Option 1:

Use pandas dataframe plot (built on top of matplotlib):

import pandas as pd
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
pd.DataFrame(data).plot(kind='density') # or pd.Series()

Option 2:

Use seaborn's distplot:

import seaborn as sns
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
sns.distplot(data, hist=False)


Answer 3

Maybe try something like:

import matplotlib.pyplot as plt
import numpy
from scipy import stats
data = [1.5]*7 + [2.5]*2 + [3.5]*8 + [4.5]*3 + [5.5]*1 + [6.5]*8
density = stats.kde.gaussian_kde(data)
x = numpy.arange(0., 8, .1)
plt.plot(x, density(x))
plt.show()

You can easily replace gaussian_kde() by a different kernel density estimate.


Answer 4

The density plot can also be created using matplotlib: the function plt.hist(data) returns the y and x values necessary for the density plot (see the documentation at https://matplotlib.org/3.1.1/api/_as_gen/matplotlib.pyplot.hist.html). Accordingly, the following code creates a density plot using the matplotlib library:

import matplotlib.pyplot as plt
dat=[-1,2,1,4,-5,3,6,1,2,1,2,5,6,5,6,2,2,2]
a=plt.hist(dat,density=True)
plt.close()
plt.figure()
plt.plot(a[1][1:],a[0])      

This code returns the following density plot:

[Plot: density curve built from the histogram values]
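
A small refinement on the same idea is to plot against the bin midpoints instead of the right-hand bin edges; a sketch using np.histogram, which computes the same values plt.hist returns:

import matplotlib.pyplot as plt
import numpy as np

dat = [-1,2,1,4,-5,3,6,1,2,1,2,5,6,5,6,2,2,2]
counts, edges = np.histogram(dat, density=True)  # y values and bin edges
centers = 0.5 * (edges[:-1] + edges[1:])         # midpoint of each bin
plt.plot(centers, counts)
plt.show()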


Cannot install SciPy through pip

Question: Cannot install SciPy through pip

When installing scipy through pip with:

pip install scipy

Pip fails to build scipy and throws the following error:

Cleaning up...
Command /Users/administrator/dev/KaggleAux/env/bin/python2.7 -c "import setuptools, tokenize;__file__='/Users/administrator/dev/KaggleAux/env/build/scipy/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/zl/7698ng4d4nxd49q1845jd9340000gn/T/pip-eO8gua-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/administrator/dev/KaggleAux/env/bin/../include/site/python2.7 failed with error code 1 in /Users/administrator/dev/KaggleAux/env/build/scipy
Storing debug log for failure in /Users/administrator/.pip/pip.log

How can I get scipy to build successfully? This may be a new issue with OSX Yosemite since I just upgraded and haven’t had issues installing scipy before.


Debug log:

Cleaning up...
  Removing temporary dir /Users/administrator/dev/KaggleAux/env/build...
Command /Users/administrator/dev/KaggleAux/env/bin/python2.7 -c "import setuptools, tokenize;__file__='/Users/administrator/dev/KaggleAux/env/build/scipy/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/zl/7698ng4d4nxd49q1845jd9340000gn/T/pip-eO8gua-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/administrator/dev/KaggleAux/env/bin/../include/site/python2.7 failed with error code 1 in /Users/administrator/dev/KaggleAux/env/build/scipy
Exception information:
Traceback (most recent call last):
  File "/Users/administrator/dev/KaggleAux/env/lib/python2.7/site-packages/pip/basecommand.py", line 122, in main
    status = self.run(options, args)
  File "/Users/administrator/dev/KaggleAux/env/lib/python2.7/site-packages/pip/commands/install.py", line 283, in run
    requirement_set.install(install_options, global_options, root=options.root_path)
  File "/Users/administrator/dev/KaggleAux/env/lib/python2.7/site-packages/pip/req.py", line 1435, in install
    requirement.install(install_options, global_options, *args, **kwargs)
  File "/Users/administrator/dev/KaggleAux/env/lib/python2.7/site-packages/pip/req.py", line 706, in install
    cwd=self.source_dir, filter_stdout=self._filter_install, show_stdout=False)
  File "/Users/administrator/dev/KaggleAux/env/lib/python2.7/site-packages/pip/util.py", line 697, in call_subprocess
    % (command_desc, proc.returncode, cwd))
InstallationError: Command /Users/administrator/dev/KaggleAux/env/bin/python2.7 -c "import setuptools, tokenize;__file__='/Users/administrator/dev/KaggleAux/env/build/scipy/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /var/folders/zl/7698ng4d4nxd49q1845jd9340000gn/T/pip-eO8gua-record/install-record.txt --single-version-externally-managed --compile --install-headers /Users/administrator/dev/KaggleAux/env/bin/../include/site/python2.7 failed with error code 1 in /Users/administrator/dev/KaggleAux/env/build/scipy

Answer 0

After opening an issue with the SciPy team, we found that you need to upgrade pip with:

pip install --upgrade pip

And in Python 3 this works:

python3 -m pip install --upgrade pip

for SciPy to install properly. Why? Because:

Older versions of pip have to be told to use wheels, IIRC with --use-wheel. Or you can upgrade pip itself, then it should pick up the wheels.

Upgrading pip solves the issue, but you might be able to just use the --use-wheel flag as well.
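
Putting the pieces together, the sequence looks roughly like this (a sketch; the --use-wheel flag exists only on older pip versions):

$ pip --version                # check whether pip is recent enough to use wheels
$ pip install --upgrade pip
$ pip install scipy            # on an old pip: pip install --use-wheel scipy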


Answer 1

Microsoft Windows users of 64-bit Python installations will need to download the 64-bit .whl of SciPy from here, then simply cd into the folder where you've downloaded the .whl file and run:

pip install scipy-0.16.1-cp27-none-win_amd64.whl

Answer 2

I faced the same problem when installing SciPy under Ubuntu. I had to use the commands:

$ sudo apt-get install libatlas-base-dev gfortran
$ sudo pip3 install scipy

You can get more details here: Installing SciPy with pip. Sorry, I don't know how to do it under OS X Yosemite.


Answer 3

In Windows 10, most options will not work. Follow these steps:

In Windows 10 with CMD, you cannot download scipy directly using most of the well-known commands like wget, cloning the scipy GitHub repo, pip install scipy, etc.

To install, go to the pythonlibs .whl files; if you are using Python 2.7 32-bit, download numpy-1.11.2rc1+mkl-cp27-cp27m-win32.whl and scipy-0.18.1-cp27-cp27m-win32.whl, or if Python 2.7 64-bit, download numpy-1.11.2rc1+mkl-cp27-cp27m-win_amd64.whl and scipy-0.18.1-cp27-cp27m-win_amd64.whl.

After downloading, save the files under your Python directory; in my case it was c:\>python27.

Then run:

pip install C:\Python27\numpy-1.11.2rc1+mkl-cp27-cp27m-win32.whl 
pip install C:\Python27\scipy-0.18.1-cp27-cp27m-win32.whl

Note:

  1. scipy needs numpy as a dependency, so that's why we download numpy before scipy.
  2. cp27 in the .whl file names means that those files are meant for Python 2.7, and cp33 stands for Python 3.x, specifically >= 3.3.

Answer 4

After finding this answer for some clues, I got this working by doing:

brew install gcc 
pip install scipy

(The first of these steps took 96 minutes on my 2011 MacBook Air, so I hope you're not in a hurry!)


Answer 5

If you are totally new to Python, read step by step, or go directly to the last step. Follow the method below to install SciPy 0.18.1 on Windows 64-bit with Python 64-bit. If the command below does not work, proceed further:

pip install scipy

Be careful with the versions of

  1. Python

  2. Windows

  3. .whl version of numpy and scipy files

  4. First install numpy, then scipy:

    pip install FileName.whl

  5. For NumPy: http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy and for SciPy: http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy

Be aware of the file name (check the version number).

Example: scipy-0.18.1-cp35-cp35m-win_amd64.whl

To check which versions are supported by your pip, see point 2 below.

If you are using a .whl file, the following errors are likely to occur:

  1. You are using pip version 7.1.0, however version 8.1.2 is available.

You should consider upgrading via the 'python -m pip install --upgrade pip' command

  2. scipy-0.15.1-cp33-none-win_amd64.whl.whl is not a supported wheel on this platform

For the above error: start Python and type:

import pip
print(pip.pep425tags.get_supported())

Output:

[('cp35', 'cp35m', 'win_amd64'), ('cp35', 'none', 'win_amd64'), ('py3', 'none', 'win_amd64'), ('cp35', 'none', 'any'), ('cp3', 'none', 'any'), ('py35', 'none', 'any'), ('py3', 'none', 'any'), ('py34', 'none', 'any'), ('py33', 'none', 'any'), ('py32', 'none', 'any'), ('py31', 'none', 'any'), ('py30', 'none', 'any')]

In the output you will observe that cp35 is there, so download the cp35 builds for both numpy and scipy. Further edits are most welcome.


Answer 6

For Windows 10:

C:\directory> pip install scipy-0.19.0rc2-cp35-cp35m-win_amd64.whl


Answer 7

Rather than going the harder route of downloading specific packages, I prefer the faster route of using conda; pip has its issues.

  • Python 3.6.0
  • Windows 10 (64-bit)

Install conda from: https://conda.io/docs/install/quick.html#windows-miniconda-install

Command prompt:

C:\Users\xyz>conda install -c anaconda scipy=0.18.1
Fetching package metadata .............
Solving package specifications:

Package plan for installation in environment C:\Users\xyz\Miniconda3:

The following NEW packages will be INSTALLED:

mkl:       2017.0.1-0         anaconda
numpy:     1.12.0-py36_0      anaconda
scipy:     0.18.1-np112py36_1 anaconda

The following packages will be SUPERCEDED by a higher-priority channel:

conda:     4.3.11-py36_0               --> 4.3.11-py36_0 anaconda
conda-env: 2.6.0-0                     --> 2.6.0-0       anaconda

Proceed ([y]/n)? y

conda-env-2.6. 100% |###############################| Time: 0:00:00  32.92 kB/s
mkl-2017.0.1-0 100% |###############################| Time: 0:00:24   5.45 MB/s
numpy-1.12.0-p 100% |###############################| Time: 0:00:00   5.09 MB/s
scipy-0.18.1-n 100% |###############################| Time: 0:00:02   5.59 MB/s
conda-4.3.11-p 100% |###############################| Time: 0:00:00   4.70 MB/s
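
A related approach is to install SciPy into a dedicated conda environment, which keeps it isolated from the base install (a sketch, assuming conda 4.4+ for the conda activate syntax; the environment name is arbitrary):

conda create -n sci-env python=3.6
conda activate sci-env
conda install scipy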

Answer 8

  1. Download SciPy from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy
  2. Go into the directory the downloaded file is in and pip install the file.
  3. Go to python shell, run import scipy; it worked for me with no errors.

Answer 9

This is an alternative to pip. I also had the same error when installing SciPy with pip.

Then I downloaded and installed Miniconda, and used the command below to install SciPy:

conda install -c conda-forge scipy



Answer 10

The best method I can suggest is this:

  1. Download the wheel file from this location for your version of python

  2. Move the file to your main drive, e.g. C:\>

  3. Run cmd and enter the following:

    • pip install scipy-1.0.0rc1-cp36-none-win_amd64.whl

Please note this is the version I am using for my Python 3.6.2; it should install fine.

You may want to run this command afterwards to make sure all your Python add-ons are up to date:

pip list --outdated

Answer 11

Alternatively, manually download and execute the http://www.lfd.uci.edu/~gohlke/pythonlibs SciPy version suitable for you. Consider your Python version (python --version) and your system architecture (32/64-bit), and choose the SciPy version accordingly:

scipy-0.18.1-cp27-cp27m-win32 – for Python 2.7 32-bit
scipy-0.18.1-cp27-cp27m-win_amd64 – for Python 2.7 64-bit

Otherwise the error "scipy-0.15.1-cp33-none-win_amd64.whl is not a supported wheel on this platform" will pop up on installation.

Now change directory to the downloaded file and execute the command pip install scipy-0.15.1-cp33-none-win_amd64.whl (changing the file name appropriately).

I have added this answer only because Arun's answer (which I found useful) does not mention anything about the 32/64-bit matching issue that I ran into.


Answer 12

If you are using CentOS, you need to install lapack-devel, like so:

 $ yum install lapack-devel
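
On some CentOS setups the BLAS development headers are needed as well before pip can build SciPy from source (a sketch; package names taken from the standard CentOS repositories):

 $ yum install lapack-devel blas-devel
 $ pip install scipy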

Answer 13

Try downloading the SciPy file from the link below:

https://sourceforge.net/projects/scipy/?source=typ_redirect

It will be a .exe file and you just need to run it. But be sure to choose the SciPy version corresponding to your Python version.

When the scipy.exe file is run, it will locate the Python directory and install itself there.


Answer 14

Use the wheel file to install; download it from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy and then install with:

pip install c:\jjjj\ggg\fdadf.whl

Answer 15

I was having the same issue, and I succeeded by using sudo:

$ sudo pip install scipy

Answer 16

The easiest way is the following steps, fixing SciPy for Python [ 2.n < python < 3.n ]:

Download the necessary files from: http://www.lfd.uci.edu/~gohlke/pythonlibs/

Download the version of numpy+mkl (needed to run scipy) and then download scipy for your Python type (2.n written as 2n, or 3.n written as 3n, where n is a variable). Note that you must know whether you have a 32-bit or 64-bit processor.

Create a directory somewhere on your computer, for example [C:\DIRECTORY], in which to install the files numpy+mkl.whl and scipy.whl.

Once both files are downloaded, find their location on your computer and move them to the directory you created.

Example: the first file needed to install scipy is in

C:\DIRECTORY\numpy\numpy-0.0.0+mkl-cp2n-cp2nm-win_amd32.whl

Example: the second file is in

C:\DIRECTORY\scipy\scipy-0.0.0-cp2n-cp2nm-win_amd32.whl

Go to your command prompt and proceed with the following example for a Python version 2.n:

py -2.n -m pip install C:\DIRECTORY\numpy\numpy-0.0.0+mkl-cp2n-cp2nm-win_amd32.whl

It should install.

py -2.n -m pip install C:\DIRECTORY\scipy\scipy-0.0.0-cp2n-cp2nm-win_amd32.whl

It should install.

Test both modules in your Python IDLE as follows:

import numpy

import scipy

The modules are working if no errors are returned.

IFDAAS


Answer 17

For Windows (7 in my case):

  1. Download scipy-0.19.1-cp36-cp36m-win32.whl from http://www.lfd.uci.edu/~gohlke/pythonlibs/#scipy
  2. Create a some.bat file with the content:

    @echo off
    C:\Python36\python.exe -m pip -V
    C:\Python36\python.exe -m pip install scipy-0.19.1-cp36-cp36m-win32.whl
    C:\Python36\python.exe -m pip list
    pause

  3. Then run this batch file, some.bat

  4. Start the Python shell ("C:\Python36\pythonw.exe" "C:\Python36\Lib\idlelib\idle.pyw") and test whether scipy was installed with

import scipy


Answer 18

The easy way to install SciPy on Windows 10 is this: just run pip install scipy==1.0.0rc2

Thank me later :)


Answer 19

I experienced similar issues with Python 3.7 (3.7.0b4). This was due to some changes in encoding assumptions between Python 3.6 and Python 3.7.

As a result, lots of package installations (e.g. via pip) failed.


Answer 20

You can try this:

python -m pip install --user numpy scipy matplotlib ipython jupyter pandas sympy nose

numpy: divide each row by a vector element

Question: numpy: divide each row by a vector element

Suppose I have a numpy array:

data = np.array([[1,1,1],[2,2,2],[3,3,3]])

and I have a corresponding "vector":

vector = np.array([1,2,3])

How do I operate on data along each row to either subtract or divide so the result is:

sub_result = [[0,0,0], [0,0,0], [0,0,0]]
div_result = [[1,1,1], [1,1,1], [1,1,1]]

Long story short: How do I perform an operation on each row of a 2D array with a 1D array of scalars that correspond to each row?


Answer 0

Here you go. You just need to use None (or alternatively np.newaxis) combined with broadcasting:

In [6]: data - vector[:,None]
Out[6]:
array([[0, 0, 0],
       [0, 0, 0],
       [0, 0, 0]])

In [7]: data / vector[:,None]
Out[7]:
array([[1, 1, 1],
       [1, 1, 1],
       [1, 1, 1]])

Answer 1

As has been mentioned, slicing with None or with np.newaxis is a great way to do this. Another alternative is to use transposes and broadcasting, as in

(data.T - vector).T

and

(data.T / vector).T

For higher dimensional arrays you may want to use the swapaxes method of NumPy arrays or the NumPy rollaxis function. There really are a lot of ways to do this.

For a fuller explanation of broadcasting, see http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html
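
Here is a short sketch of the higher-dimensional case mentioned above, scaling along the middle axis of a 3D array with swapaxes (the array and scale values are made up for illustration):

import numpy as np

arr = np.arange(24, dtype=float).reshape(2, 3, 4)
scale = np.array([1., 2., 4.])  # one scalar per slice along axis 1

# move axis 1 to the end so it lines up with scale, divide, then move it back
result = np.swapaxes(np.swapaxes(arr, 1, -1) / scale, 1, -1)

# the same thing written with an inserted axis
assert np.allclose(result, arr / scale[None, :, None])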


Answer 2

JoshAdel's solution uses np.newaxis to add a dimension. An alternative is to use reshape() to align the dimensions in preparation for broadcasting.

data = np.array([[1,1,1],[2,2,2],[3,3,3]])
vector = np.array([1,2,3])

data
# array([[1, 1, 1],
#        [2, 2, 2],
#        [3, 3, 3]])
vector
# array([1, 2, 3])

data.shape
# (3, 3)
vector.shape
# (3,)

data / vector.reshape((3,1))
# array([[1, 1, 1],
#        [1, 1, 1],
#        [1, 1, 1]])

Performing the reshape() allows the dimensions to line up for broadcasting:

data:            3 x 3
vector:              3
vector reshaped: 3 x 1

Note that data/vector runs without error, but it doesn't give you the answer that you want. It divides each column of the array (instead of each row) by each corresponding element of vector. It's what you would get if you explicitly reshaped vector to be 1x3 instead of 3x1.

data / vector
# array([[1, 0, 0],
#        [2, 1, 0],
#        [3, 1, 1]])
data / vector.reshape((1,3))
# array([[1, 0, 0],
#        [2, 1, 0],
#        [3, 1, 1]])

Answer 3

A Pythonic way to do this is:

np.divide(data.T,vector).T

This takes care of the reshaping, and the results are in floating-point format. In the other answers, the results are in rounded integer format.

# NOTE: the length of vector must match the number of rows of data


Answer 4

Adding to stackoverflowuser2010's answer, in the general case you can just use

data = np.array([[1,1,1],[2,2,2],[3,3,3]])

vector = np.array([1,2,3])

data / vector.reshape(-1,1)

This will turn your vector into a column matrix/vector, allowing you to do the elementwise operations as you wish. At least to me, this is the most intuitive way of going about it, and since (in most cases) numpy will just use a view of the same internal memory for the reshaping, it's efficient too.