Relationship between SciPy and NumPy

Question: Relationship between SciPy and NumPy

SciPy appears to provide most (but not all [1]) of NumPy’s functions in its own namespace. In other words, if there’s a function named numpy.foo, there’s almost certainly a scipy.foo. Most of the time, the two appear to be exactly the same, oftentimes even pointing to the same function object.

Sometimes, they’re different. To give an example that came up recently:

  • numpy.log10 is a ufunc that returns NaNs for negative arguments;
  • scipy.log10 returns complex values for negative arguments and doesn’t appear to be a ufunc.

The same can be said about log, log2 and logn, but not about log1p [2].

On the other hand, numpy.exp and scipy.exp appear to be different names for the same ufunc. This is also true of scipy.log1p and numpy.log1p.

Another example is numpy.linalg.solve vs scipy.linalg.solve. They’re similar, but the latter offers some additional features over the former.

Why the apparent duplication? If this is meant to be a wholesale import of numpy into the scipy namespace, why the subtle differences in behaviour and the missing functions? Is there some overarching logic that would help clear up the confusion?

[1] numpy.min, numpy.max, numpy.abs and a few others have no counterparts in the scipy namespace.

[2] Tested using NumPy 1.5.1 and SciPy 0.9.0rc2.


Answer 0

Last time I checked, scipy's __init__.py file executes a

from numpy import *

so that the whole numpy namespace is included into scipy when the scipy module is imported.
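
As a rough illustration of what a star import does to a namespace, the sketch below uses the standard-library math module as a stand-in for numpy and a throwaway module called fake_scipy (a hypothetical name, not part of any library) as a stand-in for scipy. It only shows the general mechanism and makes no claim about how any particular SciPy release is laid out.

```python
import math
import types

# Hypothetical stand-in for the scipy package; math plays the role of numpy.
fake_scipy = types.ModuleType("fake_scipy")

# Run a star import inside the stand-in module's namespace.
exec("from math import *", fake_scipy.__dict__)

# The star import binds the very same objects under new names; nothing is copied.
same_object = fake_scipy.sqrt is math.sqrt
print(same_object)
```

This is why a name reached through the importing package can turn out to be the identical function object as the one in the package it was imported from.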

The log10 behavior you are describing is interesting, because both versions are coming from numpy. One is a ufunc, the other is a numpy.lib function. Why scipy is preferring the library function over the ufunc, I don’t know off the top of my head.


EDIT: In fact, I can answer the log10 question. Looking in scipy's __init__.py file I see this:

# Import numpy symbols to scipy name space
import numpy as _num
from numpy import oldnumeric
from numpy import *
from numpy.random import rand, randn
from numpy.fft import fft, ifft
from numpy.lib.scimath import *

The log10 function you get in scipy comes from numpy.lib.scimath. Looking at that code, it says:

"""
Wrapper functions to more user-friendly calling of certain math functions
whose output data-type is different than the input data-type in certain
domains of the input.

For example, for functions like log() with branch cuts, the versions in this
module provide the mathematically valid answers in the complex plane:

>>> import math
>>> from numpy.lib import scimath
>>> scimath.log(-math.exp(1)) == (1+1j*math.pi)
True

Similarly, sqrt(), other base logarithms, power() and trig functions are
correctly handled.  See their respective docstrings for specific examples.
"""

It seems that module overlays the base numpy ufuncs for sqrt, log, log2, logn, log10, power, arccos, arcsin, and arctanh. That explains the behavior you are seeing. The underlying design reason why it is done like that is probably buried in a mailing list post somewhere.
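
The difference can be checked with NumPy alone. The following is a minimal sketch, assuming a NumPy version that exposes the scimath functions under the public name np.emath. It compares the plain ufunc with the wrapper for one negative input.

```python
import numpy as np

# np.emath exposes the numpy.lib.scimath wrappers under a public name.
is_ufunc_plain = isinstance(np.log10, np.ufunc)
is_ufunc_emath = isinstance(np.emath.log10, np.ufunc)

# The plain ufunc stays in the real domain and yields NaN for a negative
# input (the accompanying warning is silenced here).
with np.errstate(invalid="ignore"):
    plain = np.log10(-10.0)

# The scimath wrapper switches to the complex domain instead.
wrapped = np.emath.log10(-10.0)

print(is_ufunc_plain, is_ufunc_emath)
print(plain, wrapped)
```

The wrapper is an ordinary Python function that inspects its input and converts it to a complex type when needed, which is consistent with it not being a ufunc.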


Answer 1

From the SciPy Reference Guide:

… all of the Numpy functions have been subsumed into the scipy namespace so that all of those functions are available without additionally importing Numpy.

The intention is for users not to have to know the distinction between the scipy and numpy namespaces, though apparently you’ve found an exception.


Answer 2

It seems from the SciPy FAQ that some functions are in NumPy for historical reasons, even though they should really only be in SciPy:

What is the difference between NumPy and SciPy?

In an ideal world, NumPy would contain nothing but the array data type and the most basic operations: indexing, sorting, reshaping, basic elementwise functions, et cetera. All numerical code would reside in SciPy. However, one of NumPy’s important goals is compatibility, so NumPy tries to retain all features supported by either of its predecessors. Thus NumPy contains some linear algebra functions, even though these more properly belong in SciPy. In any case, SciPy contains more fully-featured versions of the linear algebra modules, as well as many other numerical algorithms. If you are doing scientific computing with python, you should probably install both NumPy and SciPy. Most new features belong in SciPy rather than NumPy.

That explains why scipy.linalg.solve offers some additional features over numpy.linalg.solve.
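
As a small sketch of one such additional feature, assuming SciPy is installed and is recent enough to accept the assume_a keyword, the example below solves the same symmetric positive definite system with both functions. The matrix and right-hand side are made up for illustration.

```python
import numpy as np
from scipy import linalg as sla

# A small symmetric positive definite system (illustrative values).
a = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

# NumPy's solver takes only the matrix and the right-hand side.
x_np = np.linalg.solve(a, b)

# SciPy's solver accepts extra hints, here that `a` is positive definite.
x_sp = sla.solve(a, b, assume_a="pos")

print(np.allclose(x_np, x_sp))
```

Both calls return the same solution; the hint only lets SciPy pick a solver suited to the matrix structure.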

I did not see the answer of SethMMorton to the related question


Answer 3

There is a short comment at the end of the introduction to SciPy documentation:

Another useful command is source. When given a function written in Python as an argument, it prints out a listing of the source code for that function. This can be helpful in learning about an algorithm or understanding exactly what a function is doing with its arguments. Also don’t forget about the Python command dir which can be used to look at the namespace of a module or package.

I think this will allow someone with enough knowledge of all the packages involved to pick apart exactly what the differences are between some scipy and numpy functions (it didn’t help me with the log10 question at all). I definitely don’t have that knowledge, but source does indicate that scipy.linalg.solve and numpy.linalg.solve interact with lapack in different ways:

Python 2.4.3 (#1, May  5 2011, 18:44:23) 
[GCC 4.1.2 20080704 (Red Hat 4.1.2-50)] on linux2
>>> import scipy
>>> import scipy.linalg
>>> import numpy
>>> scipy.source(scipy.linalg.solve)
In file: /usr/lib64/python2.4/site-packages/scipy/linalg/basic.py

def solve(a, b, sym_pos=0, lower=0, overwrite_a=0, overwrite_b=0,
          debug = 0):
    """ solve(a, b, sym_pos=0, lower=0, overwrite_a=0, overwrite_b=0) -> x

    Solve a linear system of equations a * x = b for x.

    Inputs:

      a -- An N x N matrix.
      b -- An N x nrhs matrix or N vector.
      sym_pos -- Assume a is symmetric and positive definite.
      lower -- Assume a is lower triangular, otherwise upper one.
               Only used if sym_pos is true.
      overwrite_y - Discard data in y, where y is a or b.

    Outputs:

      x -- The solution to the system a * x = b
    """
    a1, b1 = map(asarray_chkfinite,(a,b))
    if len(a1.shape) != 2 or a1.shape[0] != a1.shape[1]:
        raise ValueError, 'expected square matrix'
    if a1.shape[0] != b1.shape[0]:
        raise ValueError, 'incompatible dimensions'
    overwrite_a = overwrite_a or (a1 is not a and not hasattr(a,'__array__'))
    overwrite_b = overwrite_b or (b1 is not b and not hasattr(b,'__array__'))
    if debug:
        print 'solve:overwrite_a=',overwrite_a
        print 'solve:overwrite_b=',overwrite_b
    if sym_pos:
        posv, = get_lapack_funcs(('posv',),(a1,b1))
        c,x,info = posv(a1,b1,
                        lower = lower,
                        overwrite_a=overwrite_a,
                        overwrite_b=overwrite_b)
    else:
        gesv, = get_lapack_funcs(('gesv',),(a1,b1))
        lu,piv,x,info = gesv(a1,b1,
                             overwrite_a=overwrite_a,
                             overwrite_b=overwrite_b)

    if info==0:
        return x
    if info>0:
        raise LinAlgError, "singular matrix"
    raise ValueError,\
          'illegal value in %-th argument of internal gesv|posv'%(-info)

>>> scipy.source(numpy.linalg.solve)
In file: /usr/lib64/python2.4/site-packages/numpy/linalg/linalg.py

def solve(a, b):
    """
    Solve the equation ``a x = b`` for ``x``.

    Parameters
    ----------
    a : array_like, shape (M, M)
        Input equation coefficients.
    b : array_like, shape (M,)
        Equation target values.

    Returns
    -------
    x : array, shape (M,)

    Raises
    ------
    LinAlgError
        If `a` is singular or not square.

    Examples
    --------
    Solve the system of equations ``3 * x0 + x1 = 9`` and ``x0 + 2 * x1 = 8``:

    >>> a = np.array([[3,1], [1,2]])
    >>> b = np.array([9,8])
    >>> x = np.linalg.solve(a, b)
    >>> x
    array([ 2.,  3.])

    Check that the solution is correct:

    >>> (np.dot(a, x) == b).all()
    True

    """
    a, _ = _makearray(a)
    b, wrap = _makearray(b)
    one_eq = len(b.shape) == 1
    if one_eq:
        b = b[:, newaxis]
    _assertRank2(a, b)
    _assertSquareness(a)
    n_eq = a.shape[0]
    n_rhs = b.shape[1]
    if n_eq != b.shape[0]:
        raise LinAlgError, 'Incompatible dimensions'
    t, result_t = _commonType(a, b)
#    lapack_routine = _findLapackRoutine('gesv', t)
    if isComplexType(t):
        lapack_routine = lapack_lite.zgesv
    else:
        lapack_routine = lapack_lite.dgesv
    a, b = _fastCopyAndTranspose(t, a, b)
    pivots = zeros(n_eq, fortran_int)
    results = lapack_routine(n_eq, n_rhs, a, n_eq, pivots, b, n_eq, 0)
    if results['info'] > 0:
        raise LinAlgError, 'Singular matrix'
    if one_eq:
        return wrap(b.ravel().astype(result_t))
    else:
        return wrap(b.transpose().astype(result_t))
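
A similar inspection can be sketched with the standard library alone, in case a source helper is not available in the installed versions. The example below uses inspect.getsource together with dir; json.dumps is only a convenient pure-Python function chosen for illustration.

```python
import inspect
import json

# Fetch the source of a function written in Python (json.dumps is just an example).
src = inspect.getsource(json.dumps)
first_line = src.splitlines()[0]
print(first_line)

# dir() lists the names defined in a module's namespace.
names = dir(json)
print("dumps" in names)
```

Like source, this only works for functions implemented in Python; compiled functions have no Python source to show.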

This is also my first post so if I should change something here please let me know.


Answer 4

From Wikipedia ( http://en.wikipedia.org/wiki/NumPy#History ):

The Numeric code was adapted to make it more maintainable and flexible enough to implement the novel features of Numarray. This new project was part of SciPy. To avoid installing a whole package just to get an array object, this new package was separated and called NumPy.

scipy depends on numpy and imports many numpy functions into its namespace for convenience.


Answer 5

Regarding the linalg package – the scipy functions will call lapack and blas, which are available in highly optimised versions on many platforms and offer very good performance, particularly for operations on reasonably large dense matrices. On the other hand, they are not easy libraries to compile, requiring a fortran compiler and many platform specific tweaks to get full performance. Therefore, numpy provides simple implementations of many common linear algebra functions which are often good enough for many purposes.


Answer 6

From the lectures on Quantitative Economics:

SciPy is a package that contains various tools that are built on top of NumPy, using its array data type and related functionality

In fact, when we import SciPy we also get NumPy, as can be seen from the SciPy initialization file

# Import numpy symbols to scipy name space
import numpy as _num
linalg = None
from numpy import *
from numpy.random import rand, randn
from numpy.fft import fft, ifft
from numpy.lib.scimath import *

__all__  = []
__all__ += _num.__all__
__all__ += ['randn', 'rand', 'fft', 'ifft']

del _num
# Remove the linalg imported from numpy so that the scipy.linalg package can be
# imported.
del linalg
__all__.remove('linalg')

However, it’s more common and better practice to use NumPy functionality explicitly

import numpy as np

a = np.identity(3)

What is useful in SciPy is the functionality in its subpackages

  • scipy.optimize, scipy.integrate, scipy.stats, etc.
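
As a brief sketch of that split, assuming SciPy is installed, the example below takes array functions from NumPy and numerical routines from SciPy subpackages. The integrand and the objective function are arbitrary choices for illustration.

```python
import numpy as np
from scipy import integrate, optimize

# Integrate sin(x) over [0, pi]; the exact value is 2.
area, abs_err = integrate.quad(np.sin, 0.0, np.pi)

# Minimise a simple parabola whose minimum sits at x = 2.
result = optimize.minimize_scalar(lambda x: (x - 2.0) ** 2)

print(area, result.x)
```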


Answer 7

In addition to the SciPy FAQ, which describes the duplication as being mainly for backwards compatibility, the NumPy documentation clarifies further:

Optionally SciPy-accelerated routines (numpy.dual)

Aliases for functions which may be accelerated by Scipy.

SciPy can be built to use accelerated or otherwise improved libraries for FFTs, linear algebra, and special functions. This module allows developers to transparently support these accelerated functions when SciPy is available but still support users who have only installed NumPy.

For brevity, these are:

  • Linear algebra
  • FFT
  • The Modified Bessel function of the first kind, order 0

Also, from the SciPy Tutorial:

The top level of SciPy also contains functions from NumPy and numpy.lib.scimath. However, it is better to use them directly from the NumPy module instead.

So, for new applications, you should prefer the NumPy version of the array operations that are duplicated in the top level of SciPy. For the domains listed above, you should prefer those in SciPy and check backward compatibility if necessary in NumPy.
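
The idea of preferring SciPy's routines when they are available while still supporting NumPy-only installations can be sketched with an ordinary import fallback. This is an illustration of the pattern only, not the implementation of numpy.dual; the names have_scipy and solve are chosen for this example.

```python
import numpy as np

try:
    # Prefer SciPy's linear algebra routines when SciPy is installed.
    from scipy import linalg as _linalg
    have_scipy = True
except ImportError:
    # Otherwise fall back to the routines that ship with NumPy.
    from numpy import linalg as _linalg
    have_scipy = False


def solve(a, b):
    """Solve a @ x = b with whichever backend was importable."""
    return _linalg.solve(a, b)


a = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])
x = solve(a, b)
print(have_scipy, x)
```

Callers use solve without needing to know which package provided it, as long as they stick to the arguments both backends accept.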

In my personal experience, most of the array functions I use exist in the top level of NumPy (except for random). However, all the domain specific routines exist in subpackages of SciPy, so I rarely use anything from the top level of SciPy.