Tag archive: cython

Cython: “fatal error: numpy/arrayobject.h: No such file or directory”

Question: Cython: “fatal error: numpy/arrayobject.h: No such file or directory”

I’m trying to speed up the answer here using Cython. I try to compile the code (after doing the cygwinccompiler.py hack explained here), but get a fatal error: numpy/arrayobject.h: No such file or directory...compilation terminated error. Can anyone tell me if it’s a problem with my code, or some esoteric subtlety with Cython?

Below is my code.

import numpy as np
import scipy as sp
cimport numpy as np
cimport cython

cdef inline np.ndarray[np.int, ndim=1] fbincount(np.ndarray[np.int_t, ndim=1] x):
    cdef int m = np.amax(x)+1
    cdef int n = x.size
    cdef unsigned int i
    cdef np.ndarray[np.int_t, ndim=1] c = np.zeros(m, dtype=np.int)

    for i in xrange(n):
        c[<unsigned int>x[i]] += 1

    return c

cdef packed struct Point:
    np.float64_t f0, f1

@cython.boundscheck(False)
def sparsemaker(np.ndarray[np.float_t, ndim=2] X not None,
                np.ndarray[np.float_t, ndim=2] Y not None,
                np.ndarray[np.float_t, ndim=2] Z not None):

    cdef np.ndarray[np.float64_t, ndim=1] counts, factor
    cdef np.ndarray[np.int_t, ndim=1] row, col, repeats
    cdef np.ndarray[Point] indices

    cdef int x_, y_

    _, row = np.unique(X, return_inverse=True); x_ = _.size
    _, col = np.unique(Y, return_inverse=True); y_ = _.size
    indices = np.rec.fromarrays([row,col])
    _, repeats = np.unique(indices, return_inverse=True)
    counts = 1. / fbincount(repeats)
    Z.flat *= counts.take(repeats)

    return sp.sparse.csr_matrix((Z.flat,(row,col)), shape=(x_, y_)).toarray()

Answer 0

In your setup.py, the Extension should have the argument include_dirs=[numpy.get_include()].

Also, you are missing np.import_array() in your code.

Example setup.py:

from distutils.core import setup, Extension
from Cython.Build import cythonize
import numpy

setup(
    ext_modules=[
        Extension("my_module", ["my_module.c"],
                  include_dirs=[numpy.get_include()]),
    ],
)

# Or, if you use cythonize() to make the ext_modules list,
# include_dirs can be passed to setup()

setup(
    ext_modules=cythonize("my_module.pyx"),
    include_dirs=[numpy.get_include()]
)    
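Before editing setup.py, it can help to confirm that the header really lives where numpy.get_include() points. The following is only a minimal sketch; find_numpy_header is a hypothetical helper name, not part of NumPy.

```python
import os


def find_numpy_header():
    """Return the path to numpy/arrayobject.h, or None if it cannot be found."""
    try:
        import numpy
    except ImportError:
        return None  # NumPy itself is not installed
    # get_include() returns the directory that belongs in include_dirs
    header = os.path.join(numpy.get_include(), "numpy", "arrayobject.h")
    return header if os.path.isfile(header) else None


print(find_numpy_header())
```

If this prints a path, passing numpy.get_include() as include_dirs should let the compiler find the header.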

Answer 1

For a one-file project like yours, another alternative is to use pyximport. You don’t need to create a setup.py … you don’t need to even open a command line if you use IPython … it’s all very convenient. In your case, try running these commands in IPython or in a normal Python script:

import numpy
import pyximport
pyximport.install(setup_args={"script_args":["--compiler=mingw32"],
                              "include_dirs":numpy.get_include()},
                  reload_support=True)

import my_pyx_module

print my_pyx_module.some_function(...)
...

You may need to edit the compiler of course. This makes import and reload work the same for .pyx files as they work for .py files.

Source: http://wiki.cython.org/InstallingOnWindows


Answer 2

The error means that a numpy header file isn’t being found during compilation.

Try doing export CFLAGS=-I/usr/lib/python2.7/site-packages/numpy/core/include/, and then compiling. This is a problem with a few different packages. There’s a bug filed in ArchLinux for the same issue: https://bugs.archlinux.org/task/22326
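Hard-coding the site-packages path ties the command to one Python version. As a rough sketch, the flag string can be built from whatever directories are needed; include_flags is a hypothetical helper, and in practice the directory would probably come from numpy.get_include().

```python
def include_flags(include_dirs):
    """Build a CFLAGS-style string of -I options from a list of directories."""
    return " ".join("-I" + d for d in include_dirs)


# Example directory taken from the answer above; substitute numpy.get_include().
print(include_flags(["/usr/lib/python2.7/site-packages/numpy/core/include/"]))
```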


Answer 3

Simple answer

A much simpler way is to add the path to your distutils.cfg file. On Windows 7 its default location is C:\Python27\Lib\distutils\. Just put the following contents in it and it should work:

[build_ext]
include_dirs= C:\Python27\Lib\site-packages\numpy\core\include

Entire config file

To give you an example of what the config file could look like, my entire file reads:

[build]
compiler = mingw32

[build_ext]
include_dirs= C:\Python27\Lib\site-packages\numpy\core\include
compiler = mingw32
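The same file could also be generated rather than typed by hand. This is a sketch using the standard-library configparser; render_distutils_cfg is a hypothetical helper, and the include directory is whatever your installation uses (the path below is the one from this answer).

```python
import configparser
import io


def render_distutils_cfg(include_dir, compiler="mingw32"):
    """Return distutils.cfg text with the same sections as the file above."""
    # interpolation=None keeps any '%' in a path from being treated specially
    cfg = configparser.ConfigParser(interpolation=None)
    cfg["build"] = {"compiler": compiler}
    cfg["build_ext"] = {"include_dirs": include_dir, "compiler": compiler}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(render_distutils_cfg(r"C:\Python27\Lib\site-packages\numpy\core\include"))
```

Note that distutils.cfg only applies to Python versions that still ship distutils.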

Answer 4

It should be possible to do this within the cythonize() function as mentioned here, but it doesn’t work because there is a known issue.


Answer 5

If you are too lazy to write setup files and figure out the path for include directories, try cyper. It can compile your Cython code and set include_dirs for Numpy automatically.

Load your code into a string, then simply run cymodule = cyper.inline(code_string), then your function is available as cymodule.sparsemaker instantaneously. Something like this

code = open(your_pyx_file).read()
cymodule = cyper.inline(code)

cymodule.sparsemaker(...)
# do what you want with your function

You can install cyper via pip install cyper.


How should I structure a Python package that contains Cython code

Question: How should I structure a Python package that contains Cython code

I’d like to make a Python package containing some Cython code. I’ve got the Cython code working nicely. However, now I want to know how best to package it.

For most people who just want to install the package, I’d like to include the .c file that Cython creates, and arrange for setup.py to compile that to produce the module. Then the user doesn’t need Cython installed in order to install the package.

But for people who may want to modify the package, I’d also like to provide the Cython .pyx files, and somehow also allow for setup.py to build them using Cython (so those users would need Cython installed).

How should I structure the files in the package to cater for both these scenarios?

The Cython documentation gives a little guidance. But it doesn’t say how to make a single setup.py that handles both the with/without Cython cases.


Answer 0

I’ve done this myself now, in a Python package simplerandom (BitBucket repo – EDIT: now github) (I don’t expect this to be a popular package, but it was a good chance to learn Cython).

This method relies on the fact that building a .pyx file with Cython.Distutils.build_ext (at least with Cython version 0.14) always seems to create a .c file in the same directory as the source .pyx file.

Here is a cut-down version of setup.py which I hope shows the essentials:

from distutils.core import setup
from distutils.extension import Extension

try:
    from Cython.Distutils import build_ext
except ImportError:
    use_cython = False
else:
    use_cython = True

cmdclass = {}
ext_modules = []

if use_cython:
    ext_modules += [
        Extension("mypackage.mycythonmodule", ["cython/mycythonmodule.pyx"]),
    ]
    cmdclass.update({'build_ext': build_ext})
else:
    ext_modules += [
        Extension("mypackage.mycythonmodule", ["cython/mycythonmodule.c"]),
    ]

setup(
    name='mypackage',
    ...
    cmdclass=cmdclass,
    ext_modules=ext_modules,
    ...
)

I also edited MANIFEST.in to ensure that mycythonmodule.c is included in a source distribution (a source distribution that is created with python setup.py sdist):

...
recursive-include cython *
...

I don’t commit mycythonmodule.c to version control ‘trunk’ (or ‘default’ for Mercurial). When I make a release, I need to remember to do a python setup.py build_ext first, to ensure that mycythonmodule.c is present and up-to-date for the source code distribution. I also make a release branch, and commit the C file into the branch. That way I have a historical record of the C file that was distributed with that release.
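Since this workflow depends on remembering to regenerate the C file, a small pre-release check may help. The sketch below is not part of the original setup.py; stale_c_sources is a hypothetical helper that assumes the .c file sits next to its .pyx file, as described above.

```python
import os


def stale_c_sources(pyx_paths):
    """Return the .pyx files whose generated .c file is missing or older."""
    stale = []
    for pyx in pyx_paths:
        c_file = os.path.splitext(pyx)[0] + ".c"
        if (not os.path.exists(c_file)
                or os.path.getmtime(c_file) < os.path.getmtime(pyx)):
            stale.append(pyx)
    return stale


# Path taken from the setup.py above; run from the project root.
if os.path.exists("cython/mycythonmodule.pyx"):
    print(stale_c_sources(["cython/mycythonmodule.pyx"]))
```

An empty list suggests the C sources are at least as new as the Cython sources; it does not prove they were generated from them.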


Answer 1

Adding to Craig McQueen’s answer: see below for how to override the sdist command to have Cython automatically compile your source files before creating a source distribution.

That way you run no risk of accidentally distributing outdated C sources. It also helps when you have limited control over the distribution process, e.g. when distributions are created automatically by continuous integration.

from distutils.command.sdist import sdist as _sdist

...

class sdist(_sdist):
    def run(self):
        # Make sure the compiled Cython files in the distribution are up-to-date
        from Cython.Build import cythonize
        cythonize(['cython/mycythonmodule.pyx'])
        _sdist.run(self)
cmdclass['sdist'] = sdist

Answer 2

http://docs.cython.org/en/latest/src/userguide/source_files_and_compilation.html#distributing-cython-modules

It is strongly recommended that you distribute the generated .c files as well as your Cython sources, so that users can install your module without needing to have Cython available.

It is also recommended that Cython compilation not be enabled by default in the version you distribute. Even if the user has Cython installed, he probably doesn’t want to use it just to install your module. Also, the version he has may not be the same one you used, and may not compile your sources correctly.

This simply means that the setup.py file that you ship with will just be a normal distutils file on the generated .c files, for the basic example we would have instead:

from distutils.core import setup
from distutils.extension import Extension
 
setup(
    ext_modules = [Extension("example", ["example.c"])]
)

Answer 3

The easiest is to include both but just use the c-file? Including the .pyx file is nice, but it’s not needed once you have the .c file anyway. People who want to recompile the .pyx can install Pyrex and do it manually.

Otherwise you need to have a custom build_ext command for distutils that builds the C file first. Cython already includes one. http://docs.cython.org/src/userguide/source_files_and_compilation.html

What that documentation doesn’t do is say how to make this conditional, but

try:
    from Cython.Distutils import build_ext
except ImportError:
    from distutils.command.build_ext import build_ext

Should handle it.


Answer 4

Including (Cython) generated .c files is pretty weird, especially when we include them in git. I’d prefer to use setuptools_cython. When Cython is not available, it will build an egg which has a built-in Cython environment, and then build your code using the egg.

A possible example: https://github.com/douban/greenify/blob/master/setup.py


Update(2017-01-05):

Since setuptools 18.0, there’s no need to use setuptools_cython. Here is an example to build Cython project from scratch without setuptools_cython.


Answer 5

This is a setup script I wrote which makes it easier to include nested directories in the build. It needs to be run from a folder within a package.

Given a structure like this:

__init__.py
setup.py
test.py
subdir/
      __init__.py
      anothertest.py

setup.py

from setuptools import setup, Extension
from Cython.Distutils import build_ext
# from os import path
ext_names = (
    'test',
    'subdir.anothertest',       
) 

cmdclass = {'build_ext': build_ext}
# for modules in main dir      
ext_modules = [
    Extension(
        ext,
        [ext + ".py"],            
    ) 
    for ext in ext_names if ext.find('.') < 0] 
# for modules in subdir ONLY ONE LEVEL DOWN!! 
# modify it if you need more !!!
ext_modules += [
    Extension(
        ext,
        ["/".join(ext.split('.')) + ".py"],     
    )
    for ext in ext_names if ext.find('.') > 0]

setup(
    name='name',
    ext_modules=ext_modules,
    cmdclass=cmdclass,
    packages=["base", "base.subdir"],
)
#  Build --------------------------
#  python setup.py build_ext --inplace

Happy compiling ;)
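The two list comprehensions above differ only in how they turn a dotted module name into a file path. As a sketch, a single mapping could cover any nesting depth; source_path is a hypothetical helper, not part of setuptools or Cython.

```python
import posixpath


def source_path(module_name, suffix=".py"):
    """Map a dotted module name to a relative source path, at any depth."""
    return posixpath.join(*module_name.split(".")) + suffix


for name in ("test", "subdir.anothertest", "a.b.c"):
    print(name, "->", source_path(name))
```

With such a helper, one comprehension over ext_names would presumably be enough to build ext_modules.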


Answer 6

The simple hack I came up with:

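from distutils.core import setup

try:
    from Cython.Build import cythonize
except ImportError:
    import subprocess
    import sys

    # pip.main() is not a supported API; invoke pip as a subprocess instead
    subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'cython'])

    from Cython.Build import cythonize


setup(...)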

Just install Cython if it could not be imported. One should probably not share this code, but for my own dependencies it’s good enough.


Answer 7

All other answers either rely on

  • distutils
  • importing from Cython.Build, which creates a chicken-and-egg problem between requiring cython via setup_requires and importing it.

A modern solution is to use setuptools instead, see this answer (automatic handling of Cython extensions requires setuptools 18.0, which has been available for many years). A modern standard setup.py with requirements handling, an entry point, and a cython module could look like this:

from setuptools import setup, Extension

with open('requirements.txt') as f:
    requirements = f.read().splitlines()

setup(
    name='MyPackage',
    install_requires=requirements,
    setup_requires=[
        'setuptools>=18.0',  # automatically handles Cython extensions
        'cython>=0.28.4',
    ],
    entry_points={
        'console_scripts': [
            'mymain = mypackage.main:main',
        ],
    },
    ext_modules=[
        Extension(
            'mypackage.my_cython_module',
            sources=['mypackage/my_cython_module.pyx'],
        ),
    ],
)
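One caveat with f.read().splitlines() is that blank lines and comment lines in requirements.txt end up in install_requires unchanged. A minimal sketch of a stricter reader follows; parse_requirements is a hypothetical helper, and it deliberately ignores pip-only syntax such as -r includes or inline comments.

```python
def parse_requirements(text):
    """Return requirement lines, skipping blank lines and '#' comments."""
    requirements = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            requirements.append(line)
    return requirements


print(parse_requirements("numpy>=1.10\n\n# build tools\ncython\n"))
```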

Answer 8

The easiest way I found, using only setuptools instead of the feature-limited distutils, is

from setuptools import setup
from setuptools.extension import Extension
try:
    from Cython.Build import cythonize
except ImportError:
    use_cython = False
else:
    use_cython = True

ext_modules = []
if use_cython:
    ext_modules += cythonize('package/cython_module.pyx')
else:
    ext_modules += [Extension('package.cython_module',
                              ['package/cython_module.c'])]

setup(name='package_name', ext_modules=ext_modules)

Answer 9

I think I found a pretty good way of doing this by providing a custom build_ext command. The idea is the following:

  1. I add the numpy headers by overriding finalize_options() and doing import numpy in the body of the function, which nicely avoids the problem of numpy not being available before setup() installs it.

  2. If cython is available on the system, it hooks into the command’s check_extensions_list() method and cythonizes all out-of-date cython modules, replacing them with C extensions that can later be handled by the build_extension() method. We just provide the latter part of the functionality in our module too: this means that if cython is not available but we have a C extension present, it still works, which allows you to do source distributions.

Here’s the code:

import re, sys, os.path
from distutils import dep_util, log
from setuptools.command.build_ext import build_ext

try:
    import Cython.Build
    HAVE_CYTHON = True
except ImportError:
    HAVE_CYTHON = False

class BuildExtWithNumpy(build_ext):
    def check_cython(self, ext):
        c_sources = []
        for fname in ext.sources:
            cname, matches = re.subn(r"(?i)\.pyx$", ".c", fname, 1)
            c_sources.append(cname)
            if matches and dep_util.newer(fname, cname):
                if HAVE_CYTHON:
                    return ext
                raise RuntimeError("Cython and C module unavailable")
        ext.sources = c_sources
        return ext

    def check_extensions_list(self, extensions):
        extensions = [self.check_cython(ext) for ext in extensions]
        return build_ext.check_extensions_list(self, extensions)

    def finalize_options(self):
        import numpy as np
        build_ext.finalize_options(self)
        self.include_dirs.append(np.get_include())

This allows one to just write the setup() arguments without worrying about imports and whether one has cython available:

setup(
    # ...
    ext_modules=[Extension("_my_fast_thing", ["src/_my_fast_thing.pyx"])],
    setup_requires=['numpy'],
    cmdclass={'build_ext': BuildExtWithNumpy}
    )
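The renaming step inside check_cython() is easy to try in isolation. The sketch below reuses the same regular expression; c_source_for is a hypothetical name introduced only for illustration.

```python
import re


def c_source_for(fname):
    """Return (c_name, was_pyx) using the same pattern as check_cython()."""
    cname, matches = re.subn(r"(?i)\.pyx$", ".c", fname, count=1)
    return cname, bool(matches)


for fname in ("src/_my_fast_thing.pyx", "src/helper.c", "src/UPPER.PYX"):
    print(fname, "->", c_source_for(fname))
```

The (?i) flag makes the match case-insensitive, and files that are not .pyx sources pass through unchanged.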

Compiling with cython and mingw produces gcc: error: unrecognized command line option '-mno-cygwin'

Question: Compiling with cython and mingw produces gcc: error: unrecognized command line option '-mno-cygwin'

I’m trying to compile a python extension with cython in win 7 64-bit using mingw (64-bit).
I’m working with Python 2.6 (Active Python 2.6.6) and with an appropriate distutils.cfg file (setting mingw as the compiler)

When executing

> C:\Python26\programas\Cython>python setup.py build_ext --inplace

I get an error saying that gcc does not have a -mno-cygwin option:

> C:\Python26\programas\Cython>python setup.py build_ext --inplace
running build_ext
skipping 'hello2.c' Cython extension (up-to-date)
building 'hello2' extension
C:\mingw\bin\gcc.exe -mno-cygwin -mdll -O -Wall -IC:\Python26\include -IC:\Python26\PC -c hello2.c -o build\temp.win-amd64-2.6\Release\hello2.o
gcc: error: unrecognized command line option '-mno-cygwin'
error: command 'gcc' failed with exit status 1

gcc is:

C:\>gcc --version
gcc (GCC) 4.7.0 20110430 (experimental)
Copyright (C) 2011 Free Software Foundation, Inc.

How could I fix it?


Answer 0

It sounds like GCC 4.7.0 has finally removed the deprecated -mno-cygwin option, but distutils has not yet caught up with it. Either install a slightly older version of MinGW, or edit distutils\cygwinccompiler.py in your Python directory to remove all instances of -mno-cygwin.
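For the second option, the edit amounts to a text substitution. This is only a sketch: strip_mno_cygwin is a hypothetical helper, it assumes the option always appears preceded by a single space, and any real edit to cygwinccompiler.py should be made on a backed-up copy.

```python
def strip_mno_cygwin(source_text):
    """Remove every '-mno-cygwin' option, with its leading space, from text."""
    return source_text.replace(" -mno-cygwin", "")


# The command line below is the start of the one in the question's build log.
print(strip_mno_cygwin("gcc.exe -mno-cygwin -mdll -O -Wall"))
```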


Answer 1

During the process of solving these and the following problems I found, I wrote a recipe in this thread. I reproduce it here in case it could be of utility for others:


Step by step recipe to compile 64-bit cython extensions with python 2.6.6 with mingw compiler in win 7 64-bit

Install mingw compiler
1) Install tdm64-gcc-4.5.2.exe for 64-bit compilation

Apply patch to python.h
2) Modify python.h in C:\python26\include as indicated in http://bugs.python.org/file12411/mingw-w64.patch

Modify distutils
Edit 2013: Note that in Python 2.7.6 and 3.3.3 -mno-cygwin has finally been removed, so step 3 can be skipped.

3) Remove every -mno-cygwin parameter from the calls to gcc in the Mingw32CCompiler class in Python26\Lib\distutils\cygwinccompiler.py
4) In the same module, modify get_msvcr() to return an empty list instead of ['msvcr90'] when msc_ver == '1500'.

Produce the libpython26.a file (not included in 64 bit python)
Edit 2013: the following steps 5-10 can be skipped by downloading and installing libpython26.a from gohlke.

5) Obtain gendef.exe from mingw-w64-bin_x86_64-mingw_20101003_sezero.zip (gendef.exe is not available in the tdm64 distribution. Another solution is to compile gendef from source…)
6) Copy python26.dll (located at C:\windows\system32) to the user directory (C:\Users\myname)
7) Produce the python26.def file with:

gendef.exe C:\Users\myname\python26.dll

8) Move the python26.def file produced (located in the folder from where gendef was executed) to the user directory
9) Produce libpython26.a with:

dlltool -v --dllname python26.dll --def C:\Users\myname\python26.def --output-lib C:\Users\myname\libpython26.a

10) Move the created libpython26.a to C:\Python26\libs

Produce your .pyd extension
11) Create a test hello.pyx file and a setup.py file as indicated in cython tutorial (http://docs.cython.org/src/quickstart/build.html)
12) Compile with

python setup.py build_ext --inplace

Done!
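Steps 7 and 9 could be scripted so that the paths stay consistent. The sketch below only builds the argument lists and does not run anything; libpython_commands is a hypothetical helper, and the flags are copied from step 9, so check them against your own dlltool before relying on them.

```python
import ntpath  # Windows path rules, usable on any platform


def libpython_commands(dll_path, out_dir, version="26"):
    """Return the gendef and dlltool argument lists from steps 7 and 9."""
    def_file = ntpath.join(out_dir, "python%s.def" % version)
    lib_file = ntpath.join(out_dir, "libpython%s.a" % version)
    gendef = ["gendef.exe", dll_path]
    dlltool = ["dlltool", "-v",
               "--dllname", ntpath.basename(dll_path),
               "--def", def_file,
               "--output-lib", lib_file]
    return gendef, dlltool


for cmd in libpython_commands(r"C:\Users\myname\python26.dll",
                              r"C:\Users\myname"):
    print(" ".join(cmd))
```

Each list could then be passed to subprocess.check_call() from the directory where the .def file should be written.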


Answer 2

This bug has now been fixed in Python 2.7.6 release candidate 1.

The patching commit is here.

The resolved issue tracker thread is here.


Answer 3

Try this. It really does work for this error:
https://github.com/develersrl/gccwinbinaries


Wrapping a C library in Python: C, Cython or ctypes?

Question: Wrapping a C library in Python: C, Cython or ctypes?

I want to call a C library from a Python application. I don’t want to wrap the whole API, only the functions and datatypes that are relevant to my case. As I see it, I have three choices:

  1. Create an actual extension module in C. Probably overkill, and I’d also like to avoid the overhead of learning extension writing.
  2. Use Cython to expose the relevant parts from the C library to Python.
  3. Do the whole thing in Python, using ctypes to communicate with the external library.

I’m not sure whether 2) or 3) is the better choice. The advantage of 3) is that ctypes is part of the standard library, and the resulting code would be pure Python – although I’m not sure how big that advantage actually is.

Are there more advantages / disadvantages with either choice? Which approach do you recommend?


Edit: Thanks for all your answers, they provide a good resource for anyone looking to do something similar. The decision, of course, is still to be made for the single case—there’s no one “This is the right thing” sort of answer. For my own case, I’ll probably go with ctypes, but I’m also looking forward to trying out Cython in some other project.

With there being no single true answer, accepting one is somewhat arbitrary; I chose FogleBird’s answer as it provides some good insight into ctypes and it currently also is the highest-voted answer. However, I suggest reading all the answers to get a good overview.

Thanks again.


Answer 0

ctypes is your best bet for getting it done quickly, and it’s a pleasure to work with as you’re still writing Python!

I recently wrapped an FTDI driver for communicating with a USB chip using ctypes and it was great. I had it all done and working in less than one work day. (I only implemented the functions we needed, about 15 functions).

We were previously using a third-party module, PyUSB, for the same purpose. PyUSB is an actual C/Python extension module. But PyUSB wasn’t releasing the GIL when doing blocking reads/writes, which was causing problems for us. So I wrote our own module using ctypes, which does release the GIL when calling the native functions.

One thing to note is that ctypes won’t know about #define constants and stuff in the library you’re using, only the functions, so you’ll have to redefine those constants in your own code.

Here’s an example of how the code ended up looking (lots snipped out, just trying to show you the gist of it):

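from ctypes import *

# Load the vendor DLL; WinDLL uses the stdcall calling convention (Windows only).
d2xx = WinDLL('ftd2xx')

# ctypes cannot see the header's #define values, so the status codes are redefined here.
OK = 0
INVALID_HANDLE = 1
DEVICE_NOT_FOUND = 2
DEVICE_NOT_OPENED = 3

...  # snipped; names used below but not defined in this excerpt (OPEN_BY_SERIAL_NUMBER, D2XXException) are part of the snipped code

def openEx(serial):
    # Open a device by serial number; the DLL writes the handle through byref().
    serial = create_string_buffer(serial)
    handle = c_int()
    if d2xx.FT_OpenEx(serial, OPEN_BY_SERIAL_NUMBER, byref(handle)) == OK:
        return Handle(handle.value)
    raise D2XXException

class Handle(object):
    def __init__(self, handle):
        self.handle = handle
    ...
    def read(self, bytes):
        # Read up to `bytes` bytes; `count` receives the number actually read.
        buffer = create_string_buffer(bytes)
        count = c_int()
        if d2xx.FT_Read(self.handle, buffer, bytes, byref(count)) == OK:
            return buffer.raw[:count.value]
        raise D2XXException
    def write(self, data):
        # Write `data` and return the number of bytes written.
        buffer = create_string_buffer(data)
        count = c_int()
        bytes = len(data)
        if d2xx.FT_Write(self.handle, buffer, bytes, byref(count)) == OK:
            return count.value
        raise D2XXException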

Someone did some benchmarks on the various options.

I might be more hesitant if I had to wrap a C++ library with lots of classes/templates/etc. But ctypes works well with structs and can even call back into Python.
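
To make the callback point concrete, here is a minimal sketch that hands a Python comparison function to the C library's qsort. It assumes a POSIX-like system where the C runtime can be loaded through ctypes; the helper names compare_ints and sort_with_qsort are made up for this illustration and have nothing to do with the FTDI wrapper above.

```python
import ctypes
import ctypes.util

# Assumption: the C runtime can be located and loaded on this system
# (true on typical Linux and macOS installs).
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# C signature of the comparison callback: int (*)(const int *, const int *)
CMPFUNC = ctypes.CFUNCTYPE(ctypes.c_int,
                           ctypes.POINTER(ctypes.c_int),
                           ctypes.POINTER(ctypes.c_int))

# Declare qsort's prototype so ctypes converts the arguments correctly.
libc.qsort.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_size_t, CMPFUNC]
libc.qsort.restype = None


def compare_ints(a, b):
    """Python function that the C library calls for every comparison."""
    return (a[0] > b[0]) - (a[0] < b[0])


def sort_with_qsort(values):
    """Sort a list of ints by passing a C array and a Python callback to qsort."""
    array = (ctypes.c_int * len(values))(*values)
    libc.qsort(array, len(array), ctypes.sizeof(ctypes.c_int), CMPFUNC(compare_ints))
    return list(array)
```

Calling sort_with_qsort([5, 1, 7]) returns a sorted Python list. Every comparison goes from C back into the interpreter, which is convenient but also the kind of overhead other answers here warn about.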


Answer 1

Warning: a Cython core developer’s opinion ahead.

I almost always recommend Cython over ctypes. The reason is that it has a much smoother upgrade path. If you use ctypes, many things will be simple at first, and it’s certainly cool to write your FFI code in plain Python, without compilation, build dependencies and all that. However, at some point, you will almost certainly find that you have to call into your C library a lot, either in a loop or in a longer series of interdependent calls, and you would like to speed that up. That’s the point where you’ll notice that you can’t do that with ctypes. Or, when you need callback functions and you find that your Python callback code becomes a bottleneck, you’d like to speed it up and/or move it down into C as well. Again, you cannot do that with ctypes. So you have to switch languages at that point and start rewriting parts of your code, potentially reverse engineering your Python/ctypes code into plain C, thus spoiling the whole benefit of writing your code in plain Python in the first place.

With Cython, OTOH, you’re completely free to make the wrapping and calling code as thin or thick as you want. You can start with simple calls into your C code from regular Python code, and Cython will translate them into native C calls, without any additional calling overhead, and with an extremely low conversion overhead for Python parameters. When you notice that you need even more performance at some point where you are making too many expensive calls into your C library, you can start annotating your surrounding Python code with static types and let Cython optimise it straight down into C for you. Or, you can start rewriting parts of your C code in Cython in order to avoid calls and to specialise and tighten your loops algorithmically. And if you need a fast callback, just write a function with the appropriate signature and pass it into the C callback registry directly. Again, no overhead, and it gives you plain C calling performance. And in the much less likely case that you really cannot get your code fast enough in Cython, you can still consider rewriting the truly critical parts of it in C (or C++ or Fortran) and call it from your Cython code naturally and natively. But then, this really becomes the last resort instead of the only option.

So, ctypes is nice for doing simple things and for quickly getting something running. However, as soon as things start to grow, you’ll most likely come to the point where you notice that you would have been better off using Cython right from the start.


Answer 2

Cython is a pretty cool tool in itself, well worth learning, and is surprisingly close to the Python syntax. If you do any scientific computing with Numpy, then Cython is the way to go because it integrates with Numpy for fast matrix operations.

Cython is a superset of the Python language. You can throw any valid Python file at it, and it will spit out a valid C program. In this case, Cython will just map the Python calls to the underlying CPython API. This results in perhaps a 50% speedup because your code is no longer interpreted.

To get some optimizations, you have to start telling Cython additional facts about your code, such as type declarations. If you tell it enough, it can boil the code down to pure C. That is, a for loop in Python becomes a for loop in C. Here you will see massive speed gains. You can also link to external C programs here.

Using Cython code is also incredibly easy. I thought the manual makes it sound difficult. You literally just do:

$ cython mymodule.pyx
$ gcc [some arguments here] mymodule.c -o mymodule.so

and then you can import mymodule in your Python code and forget entirely that it compiles down to C.

In any case, because Cython is so easy to set up and start using, I suggest trying it to see if it suits your needs. It won’t be a waste if it turns out not to be the tool you’re looking for.


Answer 3

For calling a C library from a Python application there is also cffi, which is a new alternative to ctypes. It brings a fresh look to FFI:

  • it handles the problem in a fascinating, clean way (as opposed to ctypes)
  • it doesn’t require writing non-Python code (as with SWIG, Cython, …)

Answer 4

I’ll throw another one out there: SWIG

It’s easy to learn, does a lot of things right, and supports many more languages so the time spent learning it can be pretty useful.

If you use SWIG, you are creating a new python extension module, but with SWIG doing most of the heavy lifting for you.


Answer 5

Personally, I’d write an extension module in C. Don’t be intimidated by Python C extensions — they’re not hard at all to write. The documentation is very clear and helpful. When I first wrote a C extension in Python, I think it took me about an hour to figure out how to write one — not much time at all.


Answer 6

ctypes is great when you’ve already got a compiled library blob to deal with (such as OS libraries). The calling overhead is severe, however, so if you’ll be making a lot of calls into the library, and you’re going to be writing the C code anyway (or at least compiling it), I’d say to go for cython. It’s not much more work, and it’ll be much faster and more pythonic to use the resulting pyd file.

I personally tend to use cython for quick speedups of python code (loops and integer comparisons are two areas where cython particularly shines), and when there is some more involved code/wrapping of other libraries involved, I’ll turn to Boost.Python. Boost.Python can be finicky to set up, but once you’ve got it working, it makes wrapping C/C++ code straightforward.

cython is also great at wrapping numpy (which I learned from the SciPy 2009 proceedings), but I haven’t used numpy, so I can’t comment on that.


Answer 7

If you already have a library with a defined API, I think ctypes is the best option, as you only have to do a little initialization and then call the library more or less the way you’re used to.

I think Cython or creating an extension module in C (which is not very difficult) is more useful when you need new code, e.g. code that calls that library, does some complex, time-consuming tasks, and then passes the result to Python.

Another approach, for simple programs, is to run a separate process (compiled externally) that writes its result to standard output, and to call it with the subprocess module. Sometimes it’s the easiest approach.

For example, if you make a console C program that works more or less like this

$ miCcode 10
Result: 12345678

You could call it from Python

>>> import subprocess
>>> # Pass the arguments as a list without shell=True so that '10' reaches the program;
>>> # stderr is piped as well so that std_err is actually captured.
>>> p = subprocess.Popen(['miCcode', '10'], stdout=subprocess.PIPE, stderr=subprocess.PIPE)
>>> std_out, std_err = p.communicate()
>>> print(std_out.decode().strip())
Result: 12345678

With a little string formatting, you can take the result in any way you want. You can also capture the standard error output, so it’s quite flexible.
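
As a rough sketch of the "little string formatting" step, the snippet below runs an external program and parses the number after "Result:". Since miCcode is not available here, a Python one-liner stands in for it; FAKE_PROGRAM and run_and_parse are hypothetical names, and the stand-in simply doubles its argument.

```python
import subprocess
import sys

# Assumption: a tiny Python one-liner stands in for the compiled 'miCcode'
# program so the sketch runs anywhere; swap in your own executable.
FAKE_PROGRAM = [sys.executable, "-c",
                "import sys; print('Result:', int(sys.argv[1]) * 2)"]


def run_and_parse(argument):
    """Run the external program and return the integer printed after 'Result:'."""
    completed = subprocess.run(
        FAKE_PROGRAM + [str(argument)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode the output to str
        check=True,               # raise CalledProcessError on a non-zero exit code
    )
    label, _, value = completed.stdout.partition(":")
    if label != "Result":
        raise ValueError("unexpected output: %r" % completed.stdout)
    return int(value.strip())
```

With check=True a failing program raises an exception instead of silently returning garbage, and completed.stderr still holds whatever the program wrote to standard error.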


Answer 8

There is one issue that made me use ctypes rather than cython, and it is not mentioned in the other answers.

With ctypes, the result does not depend on the compiler you are using at all. You may write the library in more or less any language that can be compiled to a native shared library. It does not matter much which system, which language, or which compiler. Cython, however, is limited by the infrastructure. E.g., if you want to use the Intel compiler on Windows, it is much trickier to make cython work: you have to “explain” the compiler to cython, recompile something with this exact compiler, etc. This significantly limits portability.


Answer 9

If you are targeting Windows and choose to wrap some proprietary C++ libraries, then you may soon discover that different versions of msvcrt***.dll (Visual C++ Runtime) are slightly incompatible.

This means that you may not be able to use Cython since resulting wrapper.pyd is linked against msvcr90.dll (Python 2.7) or msvcr100.dll (Python 3.x). If the library that you are wrapping is linked against different version of runtime, then you’re out of luck.

Then to make things work you’ll need to create C wrappers for C++ libraries, link that wrapper dll against the same version of msvcrt***.dll as your C++ library. And then use ctypes to load your hand-rolled wrapper dll dynamically at the runtime.

So there are lots of small details, which are described in great detail in following article:

“Beautiful Native Libraries (in Python)”: http://lucumr.pocoo.org/2013/8/18/beautiful-native-libraries/


Answer 10

Another possibility is to use GObject Introspection for libraries that use GLib.


Answer 11

I know this is an old question, but it comes up on Google when you search for things like ctypes vs cython, and most of the answers here are written by people who are already proficient in cython or c, which might not reflect the actual time you need to invest in learning them to implement your solution. I am a complete beginner in both. I had never touched cython before, and I have very little experience with c/c++.

For the last two days, I was looking for a way to delegate a performance-heavy part of my code to something more low-level than python. I implemented my code, which consisted basically of two simple functions, in both ctypes and Cython.

I had a huge list of strings that needed to be processed. Notice list and string. Neither type corresponds perfectly to a type in c, because python strings are unicode by default and c strings are not. Lists in python are simply NOT c arrays.
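
To illustrate that mismatch, here is one way a list of python strings can be turned into a c-style array of char* with ctypes. This is only a sketch and not the code from my project; the helper names to_c_string_array and from_c_string_array are made up, and UTF-8 is an assumed encoding.

```python
import ctypes


def to_c_string_array(strings, encoding="utf-8"):
    """Convert a list of Python str into a C array of char* (NUL-terminated bytes)."""
    # Each str must be encoded to bytes first: C strings are byte sequences.
    encoded = [s.encode(encoding) for s in strings]
    array_type = ctypes.c_char_p * len(encoded)
    return array_type(*encoded)


def from_c_string_array(array, encoding="utf-8"):
    """Read the char* entries back into a list of Python str."""
    return [item.decode(encoding) for item in array]
```

The array returned by to_c_string_array(["héllo", "wörld"]) can be passed to a C function declared as taking char ** plus a length, and the explicit encode/decode steps are exactly where errors like UnicodeError tend to show up.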

Here is my verdict: use cython. It integrates more fluently with python and is easier to work with in general. When something goes wrong, ctypes just throws a segfault at you; cython at least gives you compile warnings with a stack trace whenever possible, and you can easily return a valid python object with cython.

Here is a detailed account of how much time I needed to invest in each of them to implement the same function. I have done very little C/C++ programming, by the way:

  • Ctypes:

    • About 2 hours researching how to transform my list of unicode strings into a c-compatible type.
    • About an hour on how to return a string properly from a c function. Here I actually posted my own solution on SO once I had written the functions.
    • About half an hour to write the code in c and compile it to a dynamic library.
    • 10 minutes to write test code in python to check whether the c code works.
    • About an hour of doing some tests and rearranging the c code.
    • Then I plugged the c code into the actual code base, and saw that ctypes does not play well with the multiprocessing module, as its handler is not picklable by default.
    • About 20 minutes rearranging my code to not use the multiprocessing module, and retrying.
    • Then the second function in my c code generated segfaults in my code base although it passed my testing code. Well, this is probably my fault for not checking edge cases well; I was looking for a quick solution.
    • For about 40 minutes I tried to determine possible causes of these segfaults.
    • I split my functions into two libraries and tried again. I still had segfaults for my second function.
    • I decided to let go of the second function and use only the first function of the c code, and at the second or third iteration of the python loop that uses it, I got a UnicodeError about not being able to decode a byte at some position, though I encoded and decoded everything explicitly.

At this point, I decided to search for an alternative and decided to look into cython:

  • Cython
    • 10 min reading the cython hello world.
    • 15 min checking SO on how to use cython with setuptools instead of distutils.
    • 10 min reading about cython types and python types. I learnt that I can use most of the builtin python types for static typing.
    • 15 min reannotating my python code with cython types.
    • 10 min modifying my setup.py to use the compiled module in my codebase.
    • Plugged the module directly into the multiprocessing version of the codebase. It works.

For the record, I did not, of course, measure the exact time I invested. It may very well be that my perception of time was a little off due to the mental effort required while I was dealing with ctypes. But it should convey the feel of dealing with cython and ctypes.


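Borg: Deduplicating archiver with compression and authenticated encryption

More screencasts: installation, advanced usage

What is BorgBackup?

BorgBackup (short: Borg) is a deduplicating backup program. Optionally, it supports compression and authenticated encryption.

The main goal of Borg is to provide an efficient and secure way to back up data. The data deduplication technique used makes Borg suitable for daily backups, since only changes are stored. The authenticated encryption technique makes it suitable for backups to targets that are not fully trusted.

See the installation manual or, if you have already downloaded Borg, docs/installation.rst to get started with Borg. There is also offline documentation available, in multiple formats.

Main features

Space-efficient storage
Deduplication based on content-defined chunking is used to reduce the number of bytes stored: each file is split into a number of variable-length chunks, and only chunks that have never been seen before are added to the repository.

A chunk is considered a duplicate if its id_hash value is identical. A cryptographically strong hash or MAC function is used as id_hash, e.g. (HMAC-)SHA256.

To deduplicate, all the chunks in the same repository are considered, no matter whether they come from different machines, from previous backups, from the same backup or even from the same single file.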

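Compared to other deduplication approaches, this method does NOT depend on:

  • file/directory names staying the same: you can move your data around without killing the deduplication, even between machines sharing a repository
  • complete files or timestamps staying the same: if a big file changes a little, only a few new chunks need to be stored, which is great for VMs or raw disks
  • the absolute position of a data chunk inside a file: data may get shifted and will still be found by the deduplication algorithm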
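
The idea behind content-defined chunking can be sketched in a few lines of Python. This is a simplified illustration, not Borg's actual chunker: a gear-style rolling hash is swapped in, plain SHA-256 plays the role of id_hash, and the names GEAR, BOUNDARY_MASK, MIN_CHUNK, chunks and Repository as well as all sizes are assumptions chosen for the example.

```python
import hashlib
import random

# Assumption: a gear-style rolling hash stands in for the real chunker;
# the table, mask and minimum size are illustrative values only.
_rng = random.Random(0)
GEAR = [_rng.getrandbits(32) for _ in range(256)]
BOUNDARY_MASK = (1 << 8) - 1   # a boundary candidate about every 256 bytes
MIN_CHUNK = 64                 # never cut chunks shorter than this


def chunks(data):
    """Split bytes into variable-length chunks at content-defined boundaries."""
    start, h = 0, 0
    for i, byte in enumerate(data):
        # Old bytes shift out of the hash, so a cut depends on nearby content only.
        h = ((h << 1) + GEAR[byte]) & 0xFFFFFFFF
        if i + 1 - start >= MIN_CHUNK and (h & BOUNDARY_MASK) == 0:
            yield data[start:i + 1]
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]


class Repository:
    """Toy chunk store keyed by id_hash (plain SHA-256 in this sketch)."""

    def __init__(self):
        self.store = {}

    def add(self, data):
        """Store data and return how many of its chunks were new."""
        new = 0
        for chunk in chunks(data):
            chunk_id = hashlib.sha256(chunk).hexdigest()
            if chunk_id not in self.store:
                self.store[chunk_id] = chunk
                new += 1
        return new
```

Because the cut points depend on the bytes themselves rather than on absolute offsets, adding the same data again stores nothing new, and adding it with a few bytes inserted at the front should only add a handful of chunks near the change.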
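Speed
  • performance-critical code (chunking, compression, encryption) is implemented in C/Cython
  • local caching of files/chunks index data
  • quick detection of unmodified files
Data encryption
All data can be protected using 256-bit AES encryption; data integrity and authenticity are verified using HMAC-SHA256. Data is encrypted client-side.
Obfuscation
Optionally, Borg can actively obfuscate e.g. the size of files/chunks to make fingerprinting attacks more difficult.
Compression
All data can optionally be compressed:
  • lz4 (super fast, low compression)
  • zstd (wide range from high speed and low compression to high compression and lower speed)
  • zlib (medium speed and compression)
  • lzma (low speed, high compression)
Off-site backups
Borg can store data on any remote host accessible over SSH. If Borg is installed on the remote host, big performance gains can be achieved compared to using a network filesystem (sshfs, nfs, ...).
Backups mountable as filesystems
Backup archives are mountable as userspace filesystems for easy interactive backup examination and restores (e.g. by using a regular file manager).
Easy installation on multiple platforms
We offer single-file binaries that do not require installing anything; you can just run them on these platforms:
  • Linux
  • Mac OS X
  • FreeBSD
  • OpenBSD and NetBSD (no xattrs/ACLs support or binaries yet)
  • Cygwin (experimental, no binaries yet)
  • Linux Subsystem of Windows 10 (experimental)
Free and Open Source Software
  • security and functionality can be audited independently
  • licensed under the BSD (3-clause) license, see License for the complete license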

Easy to use

Initialize a new backup repository (see borg init --help for encryption options):

$ borg init -e repokey /path/to/repo

Create a backup archive:

$ borg create /path/to/repo::Saturday1 ~/Documents

Now do another backup, just to show off the great deduplication:

$ borg create -v --stats /path/to/repo::Saturday2 ~/Documents
-----------------------------------------------------------------------------
Archive name: Saturday2
Archive fingerprint: 622b7c53c...
Time (start): Sat, 2016-02-27 14:48:13
Time (end):   Sat, 2016-02-27 14:48:14
Duration: 0.88 seconds
Number of files: 163
-----------------------------------------------------------------------------
               Original size      Compressed size    Deduplicated size
This archive:        6.85 MB              6.85 MB             30.79 kB  <-- !
All archives:       13.69 MB             13.71 MB              6.88 MB

               Unique chunks         Total chunks
Chunk index:             167                  330
-----------------------------------------------------------------------------

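For a graphical frontend, refer to our complementary project BorgWeb.

Helping, Donations and Bounties, becoming a Patron

Your help is always welcome!

Spread the word, give feedback, help with documentation, testing or development.

You can also give monetary support to the project, see there for details:

https://www.borgbackup.org/support/fund.html

Links

Compatibility notes

Expect that we will break compatibility repeatedly when the major release number changes (like when going from 0.x.y to 1.0.0 or from 1.x.y to 2.0.0).

Unreleased development versions have unknown compatibility properties.

This is software in development; decide for yourself whether it fits your needs.

Security issues should be reported to the Security contact (or see docs/support.rst in the source distribution).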

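spaCy: 💫 Industrial-strength Natural Language Processing (NLP) in Python

spaCy: Industrial-strength NLP

spaCy is a library for advanced Natural Language Processing in Python and Cython. It's built on the very latest research, and was designed from day one to be used in real products.

spaCy comes with pretrained pipelines and currently supports tokenization and training for 60+ languages. It features state-of-the-art speed and neural network models for tagging, parsing, named entity recognition, text classification and more, multi-task learning with pretrained transformers like BERT, as well as a production-ready training system and easy model packaging, deployment and workflow management. spaCy is commercial open-source software, released under the MIT license.

💫 Version 3.0 out now!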
Check out the release notes here.

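📖 Documentation

Documentation
⭐️ spaCy 101: New to spaCy? Here's everything you need to know!
📚 Usage Guides: How to use spaCy and its features.
🚀 New in v3.0: New features, backwards incompatibilities and migration guide.
🪐 Project Templates: End-to-end workflows you can clone, modify and run.
🎛 API Reference: The detailed reference for spaCy's API.
📦 Models: Download trained pipelines for spaCy.
🌌 Universe: Plugins, extensions, demos and books from the spaCy ecosystem.
👩‍🏫 Online Course: Learn spaCy in this free and interactive online course.
📺 Videos: Our YouTube channel with video tutorials, talks and more.
🛠 Changelog: Changes and version history.
💝 Contribute: How to contribute to the spaCy project and code base.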

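💬 Where to ask questions

The spaCy project is maintained by @honnibal, @ines, @svlandeg, @adrianeboyd and @polm. Please understand that we won't be able to provide individual support via email. We also believe that help is much more valuable if it's shared publicly, so that more people can benefit from it.

Type: Platforms
🚨 Bug Reports: GitHub Issue Tracker
🎁 Feature Requests & Ideas: GitHub Discussions
👩‍💻 Usage Questions: GitHub Discussions · Stack Overflow
🗯 General Discussion: GitHub Discussions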

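Features

  • Support for 60+ languages
  • Trained pipelines for different languages and tasks
  • Multi-task learning with pretrained transformers like BERT
  • Support for pretrained word vectors and embeddings
  • State-of-the-art speed
  • Production-ready training system
  • Linguistically-motivated tokenization
  • Components for named entity recognition, part-of-speech tagging, dependency parsing, sentence segmentation, text classification, lemmatization, morphological analysis, entity linking and more
  • Easily extensible with custom components and attributes
  • Support for custom models in PyTorch, TensorFlow and other frameworks
  • Built-in visualizers for syntax and NER
  • Easy model packaging, deployment and workflow management
  • Robust, rigorously evaluated accuracy

📖 For more details, see the facts, figures and benchmarks.

⏳ Install spaCy

For detailed installation instructions, see the documentation.

  • Operating system: macOS / OS X · Linux · Windows (Cygwin, MinGW, Visual Studio)
  • Python version: Python 3.6+ (only 64 bit)
  • Package managers: pip · conda (via conda-forge)

pip

Using pip, spaCy releases are available as source packages and binary wheels. Before you install spaCy and its dependencies, make sure that your pip, setuptools and wheel are up to date.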

pip install -U pip setuptools wheel
pip install spacy

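To install additional data tables for lemmatization and normalization, you can run pip install spacy[lookups] or install spacy-lookups-data separately. The lookups package is needed to create blank models with lemmatization data, and to lemmatize in languages that don't yet come with pretrained models and aren't powered by third-party libraries.

When using pip, it is generally recommended to install packages in a virtual environment to avoid modifying system state: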

python -m venv .env
source .env/bin/activate
pip install -U pip setuptools wheel
pip install spacy

conda

You can also install spaCy from conda via the conda-forge channel. For the feedstock, including the build recipe and configuration, check out this repository.

conda install -c conda-forge spacy

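Updating spaCy

Some updates to spaCy may require downloading new statistical models. If you're running spaCy v2.0 or higher, you can use the validate command to check whether your installed models are compatible and, if not, print details on how to update them: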

pip install -U spacy
python -m spacy validate

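If you've trained your own models, keep in mind that your training and runtime inputs must match. After updating spaCy, we recommend retraining your models with the new version.

📖 For details on upgrading from spaCy 2.x to spaCy 3.x, see the migration guide.

📦 Download model packages

Trained pipelines for spaCy can be installed as Python packages. This means that they're a component of your application, just like any other module. Models can be installed using spaCy's download command, or manually by pointing pip to a path or URL.

Documentation
Available Pipelines: Detailed pipeline descriptions, accuracy figures and benchmarks.
Models Documentation: Detailed usage and installation instructions.
Training: How to train your own pipelines on your data.
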
# Download best-matching version of specific model for your spaCy installation
python -m spacy download en_core_web_sm

# pip install .tar.gz archive or .whl from path or URL
pip install /Users/you/en_core_web_sm-3.0.0.tar.gz
pip install /Users/you/en_core_web_sm-3.0.0-py3-none-any.whl
pip install https://github.com/explosion/spacy-models/releases/download/en_core_web_sm-3.0.0/en_core_web_sm-3.0.0.tar.gz

Loading and using models

To load a model, use spacy.load() with the model name or a path to the model data directory.

import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp("This is a sentence.")

You can also import a model directly via its full name and then call its load() method with no arguments.

import spacy
import en_core_web_sm

nlp = en_core_web_sm.load()
doc = nlp("This is a sentence.")

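📖 For more info and examples, check out the models documentation.

⚒ Compile from source

The other way to install spaCy is to clone its GitHub repository and build it from source. That is the common way if you want to make changes to the code base. You'll need to make sure that you have a development environment consisting of a Python distribution including header files, a compiler, pip, virtualenv and git installed. The compiler part is the trickiest. How to do that depends on your system.

Platform
Ubuntu: Install system-level dependencies via apt-get: sudo apt-get install build-essential python-dev git
Mac: Install a recent version of XCode, including the so-called "Command Line Tools". macOS and OS X ship with Python and git preinstalled.
Windows: Install a version of the Visual C++ Build Tools or Visual Studio Express that matches the version that was used to compile your Python interpreter.

For more details and instructions, see the documentation on compiling spaCy from source and the quickstart widget to get the right commands for your platform and Python version.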

git clone https://github.com/explosion/spaCy
cd spaCy

python -m venv .env
source .env/bin/activate

# make sure you are using the latest pip
python -m pip install -U pip setuptools wheel

pip install -r requirements.txt
pip install --no-build-isolation --editable .

To install with extras:

pip install --no-build-isolation --editable .[lookups,cuda102]

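🚦 Run tests

spaCy comes with an extensive test suite. In order to run the tests, you'll usually want to clone the repository and build spaCy from source. This will also install the required development dependencies and test utilities defined in requirements.txt.

Alternatively, you can run pytest on the tests from within the installed spacy package. Don't forget to also install the test utilities via spaCy's requirements.txt: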

pip install -r requirements.txt
python -m pytest --pyargs spacy