Python Requests throwing SSLError

Question: Python Requests throwing SSLError

I’m working on a simple script that involves CAS, jspring security check, redirection, etc. I would like to use Kenneth Reitz’s python requests because it’s a great piece of work! However, CAS requires getting validated via SSL so I have to get past that step first. I don’t know what Python requests wants. Where is this SSL certificate supposed to reside?

Traceback (most recent call last):
  File "./test.py", line 24, in <module>
  response = requests.get(url1, headers=headers)
  File "build/bdist.linux-x86_64/egg/requests/api.py", line 52, in get
  File "build/bdist.linux-x86_64/egg/requests/api.py", line 40, in request
  File "build/bdist.linux-x86_64/egg/requests/sessions.py", line 209, in request 
  File "build/bdist.linux-x86_64/egg/requests/models.py", line 624, in send
  File "build/bdist.linux-x86_64/egg/requests/models.py", line 300, in _build_response
  File "build/bdist.linux-x86_64/egg/requests/models.py", line 611, in send
requests.exceptions.SSLError: [Errno 1] _ssl.c:503: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Answer 0

The problem you are having is caused by an untrusted SSL certificate.

Like @dirk mentioned in a previous comment, the quickest fix is setting verify=False:

requests.get('https://example.com', verify=False)

Please note that this will cause the certificate not to be verified. This will expose your application to security risks, such as man-in-the-middle attacks.

Of course, apply judgment. As mentioned in the comments, this may be acceptable for quick/throwaway applications/scripts, but really should not go to production software.

If just skipping the certificate check is not acceptable in your particular context, consider the following options. Your best option is to set the verify parameter to a string that is the path to the certificate’s .pem file (which you should obtain by some secure means).

So, as of version 2.0, the verify parameter accepts the following values, with their respective semantics:

  • True: causes the certificate to be validated against the library’s own trusted certificate authorities (Note: you can see which root certificates Requests uses via the Certifi library, a trust database of root certificates extracted from Requests: Certifi – Trust Database for Humans).
  • False: bypasses certificate validation completely.
  • Path to a CA_BUNDLE file for Requests to use to validate the certificates.

Source: Requests – SSL Cert Verification

Also take a look at the cert parameter on the same link.
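
As a rough sketch of how the values listed above might be chosen in a script: the helper below returns True when no bundle is given and otherwise the path to an existing CA bundle file. The names choose_verify and fetch are hypothetical and not part of Requests; only requests.get and its verify parameter come from the library.

```python
import os


def choose_verify(ca_bundle=None):
    """Return a value suitable for the `verify` argument of requests.get.

    Hypothetical helper: None -> True (use the default trusted CAs),
    otherwise the path to an existing CA bundle file.
    """
    if ca_bundle is None:
        return True
    if not os.path.isfile(ca_bundle):
        raise FileNotFoundError("CA bundle not found: %s" % ca_bundle)
    return ca_bundle


def fetch(url, ca_bundle=None):
    # Imported here so that choose_verify can be used without Requests installed.
    import requests
    return requests.get(url, verify=choose_verify(ca_bundle))
```

Failing early on a missing bundle path keeps a typo in the path from being mistaken for a certificate problem. Note that the sketch never returns False, so verification stays on.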


Answer 1

From requests documentation on SSL verification:

Requests can verify SSL certificates for HTTPS requests, just like a web browser. To check a host’s SSL certificate, you can use the verify argument:

>>> requests.get('https://kennethreitz.com', verify=True)

If you don’t want to verify the SSL certificate, set verify=False.


Answer 2

You can pass the name of the CA file to use via verify:

cafile = 'cacert.pem' # http://curl.haxx.se/ca/cacert.pem
r = requests.get(url, verify=cafile)

If you use verify=True then requests uses its own CA set, which might not include the CA that signed your server certificate.


Answer 3

$ pip install -U requests[security]

  • Tested on Python 2.7.6 @ Ubuntu 14.04.4 LTS
  • Tested on Python 2.7.5 @ MacOSX 10.9.5 (Mavericks)

When this question was opened (2012-05) the Requests version was 0.13.1. In version 2.4.1 (2014-09) the “security” extras were introduced, using the certifi package if available.

Right now (2016-09) the main version is 2.11.1, which works well without verify=False. There is no need to use requests.get(url, verify=False) if Requests is installed with the requests[security] extras.


Answer 4

I encountered the same issue, as well as an SSL certificate verify failed issue, when using AWS boto3. By reviewing the boto3 code, I found that REQUESTS_CA_BUNDLE was not set, so I fixed both issues by setting it manually:

from boto3.session import Session
import os

# debian
os.environ['REQUESTS_CA_BUNDLE'] = os.path.join(
    '/etc/ssl/certs/',
    'ca-certificates.crt')
# centos
#   'ca-bundle.crt')

For aws-cli, I guess setting REQUESTS_CA_BUNDLE in ~/.bashrc will fix this issue (not tested because my aws-cli works without it).

REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt # ca-bundle.crt
export REQUESTS_CA_BUNDLE
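
A minimal sketch of the same idea that works across the two distributions mentioned above: it sets REQUESTS_CA_BUNDLE to the first bundle file that actually exists. The function name export_ca_bundle and the candidate tuple are assumptions for illustration; the two paths are the ones from this answer.

```python
import os

# Bundle locations taken from the answer above.
CANDIDATE_BUNDLES = (
    "/etc/ssl/certs/ca-certificates.crt",  # Debian
    "/etc/ssl/certs/ca-bundle.crt",        # CentOS
)


def export_ca_bundle(candidates=CANDIDATE_BUNDLES, environ=os.environ):
    """Point REQUESTS_CA_BUNDLE at the first existing candidate file.

    Returns the value in effect afterwards, or None if the variable was
    unset and no candidate exists. An existing value is left untouched.
    """
    current = environ.get("REQUESTS_CA_BUNDLE")
    if current:
        return current
    for path in candidates:
        if os.path.isfile(path):
            environ["REQUESTS_CA_BUNDLE"] = path
            return path
    return None
```

Call export_ca_bundle() once at start-up, before the library that uses Requests makes its first HTTPS call.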

Answer 5

In case you have a library that relies on requests and you cannot modify the verify path (like with pyvmomi) then you’ll have to find the cacert.pem bundled with requests and append your CA there. Here’s a generic approach to find the cacert.pem location:

windows

C:\>python -c "import requests; print(requests.certs.where())"
c:\Python27\lib\site-packages\requests-2.8.1-py2.7.egg\requests\cacert.pem

linux

#  (py2.7.5,requests 2.7.0, verify not enforced)
root@host:~/# python -c "import requests; print(requests.certs.where())"
/usr/lib/python2.7/dist-packages/certifi/cacert.pem

#  (py2.7.10, verify enforced)
root@host:~/# python -c "import requests; print(requests.certs.where())"
/usr/local/lib/python2.7/dist-packages/requests/cacert.pem

btw. @requests-devs, bundling your own cacerts with requests is really, really annoying… especially the fact that you do not seem to use the system CA store first and that this is not documented anywhere.

update

In situations where you’re using a library and have no control over the ca-bundle location, you could also explicitly set the ca-bundle location to be your host-wide ca-bundle:

REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-bundle.crt python -c "import requests; requests.get('https://somesite.com')"
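
A variation on the "append your CA" step, sketched under the assumption that you would rather not edit the bundled cacert.pem in place (reinstalling or upgrading the package typically replaces that file). The function build_combined_bundle is hypothetical; it copies the base bundle to a new file and appends the extra CA there. The resulting file can then be used via REQUESTS_CA_BUNDLE as shown above.

```python
import shutil


def build_combined_bundle(base_bundle, extra_ca, combined_path):
    """Copy base_bundle to combined_path and append the PEM text of extra_ca."""
    shutil.copyfile(base_bundle, combined_path)
    with open(extra_ca, "r") as src, open(combined_path, "a") as dst:
        dst.write("\n")  # make sure the new certificate starts on its own line
        dst.write(src.read())
    return combined_path
```

For example, base_bundle could be the path printed by requests.certs.where() and extra_ca the .pem file of your internal CA.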

Answer 6

I faced the same problem using gspread, and these commands worked for me:

sudo pip uninstall -y certifi
sudo pip install certifi==2015.04.28

Answer 7

If you want to remove the warnings, use the code below.

import urllib3

urllib3.disable_warnings()

and pass verify=False to the requests.get or requests.post call.


Answer 8

I have found a specific approach for solving a similar issue. The idea is to point to the cacert file stored on the system and used by other SSL-based applications.

In Debian (I’m not sure if it is the same in other distributions) the certificate files (.pem) are stored in /etc/ssl/certs/. So, this is the code that worked for me:

import requests
verify='/etc/ssl/certs/cacert.org.pem'
response = requests.get('https://lists.cacert.org', verify=verify)

To work out which .pem file to choose, I browsed to the URL and checked which Certificate Authority (CA) had issued the certificate.

EDIT: if you cannot edit the code (because you are running a third-party app) you can try to add the pem certificate directly into /usr/local/lib/python2.7/dist-packages/requests/cacert.pem (e.g. copying it to the end of the file).


Answer 9

If you don’t care about certificate verification, just use verify=False.

import requests

url = "Write your url here"

returnResponse = requests.get(url, verify=False)

Answer 10

After hours of debugging I could only get this to work using the following packages:

requests[security]==2.7.0  # not 2.18.1
cryptography==1.9  # not 2.0

using OpenSSL 1.0.2g 1 Mar 2016

Without these packages verify=False was not working.

I hope this helps someone.


Answer 11

I ran into the same issue. Turns out I hadn’t installed the intermediate certificate on my server (just append it to the bottom of your certificate as seen below).

https://www.digicert.com/ssl-support/pem-ssl-creation.htm

Make sure you have the ca-certificates package installed:

sudo apt-get install ca-certificates

Updating the time may also resolve this:

sudo apt-get install ntpdate
sudo ntpdate -u ntp.ubuntu.com

If you’re using a self-signed certificate, you’ll probably have to add it to your system manually.
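
Why the clock matters: every certificate carries a notBefore and a notAfter time, so a machine whose clock is far off can see a perfectly good certificate as expired or not yet valid. The sketch below checks a timestamp against that window using only the standard library. The function name clock_within_validity is hypothetical; the date strings are in the format used by the notBefore/notAfter entries of the dict returned by ssl.SSLSocket.getpeercert().

```python
import ssl
import time


def clock_within_validity(not_before, not_after, now=None):
    """Return True if `now` (epoch seconds) lies inside the validity window.

    not_before / not_after look like "Jan  5 09:34:43 2018 GMT".
    """
    if now is None:
        now = time.time()  # the local system clock
    start = ssl.cert_time_to_seconds(not_before)
    end = ssl.cert_time_to_seconds(not_after)
    return start <= now <= end
```

If this returns False for a certificate that a browser on another machine accepts, checking the local clock is a reasonable next step.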


Answer 12

If the request calls are buried somewhere deep in the code and you do not want to install the server certificate, then, for debugging purposes only, it’s possible to monkeypatch requests:

import requests.api
import warnings


def requestspatch(method, url, **kwargs):
    kwargs['verify'] = False
    return _origcall(method, url, **kwargs)

_origcall = requests.api.request
requests.api.request = requestspatch
warnings.warn('Patched requests: SSL verification disabled!')

Never use in production!


Answer 13

Too late to the party I guess but I wanted to paste the fix for fellow wanderers like myself! So the following worked out for me on Python 3.7.x

Type the following in your terminal

pip install --upgrade certifi      # hold your breath..

Try running your script/requests again and see if it works (I’m sure it won’t be fixed yet!). If it didn’t work then try running the following command in the terminal directly

open /Applications/Python\ 3.6/Install\ Certificates.command  # please replace 3.6 here with your suitable python version

Answer 14

I fought this problem for HOURS.

I tried to update requests. Then I updated certifi. I pointed verify to certifi.where() (The code does this by default anyways). Nothing worked.

Finally I updated my version of python to python 2.7.11. I was on Python 2.7.5 which had some incompatibilities with the way that the certificates are verified. Once I updated Python (and a handful of other dependencies) it started working.


Answer 15

This is similar to @rafael-almeida’s answer, but I want to point out that as of requests 2.11+, there are not 3 values that verify can take; there are actually 4:

  • True: validates against requests’s internal trusted CAs.
  • False: bypasses certificate validation completely. (Not recommended)
  • Path to a CA_BUNDLE file. requests will use this to validate the server’s certificates.
  • Path to a directory containing public certificate files. requests will use this to validate the server’s certificates.

The rest of my answer is about #4, how to use a directory containing certificates to validate:

Obtain the public certificates needed and place them in a directory.

Strictly speaking, you probably “should” use an out-of-band method of obtaining the certificates, but you could also just download them using any browser.

If the server uses a certificate chain, be sure to obtain every single certificate in the chain.

According to the requests documentation, the directory containing the certificates must first be processed with the “rehash” utility (openssl rehash).

(This requires openssl 1.1.1+, and not all Windows openssl implementations support rehash. If openssl rehash won’t work for you, you could try running the rehash ruby script at https://github.com/ruby/openssl/blob/master/sample/c_rehash.rb, though I haven’t tried this.)

I had some trouble with getting requests to recognize my certificates, but after I used the openssl x509 -outform PEM command to convert the certs to Base64 .pem format, everything worked perfectly.

You can also just do lazy rehashing:

import subprocess
import requests

def get_with_lazy_rehash(url, auth=None, certs_dir="my_certs_dir"):  # hypothetical wrapper so that `return` is valid
    try:
        # As long as the certificates in the certs directory are in the OS's certificate store, `verify=True` is fine.
        return requests.get(url, auth=auth, verify=True)
    except requests.exceptions.SSLError:
        subprocess.run(["openssl", "rehash", "-compat", "-v", certs_dir], check=True)
        return requests.get(url, auth=auth, verify=certs_dir)

Answer 16

There is currently an issue in the requests module causing this error, present in v2.6.2 to v2.12.4 (ATOW): https://github.com/kennethreitz/requests/issues/2573

The workaround for this issue is to add the following line:

requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS = 'ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS'


Answer 17

As mentioned by @Rafael Almeida, the problem you are having is caused by an untrusted SSL certificate. In my case, the SSL certificate was untrusted by my server. To get around this without compromising security, I downloaded the certificate, and installed it on the server (by simply double clicking on the .crt file and then Install Certificate…).


Answer 18

It is not feasible to add options if requests is being called from another package. In that case adding certificates to the cacert bundle is the straight path, e.g. I had to add “StartCom Class 1 Primary Intermediate Server CA”, for which I downloaded the root cert into StartComClass1.pem. Given that my virtualenv is named caldav, I added the certificate with:

cat StartComClass1.pem >> .virtualenvs/caldav/lib/python2.7/site-packages/pip/_vendor/requests/cacert.pem
cat temp/StartComClass1.pem >> .virtualenvs/caldav/lib/python2.7/site-packages/requests/cacert.pem

One of those might be enough; I did not check.


Answer 19

I was having a similar or the same certificate validation problem. I read that OpenSSL versions less than 1.0.2, which requests depends upon, sometimes have trouble validating strong certificates (see here). CentOS 7 seems to use 1.0.1e, which seems to have the problem.

I wasn’t sure how to get around this problem on CentOS, so I decided to allow weaker 1024bit CA certificates.

import certifi # This should be already installed as a dependency of 'requests'
import requests
requests.get("https://example.com", verify=certifi.old_where())
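
Since several answers here come down to the Python or OpenSSL version in use, a quick diagnostic can save some guessing. The sketch below reports the versions seen by Python's standard ssl module. The function name tls_environment is hypothetical, and the result reflects the OpenSSL build that the ssl module is linked against, which may not be the only TLS library involved if extra packages such as pyOpenSSL are installed.

```python
import ssl
import sys


def tls_environment():
    """Collect the interpreter and OpenSSL versions relevant to certificate errors."""
    return {
        "python": sys.version.split()[0],
        "openssl": ssl.OPENSSL_VERSION,
        "openssl_version_info": ssl.OPENSSL_VERSION_INFO,
    }


if __name__ == "__main__":
    for key, value in tls_environment().items():
        print("%s: %s" % (key, value))
```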

Answer 20

I had to upgrade from Python 3.4.0 to 3.4.6

pyenv virtualenv 3.4.6 myvenv
pyenv activate myvenv
pip install -r requirements.txt

Answer 21

In my case the reason was fairly trivial.

I knew that the SSL verification had worked until a few days earlier, and was in fact still working on a different machine.

My next step was to compare the certificate contents and size between the machine on which verification was working, and the one on which it was not.

This quickly led to me determining that the Certificate on the ‘incorrectly’ working machine was not good, and once I replaced it with the ‘good’ cert, everything was fine.


A thread pool similar to the multiprocessing Pool?

Question: A thread pool similar to the multiprocessing Pool?

Is there a Pool class for worker threads, similar to the multiprocessing module’s Pool class?

I like for example the easy way to parallelize a map function

def long_running_func(p):
    c_func_no_gil(p)

p = multiprocessing.Pool(4)
xs = p.map(long_running_func, range(100))

however I would like to do it without the overhead of creating new processes.

I know about the GIL. However, in my usecase, the function will be an IO-bound C function for which the python wrapper will release the GIL before the actual function call.

Do I have to write my own threading pool?


Answer 0

I just found out that there actually is a thread-based Pool interface in the multiprocessing module, however it is hidden somewhat and not properly documented.

It can be imported via

from multiprocessing.pool import ThreadPool

It is implemented using a dummy Process class wrapping a python thread. This thread-based Process class can be found in multiprocessing.dummy which is mentioned briefly in the docs. This dummy module supposedly provides the whole multiprocessing interface based on threads.
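
A minimal sketch of the question's map example rewritten with ThreadPool. The body of long_running_func is a stand-in (a short sleep plus a return value) for the GIL-releasing C call, and run_in_threads is a hypothetical wrapper name.

```python
import time
from multiprocessing.pool import ThreadPool


def long_running_func(p):
    # Stand-in for an I/O-bound call that releases the GIL (assumption).
    time.sleep(0.01)
    return p * 2


def run_in_threads(values, workers=4):
    pool = ThreadPool(workers)
    try:
        # Same call as multiprocessing.Pool.map; results keep the input order.
        return pool.map(long_running_func, values)
    finally:
        pool.close()
        pool.join()
```

Because the workers are threads in the same process, no pickling of arguments or results is involved.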


Answer 1

In Python 3 you can use concurrent.futures.ThreadPoolExecutor, i.e.:

from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=10)
a = executor.submit(my_function)

See the docs for more info and examples.
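
A slightly fuller sketch of the same API, showing both submit and the map method that mirrors the question's Pool.map. The functions my_function, squares and square_one are placeholders for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def my_function(x):
    # Hypothetical task; replace with real I/O-bound work.
    return x * x


def squares(values, max_workers=10):
    # The with-block shuts the executor down once all tasks have finished.
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        # executor.map yields results in input order.
        return list(executor.map(my_function, values))


def square_one(value):
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(my_function, value)
        return future.result()  # blocks until the task has finished
```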


Answer 2

Yes, and it seems to have (more or less) the same API.

import multiprocessing
import multiprocessing.pool  # makes multiprocessing.pool.ThreadPool resolvable

def worker(lnk):
    ....    
def start_process():
    .....
....

if(PROCESS):
    pool = multiprocessing.Pool(processes=POOL_SIZE, initializer=start_process)
else:
    pool = multiprocessing.pool.ThreadPool(processes=POOL_SIZE, 
                                           initializer=start_process)

pool.map(worker, inputs)
....

Answer 3

For something very simple and lightweight (slightly modified from here):

try:
    from queue import Queue  # Python 3
except ImportError:
    from Queue import Queue  # Python 2
from threading import Thread


class Worker(Thread):
    """Thread executing tasks from a given tasks queue"""
    def __init__(self, tasks):
        Thread.__init__(self)
        self.tasks = tasks
        self.daemon = True
        self.start()

    def run(self):
        while True:
            func, args, kargs = self.tasks.get()
            try:
                func(*args, **kargs)
            except Exception as e:
                print(e)
            finally:
                self.tasks.task_done()


class ThreadPool:
    """Pool of threads consuming tasks from a queue"""
    def __init__(self, num_threads):
        self.tasks = Queue(num_threads)
        for _ in range(num_threads):
            Worker(self.tasks)

    def add_task(self, func, *args, **kargs):
        """Add a task to the queue"""
        self.tasks.put((func, args, kargs))

    def wait_completion(self):
        """Wait for completion of all the tasks in the queue"""
        self.tasks.join()

if __name__ == '__main__':
    from random import randrange
    from time import sleep

    delays = [randrange(1, 10) for i in range(100)]

    def wait_delay(d):
        print('sleeping for (%d)sec' % d)
        sleep(d)

    pool = ThreadPool(20)

    for i, d in enumerate(delays):
        pool.add_task(wait_delay, d)

    pool.wait_completion()

To support callbacks on task completion you can just add the callback to the task tuple.
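
One way the callback idea could look, as a Python 3 sketch: the task tuple gets a fourth element, and the worker calls it with the task's return value. The class names CallbackWorker and CallbackThreadPool and the callback keyword are assumptions for illustration, not part of the original classes.

```python
from queue import Queue
from threading import Thread


class CallbackWorker(Thread):
    """Worker that passes each task's return value to an optional callback."""

    def __init__(self, tasks):
        super().__init__(daemon=True)
        self.tasks = tasks
        self.start()

    def run(self):
        while True:
            func, args, kwargs, callback = self.tasks.get()
            try:
                result = func(*args, **kwargs)
                if callback is not None:
                    callback(result)  # runs in the worker thread
            except Exception as exc:
                print(exc)
            finally:
                self.tasks.task_done()


class CallbackThreadPool:
    """Pool of threads consuming (func, args, kwargs, callback) tuples."""

    def __init__(self, num_threads):
        self.tasks = Queue(num_threads)
        for _ in range(num_threads):
            CallbackWorker(self.tasks)

    def add_task(self, func, *args, callback=None, **kwargs):
        self.tasks.put((func, args, kwargs, callback))

    def wait_completion(self):
        self.tasks.join()
```

Since task_done is called only after the callback has run, wait_completion also waits for the callbacks. The callback executes in a worker thread, so anything it touches needs to be thread-safe.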


Answer 4

To use a thread pool in Python you can use this import:

from multiprocessing.dummy import Pool as ThreadPool

and then use it like this:

def run_tasks(service, tasks, threads):  # hypothetical wrapper so that `return` is valid
    pool = ThreadPool(threads)
    results = pool.map(service, tasks)
    pool.close()
    pool.join()
    return results

Here threads is the number of threads that you want, and tasks is a list of tasks that are each mapped to the service function.


Answer 5

Here’s the result I finally ended up using. It’s a modified version of the classes by dgorissen above.

File: threadpool.py

from queue import Queue, Empty
import threading
from threading import Thread


class Worker(Thread):
    """ Thread executing tasks from a given tasks queue. Thread is signalable,
        to exit
    """
    _TIMEOUT = 2  # seconds to wait for a task before re-checking the exit flag
    def __init__(self, tasks, th_num):
        Thread.__init__(self)
        self.tasks = tasks
        self.daemon, self.th_num = True, th_num
        self.done = threading.Event()
        self.start()

    def run(self):       
        while not self.done.is_set():
            try:
                func, args, kwargs = self.tasks.get(block=True,
                                                   timeout=self._TIMEOUT)
                try:
                    func(*args, **kwargs)
                except Exception as e:
                    print(e)
                finally:
                    self.tasks.task_done()
            except Empty as e:
                pass
        return

    def signal_exit(self):
        """ Signal to thread to exit """
        self.done.set()


class ThreadPool:
    """Pool of threads consuming tasks from a queue"""
    def __init__(self, num_threads, tasks=[]):
        self.tasks = Queue(num_threads)
        self.workers = []
        self.done = False
        self._init_workers(num_threads)
        for task in tasks:
            self.tasks.put(task)

    def _init_workers(self, num_threads):
        for i in range(num_threads):
            self.workers.append(Worker(self.tasks, i))

    def add_task(self, func, *args, **kwargs):
        """Add a task to the queue"""
        self.tasks.put((func, args, kwargs))

    def _close_all_threads(self):
        """ Signal all threads to exit and lose the references to them """
        for workr in self.workers:
            workr.signal_exit()
        self.workers = []

    def wait_completion(self):
        """Wait for completion of all the tasks in the queue"""
        self.tasks.join()

    def __del__(self):
        self._close_all_threads()


def create_task(func, *args, **kwargs):
    return (func, args, kwargs)

To use the pool

from random import randrange
from time import sleep

delays = [randrange(1, 10) for i in range(30)]

def wait_delay(d):
    print('sleeping for (%d)sec' % d)
    sleep(d)

pool = ThreadPool(20)
for i, d in enumerate(delays):
    pool.add_task(wait_delay, d)
pool.wait_completion()
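
For comparison, Python 3.2 and later include concurrent.futures.ThreadPoolExecutor in the standard library. The sketch below pushes the same kind of workload through it. The helper name run_all and the tiny delays are made up for illustration and are not part of the class above.

```python
from concurrent.futures import ThreadPoolExecutor
from time import sleep


def wait_delay(d):
    """Sleep for d seconds and return d so the caller can collect results."""
    sleep(d)
    return d


def run_all(delays, num_threads=20):
    """Run wait_delay over all delays on a pool; results come back in input order."""
    # Leaving the with-block waits for all submitted work to finish,
    # which plays the role of pool.wait_completion() above.
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        return list(pool.map(wait_delay, delays))


if __name__ == "__main__":
    print(run_all([0.01 * i for i in range(1, 6)]))
```

Whether this replaces a hand-written pool depends on how much control you need over the worker threads themselves.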

Answer 6

The overhead of creating the new processes is minimal, especially when it’s just 4 of them. I doubt this is a performance hot spot of your application. Keep it simple, optimize where you have to and where profiling results point to.


Answer 7

There is no built-in thread-based pool. However, it can be very quick to implement a producer/consumer queue with the Queue class.

From: https://docs.python.org/2/library/queue.html

from threading import Thread
from Queue import Queue
def worker():
    while True:
        item = q.get()
        do_work(item)
        q.task_done()

q = Queue()
for i in range(num_worker_threads):
    t = Thread(target=worker)
    t.daemon = True
    t.start()

for item in source():
    q.put(item)

q.join()       # block until all tasks are done
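
The snippet above is the Python 2 form (from Queue import Queue) and leaves do_work, source and num_worker_threads undefined. A self-contained Python 3 sketch of the same pattern might look like this. The wrapper process_all and its parameters are hypothetical, and it assumes do_work does not raise.

```python
from queue import Queue
from threading import Thread


def process_all(items, do_work, num_worker_threads=4):
    """Apply do_work to every item using daemon worker threads; return the results."""
    q = Queue()
    results = []  # list.append is atomic in CPython, so no extra lock is used here

    def worker():
        while True:
            item = q.get()
            try:
                results.append(do_work(item))
            finally:
                q.task_done()  # mark the item as handled so q.join() can return

    for _ in range(num_worker_threads):
        Thread(target=worker, daemon=True).start()

    for item in items:
        q.put(item)

    q.join()  # block until every queued item has been processed
    return results
```

Results arrive in completion order, not input order, so sort them or carry an index along if order matters.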

Split by comma and strip whitespace in Python

Question: Split by comma and strip whitespace in Python

I have some python code that splits on comma, but doesn’t strip the whitespace:

>>> string = "blah, lots  ,  of ,  spaces, here "
>>> mylist = string.split(',')
>>> print mylist
['blah', ' lots  ', '  of ', '  spaces', ' here ']

I would rather end up with whitespace removed like this:

['blah', 'lots', 'of', 'spaces', 'here']

I am aware that I could loop through the list and strip() each item but, as this is Python, I’m guessing there’s a quicker, easier and more elegant way of doing it.


Answer 0

Use list comprehension — simpler, and just as easy to read as a for loop.

my_string = "blah, lots  ,  of ,  spaces, here "
result = [x.strip() for x in my_string.split(',')]
# result is ["blah", "lots", "of", "spaces", "here"]

See: Python docs on List Comprehension
A good 2 second explanation of list comprehension.


Answer 1

Split using a regular expression. Note I made the case more general with leading spaces. The list comprehension is to remove the null strings at the front and back.

>>> import re
>>> string = "  blah, lots  ,  of ,  spaces, here "
>>> pattern = re.compile(r"^\s+|\s*,\s*|\s+$")
>>> print([x for x in pattern.split(string) if x])
['blah', 'lots', 'of', 'spaces', 'here']

This works even if ^\s+ doesn’t match:

>>> string = "foo,   bar  "
>>> print([x for x in pattern.split(string) if x])
['foo', 'bar']
>>>

Here’s why you need ^\s+:

>>> pattern = re.compile(r"\s*,\s*|\s+$")
>>> print([x for x in pattern.split(string) if x])
['  blah', 'lots', 'of', 'spaces', 'here']

See the leading spaces in blah?

Clarification: above uses the Python 3 interpreter, but results are the same in Python 2.
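
The same result can also be reached by stripping the ends first, so that the pattern only has to describe the comma. A minimal sketch, where the helper name split_and_strip is made up for illustration:

```python
import re


def split_and_strip(text):
    """Split text on commas, discarding whitespace around each piece."""
    # strip() handles the leading/trailing whitespace, so the pattern
    # only needs a comma with optional whitespace on either side.
    return re.split(r"\s*,\s*", text.strip())


if __name__ == "__main__":
    print(split_and_strip("  blah, lots  ,  of ,  spaces, here "))
```

This avoids the filtering step, because no empty strings are produced at the ends.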


Answer 2

I came to add:

map(str.strip, string.split(','))

but saw it had already been mentioned by Jason Orendorff in a comment.

Reading Glenn Maynard’s comment in the same answer suggesting list comprehensions over map I started to wonder why. I assumed he meant for performance reasons, but of course he might have meant for stylistic reasons, or something else (Glenn?).

So a quick (possibly flawed?) test on my box applying the three methods in a loop revealed:

[word.strip() for word in string.split(',')]
$ time ./list_comprehension.py 
real    0m22.876s

map(lambda s: s.strip(), string.split(','))
$ time ./map_with_lambda.py 
real    0m25.736s

map(str.strip, string.split(','))
$ time ./map_with_str.strip.py 
real    0m19.428s

making map(str.strip, string.split(',')) the winner, although it seems they are all in the same ballpark.

Certainly though map (with or without a lambda) should not necessarily be ruled out for performance reasons, and for me it is at least as clear as a list comprehension.

Edit:

Python 2.6.5 on Ubuntu 10.04
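
The timings above are from Python 2.6.5, and the scripts themselves are not shown. To repeat the comparison on your own machine, a sketch with timeit could look like the following. The function names are mine, and on Python 3 map returns a lazy iterator, so it is wrapped in list() to make the three variants do the same work.

```python
import timeit

text = "blah, lots  ,  of ,  spaces, here "


def with_comprehension():
    return [word.strip() for word in text.split(",")]


def with_map_lambda():
    # list() forces evaluation; on Python 3 map() alone does no work yet
    return list(map(lambda s: s.strip(), text.split(",")))


def with_map_str_strip():
    return list(map(str.strip, text.split(",")))


if __name__ == "__main__":
    for fn in (with_comprehension, with_map_lambda, with_map_str_strip):
        seconds = timeit.timeit(fn, number=100_000)
        print(f"{fn.__name__}: {seconds:.3f}s")
```

Absolute numbers will differ by interpreter and hardware, so only the relative ordering on one machine is meaningful.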


Answer 3

Just remove the white space from the string before you split it.

mylist = my_string.replace(' ','').split(',')
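
One caveat: replace(' ', '') also removes spaces inside the items, and it only touches the space character, not tabs or newlines. A small sketch of the difference, with helper names invented for illustration:

```python
def split_removing_spaces(text):
    """The approach from this answer: delete every space, then split."""
    return text.replace(" ", "").split(",")


def split_then_strip(text):
    """Split first, then strip each piece, leaving interior spaces alone."""
    return [piece.strip() for piece in text.split(",")]


sample = "New York , Los Angeles"
print(split_removing_spaces(sample))  # ['NewYork', 'LosAngeles']
print(split_then_strip(sample))       # ['New York', 'Los Angeles']
```

For single-word items like those in the question, both give the same list.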

Answer 4

I know this has already been answered, but if you end up doing this a lot, regular expressions may be a better way to go:

>>> import re
>>> re.sub(r'\s', '', string).split(',')
['blah', 'lots', 'of', 'spaces', 'here']

The \s matches any whitespace character, and we just replace it with an empty string ''. You can find more info here: http://docs.python.org/library/re.html#re.sub


Answer 5

import re
result=[x for x in re.split(',| ',your_string) if x!='']

This works fine for me.


Answer 6

re (as in regular expressions) allows splitting on multiple characters at once:

>>> import re
>>> string = "blah, lots  ,  of ,  spaces, here "
>>> re.split(', ', string)
['blah', 'lots  ', ' of ', ' spaces', 'here ']

This doesn’t work well for your example string, but works nicely for a comma-space separated list. For your example string, you can combine the re.split power to split on regex patterns to get a “split-on-this-or-that” effect.

>>> re.split('[, ]', string)
['blah',
 '',
 'lots',
 '',
 '',
 '',
 '',
 'of',
 '',
 '',
 '',
 'spaces',
 '',
 'here',
 '']

Unfortunately, that’s ugly, but a filter will do the trick:

>>> filter(None, re.split('[, ]', string))
['blah', 'lots', 'of', 'spaces', 'here']

Voila!
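
The transcript above shows filter returning a list, which is the Python 2 behaviour. On Python 3, filter returns a lazy iterator, so a sketch of the same idea needs a list() call around it:

```python
import re

string = "blah, lots  ,  of ,  spaces, here "

# filter(None, ...) drops every falsy item, i.e. the empty strings that
# re.split leaves between adjacent separators.
words = list(filter(None, re.split("[, ]", string)))
print(words)  # ['blah', 'lots', 'of', 'spaces', 'here']
```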


Answer 7

map(lambda s: s.strip(), mylist) would be a little better than explicitly looping. Or for the whole thing at once: map(lambda s:s.strip(), string.split(','))


Answer 8

s = 'bla, buu, jii'

sp = []
sp = s.split(',')
for st in sp:
    print st

Answer 9

import re
mylist = [x for x in re.compile(r'\s*[,\s]\s*').split(string) if x]

Simply put: split on a comma or a whitespace character, with or without whitespace before or after it, and drop the empty strings that leading or trailing whitespace leaves behind.

Please try!


Answer 10

map(lambda s: s.strip(), mylist) would be a little better than explicitly looping.
Or for the whole thing at once:

map(lambda s:s.strip(), string.split(','))

That’s basically everything you need.


How do I install the yaml package for Python?

Question: How do I install the yaml package for Python?

I have a Python program that uses YAML. I attempted to install it on a new server using pip install yaml and it returns the following:

$ sudo pip install yaml
Downloading/unpacking yaml
  Could not find any downloads that satisfy the requirement yaml
No distributions at all found for yaml
Storing complete log in /home/pa/.pip/pip.log

How do I install the yaml package for Python? I’m running Python 2.7. (OS: Debian Wheezy)


Answer 0

You could try the search feature in pip,

$ pip search yaml

which looks for packages in PyPI with yaml in the short description. That reveals various packages, including PyYaml, yamltools, and PySyck, among others (Note that PySyck docs recommend using PyYaml, since syck is out of date). Now you know a specific package name, you can install it:

$ pip install pyyaml

If you want to install python yaml system-wide in linux, you can also use a package manager, like aptitude or yum:

$ sudo apt-get install python-yaml
$ sudo yum install python-yaml
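
Part of the confusion in the question is that the distribution is named PyYAML (pip install pyyaml) while the module you import is yaml. A small check using only the standard library can tell you whether the import will succeed. The helper name module_available is made up for illustration.

```python
import importlib.util


def module_available(name):
    """Return True if a top-level module called `name` can be found for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # The package is installed as "pyyaml" but imported as "yaml".
    if module_available("yaml"):
        print("yaml is importable")
    else:
        print("yaml is missing; try: pip install pyyaml")
```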

Answer 1

pip install pyyaml

If you don’t have pip, run easy_install pip to install pip, which is the go-to package installer (see “Why use pip over easy_install?”). If you prefer to stick with easy_install, then run easy_install pyyaml.


Answer 2

Update: Nowadays installing is done with pip, but libyaml is still required to build the C extension (on mac):

brew install libyaml
python -m pip install pyyaml

Outdated method:

For MacOSX (mavericks), the following seems to work:

brew install libyaml
sudo python -m easy_install pyyaml

Answer 3

pip install PyYAML

If libyaml is not found or not compiled, PyYAML can do without it on Mavericks.


Answer 4

There are three YAML capable packages. Syck (pip install syck) which implements the YAML 1.0 specification from 2002; PyYAML (pip install pyyaml) which follows the YAML 1.1 specification from 2004; and ruamel.yaml which follows the latest (YAML 1.2, from 2009) specification.

You can install the YAML 1.2 compatible package with pip install ruamel.yaml or if you are running a modern version of Debian/Ubuntu (or derivative) with:

sudo apt-get install python-ruamel.yaml

Answer 5

Debian-based systems:

$ sudo aptitude install python-yaml

or, for Python 3:

$ sudo aptitude install python3-yaml


Answer 6

The following command installs PyYAML, which provides the yaml module:

pip install pyYaml

Answer 7

“There should be one — and preferably only one — obvious way to do it.” So let me add another one. This one is more like “install from sources” for Debian/Ubuntu, from https://github.com/yaml/pyyaml

Install libYAML and its headers:

sudo apt-get install libyaml-dev

Download the pyyaml sources:

wget http://pyyaml.org/download/pyyaml/PyYAML-3.13.tar.gz

Install from sources, (don’t forget to activate your venv):

. your/env/bin/activate
tar xzf PyYAML-3.13.tar.gz
cd PyYAML-3.13
(env)$ python setup.py install
(env)$ python setup.py test 

Answer 8

Use strictyaml instead

If you have the luxury of creating the yaml file yourself, or if you don’t require any of these features of regular yaml, I recommend using strictyaml instead of the standard pyyaml package.

In short, default yaml has some serious flaws in terms of security, interface, and predictability. strictyaml is a subset of the yaml spec that does not have those issues (and is better documented).

You can read more about the problems with regular yaml here

OPINION: strictyaml should be the default implementation of yaml and the old yaml spec should be obsoleted.


Answer 9

For me, installing the development version of libyaml did it.

yum install libyaml-devel         #centos
apt-get install libyaml-dev       # ubuntu

What does on_delete do on Django models?

Question: What does on_delete do on Django models?

I’m quite familiar with Django, but recently noticed there exists an on_delete=models.CASCADE option with the models, I have searched for the documentation for the same but couldn’t find anything more than:

Changed in Django 1.9:

on_delete can now be used as the second positional argument (previously it was typically only passed as a keyword argument). It will be a required argument in Django 2.0.

an example case of usage is

from django.db import models

class Car(models.Model):
    manufacturer = models.ForeignKey(
        'Manufacturer',
        on_delete=models.CASCADE,
    )
    # ...

class Manufacturer(models.Model):
    # ...
    pass

What does on_delete do? (I guess the actions to be done if the model is deleted)

What does models.CASCADE do? (any hints in documentation)

What other options are available (if my guess is correct)?

Where does the documentation for this reside?


Answer 0

This is the behaviour to adopt when the referenced object is deleted. It is not specific to Django; it comes from the SQL standard.

There are 6 possible actions to take when such event occurs:

  • CASCADE: When the referenced object is deleted, also delete the objects that have references to it (When you remove a blog post for instance, you might want to delete comments as well). SQL equivalent: CASCADE.
  • PROTECT: Forbid the deletion of the referenced object. To delete it you will have to delete all objects that reference it manually. SQL equivalent: RESTRICT.
  • SET_NULL: Set the reference to NULL (requires the field to be nullable). For instance, when you delete a User, you might want to keep the comments he posted on blog posts, but say it was posted by an anonymous (or deleted) user. SQL equivalent: SET NULL.
  • SET_DEFAULT: Set the default value. SQL equivalent: SET DEFAULT.
  • SET(...): Set a given value. This one is not part of the SQL standard and is entirely handled by Django.
  • DO_NOTHING: Probably a very bad idea since this would create integrity issues in your database (referencing an object that actually doesn’t exist). SQL equivalent: NO ACTION.

Source: Django documentation

See also the documentation of PostGreSQL for instance.

In most cases, CASCADE is the expected behaviour, but for every ForeignKey you should ask yourself what the expected behaviour is in that situation. PROTECT and SET_NULL are often useful. Setting CASCADE where it does not belong can cascade through your whole database and delete most of it when you delete a single user.


Additional note to clarify cascade direction

It’s funny to notice that the direction of the CASCADE action is not clear to many people. Actually, it’s funny that only the CASCADE action seems unclear. I understand the cascade behaviour might be confusing, but it works in the same direction as every other action. So if the direction of CASCADE is not clear to you, it really means the behaviour of on_delete itself is not clear to you.

In your database, a foreign key is basically represented by an integer field whose value is the primary key of the foreign object. Let’s say you have an entry comment_A, which has a foreign key to an entry article_B. If you delete comment_A, everything is fine: article_B used to live without comment_A and doesn’t care if it is deleted. However, if you delete article_B, then comment_A panics! It never lived without article_B and needs it; it is part of its attributes (article=article_B, but what is article_B now?). This is where on_delete steps in, to determine how to resolve this integrity error, by saying one of:

  • “No! Please! Don’t! I can’t live without you!” (this is PROTECT, or RESTRICT in SQL terms)
  • “Alright, if I’m not yours, then I’m nobody’s” (this is SET_NULL)
  • “Goodbye world, I can’t live without article_B”, and it deletes itself (this is the CASCADE behaviour).
  • “It’s OK, I’ve got a spare lover, I’ll reference article_C from now on” (SET_DEFAULT, or even SET(...)).
  • “I can’t face reality, I’ll keep calling your name even if that’s the only thing left to me!” (DO_NOTHING)

I hope it makes cascade direction clearer. :)
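
To make the direction concrete, here is a toy in-memory sketch of three of the behaviours. It is not how Django implements on_delete. The function delete_article, the dict layout and the local ProtectedError class are all invented for illustration.

```python
class ProtectedError(Exception):
    """Raised by this toy model when a PROTECT reference blocks a delete."""


def delete_article(article_id, articles, comments, on_delete):
    """Delete an article and resolve the comments that point at it.

    articles:  dict of article_id -> title
    comments:  dict of comment_id -> article_id (the "foreign key") or None
    on_delete: "CASCADE", "PROTECT" or "SET_NULL"
    """
    if on_delete not in ("CASCADE", "PROTECT", "SET_NULL"):
        raise ValueError(f"unsupported on_delete: {on_delete}")

    # Comments that reference the article being deleted.
    referencing = [cid for cid, aid in comments.items() if aid == article_id]

    if on_delete == "PROTECT" and referencing:
        raise ProtectedError(f"article {article_id} is still referenced")

    for cid in referencing:
        if on_delete == "CASCADE":
            del comments[cid]      # the comment goes away with the article
        else:
            comments[cid] = None   # SET_NULL: the comment survives, unlinked

    del articles[article_id]
```

Note that the action always applies to the referencing side (the comments). Deleting a comment never touches the article.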


Answer 1

The on_delete argument tells Django what to do with model instances that depend on the model instance you delete (e.g. through a ForeignKey relationship). on_delete=models.CASCADE tells Django to cascade the deletion, i.e. to delete the dependent model instances as well.

Here’s a more concrete example. Assume you have an Author model that is referenced by a ForeignKey in a Book model. Now, if you delete an instance of the Author model, Django would not know what to do with instances of the Book model that depend on that Author instance. The on_delete argument tells Django what to do in that case. Setting on_delete=models.CASCADE instructs Django to cascade the deletion, i.e. to delete all the Book instances that depend on the Author instance you deleted.

Note: on_delete will become a required argument in Django 2.0. In older versions it defaults to CASCADE.

Here’s the entire official documentation.


Answer 2

FYI, the on_delete parameter in models is backwards from what it sounds like. You put on_delete on a Foreign Key (FK) on a model to tell django what to do if the FK entry that you are pointing to on your record is deleted. The options our shop have used the most are PROTECT, CASCADE, and SET_NULL. Here are the basic rules I have figured out:

  1. Use PROTECT when your FK is pointing to a look-up table that really shouldn’t be changing and that certainly should not cause your table to change. If anyone tries to delete an entry on that look-up table, PROTECT prevents them from deleting it if it is tied to any records. It also prevents django from deleting your record just because it deleted an entry on a look-up table. This last part is critical. If someone were to delete the gender “Female” from my Gender table, I CERTAINLY would NOT want that to instantly delete any and all people I had in my Person table who had that gender.
  2. Use CASCADE when your FK is pointing to a “parent” record. So, if a Person can have many PersonEthnicity entries (he/she can be American Indian, Black, and White), and that Person is deleted, I really would want any “child” PersonEthnicity entries to be deleted. They are irrelevant without the Person.
  3. Use SET_NULL when you do want people to be allowed to delete an entry on a look-up table, but you still want to preserve your record. For example, if a Person can have a HighSchool, but it doesn’t really matter to me if that high school goes away on my look-up table, I would say on_delete=SET_NULL. This would leave my Person record out there; it would just set the high-school FK on my Person to null. Obviously, you will have to allow null=True on that FK.

Here is an example of a model that does all three things:

class PurchPurchaseAccount(models.Model):
    id = models.AutoField(primary_key=True)
    purchase = models.ForeignKey(PurchPurchase, null=True, db_column='purchase', blank=True, on_delete=models.CASCADE) # If "parent" rec gone, delete "child" rec!!!
    paid_from_acct = models.ForeignKey(PurchPaidFromAcct, null=True, db_column='paid_from_acct', blank=True, on_delete=models.PROTECT) # Disallow lookup deletion & do not delete this rec.
    _updated = models.DateTimeField()
    _updatedby = models.ForeignKey(Person, null=True, db_column='_updatedby', blank=True, related_name='acctupdated_by', on_delete=models.SET_NULL) # Person records shouldn't be deleted, but if they are, preserve this PurchPurchaseAccount entry, and just set this person to null.

    def __unicode__(self):
        return str(self.paid_from_acct.display)
    class Meta:
        db_table = u'purch_purchase_account'

As a last tidbit, did you know that if you don’t specify on_delete (or didn’t), the default behavior is CASCADE? This means that if someone deleted a gender entry on your Gender table, any Person records with that gender were also deleted!

I would say, “If in doubt, set on_delete=models.PROTECT.” Then go test your application. You will quickly figure out which FKs should be labeled the other values without endangering any of your data.

Also, it is worth noting that on_delete=CASCADE is actually not added to any of your migrations, if that is the behavior you are selecting. I guess this is because it is the default, so putting on_delete=CASCADE is the same thing as putting nothing.


Answer 3

As mentioned earlier, CASCADE will delete the record that has a foreign key and references another object that was deleted. So for example if you have a real estate website and have a Property that references a City

class City(models.Model):
    # define model fields for a city
    pass  # placeholder so the class body is valid Python

class Property(models.Model):
    city = models.ForeignKey(City, on_delete = models.CASCADE)
    # define model fields for a property

and now when the City is deleted from the database, all associated Properties (eg. real estate located in that city) will also be deleted from the database

Now I also want to mention the merit of other options, such as SET_NULL or SET_DEFAULT or even DO_NOTHING. Basically, from the administration perspective, you want to “delete” those records. But you don’t really want them to disappear. For many reasons. Someone might have deleted it accidentally, or for auditing and monitoring. And plain reporting. So it can be a way to “disconnect” the property from a City. Again, it will depend on how your application is written.

For example, some applications have a “deleted” field which is 0 or 1. All their searches and list views, anything that can appear in reports or anywhere the user can access it from the front end, exclude anything where deleted == 1. However, you can still create a custom report or query that lists the deleted records, and even shows when each was last modified (another field) and by whom (i.e. who deleted it and when). That is very advantageous from the executive standpoint.

And don’t forget that you can revert accidental deletions as simple as deleted = 0 for those records.

My point is, if there is a functionality, there is always a reason behind it. Not always a good reason. But a reason. And often a good one too.
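
The “deleted” flag described above can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Django API. Record, visible, soft_delete and restore are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Record:
    name: str
    deleted: int = 0  # 0 = visible, 1 = soft-deleted


def visible(records):
    """Return only the records a normal list view or search should show."""
    return [r for r in records if r.deleted == 0]


def soft_delete(record):
    """Hide the record without removing it from storage."""
    record.deleted = 1


def restore(record):
    """Undo an accidental deletion."""
    record.deleted = 0


if __name__ == "__main__":
    rows = [Record("house"), Record("flat")]
    soft_delete(rows[0])
    print([r.name for r in visible(rows)])
    restore(rows[0])
    print([r.name for r in visible(rows)])
```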


Answer 4

Here is an answer to your question: why do we use on_delete?

When an object referenced by a ForeignKey is deleted, Django by default emulates the behavior of the SQL constraint ON DELETE CASCADE and also deletes the object containing the ForeignKey. This behavior can be overridden by specifying the on_delete argument. For example, if you have a nullable ForeignKey and you want it to be set null when the referenced object is deleted:

user = models.ForeignKey(User, blank=True, null=True, on_delete=models.SET_NULL)

The possible values for on_delete are found in django.db.models:

CASCADE: Cascade deletes; the default.

PROTECT: Prevent deletion of the referenced object by raising ProtectedError, a subclass of django.db.IntegrityError.

SET_NULL: Set the ForeignKey null; this is only possible if null is True.

SET_DEFAULT: Set the ForeignKey to its default value; a default for the ForeignKey must be set.


Answer 5

Let’s say you have two models, one named Person and another one named Companies.

By definition, one person can create more than one company.

Since a company can have one and only one person, we want all the companies associated with a person to be deleted when that person is deleted.

So, we start by creating a Person model, like this

class Person(models.Model):
    id = models.IntegerField(primary_key=True)
    name = models.CharField(max_length=20)

    def __str__(self):
        # id is an integer, so convert it before concatenating it with a str
        return str(self.id) + self.name

Then, the Companies model can look like this

class Companies(models.Model):
    title = models.CharField(max_length=20)
    description = models.CharField(max_length=10)
    person = models.ForeignKey(Person, related_name='persons', on_delete=models.CASCADE)

Notice the usage of on_delete=models.CASCADE in the Companies model. It deletes all of a person’s companies when the person that owns them (an instance of class Person) is deleted.


Answer 6

Re-orient your mental model of the functionality of “CASCADE” by thinking of adding a FK to an already existing cascade (i.e. a waterfall). The source of this waterfall is a Primary Key. Deletes flow down.

So if you define a FK’s on_delete as “CASCADE,” you’re adding this FK’s record to a cascade of deletes originating from the PK. The FK’s record may participate in this cascade or not (“SET_NULL”). In fact, a record with a FK may even prevent the flow of the deletes! Build a dam with “PROTECT.”
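
The waterfall picture can be sketched in plain Python. This is a toy in-memory simulation, not Django code: Row, ProtectedError, collect and delete below are hypothetical stand-ins, and the sketch assumes that a PROTECT reference anywhere in the cascade should stop the delete before anything is changed.

```python
# Toy simulation of on_delete behaviours. NOT Django; all names are hypothetical.
CASCADE, PROTECT, SET_NULL = "CASCADE", "PROTECT", "SET_NULL"


class ProtectedError(Exception):
    """Raised when a PROTECT reference blocks a delete."""


class Row:
    def __init__(self, name, parent=None, on_delete=CASCADE):
        self.name = name
        self.parent = parent        # plays the role of the foreign key
        self.on_delete = on_delete  # what happens when the parent is deleted


def collect(target, table, doomed, orphans):
    """Walk down from target, recording rows to delete and rows to null out."""
    doomed.append(target)
    for row in table:
        if row.parent is not target:
            continue
        if row.on_delete == PROTECT:
            # The dam: nothing has been modified yet, so the delete stops cleanly.
            raise ProtectedError(f"{target.name} is still referenced by {row.name}")
        if row.on_delete == SET_NULL:
            orphans.append(row)                    # steps out of the cascade
        else:
            collect(row, table, doomed, orphans)   # CASCADE: joins the waterfall


def delete(target, table):
    """Delete target from table, letting the delete flow down to referencing rows."""
    doomed, orphans = [], []
    collect(target, table, doomed, orphans)
    for row in orphans:
        row.parent = None
    for row in doomed:
        table.remove(row)
```

Deleting a row at the top of a chain removes every CASCADE row below it, leaves SET_NULL rows in place with their parent cleared, and raises if a PROTECT row is reached.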


Answer 7

Using CASCADE means telling Django to also delete the records that reference the deleted record. In the poll app example below: when a ‘Question’ gets deleted, the Choices that belong to this Question are deleted too.

e.g. Question: How did you hear about us? (Choices: 1. Friends 2. TV Ad 3. Search Engine 4. Email Promotion)

When you delete this question, all four of these choices are deleted from the table as well. Note the direction in which it flows: you don’t put on_delete=models.CASCADE in the Question model, you put it in Choice.

from django.db import models


class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date_published')

class Choice(models.Model):
    # Deleting a Question also deletes every Choice that points at it
    question = models.ForeignKey(Question, on_delete=models.CASCADE)
    choice_text = models.CharField(max_length=200)
    votes = models.IntegerField(default=0)

How do I get a Cron like scheduler in Python? [closed]

Question: How do I get a Cron like scheduler in Python? [closed]

I’m looking for a library in Python which will provide at and cron like functionality.

I’d quite like to have a pure Python solution, rather than relying on tools installed on the box; this way I can run on machines with no cron.

For those unfamiliar with cron: you can schedule tasks based upon an expression like:

 0 2 * * 7 /usr/bin/run-backup # run the backups at 0200 on Every Sunday
 0 9-17/2 * * 1-5 /usr/bin/purge-temps # run the purge temps command, every 2 hours between 9am and 5pm on Mondays to Fridays.

The cron time expression syntax is less important, but I would like to have something with this sort of flexibility.

If there isn’t something that does this for me out of the box, any suggestions for the building blocks to make something like this would be gratefully received.

Edit: I’m not interested in launching processes, just “jobs” also written in Python, i.e. Python functions. By necessity I think this would run in a different thread, but not in a different process.

To this end, I’m looking for the expressivity of the cron time expression, but in Python.

Cron has been around for years, but I’m trying to be as portable as possible. I cannot rely on its presence.


Answer 0

If you’re looking for something lightweight, check out schedule:

import schedule
import time

def job():
    print("I'm working...")

schedule.every(10).minutes.do(job)
schedule.every().hour.do(job)
schedule.every().day.at("10:30").do(job)

while 1:
    schedule.run_pending()
    time.sleep(1)

Disclosure: I’m the author of that library.
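
Since the question asks for jobs that run in a different thread rather than a different process, the polling loop above can be moved off the main thread. The following is a minimal sketch using only the standard library; run_in_background is a hypothetical helper name, and it assumes the jobs are short enough that running them one after another on a single worker thread is acceptable.

```python
import threading


def run_in_background(run_pending, interval=1.0):
    """Call run_pending() every `interval` seconds on a daemon thread.

    Returns a threading.Event; set it to stop the loop.
    """
    stop = threading.Event()

    def loop():
        # Event.wait doubles as an interruptible sleep:
        # it returns True as soon as stop is set, which ends the loop.
        while not stop.wait(interval):
            run_pending()

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

With the library shown above this would be used as stop = run_in_background(schedule.run_pending), and stop.set() ends the loop. Because the thread is a daemon, it does not keep the interpreter alive once the main thread exits.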


Answer 1

You could just use normal Python argument passing syntax to specify your crontab. For example, suppose we define an Event class as below:

from datetime import datetime, timedelta
import time

# Some utility classes / functions first
class AllMatch(set):
    """Universal set - match everything"""
    def __contains__(self, item): return True

allMatch = AllMatch()

def conv_to_set(obj):  # Allow single integer to be provided
    if isinstance(obj, int):  # Python 3: int also covers Python 2's long
        return set([obj])  # Single item
    if not isinstance(obj, set):
        obj = set(obj)
    return obj

# The actual Event class
class Event(object):
    def __init__(self, action, min=allMatch, hour=allMatch, 
                       day=allMatch, month=allMatch, dow=allMatch, 
                       args=(), kwargs={}):
        self.mins = conv_to_set(min)
        self.hours= conv_to_set(hour)
        self.days = conv_to_set(day)
        self.months = conv_to_set(month)
        self.dow = conv_to_set(dow)
        self.action = action
        self.args = args
        self.kwargs = kwargs

    def matchtime(self, t):
        """Return True if this event should trigger at the specified datetime"""
        return ((t.minute     in self.mins) and
                (t.hour       in self.hours) and
                (t.day        in self.days) and
                (t.month      in self.months) and
                (t.weekday()  in self.dow))

    def check(self, t):
        if self.matchtime(t):
            self.action(*self.args, **self.kwargs)

(Note: Not thoroughly tested)

Then your CronTab can be specified in normal python syntax as:

c = CronTab(
  Event(perform_backup, 0, 2, dow=6 ),
  Event(purge_temps, 0, range(9,18,2), dow=range(0,5))
)

This way you get the full power of Python’s argument mechanics (mixing positional and keyword args, and you can use symbolic names for weekdays and months).

The CronTab class would be defined as simply sleeping in minute increments, and calling check() on each event. (There are probably some subtleties with daylight savings time / timezones to be wary of though). Here’s a quick implementation:

class CronTab(object):
    def __init__(self, *events):
        self.events = events

    def run(self):
        t=datetime(*datetime.now().timetuple()[:5])
        while 1:
            for e in self.events:
                e.check(t)

            t += timedelta(minutes=1)
            while datetime.now() < t:
                time.sleep((t - datetime.now()).seconds)

A few things to note: Python’s weekday() is zero-indexed from Monday (0 = Monday, 6 = Sunday), whereas cron counts Sunday as 0 (or 7); months are 1 to 12 in both. Also, range excludes its last element, hence cron syntax like “1-5” (Monday to Friday) becomes range(0,5), i.e. [0,1,2,3,4]. If you prefer cron syntax, parsing it shouldn’t be too difficult, however.
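
As a rough illustration of how little parsing is needed, here is a sketch that expands a single cron field into a set of integers that the Event class accepts. parse_field and cron_dow_to_python are hypothetical helper names, and the sketch only handles numbers, ranges, steps, lists and "*"; month or weekday names and other extensions are not covered.

```python
def parse_field(field, lo, hi):
    """Expand one cron field such as "*", "5", "1-5", "*/15" or "9-17/2,20".

    lo and hi are the inclusive bounds that "*" stands for
    (for example 0 and 59 for the minute field).
    """
    values = set()
    for part in field.split(","):
        rng, _, step = part.partition("/")
        step = int(step) if step else 1
        if rng == "*":
            start, end = lo, hi
        elif "-" in rng:
            a, b = rng.split("-")
            start, end = int(a), int(b)
        else:
            start = end = int(rng)
        # Cron ranges include both ends, so add 1 for range()
        values.update(range(start, end + 1, step))
    return values


def cron_dow_to_python(days):
    """Map cron weekdays (0 or 7 = Sunday) to datetime.weekday() values (0 = Monday)."""
    return {(d - 1) % 7 for d in days}
```

The second cron line from the question could then be written as Event(purge_temps, parse_field("0", 0, 59), parse_field("9-17/2", 0, 23), dow=cron_dow_to_python(parse_field("1-5", 0, 7))).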


Answer 2

Maybe this came up only after the question was asked; I thought I’d just mention it for completeness’ sake: https://apscheduler.readthedocs.org/en/latest/


Answer 3

One thing I’ve seen in my searches is Python’s sched module, which might be the kind of thing you’re looking for.
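
For reference, sched only queues one-shot events, so a recurring job has to put itself back on the queue. The sketch below shows one way to do that; run_every is a hypothetical helper, and the repeats limit is only there so that scheduler.run() eventually returns.

```python
import sched
import time


def run_every(scheduler, interval, action, repeats):
    """Run `action` every `interval` seconds, `repeats` times in total."""
    def tick(remaining):
        action()
        if remaining > 1:
            # Re-arm: sched forgets an event once it has fired
            scheduler.enter(interval, 1, tick, argument=(remaining - 1,))

    scheduler.enter(interval, 1, tick, argument=(repeats,))
```

A typical setup would create the scheduler with sched.scheduler(time.monotonic, time.sleep), call run_every(s, 60, job, repeats=10) and then s.run(), which blocks until the queue is empty. Note that each delay is measured from the end of the previous run, so the schedule drifts by however long the action takes.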


Answer 4

“… Crontab module for read and writing crontab files and accessing the system cron automatically and simply using a direct API. …”

http://pypi.python.org/pypi/python-crontab

and also APScheduler, a python package. Already written & debugged.

http://packages.python.org/APScheduler/cronschedule.html


Answer 5

More or less the same as above, but concurrent using gevent :)

"""Gevent based crontab implementation"""

from datetime import datetime, timedelta
import gevent

# Some utility classes / functions first
def conv_to_set(obj):
    """Converts to set allowing single integer to be provided"""

    if isinstance(obj, int):  # Python 3: int also covers Python 2's long
        return set([obj])  # Single item
    if not isinstance(obj, set):
        obj = set(obj)
    return obj

class AllMatch(set):
    """Universal set - match everything"""
    def __contains__(self, item): 
        return True

allMatch = AllMatch()

class Event(object):
    """The Actual Event Class"""

    def __init__(self, action, minute=allMatch, hour=allMatch, 
                       day=allMatch, month=allMatch, daysofweek=allMatch, 
                       args=(), kwargs={}):
        self.mins = conv_to_set(minute)
        self.hours = conv_to_set(hour)
        self.days = conv_to_set(day)
        self.months = conv_to_set(month)
        self.daysofweek = conv_to_set(daysofweek)
        self.action = action
        self.args = args
        self.kwargs = kwargs

    def matchtime(self, t1):
        """Return True if this event should trigger at the specified datetime"""
        return ((t1.minute     in self.mins) and
                (t1.hour       in self.hours) and
                (t1.day        in self.days) and
                (t1.month      in self.months) and
                (t1.weekday()  in self.daysofweek))

    def check(self, t):
        """Check and run action if needed"""

        if self.matchtime(t):
            self.action(*self.args, **self.kwargs)

class CronTab(object):
    """The crontab implementation"""

    def __init__(self, *events):
        self.events = events

    def _check(self):
        """Check all events in separate greenlets"""

        t1 = datetime(*datetime.now().timetuple()[:5])
        for event in self.events:
            gevent.spawn(event.check, t1)

        t1 += timedelta(minutes=1)
        s1 = (t1 - datetime.now()).seconds + 1
        print("Checking again in %s seconds" % s1)
        job = gevent.spawn_later(s1, self._check)

    def run(self):
        """Run the cron forever"""

        self._check()
        while True:
            gevent.sleep(60)

import os 
def test_task():
    """Just an example that sends a bell and asd to all terminals"""

    os.system('echo asd | wall')  

cron = CronTab(
  Event(test_task, 22, 1 ),
  Event(test_task, 0, range(9,18,2), daysofweek=range(0,5)),
)
cron.run()

Answer 6

None of the listed solutions even attempt to parse a complex cron schedule string. So, here is my version, using croniter. Basic gist:

schedule = "*/5 * * * *" # Run every five minutes

nextRunTime = getNextCronRunTime(schedule)
while True:
     roundedDownTime = roundDownTime()
     if (roundedDownTime == nextRunTime):
         ####################################
         ### Do your periodic thing here. ###
         ####################################
         nextRunTime = getNextCronRunTime(schedule)
     elif (roundedDownTime > nextRunTime):
         # We missed an execution. Error. Re initialize.
         nextRunTime = getNextCronRunTime(schedule)
     sleepTillTopOfNextMinute()

Helper routines:

import time

from croniter import croniter
from datetime import datetime, timedelta

# Round time to the nearest multiple of dateDelta (by default, the nearest minute)
def roundDownTime(dt=None, dateDelta=timedelta(minutes=1)):
    roundTo = dateDelta.total_seconds()
    if dt is None: dt = datetime.now()
    seconds = (dt - dt.min).seconds
    rounding = (seconds+roundTo/2) // roundTo * roundTo
    return dt + timedelta(0,rounding-seconds,-dt.microsecond)

# Get next run time from now, based on schedule specified by cron string
def getNextCronRunTime(schedule):
    return croniter(schedule, datetime.now()).get_next(datetime)

# Sleep till the top of the next minute
def sleepTillTopOfNextMinute():
    t = datetime.utcnow()
    sleeptime = 60 - (t.second + t.microsecond/1000000.0)
    time.sleep(sleeptime)

Answer 7

I have modified the script.

  1. Easy to use:

    cron = Cron()
    cron.add('* * * * *'   , minute_task) # every minute
    cron.add('33 * * * *'  , day_task)    # every hour
    cron.add('34 18 * * *' , day_task)    # every day
    cron.run()
    
  2. Try to start task in the first second of a minute.

Code on Github


Answer 8

I have a minor fix for the CronTab class run method suggested by Brian.

The timing was out by one second leading to a one-second, hard loop at the end of each minute.

class CronTab(object):
    def __init__(self, *events):
        self.events = events

    def run(self):
        t=datetime(*datetime.now().timetuple()[:5])
        while 1:
            for e in self.events:
                e.check(t)

            t += timedelta(minutes=1)
            n = datetime.now()
            while n < t:
                s = (t - n).seconds + 1
                time.sleep(s)
                n = datetime.now()

Answer 9

There isn’t a “pure python” way to do this because some other process would have to launch python in order to run your solution. Every platform will have one or twenty different ways to launch processes and monitor their progress. On unix platforms, cron is the old standard. On Mac OS X there is also launchd, which combines cron-like launching with watchdog functionality that can keep your process alive if that’s what you want. Once python is running, then you can use the sched module to schedule tasks.


Answer 10

I know there are a lot of answers, but another solution could be to go with decorators. This is an example that repeats a function every day at a specific time. The cool thing about doing it this way is that you only need to add the syntactic sugar to the function you want to schedule:

@repeatEveryDay(hour=6, minutes=30)
def sayHello(name):
    print(f"Hello {name}")

sayHello("Bob") # Now this function will be invoked every day at 6.30 a.m

And the decorator will look like:

import datetime
import functools
import time

def repeatEveryDay(hour, minutes=0, seconds=0):
    """
    Decorator that will run the decorated function every day at that hour, minutes and seconds.
    :param hour: 0-23
    :param minutes: 0-59 (Optional)
    :param seconds: 0-59 (Optional)
    """
    def decoratorRepeat(func):

        @functools.wraps(func)
        def wrapperRepeat(*args, **kwargs):
            # One day between runs (the original used seconds=15, presumably for testing)
            td = datetime.timedelta(days=1)

            # Get the datetime of the first function call
            now = datetime.datetime.now()
            nextSent = datetime.datetime(now.year, now.month, now.day, hour, minutes, seconds)
            if nextSent < now:
                nextSent += td

            # Loop instead of recursing, so the call stack does not grow with every run
            while True:
                # Waiting till the next scheduled time
                while datetime.datetime.now() < nextSent:
                    time.sleep(1)

                # Call the function
                func(*args, **kwargs)

                # Get the datetime of the next function call
                nextSent += td

        return wrapperRepeat

    return decoratorRepeat
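
The date arithmetic inside the decorator can be isolated and tested without waiting. The sketch below is one way to compute the next daily run; next_daily_run is a hypothetical helper, and it assumes naive local datetimes, so a daylight saving change on the day in question would shift the result by an hour.

```python
from datetime import datetime, timedelta


def next_daily_run(now, hour, minute=0, second=0):
    """Return the first datetime at hour:minute:second that is strictly after `now`."""
    candidate = now.replace(hour=hour, minute=minute, second=second, microsecond=0)
    if candidate <= now:
        candidate += timedelta(days=1)  # today's slot has passed, use tomorrow's
    return candidate
```

Adding a timedelta of one day lets datetime handle month and year boundaries, which is easier to get right than incrementing the day field by hand.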

Answer 11

Brian’s solution is working quite well. However, as others have pointed out, there is a subtle bug in the run code. Also, I found it overly complicated for my needs.

Here is my simpler and functional alternative for the run code in case anybody needs it:

def run(self):
    while 1:
        t = datetime.now()
        for e in self.events:
            e.check(t)

        time.sleep(60 - t.second - t.microsecond / 1000000.0)
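
The sleep length can also be computed with datetime arithmetic, which keeps the fractional second that timedelta.seconds drops. This is a small sketch with a hypothetical helper name, seconds_until_next_minute; passing it a fresh datetime.now() taken after the jobs have run is an assumption about how it would be used, so that job run time is not added to the sleep.

```python
from datetime import datetime, timedelta


def seconds_until_next_minute(now):
    """Return the seconds from `now` to the start of the following minute."""
    next_minute = now.replace(second=0, microsecond=0) + timedelta(minutes=1)
    # total_seconds() keeps the fraction; .seconds would truncate it to an int
    return (next_minute - now).total_seconds()
```

In a run loop this would be called as time.sleep(seconds_until_next_minute(datetime.now())).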

Answer 12

Another trivial solution would be:

from aqcron import At
from time import sleep
from datetime import datetime

# Event scheduling
event_1 = At( second=5 )
event_2 = At( second=[0,20,40] )

while True:
    now = datetime.now()

    # Event check
    if now in event_1: print("event_1")
    if now in event_2: print("event_2")

    sleep(1)

And the class aqcron.At is:

# aqcron.py

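class At(object):
    def __init__(self, year=None,    month=None,
                 day=None,     weekday=None,
                 hour=None,    minute=None,
                 second=None):
        loc = locals()
        loc.pop("self")
        # Keep only the fields that were actually specified
        self.at = dict((k, v) for k, v in loc.items() if v is not None)

    def __contains__(self, now):
        for k in self.at.keys():
            value = getattr(now, k)
            if callable(value):  # weekday is a method on datetime, not an attribute
                value = value()
            try:
                # Field given as a collection, e.g. second=[0,20,40]
                if value not in self.at[k]: return False
            except TypeError:
                # Field given as a single value, e.g. second=5
                if self.at[k] != value: return False
        return True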

Answer 13

If you are looking for a distributed scheduler, you can check out https://github.com/sherinkurian/mani. It does need redis though, so it might not be what you are looking for. (Note that I am the author.) It was built to ensure fault tolerance by having the clock run on more than one node.


Answer 14

I don’t know if something like that already exists. It would be easy to write your own with time, datetime and/or calendar modules, see http://docs.python.org/library/time.html

The only concern for a python solution is that your job needs to be always running and possibly be automatically “resurrected” after a reboot, something for which you do need to rely on system dependent solutions.


Answer 15

You can check out PiCloud’s [1] Crons [2], but do note that your jobs won’t be running on your own machine. It’s also a service that you’ll need to pay for if you use more than 20 hours of compute time a month.

[1] http://www.picloud.com

[2] http://docs.picloud.com/cron.html


Answer 16

Using crontab on a server.

Python file name: hello.py

Step1: Create an sh file and name it s.sh

python3 /home/ubuntu/Shaurya/Folder/hello.py > /home/ubuntu/Shaurya/Folder/log.txt 2>&1

Step2: Open Crontab Editor

crontab -e

Step3: Add Schedule Time

Use Crontab Formatting

2 * * * * sudo sh /home/ubuntu/Shaurya/Folder/s.sh

This cron job will run at minute 2 of every hour.


Answer 17

I like how the pycron package solves this problem.

import pycron
import time

while True:
    if pycron.is_now('0 2 * * 0'):   # True Every Sunday at 02:00
        print('running backup')
    time.sleep(60)

Can a variable number of arguments be passed to a function?

Question: Can a variable number of arguments be passed to a function?

In a similar way to using varargs in C or C++:

fn(a, b)
fn(a, b, c, d, ...)

Answer 0

Yes. You can use *args as a non-keyword argument. You will then be able to pass any number of arguments.

def manyArgs(*arg):
  print("I was called with", len(arg), "arguments:", arg)

>>> manyArgs(1)
I was called with 1 arguments: (1,)
>>> manyArgs(1, 2, 3)
I was called with 3 arguments: (1, 2, 3)

As you can see, Python packs all the arguments into a single tuple.

For keyword arguments you need to accept those as a separate actual argument, as shown in Skurmedel’s answer.
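
The same star also works at the call site, in the opposite direction: it spreads an existing sequence into separate positional arguments. A minimal sketch, where many_args is a hypothetical variant of the function above that returns its result instead of printing it:

```python
def many_args(*args):
    """Collect every positional argument into the tuple `args`."""
    return len(args), args


numbers = [1, 2, 3]
packed = many_args(numbers)    # the list arrives as one single argument
spread = many_args(*numbers)   # the list is unpacked into three arguments
```

Here packed holds a count of 1 with the whole list as the only element, while spread holds a count of 3.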


Answer 1

Adding to unwind’s post:

You can send multiple key-value args too.

def myfunc(**kwargs):
    # kwargs is a dictionary.
    for k,v in kwargs.iteritems():
         print "%s = %s" % (k, v)

myfunc(abc=123, efh=456)
# abc = 123
# efh = 456

And you can mix the two:

def myfunc2(*args, **kwargs):
   for a in args:
       print a
   for k,v in kwargs.iteritems():
       print "%s = %s" % (k, v)

myfunc2(1, 2, 3, banan=123)
# 1
# 2
# 3
# banan = 123

They must be both declared and called in that order: the function signature must list *args before **kwargs, and in a call the positional arguments must come before the keyword arguments.
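
To make the ordering concrete, here is a small sketch, assuming Python 3. The function `describe` and its parameter names are hypothetical; the point is only where each kind of argument ends up:

```python
def describe(first, *args, **kwargs):
    # first   -> the one required positional argument
    # args    -> tuple of any extra positional arguments
    # kwargs  -> dict of any keyword arguments
    return first, args, kwargs


print(describe(1))
# (1, (), {})
print(describe(1, 2, 3, banan=123))
# (1, (2, 3), {'banan': 123})

# An existing list and dict can be unpacked at the call site in the same order.
extra = [2, 3]
options = {"banan": 123}
print(describe(1, *extra, **options))
# (1, (2, 3), {'banan': 123})
```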


回答 2

如果可以的话,Skurmedel的代码适用于python 2;使其适应python 3,将更iteritems改为items并添加括号print。这可能会阻止像我这样的初学者进入: AttributeError: 'dict' object has no attribute 'iteritems'并在其他位置搜索(例如,尝试使用NetworkX的write_shp()时错误“’dict’对象没有属性’iteritems”))。

def myfunc(**kwargs):
    # kwargs is a dictionary.
    for k, v in kwargs.items():
        print("%s = %s" % (k, v))

myfunc(abc=123, efh=456)
# abc = 123
# efh = 456

和:

def myfunc2(*args, **kwargs):
   for a in args:
       print(a)
   for k,v in kwargs.items():
       print("%s = %s" % (k, v))

myfunc2(1, 2, 3, banan=123)
# 1
# 2
# 3
# banan = 123

If I may, Skurmedel’s code is for Python 2; to adapt it to Python 3, change iteritems to items and add parentheses to print. That could prevent beginners like me from bumping into AttributeError: 'dict' object has no attribute 'iteritems' and searching elsewhere (e.g. Error “'dict' object has no attribute 'iteritems'” when trying to use NetworkX’s write_shp()) for why this is happening.

def myfunc(**kwargs):
    # kwargs is a dictionary.
    for k, v in kwargs.items():
        print("%s = %s" % (k, v))

myfunc(abc=123, efh=456)
# abc = 123
# efh = 456

and:

def myfunc2(*args, **kwargs):
   for a in args:
       print(a)
   for k,v in kwargs.items():
       print("%s = %s" % (k, v))

myfunc2(1, 2, 3, banan=123)
# 1
# 2
# 3
# banan = 123

回答 3

添加到其他优秀职位。

有时,您不想指定参数的数目,不想为它们使用键(如果在方法中未使用在字典中传递的一个参数,则编译器会抱怨)。

def manyArgs1(args):
    print(args.a, args.b)  # note args.c is not used here

def manyArgs2(args):
    print(args.c)  # note args.a and args.b are not used here

class Args: pass

args = Args()
args.a = 1
args.b = 2
args.c = 3

manyArgs1(args) #outputs 1 2
manyArgs2(args) #outputs 3

然后你可以做类似的事情

myfuns = [manyArgs1, manyArgs2]
for fun in myfuns:
  fun(args)

Adding to the other excellent posts.

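Sometimes you don’t want to specify the number of arguments but still want to refer to them by name. (With **kwargs unpacking, Python raises a TypeError if the dictionary contains a key that the function does not accept.) One option is to pass a single object that carries the arguments as attributes: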

def manyArgs1(args):
    print(args.a, args.b)  # note args.c is not used here

def manyArgs2(args):
    print(args.c)  # note args.a and args.b are not used here

class Args: pass

args = Args()
args.a = 1
args.b = 2
args.c = 3

manyArgs1(args) #outputs 1 2
manyArgs2(args) #outputs 3

Then you can do things like

myfuns = [manyArgs1, manyArgs2]
for fun in myfuns:
  fun(args)
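
The standard library already ships a ready-made attribute container, `types.SimpleNamespace`, that can stand in for the empty `Args` class. A sketch assuming Python 3.3 or later; the function names are illustrative only:

```python
from types import SimpleNamespace


def uses_a_and_b(args):
    # Reads only the attributes it needs; args.c is ignored.
    return args.a + args.b


def uses_c(args):
    # Reads only args.c.
    return args.c


args = SimpleNamespace(a=1, b=2, c=3)

for func in (uses_a_and_b, uses_c):
    print(func.__name__, "->", func(args))
# uses_a_and_b -> 3
# uses_c -> 3
```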

回答 4

def f(dic):
    # dict.get returns None for a missing key, so absent keys print as None.
    values = [dic.get(key) for key in ('a', 'b', 'c')]
    print(*values)

f({})
f({'a':20,
   'c':30})
f({'a':20,
   'c':30,
   'b':'red'})
____________

上面的代码将输出

None None None
20 None 30
20 red 30

这就像通过字典传递变量参数一样好

def f(dic):
    # dict.get returns None for a missing key, so absent keys print as None.
    values = [dic.get(key) for key in ('a', 'b', 'c')]
    print(*values)

f({})
f({'a':20,
   'c':30})
f({'a':20,
   'c':30,
   'b':'red'})
____________

The above code outputs:

None None None
20 None 30
20 red 30

This is as good as passing a variable number of arguments, with a dictionary carrying them.


回答 5

除了已经提到的好答案之外,另一种解决方法还取决于您可以按位置传递可选的命名参数这一事实。例如,

def f(x,y=None):
    print(x)
    if y is not None:
        print(y)

Yield

In [11]: f(1,2)
1
2

In [12]: f(1)
1

Another way to go about it, besides the nice answers already mentioned, depends upon the fact that you can pass optional named arguments by position. For example,

def f(x,y=None):
    print(x)
    if y is not None:
        print(y)

Yields

In [11]: f(1,2)
1
2

In [12]: f(1)
1

不带参数的Python argparse命令行标志

问题:不带参数的Python argparse命令行标志

如何在命令行参数中添加可选标志?

例如。所以我可以写

python myprog.py 

要么

python myprog.py -w

我试过了

parser.add_argument('-w')

但是我收到一条错误消息说

Usage [-w W]
error: argument -w: expected one argument

我认为这意味着它需要-w选项的参数值。接受旗帜的方式是什么?

我在这个问题上发现http://docs.python.org/library/argparse.html相当不透明。

How do I add an optional flag to my command line args?

eg. so I can write

python myprog.py 

or

python myprog.py -w

I tried

parser.add_argument('-w')

But I just get an error message saying

Usage [-w W]
error: argument -w: expected one argument

which I take it means that it wants an argument value for the -w option. What’s the way of just accepting a flag?

I’m finding http://docs.python.org/library/argparse.html rather opaque on this question.


回答 0

如您所愿,参数w在命令行中的-w之后需要一个值。如果您只是想通过设置变量True或来翻转开关False,请访问http://docs.python.org/dev/library/argparse.html#action(特别是store_true和store_false)

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-w', action='store_true')

其中action='store_true'暗示default=False

相反,您可能有action='store_false',这意味着default=True

As you have it, the argument w is expecting a value after -w on the command line. If you are just looking to flip a switch by setting a variable True or False, have a look at http://docs.python.org/dev/library/argparse.html#action (specifically store_true and store_false)

import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-w', action='store_true')

where action='store_true' implies default=False.

Conversely, you could have action='store_false', which implies default=True.
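
Both actions can be tried without a real command line by handing `parse_args` an explicit list. A minimal sketch, assuming Python 3; the `--no-color` flag is a made-up example that is not part of the question:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('-w', action='store_true')            # default is False
parser.add_argument('--no-color', action='store_false')   # default is True

# Passing a list to parse_args() avoids reading the real sys.argv.
args = parser.parse_args([])
print(args.w, args.no_color)
# False True

args = parser.parse_args(['-w', '--no-color'])
print(args.w, args.no_color)
# True False
```

Note that argparse turns the dash in `--no-color` into an underscore for the attribute name.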


回答 1

添加一个快速片段以使其可以执行:

资料来源:myparser.py

import argparse
parser = argparse.ArgumentParser(description="Flip a switch by setting a flag")
parser.add_argument('-w', action='store_true')

args = parser.parse_args()
print(args.w)

用法:

python myparser.py -w
>> True

Adding a quick snippet to have it ready to execute:

Source: myparser.py

import argparse
parser = argparse.ArgumentParser(description="Flip a switch by setting a flag")
parser.add_argument('-w', action='store_true')

args = parser.parse_args()
print(args.w)

Usage:

python myparser.py -w
>> True

回答 2

这是一种快速的方法,sys尽管功能有限,但除了.. 之外不需要任何其他功能:

flag = "--flag" in sys.argv[1:]

[1:] 如果完整的文件名是 --flag

Here’s a quick way to do it that requires nothing besides sys, though functionality is limited:

import sys

flag = "--flag" in sys.argv[1:]

The [1:] slice skips sys.argv[0] (the script name), in case the script file itself is named --flag.


使用pip将Python软件包从本地文件系统文件夹安装到virtualenv

问题:使用pip将Python软件包从本地文件系统文件夹安装到virtualenv

是否可以使用本地文件系统中的pip安装软件包?

我已经python setup.py sdist为我的程序包运行了,该程序包已经创建了相应的tar.gz文件。该文件存储在我的系统上,位置为/srv/pkg/mypackage/mypackage-0.1.0.tar.gz

现在,在虚拟环境中,我想安装来自pypi或来自特定本地位置的软件包/srv/pkg

这可能吗?

PS 我知道我可以指定pip install /srv/pkg/mypackage/mypackage-0.1.0.tar.gz。可以,但是我正在谈论使用该/srv/pkg位置作为我输入时pip搜索的另一个位置pip install mypackage

Is it possible to install packages using pip from the local filesystem?

I have run python setup.py sdist for my package, which has created the appropriate tar.gz file. This file is stored on my system at /srv/pkg/mypackage/mypackage-0.1.0.tar.gz.

Now in a virtual environment I would like to install packages either coming from pypi or from the specific local location /srv/pkg.

Is this possible?

PS I know that I can specify pip install /srv/pkg/mypackage/mypackage-0.1.0.tar.gz. That will work, but I am talking about using the /srv/pkg location as another place for pip to search if I typed pip install mypackage.


回答 0

我很确定您正在寻找的东西称为--find-links选项。

虽然您可能需要index.html为本地软件包索引生成一个虚拟对象,该虚拟对象列出了所有软件包的链接。该工具有助于:

https://github.com/wolever/pip2pi

I am pretty sure that what you are looking for is the --find-links option.

Though you might need to generate a dummy index.html for your local package index which lists the links to all packages. This tool helps:

https://github.com/wolever/pip2pi


回答 1

关于什么::

pip install --help
...
  -e, --editable <path/url>   Install a project in editable mode (i.e. setuptools
                              "develop mode") from a local project path or a VCS url.

例如, pip install -e /srv/pkg

/srv/pkg是可在其中找到“setup.py”的顶级目录。

What about:

pip install --help
...
  -e, --editable <path/url>   Install a project in editable mode (i.e. setuptools
                              "develop mode") from a local project path or a VCS url.

eg, pip install -e /srv/pkg

where /srv/pkg is the top-level directory where ‘setup.py’ can be found.


回答 2

我正在安装,pyfuzzy但不在PyPI中;它返回消息:No matching distribution found for pyfuzzy

我尝试了接受的答案

pip install  --no-index --find-links=file:///Users/victor/Downloads/pyfuzzy-0.1.0 pyfuzzy

但它也不起作用,并返回以下错误:

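Ignoring indexes: https://pypi.python.org/simple
Collecting pyfuzzy
  Could not find a version that satisfies the requirement pyfuzzy (from versions: )
No matching distribution found for pyfuzzy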

最后,我找到了一个简单的好方法:https://pip.pypa.io/en/latest/reference/pip_install.html

Install a particular source archive file.
$ pip install ./downloads/SomePackage-1.0.4.tar.gz
$ pip install http://my.package.repo/SomePackage-1.0.4.zip

所以以下命令对我有用:

pip install ../pyfuzzy-0.1.0.tar.gz

希望它能对您有所帮助。

I am installing pyfuzzy, but it is not on PyPI; pip returns the message: No matching distribution found for pyfuzzy.

I tried the accepted answer

pip install  --no-index --find-links=file:///Users/victor/Downloads/pyfuzzy-0.1.0 pyfuzzy

But it does not work either and returns the following error:

Ignoring indexes: https://pypi.python.org/simple
Collecting pyfuzzy
  Could not find a version that satisfies the requirement pyfuzzy (from versions: )
No matching distribution found for pyfuzzy

At last, I found a simple way that works here: https://pip.pypa.io/en/latest/reference/pip_install.html

Install a particular source archive file.
$ pip install ./downloads/SomePackage-1.0.4.tar.gz
$ pip install http://my.package.repo/SomePackage-1.0.4.zip

So the following command worked for me:

pip install ../pyfuzzy-0.1.0.tar.gz

Hope it can help you.


回答 3

这是我最终使用的解决方案:

import pip


def install(package):
    # Debugging
    # pip.main(["install", "--pre", "--upgrade", "--no-index",
    #         "--find-links=.", package, "--log-file", "log.txt", "-vv"])
    pip.main(["install", "--upgrade", "--no-index", "--find-links=.", package])


if __name__ == "__main__":
    install("mypackagename")
    raw_input("Press Enter to Exit...\n")

我从pip安装示例以及Rikard另一个问题的回答中总结了这一点。“ –pre”参数使您可以安装非生产版本。“ –no-index”参数避免搜索PyPI索引。“ –find-links =”。参数在本地文件夹中搜索(可以是相对的也可以是绝对的)。我使用了“ –log-file”,“ log.txt”和“ -vv”参数进行调试。“ –upgrade”参数使您可以在较旧的版本上安装较新的版本。

我还找到了卸载它们的好方法。当您有多个不同的Python环境时,这很有用。这是相同的基本格式,只是使用“卸载”而不是“安装”,并采取了安全措施来防止意外卸载:

import pip


def uninstall(package):
    response = raw_input("Uninstall '%s'? [y/n]:\n" % package)
    if "y" in response.lower():
        # Debugging
        # pip.main(["uninstall", package, "-vv"])
        pip.main(["uninstall", package])
    pass


if __name__ == "__main__":
    uninstall("mypackagename")
    raw_input("Press Enter to Exit...\n")

本地文件夹包含以下文件:install.py,uninstall.py,mypackagename-1.0.zip

This is the solution that I ended up using:

import pip


def install(package):
    # Debugging
    # pip.main(["install", "--pre", "--upgrade", "--no-index",
    #         "--find-links=.", package, "--log-file", "log.txt", "-vv"])
    pip.main(["install", "--upgrade", "--no-index", "--find-links=.", package])


if __name__ == "__main__":
    install("mypackagename")
    raw_input("Press Enter to Exit...\n")

I pieced this together from pip install examples as well as from Rikard’s answer on another question. The "--pre" argument lets you install non-production versions. The "--no-index" argument avoids searching the PyPI indexes. The "--find-links=." argument searches in the local folder (this can be relative or absolute). I used the "--log-file", "log.txt", and "-vv" arguments for debugging. The "--upgrade" argument lets you install newer versions over older ones.

I also found a good way to uninstall them. This is useful when you have several different Python environments. It’s the same basic format, just using “uninstall” instead of “install”, with a safety measure to prevent unintended uninstalls:

import pip


def uninstall(package):
    response = raw_input("Uninstall '%s'? [y/n]:\n" % package)
    if "y" in response.lower():
        # Debugging
        # pip.main(["uninstall", package, "-vv"])
        pip.main(["uninstall", package])
    pass


if __name__ == "__main__":
    uninstall("mypackagename")
    raw_input("Press Enter to Exit...\n")

The local folder contains these files: install.py, uninstall.py, mypackagename-1.0.zip
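
Newer pip releases no longer expose `pip.main` as a public entry point, and pip's documentation recommends running pip as a subprocess instead. Below is a hedged sketch of the same install step done that way, assuming Python 3. `build_install_command` and `install` are illustrative helper names, and `mypackagename` is the placeholder used above:

```python
import subprocess
import sys


def build_install_command(package, find_links="."):
    # Run pip through the current interpreter so the package lands in the
    # same environment (for example the active virtualenv).
    return [
        sys.executable, "-m", "pip", "install",
        "--upgrade", "--no-index", "--find-links", find_links, package,
    ]


def install(package, find_links="."):
    # check_call raises CalledProcessError if pip exits with an error.
    subprocess.check_call(build_install_command(package, find_links))


if __name__ == "__main__":
    # Only print the command here; call install("mypackagename") to run it.
    print(" ".join(build_install_command("mypackagename")))
```

Keeping the command construction separate from the call makes it easy to inspect or log what will be run before anything is installed.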


回答 4

--find-links选项可以完成这项工作,并且可以通过requirements.txt文件运行!

您可以将软件包归档文件放在某个文件夹中,并在不更改需求文件的情况下使用最新的归档文件,例如requirements

.
└───requirements.txt
└───requirements
    ├───foo_bar-0.1.5-py2.py3-none-any.whl
    ├───foo_bar-0.1.6-py2.py3-none-any.whl
    ├───wiz_bang-0.7-py2.py3-none-any.whl
    ├───wiz_bang-0.8-py2.py3-none-any.whl
    ├───base.txt
    ├───local.txt
    └───production.txt

现在requirements/base.txt放入:

--find-links=requirements
foo_bar
wiz_bang>=0.8

一种更新专有软件包的好方法,只需将新软件包放入文件夹中

这样,您可以通过一次调用同时从本地文件夹和pypi安装软件包:pip install -r requirements/production.txt

PS。请参阅我的cookiecutter-djangopackage分支,以了解如何拆分需求并使用基于文件夹的需求组织。

The --find-links option does the job, and it works from a requirements.txt file!

You can put package archives in some folder and take the latest one without changing the requirements file, for example requirements:

.
└───requirements.txt
└───requirements
    ├───foo_bar-0.1.5-py2.py3-none-any.whl
    ├───foo_bar-0.1.6-py2.py3-none-any.whl
    ├───wiz_bang-0.7-py2.py3-none-any.whl
    ├───wiz_bang-0.8-py2.py3-none-any.whl
    ├───base.txt
    ├───local.txt
    └───production.txt

Now in requirements/base.txt put:

--find-links=requirements
foo_bar
wiz_bang>=0.8

This is a neat way to update proprietary packages: just drop the new archive into the folder.

In this way you can install packages from a local folder AND PyPI with a single call: pip install -r requirements/production.txt

PS. See my cookiecutter-djangopackage fork to see how to split requirements and use folder based requirements organization.


回答 5

安装程序包页面,您可以简单地运行:

pip install /srv/pkg/mypackage

其中/ srv / pkg / mypackage是目录,包含setup.py


另外1,您可以从存档文件中安装它:

pip install ./mypackage-1.0.4.tar.gz

1 尽管在问题中指出,但由于其受欢迎程度,它也包括在内。

From the installing-packages page you can simply run:

pip install /srv/pkg/mypackage

where /srv/pkg/mypackage is the directory, containing setup.py.


Additionally [1], you can install it from the archive file:

pip install ./mypackage-1.0.4.tar.gz

[1] Although already noted in the question, this is included here because of its popularity.


回答 6

假设您有virtualenv和一个requirements.txt文件,则可以在此文件中定义获取软件包的位置:

# Published pypi packages 
PyJWT==1.6.4
email_validator==1.0.3
# Remote GIT repo package, this will install as django-bootstrap-themes
git+https://github.com/marquicus/django-bootstrap-themes#egg=django-bootstrap-themes
# Local GIT repo package, this will install as django-knowledge
git+file:///soft/SANDBOX/python/django/forks/django-knowledge#egg=django-knowledge

Assuming you have virtualenv and a requirements.txt file, then you can define inside this file where to get the packages:

# Published pypi packages 
PyJWT==1.6.4
email_validator==1.0.3
# Remote GIT repo package, this will install as django-bootstrap-themes
git+https://github.com/marquicus/django-bootstrap-themes#egg=django-bootstrap-themes
# Local GIT repo package, this will install as django-knowledge
git+file:///soft/SANDBOX/python/django/forks/django-knowledge#egg=django-knowledge

回答 7

其要求requirements.txt,并egg_dir作为目录

您可以构建本地缓存:

$ pip download -r requirements.txt -d eggs_dir

然后,使用该“缓存”非常简单,例如:

$ pip install -r requirements.txt --find-links=eggs_dir

With your requirements listed in requirements.txt and eggs_dir as a directory, you can build a local cache:

$ pip download -r requirements.txt -d eggs_dir

Then using that “cache” is as simple as:

$ pip install -r requirements.txt --find-links=eggs_dir


回答 8

要仅从本地安装,您需要2个选项:

  • --find-links:在哪里寻找依赖项。不需要file://其他人提到的前缀。
  • --no-index:不要在pypi索引中查找缺少的依赖项(未安装依赖项,也不在--find-links路径中)。

因此,您可以从任何文件夹运行以下文件:

pip install --no-index --find-links /srv/pkg /path/to/mypackage-0.1.0.tar.gz

如果您的mypackage设置正确,它将列出其所有依赖关系,并且如果您使用pip download下载了一系列依赖关系(即,依赖关系等的依赖关系),那么一切都会正常。

如果您想使用pypi索引(如果可以访问),但是如果不使用本地索引,则可以删除--no-index并添加--retries 0。在尝试检查pypi是否缺少依赖项(未安装依赖项)时,您会看到pip暂停一会儿,当它发现无法达到它时,将回落到本地。似乎没有办法告诉pip“先查找本地索引,然后查找索引”。

To install only from local you need 2 options:

  • --find-links: where to look for dependencies. There is no need for the file:// prefix mentioned by others.
  • --no-index: do not look in pypi indexes for missing dependencies (dependencies not installed and not in the --find-links path).

So you could run from any folder the following:

pip install --no-index --find-links /srv/pkg /path/to/mypackage-0.1.0.tar.gz

If your mypackage is set up properly, it will list all its dependencies, and if you used pip download to download the cascade of dependencies (i.e. dependencies of dependencies, etc.), everything will work.

If you want to use the PyPI index when it is accessible but fall back to local wheels when it is not, you can remove --no-index and add --retries 0. You will see pip pause for a bit while it tries to check PyPI for a missing dependency (one not installed); when it finds it cannot reach the index, it falls back to the local files. There does not seem to be a way to tell pip to “look for local ones first, then the index”.


回答 9

我一直在尝试实现一些非常简单且失败的尝试,也许我很愚蠢。

无论如何,如果您有一个script / Dockerfile可以下载python软件包zip文件(例如从GitHub),然后要安装它,则可以使用file:///前缀来安装它,如以下示例所示:

$ wget https://example.com/mypackage.zip
$ echo "${MYPACKAGE_MD5}  mypackage.zip" | md5sum --check -
$ pip install file:///.mypackage.zip

注意:我知道您可以使用来立即安装软件包,pip install https://example.com/mypackage.zip但就我而言,我想验证校验和(永远不要偏执),并且在尝试使用pip提供的各种选项/ #md5片段时,我惨败。

直接使用进行如此简单的操作令人感到沮丧pip。我只是想通过一个校验和pip,并安装之前验证该zip匹配。

我可能做的很愚蠢,但最终我放弃了,选择了这样做。我希望它可以帮助其他尝试做类似事情的人。

I’ve been trying to achieve something really simple and failed miserably, probably I’m stupid.

Anyway, if you have a script/Dockerfile which downloads a Python package zip file (e.g. from GitHub) and you then want to install it, you can use the file:/// prefix to install it, as shown in the following example:

$ wget https://example.com/mypackage.zip
$ echo "${MYPACKAGE_MD5}  mypackage.zip" | md5sum --check -
$ pip install file:///.mypackage.zip

NOTE: I know you could install the package straight away using pip install https://example.com/mypackage.zip but in my case I wanted to verify the checksum (never paranoid enough) and I failed miserably when trying to use the various options that pip provides/the #md5 fragment.

It’s been surprisingly frustrating to do something so simple directly with pip. I just wanted to pass a checksum and have pip verify that the zip was matching before installing it.

I was probably doing something very stupid but in the end I gave up and opted for this. I hope it helps others trying to do something similar.
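
The checksum step can also be done in Python before handing the file to pip. A minimal sketch, assuming SHA-256 rather than the MD5 used above; `file_digest` and `verify` are hypothetical helper names, and the demo writes a throwaway file instead of downloading anything:

```python
import hashlib
import os
import tempfile


def file_digest(path, algorithm="sha256", chunk_size=65536):
    # Hash the file in chunks so large archives need not fit in memory.
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex, algorithm="sha256"):
    # True only when the file's digest matches the expected value.
    return file_digest(path, algorithm) == expected_hex.lower()


# Demo on a temporary file standing in for the downloaded archive.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example archive contents")
    archive_path = tmp.name

expected = hashlib.sha256(b"example archive contents").hexdigest()
print(verify(archive_path, expected))      # True
print(verify(archive_path, "0" * 64))      # False
os.remove(archive_path)
```

Only after `verify` returns True would the script go on to run pip install on the archive.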


numpy数组和矩阵有什么区别?我应该使用哪一个?

问题:numpy数组和矩阵有什么区别?我应该使用哪一个?

每种都有哪些优点和缺点?

从我所看到的情况来看,如果需要,任何一个都可以替代另一个,所以我应该同时使用这两个还是应该仅使用其中之一?

程序的样式会影响我的选择吗?我正在使用numpy进行一些机器学习,因此确实有很多矩阵,但也有很多向量(数组)。

What are the advantages and disadvantages of each?

From what I’ve seen, either one can work as a replacement for the other if need be, so should I bother using both or should I stick to just one of them?

Will the style of the program influence my choice? I am doing some machine learning using numpy, so there are indeed lots of matrices, but also lots of vectors (arrays).


回答 0

根据官方文件,不再建议使用矩阵类,因为将来会删除它。

https://numpy.org/doc/stable/reference/generated/numpy.matrix.html

正如其他答案所指出的那样,您可以使用NumPy数组实现所有操作。

According to the official documentation, using the matrix class is no longer recommended, and it may be removed in the future.

https://numpy.org/doc/stable/reference/generated/numpy.matrix.html

As other answers already state, you can achieve all the same operations with NumPy arrays.


回答 1

numpy的矩阵是严格2维的,而numpy的阵列(ndarrays)是N维的。矩阵对象是ndarray的子​​类,因此它们继承了ndarray的所有属性和方法。

numpy矩阵的主要优点是它们为矩阵乘法提供了一种方便的表示法:如果a和b是矩阵,则a*b它们是矩阵乘积。

import numpy as np

a = np.mat('4 3; 2 1')
b = np.mat('1 2; 3 4')
print(a)
# [[4 3]
#  [2 1]]
print(b)
# [[1 2]
#  [3 4]]
print(a*b)
# [[13 20]
#  [ 5  8]]

另一方面,从Python 3.5开始,NumPy使用@运算符支持中缀矩阵乘法,因此您可以在Python> = 3.5中使用ndarrays实现相同的矩阵乘法便捷性。

import numpy as np

a = np.array([[4, 3], [2, 1]])
b = np.array([[1, 2], [3, 4]])
print(a@b)
# [[13 20]
#  [ 5  8]]

矩阵对象和ndarray都.T必须返回转置,但是矩阵对象也必须具有.H共轭转置和.I逆。

相反,numpy数组始终遵守以元素为单位应用操作的规则(除了新的@运算符)。因此,如果a和b是numpy数组,则a*b是通过逐元素相乘组成的数组:

c = np.array([[4, 3], [2, 1]])
d = np.array([[1, 2], [3, 4]])
print(c*d)
# [[4 6]
#  [6 4]]

要获得矩阵乘法的结果,请使用np.dot(或@在Python> = 3.5中,如上所示):

print(np.dot(c,d))
# [[13 20]
#  [ 5  8]]

**运营商还表现不同:

print(a**2)
# [[22 15]
#  [10  7]]
print(c**2)
# [[16  9]
#  [ 4  1]]

由于a是矩阵,所以a**2返回矩阵乘积a*a。由于c是ndarray,因此c**2返回一个ndarray,每个组件的元素均平方。

矩阵对象和ndarray之间还有其他技术差异(与np.ravel,项目选择和序列行为有关)。

numpy数组的主要优点是它们比二维矩阵更通用。当您需要3维数组时会发生什么?然后,您必须使用ndarray,而不是矩阵对象。因此,学习使用矩阵对象的工作量更大-您必须学习矩阵对象操作和ndarray操作。

编写一个将矩阵和数组混合在一起的程序会使您的生活变得困难,因为您必须跟踪变量是什么类型的对象,以免乘法返回您不期望的东西。

相反,如果仅使用ndarray,则可以执行矩阵对象可以执行的所有操作,以及更多操作,但功能/符号略有不同。

如果您愿意放弃NumPy矩阵产品表示法的视觉吸引力(使用python> = 3.5的ndarrays几乎可以优雅地实现),那么我认为NumPy数组绝对是可行的方法。

PS。当然,您实际上不必选择以牺牲另一个为代价,因为np.asmatrix和np.asarray允许您将一个转换为另一个(只要数组是二维的)。


还有就是与NumPy之间的差异大纲arraysVS NumPy的matrixES 这里

Numpy matrices are strictly 2-dimensional, while numpy arrays (ndarrays) are N-dimensional. Matrix objects are a subclass of ndarray, so they inherit all the attributes and methods of ndarrays.

The main advantage of numpy matrices is that they provide a convenient notation for matrix multiplication: if a and b are matrices, then a*b is their matrix product.

import numpy as np

a = np.mat('4 3; 2 1')
b = np.mat('1 2; 3 4')
print(a)
# [[4 3]
#  [2 1]]
print(b)
# [[1 2]
#  [3 4]]
print(a*b)
# [[13 20]
#  [ 5  8]]

On the other hand, as of Python 3.5, NumPy supports infix matrix multiplication using the @ operator, so you can achieve the same convenience of matrix multiplication with ndarrays in Python >= 3.5.

import numpy as np

a = np.array([[4, 3], [2, 1]])
b = np.array([[1, 2], [3, 4]])
print(a@b)
# [[13 20]
#  [ 5  8]]

Both matrix objects and ndarrays have .T to return the transpose, but matrix objects also have .H for the conjugate transpose, and .I for the inverse.

In contrast, numpy arrays consistently abide by the rule that operations are applied element-wise (except for the new @ operator). Thus, if a and b are numpy arrays, then a*b is the array formed by multiplying the components element-wise:

c = np.array([[4, 3], [2, 1]])
d = np.array([[1, 2], [3, 4]])
print(c*d)
# [[4 6]
#  [6 4]]

To obtain the result of matrix multiplication, you use np.dot (or @ in Python >= 3.5, as shown above):

print(np.dot(c,d))
# [[13 20]
#  [ 5  8]]

The ** operator also behaves differently:

print(a**2)
# [[22 15]
#  [10  7]]
print(c**2)
# [[16  9]
#  [ 4  1]]

Since a is a matrix, a**2 returns the matrix product a*a. Since c is an ndarray, c**2 returns an ndarray with each component squared element-wise.

There are other technical differences between matrix objects and ndarrays (having to do with np.ravel, item selection and sequence behavior).

The main advantage of numpy arrays is that they are more general than 2-dimensional matrices. What happens when you want a 3-dimensional array? Then you have to use an ndarray, not a matrix object. Thus, learning to use matrix objects is more work — you have to learn matrix object operations, and ndarray operations.

Writing a program that mixes both matrices and arrays makes your life difficult because you have to keep track of what type of object your variables are, lest multiplication return something you don’t expect.

In contrast, if you stick solely with ndarrays, then you can do everything matrix objects can do, and more, except with slightly different functions/notation.

If you are willing to give up the visual appeal of NumPy matrix product notation (which can be achieved almost as elegantly with ndarrays in Python >= 3.5), then I think NumPy arrays are definitely the way to go.

PS. Of course, you really don’t have to choose one at the expense of the other, since np.asmatrix and np.asarray allow you to convert one to the other (as long as the array is 2-dimensional).


There is a synopsis of the differences between NumPy arrays vs NumPy matrixes here.
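
For reference, the matrix-only conveniences mentioned above have plain-ndarray counterparts. A sketch assuming NumPy is installed and Python >= 3.5, reusing the 2x2 values from the examples above; the complex array `z` is made up for illustration:

```python
import numpy as np

c = np.array([[4, 3], [2, 1]])

# Matrix power (what a**2 does for np.matrix objects).
print(np.linalg.matrix_power(c, 2))
# [[22 15]
#  [10  7]]

# Inverse (the .I attribute of a matrix).
c_inv = np.linalg.inv(c)
print(np.allclose(c @ c_inv, np.eye(2)))   # True

# Conjugate transpose (the .H attribute of a matrix).
z = np.array([[1 + 2j, 3], [4, 5 - 1j]])
print(z.conj().T)
```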


回答 2

Scipy.org建议您使用数组:

*’array’或’matrix’?我应该使用哪个?-简短答案

使用数组。

  • 它们是numpy的标准向量/矩阵/张量类型。许多numpy函数返回数组,而不是矩阵。

  • 在逐元素运算和线性代数运算之间有明显的区别。

  • 如果愿意,可以有标准向量或行/列向量。

使用数组类型的唯一缺点是,您将不得不使用dot而不是*乘(减少)两个张量(标量积,矩阵向量乘法等)。

Scipy.org recommends that you use arrays:

*’array’ or ‘matrix’? Which should I use? – Short answer

Use arrays.

  • They are the standard vector/matrix/tensor type of numpy. Many numpy functions return arrays, not matrices.

  • There is a clear distinction between element-wise operations and linear algebra operations.

  • You can have standard vectors or row/column vectors if you like.

The only disadvantage of using the array type is that you will have to use dot instead of * to multiply (reduce) two tensors (scalar product, matrix vector multiplication etc.).


回答 3

只是将一个案例添加到unutbu的列表中。

与numpy矩阵或矩阵语言(如matlab)相比,numpy ndarray对我而言最大的实际差异之一是,在归约运算中未保留维。矩阵始终为2d,而数组的均值则少一维。

例如,矩阵或数组的行为不佳的行:

带矩阵

>>> m = np.mat([[1,2],[2,3]])
>>> m
matrix([[1, 2],
        [2, 3]])
>>> mm = m.mean(1)
>>> mm
matrix([[ 1.5],
        [ 2.5]])
>>> mm.shape
(2, 1)
>>> m - mm
matrix([[-0.5,  0.5],
        [-0.5,  0.5]])

带阵列

>>> a = np.array([[1,2],[2,3]])
>>> a
array([[1, 2],
       [2, 3]])
>>> am = a.mean(1)
>>> am.shape
(2,)
>>> am
array([ 1.5,  2.5])
>>> a - am #wrong
array([[-0.5, -0.5],
       [ 0.5,  0.5]])
>>> a - am[:, np.newaxis]  #right
array([[-0.5,  0.5],
       [-0.5,  0.5]])

我还认为混合数组和矩阵会带来很多“快乐的”调试时间。但是,就乘法而言,scipy.sparse矩阵始终是矩阵。

Just to add one case to unutbu’s list.

One of the biggest practical differences for me of numpy ndarrays compared to numpy matrices or matrix languages like matlab, is that the dimension is not preserved in reduce operations. Matrices are always 2d, while the mean of an array, for example, has one dimension less.

For example, to demean the rows of a matrix or array:

with matrix

>>> m = np.mat([[1,2],[2,3]])
>>> m
matrix([[1, 2],
        [2, 3]])
>>> mm = m.mean(1)
>>> mm
matrix([[ 1.5],
        [ 2.5]])
>>> mm.shape
(2, 1)
>>> m - mm
matrix([[-0.5,  0.5],
        [-0.5,  0.5]])

with array

>>> a = np.array([[1,2],[2,3]])
>>> a
array([[1, 2],
       [2, 3]])
>>> am = a.mean(1)
>>> am.shape
(2,)
>>> am
array([ 1.5,  2.5])
>>> a - am #wrong
array([[-0.5, -0.5],
       [ 0.5,  0.5]])
>>> a - am[:, np.newaxis]  #right
array([[-0.5,  0.5],
       [-0.5,  0.5]])

I also think that mixing arrays and matrices gives rise to many “happy” debugging hours. However, scipy.sparse matrices are always matrices in terms of operators like multiplication.
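
Reductions on ndarrays accept `keepdims=True`, which keeps the reduced axis with length 1 so the result broadcasts the way the matrix version does. A short sketch, assuming NumPy is installed:

```python
import numpy as np

a = np.array([[1, 2], [2, 3]])

# keepdims=True keeps the reduced axis, giving shape (2, 1) instead of (2,).
row_means = a.mean(axis=1, keepdims=True)
print(row_means.shape)
# (2, 1)

# The (2, 1) column broadcasts across each row, so every row is demeaned.
print(a - row_means)
# [[-0.5  0.5]
#  [-0.5  0.5]]
```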


回答 4

正如其他人提到的那样,也许它的主要优点matrix是它为矩阵乘法提供了一种方便的符号。

但是,在Python 3.5中,终于有了一个专用的infix运算符用于矩阵乘法@

在最新的NumPy版本中,它可以与ndarrays 一起使用:

A = numpy.ones((1, 3))
B = numpy.ones((3, 3))
A @ B

因此,如今,如果有更多疑问,您应该坚持ndarray

As others have mentioned, perhaps the main advantage of matrix was that it provided a convenient notation for matrix multiplication.

However, in Python 3.5 there is finally a dedicated infix operator for matrix multiplication: @.

With recent NumPy versions, it can be used with ndarrays:

import numpy

A = numpy.ones((1, 3))
B = numpy.ones((3, 3))
A @ B  # matrix product, shape (1, 3)

So nowadays, even more, when in doubt, you should stick to ndarray.