Tag archive: python-requests

How to download image using requests

Question: How to download image using requests

I’m trying to download and save an image from the web using python’s requests module.

Here is the (working) code I used:

img = urllib2.urlopen(settings.STATICMAP_URL.format(**data))
with open(path, 'w') as f:
    f.write(img.read())

Here is the new (non-working) code using requests:

r = requests.get(settings.STATICMAP_URL.format(**data))
if r.status_code == 200:
    img = r.raw.read()
    with open(path, 'w') as f:
        f.write(img)

Can you tell me which attribute of the requests response I should use?


Answer 0

You can either use the response.raw file object, or iterate over the response.

Using the response.raw file-like object will not, by default, decode compressed responses (with GZIP or deflate). You can force it to decompress for you anyway by setting the decode_content attribute to True (requests sets it to False to control decoding itself). You can then use shutil.copyfileobj() to have Python stream the data to a file object:

import requests
import shutil

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)        

To iterate over the response use a loop; iterating like this ensures that data is decompressed by this stage:

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        for chunk in r:
            f.write(chunk)

This’ll read the data in 128 byte chunks; if you feel another chunk size works better, use the Response.iter_content() method with a custom chunk size:

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        for chunk in r.iter_content(1024):
            f.write(chunk)

Note that you need to open the destination file in binary mode to ensure python doesn’t try and translate newlines for you. We also set stream=True so that requests doesn’t download the whole image into memory first.
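
The streaming loop above can be wrapped in a small helper. This is only a sketch: the name save_response and the ".part" temporary-file convention are assumptions made for illustration, and the function accepts any object that behaves like a requests.Response fetched with stream=True (it only needs raise_for_status() and iter_content()).

```python
import os


def save_response(response, path, chunk_size=8192):
    """Stream a response body to `path` and return the number of bytes written.

    `response` is expected to behave like a `requests.Response` obtained with
    `stream=True`: it must provide `raise_for_status()` and `iter_content()`.
    """
    response.raise_for_status()  # fail on 4xx/5xx instead of saving an error page
    tmp_path = path + ".part"    # hypothetical temp-file naming convention
    written = 0
    with open(tmp_path, "wb") as f:  # binary mode: the payload is bytes
        for chunk in response.iter_content(chunk_size):
            f.write(chunk)
            written += len(chunk)
    os.replace(tmp_path, path)  # expose the file only once it is complete
    return written
```

With requests installed, a call might look like save_response(requests.get(url, stream=True), 'img.png'). Writing to a temporary name first means an interrupted download does not leave a truncated image under the final name.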


Answer 1

Get a file-like object from the request and copy it to a file. This will also avoid reading the whole thing into memory at once.

import shutil

import requests

url = 'http://example.com/img.png'
response = requests.get(url, stream=True)
with open('img.png', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
del response

Answer 2

How about this, a quick solution.

import requests

url = "http://craphound.com/images/1006884_2adf8fc7.jpg"
response = requests.get(url)
if response.status_code == 200:
    with open("/Users/apple/Desktop/sample.jpg", 'wb') as f:
        f.write(response.content)

Answer 3

I have the same need for downloading images using requests. I first tried the answer of Martijn Pieters, and it works well. But when I did a profile on this simple function, I found that it uses so many function calls compared to urllib and urllib2.

I then tried the way recommended by the author of requests module:

import requests
from PIL import Image
# On Python 2.x, use this instead:
# from StringIO import StringIO as BytesIO
# On Python 3.x, r.content is bytes, so wrap it in BytesIO:
from io import BytesIO

r = requests.get('https://example.com/image.jpg')
i = Image.open(BytesIO(r.content))

This greatly reduced the number of function calls and thus sped up my application. Here is the code of my profiler and the result.

#!/usr/bin/python
import requests
from StringIO import StringIO
from PIL import Image
import profile

def testRequest():
    image_name = 'test1.jpg'
    url = 'http://example.com/image.jpg'

    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        for chunk in r.iter_content():
            f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    url = 'http://example.com/image.jpg'

    r = requests.get(url)

    i = Image.open(StringIO(r.content))
    i.save(image_name)

if __name__ == '__main__':
    # Profile the two functions defined above.
    profile.run('testRequest()')
    profile.run('testRequest2()')

The result for testRequest:

343080 function calls (343068 primitive calls) in 2.580 seconds

And the result for testRequest2:

3129 function calls (3105 primitive calls) in 0.024 seconds
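
A likely contributor to the gap in call counts: iter_content() defaults to chunk_size=1, so the loop in testRequest writes the image one byte at a time. The sketch below uses a stand-in generator (iter_chunks, which is not part of requests) to show how the number of loop iterations scales with the chunk size; the 100 kB payload is made up for the demonstration.

```python
def iter_chunks(data, chunk_size=1):
    """Yield successive `chunk_size`-byte slices of `data`.

    A stand-in for Response.iter_content(), whose default chunk_size is also 1.
    """
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def count_writes(data, chunk_size):
    """Return how many f.write() calls a download loop would make."""
    return sum(1 for _ in iter_chunks(data, chunk_size))


if __name__ == "__main__":
    image = bytes(100_000)  # pretend this is a 100 kB image
    for size in (1, 128, 1024, 8192):
        print(size, count_writes(image, size))
```

Passing an explicit size, as in r.iter_content(1024), keeps the streaming behaviour while cutting the per-chunk overhead considerably.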

Answer 4

This might be easier than using requests. This is the only time I’ll ever suggest not using requests to do HTTP stuff.

A two-liner using urllib:

>>> import urllib.request
>>> urllib.request.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

There is also a nice Python module named wget that is pretty easy to use. Found here.

This demonstrates the simplicity of the design:

>>> import wget
>>> url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
>>> filename = wget.download(url)
100% [................................................] 3841532 / 3841532
>>> filename
'razorback.mp3'

Enjoy.

Edit: You can also add an out parameter to specify a path.

>>> out_filepath = <output_filepath>    
>>> filename = wget.download(url, out=out_filepath)
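
The urllib two-liner can also be wrapped in a function. This is a minimal sketch; the name fetch is an assumption, and the demo uses a file:// URL purely so that it runs without network access.

```python
import pathlib
import tempfile
import urllib.request


def fetch(url, dest):
    """Download `url` to `dest` with the standard library and return the local path."""
    local_path, _headers = urllib.request.urlretrieve(url, dest)
    return local_path


if __name__ == "__main__":
    # Demonstrate with a file:// URL so the example runs offline.
    with tempfile.TemporaryDirectory() as tmp:
        src = pathlib.Path(tmp, "source.bin")
        src.write_bytes(b"fake image bytes")
        dest = pathlib.Path(tmp, "copy.bin")
        fetch(src.as_uri(), dest)
        print(dest.read_bytes() == src.read_bytes())
```

For a real download, an http:// or https:// URL goes in place of the file:// one.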

Answer 5

The following code snippet downloads a file.

The file is saved under the filename taken from the specified URL.

import requests

url = "http://example.com/image.jpg"
filename = url.split("/")[-1]
r = requests.get(url, timeout=0.5)

if r.status_code == 200:
    with open(filename, 'wb') as f:
        f.write(r.content)
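
Note that url.split("/")[-1] keeps any query string, so a URL such as .../pic.jpeg?w=500 would produce an awkward filename. A slightly more defensive sketch follows; the helper name filename_from_url and the "download" fallback are assumptions, not part of the original answer.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def filename_from_url(url, default="download"):
    """Return the last path segment of `url`, ignoring query string and fragment."""
    path = urlsplit(url).path                 # drops "?query" and "#fragment"
    name = posixpath.basename(unquote(path))  # decode %20 etc., keep the last segment
    return name or default                    # URLs ending in "/" have no filename
```

It can replace the split() line directly: filename = filename_from_url(url).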

Answer 6

There are 2 main ways:

  1. Using .content (simplest/official) (see Zhenyi Zhang’s answer):

    import io  # Note: on Python 2, use StringIO.StringIO instead of io.BytesIO.
    import requests
    from PIL import Image
    
    r = requests.get('http://lorempixel.com/400/200')
    r.raise_for_status()
    with io.BytesIO(r.content) as f:
        with Image.open(f) as img:
            img.show()
    
  2. Using .raw (see Martijn Pieters’s answer):

    import PIL.Image
    import requests
    
    r = requests.get('http://lorempixel.com/400/200', stream=True)
    r.raise_for_status()
    r.raw.decode_content = True  # Required to decompress gzip/deflate compressed responses.
    with PIL.Image.open(r.raw) as img:
        img.show()
    r.close()  # Safety when stream=True ensure the connection is released.
    

Timing both shows no noticeable difference.


Answer 7

It is as easy as importing Image and requests:

from PIL import Image
import requests

img = Image.open(requests.get(url, stream = True).raw)
img.save('img1.jpg')

Answer 8

Here is a more user-friendly answer that still uses streaming.

Just define these functions and call getImage(). It will use the same file name as the url and write to the current directory by default, but both can be changed.

import requests
from io import BytesIO  # r.content is bytes on Python 3
from PIL import Image

def createFilename(url, name, folder):
    dotSplit = url.split('.')
    if name is None:
        # use the same as the url
        slashSplit = dotSplit[-2].split('/')
        name = slashSplit[-1]
    ext = dotSplit[-1]
    file = '{}{}.{}'.format(folder, name, ext)
    return file

def getImage(url, name=None, folder='./'):
    file = createFilename(url, name, folder)
    with open(file, 'wb') as f:
        r = requests.get(url, stream=True)
        for block in r.iter_content(1024):
            if not block:
                break
            f.write(block)

def getImageFast(url, name=None, folder='./'):
    file = createFilename(url, name, folder)
    r = requests.get(url)
    i = Image.open(BytesIO(r.content))
    i.save(file)

if __name__ == '__main__':
    # Uses Less Memory
    getImage('http://www.example.com/image.jpg')
    # Faster
    getImageFast('http://www.example.com/image.jpg')

The request guts of getImage() are based on the answer here and the guts of getImageFast() are based on the answer above.


Answer 9

I’m going to post an answer as I don’t have enough rep to make a comment, but with wget as posted by Blairg23, you can also provide an out parameter for the path.

 wget.download(url, out=path)

Answer 10

This is the first response that comes up for google searches on how to download a binary file with requests. In case you need to download an arbitrary file with requests, you can use:

import requests
url = 'https://s3.amazonaws.com/lab-data-collections/GoogleNews-vectors-negative300.bin.gz'
with open('GoogleNews-vectors-negative300.bin.gz', 'wb') as f:  # closes the file when done
    f.write(requests.get(url, allow_redirects=True).content)

Answer 11

This is how I did it

import requests
from PIL import Image
from io import BytesIO

url = 'your_url'
files = {'file': ("C:/Users/shadow/Downloads/black.jpeg", open('C:/Users/shadow/Downloads/black.jpeg', 'rb'),'image/jpg')}
response = requests.post(url, files=files)

img = Image.open(BytesIO(response.content))
img.show()

Answer 12

You can do something like this:

import requests
import random

url = "https://images.pexels.com/photos/1308881/pexels-photo-1308881.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"
name = random.randrange(1, 1000)
filename = str(name) + ".jpg"
response = requests.get(url)
if response.ok:  # True for status codes below 400
    with open(filename, 'wb') as f:  # binary mode: response.content is bytes
        f.write(response.content)

Python Requests throwing SSLError

Question: Python Requests throwing SSLError

I’m working on a simple script that involves CAS, jspring security check, redirection, etc. I would like to use Kenneth Reitz’s python requests because it’s a great piece of work! However, CAS requires getting validated via SSL, so I have to get past that step first. I don’t know what Python requests wants. Where is this SSL certificate supposed to reside?

Traceback (most recent call last):
  File "./test.py", line 24, in <module>
  response = requests.get(url1, headers=headers)
  File "build/bdist.linux-x86_64/egg/requests/api.py", line 52, in get
  File "build/bdist.linux-x86_64/egg/requests/api.py", line 40, in request
  File "build/bdist.linux-x86_64/egg/requests/sessions.py", line 209, in request 
  File "build/bdist.linux-x86_64/egg/requests/models.py", line 624, in send
  File "build/bdist.linux-x86_64/egg/requests/models.py", line 300, in _build_response
  File "build/bdist.linux-x86_64/egg/requests/models.py", line 611, in send
requests.exceptions.SSLError: [Errno 1] _ssl.c:503: error:14090086:SSL routines:SSL3_GET_SERVER_CERTIFICATE:certificate verify failed

Answer 0

The problem you are having is caused by an untrusted SSL certificate.

Like @dirk mentioned in a previous comment, the quickest fix is setting verify=False:

requests.get('https://example.com', verify=False)

Please note that this will cause the certificate not to be verified. This will expose your application to security risks, such as man-in-the-middle attacks.

Of course, apply judgment. As mentioned in the comments, this may be acceptable for quick/throwaway applications/scripts, but really should not go to production software.

If just skipping the certificate check is not acceptable in your particular context, consider the following options. Your best option is to set the verify parameter to a string that is the path of the .pem file of the certificate (which you should obtain by some secure means).

So, as of version 2.0, the verify parameter accepts the following values, with their respective semantics:

  • True: causes the certificate to be validated against the library’s own trusted certificate authorities (Note: you can see which root certificates Requests uses via the Certifi library, a trust database of root certificates extracted from Requests: Certifi – Trust Database for Humans).
  • False: bypasses certificate validation completely.
  • Path to a CA_BUNDLE file for Requests to use to validate the certificates.

Source: Requests – SSL Cert Verification

Also take a look at the cert parameter on the same link.
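
As a rough illustration of the "path to a CA bundle" option, the sketch below sets the bundle once on a session instead of passing verify= on every call. The helper name use_ca_bundle is an assumption; session is expected to be a requests.Session, although any object with a verify attribute works.

```python
import os


def use_ca_bundle(session, ca_bundle):
    """Point `session` at a custom CA bundle instead of disabling verification.

    `session` is expected to be a `requests.Session` (anything with a
    `verify` attribute works); `ca_bundle` is a path to a PEM file.
    """
    if not os.path.isfile(ca_bundle):
        raise FileNotFoundError(f"CA bundle not found: {ca_bundle}")
    # Session-level default; a per-request verify= argument still takes precedence.
    session.verify = ca_bundle
    return session
```

With requests installed, usage might look like s = use_ca_bundle(requests.Session(), 'cacert.pem') followed by s.get(url).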


Answer 1

From requests documentation on SSL verification:

Requests can verify SSL certificates for HTTPS requests, just like a web browser. To check a host’s SSL certificate, you can use the verify argument:

>>> requests.get('https://kennethreitz.com', verify=True)

If you don’t want to verify the server’s SSL certificate, set verify=False.


Answer 2

You can pass the name of the CA file to use via verify:

cafile = 'cacert.pem' # http://curl.haxx.se/ca/cacert.pem
r = requests.get(url, verify=cafile)

If you use verify=True, then requests uses its own CA set, which might not include the CA that signed your server certificate.


Answer 3

$ pip install -U requests[security]

  • Tested on Python 2.7.6 @ Ubuntu 14.04.4 LTS
  • Tested on Python 2.7.5 @ MacOSX 10.9.5 (Mavericks)

When this question was opened (2012-05) the Requests version was 0.13.1. In version 2.4.1 (2014-09) the “security” extras were introduced, using the certifi package if available.

Right now (2016-09) the main version is 2.11.1, which works well without verify=False. There is no need to use requests.get(url, verify=False) if requests is installed with the requests[security] extras.


Answer 4

I encountered the same issue, as well as an SSL certificate verify failed issue when using aws boto3. Reviewing the boto3 code, I found that REQUESTS_CA_BUNDLE was not set, so I fixed both issues by setting it manually:

from boto3.session import Session
import os

# debian
os.environ['REQUESTS_CA_BUNDLE'] = os.path.join(
    '/etc/ssl/certs/',
    'ca-certificates.crt')
# centos
#   'ca-bundle.crt')

For aws-cli, I guess setting REQUESTS_CA_BUNDLE in ~/.bashrc will fix this issue (not tested because my aws-cli works without it).

REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-certificates.crt # ca-bundle.crt
export REQUESTS_CA_BUNDLE
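
The same idea can be made a little more portable by probing for the bundle before exporting it. This is a sketch under stated assumptions: the function name export_ca_bundle is hypothetical, and the two candidate paths are simply the Debian and CentOS locations mentioned above; other systems may keep their bundle elsewhere.

```python
import os

# Candidate locations taken from the answer above; other systems may differ.
CANDIDATE_BUNDLES = (
    "/etc/ssl/certs/ca-certificates.crt",  # Debian/Ubuntu
    "/etc/ssl/certs/ca-bundle.crt",        # CentOS
)


def export_ca_bundle(candidates=CANDIDATE_BUNDLES, environ=os.environ):
    """Set REQUESTS_CA_BUNDLE to the first existing candidate and return it.

    Returns None, leaving the environment untouched, if no candidate exists.
    """
    for path in candidates:
        if os.path.isfile(path):
            environ["REQUESTS_CA_BUNDLE"] = path
            return path
    return None
```

Call export_ca_bundle() before the first request is made, since the variable is read when a request is prepared.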

Answer 5

In case you have a library that relies on requests and you cannot modify the verify path (like with pyvmomi) then you’ll have to find the cacert.pem bundled with requests and append your CA there. Here’s a generic approach to find the cacert.pem location:

windows

C:\>python -c "import requests; print requests.certs.where()"
c:\Python27\lib\site-packages\requests-2.8.1-py2.7.egg\requests\cacert.pem

linux

#  (py2.7.5,requests 2.7.0, verify not enforced)
root@host:~/# python -c "import requests; print requests.certs.where()"
/usr/lib/python2.7/dist-packages/certifi/cacert.pem

#  (py2.7.10, verify enforced)
root@host:~/# python -c "import requests; print requests.certs.where()"
/usr/local/lib/python2.7/dist-packages/requests/cacert.pem

By the way, @requests-devs: bundling your own cacerts with requests is really, really annoying… especially the fact that you do not seem to use the system CA store first, and that this is not documented anywhere.

Update

In situations where you’re using a library and have no control over the ca-bundle location, you could also explicitly set the ca-bundle location to be your host-wide ca-bundle:

REQUESTS_CA_BUNDLE=/etc/ssl/certs/ca-bundle.crt python -c "import requests; requests.get('https://somesite.com')"

Answer 6

I faced the same problem using gspread, and these commands worked for me:

sudo pip uninstall -y certifi
sudo pip install certifi==2015.04.28

Answer 7

If you want to remove the warnings, use the code below.

import urllib3

urllib3.disable_warnings()

and pass verify=False to the requests.get or requests.post call.


Answer 8

I have found a specific approach for solving a similar issue. The idea is to point to the cacert file stored on the system and used by other SSL-based applications.

In Debian (I’m not sure if it is the same in other distributions) the certificate files (.pem) are stored at /etc/ssl/certs/. So, this is the code that worked for me:

import requests
verify='/etc/ssl/certs/cacert.org.pem'
response = requests.get('https://lists.cacert.org', verify=verify)

To work out which pem file to choose, I browsed to the URL and checked which Certificate Authority (CA) had generated the certificate.

EDIT: if you cannot edit the code (because you are running a third-party app) you can try to add the pem certificate directly into /usr/local/lib/python2.7/dist-packages/requests/cacert.pem (e.g. copying it to the end of the file).
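
If you are unsure where the system-wide certificates live on a given machine, Python's standard ssl module can report the locations its OpenSSL build uses by default. This is only a sketch (system_ca_locations is a hypothetical helper name), and it reports the stdlib defaults, which are not necessarily what requests itself uses.

```python
import ssl


def system_ca_locations():
    """Return the CA file and CA directory that Python's ssl module uses by default.

    Either value may be None if OpenSSL has no such default configured or the
    path does not exist on this machine.
    """
    paths = ssl.get_default_verify_paths()
    return {"cafile": paths.cafile, "capath": paths.capath}


if __name__ == "__main__":
    print(system_ca_locations())
```

A non-None value can then be tried as the verify argument, in the same way as the hard-coded /etc/ssl/certs path above.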


Answer 9

If you don’t care about certificate verification, just use verify=False.

import requests

url = "Write your url here"

returnResponse = requests.get(url, verify=False)

Answer 10

After hours of debugging I could only get this to work using the following packages:

requests[security]==2.7.0  # not 2.18.1
cryptography==1.9  # not 2.0

using OpenSSL 1.0.2g 1 Mar 2016

Without these packages verify=False was not working.

I hope this helps someone.


Answer 11

I ran into the same issue. Turns out I hadn’t installed the intermediate certificate on my server (just append it to the bottom of your certificate as seen below).

https://www.digicert.com/ssl-support/pem-ssl-creation.htm

Make sure you have the ca-certificates package installed:

sudo apt-get install ca-certificates

Updating the time may also resolve this:

sudo apt-get install ntpdate
sudo ntpdate -u ntp.ubuntu.com

If you’re using a self-signed certificate, you’ll probably have to add it to your system manually.


Answer 12

If the request calls are buried somewhere deep in the code and you do not want to install the server certificate, then, for debugging purposes only, it’s possible to monkeypatch requests:

import requests.api
import warnings


def requestspatch(method, url, **kwargs):
    kwargs['verify'] = False
    return _origcall(method, url, **kwargs)

_origcall = requests.api.request
requests.api.request = requestspatch
warnings.warn('Patched requests: SSL verification disabled!')

Never use in production!
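
A variant that undoes the patch automatically can be sketched as a context manager. Everything here is an assumption layered on the snippet above: the name verification_disabled is hypothetical, and api_module is expected to be requests.api. As with the original patch, this only affects the module-level helpers such as requests.get(), not Session objects.

```python
import contextlib
import functools
import warnings
from unittest import mock


@contextlib.contextmanager
def verification_disabled(api_module):
    """Temporarily force verify=False on `api_module.request` (debugging only)."""
    original = api_module.request

    @functools.wraps(original)
    def insecure_request(method, url, **kwargs):
        kwargs["verify"] = False
        return original(method, url, **kwargs)

    warnings.warn("SSL verification disabled inside this block!", stacklevel=3)
    # mock.patch.object restores the original attribute when the block exits.
    with mock.patch.object(api_module, "request", insecure_request):
        yield
```

Usage might look like: with verification_disabled(requests.api): requests.get(url). The same warning applies: never use this in production.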


Answer 13

Too late to the party I guess but I wanted to paste the fix for fellow wanderers like myself! So the following worked out for me on Python 3.7.x

Type the following in your terminal

pip install --upgrade certifi      # hold your breath..

Try running your script/requests again and see if it works (I’m sure it won’t be fixed yet!). If it didn’t work then try running the following command in the terminal directly

open /Applications/Python\ 3.6/Install\ Certificates.command  # please replace 3.6 here with your suitable python version

Answer 14

I fought this problem for HOURS.

I tried to update requests. Then I updated certifi. I pointed verify to certifi.where() (The code does this by default anyways). Nothing worked.

Finally I updated my version of python to python 2.7.11. I was on Python 2.7.5 which had some incompatibilities with the way that the certificates are verified. Once I updated Python (and a handful of other dependencies) it started working.


Answer 15

This is similar to @rafael-almeida’s answer, but I want to point out that as of requests 2.11+, there are not 3 values that verify can take; there are actually 4:

  • True: validates against requests’s internal trusted CAs.
  • False: bypasses certificate validation completely. (Not recommended)
  • Path to a CA_BUNDLE file. requests will use this to validate the server’s certificates.
  • Path to a directory containing public certificate files. requests will use this to validate the server’s certificates.
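
The four cases above can be told apart with a few checks. The function below is an illustration only: describe_verify is a hypothetical name and this is not code from requests, although file and directory paths are distinguished in the same spirit.

```python
import os


def describe_verify(value):
    """Classify a `verify` argument into one of the four cases listed above."""
    if value is True:
        return "default CA bundle"
    if value is False:
        return "verification disabled"
    if os.path.isdir(value):
        return "directory of hashed certificates"
    if os.path.isfile(value):
        return "CA bundle file"
    raise ValueError(f"not a usable verify value: {value!r}")
```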

The rest of my answer is about #4, how to use a directory containing certificates to validate:

Obtain the public certificates needed and place them in a directory.

Strictly speaking, you probably “should” use an out-of-band method of obtaining the certificates, but you could also just download them using any browser.

If the server uses a certificate chain, be sure to obtain every single certificate in the chain.

According to the requests documentation, the directory containing the certificates must first be processed with the “rehash” utility (openssl rehash).

(This requires openssl 1.1.1+, and not all Windows openssl implementations support rehash. If openssl rehash won’t work for you, you could try running the rehash ruby script at https://github.com/ruby/openssl/blob/master/sample/c_rehash.rb, though I haven’t tried this.)

I had some trouble with getting requests to recognize my certificates, but after I used the openssl x509 -outform PEM command to convert the certs to Base64 .pem format, everything worked perfectly.
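
If the certificate you downloaded is in binary DER form, the same conversion can also be sketched with Python's standard ssl module instead of the openssl command. This is only an illustration under that assumption; der_file_to_pem and the file names in the usage note are made up here.

```python
import ssl


def der_file_to_pem(der_path, pem_path):
    """Read a DER-encoded certificate and write it out as Base64 PEM text."""
    with open(der_path, "rb") as src:
        der_bytes = src.read()
    # The standard library wraps the Base64 text in BEGIN/END CERTIFICATE lines.
    pem_text = ssl.DER_cert_to_PEM_cert(der_bytes)
    with open(pem_path, "w") as dst:
        dst.write(pem_text)
    return pem_text
```

A call might look like der_file_to_pem("server.der", "my_certs_dir/server.pem"), after which the directory still needs to be rehashed as described above.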

You can also just do lazy rehashing:

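import subprocess
import requests

def get_with_lazy_rehash(url, auth=None, certs_dir="my_certs_dir"):  # hypothetical wrapper so that `return` is valid
    try:
        # As long as the certificates in the certs directory are in the OS's certificate store, `verify=True` is fine.
        return requests.get(url, auth=auth, verify=True)
    except requests.exceptions.SSLError:
        # Rehash the directory only when verification fails, then verify against it.
        subprocess.run(["openssl", "rehash", "-compat", "-v", certs_dir], check=True)
        return requests.get(url, auth=auth, verify=certs_dir)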

回答 16

目前,请求模块中存在一个导致此错误的问题,存在于v2.6.2至v2.12.4(ATOW)中:https : //github.com/kennethreitz/requests/issues/2573

解决此问题的方法是添加以下行: requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS = 'ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS'

There is currently an issue in the requests module causing this error, present in v2.6.2 to v2.12.4 (ATOW): https://github.com/kennethreitz/requests/issues/2573

Workaround for this issue is adding the following line: requests.packages.urllib3.util.ssl_.DEFAULT_CIPHERS = 'ECDH+AESGCM:DH+AESGCM:ECDH+AES256:DH+AES256:ECDH+AES128:DH+AES:ECDH+3DES:DH+3DES:RSA+AESGCM:RSA+AES:RSA+3DES:!aNULL:!MD5:!DSS'


回答 17

如@Rafael Almeida所述,您遇到的问题是由不受信任的SSL证书引起的。就我而言,我的服务器不信任SSL证书。为了解决此问题而不损害安全性,我下载了证书并将其安装在服务器上(只需双击.crt文件,然后安装证书…)。

As mentioned by @Rafael Almeida, the problem you are having is caused by an untrusted SSL certificate. In my case, the SSL certificate was untrusted by my server. To get around this without compromising security, I downloaded the certificate, and installed it on the server (by simply double clicking on the .crt file and then Install Certificate…).


回答 18

如果正在从另一个包中调用请求,则添加选项是不可行的。在那种情况下,将证书添加到cacert捆绑包是直接的方法,例如,我必须添加“ StartCom Class 1 Primary Intermediate Server CA”,为此,我将根证书下载到StartComClass1.pem中。鉴于我的virtualenv名为caldav,我添加了以下证书:

cat StartComClass1.pem >> .virtualenvs/caldav/lib/python2.7/site-packages/pip/_vendor/requests/cacert.pem
cat temp/StartComClass1.pem >> .virtualenvs/caldav/lib/python2.7/site-packages/requests/cacert.pem

其中之一可能就足够了,我没有检查

It is not feasible to add options if requests is being called from another package. In that case, adding certificates to the cacert bundle is the straightforward path. For example, I had to add “StartCom Class 1 Primary Intermediate Server CA”, for which I downloaded the root cert into StartComClass1.pem. Given that my virtualenv is named caldav, I added the certificate with:

cat StartComClass1.pem >> .virtualenvs/caldav/lib/python2.7/site-packages/pip/_vendor/requests/cacert.pem
cat temp/StartComClass1.pem >> .virtualenvs/caldav/lib/python2.7/site-packages/requests/cacert.pem

One of those might be enough; I did not check.


回答 19

我遇到了相似或相同的认证验证问题。我读到的OpenSSL版本低于1.0.2,该请求有时取决于验证强证书的困难(请参阅此处)。CentOS 7似乎使用了1.0.1e,这似乎有问题。

我不确定如何在CentOS上解决此问题,因此我决定允许使用较弱的1024位CA证书。

import certifi # This should be already installed as a dependency of 'requests'
requests.get("https://example.com", verify=certifi.old_where())

I was having a similar or the same certificate validation problem. I read that OpenSSL versions less than 1.0.2, which requests depends upon, sometimes have trouble validating strong certificates (see here). CentOS 7 seems to use 1.0.1e, which seems to have the problem.

I wasn’t sure how to get around this problem on CentOS, so I decided to allow weaker 1024bit CA certificates.

import certifi  # This should already be installed as a dependency of 'requests'
import requests

requests.get("https://example.com", verify=certifi.old_where())

回答 20

我必须从Python 3.4.0升级到3.4.6

pyenv virtualenv 3.4.6 myvenv
pyenv activate myvenv
pip install -r requirements.txt

I had to upgrade from Python 3.4.0 to 3.4.6

pyenv virtualenv 3.4.6 myvenv
pyenv activate myvenv
pip install -r requirements.txt

回答 21

就我而言,原因是无关紧要的。

我知道SSL验证已经进行了几天,实际上是在另一台机器上工作。

我的下一步是比较正在验证的计算机和未进行验证的计算机之间的证书内容和大小。

这很快导致我确定“工作不正确”的计算机上的证书不好,一旦我将其替换为“好”证书,一切就很好了。

In my case the reason was fairly trivial.

I knew that the SSL verification had worked until a few days earlier, and was in fact still working on a different machine.

My next step was to compare the certificate contents and size between the machine on which verification was working, and the one on which it was not.

This quickly led me to determine that the certificate on the ‘incorrectly’ working machine was not good, and once I replaced it with the ‘good’ cert, everything was fine.


使用Python请求发布JSON

问题:使用Python请求发布JSON

我需要将JSON从客户端发布到服务器。我正在使用Python 2.7.1和simplejson。客户端正在使用请求。服务器是CherryPy。我可以从服务器获取硬编码的JSON(代码未显示),但是当我尝试将JSON POST到服务器时,会收到“ 400 Bad Request”。

这是我的客户代码:

data = {'sender':   'Alice',
    'receiver': 'Bob',
    'message':  'We did it!'}
data_json = simplejson.dumps(data)
payload = {'json_payload': data_json}
r = requests.post("http://localhost:8080", data=payload)

这是服务器代码。

class Root(object):

    def __init__(self, content):
        self.content = content
        print self.content  # this works

    exposed = True

    def GET(self):
        cherrypy.response.headers['Content-Type'] = 'application/json'
        return simplejson.dumps(self.content)

    def POST(self):
        self.content = simplejson.loads(cherrypy.request.body.read())

有任何想法吗?

I need to POST a JSON from a client to a server. I’m using Python 2.7.1 and simplejson. The client is using Requests. The server is CherryPy. I can GET a hard-coded JSON from the server (code not shown), but when I try to POST a JSON to the server, I get “400 Bad Request”.

Here is my client code:

data = {'sender':   'Alice',
    'receiver': 'Bob',
    'message':  'We did it!'}
data_json = simplejson.dumps(data)
payload = {'json_payload': data_json}
r = requests.post("http://localhost:8080", data=payload)

Here is the server code.

class Root(object):

    def __init__(self, content):
        self.content = content
        print self.content  # this works

    exposed = True

    def GET(self):
        cherrypy.response.headers['Content-Type'] = 'application/json'
        return simplejson.dumps(self.content)

    def POST(self):
        self.content = simplejson.loads(cherrypy.request.body.read())

Any ideas?


回答 0

从Requests 2.4.2及更高版本开始,您可以在调用中使用’json’参数,从而使其更简单。

>>> import requests
>>> r = requests.post('http://httpbin.org/post', json={"key": "value"})
>>> r.status_code
200
>>> r.json()
{'args': {},
 'data': '{"key": "value"}',
 'files': {},
 'form': {},
 'headers': {'Accept': '*/*',
             'Accept-Encoding': 'gzip, deflate',
             'Connection': 'close',
             'Content-Length': '16',
             'Content-Type': 'application/json',
             'Host': 'httpbin.org',
             'User-Agent': 'python-requests/2.4.3 CPython/3.4.0',
             'X-Request-Id': 'xx-xx-xx'},
 'json': {'key': 'value'},
 'origin': 'x.x.x.x',
 'url': 'http://httpbin.org/post'}

编辑:此功能已添加到官方文档中。您可以在这里查看:请求文档

As of Requests version 2.4.2, you can alternatively use the ‘json’ parameter in the call, which makes it simpler.

>>> import requests
>>> r = requests.post('http://httpbin.org/post', json={"key": "value"})
>>> r.status_code
200
>>> r.json()
{'args': {},
 'data': '{"key": "value"}',
 'files': {},
 'form': {},
 'headers': {'Accept': '*/*',
             'Accept-Encoding': 'gzip, deflate',
             'Connection': 'close',
             'Content-Length': '16',
             'Content-Type': 'application/json',
             'Host': 'httpbin.org',
             'User-Agent': 'python-requests/2.4.3 CPython/3.4.0',
             'X-Request-Id': 'xx-xx-xx'},
 'json': {'key': 'value'},
 'origin': 'x.x.x.x',
 'url': 'http://httpbin.org/post'}

EDIT: This feature has been added to the official documentation. You can view it here: Requests documentation


回答 1

原来我缺少标题信息。以下作品:

url = "http://localhost:8080"
data = {'sender': 'Alice', 'receiver': 'Bob', 'message': 'We did it!'}
headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}
r = requests.post(url, data=json.dumps(data), headers=headers)

It turns out I was missing the header information. The following works:

import json
import requests

url = "http://localhost:8080"
data = {'sender': 'Alice', 'receiver': 'Bob', 'message': 'We did it!'}
headers = {'Content-type': 'application/json', 'Accept': 'text/plain'}
r = requests.post(url, data=json.dumps(data), headers=headers)
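
To see why the header matters, one can compare what Requests would send in each case without contacting a server. The sketch below uses requests.Request(...).prepare() for that; the variable names form_request and json_request are illustrative, and the URL is the one from the question.

```python
import json

import requests

url = "http://localhost:8080"
data = {"sender": "Alice", "receiver": "Bob", "message": "We did it!"}

# What the question sent: a form field whose value happens to be JSON text.
form_request = requests.Request(
    "POST", url, data={"json_payload": json.dumps(data)}
).prepare()

# What this answer sends: the JSON text itself, labelled as JSON.
json_request = requests.Request(
    "POST", url, data=json.dumps(data),
    headers={"Content-Type": "application/json"},
).prepare()

print(form_request.headers["Content-Type"])
print(form_request.body[:30])
print(json_request.headers["Content-Type"])
print(json_request.body)
```

The first body starts with json_payload= and is not valid JSON on its own, which would be consistent with the server's simplejson.loads(...) call failing on it.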

回答 2

从请求2.4.2(https://pypi.python.org/pypi/requests)开始,支持“ json”参数。无需指定“ Content-Type”。因此,较短的版本:

requests.post('http://httpbin.org/post', json={'test': 'cheers'})

From requests 2.4.2 (https://pypi.python.org/pypi/requests), the “json” parameter is supported. No need to specify “Content-Type”. So the shorter version:

requests.post('http://httpbin.org/post', json={'test': 'cheers'})

回答 3

更好的方法是:

url = "http://xxx.xxxx.xx"

datas = {"cardno":"6248889874650987","systemIdentify":"s08","sourceChannel": 12}

headers = {'Content-type': 'application/json'}

rsp = requests.post(url, json=datas, headers=headers)

The better way is:

url = "http://xxx.xxxx.xx"

datas = {"cardno":"6248889874650987","systemIdentify":"s08","sourceChannel": 12}

headers = {'Content-type': 'application/json'}

rsp = requests.post(url, json=datas, headers=headers)

回答 4

与python 3.5+完美搭配

客户:

import requests
data = {'sender':   'Alice',
    'receiver': 'Bob',
    'message':  'We did it!'}
r = requests.post("http://localhost:8080", json={'json_payload': data})

服务器:

class Root(object):

    def __init__(self, content):
        self.content = content
        print(self.content)  # this works

    exposed = True

    def GET(self):
        cherrypy.response.headers['Content-Type'] = 'application/json'
        return simplejson.dumps(self.content)

    @cherrypy.tools.json_in()
    @cherrypy.tools.json_out()
    def POST(self):
        self.content = cherrypy.request.json
        return {'status': 'success', 'message': 'updated'}

Works perfectly with python 3.5+

client:

import requests
data = {'sender':   'Alice',
    'receiver': 'Bob',
    'message':  'We did it!'}
r = requests.post("http://localhost:8080", json={'json_payload': data})

server:

class Root(object):

    def __init__(self, content):
        self.content = content
        print(self.content)  # this works

    exposed = True

    def GET(self):
        cherrypy.response.headers['Content-Type'] = 'application/json'
        return simplejson.dumps(self.content)

    @cherrypy.tools.json_in()
    @cherrypy.tools.json_out()
    def POST(self):
        self.content = cherrypy.request.json
        return {'status': 'success', 'message': 'updated'}

回答 5

应该使用(data / json / files)之间的哪个参数,它实际上取决于名为ContentType的请求标头(通常通过浏览器的开发人员工具进行检查),

当Content-Type为application / x-www-form-urlencoded时,代码应为:

requests.post(url, data=jsonObj)

当Content-Type为application / json时,您的代码应为以下之一:

requests.post(url, json=jsonObj)
requests.post(url, data=jsonstr, headers={"Content-Type":"application/json"})

当Content-Type为multipart / form-data时,它用于上传文件,因此您的代码应为:

requests.post(url, files=xxxx)

Which of the data / json / files parameters you should use depends on the request header named Content-Type (you can usually check this through your browser’s developer tools).

When the Content-Type is application/x-www-form-urlencoded, the code should be:

requests.post(url, data=jsonObj)

When the Content-Type is application/json, your code should be one of the following:

requests.post(url, json=jsonObj)
requests.post(url, data=jsonstr, headers={"Content-Type":"application/json"})

When the Content-Type is multipart/form-data, which is used to upload files, your code should be:

requests.post(url, files=xxxx)
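
A quick way to check this mapping, offered as a sketch rather than anything from the original answer, is to prepare a request offline and look at the Content-Type header that Requests chooses. The helper chosen_content_type and the example URL are hypothetical.

```python
import requests


def chosen_content_type(**kwargs):
    """Prepare (but do not send) a POST and return its Content-Type header."""
    prepared = requests.Request(
        "POST", "http://example.invalid/upload", **kwargs
    ).prepare()
    return prepared.headers.get("Content-Type")


print(chosen_content_type(data={"key": "value"}))
print(chosen_content_type(json={"key": "value"}))
print(chosen_content_type(files={"file": ("report.txt", b"hello")}))
```

The multipart header also carries a generated boundary string, so it differs from run to run.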

ImportError:没有名为请求的模块

问题:ImportError:没有名为请求的模块

每当我尝试导入时requests,都会出现错误提示No module Named requests

import requests

我得到的错误:

File "ex2.py", line 1, in <module>
    import requests
ImportError: No module named requests

Whenever I try to import requests, I get an error saying No module Named requests.

import requests

The error I get:

File "ex2.py", line 1, in <module>
    import requests
ImportError: No module named requests

回答 0

Requests不是内置模块(默认的python安装不附带),因此您必须安装它:

OSX / Linux

如果已安装,请使用$ sudo pip install requests(或pip3 install requests用于python3) pip。如果pip已安装但不在您的路径中,则可以使用python -m pip install requests(或python3 -m pip install requests用于python3)

或者,sudo easy_install -U requests如果已easy_install安装,也可以使用。

另外,您可以使用系统软件包管理器:

对于centos:yum install python-requests 对于Ubuntu:apt-get install python-requests

视窗

如果已安装Pip.exe并将其添加到Path Environment Variable中,请使用pip install requests(或pip3 install requests用于python3) pip。如果pip已安装但不在您的路径中,则可以使用python -m pip install requests(或python3 -m pip install requests用于python3)

或者从命令提示符,使用> Path\easy_install.exe requests,这里Path是你的Python*\Scripts文件夹,如果安装它。(例如:C:\Python32\Scripts

如果您要手动将库添加到Windows计算机,则可以下载压缩的库,解压缩它,然后将其放入Lib\site-packagespython路径的文件夹中。(例如:C:\Python27\Lib\site-packages

从来源(通用)

对于任何缺少的库,通常可从https://pypi.python.org/pypi/获得该源。您可以在此处下载请求:https//pypi.python.org/pypi/requests

在Mac OS X和Windows上,下载源zip后,解压缩它,并从未python setup.py install压缩的dir 的termiminal / cmd中运行。

来源

Requests is not a built in module (does not come with the default python installation), so you will have to install it:

OSX/Linux

Use $ sudo pip install requests (or pip3 install requests for python3) if you have pip installed. If pip is installed but not in your path you can use python -m pip install requests (or python3 -m pip install requests for python3)

Alternatively you can also use sudo easy_install -U requests if you have easy_install installed.

Alternatively you can use your system’s package manager:

For CentOS: yum install python-requests

For Ubuntu: apt-get install python-requests

Windows

Use pip install requests (or pip3 install requests for python3) if you have pip installed and Pip.exe added to the Path Environment Variable. If pip is installed but not in your path you can use python -m pip install requests (or python3 -m pip install requests for python3)

Alternatively from a cmd prompt, use > Path\easy_install.exe requests, where Path is your Python*\Scripts folder, if it was installed. (For example: C:\Python32\Scripts)

If you manually want to add a library to a windows machine, you can download the compressed library, uncompress it, and then place it into the Lib\site-packages folder of your python path. (For example: C:\Python27\Lib\site-packages)

From Source (Universal)

For any missing library, the source is usually available at https://pypi.python.org/pypi/. You can download requests here: https://pypi.python.org/pypi/requests

On Mac OS X and Windows, after downloading the source zip, uncompress it and run python setup.py install from the terminal/cmd in the uncompressed directory.

(source)


回答 1

对我而言,您使用的是哪个版本的Python并不明显。

如果是Python 3,一个解决方案是 sudo pip3 install requests

It’s not obvious to me which version of Python you are using.

If it’s Python 3, a solution would be sudo pip3 install requests


回答 2

requests在适用于Python2的Debian / Ubuntu上安装模块:

$ sudo apt-get install python-requests

对于Python3,命令为:

$ sudo apt-get install python3-requests

To install requests module on Debian/Ubuntu for Python2:

$ sudo apt-get install python-requests

And for Python3 the command is:

$ sudo apt-get install python3-requests


回答 3

如果您使用的是Ubuntu,则需要安装 requests

运行以下命令:

pip install requests

如果遇到权限被拒绝的错误,请在命令前使用sudo:

sudo pip install requests

If you are using Ubuntu, you need to install requests.

Run this command:

pip install requests

If you get a permission denied error, put sudo before the command:

sudo pip install requests

回答 4

这可能为时已晚,但是即使未设置pip path,也可以运行此命令。我正在Windows 10上运行Python 3.7,这是命令

py -m pip install requests

并且您还可以将“ requests”替换为任何其他已卸载的库

This may be a little bit too late, but this command can be run even when the pip path is not set. I am using Python 3.7 on Windows 10, and this is the command:

py -m pip install requests

and you can also replace ‘requests’ with any other uninstalled library
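
The same idea, running pip through a specific interpreter, can be sketched from inside Python so that the package lands in the interpreter that is actually running the script. build_pip_command and install are hypothetical helper names, not part of pip.

```python
import subprocess
import sys


def build_pip_command(package):
    """Build a pip install command bound to the current interpreter."""
    return [sys.executable, "-m", "pip", "install", package]


def install(package):
    """Run pip for this interpreter; raises CalledProcessError on failure."""
    subprocess.run(build_pip_command(package), check=True)


# Show the command without running it.
print(" ".join(build_pip_command("requests")))
```

Calling install("requests") would then run that command; it is left out of the sketch so that nothing is installed just by running it.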


回答 5

在OSX上,该命令将取决于您安装的python的风格。

Python 2.x-默认

sudo pip install requests

Python 3.x

sudo pip3 install requests

On OSX, the command will depend on the flavour of python installation you have.

Python 2.x – Default

sudo pip install requests

Python 3.x

sudo pip3 install requests

回答 6

就我而言,请求已经安装,但需要升级。以下命令可以解决问题

$ sudo pip install requests --upgrade

In my case requests was already installed, but needed an upgrade. The following command did the trick

$ sudo pip install requests --upgrade

回答 7

在Windows打开命令行上

pip3 install requests

On Windows, open the command line and run:

pip3 install requests

回答 8

我遇到了同样的问题,所以我从https://pypi.python.org/pypi/requests#downloads 请求下载文件夹中将名为“ requests”的文件夹复制到“ /Library/Python/2.7/site-packages”。现在,当您使用:导入请求时,它应该可以正常工作。

I had the same issue, so I copied the folder named “requests” from the requests download (https://pypi.python.org/pypi/requests#downloads) to “/Library/Python/2.7/site-packages”. Now when you use import requests, it should work fine.


回答 9

Brew用户可以使用下面的参考,

安装请求的命令:

python3 -m pip install requests

自制软件和Python

pip是Python的软件包安装程序,您需要该软件包requests

Brew users can use reference below,

command to install requests:

python3 -m pip install requests

Homebrew and Python

pip is the package installer for Python and you need the package requests.


回答 10

向应用程序添加第三方程序包

跟随此链接 https://cloud.google.com/appengine/docs/python/tools/libraries27?hl=zh_CN#vendoring

步骤1:在项目的根目录中有一个名为appengine_config.py的文件,然后添加以下行:

从google.appengine.ext导入供应商

添加安装在“ lib”文件夹中的所有库。

vendor.add(’lib’)

步骤2:创建一个目录,并将其命名为project的根目录下的“ lib”。

步骤3:使用pip install -t lib请求

第4步:部署到App Engine。

Adding Third-party Packages to the Application

Follow this link https://cloud.google.com/appengine/docs/python/tools/libraries27?hl=en#vendoring

Step 1: Have a file named appengine_config.py in the root of your project, then add these lines:

from google.appengine.ext import vendor

# Add any libraries installed in the "lib" folder.
vendor.add('lib')

Step 2: Create a directory named “lib” under the root directory of the project.

Step 3: Use pip install -t lib requests

Step 4: Deploy to App Engine.


回答 11

尝试sudo apt-get install python-requests

这对我有用。

Try sudo apt-get install python-requests.

This worked for me.


回答 12

对于Windows,只需将路径指定为cd,然后将路径指定为python的“脚本”,然后执行命令easy_install.exe请求即可。然后尝试导入请求…

On Windows, cd to the “Scripts” folder of your Python installation and then execute the command easy_install.exe requests. Then try import requests…


回答 13

唯一对我有用的东西:

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
pip install requests

The only thing that worked for me:

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py
pip install requests

回答 14

在过去的几个月中,我有几次遇到这个问题。我还没有看到针对fedora系统的好的解决方案,因此这里是另一个解决方案。我正在使用RHEL7,发现了以下内容:

如果您urllib3通过进行了安装pip,并且requests通过进行了安装yum,则即使安装了正确的软件包,也会出现问题。如果您urllib3通过yumrequests安装了,则同样适用pip。这是我为解决此问题所做的工作:

sudo pip uninstall requests
sudo pip uninstall urllib3
sudo yum remove python-urllib3
sudo yum remove python-requests

(确认已删除所有这些库)

sudo yum install python-urllib3
sudo yum install python-requests

请注意,这仅适用于运行Fedora,Redhat或CentOS的系统。

资料来源:
这个问题(在答案的评论中)。
这个 github问题。

I have had this issue a couple times in the past few months. I haven’t seen a good solution for fedora systems posted, so here’s yet another solution. I’m using RHEL7, and I discovered the following:

If you have urllib3 installed via pip, and requests installed via yum you will have issues, even if you have the correct packages installed. The same will apply if you have urllib3 installed via yum, and requests installed via pip. Here’s what I did to fix the issue:

sudo pip uninstall requests
sudo pip uninstall urllib3
sudo yum remove python-urllib3
sudo yum remove python-requests

(confirm that all those libraries have been removed)

sudo yum install python-urllib3
sudo yum install python-requests

Just be aware that this will only work for systems that are running Fedora, Redhat, or CentOS.

Sources:
This very question (in the comments to this answer).
This github issue.
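
Before uninstalling anything, it may help to confirm which versions Python actually sees. The following is a minimal sketch using the standard library's importlib.metadata (Python 3.8+, so it assumes a newer interpreter than the stock RHEL 7 one); installed_version is a made-up helper.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("requests", "urllib3"):
    print(name, installed_version(name))
```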


回答 15

我已经安装了python2.7和python3.6

打开〜/ .bash_profile的命令行, 我发现#Setting Python 3.6的PATH,所以我将路径更改为PATH =“ / usr / local / Cellar / python / 2.7.13 / bin:$ {PATH}”,(请确保您的python2.7的路径),然后保存。这个对我有用。

I have installed python2.7 and python3.6

I opened ~/.bash_profile from the command line and found a “# Setting PATH for Python 3.6” entry, so I changed the path to PATH="/usr/local/Cellar/python/2.7.13/bin:${PATH}" (please check your own python2.7 path), then saved. It works for me.


回答 16

如果要request在Windows上导入:

pip install requests

然后beautifulsoup4用于:

pip3 install beautifulsoup4

If you want to import requests on Windows:

pip install requests

Then, for beautifulsoup4:

pip3 install beautifulsoup4

回答 17

我解决了这个问题。您可以尝试这种方法。在此文件“ .bash_profile”中,添加类似alias python=/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7

I solved this problem; you can try this method. In the ‘.bash_profile’ file, add a line like alias python=/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7


回答 18

如果您将anaconda用作Python软件包管理器,请执行以下操作:

conda install -c anaconda requests

通过pip安装请求对我没有帮助。

If you are using anaconda as your python package manager, execute the following:

conda install -c anaconda requests

Installing requests through pip didn’t help me.


回答 19

您必须确保您的请求模块未安装在最新版本的python中。

使用python 3.7时,请像这样运行python文件:

python3 myfile.py

或使用以下命令进入python交互模式:

python3

是的,这对我有用。像这样运行文件:python3 file.py

You must make sure your requests module is not being installed in a more recent version of python.

When using python 3.7, run your python file like:

python3 myfile.py

or enter python interactive mode with:

python3

Yes, this works for me. Run your file like this: python3 file.py


回答 20

Python 常见安装问题

如果Homebrew在macOS上修改了路径,这些命令也很有用。

python -m pip install requests

要么

python3 -m pip install requests

并行安装多个版本的Python?

Python Common installation issues

These commands are also useful if Homebrew screws up your path on macOS.

python -m pip install requests

or

python3 -m pip install requests

Multiple versions of Python installed in parallel?


回答 21

我的答案与@ pi-k基本相同。就我而言,我的程序在本地运行,但无法在质量检查服务器上构建。(我怀疑devops阻止了该软件包的旧版本,而且我的版本肯定已经过时了)我只是决定升级所有内容

$ pip install pip-review
$ pip-review --local --interactive

My answer is basically the same as @pi-k. In my case my program worked locally but failed to build on QA servers. (I suspect devops had older versions of the package blocked and my version must have been too out-of-date) I just decided to upgrade everything

$ pip install pip-review
$ pip-review --local --interactive

回答 22

如果您使用的是anaconda 步骤1:python 步骤2:在管理员模式下打开anaconda提示 步骤3:cd < python path > 步骤4:在此位置安装软件包

If you are using anaconda:

step 1: where python

step 2: open anaconda prompt in administrator mode

step 3: cd <python path>

step 4: install the package in this location


回答 23

就我而言,这表明请求Requirement已经满足。所以我用。

sudo pip3 install requests

In my case it was showing Requirement already satisfied for requests, so I used:

sudo pip3 install requests

回答 24

pycharm IDE中

从菜单中的文件一键设置

2-接下来进行Python解释器

3点子

4-搜索请求并安装

在终端pycharm中写入此命令

pip install requests 

并通过以下方式使用:

import requests

In the PyCharm IDE:

1- Go to Settings from File in the menu

2- Next, go to Python Interpreter

3- Click on pip

4- Search for requests and install it

Or run this command in the PyCharm terminal:

pip install requests 

and use it with:

import requests

回答 25

我的问题是我有四个尝试使用的python不同的python库(即使我显式调用了/usr/bin/python)。一旦我从路径中删除了shell别名和另外两个python,/usr/bin/python就可以了import requests

-HTH

My problem was that I had four different python libraries that python was trying to use (even though I was explicitly calling /usr/bin/python). Once I removed a shell alias and two other pythons from my path, /usr/bin/python was able to import requests.

-HTH


回答 26

问题可能是由于一台计算机具有多个版本的Python。确保要安装所有版本的请求模块。

就我而言,我有python版本2.73.7。我通过同时安装两个版本的python解决了此问题

The issue could be caused by a machine having multiple versions of Python. Make sure that you install the requests module for every version.

In my case, I had python version 2.7 and 3.7. I resolved this issue by installing with both versions of python
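
To find out which interpreter is running and whether it can see the module at all, a small diagnostic along these lines may help. It is an illustrative sketch; module_location is not part of any library.

```python
import importlib.util
import sys


def module_location(name):
    """Return the file a top-level module would be loaded from, or None."""
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


print("interpreter:", sys.executable)
print("version:", sys.version.split()[0])
print("requests found at:", module_location("requests"))
```

Running it once with each installed interpreter (for example python and python3) should show which of them has requests available.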


回答 27

试试这个,我已经安装了anaconda,在阅读了很多文章之后,我发现这是一个解决方法

import sys
print(sys.version)
print("\n \n")
print(sys.path)
sys.path.append('/usr/local/anaconda3/envs/py36/lib/python3.6/site-packages')

在python_version文件夹中提供站点包的路径。

Try this. I have anaconda installed, and after going through a lot of articles I found this as a fix:

import sys
print(sys.version)
print("\n \n")
print(sys.path)
sys.path.append('/usr/local/anaconda3/envs/py36/lib/python3.6/site-packages')

Provide the path of site-packages inside python_version folder.


回答 28

也许您安装了多个版本的python。尝试使用其他版本(例如python3.7 xxx.py)来确定哪个版本正确。

Maybe you have multiple versions of python installed. Try different versions, such as python3.7 xxx.py, to identify which one is the right version.


回答 29

您还可以通过首先在目录中找到pip3.exe文件在Windows上使用pip安装:对我说==> cd c:\ python34 \ scripts然后运行==> pip3安装请求

You can also use pip install on Windows by first locating the pip3.exe file. For me that was: cd c:\python34\scripts, then run: pip3 install requests


尝试/使用Python请求模块的正确方法?

问题:尝试/使用Python请求模块的正确方法?

try:
    r = requests.get(url, params={'s': thing})
except requests.ConnectionError, e:
    print e #should I also sys.exit(1) after this?

它是否正确?有没有更好的方法来构造它?这会覆盖我所有的基地吗?

try:
    r = requests.get(url, params={'s': thing})
except requests.ConnectionError, e:
    print e #should I also sys.exit(1) after this?

Is this correct? Is there a better way to structure this? Will this cover all my bases?


回答 0

看一下Requests 异常文档。简而言之:

如果出现网络问题(例如DNS故障,连接被拒绝等),请求将引发ConnectionError异常。

如果发生罕见的无效HTTP响应,则请求将引发HTTPError异常。

如果请求超时,Timeout则会引发异常。

如果请求超过配置的最大重定向数,TooManyRedirects则会引发异常。

请求显式引发的所有异常都继承自requests.exceptions.RequestException

要回答您的问题,您显示的内容不会涵盖所有基础。您将只捕获与连接有关的错误,而不是超时的错误。

捕获异常时该做什么实际上取决于脚本/程序的设计。退出是否可以接受?您可以再试一次吗?如果错误是灾难性的,并且您无法继续进行,那么可以,您可以通过引发SystemExit(一种打印错误并调用的好方法)来中止程序sys.exit

您可以捕获基类异常,该异常将处理所有情况:

try:
    r = requests.get(url, params={'s': thing})
except requests.exceptions.RequestException as e:  # This is the correct syntax
    raise SystemExit(e)

或者,您可以分别捕获它们并执行不同的操作。

try:
    r = requests.get(url, params={'s': thing})
except requests.exceptions.Timeout:
    # Maybe set up for a retry, or continue in a retry loop
    pass
except requests.exceptions.TooManyRedirects:
    # Tell the user their URL was bad and try a different one
    pass
except requests.exceptions.RequestException as e:
    # catastrophic error. bail.
    raise SystemExit(e)

正如克里斯蒂安指出:

如果您希望http错误(例如401未经授权)引发异常,可以调用Response.raise_for_statusHTTPError如果响应是http错误,则将引发。

一个例子:

try:
    r = requests.get('http://www.google.com/nothere')
    r.raise_for_status()
except requests.exceptions.HTTPError as err:
    raise SystemExit(err)

将打印:

404 Client Error: Not Found for url: http://www.google.com/nothere

Have a look at the Requests exception docs. In short:

In the event of a network problem (e.g. DNS failure, refused connection, etc), Requests will raise a ConnectionError exception.

In the event of the rare invalid HTTP response, Requests will raise an HTTPError exception.

If a request times out, a Timeout exception is raised.

If a request exceeds the configured number of maximum redirections, a TooManyRedirects exception is raised.

All exceptions that Requests explicitly raises inherit from requests.exceptions.RequestException.

To answer your question, what you show will not cover all of your bases. You’ll only catch connection-related errors, not ones that time out.

What to do when you catch the exception is really up to the design of your script/program. Is it acceptable to exit? Can you go on and try again? If the error is catastrophic and you can’t go on, then yes, you may abort your program by raising SystemExit (a nice way to both print an error and call sys.exit).

You can either catch the base-class exception, which will handle all cases:

try:
    r = requests.get(url, params={'s': thing})
except requests.exceptions.RequestException as e:  # This is the correct syntax
    raise SystemExit(e)

Or you can catch them separately and do different things.

try:
    r = requests.get(url, params={'s': thing})
except requests.exceptions.Timeout:
    # Maybe set up for a retry, or continue in a retry loop
    pass
except requests.exceptions.TooManyRedirects:
    # Tell the user their URL was bad and try a different one
    pass
except requests.exceptions.RequestException as e:
    # catastrophic error. bail.
    raise SystemExit(e)

As Christian pointed out:

If you want http errors (e.g. 401 Unauthorized) to raise exceptions, you can call Response.raise_for_status. That will raise an HTTPError, if the response was an http error.

An example:

try:
    r = requests.get('http://www.google.com/nothere')
    r.raise_for_status()
except requests.exceptions.HTTPError as err:
    raise SystemExit(err)

Will print:

404 Client Error: Not Found for url: http://www.google.com/nothere
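
The “retry loop” mentioned in the comment above could look roughly like this. It is a sketch, not the answer author’s code: get_with_retries, its parameters, and the injectable get argument (there only so the function can be exercised offline) are all assumptions.

```python
import time

import requests


def get_with_retries(url, attempts=3, timeout=5, backoff=1.0, get=requests.get):
    """GET `url`, retrying only on timeouts; other errors propagate at once."""
    for attempt in range(1, attempts + 1):
        try:
            response = get(url, timeout=timeout)
            response.raise_for_status()
            return response
        except requests.exceptions.Timeout:
            if attempt == attempts:
                raise  # out of attempts: let the caller decide what to do
            time.sleep(backoff * attempt)  # wait a little longer each time
```

A caller can still wrap get_with_retries(url) in an except requests.exceptions.RequestException block, as in the examples above, to handle whatever is left over.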

回答 1

另一项建议是明确的。似乎最好是从特定错误到一般错误,以获取所需的错误来捕获,因此特定错误不会被一般错误掩盖。

url='http://www.google.com/blahblah'

try:
    r = requests.get(url,timeout=3)
    r.raise_for_status()
except requests.exceptions.HTTPError as errh:
    print ("Http Error:",errh)
except requests.exceptions.ConnectionError as errc:
    print ("Error Connecting:",errc)
except requests.exceptions.Timeout as errt:
    print ("Timeout Error:",errt)
except requests.exceptions.RequestException as err:
    print ("OOps: Something Else",err)

Http Error: 404 Client Error: Not Found for url: http://www.google.com/blahblah

url='http://www.google.com/blahblah'

try:
    r = requests.get(url,timeout=3)
    r.raise_for_status()
except requests.exceptions.RequestException as err:
    print ("OOps: Something Else",err)
except requests.exceptions.HTTPError as errh:
    print ("Http Error:",errh)
except requests.exceptions.ConnectionError as errc:
    print ("Error Connecting:",errc)
except requests.exceptions.Timeout as errt:
    print ("Timeout Error:",errt)     

OOps: Something Else 404 Client Error: Not Found for url: http://www.google.com/blahblah

One additional suggestion to be explicit. It seems best to go from specific to general down the stack of errors to get the desired error to be caught, so the specific ones don’t get masked by the general one.

url='http://www.google.com/blahblah'

try:
    r = requests.get(url,timeout=3)
    r.raise_for_status()
except requests.exceptions.HTTPError as errh:
    print ("Http Error:",errh)
except requests.exceptions.ConnectionError as errc:
    print ("Error Connecting:",errc)
except requests.exceptions.Timeout as errt:
    print ("Timeout Error:",errt)
except requests.exceptions.RequestException as err:
    print ("OOps: Something Else",err)

Http Error: 404 Client Error: Not Found for url: http://www.google.com/blahblah

vs

url='http://www.google.com/blahblah'

try:
    r = requests.get(url,timeout=3)
    r.raise_for_status()
except requests.exceptions.RequestException as err:
    print ("OOps: Something Else",err)
except requests.exceptions.HTTPError as errh:
    print ("Http Error:",errh)
except requests.exceptions.ConnectionError as errc:
    print ("Error Connecting:",errc)
except requests.exceptions.Timeout as errt:
    print ("Timeout Error:",errt)     

OOps: Something Else 404 Client Error: Not Found for url: http://www.google.com/blahblah
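
The ordering effect can also be checked without touching the network. The following is a minimal sketch, not part of the original answer: the helper name classify is hypothetical, and it raises exception instances by hand instead of performing real requests.

```python
import requests


def classify(exc):
    """Return the label of the first handler that matches exc (specific to general)."""
    try:
        raise exc
    except requests.exceptions.HTTPError:
        return "http"
    except requests.exceptions.ConnectionError:
        return "connection"
    except requests.exceptions.Timeout:
        return "timeout"
    except requests.exceptions.RequestException:
        # Base class of all the exceptions above, so it must come last
        return "other"


print(classify(requests.exceptions.HTTPError("404 Client Error")))   # http
print(classify(requests.exceptions.ConnectTimeout("timed out")))     # connection
print(classify(requests.exceptions.TooManyRedirects("redirects")))   # other
```

ConnectTimeout inherits from both ConnectionError and Timeout, so with this ordering it lands in the ConnectionError handler; swap the two clauses if timeouts should win.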

Answer 2

The exception object also carries the original response as e.response, which can be useful if you need to see the error body in the response from the server. For example:

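import requests

try:
    # The URL needs an explicit scheme, otherwise requests raises MissingSchema
    r = requests.post('https://somerestapi.com/post-here', data={'birthday': '9/9/3999'})
    r.raise_for_status()
except requests.exceptions.HTTPError as e:
    # e.response is the Response object that triggered the error
    print(e.response.text)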
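
To see what e.response exposes without a live server, here is a small offline sketch. It is an illustration under assumptions: the helper error_body and the hand-built Response object are stand-ins, not something the answer itself uses.

```python
import io

import requests


def error_body(response):
    """Return (status_code, body) for a 4xx/5xx response, else None."""
    try:
        response.raise_for_status()
    except requests.exceptions.HTTPError as e:
        # e.response is the same Response object that raised the error
        return e.response.status_code, e.response.text
    return None


# Hand-built stand-in for a server reply, so the sketch runs offline.
fake = requests.Response()
fake.status_code = 422
fake.encoding = "utf-8"
fake.raw = io.BytesIO(b'{"error": "birthday is in the future"}')

print(error_body(fake))
```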

Download large file in python with requests

Question: Download large file in python with requests

Requests is a really nice library. I’d like to use it to download big files (>1GB). The problem is that it’s not possible to keep the whole file in memory, so I need to read it in chunks. And this is a problem with the following code:

import requests

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url)
    f = open(local_filename, 'wb')
    for chunk in r.iter_content(chunk_size=512 * 1024): 
        if chunk: # filter out keep-alive new chunks
            f.write(chunk)
    f.close()
    return 

For some reason it doesn’t work this way: it still loads the response into memory before saving it to a file.

UPDATE

If you need a small client (Python 2.x /3.x) which can download big files from FTP, you can find it here. It supports multithreading & reconnects (it does monitor connections) also it tunes socket params for the download task.


Answer 0

With the following streaming code, the Python memory usage is restricted regardless of the size of the downloaded file:

def download_file(url):
    local_filename = url.split('/')[-1]
    # NOTE the stream=True parameter below
    with requests.get(url, stream=True) as r:
        r.raise_for_status()
        with open(local_filename, 'wb') as f:
            for chunk in r.iter_content(chunk_size=8192): 
                # If you have chunk encoded response uncomment if
                # and set chunk_size parameter to None.
                #if chunk: 
                f.write(chunk)
    return local_filename

Note that the number of bytes returned by each iteration of iter_content is not necessarily exactly chunk_size; expect it to vary from one iteration to the next, and it is often far bigger.

See https://requests.readthedocs.io/en/latest/user/advanced/#body-content-workflow and https://requests.readthedocs.io/en/latest/api/#requests.Response.iter_content for further reference.
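
Because chunk lengths are not guaranteed, progress accounting should use len(chunk) instead of chunk_size. A minimal sketch of that idea follows; the helper save_stream is hypothetical, and the hand-built Response stands in for requests.get(url, stream=True) so the example runs offline.

```python
import io
import os
import tempfile

import requests


def save_stream(response, path, chunk_size=8192):
    """Write a streamed response to path chunk by chunk; return bytes written."""
    written = 0
    with open(path, "wb") as f:
        for chunk in response.iter_content(chunk_size=chunk_size):
            written += len(chunk)  # count what actually arrived, not chunk_size
            f.write(chunk)
    return written


# Offline stand-in; a real call would be requests.get(url, stream=True).
fake = requests.Response()
fake.status_code = 200
fake.raw = io.BytesIO(b"x" * 20000)

target = os.path.join(tempfile.mkdtemp(), "payload.bin")
print(save_stream(fake, target), os.path.getsize(target))
```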


Answer 1

It’s much easier if you use Response.raw and shutil.copyfileobj():

import requests
import shutil

def download_file(url):
    local_filename = url.split('/')[-1]
    with requests.get(url, stream=True) as r:
        with open(local_filename, 'wb') as f:
            shutil.copyfileobj(r.raw, f)

    return local_filename

This streams the file to disk without using excessive memory, and the code is simple.
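
One caveat with Response.raw is that it does not undo gzip or deflate Content-Encoding by default, so a compressed response would be written to disk still compressed. A hedged variant that sets decode_content first is sketched below; splitting the work into a save_raw helper is an assumption made here for testability, not part of the answer.

```python
import shutil

import requests


def save_raw(response, path):
    """Copy response.raw to path, asking urllib3 to decode gzip/deflate on the fly."""
    response.raw.decode_content = True
    with open(path, "wb") as f:
        shutil.copyfileobj(response.raw, f)
    return path


def download_file(url, path):
    with requests.get(url, stream=True) as r:
        r.raise_for_status()  # fail early instead of saving an error page
        return save_raw(r, path)
```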


Answer 2

Not exactly what OP was asking, but… it’s ridiculously easy to do that with urllib:

from urllib.request import urlretrieve
url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
dst = 'ubuntu-16.04.2-desktop-amd64.iso'
urlretrieve(url, dst)

Or this way, if you want to save it to a temporary file:

from urllib.request import urlopen
from shutil import copyfileobj
from tempfile import NamedTemporaryFile
url = 'http://mirror.pnl.gov/releases/16.04.2/ubuntu-16.04.2-desktop-amd64.iso'
with urlopen(url) as fsrc, NamedTemporaryFile(delete=False) as fdst:
    copyfileobj(fsrc, fdst)

I watched the process:

watch 'ps -p 18647 -o pid,ppid,pmem,rsz,vsz,comm,args; ls -al *.iso'

And I saw the file growing, but memory usage stayed at 17 MB. Am I missing something?
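
Memory stays flat because copyfileobj reads and writes a bounded buffer at a time. The sketch below makes that buffer size explicit; fetch_to_file is a hypothetical helper, and it is exercised on a local file:// URL so that it runs without network access.

```python
from pathlib import Path
from shutil import copyfileobj
from tempfile import TemporaryDirectory
from urllib.request import urlopen


def fetch_to_file(url, dst, chunk_size=64 * 1024):
    """Copy the body at url to dst, reading chunk_size bytes at a time."""
    with urlopen(url) as fsrc, open(dst, "wb") as fdst:
        copyfileobj(fsrc, fdst, chunk_size)
    return dst


# Demonstrated on a local file:// URL; an http(s) URL works the same way.
with TemporaryDirectory() as tmp:
    src = Path(tmp, "source.bin")
    src.write_bytes(b"\0" * 1_000_000)
    dst = fetch_to_file(src.as_uri(), Path(tmp, "copy.bin"))
    print(dst.stat().st_size)  # 1000000
```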


Answer 3

Your chunk size could be too large. Have you tried dropping it, maybe to 1024 bytes at a time? (Also, you could use with to tidy up the syntax.)

def DownloadFile(url):
    local_filename = url.split('/')[-1]
    r = requests.get(url, stream=True)  # without stream=True the whole body is read into memory first
    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024): 
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
    return 

Incidentally, how are you deducing that the response has been loaded into memory?

It sounds as if Python isn’t flushing the data to the file. Based on other SO questions, you could try f.flush() and os.fsync() (which needs import os) to force the file write and free memory:

    with open(local_filename, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024): 
            if chunk: # filter out keep-alive new chunks
                f.write(chunk)
                f.flush()
                os.fsync(f.fileno())

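What are the differences between the urllib, urllib2, urllib3 and requests module?

Question: What are the differences between the urllib, urllib2, urllib3 and requests module?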

In Python, what are the differences between the urllib, urllib2, urllib3 and requests modules? Why are there three? They seem to do the same thing…


Answer 0

I know it’s been said already, but I’d highly recommend the requests Python package.

If you’ve used languages other than Python, you’re probably thinking that urllib and urllib2 are easy to use, need little code, and are highly capable; that’s how I used to think. But the requests package is so unbelievably useful and concise that everyone should be using it.

First, it supports a fully RESTful API, and is as easy as:

import requests

resp = requests.get('http://www.mywebsite.com/user')
resp = requests.post('http://www.mywebsite.com/user')
resp = requests.put('http://www.mywebsite.com/user/put')
resp = requests.delete('http://www.mywebsite.com/user/delete')

Whether you use GET or POST, you never have to encode parameters yourself again; it simply takes a dictionary as an argument and is good to go:

userdata = {"firstname": "John", "lastname": "Doe", "password": "jdoe123"}
resp = requests.post('http://www.mywebsite.com/user', data=userdata)
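
To see the encoding that requests applies, a request can be prepared without being sent. This is only an illustrative sketch: the params dictionary and the trimmed userdata are made up for the example, and the URL is the answer’s placeholder.

```python
import requests

userdata = {"firstname": "John", "lastname": "Doe"}

# prepare() builds the URL, headers and body without sending anything
get_req = requests.Request(
    "GET", "http://www.mywebsite.com/user", params={"page": "2"}
).prepare()
post_req = requests.Request(
    "POST", "http://www.mywebsite.com/user", data=userdata
).prepare()

print(get_req.url)                       # http://www.mywebsite.com/user?page=2
print(post_req.body)                     # firstname=John&lastname=Doe
print(post_req.headers["Content-Type"])  # application/x-www-form-urlencoded
```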

Plus it even has a built in JSON decoder (again, I know json.loads() isn’t a lot more to write, but this sure is convenient):

resp.json()

Or if your response data is just text, use:

resp.text

This is just the tip of the iceberg. This is the list of features from the requests site:

  • International Domains and URLs
  • Keep-Alive & Connection Pooling
  • Sessions with Cookie Persistence
  • Browser-style SSL Verification
  • Basic/Digest Authentication
  • Elegant Key/Value Cookies
  • Automatic Decompression
  • Unicode Response Bodies
  • Multipart File Uploads
  • Connection Timeouts
  • .netrc support
  • Python 2.6—3.4
  • Thread-safe.

Answer 1

urllib2 provides some extra functionality, namely the urlopen() function can allow you to specify headers (normally you’d have had to use httplib in the past, which is far more verbose.) More importantly though, urllib2 provides the Request class, which allows for a more declarative approach to doing a request:

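# Python 2 only: urllib2 and Request.add_data() do not exist in Python 3
import urllib
from urllib2 import Request, urlopen

r = Request(url='http://www.mysite.com')
r.add_header('User-Agent', 'awesome fetcher')
r.add_data(urllib.urlencode({'foo': 'bar'}))
response = urlopen(r)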

Note that urlencode() is only in urllib, not urllib2.

There are also handlers for implementing more advanced URL support in urllib2. The short answer is, unless you’re working with legacy code, you probably want to use the URL opener from urllib2, but you still need to import urllib for some of the utility functions.
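
For readers on Python 3, where urllib2 was folded into urllib.request and urlencode() moved to urllib.parse, a rough equivalent of the snippet above might look like this. It is a sketch rather than the answer’s own code, and it only builds the request so that it stays offline.

```python
from urllib.parse import urlencode
from urllib.request import Request

payload = urlencode({"foo": "bar"}).encode("ascii")  # the body must be bytes
req = Request(url="http://www.mysite.com", data=payload)
req.add_header("User-Agent", "awesome fetcher")

print(req.get_method())              # POST, because data is present
print(req.get_header("User-agent"))  # awesome fetcher
print(req.data)                      # b'foo=bar'
# urllib.request.urlopen(req) would send it; omitted so the sketch stays offline.
```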

Bonus answer: With Google App Engine, you can use any of httplib, urllib or urllib2, but all of them are just wrappers for Google’s URL Fetch API. That is, you are still subject to the same limitations such as ports, protocols, and the length of the response allowed. You can use the core of the libraries as you would expect for retrieving HTTP URLs, though.


Answer 2

urllib and urllib2 are both Python modules that do URL request related stuff but offer different functionalities.

1) urllib2 can accept a Request object to set the headers for a URL request, urllib accepts only a URL.

2) urllib provides the urlencode method which is used for the generation of GET query strings, urllib2 doesn’t have such a function. This is one of the reasons why urllib is often used along with urllib2.

Requests: Requests is a simple, easy-to-use HTTP library written in Python.

1) Python Requests encodes the parameters automatically, so you just pass them as simple arguments, unlike urllib, where you need to call urllib.urlencode() to encode the parameters before passing them.

2) It automatically decodes the response into Unicode.

3) Requests also has far more convenient error handling. If your authentication failed, urllib2 would raise a urllib2.URLError, while Requests would return a normal response object, as expected. All you have to do to see whether the request was successful is check the boolean response.ok.


Answer 3

One considerable difference concerns porting from Python 2 to Python 3. urllib2 does not exist in Python 3; its methods were ported to urllib. So if you are using it heavily and want to migrate to Python 3 in the future, consider using urllib. However, the 2to3 tool will automatically do most of the work for you.


Answer 4

Just to add to the existing answers, I don’t see anyone mentioning that python requests is not a native library. If you are ok with adding dependencies, then requests is fine. However, if you are trying to avoid adding dependencies, urllib is a native python library that is already available to you.


Answer 5

I like the urllib.urlencode function, and it doesn’t appear to exist in urllib2.

>>> urllib.urlencode({'abc':'d f', 'def': '-!2'})
'abc=d+f&def=-%212'
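
The session above is Python 2. In Python 3 the same function lives in urllib.parse; a short sketch of the equivalent call, with parse_qs added here only to show the round trip:

```python
from urllib.parse import parse_qs, urlencode

query = urlencode({'abc': 'd f', 'def': '-!2'})
print(query)            # abc=d+f&def=-%212
print(parse_qs(query))  # {'abc': ['d f'], 'def': ['-!2']}
```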

Answer 6

To get the content of a url:

try:  # Try importing requests first.
    import requests
except ImportError:
    try:  # Fall back to Python 3 urllib.
        import urllib.request
    except ImportError:  # Fall back to Python 2 urllib.
        import urllib


def get_content(url):
    try:  # Using requests.
        return requests.get(url).content  # bytes from a requests.models.Response
    except NameError:  # requests was not imported.
        try:  # Using Python 3 urllib.
            with urllib.request.urlopen(url) as response:
                return response.read()  # bytes from an http.client.HTTPResponse
        except AttributeError:  # No urllib.request attribute, so this is Python 2.
            return urllib.urlopen(url).read()  # str read from a Python 2 instance

It’s hard to write response-handling code that works across Python 2, Python 3 and requests, because the urlopen() functions and the requests.get() function return different types:

  • Python3 urllib.request.urlopen() returns a http.client.HTTPResponse
  • Python2 urllib.urlopen(url) returns an instance
  • Requests requests.get(url) returns a requests.models.Response

Answer 7

You should generally use urllib2, since it makes things a bit easier at times by accepting Request objects and will also raise a URLError on protocol errors. With Google App Engine though, you can’t use either. You have to use the URL Fetch API that Google provides in its sandboxed Python environment.


Answer 8

A key point that I find missing in the above answers is that urllib returns an object of type <class http.client.HTTPResponse> whereas requests returns <class 'requests.models.Response'>.

Because of this, the read() method can be used with the object returned by urllib but not with the one returned by requests.

P.S. : requests is already rich with so many methods that it hardly needs one more as read() ;>


Requests: a simple, yet elegant, HTTP library

Requests

Requests is a simple, yet elegant, HTTP library.

>>> import requests
>>> r = requests.get('https://api.github.com/user', auth=('user', 'pass'))
>>> r.status_code
200
>>> r.headers['content-type']
'application/json; charset=utf8'
>>> r.encoding
'utf-8'
>>> r.text
'{"type":"User"...'
>>> r.json()
{'disk_usage': 368627, 'private_gists': 484, ...}

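Requests allows you to send HTTP/1.1 requests extremely easily. There’s no need to manually add query strings to your URLs, or to form-encode your PUT & POST data, but nowadays, just use the json method!

Requests is one of the most downloaded Python packages today, pulling in around 14M downloads / week. According to GitHub, Requests is currently depended upon by 500,000+ repositories. You may certainly put your trust in this code.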



Installing Requests and Supported Versions

Requests is available on PyPI:

$ python -m pip install requests

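Requests officially supports Python 2.7 & 3.6+.

Supported Features & Best Practices

Requests is ready for the demands of building robust and reliable HTTP-speaking applications, for the needs of today.

  • Keep-Alive & Connection Pooling
  • International Domains and URLs
  • Sessions with Cookie Persistence
  • Browser-style TLS/SSL Verification
  • Basic & Digest Authentication
  • Familiar dict-like Cookies
  • Automatic Content Decompression and Decoding
  • Multi-part File Uploads
  • SOCKS Proxy Support
  • Connection Timeouts
  • Streaming Downloads
  • Automatic honoring of .netrc
  • Chunked HTTP Requests

API Reference and User Guide available on Read the Docs

Cloning the repository

When cloning the Requests repository, you may need to add the -c fetch.fsck.badTimezone=ignore flag to avoid an error about a bad commit (see this issue for more background):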

git clone -c fetch.fsck.badTimezone=ignore https://github.com/psf/requests.git

You can also apply this setting to your global Git config:

git config --global fetch.fsck.badTimezone ignore