Tag archive: urllib2

AttributeError("'str' object has no attribute 'read'")

Question: AttributeError("'str' object has no attribute 'read'")

In Python I’m getting an error:

Exception:  (<type 'exceptions.AttributeError'>,
AttributeError("'str' object has no attribute 'read'",), <traceback object at 0x1543ab8>)

Given python code:

def getEntries (self, sub):
    url = 'http://www.reddit.com/'
    if (sub != ''):
        url += 'r/' + sub

    request = urllib2.Request (url + 
        '.json', None, {'User-Agent' : 'Reddit desktop client by /user/RobinJ1995/'})
    response = urllib2.urlopen (request)
    jsonofabitch = response.read ()

    return json.load (jsonofabitch)['data']['children']

What does this error mean and what did I do to cause it?


Answer 0

The problem is that json.load expects a file-like object with a read function defined. So either use json.load(response) or json.loads(response.read()).
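
For example, the question's function could be fixed either way; here is a minimal sketch that keeps the question's imports, URL, and method signature:

import json
import urllib2

def getEntries (self, sub):
    url = 'http://www.reddit.com/'
    if (sub != ''):
        url += 'r/' + sub

    request = urllib2.Request (url +
        '.json', None, {'User-Agent' : 'Reddit desktop client by /user/RobinJ1995/'})
    response = urllib2.urlopen (request)

    # json.load calls .read() on the file-like response itself;
    # json.loads(response.read()) would work equally well
    return json.load (response)['data']['children']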


Answer 1

AttributeError("'str' object has no attribute 'read'",)

This means exactly what it says: something tried to find a .read attribute on the object that you gave it, and you gave it an object of type str (i.e., you gave it a string).

The error occurred here:

json.load (jsonofabitch)['data']['children']

Well, you aren’t looking for read anywhere, so it must happen in the json.load function that you called (as indicated by the full traceback). That is because json.load is trying to .read the thing that you gave it, but you gave it jsonofabitch, which currently names a string (which you created by calling .read on the response).

Solution: don’t call .read yourself; the function will do this, and is expecting you to give it the response directly so that it can do so.

You could also have figured this out by reading the built-in Python documentation for the function (try help(json.load)) or for the entire module (try help(json)), or by checking the documentation for those functions on http://docs.python.org.


Answer 2

If you get a python error like this:

AttributeError: 'str' object has no attribute 'some_method'

You probably poisoned your object accidentally by overwriting your object with a string.

How to reproduce this error in python with a few lines of code:

#!/usr/bin/env python
import json
def foobar(json):
    msg = json.loads(json)

foobar('{"batman": "yes"}')

Run it, which prints:

AttributeError: 'str' object has no attribute 'loads'

But change the name of the variable, and it works fine:

#!/usr/bin/env python
import json
def foobar(jsonstring):
    msg = json.loads(jsonstring)

foobar('{"batman": "yes"}')

This error is caused when you try to call a method on a string. Strings have a few methods, but not the one you are invoking. So stop trying to invoke a method that str does not define, and start looking for where you poisoned your object.


Answer 3

OK, this is an old thread, but I had the same issue: my problem was that I used json.load instead of json.loads.

This way, json has no problem with loading any kind of dictionary.

Official documentation

json.load – Deserialize fp (a .read()-supporting text file or binary file containing a JSON document) to a Python object using this conversion table.

json.loads – Deserialize s (a str, bytes or bytearray instance containing a JSON document) to a Python object using this conversion table.
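
A quick illustration of the distinction (assuming a test.json file exists on disk):

import json

with open('test.json') as f:
    data = json.load(f)                 # file-like object: use load

data = json.loads('{"batman": "yes"}')  # str (or bytes): use loads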


Answer 4

You need to open the file first. This doesn’t work:

json_file = json.load('test.json')

But this works:

f = open('test.json')
json_file = json.load(f)

How do I convert a dictionary to a query string in Python?

Question: How do I convert a dictionary to a query string in Python?

After using cgi.parse_qs(), how do I convert the result (a dictionary) back into a query string? I'm looking for something similar to urllib.urlencode().


Answer 0

Python 3

urllib.parse.urlencode(query, doseq=False, [...])

Convert a mapping object or a sequence of two-element tuples, which may contain str or bytes objects, to a percent-encoded ASCII text string.

Python 3 urllib.parse docs

A dict is a mapping.

Legacy Python

urllib.urlencode(query[, doseq])
Convert a mapping object or a sequence of two-element tuples to a “percent-encoded” string… a series of key=value pairs separated by '&' characters…

Python 2.7 urllib docs
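
For example, in Python 3:

from urllib.parse import urlencode

urlencode({'q': 'python', 'lang': 'en'})
# 'q=python&lang=en'  (on Python 3.7+, where dicts keep insertion order)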


Answer 1

In Python 3, it's slightly different:

from urllib.parse import urlencode
urlencode({'param1': 'foo', 'param2': 'bar'})

Output: 'param1=foo&param2=bar'

For Python 2 and Python 3 compatibility, try this:

try:
    #python2
    from urllib import urlencode
except ImportError:
    #python3
    from urllib.parse import urlencode

Answer 2

You’re looking for something exactly like urllib.urlencode()!

However, when you call parse_qs() (distinct from parse_qsl()), the dictionary keys are the unique query variable names and the values are lists of values for each name.
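
A quick illustration of that shape with Python 2's cgi.parse_qs():

import cgi

d = cgi.parse_qs('a=1&b=2&b=3')
# d == {'a': ['1'], 'b': ['2', '3']} -- every value is a list, even singletons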

In order to pass this information into urllib.urlencode(), you must "flatten" these lists. Here is how you can do it with a list comprehension of tuples:

query_pairs = [(k,v) for k,vlist in d.iteritems() for v in vlist]
urllib.urlencode(query_pairs)
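
Alternatively, urlencode() accepts a doseq flag that expands sequence values itself, so the flattening step can be skipped:

import urllib

d = {'name': ['alice', 'bob']}
urllib.urlencode(d, doseq=True)  # 'name=alice&name=bob'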

Answer 3

Maybe you’re looking for something like this:

def dictToQuery(d):
  query = ''
  for key in d.keys():
    query += str(key) + '=' + str(d[key]) + "&"
  return query

It takes a dictionary and converts it to a query string, just like urlencode. It'll append a trailing "&" to the query string, but return query[:-1] fixes that, if it's an issue.
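
Note that this sketch does not percent-encode keys or values. A variant that quotes each pair and avoids the trailing "&" might look like this (a sketch using Python 2's urllib.quote_plus):

import urllib

def dictToQuery(d):
  # quote_plus percent-encodes each key and value before joining with '&'
  return '&'.join(urllib.quote_plus(str(k)) + '=' + urllib.quote_plus(str(v))
                  for k, v in d.items())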


Need to install urllib2 for Python 3.5.1

Question: Need to install urllib2 for Python 3.5.1

I'm running Python 3.5.1 on a Mac. I want to use the urllib2 module. I tried installing it, but I was told that it's been split into urllib.request and urllib.error for Python 3.

My command (running from the framework bin directory for now because it’s not in my path):

sudo ./pip3 install urllib.request

Returns this:

Could not find a version that satisfies the requirement urllib.request (from versions: )
No matching distribution found for urllib.request

I got the same error before when I tried to install urllib2 in one fell swoop.


Answer 0

WARNING: Security researchers have found several poisoned packages on PyPI, including a package named urllib, which will 'phone home' when installed. If you used pip install urllib some time after June 2017, remove that package as soon as possible.

You can’t, and you don’t need to.

urllib2 is the name of the library included in Python 2. You can use the urllib.request library included with Python 3, instead. The urllib.request library works the same way urllib2 works in Python 2. Because it is already included you don’t need to install it.

If you are following a tutorial that tells you to use urllib2 then you’ll find you’ll run into more issues. Your tutorial was written for Python 2, not Python 3. Find a different tutorial, or install Python 2.7 and continue your tutorial on that version. You’ll find urllib2 comes with that version.

Alternatively, install the requests library for a higher-level and easier to use API. It’ll work on both Python 2 and 3.
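
For instance, a Python 2 tutorial's urllib2.urlopen call translates directly to Python 3; a minimal sketch:

from urllib.request import urlopen

# works the same way urllib2.urlopen does in Python 2
response = urlopen('http://www.example.com/')
html = response.read()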


Answer 1

According to the docs:

Note The urllib2 module has been split across several modules in Python 3 named urllib.request and urllib.error. The 2to3 tool will automatically adapt imports when converting your sources to Python 3.

So it appears that it is impossible to do what you want, but you can use the appropriate Python 3 functions from urllib.request.


Answer 2

In Python 3, urllib2 was replaced by two built-in modules, urllib.request and urllib.error.

Adapted from source


So replace this:

import urllib2

With this:

import urllib.request as urllib2

Changing the user agent on urllib2.urlopen

Question: Changing the user agent on urllib2.urlopen

How can I download a webpage with a user agent other than the default one on urllib2.urlopen?


Answer 0

Setting the User-Agent from everyone’s favorite Dive Into Python.

The short story: You can use Request.add_header to do this.

You can also pass the headers as a dictionary when creating the Request itself, as the docs note:

headers should be a dictionary, and will be treated as if add_header() was called with each key and value as arguments. This is often used to "spoof" the User-Agent header, which is used by a browser to identify itself – some HTTP servers only allow requests coming from common browsers as opposed to scripts. For example, Mozilla Firefox may identify itself as "Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11", while urllib2's default user agent string is "Python-urllib/2.6" (on Python 2.6).
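
A minimal sketch of both styles with Python 2's urllib2 (the URL and agent string here are placeholders):

import urllib2

# via add_header
req = urllib2.Request('http://www.example.com/')
req.add_header('User-Agent', 'Mozilla/5.0')
response = urllib2.urlopen(req)

# or pass the headers as a dictionary when creating the Request
req = urllib2.Request('http://www.example.com/',
                      headers={'User-Agent': 'Mozilla/5.0'})
response = urllib2.urlopen(req)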


Answer 1

I answered a similar question a couple weeks ago.

There is example code in that question, but basically you can do something like this: (Note the capitalization of User-Agent as of RFC 2616, section 14.43.)

opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0')]
response = opener.open('http://www.stackoverflow.com')

Answer 2

headers = { 'User-Agent' : 'Mozilla/5.0' }
req = urllib2.Request('http://www.example.com', None, headers)
html = urllib2.urlopen(req).read()

Or, a bit shorter:

req = urllib2.Request('http://www.example.com', headers={ 'User-Agent': 'Mozilla/5.0' })
html = urllib2.urlopen(req).read()

Answer 3

For Python 3, urllib has been split into several modules…

import urllib.request
req = urllib.request.Request(url="http://localhost/", headers={'User-Agent':' Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0'})
handler = urllib.request.urlopen(req)

Answer 4

All these should work in theory, but (with Python 2.7.2 on Windows, at least) any time you send a custom User-agent header, urllib2 doesn't send that header. If you don't try to send a User-agent header, it sends the default Python-urllib one.

None of these methods seem to work for adding User-agent but they work for other headers:

opener = urllib2.build_opener(proxy)
opener.addheaders = {'User-agent':'Custom user agent'}
urllib2.install_opener(opener)

request = urllib2.Request(url, headers={'User-agent':'Custom user agent'})

request.headers['User-agent'] = 'Custom user agent'

request.add_header('User-agent', 'Custom user agent')

Answer 5

For urllib you can use:

from urllib import FancyURLopener

class MyOpener(FancyURLopener, object):
    version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; it; rv:1.8.1.11) Gecko/20071127 Firefox/2.0.0.11'

myopener = MyOpener()
myopener.retrieve('https://www.google.com/search?q=test', 'useragent.html')

Answer 6

Another solution in urllib2 and Python 2.7:

req = urllib2.Request('http://www.example.com/')
req.add_unredirected_header('User-Agent', 'Custom User-Agent')
urllib2.urlopen(req)

Answer 7

Try this:

import requests

html_source_code = requests.get("http://www.example.com/",
                   headers={'User-Agent':'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/44.0.2403.107 Safari/537.36',
                            'Upgrade-Insecure-Requests': '1',
                            'x-runtime': '148ms'},
                   allow_redirects=True).content

Answer 8

There are two properties of urllib.URLopener(), namely:
addheaders = [('User-Agent', 'Python-urllib/1.17'), ('Accept', '*/*')] and
version = 'Python-urllib/1.17'.
To fool the website, you need to change both of these values to an accepted User-Agent, e.g.
Chrome browser: 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/33.0.1750.149 Safari/537.36'
Google bot: 'Googlebot/2.1'
like this:

import urllib
page_extractor=urllib.URLopener()  
page_extractor.addheaders = [('User-Agent', 'Googlebot/2.1'), ('Accept', '*/*')]  
page_extractor.version = 'Googlebot/2.1'
page_extractor.retrieve(<url>, <file_path>)

Changing just one property does not work, because the website marks it as a suspicious request.


Downloading pictures via urllib and Python

Question: Downloading pictures via urllib and Python

So I’m trying to make a Python script that downloads webcomics and puts them in a folder on my desktop. I’ve found a few similar programs on here that do something similar, but nothing quite like what I need. The one that I found most similar is right here (http://bytes.com/topic/python/answers/850927-problem-using-urllib-download-images). I tried using this code:

>>> import urllib
>>> image = urllib.URLopener()
>>> image.retrieve("http://www.gunnerkrigg.com//comics/00000001.jpg","00000001.jpg")
('00000001.jpg', <httplib.HTTPMessage instance at 0x1457a80>)

I then searched my computer for a file "00000001.jpg", but all I found was the cached picture of it. I'm not even sure it saved the file to my computer. Once I understand how to get the file downloaded, I think I know how to handle the rest. Essentially just use a for loop, split the string at the '00000000'.'jpg', and increment the '00000000' up to the largest number, which I would have to somehow determine. Any recommendations on the best way to do this or how to download the file correctly?

Thanks!

EDIT 6/15/10

Here is the completed script; it saves the files to any directory you choose. For some odd reason the files weren't downloading, and then they just did. Any suggestions on how to clean it up would be much appreciated. I'm currently working out how to find out how many comics exist on the site so I can get just the latest one, rather than having the program quit after a certain number of exceptions are raised.

import urllib
import os

comicCounter=len(os.listdir('/file'))+1  # reads the number of files in the folder to start downloading at the next comic
errorCount=0

def download_comic(url,comicName):
    """
    download a comic in the form of

    url = http://www.example.com
    comicName = '00000000.jpg'
    """
    image=urllib.URLopener()
    image.retrieve(url,comicName)  # download comicName at URL

while comicCounter <= 1000:  # not the most elegant solution
    os.chdir('/file')  # set where files download to
    try:
        if comicCounter < 10:  # needed to break into 10^n segments because comic names are a set of zeros followed by a number
            comicNumber=str('0000000'+str(comicCounter))  # string containing the eight digit comic number
            comicName=str(comicNumber+".jpg")  # string containing the file name
            url=str("http://www.gunnerkrigg.com//comics/"+comicName)  # creates the URL for the comic
            comicCounter+=1  # increments the comic counter to go to the next comic, must be before the download in case the download raises an exception
            download_comic(url,comicName)  # uses the function defined above to download the comic
            print url
        if 10 <= comicCounter < 100:
            comicNumber=str('000000'+str(comicCounter))
            comicName=str(comicNumber+".jpg")
            url=str("http://www.gunnerkrigg.com//comics/"+comicName)
            comicCounter+=1
            download_comic(url,comicName)
            print url
        if 100 <= comicCounter < 1000:
            comicNumber=str('00000'+str(comicCounter))
            comicName=str(comicNumber+".jpg")
            url=str("http://www.gunnerkrigg.com//comics/"+comicName)
            comicCounter+=1
            download_comic(url,comicName)
            print url
        else:  # quit the program if any number outside this range shows up
            quit
    except IOError:  # urllib raises an IOError for a 404 error, when the comic doesn't exist
        errorCount+=1  # add one to the error count
        if errorCount>3:  # if more than three errors occur during downloading, quit the program
            break
        else:
            print str("comic"+ ' ' + str(comicCounter) + ' ' + "does not exist")  # otherwise say that the certain comic number doesn't exist
print "all comics are up to date"  # prints if all comics are downloaded

Answer 0

Python 2

Using urllib.urlretrieve

import urllib
urllib.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")

Python 3

Using urllib.request.urlretrieve (part of Python 3’s legacy interface, works exactly the same)

import urllib.request
urllib.request.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")

Answer 1

import urllib
f = open('00000001.jpg','wb')
f.write(urllib.urlopen('http://www.gunnerkrigg.com//comics/00000001.jpg').read())
f.close()

Answer 2

Just for the record, using the requests library:

import requests
f = open('00000001.jpg','wb')
f.write(requests.get('http://www.gunnerkrigg.com//comics/00000001.jpg').content)
f.close()

Though it should check for requests.get() error.
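
For example, raise_for_status() turns HTTP error codes into exceptions; a minimal sketch:

import requests

r = requests.get('http://www.gunnerkrigg.com//comics/00000001.jpg')
r.raise_for_status()  # raises requests.HTTPError on a 4xx/5xx response
with open('00000001.jpg', 'wb') as f:
    f.write(r.content)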


Answer 3

For Python 3 you will need to import urllib.request:

import urllib.request 

urllib.request.urlretrieve(url, filename)

For more info, check out the link.


Answer 4

Python 3 version of @DiGMi’s answer:

from urllib import request
f = open('00000001.jpg', 'wb')
f.write(request.urlopen("http://www.gunnerkrigg.com/comics/00000001.jpg").read())
f.close()

Answer 5

I found this answer and edited it to be more reliable:

def download_photo(self, img_url, filename):
    try:
        image_on_web = urllib.urlopen(img_url)
        if image_on_web.headers.maintype == 'image':
            buf = image_on_web.read()
            path = os.getcwd() + DOWNLOADED_IMAGE_PATH
            file_path = "%s%s" % (path, filename)
            downloaded_image = file(file_path, "wb")
            downloaded_image.write(buf)
            downloaded_image.close()
            image_on_web.close()
        else:
            return False    
    except:
        return False
    return True

This way, you never leak resources or hit unhandled exceptions while downloading.


Answer 6

If you know that the files are located in the same directory dir of the website site and have the following format: filename_01.jpg, …, filename_10.jpg, then download all of them:

import requests

for x in range(1, 11):  # 1 through 10 inclusive
    str1 = 'filename_%2.2d.jpg' % (x)
    str2 = 'http://site/dir/filename_%2.2d.jpg' % (x)

    f = open(str1, 'wb')
    f.write(requests.get(str2).content)
    f.close()

Answer 7

It’s easiest to just use .read() to read the partial or entire response, then write it into a file you’ve opened in a known good location.
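
A minimal sketch of that approach with Python 2's urllib2, reusing the comic URL from the question:

import urllib2

response = urllib2.urlopen('http://www.gunnerkrigg.com//comics/00000001.jpg')
with open('00000001.jpg', 'wb') as f:
    f.write(response.read())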


Answer 8

Maybe you need ‘User-Agent’:

import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36')]
response = opener.open('http://google.com')
htmlData = response.read()
f = open('file.txt','w')
f.write(htmlData )
f.close()

Answer 9

Aside from suggesting you read the docs for retrieve() carefully (http://docs.python.org/library/urllib.html#urllib.URLopener.retrieve), I would suggest actually calling read() on the content of the response, and then saving it into a file of your choosing rather than leaving it in the temporary file that retrieve creates.


Answer 10

None of the above code snippets preserve the original image name, which is sometimes required. This will help in saving the images to your local drive, preserving the original image name:

import urllib

IMAGE = URL.rsplit('/', 1)[1]  # the filename is the last path segment of the URL
urllib.urlretrieve(URL, IMAGE)

Try this for more details.


Answer 11

This worked for me using Python 3.

It gets a list of URLs from the CSV file and starts downloading them into a folder. If the content or image does not exist, it catches that exception and continues making its magic.

import urllib.request
import csv
import os

errorCount=0

file_list = "/Users/$USER/Desktop/YOUR-FILE-TO-DOWNLOAD-IMAGES/image_{0}.jpg"

# CSV file must be comma-separated
# urls.csv is read from your current working directory; make sure you cd into it or add the corresponding path
with open ('urls.csv') as images:
    images = csv.reader(images)
    img_count = 1
    print("Please Wait.. it will take some time")
    for image in images:
        try:
            urllib.request.urlretrieve(image[0],
            file_list.format(img_count))
            img_count += 1
        except IOError:
            errorCount+=1
            # Stop in case you reach 100 errors downloading images
            if errorCount>100:
                break
            else:
                print ("File does not exist")

print ("Done!")

Answer 12

A simpler solution may be (Python 3):

import urllib.request
import os
os.chdir("D:\\comic") #your path
i=1;
s="00000000"
while i<1000:
    try:
        urllib.request.urlretrieve("http://www.gunnerkrigg.com//comics/"+ s[:8-len(str(i))]+ str(i)+".jpg",str(i)+".jpg")
    except:
        print("not possible" + str(i))
    i+=1;

Answer 13

What about this:

import os
import urllib.request, urllib.error

def from_url( url, filename = None ):
    '''Store the url content to filename'''
    if not filename:
        filename = os.path.basename( os.path.realpath(url) )

    req = urllib.request.Request( url )
    try:
        response = urllib.request.urlopen( req )
    except urllib.error.URLError as e:
        if hasattr( e, 'reason' ):
            print( 'Fail in reaching the server -> ', e.reason )
            return False
        elif hasattr( e, 'code' ):
            print( 'The server couldn\'t fulfill the request -> ', e.code )
            return False
    else:
        with open( filename, 'wb' ) as fo:
            fo.write( response.read() )
            print( 'Url saved as %s' % filename )
        return True

##

def main():
    test_url = 'http://cdn.sstatic.net/stackoverflow/img/favicon.ico'

    from_url( test_url )

if __name__ == '__main__':
    main()

Answer 14

If you need proxy support you can do this:

  if needProxy == False:
    returnCode, urlReturnResponse = urllib.urlretrieve( myUrl, fullJpegPathAndName )
  else:
    proxy_support = urllib2.ProxyHandler({"https":myHttpProxyAddress})
    opener = urllib2.build_opener(proxy_support)
    urllib2.install_opener(opener)
    urlReader = urllib2.urlopen( myUrl ).read() 
    with open( fullJpegPathAndName, "wb" ) as f:
      f.write( urlReader )

Answer 15

Another way to do this is via the fastai library. This worked like a charm for me. I was facing an SSL: CERTIFICATE_VERIFY_FAILED error using urlretrieve, so I tried this instead.

url = 'https://www.linkdoesntexist.com/lennon.jpg'
fastai.core.download_url(url,'image1.jpg', show_progress=False)

Answer 16

Using requests

import requests
import shutil,os

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
}
currentDir = os.getcwd()
path = os.path.join(currentDir,'Images')#saving images to Images folder

def ImageDl(url):
    attempts = 0
    while attempts < 5:#retry 5 times
        try:
            filename = url.split('/')[-1]
            r = requests.get(url,headers=headers,stream=True,timeout=5)
            if r.status_code == 200:
                with open(os.path.join(path,filename),'wb') as f:
                    r.raw.decode_content = True
                    shutil.copyfileobj(r.raw,f)
            print(filename)
            break
        except Exception as e:
            attempts+=1
            print(e)

if __name__ == '__main__':
    ImageDl(url)

Answer 17

Using urllib, you can get this done instantly.

import urllib.request

opener=urllib.request.build_opener()
opener.addheaders=[('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
urllib.request.install_opener(opener)

urllib.request.urlretrieve(URL, "images/0.jpg")

How to download an image using requests

Question: How to download an image using requests

I’m trying to download and save an image from the web using python’s requests module.

Here is the (working) code I used:

img = urllib2.urlopen(settings.STATICMAP_URL.format(**data))
with open(path, 'w') as f:
    f.write(img.read())

Here is the new (non-working) code using requests:

r = requests.get(settings.STATICMAP_URL.format(**data))
if r.status_code == 200:
    img = r.raw.read()
    with open(path, 'w') as f:
        f.write(img)

Can you help me on what attribute from the response to use from requests?


Answer 0

You can either use the response.raw file object, or iterate over the response.

Using the response.raw file-like object will not, by default, decode compressed responses (with GZIP or deflate). You can force it to decompress for you anyway by setting the decode_content attribute to True (requests sets it to False to control decoding itself). You can then use shutil.copyfileobj() to have Python stream the data to a file object:

import requests
import shutil

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        r.raw.decode_content = True
        shutil.copyfileobj(r.raw, f)        

To iterate over the response use a loop; iterating like this ensures that data is decompressed by this stage:

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        for chunk in r:
            f.write(chunk)

This’ll read the data in 128 byte chunks; if you feel another chunk size works better, use the Response.iter_content() method with a custom chunk size:

r = requests.get(settings.STATICMAP_URL.format(**data), stream=True)
if r.status_code == 200:
    with open(path, 'wb') as f:
        for chunk in r.iter_content(1024):
            f.write(chunk)

Note that you need to open the destination file in binary mode to ensure python doesn’t try and translate newlines for you. We also set stream=True so that requests doesn’t download the whole image into memory first.


Answer 1

Get a file-like object from the request and copy it to a file. This will also avoid reading the whole thing into memory at once.

import shutil

import requests

url = 'http://example.com/img.png'
response = requests.get(url, stream=True)
with open('img.png', 'wb') as out_file:
    shutil.copyfileobj(response.raw, out_file)
del response

Answer 2

How about this? A quick solution:

import requests

url = "http://craphound.com/images/1006884_2adf8fc7.jpg"
response = requests.get(url)
if response.status_code == 200:
    with open("/Users/apple/Desktop/sample.jpg", 'wb') as f:
        f.write(response.content)

Answer 3

I have the same need for downloading images using requests. I first tried the answer of Martijn Pieters, and it works well. But when I did a profile on this simple function, I found that it uses so many function calls compared to urllib and urllib2.

I then tried the way recommended by the author of the requests module:

import requests
from PIL import Image
# python2.x, use this instead  
# from StringIO import StringIO
# for python3.x, use BytesIO (r.content is bytes):
from io import BytesIO

r = requests.get('https://example.com/image.jpg')
i = Image.open(BytesIO(r.content))

This greatly reduced the number of function calls, and thus sped up my application. Here is the code of my profiler and the result.

#!/usr/bin/python
import requests
from StringIO import StringIO
from PIL import Image
import profile

def testRequest():
    image_name = 'test1.jpg'
    url = 'http://example.com/image.jpg'

    r = requests.get(url, stream=True)
    with open(image_name, 'wb') as f:
        for chunk in r.iter_content():
            f.write(chunk)

def testRequest2():
    image_name = 'test2.jpg'
    url = 'http://example.com/image.jpg'

    r = requests.get(url)

    i = Image.open(StringIO(r.content))
    i.save(image_name)

if __name__ == '__main__':
    profile.run('testRequest()')
    profile.run('testRequest2()')

The result for testRequest:

343080 function calls (343068 primitive calls) in 2.580 seconds

And the result for testRequest2:

3129 function calls (3105 primitive calls) in 0.024 seconds

Answer 4

This might be easier than using requests. This is the only time I’ll ever suggest not using requests to do HTTP stuff.

Two-liner using urllib:

>>> import urllib.request
>>> urllib.request.urlretrieve("http://www.example.com/songs/mp3.mp3", "mp3.mp3")

There is also a nice Python module named wget that is pretty easy to use. Found here.

This demonstrates the simplicity of the design:

>>> import wget
>>> url = 'http://www.futurecrew.com/skaven/song_files/mp3/razorback.mp3'
>>> filename = wget.download(url)
100% [................................................] 3841532 / 3841532
>>> filename
'razorback.mp3'

Enjoy.

Edit: You can also add an out parameter to specify a path.

>>> out_filepath = <output_filepath>    
>>> filename = wget.download(url, out=out_filepath)

Answer 5

The following code snippet downloads a file.

The file is saved with its filename as in the specified URL.

import requests

url = "http://example.com/image.jpg"
filename = url.split("/")[-1]
r = requests.get(url, timeout=0.5)

if r.status_code == 200:
    with open(filename, 'wb') as f:
        f.write(r.content)

Answer 6

There are 2 main ways:

  1. Using .content (simplest/official) (see Zhenyi Zhang’s answer):

    import io  # Note: io.BytesIO is StringIO.StringIO on Python2.
    import requests
    from PIL import Image
    
    r = requests.get('http://lorempixel.com/400/200')
    r.raise_for_status()
    with io.BytesIO(r.content) as f:
        with Image.open(f) as img:
            img.show()
    
  2. Using .raw (see Martijn Pieters’s answer):

    import requests
    import PIL.Image
    
    r = requests.get('http://lorempixel.com/400/200', stream=True)
    r.raise_for_status()
    r.raw.decode_content = True  # Required to decompress gzip/deflate compressed responses.
    with PIL.Image.open(r.raw) as img:
        img.show()
    r.close()  # Safety when stream=True, to ensure the connection is released.
    

Timing both shows no noticeable difference.


Answer 7

As easy as importing Image and requests:

from PIL import Image
import requests

img = Image.open(requests.get(url, stream = True).raw)
img.save('img1.jpg')

Answer 8

Here is a more user-friendly answer that still uses streaming.

Just define these functions and call getImage(). It will use the same file name as the url and write to the current directory by default, but both can be changed.

import requests
from StringIO import StringIO
from PIL import Image

def createFilename(url, name, folder):
    dotSplit = url.split('.')
    if name == None:
        # use the same as the url
        slashSplit = dotSplit[-2].split('/')
        name = slashSplit[-1]
    ext = dotSplit[-1]
    file = '{}{}.{}'.format(folder, name, ext)
    return file

def getImage(url, name=None, folder='./'):
    file = createFilename(url, name, folder)
    with open(file, 'wb') as f:
        r = requests.get(url, stream=True)
        for block in r.iter_content(1024):
            if not block:
                break
            f.write(block)

def getImageFast(url, name=None, folder='./'):
    file = createFilename(url, name, folder)
    r = requests.get(url)
    i = Image.open(StringIO(r.content))
    i.save(file)

if __name__ == '__main__':
    # Uses Less Memory
    getImage('http://www.example.com/image.jpg')
    # Faster
    getImageFast('http://www.example.com/image.jpg')

The request guts of getImage() are based on the answer here and the guts of getImageFast() are based on the answer above.


Answer 9

I’m going to post an answer as I don’t have enough rep to make a comment, but with wget as posted by Blairg23, you can also provide an out parameter for the path.

 wget.download(url, out=path)

Answer 10

This is the first response that comes up for google searches on how to download a binary file with requests. In case you need to download an arbitrary file with requests, you can use:

import requests
url = 'https://s3.amazonaws.com/lab-data-collections/GoogleNews-vectors-negative300.bin.gz'
open('GoogleNews-vectors-negative300.bin.gz', 'wb').write(requests.get(url, allow_redirects=True).content)

Answer 11

This is how I did it:

import requests
from PIL import Image
from io import BytesIO

url = 'your_url'
files = {'file': ("C:/Users/shadow/Downloads/black.jpeg", open('C:/Users/shadow/Downloads/black.jpeg', 'rb'),'image/jpg')}
response = requests.post(url, files=files)

img = Image.open(BytesIO(response.content))
img.show()

Answer 12

You can do something like this:

import requests
import random

url = "https://images.pexels.com/photos/1308881/pexels-photo-1308881.jpeg?auto=compress&cs=tinysrgb&dpr=1&w=500"
name = random.randrange(1, 1000)
filename = str(name) + ".jpg"
response = requests.get(url)
if response.ok:
    with open(filename, 'wb') as f:
        f.write(response.content)

ImportError: No module named urllib2

Question: ImportError: No module named urllib2

Here’s my code:

import urllib2.request

response = urllib2.urlopen("http://www.google.com")
html = response.read()
print(html)

Any help?


Answer 0

As stated in the urllib2 documentation:

The urllib2 module has been split across several modules in Python 3 named urllib.request and urllib.error. The 2to3 tool will automatically adapt imports when converting your sources to Python 3.

So you should instead be saying

from urllib.request import urlopen
html = urlopen("http://www.google.com/").read()
print(html)

Your current, now-edited code sample is incorrect because you are saying urllib.urlopen("http://www.google.com/") instead of just urlopen("http://www.google.com/").


Answer 1

For a script working with Python 2 (tested versions 2.7.3 and 2.6.8) and Python 3 (3.2.3 and 3.3.2+) try:

#! /usr/bin/env python

try:
    # For Python 3.0 and later
    from urllib.request import urlopen
except ImportError:
    # Fall back to Python 2's urllib2
    from urllib2 import urlopen

html = urlopen("http://www.google.com/")
print(html.read())

Answer 2

The above didn't work for me in Python 3.3. Try this instead (YMMV, etc.):

import urllib.request
url = "http://www.google.com/"
request = urllib.request.Request(url)
response = urllib.request.urlopen(request)
print(response.read().decode('utf-8'))
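
If you would rather not hard-code 'utf-8', the charset declared in the response headers can be used when the server provides one; a minimal sketch:

import urllib.request

response = urllib.request.urlopen("http://www.google.com/")
# response.headers is an email.message.Message, which can report the declared charset.
charset = response.headers.get_content_charset() or 'utf-8'  # fall back to utf-8 if absent
print(response.read().decode(charset))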

Answer 3

Some tab completions to show the contents of the packages in Python 2 vs Python 3.

In Python 2:

In [1]: import urllib

In [2]: urllib.
urllib.ContentTooShortError      urllib.ftpwrapper                urllib.socket                    urllib.test1
urllib.FancyURLopener            urllib.getproxies                urllib.splitattr                 urllib.thishost
urllib.MAXFTPCACHE               urllib.getproxies_environment    urllib.splithost                 urllib.time
urllib.URLopener                 urllib.i                         urllib.splitnport                urllib.toBytes
urllib.addbase                   urllib.localhost                 urllib.splitpasswd               urllib.unquote
urllib.addclosehook              urllib.noheaders                 urllib.splitport                 urllib.unquote_plus
urllib.addinfo                   urllib.os                        urllib.splitquery                urllib.unwrap
urllib.addinfourl                urllib.pathname2url              urllib.splittag                  urllib.url2pathname
urllib.always_safe               urllib.proxy_bypass              urllib.splittype                 urllib.urlcleanup
urllib.base64                    urllib.proxy_bypass_environment  urllib.splituser                 urllib.urlencode
urllib.basejoin                  urllib.quote                     urllib.splitvalue                urllib.urlopen
urllib.c                         urllib.quote_plus                urllib.ssl                       urllib.urlretrieve
urllib.ftpcache                  urllib.re                        urllib.string                    
urllib.ftperrors                 urllib.reporthook                urllib.sys  

In Python 3:

In [2]: import urllib.
urllib.error        urllib.parse        urllib.request      urllib.response     urllib.robotparser

In [2]: import urllib.error.
urllib.error.ContentTooShortError  urllib.error.HTTPError             urllib.error.URLError

In [2]: import urllib.parse.
urllib.parse.parse_qs          urllib.parse.quote_plus        urllib.parse.urldefrag         urllib.parse.urlsplit
urllib.parse.parse_qsl         urllib.parse.unquote           urllib.parse.urlencode         urllib.parse.urlunparse
urllib.parse.quote             urllib.parse.unquote_plus      urllib.parse.urljoin           urllib.parse.urlunsplit
urllib.parse.quote_from_bytes  urllib.parse.unquote_to_bytes  urllib.parse.urlparse

In [2]: import urllib.request.
urllib.request.AbstractBasicAuthHandler         urllib.request.HTTPSHandler
urllib.request.AbstractDigestAuthHandler        urllib.request.OpenerDirector
urllib.request.BaseHandler                      urllib.request.ProxyBasicAuthHandler
urllib.request.CacheFTPHandler                  urllib.request.ProxyDigestAuthHandler
urllib.request.DataHandler                      urllib.request.ProxyHandler
urllib.request.FTPHandler                       urllib.request.Request
urllib.request.FancyURLopener                   urllib.request.URLopener
urllib.request.FileHandler                      urllib.request.UnknownHandler
urllib.request.HTTPBasicAuthHandler             urllib.request.build_opener
urllib.request.HTTPCookieProcessor              urllib.request.getproxies
urllib.request.HTTPDefaultErrorHandler          urllib.request.install_opener
urllib.request.HTTPDigestAuthHandler            urllib.request.pathname2url
urllib.request.HTTPErrorProcessor               urllib.request.url2pathname
urllib.request.HTTPHandler                      urllib.request.urlcleanup
urllib.request.HTTPPasswordMgr                  urllib.request.urlopen
urllib.request.HTTPPasswordMgrWithDefaultRealm  urllib.request.urlretrieve
urllib.request.HTTPRedirectHandler     


In [2]: import urllib.response.
urllib.response.addbase       urllib.response.addclosehook  urllib.response.addinfo       urllib.response.addinfourl

Answer 4

Python 3:

import urllib.request

wp = urllib.request.urlopen("http://google.com")
pw = wp.read()
print(pw)

Python 2:

import urllib
import sys

wp = urllib.urlopen("http://google.com")
for line in wp:
    sys.stdout.write(line)

I have tested both code snippets in their respective versions.


Answer 5

Simplest of all solutions:

In Python 3.x:

import urllib.request
url = "https://api.github.com/users?since=100"
request = urllib.request.Request(url)
response = urllib.request.urlopen(request)
data_content = response.read()
print(data_content)
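
Since that endpoint returns JSON, the bytes can be decoded and parsed with the standard json module; a sketch (the 'login' field is taken from what the GitHub users endpoint returns):

import json
import urllib.request

url = "https://api.github.com/users?since=100"
with urllib.request.urlopen(url) as response:
    users = json.loads(response.read().decode('utf-8'))

# 'login' is one of the fields the GitHub users endpoint returns.
print(users[0]['login'])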

Answer 6

In python 3, to get text output:

import io
import urllib.request

response = urllib.request.urlopen("http://google.com")
text = io.TextIOWrapper(response)
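
One caveat: with no encoding argument, TextIOWrapper falls back to a platform-dependent default, so passing the charset explicitly is safer when it is known; a sketch:

import io
import urllib.request

response = urllib.request.urlopen("http://google.com")
# Pass the encoding explicitly instead of relying on the platform default.
text = io.TextIOWrapper(response, encoding='utf-8')
print(text.read())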

Answer 7

That worked for me in Python 3:

import urllib.request
htmlfile = urllib.request.urlopen("http://google.com")
htmltext = htmlfile.read()
print(htmltext)

What are the differences between the urllib, urllib2, urllib3 and requests modules?

Question: What are the differences between the urllib, urllib2, urllib3 and requests modules?

In Python, what are the differences between the urllib, urllib2, urllib3 and requests modules? Why are there three? They seem to do the same thing…


Answer 0

I know it’s been said already, but I’d highly recommend the requests Python package.

If you've used languages other than Python, you're probably thinking that urllib and urllib2 are easy to use, don't require much code, and are highly capable; that's how I used to think. But the requests package is so unbelievably useful and short that everyone should be using it.

First, it supports a fully RESTful API, and is as easy as:

import requests

resp = requests.get('http://www.mywebsite.com/user')
resp = requests.post('http://www.mywebsite.com/user')
resp = requests.put('http://www.mywebsite.com/user/put')
resp = requests.delete('http://www.mywebsite.com/user/delete')

Regardless of whether it's GET or POST, you never have to encode parameters again; it simply takes a dictionary as an argument and is good to go:

userdata = {"firstname": "John", "lastname": "Doe", "password": "jdoe123"}
resp = requests.post('http://www.mywebsite.com/user', data=userdata)

Plus it even has a built in JSON decoder (again, I know json.loads() isn’t a lot more to write, but this sure is convenient):

resp.json()

Or if your response data is just text, use:

resp.text

This is just the tip of the iceberg. This is the list of features from the requests site (the sessions item is sketched below):

  • International Domains and URLs
  • Keep-Alive & Connection Pooling
  • Sessions with Cookie Persistence
  • Browser-style SSL Verification
  • Basic/Digest Authentication
  • Elegant Key/Value Cookies
  • Automatic Decompression
  • Unicode Response Bodies
  • Multipart File Uploads
  • Connection Timeouts
  • .netrc support
  • Python 2.6–3.4
  • Thread-safe.
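
As a small sketch of the "Sessions with Cookie Persistence" item (the URLs are placeholders): a Session pools connections and carries cookies and default headers across requests:

import requests

with requests.Session() as s:
    s.headers.update({'User-Agent': 'my-client/1.0'})  # default headers apply to every request
    s.get('http://www.mywebsite.com/login')            # cookies set by the server persist here
    resp = s.get('http://www.mywebsite.com/user')      # ...and are sent automatically
    print(resp.status_code)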

Answer 1

urllib2 provides some extra functionality, namely the urlopen() function can allow you to specify headers (normally you’d have had to use httplib in the past, which is far more verbose.) More importantly though, urllib2 provides the Request class, which allows for a more declarative approach to doing a request:

r = Request(url='http://www.mysite.com')
r.add_header('User-Agent', 'awesome fetcher')
r.add_data(urllib.urlencode({'foo': 'bar'}))
response = urlopen(r)

Note that urlencode() is only in urllib, not urllib2.

There are also handlers for implementing more advanced URL support in urllib2. The short answer is, unless you're working with legacy code, you probably want to use the URL opener from urllib2, but you still need to import urllib for some of the utility functions.
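
As a rough sketch of those handlers in action (Python 2; the URL and credentials are placeholders), build_opener() chains several handlers into a reusable opener:

import cookielib
import urllib2

# Cookie handling plus HTTP basic auth, chained into one opener.
cookie_jar = cookielib.CookieJar()
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'http://www.mysite.com', 'user', 'secret')

opener = urllib2.build_opener(
    urllib2.HTTPCookieProcessor(cookie_jar),
    urllib2.HTTPBasicAuthHandler(password_mgr),
)
response = opener.open('http://www.mysite.com')
html = response.read()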

Bonus answer With Google App Engine, you can use any of httplib, urllib or urllib2, but all of them are just wrappers for Google’s URL Fetch API. That is, you are still subject to the same limitations such as ports, protocols, and the length of the response allowed. You can use the core of the libraries as you would expect for retrieving HTTP URLs, though.


Answer 2

urlliburllib2都是Python模块,它们执行URL请求相关的内容,但提供不同的功能。

1)urllib2可以接受Request对象来设置URL请求的标头,而urllib仅接受URL。

2)urllib提供了urlencode方法,该方法用于生成GET查询字符串,而urllib2没有此功能。这是urllib与urllib2经常一起使用的原因之一。

Requests -Requests是一个使用Python编写的简单易用的HTTP库。

1)Python请求自动对参数进行编码,因此您只需将它们作为简单的参数传递,就与urllib不同,在urllib中,需要在传递参数之前使用urllib.encode()方法对参数进行编码。

2)它自动将响应解码为Unicode。

3)Requests还具有更方便的错误处理方式。如果您的身份验证失败,则urllib2将引发urllib2.URLError,而Requests将返回正常的响应对象。您需要通过boolean response.ok查看所有请求是否成功

urllib and urllib2 are both Python modules that do URL request related stuff but offer different functionalities.

1) urllib2 can accept a Request object to set the headers for a URL request, while urllib accepts only a URL.

2) urllib provides the urlencode method which is used for the generation of GET query strings, urllib2 doesn’t have such a function. This is one of the reasons why urllib is often used along with urllib2.
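
A sketch of points 1) and 2) together, which is the usual reason the two modules show up side by side (the URL is a placeholder):

import urllib
import urllib2

# urllib supplies urlencode(); urllib2 supplies the Request object.
query = urllib.urlencode({'q': 'python', 'page': 2})
request = urllib2.Request('http://www.mywebsite.com/search?' + query)
request.add_header('User-Agent', 'my-client/1.0')
response = urllib2.urlopen(request)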

Requests – Requests is a simple, easy-to-use HTTP library written in Python.

1) Python Requests encodes the parameters automatically, so you just pass them as simple arguments, unlike urllib, where you need to use the method urllib.urlencode() to encode the parameters before passing them.

2) It automatically decodes the response into Unicode.

3) Requests also has far more convenient error handling. If your authentication failed, urllib2 would raise a urllib2.URLError, while Requests would return a normal response object, as expected. All you have to do to see whether the request was successful is check the boolean response.ok.
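
A sketch of the contrast (placeholder URL; HTTPError is the URLError subclass urllib2 raises for HTTP status failures):

# urllib2 (Python 2): failures surface as exceptions.
import urllib2
try:
    urllib2.urlopen('http://www.mywebsite.com/private')
except urllib2.HTTPError as e:
    print('request failed with status %d' % e.code)

# requests: failures come back as an ordinary response object.
import requests
resp = requests.get('http://www.mywebsite.com/private')
if not resp.ok:
    print('request failed with status %d' % resp.status_code)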


Answer 3

One considerable difference is porting from Python 2 to Python 3. urllib2 does not exist in Python 3, and its methods were ported to urllib. So if you are using it heavily and want to migrate to Python 3 in the future, consider using urllib. However, the 2to3 tool will automatically do most of the work for you.


Answer 4

Just to add to the existing answers, I don't see anyone mentioning that Python requests is not a native library. If you are ok with adding dependencies, then requests is fine. However, if you are trying to avoid adding dependencies, urllib is a native Python library that is already available to you.


Answer 5

I like the urllib.urlencode function, and it doesn’t appear to exist in urllib2.

>>> urllib.urlencode({'abc':'d f', 'def': '-!2'})
'abc=d+f&def=-%212'
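
For reference, in Python 3 the same function lives in urllib.parse:

>>> from urllib.parse import urlencode
>>> urlencode({'abc': 'd f', 'def': '-!2'})
'abc=d+f&def=-%212'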

Answer 6

To get the content of a URL:

try:  # Try importing requests first.
    import requests
except ImportError:
    try:  # Fall back to Python 3's urllib.
        import urllib.request
    except ImportError:  # Finally fall back to Python 2's urllib.
        import urllib


def get_content(url):
    try:  # Using requests; requests.get() returns a requests.models.Response.
        return requests.get(url).content  # .content is the body as bytes.
    except NameError:  # requests was never imported.
        try:  # Using Python 3 urllib; urlopen() returns an http.client.HTTPResponse.
            with urllib.request.urlopen(url) as response:
                return response.read()
        except AttributeError:  # Using Python 2 urllib; urlopen() returns an instance.
            return urllib.urlopen(url).read()

It's hard to write code that supports Python 2, Python 3, and an optional requests dependency when handling the responses, because the urlopen() functions and requests.get() return different types:

  • Python 3 urllib.request.urlopen() returns an http.client.HTTPResponse
  • Python 2 urllib.urlopen(url) returns an instance
  • requests.get(url) returns a requests.models.Response

Answer 7

You should generally use urllib2, since it makes things a bit easier at times by accepting Request objects, and it will also raise a urllib2.URLError on protocol errors. With Google App Engine though, you can't use either. You have to use the URL Fetch API that Google provides in its sandboxed Python environment.


Answer 8

A key point that I find missing in the above answers is that urllib returns an object of type <class 'http.client.HTTPResponse'>, whereas requests returns <class 'requests.models.Response'>.

Because of this, the read() method can be used with urllib but not with requests.

P.S.: requests is already rich with so many methods that it hardly needs one more like read() ;>
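
A sketch of the equivalent calls side by side (placeholder URL):

# urllib (Python 3): the response is file-like, so you read() it.
import urllib.request
body = urllib.request.urlopen('http://example.com').read()  # bytes

# requests: the body is exposed as attributes instead of read().
import requests
body = requests.get('http://example.com').content  # bytes (use .text for str)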