Tag archive: urllib

How can I check whether a website returns 404 or 200 using urllib in Python?

Question: How can I check whether a website returns 404 or 200 using urllib in Python?

How to get the code of the headers through urllib?


Answer 0

The getcode() method (added in Python 2.6) returns the HTTP status code that was sent with the response, or None if the URL is not an HTTP URL.

>>> a=urllib.urlopen('http://www.google.com/asdfsf')
>>> a.getcode()
404
>>> a=urllib.urlopen('http://www.google.com/')
>>> a.getcode()
200
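
For Python 3, a rough equivalent looks like this (a minimal sketch; note that urllib.request.urlopen raises HTTPError for 4xx/5xx responses, so in the error case the status code comes from the exception):

import urllib.request, urllib.error

def status_of(url):
    try:
        return urllib.request.urlopen(url).getcode()  # e.g. 200
    except urllib.error.HTTPError as e:
        return e.code  # e.g. 404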

Answer 1

You can use urllib2 as well:

import urllib2

req = urllib2.Request('http://www.python.org/fish.html')
try:
    resp = urllib2.urlopen(req)
except urllib2.HTTPError as e:
    if e.code == 404:
        # do something...
        pass
    else:
        # ...
        pass
except urllib2.URLError as e:
    # Not an HTTP-specific error (e.g. connection refused)
    # ...
    pass
else:
    # 200
    body = resp.read()

Note that HTTPError is a subclass of URLError which stores the HTTP status code.
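
Because of that subclass relationship, a single except urllib2.URLError handler can catch both cases and read the code attribute when it is present; a minimal sketch:

import urllib2

try:
    resp = urllib2.urlopen('http://www.python.org/fish.html')
except urllib2.URLError as e:
    # HTTPError instances carry .code (e.g. 404); a plain URLError does not
    print getattr(e, 'code', None)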


Answer 2

For Python 3:

import urllib.request, urllib.error

url = 'http://www.google.com/asdfsf'
try:
    conn = urllib.request.urlopen(url)
except urllib.error.HTTPError as e:
    # Return code error (e.g. 404, 501, ...)
    # ...
    print('HTTPError: {}'.format(e.code))
except urllib.error.URLError as e:
    # Not an HTTP-specific error (e.g. connection refused)
    # ...
    print('URLError: {}'.format(e.reason))
else:
    # 200
    # ...
    print('good')

Answer 3

import urllib2

try:
    fileHandle = urllib2.urlopen('http://www.python.org/fish.html')
    data = fileHandle.read()
    fileHandle.close()
except urllib2.URLError, e:
    print 'you got an error with the code', e

How do I convert a dictionary to a query string in Python?

Question: How do I convert a dictionary to a query string in Python?

After using cgi.parse_qs(), how to convert the result (dictionary) back to query string? Looking for something similar to urllib.urlencode().


Answer 0

Python 3

urllib.parse.urlencode(query, doseq=False, [...])

Convert a mapping object or a sequence of two-element tuples, which may contain str or bytes objects, to a percent-encoded ASCII text string.

Python 3 urllib.parse docs

A dict is a mapping.
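
For example, a plain dict converts directly (Python 3):

>>> from urllib.parse import urlencode
>>> urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
'spam=1&eggs=2&bacon=0'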

Legacy Python

urllib.urlencode(query[, doseq])
Convert a mapping object or a sequence of two-element tuples to a “percent-encoded” string… a series of key=value pairs separated by '&' characters…

Python 2.7 urllib docs


Answer 1

In Python 3, it is slightly different:

from urllib.parse import urlencode
urlencode({'pram1': 'foo', 'param2': 'bar'})

Output: 'pram1=foo&param2=bar'

For Python 2 and Python 3 compatibility, try this:

try:
    #python2
    from urllib import urlencode
except ImportError:
    #python3
    from urllib.parse import urlencode

Answer 2

You’re looking for something exactly like urllib.urlencode()!

However, when you call parse_qs() (distinct from parse_qsl()), the dictionary keys are the unique query variable names and the values are lists of values for each name.

In order to pass this information into urllib.urlencode(), you must “flatten” these lists. Here is how you can do it with a list comprehension of tuples:

query_pairs = [(k,v) for k,vlist in d.iteritems() for v in vlist]
urllib.urlencode(query_pairs)
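
As a side note, urlencode can also consume those value lists directly if you pass doseq=True, which skips the manual flattening; a minimal Python 3 sketch:

from urllib.parse import parse_qs, urlencode

d = parse_qs('a=1&a=2&b=3')      # {'a': ['1', '2'], 'b': ['3']}
print(urlencode(d, doseq=True))  # a=1&a=2&b=3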

Answer 3

Maybe you’re looking for something like this:

def dictToQuery(d):
  query = ''
  for key in d.keys():
    query += str(key) + '=' + str(d[key]) + "&"
  return query

It takes a dictionary and converts it to a query string, just like urlencode (though note that, unlike urlencode, it does not percent-encode keys or values). It’ll append a final “&” to the query string, but return query[:-1] fixes that, if it’s an issue.
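
If the missing percent-encoding matters, a variant that also avoids the trailing separator could look like this (a sketch in the answer's Python 2 dialect, using urllib.quote_plus):

import urllib

def dictToQuery(d):
  # percent-encode both sides and join with '&' to avoid a trailing separator
  return '&'.join(urllib.quote_plus(str(k)) + '=' + urllib.quote_plus(str(v))
                  for k, v in d.items())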


Python: importing urllib.quote

Question: Python: importing urllib.quote

I would like to use urllib.quote(). But python (python3) is not finding the module. Suppose, I have this line of code:

print(urllib.quote("châteu", safe=''))

How do I import urllib.quote?

import urllib or import urllib.quote both give

AttributeError: 'module' object has no attribute 'quote'

What confuses me is that urllib.request is accessible via import urllib.request


Answer 0

In Python 3.x, you need to import urllib.parse.quote:

>>> import urllib.parse
>>> urllib.parse.quote("châteu", safe='')
'ch%C3%A2teu'

According to Python 2.x urllib module documentation:

NOTE

The urllib module has been split into parts and renamed in Python 3 to urllib.request, urllib.parse, and urllib.error.


Answer 1

If you need to handle both Python 2.x and 3.x you can catch the exception and load the alternative.

try:
    from urllib import quote  # Python 2.X
except ImportError:
    from urllib.parse import quote  # Python 3+

You could also use the python compatibility wrapper six to handle this.

from six.moves.urllib.parse import quote

Answer 2

urllib went through some changes in Python 3, and quote can now be imported from the parse submodule:

>>> from urllib.parse import quote  
>>> quote('"')                      
'%22'                               

Answer 3

This is how I handle this, without using exceptions.

import sys
if sys.version_info.major > 2:  # Python 3 or later
    from urllib.parse import quote
else:  # Python 2
    from urllib import quote

'module' object has no attribute 'urlencode'

Question: 'module' object has no attribute 'urlencode'

When I try to follow the Python Wiki’s example related to URL encoding:

>>> import urllib
>>> params = urllib.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
>>> f = urllib.urlopen("http://www.musi-cal.com/cgi-bin/query", params)
>>> print f.read()

An error is raised on the second line:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'urlencode'

What am I missing?


Answer 0

urllib has been split up in Python 3.

The urllib.urlencode() function is now urllib.parse.urlencode(),

the urllib.urlopen() function is now urllib.request.urlopen().
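
Putting those renames together, a minimal Python 3 port of the question's snippet might look like this (note that POST data must be bytes in Python 3):

import urllib.parse, urllib.request

params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})
f = urllib.request.urlopen("http://www.musi-cal.com/cgi-bin/query",
                           params.encode('ascii'))  # urlopen wants bytes for data
print(f.read())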


Answer 1

import urllib.parse
urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0})

Answer 2

You use the Python 2 docs but write your program in Python 3.


Can we use XPath with BeautifulSoup?

Question: Can we use XPath with BeautifulSoup?

I am using BeautifulSoup to scrape a URL and I have the following code:

import urllib
import urllib2
from BeautifulSoup import BeautifulSoup

url =  "http://www.example.com/servlet/av/ResultTemplate=AVResult.html"
req = urllib2.Request(url)
response = urllib2.urlopen(req)
the_page = response.read()
soup = BeautifulSoup(the_page)
soup.findAll('td',attrs={'class':'empformbody'})

Now in the above code we can use findAll to get tags and information related to them, but I want to use xpath. Is it possible to use xpath with BeautifulSoup? If possible, can anyone please provide me an example code so that it will be more helpful?


Answer 0

Nope, BeautifulSoup, by itself, does not support XPath expressions.

An alternative library, lxml, does support XPath 1.0. It has a BeautifulSoup compatible mode where it’ll try and parse broken HTML the way Soup does. However, the default lxml HTML parser does just as good a job of parsing broken HTML, and I believe is faster.

Once you’ve parsed your document into an lxml tree, you can use the .xpath() method to search for elements.

try:
    # Python 2
    from urllib2 import urlopen
except ImportError:
    from urllib.request import urlopen
from lxml import etree

url =  "http://www.example.com/servlet/av/ResultTemplate=AVResult.html"
response = urlopen(url)
htmlparser = etree.HTMLParser()
tree = etree.parse(response, htmlparser)
tree.xpath(xpathselector)

There is also a dedicated lxml.html() module with additional functionality.

Note that in the above example I passed the response object directly to lxml, as having the parser read directly from the stream is more efficient than reading the response into a large string first. To do the same with the requests library, you want to set stream=True and pass in the response.raw object after enabling transparent transport decompression:

import lxml.html
import requests

url =  "http://www.example.com/servlet/av/ResultTemplate=AVResult.html"
response = requests.get(url, stream=True)
response.raw.decode_content = True
tree = lxml.html.parse(response.raw)

Of possible interest to you is the CSS Selector support; the CSSSelector class translates CSS statements into XPath expressions, making your search for td.empformbody that much easier:

from lxml.cssselect import CSSSelector

td_empformbody = CSSSelector('td.empformbody')
for elem in td_empformbody(tree):
    # Do something with these table cells.
    pass

Coming full circle: BeautifulSoup itself does have very complete CSS selector support:

for cell in soup.select('table#foobar td.empformbody'):
    # Do something with these table cells.
    pass

Answer 1

I can confirm that there is no XPath support within Beautiful Soup.


Answer 2

As others have said, BeautifulSoup doesn’t have xpath support. There are probably a number of ways to get something from an xpath, including using Selenium. However, here’s a solution that works in either Python 2 or 3:

from lxml import html
import requests

page = requests.get('http://econpy.pythonanywhere.com/ex/001.html')
tree = html.fromstring(page.content)
#This will create a list of buyers:
buyers = tree.xpath('//div[@title="buyer-name"]/text()')
#This will create a list of prices
prices = tree.xpath('//span[@class="item-price"]/text()')

print('Buyers: ', buyers)
print('Prices: ', prices)

I used this as a reference.


Answer 3

BeautifulSoup has a function named findNext that continues the search from the current element, so:

father.findNext('div',{'class':'class_value'}).findNext('div',{'id':'id_value'}).findAll('a')

The code above imitates the following XPath:

div[class=class_value]/div[id=id_value]

Answer 4

I’ve searched through their docs and it seems there is no XPath option. Also, as you can see here on a similar question on SO, the OP is asking for a translation from XPath to BeautifulSoup, so my conclusion would be – no, there is no XPath parsing available.


Answer 5

When you use lxml, it is all simple:

tree = lxml.html.fromstring(html)
i_need_element = tree.xpath('//a[@class="shared-components"]/@href')

But when you use BeautifulSoup (BS4), it is simple too:

  • first, remove “//” and “@”
  • second, add a star before “=”

Try this magic:

soup = BeautifulSoup(html, "lxml")
i_need_element = soup.select('a[class*="shared-components"]')

As you can see, this does not support sub-tags, so I removed the “/@href” part.


Answer 6

Maybe you can try the following without XPath

from simplified_scrapy.simplified_doc import SimplifiedDoc 
html = '''
<html>
<body>
<div>
    <h1>Example Domain</h1>
    <p>This domain is for use in illustrative examples in documents. You may use this
    domain in literature without prior coordination or asking for permission.</p>
    <p><a href="https://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>
'''
# What XPath can do, so can it
doc = SimplifiedDoc(html)
# The result is the same as doc.getElementByTag('body').getElementByTag('div').getElementByTag('h1').text
print (doc.body.div.h1.text)
print (doc.div.h1.text)
print (doc.h1.text) # Shorter paths will be faster
print (doc.div.getChildren())
print (doc.div.getChildren('p'))

Answer 7

from lxml import etree
from bs4 import BeautifulSoup
soup = BeautifulSoup(open('path of your localfile.html'),'html.parser')
dom = etree.HTML(str(soup))
print dom.xpath('//*[@id="BGINP01_S1"]/section/div/font/text()')

The above uses a combination of the Soup object with lxml, and one can extract values using XPath.


Answer 8

This is a pretty old thread, but there is a work-around solution now, which may not have been in BeautifulSoup at the time.

Here is an example of what I did. I use the “requests” module to read an RSS feed and get its text content in a variable called “rss_text”. With that, I run it thru BeautifulSoup, search for the xpath /rss/channel/title, and retrieve its contents. It’s not exactly XPath in all its glory (wildcards, multiple paths, etc.), but if you just have a basic path you want to locate, this works.

from bs4 import BeautifulSoup
rss_obj = BeautifulSoup(rss_text, 'xml')
cls.title = rss_obj.rss.channel.title.get_text()

urllib2.HTTPError: HTTP Error 403: Forbidden

Question: urllib2.HTTPError: HTTP Error 403: Forbidden

I am trying to automate download of historic stock data using python. The URL I am trying to open responds with a CSV file, but I am unable to open using urllib2. I have tried changing user agent as specified in few questions earlier, I even tried to accept response cookies, with no luck. Can you please help.

Note: The same method works for Yahoo Finance.

Code:

import urllib2,cookielib

site= "http://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/getHistoricalData.jsp?symbol=JPASSOCIAT&fromDate=1-JAN-2012&toDate=1-AUG-2012&datePeriod=unselected&hiddDwnld=true"

hdr = {'User-Agent':'Mozilla/5.0'}

req = urllib2.Request(site,headers=hdr)

page = urllib2.urlopen(req)

Error

File "C:\Python27\lib\urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

Thanks for your assistance


Answer 0

By adding a few more headers I was able to get the data:

import urllib2,cookielib

site= "http://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/getHistoricalData.jsp?symbol=JPASSOCIAT&fromDate=1-JAN-2012&toDate=1-AUG-2012&datePeriod=unselected&hiddDwnld=true"
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}

req = urllib2.Request(site, headers=hdr)

try:
    page = urllib2.urlopen(req)
except urllib2.HTTPError, e:
    print e.fp.read()

content = page.read()
print content

Actually, it works with just this one additional header:

'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
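
For comparison, the same header trick with the requests library (which several later answers here use) is just a sketch like:

import requests

hdr = {'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'}
resp = requests.get(site, headers=hdr)  # `site` as defined in the snippet above
print(resp.status_code)
print(resp.text)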

Answer 1

This will work in Python 3

import urllib.request

user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'

url = "http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers"
headers={'User-Agent':user_agent,} 

request = urllib.request.Request(url, None, headers)  # The assembled request
response = urllib.request.urlopen(request)
data = response.read()  # The data you need

Answer 2

NSE website has changed and the older scripts are semi-optimum to current website. This snippet can gather daily details of security. Details include symbol, security type, previous close, open price, high price, low price, average price, traded quantity, turnover, number of trades, deliverable quantities and ratio of delivered vs traded in percentage. These conveniently presented as list of dictionary form.

Python 3.X version with requests and BeautifulSoup

from requests import get
from csv import DictReader
from bs4 import BeautifulSoup as Soup
from datetime import date
from io import StringIO 

SECURITY_NAME="3MINDIA" # Change this to get quote for another stock
START_DATE= date(2017, 1, 1) # Start date of stock quote data DD-MM-YYYY
END_DATE= date(2017, 9, 14)  # End date of stock quote data DD-MM-YYYY


BASE_URL = "https://www.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp?symbol={security}&segmentLink=3&symbolCount=1&series=ALL&dateRange=+&fromDate={start_date}&toDate={end_date}&dataType=PRICEVOLUMEDELIVERABLE"




def getquote(symbol, start, end):
    start = start.strftime("%-d-%-m-%Y")
    end = end.strftime("%-d-%-m-%Y")

    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
         'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
         'Referer': 'https://cssspritegenerator.com',
         'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
         'Accept-Encoding': 'none',
         'Accept-Language': 'en-US,en;q=0.8',
         'Connection': 'keep-alive'}

    url = BASE_URL.format(security=symbol, start_date=start, end_date=end)
    d = get(url, headers=hdr)
    soup = Soup(d.content, 'html.parser')
    payload = soup.find('div', {'id': 'csvContentDiv'}).text.replace(':', '\n')
    csv = DictReader(StringIO(payload))
    for row in csv:
        print({k:v.strip() for k, v in row.items()})


if __name__ == '__main__':
    getquote(SECURITY_NAME, START_DATE, END_DATE)

Besides, this is a relatively modular and ready-to-use snippet.


AttributeError: 'module' object has no attribute 'urlopen'

Question: AttributeError: 'module' object has no attribute 'urlopen'

I’m trying to use Python to download the HTML source code of a website but I’m receiving this error.

Traceback (most recent call last):  
    File "C:\Users\Sergio.Tapia\Documents\NetBeansProjects\DICParser\src\WebDownload.py", line 3, in <module>
     file = urllib.urlopen("http://www.python.org")
AttributeError: 'module' object has no attribute 'urlopen'

I’m following the guide here: http://www.boddie.org.uk/python/HTML.html

import urllib

file = urllib.urlopen("http://www.python.org")
s = file.read()
file.close()

#I'm guessing this would output the html source code?
print(s)

I’m using Python 3.


Answer 0

This works in Python 2.x.

For Python 3 look in the docs:

import urllib.request

with urllib.request.urlopen("http://www.python.org") as url:
    s = url.read()
    # I'm guessing this would output the html source code ?
    print(s)

Answer 1

A Python 2+3 compatible solution is:

import sys

if sys.version_info[0] == 3:
    from urllib.request import urlopen
else:
    # Not Python 3 - today, it is most likely to be Python 2
    # But note that this might need an update when Python 4
    # might be around one day
    from urllib import urlopen


# Your code where you can use urlopen
with urlopen("http://www.python.org") as url:
    s = url.read()

print(s)

Answer 2

import urllib.request as ur
s = ur.urlopen("http://www.google.com")
sl = s.read()
print(sl)

In Python v3 the “urllib.request” is a module by itself, therefore “urllib” cannot be used here.


Answer 3

To get ‘dataX = urllib.urlopen(url).read()‘ working in python3 (this would have been correct for python2) you must just change 2 little things.

1: The urllib statement itself (add the .request in the middle):

dataX = urllib.request.urlopen(url).read()

2: The import statement preceding it (change from ‘import urllib’ to):

import urllib.request

And it should work in python3 :)


Answer 4

import urllib.request as ur

filehandler = ur.urlopen('http://www.google.com')
for line in filehandler:
    print(line.strip())

Answer 5

For python 3, try something like this:

import urllib.request
urllib.request.urlretrieve('http://crcv.ucf.edu/THUMOS14/UCF101/UCF101/v_YoYo_g19_c02.avi', "video_name.avi")

It will download the video to the current working directory

I got help from HERE


Answer 6

Solution for Python 3:

from urllib.request import urlopen

url = 'http://www.python.org'
file = urlopen(url)
html = file.read()
print(html)

Answer 7

Change TWO lines:

import urllib.request #line1

#Replace
urllib.urlopen("http://www.python.org")
#To
urllib.request.urlopen("http://www.python.org") #line2

If you get an ERROR 403: Forbidden exception, try this:

siteurl = "http://www.python.org"

req = urllib.request.Request(siteurl, headers={'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.100 Safari/537.36'})
pageHTML = urllib.request.urlopen(req).read()

I hope your problem is resolved.


Answer 8

One of the possible ways to do it:

import urllib
...

try:
    # Python 2
    from urllib2 import urlopen
except ImportError:
    # Python 3
    from urllib.request import urlopen

Answer 9

Use the six module to make your code compatible between Python 2 and Python 3; with six.moves it looks like this:

from six.moves import urllib
urllib.request.urlopen("<your-url>")

Answer 10

Your code works in Python 2.x; in Python 3 you can use it like this:

from urllib.request import urlopen
urlopen(url)

By the way, I suggest another module called requests, which is friendlier to use; you can install it with pip and use it like this:

import requests
requests.get(url)
requests.post(url)

I think it is easy to use; I am a beginner too… haha


Answer 11

import urllib
import urllib.request
from bs4 import BeautifulSoup


with urllib.request.urlopen("http://www.newegg.com/") as url:
    s = url.read()
    print(s)
soup = BeautifulSoup(s, "html.parser")
all_tag_a = soup.find_all("a", limit=10)

for links in all_tag_a:
    #print(links.get('href'))
    print(links)

UnicodeEncodeError: 'charmap' codec can't encode characters

Question: UnicodeEncodeError: 'charmap' codec can't encode characters

I’m trying to scrape a website, but it gives me an error.

I’m using the following code:

import urllib.request
from bs4 import BeautifulSoup

get = urllib.request.urlopen("https://www.website.com/")
html = get.read()

soup = BeautifulSoup(html)

print(soup)

And I’m getting the following error:

File "C:\Python34\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 70924-70950: character maps to <undefined>

What can I do to fix this?


Answer 0

I was getting the same UnicodeEncodeError when saving scraped web content to a file. To fix it I replaced this code:

with open(fname, "w") as f:
    f.write(html)

with this:

import io
with io.open(fname, "w", encoding="utf-8") as f:
    f.write(html)

Using io gives you backward compatibility with Python 2.

If you only need to support Python 3 you can use the builtin open function instead:

with open(fname, "w", encoding="utf-8") as f:
    f.write(html)

Answer 1

I fixed it by adding .encode("utf-8") to soup.

That means that print(soup) becomes print(soup.encode("utf-8")).


Answer 2

In Python 3.7, and running Windows 10 this worked (I am not sure whether it will work on other platforms and/or other versions of Python)

Replacing this line:

with open('filename', 'w') as f:

With this:

with open('filename', 'w', encoding='utf-8') as f:

The reason why it works is that the encoding is changed to UTF-8 when using the file, so UTF-8 characters can be converted to text, instead of returning an error when a character that is not supported by the current encoding is encountered.


Answer 3

The same error was thrown on Python 3.7 on Windows 10 while saving the response of a GET request. The encoding of the response received from the URL was UTF-8, so it is always recommended to check the encoding so the same one can be passed; such a trivial issue can really kill lots of time in production.

import requests
resp = requests.get('https://en.wikipedia.org/wiki/NIFTY_50')
print(resp.encoding)
with open ('NiftyList.txt', 'w') as f:
    f.write(resp.text)

When I added encoding="utf-8" to the open command, it saved the file with the correct response:

with open ('NiftyList.txt', 'w', encoding="utf-8") as f:
    f.write(resp.text)

Answer 4

Even I faced the same issue with the encoding that occurs when you try to print it, read/write it or open it. As others mentioned above, adding .encode("utf-8") will help if you are trying to print it:

soup.encode("utf-8")

If you are trying to open scraped data and maybe write it into a file, then open the file with (..., encoding="utf-8"):

with open(filename_csv, 'w', newline='', encoding="utf-8") as csv_file:


Answer 5

For those still getting this error, adding encode("utf-8") to soup will also fix this.

soup = BeautifulSoup(html_doc, 'html.parser').encode("utf-8")
print(soup)
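
Since the error in the question comes from printing to a console whose default encoding is cp1252, another option on Python 3.7+ is to reconfigure stdout itself; a minimal sketch:

import sys

# ask Python to encode stdout as UTF-8 instead of the console default (e.g. cp1252)
sys.stdout.reconfigure(encoding='utf-8', errors='replace')
print(soup)  # `soup` as in the question's snippet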

Downloading a picture via urllib and python

Question: Downloading a picture via urllib and python

So I’m trying to make a Python script that downloads webcomics and puts them in a folder on my desktop. I’ve found a few similar programs on here that do something similar, but nothing quite like what I need. The one that I found most similar is right here (http://bytes.com/topic/python/answers/850927-problem-using-urllib-download-images). I tried using this code:

>>> import urllib
>>> image = urllib.URLopener()
>>> image.retrieve("http://www.gunnerkrigg.com//comics/00000001.jpg","00000001.jpg")
('00000001.jpg', <httplib.HTTPMessage instance at 0x1457a80>)

I then searched my computer for a file “00000001.jpg”, but all I found was the cached picture of it. I’m not even sure it saved the file to my computer. Once I understand how to get the file downloaded, I think I know how to handle the rest. Essentially just use a for loop and split the string at the ‘00000000’.’jpg’ and increment the ‘00000000’ up to the largest number, which I would have to somehow determine. Any recommendations on the best way to do this or how to download the file correctly?

Thanks!

EDIT 6/15/10

Here is the completed script; it saves the files to any directory you choose. For some odd reason, the files weren’t downloading and then they just did. Any suggestions on how to clean it up would be much appreciated. I’m currently working out how to find out how many comics exist on the site so I can get just the latest one, rather than having the program quit after a certain number of exceptions are raised.

import urllib
import os

comicCounter=len(os.listdir('/file'))+1  # reads the number of files in the folder to start downloading at the next comic
errorCount=0

def download_comic(url,comicName):
    """
    download a comic in the form of

    url = http://www.example.com
    comicName = '00000000.jpg'
    """
    image=urllib.URLopener()
    image.retrieve(url,comicName)  # download comicName at URL

while comicCounter <= 1000:  # not the most elegant solution
    os.chdir('/file')  # set where files download to
    try:
        if comicCounter < 10:  # needed to break into 10^n segments because comic names are a set of zeros followed by a number
            comicNumber=str('0000000'+str(comicCounter))  # string containing the eight digit comic number
            comicName=str(comicNumber+".jpg")  # string containing the file name
            url=str("http://www.gunnerkrigg.com//comics/"+comicName)  # creates the URL for the comic
            comicCounter+=1  # increments the comic counter to go to the next comic, must be before the download in case the download raises an exception
            download_comic(url,comicName)  # uses the function defined above to download the comic
            print url
        if 10 <= comicCounter < 100:
            comicNumber=str('000000'+str(comicCounter))
            comicName=str(comicNumber+".jpg")
            url=str("http://www.gunnerkrigg.com//comics/"+comicName)
            comicCounter+=1
            download_comic(url,comicName)
            print url
        if 100 <= comicCounter < 1000:
            comicNumber=str('00000'+str(comicCounter))
            comicName=str(comicNumber+".jpg")
            url=str("http://www.gunnerkrigg.com//comics/"+comicName)
            comicCounter+=1
            download_comic(url,comicName)
            print url
        else:  # quit the program if any number outside this range shows up
            quit
    except IOError:  # urllib raises an IOError for a 404 error, when the comic doesn't exist
        errorCount+=1  # add one to the error count
        if errorCount>3:  # if more than three errors occur during downloading, quit the program
            break
        else:
            print str("comic"+ ' ' + str(comicCounter) + ' ' + "does not exist")  # otherwise say that the certain comic number doesn't exist
print "all comics are up to date"  # prints if all comics are downloaded

Answer 0

Python 2

Using urllib.urlretrieve

import urllib
urllib.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")

Python 3

Using urllib.request.urlretrieve (part of Python 3’s legacy interface, works exactly the same)

import urllib.request
urllib.request.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg", "00000001.jpg")
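
urlretrieve also takes an optional reporthook callback, which can be handy for showing progress on larger files; a small Python 3 sketch:

import urllib.request

def progress(block_num, block_size, total_size):
    # called once per retrieved block; total_size is -1 if no Content-Length was sent
    print('downloaded %d of %d bytes' % (block_num * block_size, total_size))

urllib.request.urlretrieve("http://www.gunnerkrigg.com//comics/00000001.jpg",
                           "00000001.jpg", reporthook=progress)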

Answer 1

import urllib
f = open('00000001.jpg','wb')
f.write(urllib.urlopen('http://www.gunnerkrigg.com//comics/00000001.jpg').read())
f.close()

Answer 2

Just for the record, using requests library.

import requests
f = open('00000001.jpg','wb')
f.write(requests.get('http://www.gunnerkrigg.com//comics/00000001.jpg').content)
f.close()

Though it should check for requests.get() errors.
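
One way to add that check is raise_for_status(), which turns 4xx/5xx responses into exceptions; a sketch:

import requests

r = requests.get('http://www.gunnerkrigg.com//comics/00000001.jpg')
r.raise_for_status()  # raises requests.exceptions.HTTPError on 4xx/5xx
with open('00000001.jpg', 'wb') as f:
    f.write(r.content)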


Answer 3

For Python 3 you will need to import urllib.request:

import urllib.request 

urllib.request.urlretrieve(url, filename)

for more info check out the link


Answer 4

Python 3 version of @DiGMi’s answer:

from urllib import request
f = open('00000001.jpg', 'wb')
f.write(request.urlopen("http://www.gunnerkrigg.com/comics/00000001.jpg").read())
f.close()

Answer 5

I have found this answer and I edited it in a more reliable way:

def download_photo(self, img_url, filename):
    try:
        image_on_web = urllib.urlopen(img_url)
        if image_on_web.headers.maintype == 'image':
            buf = image_on_web.read()
            path = os.getcwd() + DOWNLOADED_IMAGE_PATH
            file_path = "%s%s" % (path, filename)
            downloaded_image = file(file_path, "wb")
            downloaded_image.write(buf)
            downloaded_image.close()
            image_on_web.close()
        else:
            return False    
    except:
        return False
    return True

From this you never get any other resources or exceptions while downloading.


Answer 6

If you know that the files are located in the same directory dir of the website site and have the following format: filename_01.jpg, …, filename_10.jpg then download all of them:

import requests

for x in range(1, 11):  # 1 through 10 inclusive
    str1 = 'filename_%2.2d.jpg' % (x)
    str2 = 'http://site/dir/filename_%2.2d.jpg' % (x)

    f = open(str1, 'wb')
    f.write(requests.get(str2).content)
    f.close()

Answer 7

It’s easiest to just use .read() to read the partial or entire response, then write it into a file you’ve opened in a known good location.
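
A minimal sketch of that approach, in the question's Python 2 dialect:

import urllib

data = urllib.urlopen("http://www.gunnerkrigg.com//comics/00000001.jpg").read()
with open("00000001.jpg", "wb") as f:  # a known good location; adjust the path as needed
    f.write(data)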


Answer 8

Maybe you need ‘User-Agent’:

import urllib2
opener = urllib2.build_opener()
opener.addheaders = [('User-Agent', 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/34.0.1847.137 Safari/537.36')]
response = opener.open('http://google.com')
htmlData = response.read()
f = open('file.txt','w')
f.write(htmlData )
f.close()

Answer 9

Aside from suggesting you read the docs for retrieve() carefully (http://docs.python.org/library/urllib.html#urllib.URLopener.retrieve), I would suggest actually calling read() on the content of the response, and then saving it into a file of your choosing rather than leaving it in the temporary file that retrieve creates.


Answer 10

None of the code above preserves the original image name, which is sometimes required. This will help in saving the images to your local drive while preserving the original image name:

    IMAGE = URL.rsplit('/',1)[1]
    urllib.urlretrieve(URL, IMAGE)

Try this for more details.


Answer 11

This worked for me using Python 3.

It gets a list of URLs from the csv file and starts downloading them into a folder. In case the content or image does not exist it takes that exception and continues making its magic.

import urllib.request
import csv
import os

errorCount=0

file_list = "/Users/$USER/Desktop/YOUR-FILE-TO-DOWNLOAD-IMAGES/image_{0}.jpg"

# CSV file must separate by commas
# urls.csv is set to your current working directory make sure your cd into or add the corresponding path
with open ('urls.csv') as images:
    images = csv.reader(images)
    img_count = 1
    print("Please Wait.. it will take some time")
    for image in images:
        try:
            urllib.request.urlretrieve(image[0],
            file_list.format(img_count))
            img_count += 1
        except IOError:
            errorCount+=1
            # Stop in case you reach 100 errors downloading images
            if errorCount>100:
                break
            else:
                print ("File does not exist")

print ("Done!")

Answer 12

A simpler solution may be (Python 3):

import urllib.request
import os
os.chdir("D:\\comic") #your path
i=1;
s="00000000"
while i<1000:
    try:
        urllib.request.urlretrieve("http://www.gunnerkrigg.com//comics/"+ s[:8-len(str(i))]+ str(i)+".jpg",str(i)+".jpg")
    except:
        print("not possible" + str(i))
    i+=1;

Answer 13

What about this:

import os, urllib.request, urllib.error

def from_url( url, filename = None ):
    '''Store the url content to filename'''
    if not filename:
        filename = os.path.basename( os.path.realpath(url) )

    req = urllib.request.Request( url )
    try:
        response = urllib.request.urlopen( req )
    except urllib.error.URLError as e:
        if hasattr( e, 'reason' ):
            print( 'Fail in reaching the server -> ', e.reason )
            return False
        elif hasattr( e, 'code' ):
            print( 'The server couldn\'t fulfill the request -> ', e.code )
            return False
    else:
        with open( filename, 'wb' ) as fo:
            fo.write( response.read() )
            print( 'Url saved as %s' % filename )
        return True

##

def main():
    test_url = 'http://cdn.sstatic.net/stackoverflow/img/favicon.ico'

    from_url( test_url )

if __name__ == '__main__':
    main()

回答 14

如果您需要代理支持,则可以执行以下操作:

  if needProxy == False:
    returnCode, urlReturnResponse = urllib.urlretrieve( myUrl, fullJpegPathAndName )
  else:
    proxy_support = urllib2.ProxyHandler({"https":myHttpProxyAddress})
    opener = urllib2.build_opener(proxy_support)
    urllib2.install_opener(opener)
    urlReader = urllib2.urlopen( myUrl ).read() 
    with open( fullJpegPathAndName, "wb" ) as f:  # binary mode for image data
      f.write( urlReader )

If you need proxy support you can do this:

  if needProxy == False:
    returnCode, urlReturnResponse = urllib.urlretrieve( myUrl, fullJpegPathAndName )
  else:
    proxy_support = urllib2.ProxyHandler({"https":myHttpProxyAddress})
    opener = urllib2.build_opener(proxy_support)
    urllib2.install_opener(opener)
    urlReader = urllib2.urlopen( myUrl ).read() 
    with open( fullJpegPathAndName, "wb" ) as f:  # binary mode for image data
      f.write( urlReader )
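
上面的代码基于 Python 2(urllib/urllib2)。在 Python 3 里,同样的思路大致可以这样写——仅作示意,代理地址是假设的占位值:

import urllib.request

# sketch only: route HTTPS traffic through a proxy (placeholder address)
proxy_support = urllib.request.ProxyHandler({"https": "http://my.proxy.example:8080"})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)

# download the image through the proxy; binary mode for image data
with urllib.request.urlopen("https://example.com/pic.jpg") as r, open("pic.jpg", "wb") as f:
    f.write(r.read())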

回答 15

另一种方法是通过 fastai 库。这对我来说非常好用。我在使用 urlretrieve 时遇到了 SSL: CERTIFICATE_VERIFY_FAILED 错误,所以试了这个办法。

import fastai.core  # download_url is provided by fastai.core

url = 'https://www.linkdoesntexist.com/lennon.jpg'
fastai.core.download_url(url, 'image1.jpg', show_progress=False)

Another way to do this is via the fastai library. This worked like a charm for me. I was facing a SSL: CERTIFICATE_VERIFY_FAILED Error using urlretrieve so I tried that.

import fastai.core  # download_url is provided by fastai.core

url = 'https://www.linkdoesntexist.com/lennon.jpg'
fastai.core.download_url(url, 'image1.jpg', show_progress=False)
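
如果不想引入 fastai,也可以在标准库层面绕过这个 SSL 错误。下面是一个假设性的示意写法——关闭证书校验(有安全风险,只应对可信来源使用),并非原回答的做法:

import ssl
import urllib.request

# sketch only: unverified SSL context as a workaround for
# SSL: CERTIFICATE_VERIFY_FAILED -- a security risk, trusted hosts only
ctx = ssl.create_default_context()
ctx.check_hostname = False
ctx.verify_mode = ssl.CERT_NONE

url = 'https://www.linkdoesntexist.com/lennon.jpg'  # placeholder URL from the answer
with urllib.request.urlopen(url, context=ctx) as r, open('image1.jpg', 'wb') as f:
    f.write(r.read())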

回答 16

使用 requests

import requests
import shutil,os

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
}
currentDir = os.getcwd()
path = os.path.join(currentDir,'Images')  # saving images to an Images folder
os.makedirs(path, exist_ok=True)  # create the folder if it does not exist yet

def ImageDl(url):
    attempts = 0
    while attempts < 5:#retry 5 times
        try:
            filename = url.split('/')[-1]
            r = requests.get(url,headers=headers,stream=True,timeout=5)
            if r.status_code == 200:
                with open(os.path.join(path,filename),'wb') as f:
                    r.raw.decode_content = True
                    shutil.copyfileobj(r.raw,f)
            print(filename)
            break
        except Exception as e:
            attempts+=1
            print(e)

if __name__ == '__main__':
    ImageDl(url)  # url must be set to the image address beforehand

Using requests

import requests
import shutil,os

headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36'
}
currentDir = os.getcwd()
path = os.path.join(currentDir,'Images')  # saving images to an Images folder
os.makedirs(path, exist_ok=True)  # create the folder if it does not exist yet

def ImageDl(url):
    attempts = 0
    while attempts < 5:#retry 5 times
        try:
            filename = url.split('/')[-1]
            r = requests.get(url,headers=headers,stream=True,timeout=5)
            if r.status_code == 200:
                with open(os.path.join(path,filename),'wb') as f:
                    r.raw.decode_content = True
                    shutil.copyfileobj(r.raw,f)
            print(filename)
            break
        except Exception as e:
            attempts+=1
            print(e)

if __name__ == '__main__':
    ImageDl(url)  # url must be set to the image address beforehand

回答 17

使用urllib,您可以立即完成此操作。

import urllib.request

opener=urllib.request.build_opener()
opener.addheaders=[('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
urllib.request.install_opener(opener)

urllib.request.urlretrieve(URL, "images/0.jpg")

Using urllib, you can get this done instantly.

import urllib.request

opener=urllib.request.build_opener()
opener.addheaders=[('User-Agent','Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/36.0.1941.0 Safari/537.36')]
urllib.request.install_opener(opener)

urllib.request.urlretrieve(URL, "images/0.jpg")
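
install_opener 会让这个 User-Agent 对之后所有请求全局生效;如果只想对单个请求设置请求头,也可以直接构造 Request 对象。示意写法如下,URL 为占位值:

import urllib.request

# sketch: per-request header instead of installing a global opener
req = urllib.request.Request(
    "https://example.com/0.jpg",  # placeholder image URL
    headers={"User-Agent": "Mozilla/5.0"},
)
with urllib.request.urlopen(req) as r, open("images/0.jpg", "wb") as f:
    f.write(r.read())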

如何发送POST请求?

问题:如何发送POST请求?

我在网上找到了这个脚本:

>>> import httplib, urllib
>>> params = urllib.urlencode({'number': 12524, 'type': 'issue', 'action': 'show'})
>>> headers = {"Content-type": "application/x-www-form-urlencoded",
...            "Accept": "text/plain"}
>>> conn = httplib.HTTPConnection("bugs.python.org")
>>> conn.request("POST", "", params, headers)
>>> response = conn.getresponse()
>>> print response.status, response.reason
302 Found
>>> data = response.read()
>>> data
'Redirecting to <a href="http://bugs.python.org/issue12524">http://bugs.python.org/issue12524</a>'
>>> conn.close()

但是我不明白如何将它与 PHP 配合使用,params 变量里的内容是什么,以及该如何使用它。能帮我把它调通吗?

I found this script online:

>>> import httplib, urllib
>>> params = urllib.urlencode({'number': 12524, 'type': 'issue', 'action': 'show'})
>>> headers = {"Content-type": "application/x-www-form-urlencoded",
...            "Accept": "text/plain"}
>>> conn = httplib.HTTPConnection("bugs.python.org")
>>> conn.request("POST", "", params, headers)
>>> response = conn.getresponse()
>>> print response.status, response.reason
302 Found
>>> data = response.read()
>>> data
'Redirecting to <a href="http://bugs.python.org/issue12524">http://bugs.python.org/issue12524</a>'
>>> conn.close()

But I don’t understand how to use it with PHP or what everything inside the params variable is or how to use it. Can I please have a little help with trying to get this to work?


回答 0

如果您真的想用 Python 处理 HTTP,我强烈建议您使用 Requests: HTTP for Humans。针对您的问题改写的 POST 快速入门如下:

>>> import requests
>>> r = requests.post("http://bugs.python.org", data={'number': 12524, 'type': 'issue', 'action': 'show'})
>>> print(r.status_code, r.reason)
200 OK
>>> print(r.text[:300] + '...')

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>
Issue 12524: change httplib docs POST example - Python tracker

</title>
<link rel="shortcut i...
>>> 

If you really want to handle with HTTP using Python, I highly recommend Requests: HTTP for Humans. The POST quickstart adapted to your question is:

>>> import requests
>>> r = requests.post("http://bugs.python.org", data={'number': 12524, 'type': 'issue', 'action': 'show'})
>>> print(r.status_code, r.reason)
200 OK
>>> print(r.text[:300] + '...')

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<title>
Issue 12524: change httplib docs POST example - Python tracker

</title>
<link rel="shortcut i...
>>> 

回答 1

如果您需要脚本具有可移植性,并且不想引入任何第三方依赖,那么可以像下面这样只用 Python 3 标准库发送 POST 请求。

from urllib.parse import urlencode
from urllib.request import Request, urlopen

url = 'https://httpbin.org/post' # Set destination URL here
post_fields = {'foo': 'bar'}     # Set POST fields here

request = Request(url, urlencode(post_fields).encode())
json = urlopen(request).read().decode()
print(json)

样本输出:

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "foo": "bar"
  }, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Content-Length": "7", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "Python-urllib/3.3"
  }, 
  "json": null, 
  "origin": "127.0.0.1", 
  "url": "https://httpbin.org/post"
}

If you need your script to be portable and you would rather not have any 3rd party dependencies, this is how you send POST request purely in Python 3.

from urllib.parse import urlencode
from urllib.request import Request, urlopen

url = 'https://httpbin.org/post' # Set destination URL here
post_fields = {'foo': 'bar'}     # Set POST fields here

request = Request(url, urlencode(post_fields).encode())
json = urlopen(request).read().decode()
print(json)

Sample output:

{
  "args": {}, 
  "data": "", 
  "files": {}, 
  "form": {
    "foo": "bar"
  }, 
  "headers": {
    "Accept-Encoding": "identity", 
    "Content-Length": "7", 
    "Content-Type": "application/x-www-form-urlencoded", 
    "Host": "httpbin.org", 
    "User-Agent": "Python-urllib/3.3"
  }, 
  "json": null, 
  "origin": "127.0.0.1", 
  "url": "https://httpbin.org/post"
}
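
上例的请求体是表单编码(application/x-www-form-urlencoded)。如果想同样只用标准库发送 JSON 请求体,大致可以这样写——仅作示意:

import json
from urllib.request import Request, urlopen

url = 'https://httpbin.org/post'
payload = json.dumps({'foo': 'bar'}).encode()  # JSON body must be bytes

# declare the content type so the server parses the body as JSON
request = Request(url, data=payload, headers={'Content-Type': 'application/json'})
print(urlopen(request).read().decode())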

回答 2

您无法用 urllib 实现 POST 请求(它只适合 GET),请改用 requests 模块,例如:

范例1.0:

import requests

base_url="www.server.com"
final_url="/{0}/friendly/{1}/url".format(base_url,any_value_here)

payload = {'number': 2, 'value': 1}
response = requests.post(final_url, data=payload)

print(response.text) #TEXT/HTML
print(response.status_code, response.reason) #HTTP

范例1.2:

>>> import requests

>>> payload = {'key1': 'value1', 'key2': 'value2'}

>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print(r.text)
{
  ...
  "form": {
    "key2": "value2",
    "key1": "value1"
  },
  ...
}

范例1.3:

>>> import json

>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}

>>> r = requests.post(url, data=json.dumps(payload))

You can’t achieve POST requests using urllib (only for GET), instead try using requests module, e.g.:

Example 1.0:

import requests

base_url="www.server.com"
final_url="/{0}/friendly/{1}/url".format(base_url,any_value_here)

payload = {'number': 2, 'value': 1}
response = requests.post(final_url, data=payload)

print(response.text) #TEXT/HTML
print(response.status_code, response.reason) #HTTP

Example 1.2:

>>> import requests

>>> payload = {'key1': 'value1', 'key2': 'value2'}

>>> r = requests.post("http://httpbin.org/post", data=payload)
>>> print(r.text)
{
  ...
  "form": {
    "key2": "value2",
    "key1": "value1"
  },
  ...
}

Example 1.3:

>>> import json

>>> url = 'https://api.github.com/some/endpoint'
>>> payload = {'some': 'data'}

>>> r = requests.post(url, data=json.dumps(payload))

回答 3

使用 requests 库,通过访问 REST API 端点来执行 GET、POST、PUT 或 DELETE:把 REST API 端点地址传给 url,把有效载荷(dict)传给 data,把标头/元数据传给 headers。

import requests, json

url = "bugs.python.org"

payload = {"number": 12524, 
           "type": "issue", 
           "action": "show"}

header = {"Content-type": "application/x-www-form-urlencoded",
          "Accept": "text/plain"} 

response_decoded_json = requests.post(url, data=payload, headers=header)
response_json = response_decoded_json.json()

print(response_json)

Use requests library to GET, POST, PUT or DELETE by hitting a REST API endpoint. Pass the rest api endpoint url in url, payload(dict) in data and header/metadata in headers

import requests, json

url = "bugs.python.org"

payload = {"number": 12524, 
           "type": "issue", 
           "action": "show"}

header = {"Content-type": "application/x-www-form-urlencoded",
          "Accept": "text/plain"} 

response_decoded_json = requests.post(url, data=payload, headers=header)
response_json = response_decoded_json.json()

print(response_json)
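
原回答提到 GET/POST/PUT/DELETE,但代码只演示了 POST;其余动词的调用方式类似。下面是一个示意片段,端点用 httpbin.org 作占位:

import requests

base = "https://httpbin.org"  # placeholder endpoint for demonstration

r = requests.get(base + "/get", params={"q": "issue"})  # GET with a query string
r = requests.put(base + "/put", data={"k": "v"})        # PUT with a form body
r = requests.delete(base + "/delete")                   # DELETE
print(r.status_code, r.reason)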

回答 4

您的 data 字典包含表单输入字段的名称,只需为它们填入正确的值即可得到结果。headers 则声明希望服务器返回的数据类型。使用 requests 库可以很容易地发送 POST:

import requests

url = "https://bugs.python.org"
data = {'@number': 12524, '@type': 'issue', '@action': 'show'}
headers = {"Content-type": "application/x-www-form-urlencoded", "Accept":"text/plain"}
response = requests.post(url, data=data, headers=headers)

print(response.text)

有关 Request 对象的更多信息:https://requests.readthedocs.io/en/master/api/

Your data dictionary contains the names of the form's input fields; you just need to supply the right values for them to get results. The headers declare what type of data you want the server to return. With the requests library it's easy to send a POST:

import requests

url = "https://bugs.python.org"
data = {'@number': 12524, '@type': 'issue', '@action': 'show'}
headers = {"Content-type": "application/x-www-form-urlencoded", "Accept":"text/plain"}
response = requests.post(url, data=data, headers=headers)

print(response.text)

More about Request object: https://requests.readthedocs.io/en/master/api/


回答 5

如果您不想使用需要额外安装的模块(例如 requests),并且用例非常基础,那么可以使用 urllib2:

urllib2.urlopen(url, body)

请参阅 urllib2 的文档:https://docs.python.org/2/library/urllib2.html

If you don’t want to use a module you have to install like requests, and your use case is very basic, then you can use urllib2

urllib2.urlopen(url, body)

See the documentation for urllib2 here: https://docs.python.org/2/library/urllib2.html.
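
注意 urllib2.urlopen 的第二个参数必须是已经编码好的字符串,不能直接传字典。下面是一个最小的示意写法(Python 2,沿用本问题的参数):

import urllib
import urllib2

# the body must be urlencoded before it is passed to urlopen;
# supplying a data argument makes urlopen send a POST instead of a GET
body = urllib.urlencode({'number': 12524, 'type': 'issue', 'action': 'show'})
response = urllib2.urlopen('http://bugs.python.org', body)
print response.read()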