Tag archive: http

How do you send a HEAD HTTP request in Python 2?

Question: How do you send a HEAD HTTP request in Python 2?

What I’m trying to do here is get the headers of a given URL so I can determine the MIME type. I want to be able to see if http://somedomain/foo/ will return an HTML document or a JPEG image for example. Thus, I need to figure out how to send a HEAD request so that I can read the MIME type without having to download the content. Does anyone know of an easy way of doing this?


Answer 0

Edit: this answer works, but nowadays you should just use the requests library, as mentioned by other answers below.


Use httplib.

>>> import httplib
>>> conn = httplib.HTTPConnection("www.google.com")
>>> conn.request("HEAD", "/index.html")
>>> res = conn.getresponse()
>>> print res.status, res.reason
200 OK
>>> print res.getheaders()
[('content-length', '0'), ('expires', '-1'), ('server', 'gws'), ('cache-control', 'private, max-age=0'), ('date', 'Sat, 20 Sep 2008 06:43:36 GMT'), ('content-type', 'text/html; charset=ISO-8859-1')]

There’s also a getheader(name) to get a specific header.
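
For the MIME type the question asks about, continuing the session above (the value shown is simply what this server returned):

>>> res.getheader('content-type')
'text/html; charset=ISO-8859-1'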


Answer 1

urllib2 can be used to perform a HEAD request. This is a little nicer than using httplib since urllib2 parses the URL for you instead of requiring you to split the URL into host name and path.

>>> import urllib2
>>> class HeadRequest(urllib2.Request):
...     def get_method(self):
...         return "HEAD"
... 
>>> response = urllib2.urlopen(HeadRequest("http://google.com/index.html"))

Headers are available via response.info() as before. Interestingly, you can find the URL that you were redirected to:

>>> print response.geturl()
http://www.google.com.au/index.html

Answer 2

Obligatory Requests way:

import requests

resp = requests.head("http://www.google.com")
print resp.status_code, resp.text, resp.headers

Answer 3

I believe the Requests library should be mentioned as well.
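
A minimal sketch of how that would answer the original MIME-type question (the URL is the question's placeholder):

import requests

resp = requests.head("http://somedomain/foo/")
print resp.headers.get('content-type')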


Answer 4

Just:

import urllib2
request = urllib2.Request('http://localhost:8080')
request.get_method = lambda : 'HEAD'

response = urllib2.urlopen(request)
response.info().gettype()

Edit: I've just come to realize there is httplib2 :D

import httplib2
h = httplib2.Http()
resp = h.request("http://www.google.com", 'HEAD')
assert resp[0]['status'] == '200'  # httplib2 returns the status as a string
assert resp[0]['content-type'] == 'text/html'
...


Answer 5

For completeness, here is a Python 3 answer equivalent to the accepted answer using httplib.

It is basically the same code, just that the library isn't called httplib anymore but http.client:

from http.client import HTTPConnection

conn = HTTPConnection('www.google.com')
conn.request('HEAD', '/index.html')
res = conn.getresponse()

print(res.status, res.reason)

Answer 6

import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    h.request('HEAD', parsed.path)
    response = h.getresponse()
    if response.status/100 == 3 and response.getheader('Location'):
        return response.getheader('Location')
    else:
        return url

Answer 7

As an aside, when using httplib (at least on 2.5.2), trying to read the response of a HEAD request will block (on readline) and subsequently fail. If you do not issue a read on the response, you are unable to send another request on the connection; you will need to open a new one, or accept a long delay between requests.
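
A minimal sketch of the "open a new connection per request" workaround described above; the host and path arguments are placeholders. The status line and headers are already parsed by getresponse(), so they remain readable after the close:

def head(host, path):
    conn = httplib.HTTPConnection(host)
    conn.request('HEAD', path)
    res = conn.getresponse()
    conn.close()  # discard the connection instead of reading the body
    return res.status, res.getheaders()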


Answer 8

I have found that httplib is slightly faster than urllib2. I timed two programs, one using httplib and the other using urllib2, sending HEAD requests to 10,000 URLs. The httplib one was faster by several minutes. httplib's total stats were: real 6m21.334s, user 0m2.124s, sys 0m16.372s.

And urllib2's total stats were: real 9m1.380s, user 0m16.666s, sys 0m28.565s.

Does anybody else have input on this?


Answer 9

And yet another approach (similar to Pawel's answer):

import urllib2
import types

request = urllib2.Request('http://localhost:8080')
request.get_method = types.MethodType(lambda self: 'HEAD', request, request.__class__)

Just to avoid having unbound methods at the instance level.


Answer 10

Probably easier: use urllib or urllib2.

>>> import urllib
>>> f = urllib.urlopen('http://google.com')
>>> f.info().gettype()
'text/html'

f.info() is a dictionary-like object, so you can do f.info()['content-type'], etc.

http://docs.python.org/library/urllib.html
http://docs.python.org/library/urllib2.html
http://docs.python.org/library/httplib.html

The docs note that httplib is not normally used directly.


Is there an easy way to request a URL in Python and not follow redirects?

Question: Is there an easy way to request a URL in Python and not follow redirects?

Looking at the source of urllib2 it looks like the easiest way to do it would be to subclass HTTPRedirectHandler and then use build_opener to override the default HTTPRedirectHandler, but this seems like a lot of (relatively complicated) work to do what seems like it should be pretty simple.


Answer 0

Here is the Requests way:

import requests
r = requests.get('http://github.com', allow_redirects=False)
print(r.status_code, r.headers['Location'])

Answer 1

Dive Into Python has a good chapter on handling redirects with urllib2. Another solution is httplib.

>>> import httplib
>>> conn = httplib.HTTPConnection("www.bogosoft.com")
>>> conn.request("GET", "")
>>> r1 = conn.getresponse()
>>> print r1.status, r1.reason
301 Moved Permanently
>>> print r1.getheader('Location')
http://www.bogosoft.com/new/location

Answer 2

This is a urllib2 handler that will not follow redirects:

import urllib
import urllib2

class NoRedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_302(self, req, fp, code, msg, headers):
        infourl = urllib.addinfourl(fp, headers, req.get_full_url())
        infourl.status = code
        infourl.code = code
        return infourl
    http_error_300 = http_error_302
    http_error_301 = http_error_302
    http_error_303 = http_error_302
    http_error_307 = http_error_302

opener = urllib2.build_opener(NoRedirectHandler())
urllib2.install_opener(opener)

Answer 3

The redirections keyword in the httplib2 request method is a red herring. Rather than returning the first response, it will raise a RedirectLimit exception if it receives a redirection status code. To return the initial response, you need to set follow_redirects to False on the Http object:

import httplib2
h = httplib2.Http()
h.follow_redirects = False
(response, body) = h.request("http://example.com")

Answer 4

I suppose this would help:

from httplib2 import Http

def get_html(uri, num_redirections=0):  # pass 0 to not follow redirects
    conn = Http()
    return conn.request(uri, redirections=num_redirections)

Answer 5

I second olt's pointer to Dive Into Python. Here's an implementation using urllib2 redirect handlers. More work than it should be? Maybe, shrug.

import sys
import urllib2

class RedirectHandler(urllib2.HTTPRedirectHandler):
    def http_error_301(self, req, fp, code, msg, headers):  
        result = urllib2.HTTPRedirectHandler.http_error_301( 
            self, req, fp, code, msg, headers)              
        result.status = code                                 
        raise Exception("Permanent Redirect: %s" % 301)

    def http_error_302(self, req, fp, code, msg, headers):
        result = urllib2.HTTPRedirectHandler.http_error_302(
            self, req, fp, code, msg, headers)              
        result.status = code                                
        raise Exception("Temporary Redirect: %s" % 302)

def main(script_name, url):
    opener = urllib2.build_opener(RedirectHandler)
    urllib2.install_opener(opener)
    print urllib2.urlopen(url).read()

if __name__ == "__main__":
    main(*sys.argv) 

Answer 6

The shortest way, however, is:

class NoRedirect(urllib2.HTTPRedirectHandler):
    def redirect_request(self, req, fp, code, msg, hdrs, newurl):
        pass  # returning None makes urllib2 raise an HTTPError instead of following the redirect

noredir_opener = urllib2.build_opener(NoRedirect())

Python Requests library: redirect to new URL

Question: Python Requests library: redirect to new URL

I’ve been looking through the Python Requests documentation but I cannot see any functionality for what I am trying to achieve.

In my script I am setting allow_redirects=True.

I would like to know, if the page has been redirected to something else, what the new URL is.

For example, if the start URL was: www.google.com/redirect

And the final URL is www.google.co.uk/redirected

How do I get that URL?


Answer 0

You are looking for the request history.

The response.history attribute is a list of responses that led to the final URL, which can be found in response.url.

response = requests.get(someurl)
if response.history:
    print("Request was redirected")
    for resp in response.history:
        print(resp.status_code, resp.url)
    print("Final destination:")
    print(response.status_code, response.url)
else:
    print("Request was not redirected")

Demo:

>>> import requests
>>> response = requests.get('http://httpbin.org/redirect/3')
>>> response.history
(<Response [302]>, <Response [302]>, <Response [302]>)
>>> for resp in response.history:
...     print(resp.status_code, resp.url)
... 
302 http://httpbin.org/redirect/3
302 http://httpbin.org/redirect/2
302 http://httpbin.org/redirect/1
>>> print(response.status_code, response.url)
200 http://httpbin.org/get

Answer 1

This is answering a slightly different question, but since I got stuck on this myself, I hope it might be useful for someone else.

If you want to use allow_redirects=False and get directly to the first redirect object, rather than following a chain of them, and you just want to get the redirect location directly out of the 302 response object, then r.url won’t work. Instead, it’s the “Location” header:

r = requests.get('http://github.com/', allow_redirects=False)
r.status_code  # 302
r.url  # http://github.com, not https.
r.headers['Location']  # https://github.com/ -- the redirect destination

Answer 2

The documentation has this blurb: https://requests.readthedocs.io/en/master/user/quickstart/#redirection-and-history

import requests

r = requests.get('http://www.github.com')
r.url
#returns https://www.github.com instead of the http page you asked for 

Answer 3

I think requests.head instead of requests.get will be safer to call when handling URL redirects; check the GitHub issue here:

r = requests.head(url, allow_redirects=True)
print(r.url)

Answer 4

For Python 3.5, you can use the following code:

import urllib.request
res = urllib.request.urlopen(starturl)
finalurl = res.geturl()
print(finalurl)

How do I send a POST request as JSON?

Question: How do I send a POST request as JSON?

data = {
    'ids': [12, 3, 4, 5, 6, ...]
}
urllib2.urlopen("http://abc.com/api/posts/create", urllib.urlencode(data))

I want to send a POST request, but one of the fields should be a list of numbers. How can I do that ? (JSON?)


Answer 0

If your server is expecting the POST request to be JSON, then you would need to add a header, and also serialize the data for your request…

Python 2.x

import json
import urllib2

data = {
        'ids': [12, 3, 4, 5, 6]
}

req = urllib2.Request('http://example.com/api/posts/create')
req.add_header('Content-Type', 'application/json')

response = urllib2.urlopen(req, json.dumps(data))

Python 3.x

https://stackoverflow.com/a/26876308/496445


If you don’t specify the header, it will be the default application/x-www-form-urlencoded type.
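
Since the Python 3.x section above is only a link, here is a sketch of the equivalent with urllib.request; this mirrors the Python 2 version above and is an assumption, not quoted from the linked answer:

import json
import urllib.request

data = {'ids': [12, 3, 4, 5, 6]}

req = urllib.request.Request('http://example.com/api/posts/create')
req.add_header('Content-Type', 'application/json')

# urlopen needs the body as bytes in Python 3
response = urllib.request.urlopen(req, json.dumps(data).encode('utf-8'))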


Answer 1

I recommend using the incredible requests module.

http://docs.python-requests.org/en/v0.10.7/user/quickstart/#custom-headers

import json
import requests

url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}
headers = {'content-type': 'application/json'}

response = requests.post(url, data=json.dumps(payload), headers=headers)

Answer 2

For Python 3.4.2, I found the following will work:

import urllib.request
import json      

body = {'ids': [12, 14, 50]}  
myurl = "http://www.testmycode.com"

req = urllib.request.Request(myurl)
req.add_header('Content-Type', 'application/json; charset=utf-8')
jsondata = json.dumps(body)
jsondataasbytes = jsondata.encode('utf-8')   # needs to be bytes
req.add_header('Content-Length', len(jsondataasbytes))
response = urllib.request.urlopen(req, jsondataasbytes)

Answer 3

This works perfectly for Python 3.5 if the URL contains a query string / parameter value:

Request URL = https://bah2.com/ws/rest/v1/concept/
Parameter value = 21f6bb43-98a1-419d-8f0c-8133669e40ca

import requests

url = 'https://bahbah2.com/ws/rest/v1/concept/21f6bb43-98a1-419d-8f0c-8133669e40ca'
data = {"name": "Value"}
r = requests.post(url, auth=('username', 'password'), verify=False, json=data)
print(r.status_code)

Answer 4

You have to add a header, or you will get an HTTP 400 error. The code works well on Python 2.6, CentOS 5.4.

code:

import urllib2, json

url = 'http://www.google.com/someservice'
postdata = {'key': 'value'}

req = urllib2.Request(url)
req.add_header('Content-Type', 'application/json')
data = json.dumps(postdata)

response = urllib2.urlopen(req, data)

Answer 5

Here is an example of how to use urllib.request from the Python standard library.

import urllib.request
import json
from pprint import pprint

url = "https://app.close.com/hackwithus/3d63efa04a08a9e0/"

values = {
    "first_name": "Vlad",
    "last_name": "Bezden",
    "urls": [
        "https://twitter.com/VladBezden",
        "https://github.com/vlad-bezden",
    ],
}


headers = {
    "Content-Type": "application/json",
    "Accept": "application/json",
}

data = json.dumps(values).encode("utf-8")
pprint(data)

try:
    req = urllib.request.Request(url, data, headers)
    with urllib.request.urlopen(req) as f:
        res = f.read()
    pprint(res.decode())
except Exception as e:
    pprint(e)

Answer 6

In the latest requests package, you can use the json parameter in the requests.post() method to send a JSON dict, and the Content-Type header will be set to application/json. There is no need to specify the header explicitly.

import requests

payload = {'key': 'value'}

requests.post(url, json=payload)

Answer 7

This one works fine for me with APIs (note that passing a dict via data= sends it form-encoded, not as JSON):

import requests

data = {'Id': id, 'name': name}
r = requests.post(url='https://apiurllink', data=data)

urllib2.HTTPError: HTTP Error 403: Forbidden

Question: urllib2.HTTPError: HTTP Error 403: Forbidden

I am trying to automate download of historic stock data using python. The URL I am trying to open responds with a CSV file, but I am unable to open using urllib2. I have tried changing user agent as specified in few questions earlier, I even tried to accept response cookies, with no luck. Can you please help.

Note: The same method works for Yahoo Finance.

Code:

import urllib2,cookielib

site= "http://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/getHistoricalData.jsp?symbol=JPASSOCIAT&fromDate=1-JAN-2012&toDate=1-AUG-2012&datePeriod=unselected&hiddDwnld=true"

hdr = {'User-Agent':'Mozilla/5.0'}

req = urllib2.Request(site,headers=hdr)

page = urllib2.urlopen(req)

Error

File "C:\Python27\lib\urllib2.py", line 527, in http_error_default
    raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 403: Forbidden

Thanks for your assistance


Answer 0

By adding a few more headers I was able to get the data:

import urllib2,cookielib

site= "http://www.nseindia.com/live_market/dynaContent/live_watch/get_quote/getHistoricalData.jsp?symbol=JPASSOCIAT&fromDate=1-JAN-2012&toDate=1-AUG-2012&datePeriod=unselected&hiddDwnld=true"
hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
       'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
       'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
       'Accept-Encoding': 'none',
       'Accept-Language': 'en-US,en;q=0.8',
       'Connection': 'keep-alive'}

req = urllib2.Request(site, headers=hdr)

try:
    page = urllib2.urlopen(req)
except urllib2.HTTPError, e:
    print e.fp.read()

content = page.read()
print content

Actually, it works with just this one additional header:

'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',

Answer 1

This will work in Python 3:

import urllib.request

user_agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7'

url = "http://en.wikipedia.org/wiki/List_of_TCP_and_UDP_port_numbers"
headers={'User-Agent':user_agent,} 

request=urllib.request.Request(url,None,headers) #The assembled request
response = urllib.request.urlopen(request)
data = response.read() # The data you need

Answer 2

The NSE website has changed, and the older scripts are only semi-optimal for the current website. This snippet can gather daily details of a security. Details include symbol, security type, previous close, open price, high price, low price, average price, traded quantity, turnover, number of trades, deliverable quantities, and the ratio of delivered vs traded in percentage. These are conveniently presented as a list of dictionaries.

Python 3.X version with requests and BeautifulSoup

from requests import get
from csv import DictReader
from bs4 import BeautifulSoup as Soup
from datetime import date
from io import StringIO 

SECURITY_NAME="3MINDIA" # Change this to get quote for another stock
START_DATE= date(2017, 1, 1) # Start date of stock quote data DD-MM-YYYY
END_DATE= date(2017, 9, 14)  # End date of stock quote data DD-MM-YYYY


BASE_URL = "https://www.nseindia.com/products/dynaContent/common/productsSymbolMapping.jsp?symbol={security}&segmentLink=3&symbolCount=1&series=ALL&dateRange=+&fromDate={start_date}&toDate={end_date}&dataType=PRICEVOLUMEDELIVERABLE"




def getquote(symbol, start, end):
    start = start.strftime("%-d-%-m-%Y")
    end = end.strftime("%-d-%-m-%Y")

    hdr = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11',
         'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
         'Referer': 'https://cssspritegenerator.com',
         'Accept-Charset': 'ISO-8859-1,utf-8;q=0.7,*;q=0.3',
         'Accept-Encoding': 'none',
         'Accept-Language': 'en-US,en;q=0.8',
         'Connection': 'keep-alive'}

    url = BASE_URL.format(security=symbol, start_date=start, end_date=end)
    d = get(url, headers=hdr)
    soup = Soup(d.content, 'html.parser')
    payload = soup.find('div', {'id': 'csvContentDiv'}).text.replace(':', '\n')
    csv = DictReader(StringIO(payload))
    for row in csv:
        print({k:v.strip() for k, v in row.items()})


if __name__ == '__main__':
    getquote(SECURITY_NAME, START_DATE, END_DATE)

Besides, this is a relatively modular and ready-to-use snippet.


HTTP Error 403 in Python 3 web scraping

Question: HTTP Error 403 in Python 3 web scraping

I was trying to scrape a website for practice, but I kept on getting HTTP Error 403 (does it think I'm a bot)?

Here is my code:

#import requests
import urllib.request
from bs4 import BeautifulSoup
#from urllib import urlopen
import re

webpage = urllib.request.urlopen('http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1').read
findrows = re.compile('<tr class="- banding(?:On|Off)>(.*?)</tr>')
findlink = re.compile('<a href =">(.*)</a>')

row_array = re.findall(findrows, webpage)
links = re.finall(findlink, webpate)

print(len(row_array))

iterator = []

The error I get is:

 File "C:\Python33\lib\urllib\request.py", line 160, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python33\lib\urllib\request.py", line 479, in open
    response = meth(req, response)
  File "C:\Python33\lib\urllib\request.py", line 591, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python33\lib\urllib\request.py", line 517, in error
    return self._call_chain(*args)
  File "C:\Python33\lib\urllib\request.py", line 451, in _call_chain
    result = func(*args)
  File "C:\Python33\lib\urllib\request.py", line 599, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden

Answer 0

This is probably because of mod_security or some similar server security feature which blocks known spider/bot user agents (urllib uses something like python urllib/3.3.0, it’s easily detected). Try setting a known browser user agent with:

from urllib.request import Request, urlopen

req = Request('http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1', headers={'User-Agent': 'Mozilla/5.0'})
webpage = urlopen(req).read()

This works for me.

By the way, in your code you are missing the () after .read in the urlopen line, but I think that it’s a typo.

TIP: since this is an exercise, choose a different, non-restrictive site. Maybe they are blocking urllib for some reason…


Answer 1

It's definitely blocking because of your use of urllib, based on the user agent. The same thing is happening to me with OfferUp. You can create a new class called AppURLopener which overrides the user agent with Mozilla.

import urllib.request

class AppURLopener(urllib.request.FancyURLopener):
    version = "Mozilla/5.0"

opener = AppURLopener()
response = opener.open('http://httpbin.org/user-agent')

Source


Answer 2

"This is probably because of mod_security or some similar server security feature which blocks known spider/bot user agents (urllib uses something like python urllib/3.3.0, it's easily detected)", as already mentioned by Stefano Sanfilippo.

from urllib.request import Request, urlopen
url="https://stackoverflow.com/search?q=html+error+403"
req = Request(url, headers={'User-Agent': 'Mozilla/5.0'})

web_byte = urlopen(req).read()

webpage = web_byte.decode('utf-8')

web_byte is a bytes object returned by the server, and the content type of the web page is mostly UTF-8. Therefore you need to decode web_byte using the decode method.

This solved the complete problem I was having while trying to scrape a website using PyCharm.

P.S. -> I use Python 3.4


Answer 3

Since the page works in a browser but not when called from within a Python program, it seems that the web app serving that URL recognizes that you are not requesting the content with a browser.

Demonstration:

curl --dump-header r.txt "http://www.cmegroup.com/trading/products/#sortField=oi&sortAsc=false&venues=3&page=1&cleared=1&group=1"

...
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>
You don't have permission to access ...
</HTML>

and the content in r.txt has status line:

HTTP/1.1 403 Forbidden

Try sending a 'User-Agent' header that fakes a web client.

NOTE: the page contains an Ajax call that creates the table you probably want to parse. You'll need to check the JavaScript logic of the page, or simply use a browser debugger (like the Firebug / Net tab) to see which URL you need to call to get the table's contents.


Answer 4

You can try two ways. The details are in this link.

1) Via pip:

pip install --upgrade certifi

2) If that doesn't work, try to run the Certificates.command that comes bundled with Python 3.* for Mac (go to your Python installation location and double-click the file):

open /Applications/Python\ 3.*/Install\ Certificates.command


Answer 5

Based on the previous answer,

from urllib.request import Request, urlopen       
#specify url
url = 'https://xyz/xyz'
req = Request(url, headers={'User-Agent': 'XYZ/3.0'})
response = urlopen(req, timeout=20).read()

This worked for me by extending the timeout.


Answer 6

If you feel guilty about faking the user agent as Mozilla (see the comment in the top answer from Stefano), it could work with a non-urllib User-Agent as well. This worked for the sites I reference:

import urllib.request as urlrequest  # the urlrequest alias assumed by this answer

req = urlrequest.Request(link, headers={'User-Agent': 'XYZ/3.0'})
urlrequest.urlopen(req, timeout=10).read()

My application is to test validity by scraping the specific links that I refer to in my articles. Not a generic scraper.


Answer 7

Based on previous answers, this has worked for me with Python 3.7:

from urllib.request import Request, urlopen

req = Request('Url_Link', headers={'User-Agent': 'XYZ/3.0'})
webpage = urlopen(req, timeout=10).read()

print(webpage)

Httpbin: an HTTP request and response service, written in Python + Flask

How to log in to a webpage and retrieve cookies for later use with Python?

Question: How to log in to a webpage and retrieve cookies for later use with Python?

I want to download and parse webpage using python, but to access it I need a couple of cookies set. Therefore I need to login over https to the webpage first. The login moment involves sending two POST params (username, password) to /login.php. During the login request I want to retrieve the cookies from the response header and store them so I can use them in the request to download the webpage /data.php.

How would I do this in python (preferably 2.6)? If possible I only want to use builtin modules.


Answer 0

import urllib, urllib2, cookielib

username = 'myuser'
password = 'mypassword'

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
login_data = urllib.urlencode({'username' : username, 'j_password' : password})
opener.open('http://www.example.com/login.php', login_data)
resp = opener.open('http://www.example.com/hiddenpage.php')
print resp.read()

resp.read() is the straight HTML of the page you want to open, and you can use opener to view any page using your session cookie.


Answer 1

Here’s a version using the excellent requests library:

from requests import session

payload = {
    'action': 'login',
    'username': USERNAME,
    'password': PASSWORD
}

with session() as c:
    c.post('http://example.com/login.php', data=payload)
    response = c.get('http://example.com/protected_page.php')
    print(response.headers)
    print(response.text)

Is there any way to do an HTTP PUT in Python?

Question: Is there any way to do an HTTP PUT in Python?

I need to upload some data to a server using HTTP PUT in python. From my brief reading of the urllib2 docs, it only does HTTP POST. Is there any way to do an HTTP PUT in python?


Answer 0

I've used a variety of Python HTTP libs in the past, and I've settled on Requests as my favourite. Existing libs had pretty usable interfaces, but code can end up being a few lines too long for simple operations. A basic PUT in Requests looks like:

>>> import requests
>>> payload = {'username': 'bob', 'email': 'bob@bob.com'}
>>> r = requests.put("http://somedomain.org/endpoint", data=payload)

You can then check the response status code with:

r.status_code

or the response with:

r.content

Requests has a lot of syntactic sugar and shortcuts that'll make your life easier.


Answer 1

import urllib2
opener = urllib2.build_opener(urllib2.HTTPHandler)
request = urllib2.Request('http://example.org', data='your_put_data')
request.add_header('Content-Type', 'your/contenttype')
request.get_method = lambda: 'PUT'
url = opener.open(request)

Answer 2

Httplib seems like a cleaner choice.

import httplib
connection =  httplib.HTTPConnection('1.2.3.4:1234')
body_content = 'BODY CONTENT GOES HERE'
connection.request('PUT', '/url/path/to/put/to', body_content)
result = connection.getresponse()
# Now result.status and result.reason contains interesting stuff

Answer 3

You should have a look at the httplib module. It should let you make whatever sort of HTTP request you want.


Answer 4

I needed to solve this problem too a while back so that I could act as a client for a RESTful API. I settled on httplib2 because it allowed me to send PUT and DELETE in addition to GET and POST. Httplib2 is not part of the standard library but you can easily get it from the cheese shop.
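
A minimal sketch of a PUT with httplib2; the URL, body, and headers are placeholders, not from the original answer:

import httplib2

h = httplib2.Http()
resp, content = h.request('http://example.org/resource', 'PUT',
                          body='BODY CONTENT GOES HERE',
                          headers={'content-type': 'text/plain'})
# httplib2 exposes the status as a string, e.g. '200'
print resp['status']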


Answer 5

You can use the requests library; it simplifies things a lot compared to the urllib2 approach. First install it from pip:

pip install requests

More on installing requests.

Then set up the PUT request:

import requests
import json
url = 'https://api.github.com/some/endpoint'
payload = {'some': 'data'}

# Create your header as required
headers = {"content-type": "application/json", "Authorization": "<auth-key>" }

r = requests.put(url, data=json.dumps(payload), headers=headers)

See the quickstart for the requests library. I think this is a lot simpler than urllib2, but it does require this additional package to be installed and imported.


Answer 6

This was made better in Python 3 and is documented in the stdlib documentation.

The urllib.request.Request class gained a method=... parameter in Python 3.

Some sample usage:

import urllib.request

req = urllib.request.Request('https://example.com/', data=b'DATA!', method='PUT')
urllib.request.urlopen(req)

Answer 7

I also recommend httplib2 by Joe Gregorio. I use this regularly instead of httplib in the standard lib.


Answer 8

Have you taken a look at put.py? I’ve used it in the past. You can also just hack up your own request with urllib.


Answer 9

You can of course roll your own with the existing standard libraries at any level from sockets up to tweaking urllib.

http://pycurl.sourceforge.net/

“PyCurl is a Python interface to libcurl.”

“libcurl is a free and easy-to-use client-side URL transfer library, … supports … HTTP PUT”

“The main drawback with PycURL is that it is a relative thin layer over libcurl without any of those nice Pythonic class hierarchies. This means it has a somewhat steep learning curve unless you are already familiar with libcurl’s C API. “
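
A minimal PycURL PUT sketch under the same caveat; the URL and payload are placeholders, not from the original answer:

import io
import pycurl

data = b'BODY CONTENT GOES HERE'
buf = io.BytesIO(data)

c = pycurl.Curl()
c.setopt(pycurl.URL, 'http://example.org/resource')
c.setopt(pycurl.UPLOAD, 1)                # UPLOAD turns the HTTP request into a PUT
c.setopt(pycurl.READFUNCTION, buf.read)   # libcurl pulls the body from this callback
c.setopt(pycurl.INFILESIZE, len(data))
c.perform()
print(c.getinfo(pycurl.RESPONSE_CODE))
c.close()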


Answer 10

If you want to stay within the standard library, you can subclass urllib2.Request:

import urllib2

class RequestWithMethod(urllib2.Request):
    def __init__(self, *args, **kwargs):
        self._method = kwargs.pop('method', None)
        urllib2.Request.__init__(self, *args, **kwargs)

    def get_method(self):
        return self._method if self._method else super(RequestWithMethod, self).get_method()


def put_request(url, data):
    opener = urllib2.build_opener(urllib2.HTTPHandler)
    request = RequestWithMethod(url, method='PUT', data=data)
    return opener.open(request)

Answer 11

A more proper way of doing this with requests would be:

import requests

payload = {'username': 'bob', 'email': 'bob@bob.com'}

try:
    response = requests.put(url="http://somedomain.org/endpoint", data=payload)
    response.raise_for_status()
except requests.exceptions.RequestException as e:
    print(e)
    raise

This raises an exception if there is an error in the HTTP PUT request.


How do I get HTTP headers in Flask?

Question: How do I get HTTP headers in Flask?

I am a newbie to Python, using Python Flask and generating a REST API service.

I want to check the authorization header which is sent by the client.

But I can't find a way to get the HTTP headers in Flask.

Any help with getting the HTTP Authorization header is appreciated.


Answer 0

from flask import request
request.headers.get('your-header-name')

request.headers behaves like a dictionary, so you can also get your header like you would with any dictionary:

request.headers['your-header-name']

Answer 1

Just note: the difference between the methods is that, if the header does not exist,

request.headers.get('your-header-name')

will return None and raise no exception, so you can use it like:

if request.headers.get('your-header-name'):
    ....

but the following will throw an error

if request.headers['your-header-name']:  # KeyError: 'your-header-name'
    ....

You can handle it by

if 'your-header-name' in request.headers:
   customHeader = request.headers['your-header-name']
   ....

Answer 2

If anyone's trying to fetch all the headers that were passed, then simply use:

dict(request.headers)

It gives you all the headers in a dict, from which you can do whatever ops you want. In my use case I had to forward all the headers to another API, since the Python API was a proxy. A sketch of that follows.
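
A minimal sketch of that proxy use case, assuming the Requests library and a hypothetical upstream URL (neither is part of the original answer):

from flask import Flask, request
import requests

app = Flask(__name__)

@app.route('/proxy')
def proxy():
    # Forward every incoming header; in practice you may want to drop 'Host'.
    headers = dict(request.headers)
    upstream = requests.get('http://upstream.example.com/endpoint', headers=headers)
    return upstream.text, upstream.status_code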


Answer 3

Let's see how we get the params, headers and body in Flask. I'm gonna explain with the help of Postman.

The param keys and values are reflected in the API endpoint. For example, key1 and key2 in the endpoint: https://127.0.0.1/upload?key1=value1&key2=value2

from flask import Flask, request
app = Flask(__name__)

@app.route('/upload')
def upload():

  key_1 = request.args.get('key1')
  key_2 = request.args.get('key2')
  print(key_1)
  #--> value1
  print(key_2)
  #--> value2

After params, let’s now see how to get the headers:

  header_1 = request.headers.get('header1')
  header_2 = request.headers.get('header2')
  print(header_1)
  #--> header_value1
  print(header_2)
  #--> header_value2

Now let's see how to get the body:

  file_name = request.files['file'].filename
  ref_id = request.form['referenceId']
  print(ref_id)
  #--> WWB9838yb3r47484

So we fetch the uploaded files with request.files and the text fields with request.form.