Python 实用宝典

Question 1

I have a cURL call that I use in PHP:

curl -i -H ‘Accept: application/xml’ -u login:key “https://app.streamsend.com/emails“

I need a way to do the same thing in Python. Is there an alternative to cURL in Python. I know of urllib but I’m a Python noob and have no idea how to use it.

Question 2

import urllib2

manager = urllib2.HTTPPasswordMgrWithDefaultRealm()
manager.add_password(None, 'https://app.streamsend.com/emails', 'login', 'key')
handler = urllib2.HTTPBasicAuthHandler(manager)

director = urllib2.OpenerDirector()
director.add_handler(handler)

req = urllib2.Request('https://app.streamsend.com/emails', headers = {'Accept' : 'application/xml'})

result = director.open(req)
# result.read() will contain the data
# result.info() will contain the HTTP headers

# To get say the content-length header
length = result.info()['Content-Length']

Your cURL call using urllib2 instead. Completely untested.

Question 3

You can use HTTP Requests that are described in the Requests: HTTP for Humans user guide.

Question 4

Here’s a simple example using urllib2 that does a basic authentication against GitHub’s API.

import urllib2

u='username'
p='userpass'
url='https://api.github.com/users/username'

# simple wrapper function to encode the username & pass
def encodeUserData(user, password):
    return "Basic " + (user + ":" + password).encode("base64").rstrip()

# create the request object and set some headers
req = urllib2.Request(url)
req.add_header('Accept', 'application/json')
req.add_header("Content-type", "application/x-www-form-urlencoded")
req.add_header('Authorization', encodeUserData(u, p))
# make the request and print the results
res = urllib2.urlopen(req)
print res.read()

Furthermore if you wrap this in a script and run it from a terminal you can pipe the response string to ‘mjson.tool’ to enable pretty printing.

>> basicAuth.py | python -mjson.tool

One last thing to note, urllib2 only supports GET & POST requests.
If you need to use other HTTP verbs like DELETE, PUT, etc you’ll probably want to take a look at PYCURL

Question 5

If you are using a command to just call curl like that, you can do the same thing in Python with subprocess. Example:

subprocess.call(['curl', '-i', '-H', '"Accept: application/xml"', '-u', 'login:key', '"https://app.streamsend.com/emails"'])

Or you could try PycURL if you want to have it as a more structured api like what PHP has.

Question 6

import requests

url = 'https://example.tld/'
auth = ('username', 'password')

r = requests.get(url, auth=auth)
print r.content

This is the simplest I’ve been able to get it.

Question 7

Some example, how to use urllib for that things, with some sugar syntax. I know about requests and other libraries, but urllib is standard lib for python and doesn’t require anything to be installed separately.

Python 2/3 compatible.

import sys
if sys.version_info.major == 3:
  from urllib.request import HTTPPasswordMgrWithDefaultRealm, HTTPBasicAuthHandler, Request, build_opener
  from urllib.parse import urlencode
else:
  from urllib2 import HTTPPasswordMgrWithDefaultRealm, HTTPBasicAuthHandler, Request, build_opener
  from urllib import urlencode


def curl(url, params=None, auth=None, req_type="GET", data=None, headers=None):
  post_req = ["POST", "PUT"]
  get_req = ["GET", "DELETE"]

  if params is not None:
    url += "?" + urlencode(params)

  if req_type not in post_req + get_req:
    raise IOError("Wrong request type \"%s\" passed" % req_type)

  _headers = {}
  handler_chain = []

  if auth is not None:
    manager = HTTPPasswordMgrWithDefaultRealm()
    manager.add_password(None, url, auth["user"], auth["pass"])
    handler_chain.append(HTTPBasicAuthHandler(manager))

  if req_type in post_req and data is not None:
    _headers["Content-Length"] = len(data)

  if headers is not None:
    _headers.update(headers)

  director = build_opener(*handler_chain)

  if req_type in post_req:
    if sys.version_info.major == 3:
      _data = bytes(data, encoding='utf8')
    else:
      _data = bytes(data)

    req = Request(url, headers=_headers, data=_data)
  else:
    req = Request(url, headers=_headers)

  req.get_method = lambda: req_type
  result = director.open(req)

  return {
    "httpcode": result.code,
    "headers": result.info(),
    "content": result.read()
  }


"""
Usage example:
"""

Post data:
  curl("http://127.0.0.1/", req_type="POST", data='cascac')

Pass arguments (http://127.0.0.1/?q=show):
  curl("http://127.0.0.1/", params={'q': 'show'}, req_type="POST", data='cascac')

HTTP Authorization:
  curl("http://127.0.0.1/secure_data.txt", auth={"user": "username", "pass": "password"})

Function is not complete and possibly is not ideal, but shows a basic representation and concept to use. Additional things could be added or changed by taste.

12/08 update

Here is a GitHub link to live updated source. Currently supporting:

authorization
CRUD compatible
automatic charset detection
automatic encoding(compression) detection

Question 8

If it’s running all of the above from the command line that you’re looking for, then I’d recommend HTTPie. It is a fantastic cURL alternative and is super easy and convenient to use (and customize).

Here’s is its (succinct and precise) description from GitHub;

HTTPie (pronounced aych-tee-tee-pie) is a command line HTTP client. Its goal is to make CLI interaction with web services as human-friendly as possible.

It provides a simple http command that allows for sending arbitrary HTTP requests using a simple and natural syntax, and displays colorized output. HTTPie can be used for testing, debugging, and generally interacting with HTTP servers.

The documentation around authentication should give you enough pointers to solve your problem(s). Of course, all of the answers above are accurate as well, and provide different ways of accomplishing the same task.

Just so you do NOT have to move away from Stack Overflow, here’s what it offers in a nutshell.

Basic auth:

$ http -a username:password example.org
Digest auth:

$ http --auth-type=digest -a username:password example.org
With password prompt:

$ http -a username example.org

Question 9

I’m trying to compile a python extension with cython in win 7 64-bit using mingw (64-bit).
I’m working with Python 2.6 (Active Python 2.6.6) and with the adequate distutils.cfg file (setting mingw as the compiler)

When executing

> C:\Python26\programas\Cython>python setup.py build_ext --inplace

I get an error saying that gcc has not an -mno-cygwin option:

> C:\Python26\programas\Cython>python setup.py build_ext --inplace
running build_ext
skipping 'hello2.c' Cython extension (up-to-date)
building 'hello2' extension
C:\mingw\bin\gcc.exe -mno-cygwin -mdll -O -Wall -IC:\Python26\include -IC:\Python26\PC -c hello2.c -o build\temp.win-amd64-2.6\Release\hello2.o
gcc: error: unrecognized command line option '-mno-cygwin'
error: command 'gcc' failed with exit status 1

gcc is:

C:\>gcc --version
gcc (GCC) 4.7.0 20110430 (experimental)
Copyright (C) 2011 Free Software Foundation, Inc.

How could I fix it?

Question 10

It sounds like GCC 4.7.0 has finally removed the deprecated -mno-cygwin option, but distutils has not yet caught up with it. Either install a slightly older version of MinGW, or edit distutils\cygwinccompiler.py in your Python directory to remove all instances of -mno-cygwin.

Question 11

During the process of solving these and the following problems I found, I wrote a recipe in this thread. I reproduce it here in case it could be of utility for others:

Step by step recipe to compile 64-bit cython extensions with python 2.6.6 with mingw compiler in win 7 64-bit

Install mingw compiler
1) Install tdm64-gcc-4.5.2.exe for 64-bit compilation

Apply patch to python.h
2) Modify python.h in C:\python26\include as indicated in http://bugs.python.org/file12411/mingw-w64.patch

Modify distutils
Edit 2013: Note than in python 2.7.6 and 3.3.3 -mno-cygwin has been finally removed so step 3 can be skipped.

3) Eliminate all the parameters -mno-cygwin fom the call to gcc in the Mingw32CCompiler class in Python26\Lib\distutils\cygwinccompiler.py
4) In the same module, modify get_msvcr() to return an empty list instead of [‘msvcr90’] when msc_ver == ‘1500’ .

Produce the libpython26.a file (not included in 64 bit python)
Edit 2013: the following steps 5-10 can be skipped by downloading and installing libpython26.a from gohlke.

5) Obtain gendef.exe from mingw-w64-bin_x86_64- mingw_20101003_sezero.zip (gendef.exe is not available in the tmd64 distribution. Another solution is to compile gendef from source…)
6) Copy python26.dll (located at C\windows\system32) to the user directory (C:\Users\myname)
7) Produce the python26.def file with:

gendef.exe C:\Users\myname\python26.dll

8) Move the python.def file produced (located in the folder from where gendef was executed) to the user directory
9) Produce the libpython.a with:

dlltool -v –dllname python26.dll –def C:\Users\myname \python26.def –output-lib C:\Users\myname\libpython26.a

10) Move the created libpython26.a to C:\Python26\libs

Produce your .pyd extension
11) Create a test hello.pyx file and a setup.py file as indicated in cython tutorial (http://docs.cython.org/src/quickstart/build.html)
12) Compile with

python setup.py build_ext –inplace

Done!

Question 12

This bug has now been fixed in Python 2.7.6 release candidate 1.

The patching commit is here.

The resolved issue tracker thread is here.

Question 13

Try this . It really works for the error
https://github.com/develersrl/gccwinbinaries

Question 14

I know how to set it in my /etc/profile and in my environment variables.

But what if I want to set it during a script? Is it import os, sys? How do I do it?

Question 15

You don’t set PYTHONPATH, you add entries to sys.path. It’s a list of directories that should be searched for Python packages, so you can just append your directories to that list.

sys.path.append('/path/to/whatever')

In fact, sys.path is initialized by splitting the value of PYTHONPATH on the path separator character (: on Linux-like systems, ; on Windows).

You can also add directories using site.addsitedir, and that method will also take into account .pth files existing within the directories you pass. (That would not be the case with directories you specify in PYTHONPATH.)

Question 16

You can get and set environment variables via os.environ:

import os
user_home = os.environ["HOME"]

os.environ["PYTHONPATH"] = "..."

But since your interpreter is already running, this will have no effect. You’re better off using

import sys
sys.path.append("...")

which is the array that your PYTHONPATH will be transformed into on interpreter startup.

Question 17

If you put sys.path.append('dir/to/path') without check it is already added, you could generate a long list in sys.path. For that, I recommend this:

import sys
import os # if you want this directory

try:
    sys.path.index('/dir/path') # Or os.getcwd() for this directory
except ValueError:
    sys.path.append('/dir/path') # Or os.getcwd() for this directory

Question 18

PYTHONPATH ends up in sys.path, which you can modify at runtime.

import sys
sys.path += ["whatever"]

Question 19

you can set PYTHONPATH, by os.environ['PATHPYTHON']=/some/path, then you need to call os.system('python') to restart the python shell to make the newly added path effective.

Question 20

I linux this works too:

import sys
sys.path.extend(["/path/to/dotpy/file/"])

Question 21

I’m writing tests for a function like next one:

def foo():
    print 'hello world!'

So when I want to test this function the code will be like this:

import sys
from foomodule import foo
def test_foo():
    foo()
    output = sys.stdout.getline().strip() # because stdout is an StringIO instance
    assert output == 'hello world!'

But if I run nosetests with -s parameter the test crashes. How can I catch the output with unittest or nose module?

Question 22

I use this context manager to capture output. It ultimately uses the same technique as some of the other answers by temporarily replacing sys.stdout. I prefer the context manager because it wraps all the bookkeeping into a single function, so I don’t have to re-write any try-finally code, and I don’t have to write setup and teardown functions just for this.

import sys
from contextlib import contextmanager
from StringIO import StringIO

@contextmanager
def captured_output():
    new_out, new_err = StringIO(), StringIO()
    old_out, old_err = sys.stdout, sys.stderr
    try:
        sys.stdout, sys.stderr = new_out, new_err
        yield sys.stdout, sys.stderr
    finally:
        sys.stdout, sys.stderr = old_out, old_err

Use it like this:

with captured_output() as (out, err):
    foo()
# This can go inside or outside the `with` block
output = out.getvalue().strip()
self.assertEqual(output, 'hello world!')

Furthermore, since the original output state is restored upon exiting the with block, we can set up a second capture block in the same function as the first one, which isn’t possible using setup and teardown functions, and gets wordy when writing try-finally blocks manually. That ability came in handy when the goal of a test was to compare the results of two functions relative to each other rather than to some precomputed value.

Question 23

If you really want to do this, you can reassign sys.stdout for the duration of the test.

def test_foo():
    import sys
    from foomodule import foo
    from StringIO import StringIO

    saved_stdout = sys.stdout
    try:
        out = StringIO()
        sys.stdout = out
        foo()
        output = out.getvalue().strip()
        assert output == 'hello world!'
    finally:
        sys.stdout = saved_stdout

If I were writing this code, however, I would prefer to pass an optional out parameter to the foo function.

def foo(out=sys.stdout):
    out.write("hello, world!")

Then the test is much simpler:

def test_foo():
    from foomodule import foo
    from StringIO import StringIO

    out = StringIO()
    foo(out=out)
    output = out.getvalue().strip()
    assert output == 'hello world!'

Question 24

Since version 2.7, you do not need anymore to reassign sys.stdout, this is provided through buffer flag. Moreover, it is the default behavior of nosetest.

Here is a sample failing in non buffered context:

import sys
import unittest

def foo():
    print 'hello world!'

class Case(unittest.TestCase):
    def test_foo(self):
        foo()
        if not hasattr(sys.stdout, "getvalue"):
            self.fail("need to run in buffered mode")
        output = sys.stdout.getvalue().strip() # because stdout is an StringIO instance
        self.assertEquals(output,'hello world!')

You can set buffer through unit2 command line flag -b, --buffer or in unittest.main options. The opposite is achieved through nosetest flag --nocapture.

if __name__=="__main__":   
    assert not hasattr(sys.stdout, "getvalue")
    unittest.main(module=__name__, buffer=True, exit=False)
    #.
    #----------------------------------------------------------------------
    #Ran 1 test in 0.000s
    #
    #OK
    assert not hasattr(sys.stdout, "getvalue")

    unittest.main(module=__name__, buffer=False)
    #hello world!
    #F
    #======================================================================
    #FAIL: test_foo (__main__.Case)
    #----------------------------------------------------------------------
    #Traceback (most recent call last):
    #  File "test_stdout.py", line 15, in test_foo
    #    self.fail("need to run in buffered mode")
    #AssertionError: need to run in buffered mode
    #
    #----------------------------------------------------------------------
    #Ran 1 test in 0.002s
    #
    #FAILED (failures=1)

Question 25

A lot of these answers failed for me because you can’t from StringIO import StringIO in Python 3. Here’s a minimum working snippet based on @naxa’s comment and the Python Cookbook.

from io import StringIO
from unittest.mock import patch

with patch('sys.stdout', new=StringIO()) as fakeOutput:
    print('hello world')
    self.assertEqual(fakeOutput.getvalue().strip(), 'hello world')

Question 26

In python 3.5 you can use contextlib.redirect_stdout() and StringIO(). Here’s the modification to your code

import contextlib
from io import StringIO
from foomodule import foo

def test_foo():
    temp_stdout = StringIO()
    with contextlib.redirect_stdout(temp_stdout):
        foo()
    output = temp_stdout.getvalue().strip()
    assert output == 'hello world!'

Question 27

I’m only just learning Python and found myself struggling with a similar problem to the one above with unit tests for methods with output. My passing unit test for foo module above has ended up looking like this:

import sys
import unittest
from foo import foo
from StringIO import StringIO

class FooTest (unittest.TestCase):
    def setUp(self):
        self.held, sys.stdout = sys.stdout, StringIO()

    def test_foo(self):
        foo()
        self.assertEqual(sys.stdout.getvalue(),'hello world!\n')

Question 28

Writing tests often shows us a better way to write our code. Similar to Shane’s answer, I’d like to suggest yet another way of looking at this. Do you really want to assert that your program outputted a certain string, or just that it constructed a certain string for output? This becomes easier to test, since we can probably assume that the Python print statement does its job correctly.

def foo_msg():
    return 'hello world'

def foo():
    print foo_msg()

Then your test is very simple:

def test_foo_msg():
    assert 'hello world' == foo_msg()

Of course, if you really have a need to test your program’s actual output, then feel free to disregard. :)

Question 29

Based on Rob Kennedy’s answer, I wrote a class-based version of the context manager to buffer the output.

Usage is like:

with OutputBuffer() as bf:
    print('hello world')
assert bf.out == 'hello world\n'

Here’s the implementation:

from io import StringIO
import sys


class OutputBuffer(object):

    def __init__(self):
        self.stdout = StringIO()
        self.stderr = StringIO()

    def __enter__(self):
        self.original_stdout, self.original_stderr = sys.stdout, sys.stderr
        sys.stdout, sys.stderr = self.stdout, self.stderr
        return self

    def __exit__(self, exception_type, exception, traceback):
        sys.stdout, sys.stderr = self.original_stdout, self.original_stderr

    @property
    def out(self):
        return self.stdout.getvalue()

    @property
    def err(self):
        return self.stderr.getvalue()

Question 30

Or consider using pytest, it has built-in support for asserting stdout and stderr. See docs

def test_myoutput(capsys): # or use "capfd" for fd-level
    print("hello")
    captured = capsys.readouterr()
    assert captured.out == "hello\n"
    print("next")
    captured = capsys.readouterr()
    assert captured.out == "next\n"

Question 31

Both n611x007 and Noumenon already suggested using unittest.mock, but this answer adapts Acumenus’s to show how you can easily wrap unittest.TestCase methods to interact with a mocked stdout.

import io
import unittest
import unittest.mock

msg = "Hello World!"


# function we will be testing
def foo():
    print(msg, end="")


# create a decorator which wraps a TestCase method and pass it a mocked
# stdout object
mock_stdout = unittest.mock.patch('sys.stdout', new_callable=io.StringIO)


class MyTests(unittest.TestCase):

    @mock_stdout
    def test_foo(self, stdout):
        # run the function whose output we want to test
        foo()
        # get its output from the mocked stdout
        actual = stdout.getvalue()
        expected = msg
        self.assertEqual(actual, expected)

Question 32

Building on all the awesome answers in this thread, this is how I solved it. I wanted to keep it as stock as possible. I augmented the unit test mechanism using setUp() to capture sys.stdout and sys.stderr, added new assert APIs to check the captured values against an expected value and then restore sys.stdout and sys.stderr upon tearDown(). I did this to keep a similar unit test API as the built-inunittestAPI while still being able to unit test values printed tosys.stdoutorsys.stderr`.

import io
import sys
import unittest


class TestStdout(unittest.TestCase):

    # before each test, capture the sys.stdout and sys.stderr
    def setUp(self):
        self.test_out = io.StringIO()
        self.test_err = io.StringIO()
        self.original_output = sys.stdout
        self.original_err = sys.stderr
        sys.stdout = self.test_out
        sys.stderr = self.test_err

    # restore sys.stdout and sys.stderr after each test
    def tearDown(self):
        sys.stdout = self.original_output
        sys.stderr = self.original_err

    # assert that sys.stdout would be equal to expected value
    def assertStdoutEquals(self, value):
        self.assertEqual(self.test_out.getvalue().strip(), value)

    # assert that sys.stdout would not be equal to expected value
    def assertStdoutNotEquals(self, value):
        self.assertNotEqual(self.test_out.getvalue().strip(), value)

    # assert that sys.stderr would be equal to expected value
    def assertStderrEquals(self, value):
        self.assertEqual(self.test_err.getvalue().strip(), value)

    # assert that sys.stderr would not be equal to expected value
    def assertStderrNotEquals(self, value):
        self.assertNotEqual(self.test_err.getvalue().strip(), value)

    # example of unit test that can capture the printed output
    def test_print_good(self):
        print("------")

        # use assertStdoutEquals(value) to test if your
        # printed value matches your expected `value`
        self.assertStdoutEquals("------")

    # fails the test, expected different from actual!
    def test_print_bad(self):
        print("@=@=")
        self.assertStdoutEquals("@-@-")


if __name__ == '__main__':
    unittest.main()

When the unit test is run, the output is:

$ python3 -m unittest -v tests/print_test.py
test_print_bad (tests.print_test.TestStdout) ... FAIL
test_print_good (tests.print_test.TestStdout) ... ok

======================================================================
FAIL: test_print_bad (tests.print_test.TestStdout)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tests/print_test.py", line 51, in test_print_bad
    self.assertStdoutEquals("@-@-")
  File "/tests/print_test.py", line 24, in assertStdoutEquals
    self.assertEqual(self.test_out.getvalue().strip(), value)
AssertionError: '@=@=' != '@-@-'
- @=@=
+ @-@-


----------------------------------------------------------------------
Ran 2 tests in 0.001s

FAILED (failures=1)

Question 33

I want to get all the <a> tags which are children of <li>:

<div>
<li class="test">
    <a>link1</a>
    <ul> 
       <li>  
          <a>link2</a> 
       </li>
    </ul>
</li>
</div>

I know how to find element with particular class like this:

soup.find("li", { "class" : "test" })

But I don’t know how to find all <a> which are children of <li class=test> but not any others.

Like I want to select:

<a>link1</a>

Question 34

Try this

li = soup.find('li', {'class': 'text'})
children = li.findChildren("a" , recursive=False)
for child in children:
    print(child)

Question 35

There’s a super small section in the DOCs that shows how to find/find_all direct children.

https://www.crummy.com/software/BeautifulSoup/bs4/doc/#the-recursive-argument

In your case as you want link1 which is first direct child:

# for only first direct child
soup.find("li", { "class" : "test" }).find("a", recursive=False)

If you want all direct children:

# for all direct children
soup.find("li", { "class" : "test" }).findAll("a", recursive=False)

Question 36

Perhaps you want to do

soup.find("li", { "class" : "test" }).find('a')

Question 37

try this:

li = soup.find("li", { "class" : "test" })
children = li.find_all("a") # returns a list of all <a> children of li

other reminders:

The find method only gets the first occurring child element. The find_all method gets all descendant elements and are stored in a list.

Question 38

“How to find all a which are children of <li class=test> but not any others?”

Given the HTML below (I added another <a> to show te difference between select and select_one):

<div>
  <li class="test">
    <a>link1</a>
    <ul>
      <li>
        <a>link2</a>
      </li>
    </ul>
    <a>link3</a>
  </li>
</div>

The solution is to use child combinator (>) that is placed between two CSS selectors:

>>> soup.select('li.test > a')
[<a>link1</a>, <a>link3</a>]

In case you want to find only the first child:

>>> soup.select_one('li.test > a')
<a>link1</a>

Question 39

Yet another method – create a filter function that returns True for all desired tags:

def my_filter(tag):
    return (tag.name == 'a' and
        tag.parent.name == 'li' and
        'test' in tag.parent['class'])

Then just call find_all with the argument:

for a in soup(my_filter): # or soup.find_all(my_filter)
    print a

Question 40

I need to save to disk a little dict object whose keys are of the type str and values are ints and then recover it. Something like this:

{'juanjo': 2, 'pedro':99, 'other': 333}

What is the best option and why? Serialize it with pickle or with simplejson?

I am using Python 2.6.

Question 41

If you do not have any interoperability requirements (e.g. you are just going to use the data with Python) and a binary format is fine, go with cPickle which gives you really fast Python object serialization.

If you want interoperability or you want a text format to store your data, go with JSON (or some other appropriate format depending on your constraints).

Question 42

I prefer JSON over pickle for my serialization. Unpickling can run arbitrary code, and using pickle to transfer data between programs or store data between sessions is a security hole. JSON does not introduce a security hole and is standardized, so the data can be accessed by programs in different languages if you ever need to.

Question 43

You might also find this interesting, with some charts to compare: http://kovshenin.com/archives/pickle-vs-json-which-is-faster/

Question 44

If you are primarily concerned with speed and space, use cPickle because cPickle is faster than JSON.

If you are more concerned with interoperability, security, and/or human readability, then use JSON.

The tests results referenced in other answers were recorded in 2010, and the updated tests in 2016 with cPickle protocol 2 show:

cPickle 3.8x faster loading
cPickle 1.5x faster reading
cPickle slightly smaller encoding

Reproduce this yourself with this gist, which is based on the Konstantin’s benchmark referenced in other answers, but using cPickle with protocol 2 instead of pickle, and using json instead of simplejson (since json is faster than simplejson), e.g.

wget https://gist.github.com/jdimatteo/af317ef24ccf1b3fa91f4399902bb534/raw/03e8dbab11b5605bc572bc117c8ac34cfa959a70/pickle_vs_json.py
python pickle_vs_json.py

Results with python 2.7 on a decent 2015 Xeon processor:

Dir Entries Method  Time    Length

dump    10  JSON    0.017   1484510
load    10  JSON    0.375   -
dump    10  Pickle  0.011   1428790
load    10  Pickle  0.098   -
dump    20  JSON    0.036   2969020
load    20  JSON    1.498   -
dump    20  Pickle  0.022   2857580
load    20  Pickle  0.394   -
dump    50  JSON    0.079   7422550
load    50  JSON    9.485   -
dump    50  Pickle  0.055   7143950
load    50  Pickle  2.518   -
dump    100 JSON    0.165   14845100
load    100 JSON    37.730  -
dump    100 Pickle  0.107   14287900
load    100 Pickle  9.907   -

Python 3.4 with pickle protocol 3 is even faster.

Question 45

JSON or pickle? How about JSON and pickle! You can use jsonpickle. It easy to use and the file on disk is readable because it’s JSON.

http://jsonpickle.github.com/

Question 46

I have tried several methods and found out that using cPickle with setting the protocol argument of the dumps method as: cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL) is the fastest dump method.

import msgpack
import json
import pickle
import timeit
import cPickle
import numpy as np

num_tests = 10

obj = np.random.normal(0.5, 1, [240, 320, 3])

command = 'pickle.dumps(obj)'
setup = 'from __main__ import pickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("pickle:  %f seconds" % result)

command = 'cPickle.dumps(obj)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle:   %f seconds" % result)


command = 'cPickle.dumps(obj, protocol=cPickle.HIGHEST_PROTOCOL)'
setup = 'from __main__ import cPickle, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("cPickle highest:   %f seconds" % result)

command = 'json.dumps(obj.tolist())'
setup = 'from __main__ import json, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("json:   %f seconds" % result)


command = 'msgpack.packb(obj.tolist())'
setup = 'from __main__ import msgpack, obj'
result = timeit.timeit(command, setup=setup, number=num_tests)
print("msgpack:   %f seconds" % result)

Output:

pickle         :   0.847938 seconds
cPickle        :   0.810384 seconds
cPickle highest:   0.004283 seconds
json           :   1.769215 seconds
msgpack        :   0.270886 seconds

Question 47

Personally, I generally prefer JSON because the data is human-readable. Definitely, if you need to serialize something that JSON won’t take, than use pickle.

But for most data storage, you won’t need to serialize anything weird and JSON is much easier and always allows you to pop it open in a text editor and check out the data yourself.

The speed is nice, but for most datasets the difference is negligible; Python generally isn’t too fast anyways.

Question 48

What I’m trying to do here is get the headers of a given URL so I can determine the MIME type. I want to be able to see if http://somedomain/foo/ will return an HTML document or a JPEG image for example. Thus, I need to figure out how to send a HEAD request so that I can read the MIME type without having to download the content. Does anyone know of an easy way of doing this?

Question 49

edit: This answer works, but nowadays you should just use the requests library as mentioned by other answers below.

Use httplib.

>>> import httplib
>>> conn = httplib.HTTPConnection("www.google.com")
>>> conn.request("HEAD", "/index.html")
>>> res = conn.getresponse()
>>> print res.status, res.reason
200 OK
>>> print res.getheaders()
[('content-length', '0'), ('expires', '-1'), ('server', 'gws'), ('cache-control', 'private, max-age=0'), ('date', 'Sat, 20 Sep 2008 06:43:36 GMT'), ('content-type', 'text/html; charset=ISO-8859-1')]

There’s also a getheader(name) to get a specific header.

Question 50

urllib2 can be used to perform a HEAD request. This is a little nicer than using httplib since urllib2 parses the URL for you instead of requiring you to split the URL into host name and path.

>>> import urllib2
>>> class HeadRequest(urllib2.Request):
...     def get_method(self):
...         return "HEAD"
... 
>>> response = urllib2.urlopen(HeadRequest("http://google.com/index.html"))

Headers are available via response.info() as before. Interestingly, you can find the URL that you were redirected to:

>>> print response.geturl()
http://www.google.com.au/index.html

Question 51

Obligatory Requests way:

import requests

resp = requests.head("http://www.google.com")
print resp.status_code, resp.text, resp.headers

Question 52

I believe the Requests library should be mentioned as well.

Question 53

Just:

import urllib2
request = urllib2.Request('http://localhost:8080')
request.get_method = lambda : 'HEAD'

response = urllib2.urlopen(request)
response.info().gettype()

Edit: I’ve just came to realize there is httplib2 :D

import httplib2
h = httplib2.Http()
resp = h.request("http://www.google.com", 'HEAD')
assert resp[0]['status'] == 200
assert resp[0]['content-type'] == 'text/html'
...

link text

Question 54

For completeness to have a Python3 answer equivalent to the accepted answer using httplib.

It is basically the same code just that the library isn’t called httplib anymore but http.client

from http.client import HTTPConnection

conn = HTTPConnection('www.google.com')
conn.request('HEAD', '/index.html')
res = conn.getresponse()

print(res.status, res.reason)

Question 55

import httplib
import urlparse

def unshorten_url(url):
    parsed = urlparse.urlparse(url)
    h = httplib.HTTPConnection(parsed.netloc)
    h.request('HEAD', parsed.path)
    response = h.getresponse()
    if response.status/100 == 3 and response.getheader('Location'):
        return response.getheader('Location')
    else:
        return url

Question 56

As an aside, when using the httplib (at least on 2.5.2), trying to read the response of a HEAD request will block (on readline) and subsequently fail. If you do not issue read on the response, you are unable to send another request on the connection, you will need to open a new one. Or accept a long delay between requests.

Question 57

I have found that httplib is slightly faster than urllib2. I timed two programs – one using httplib and the other using urllib2 – sending HEAD requests to 10,000 URL’s. The httplib one was faster by several minutes. httplib‘s total stats were: real 6m21.334s user 0m2.124s sys 0m16.372s

And urllib2‘s total stats were: real 9m1.380s user 0m16.666s sys 0m28.565s

Does anybody else have input on this?

Question 58

And yet another approach (similar to Pawel answer):

import urllib2
import types

request = urllib2.Request('http://localhost:8080')
request.get_method = types.MethodType(lambda self: 'HEAD', request, request.__class__)

Just to avoid having unbounded methods at instance level.

Question 59

Probably easier: use urllib or urllib2.

>>> import urllib
>>> f = urllib.urlopen('http://google.com')
>>> f.info().gettype()
'text/html'

f.info() is a dictionary-like object, so you can do f.info()[‘content-type’], etc.

http://docs.python.org/library/urllib.html
http://docs.python.org/library/urllib2.html
http://docs.python.org/library/httplib.html

The docs note that httplib is not normally used directly.

Question 60

How can I send trace messages to the console (like print) when I’m running my Django app under manage.py runserver, but have those messages sent to a log file when I’m running the app under Apache?

I reviewed Django logging and although I was impressed with its flexibility and configurability for advanced uses, I’m still stumped with how to handle my simple use-case.

Question 61

Text printed to stderr will show up in httpd’s error log when running under mod_wsgi. You can either use print directly, or use logging instead.

print >>sys.stderr, 'Goodbye, cruel world!'

Question 62

Here’s a Django logging-based solution. It uses the DEBUG setting rather than actually checking whether or not you’re running the development server, but if you find a better way to check for that it should be easy to adapt.

LOGGING = {
    'version': 1,
    'formatters': {
        'verbose': {
            'format': '%(levelname)s %(asctime)s %(module)s %(process)d %(thread)d %(message)s'
        },
        'simple': {
            'format': '%(levelname)s %(message)s'
        },
    },
    'handlers': {
        'console': {
            'level': 'DEBUG',
            'class': 'logging.StreamHandler',
            'formatter': 'simple'
        },
        'file': {
            'level': 'DEBUG',
            'class': 'logging.FileHandler',
            'filename': '/path/to/your/file.log',
            'formatter': 'simple'
        },
    },
    'loggers': {
        'django': {
            'handlers': ['file'],
            'level': 'DEBUG',
            'propagate': True,
        },
    }
}

if DEBUG:
    # make all loggers use the console.
    for logger in LOGGING['loggers']:
        LOGGING['loggers'][logger]['handlers'] = ['console']

see https://docs.djangoproject.com/en/dev/topics/logging/ for details.

Question 63

You can configure logging in your settings.py file.

One example:

if DEBUG:
    # will output to your console
    logging.basicConfig(
        level = logging.DEBUG,
        format = '%(asctime)s %(levelname)s %(message)s',
    )
else:
    # will output to logging file
    logging.basicConfig(
        level = logging.DEBUG,
        format = '%(asctime)s %(levelname)s %(message)s',
        filename = '/my_log_file.log',
        filemode = 'a'
    )

However that’s dependent upon setting DEBUG, and maybe you don’t want to have to worry about how it’s set up. See this answer on How can I tell whether my Django application is running on development server or not? for a better way of writing that conditional. Edit: the example above is from a Django 1.1 project, logging configuration in Django has changed somewhat since that version.

Question 64

I use this:

logging.conf:

[loggers]
keys=root,applog
[handlers]
keys=rotateFileHandler,rotateConsoleHandler

[formatters]
keys=applog_format,console_format

[formatter_applog_format]
format=%(asctime)s-[%(levelname)-8s]:%(message)s

[formatter_console_format]
format=%(asctime)s-%(filename)s%(lineno)d[%(levelname)s]:%(message)s

[logger_root]
level=DEBUG
handlers=rotateFileHandler,rotateConsoleHandler

[logger_applog]
level=DEBUG
handlers=rotateFileHandler
qualname=simple_example

[handler_rotateFileHandler]
class=handlers.RotatingFileHandler
level=DEBUG
formatter=applog_format
args=('applog.log', 'a', 10000, 9)

[handler_rotateConsoleHandler]
class=StreamHandler
level=DEBUG
formatter=console_format
args=(sys.stdout,)

testapp.py:

import logging
import logging.config

def main():
    logging.config.fileConfig('logging.conf')
    logger = logging.getLogger('applog')

    logger.debug('debug message')
    logger.info('info message')
    logger.warn('warn message')
    logger.error('error message')
    logger.critical('critical message')
    #logging.shutdown()

if __name__ == '__main__':
    main()

问题：Python中的CURL替代

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

问题：使用cython和mingw进行编译会产生gcc：错误：无法识别的命令行选项’-mno-cygwin’

回答 0

回答 1

回答 2

回答 3

问题：在Python脚本中，如何设置PYTHONPATH？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：如何在python中使用鼻子测试/单元测试来断言输出？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

问题：如何使用BeautifulSoup查找节点的子节点

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：泡菜还是json？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

问题：如何在Python 2中发送HEAD HTTP请求？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

问题：Python / Django：登录到runserver下的控制台，登录到Apache下的文件

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：Django：使用形式在一个模板中的多个模型

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

问题：在sudo下运行pip install是否可以接受并且安全？

回答 0

回答 1

回答 2

回答 3

回答 4