Python 实用宝典

Question 1

Should I test if something is valid or just try to do it and catch the exception?

Is there any solid documentation saying that one way is preferred?
Is one way more pythonic?

For example, should I:

if len(my_list) >= 4:
    x = my_list[3]
else:
    x = 'NO_ABC'

Or:

try:
    x = my_list[3]
except IndexError:
    x = 'NO_ABC'

Some thoughts…
PEP 20 says:

Errors should never pass silently.
Unless explicitly silenced.

Should using a try instead of an if be interpreted as an error passing silently? And if so, are you explicitly silencing it by using it in this way, therefore making it OK?

I’m not referring to situations where you can only do things one way; for example:

try:
    import foo
except ImportError:
    import baz

Question 2

You should prefer try/except over if/else if that results in

speed-ups (for example by preventing extra lookups)
cleaner code (fewer lines/easier to read)

Often, these go hand-in-hand.

speed-ups

In the case of trying to find an element in a long list by:

try:
    x = my_list[index]
except IndexError:
    x = 'NO_ABC'

try/except is the best option when the index is probably in the list and IndexError is rarely raised. This way you avoid the extra check if index < len(my_list).

“Python encourages the use of exceptions, which you handle” is a phrase from Dive Into Python. Your example not only handles the exception gracefully rather than letting it pass silently; the exception also occurs only in the exceptional case of the index not being found (hence the word exception!).
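To check the speed claim on your own machine, a rough sketch with timeit could look like the following. The helper names lbyl_get and eafp_get and the list size are made up for illustration, and the timings depend on interpreter and hardware, so no numbers are shown.

```python
import timeit

my_list = list(range(1000))

def lbyl_get(index):
    # Look before you leap: bounds check first, then index.
    if index < len(my_list):
        return my_list[index]
    return 'NO_ABC'

def eafp_get(index):
    # Ask forgiveness: index directly and handle the rare miss.
    try:
        return my_list[index]
    except IndexError:
        return 'NO_ABC'

if __name__ == '__main__':
    for name in ('lbyl_get', 'eafp_get'):
        # "hit" uses an index that exists, "miss" one that does not.
        hit = timeit.timeit(f'{name}(3)', globals=globals(), number=200_000)
        miss = timeit.timeit(f'{name}(5000)', globals=globals(), number=200_000)
        print(f'{name}: hit {hit:.3f}s, miss {miss:.3f}s')
```

Comparing the hit and miss columns shows how much the answer depends on how often the exception is actually raised.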

cleaner code

The official Python documentation mentions EAFP (easier to ask for forgiveness than permission), and Rob Knight notes that catching errors rather than avoiding them can result in cleaner, easier-to-read code. His example puts it like this:

Worse (LBYL ‘look before you leap’):

#check whether int conversion will raise an error
if not isinstance(s, str) or not s.isdigit():
    return None
elif len(s) > 10:    #too many digits for int conversion
    return None
else:
    return int(s)

Better (EAFP: Easier to ask for forgiveness than permission):

try:
    return int(s)
except (TypeError, ValueError, OverflowError): #int conversion failed
    return None
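The two fragments above use return, so they only run inside a function. A minimal sketch that wraps each one (the function names to_int_lbyl and to_int_eafp are assumptions, not from the original) also shows that the two styles do not accept exactly the same inputs:

```python
def to_int_lbyl(s):
    # Pre-checks copied from the LBYL version above.
    if not isinstance(s, str) or not s.isdigit():
        return None
    elif len(s) > 10:
        return None
    return int(s)

def to_int_eafp(s):
    # Let int() decide what is convertible and handle its failures.
    try:
        return int(s)
    except (TypeError, ValueError, OverflowError):
        return None

samples = ['42', '-5', ' 42 ', 'abc', None]
results = [(to_int_lbyl(s), to_int_eafp(s)) for s in samples]
```

The hand-written checks reject '-5' and ' 42 ', which int() converts without trouble, so the pre-check version is stricter than the operation it is guarding.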

Question 3

In this particular case, you should use something else entirely:

x = myDict.get("ABC", "NO_ABC")

In general, though: If you expect the test to fail frequently, use if. If the test is expensive relative to just trying the operation and catching the exception if it fails, use try. If neither one of these conditions applies, go with whatever reads easier.
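For the dictionary case, the three options can be put side by side. This is only a sketch; my_dict and the variable names are placeholders.

```python
my_dict = {'XYZ': 1}

# Test first (LBYL): two lookups when the key is present.
if 'ABC' in my_dict:
    x_lbyl = my_dict['ABC']
else:
    x_lbyl = 'NO_ABC'

# Try it (EAFP): one lookup, plus an exception when the key is missing.
try:
    x_eafp = my_dict['ABC']
except KeyError:
    x_eafp = 'NO_ABC'

# Neither: dict.get does the lookup and supplies the default itself.
x_get = my_dict.get('ABC', 'NO_ABC')
```

All three variables end up holding 'NO_ABC' here; the difference is only in cost and readability.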

Question 4

Using try and except directly rather than inside an if guard should always be done if there is any possibility of a race condition. For example, if you want to ensure that a directory exists, do not do this:

import os, sys
if not os.path.isdir('foo'):
  try:
    os.mkdir('foo')
  except OSError as e:
    print(e)
    sys.exit(1)

If another thread or process creates the directory between the isdir check and the mkdir call, mkdir raises OSError and you’ll exit even though the directory now exists. Instead, do this:

import os, sys, errno
try:
  os.mkdir('foo')
except OSError as e:
  if e.errno != errno.EEXIST:
    print(e)
    sys.exit(1)

That will only exit if the ‘foo’ directory can’t be created.
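On Python 3.3 and later the same idea can be written with FileExistsError, the OSError subclass raised for EEXIST. The sketch below is one possible shape; ensure_dir is a hypothetical helper name, and it works in a temporary directory so it leaves nothing behind.

```python
import os
import tempfile

def ensure_dir(path):
    """Create path as a directory unless it already is one."""
    try:
        os.mkdir(path)
    except FileExistsError:
        # Someone else got there first. Only a non-directory at that
        # path is a real problem, so re-raise in that case.
        if not os.path.isdir(path):
            raise

with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, 'foo')
    ensure_dir(target)
    ensure_dir(target)  # second call finds the directory and does nothing
    created = os.path.isdir(target)
```

If nested parent directories are acceptable, os.makedirs(path, exist_ok=True) covers the same ground in one call.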

Question 5

If it’s trivial to check whether something will fail before you do it, you should probably favor that. After all, constructing exceptions (including their associated tracebacks) takes time.

Exceptions should be used for:

things that are unexpected, or…
things where you need to jump more than one level of logic (e.g. where a break doesn’t get you far enough), or…
things where you don’t know exactly what is going to be handling the exception ahead of time, or…
things where checking ahead of time for failure is expensive (relative to just attempting the operation)
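The "jump more than one level" case might look like the sketch below, where a match found deep inside a helper has to stop the outer loop as well. The Found exception and both function names are invented for this illustration.

```python
class Found(Exception):
    """Hypothetical signal carrying the position of a match."""

    def __init__(self, position):
        super().__init__(position)
        self.position = position

def scan_row(row_index, row, target):
    # A break here would only leave this loop, not the caller's loop.
    for col_index, value in enumerate(row):
        if value == target:
            raise Found((row_index, col_index))

def find_first(grid, target):
    try:
        for row_index, row in enumerate(grid):
            scan_row(row_index, row, target)
    except Found as hit:
        return hit.position
    return None
```

For example, find_first([[1, 2], [3, 4]], 4) stops at the first match and reports its row and column.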

Note that oftentimes, the real answer is “neither” – for instance, in your first example, what you really should do is just use .get() to provide a default:

x = myDict.get('ABC', 'NO_ABC')

Question 6

As the other posts mention, it depends on the situation. There are a few dangers with using try/except in place of checking the validity of your data in advance, especially when using it on bigger projects.

The code in the try block may have a chance to wreak all sorts of havoc before the exception is caught – if you proactively check beforehand with an if statement you can avoid this.
If the code called in your try block raises a common exception type, like TypeError or ValueError, you may not actually catch the exception you were expecting to catch – something else may raise the same exception class before or after the line where your expected exception would be raised.

e.g., suppose you had:

try:
    x = my_list[index_list[3]]
except IndexError:
    x = 'NO_ABC'

The IndexError says nothing about whether it occurred when trying to get an element of index_list or my_list.
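One way to remove the ambiguity, sketched here with a hypothetical lookup function, is to keep only the operation whose failure you intend to handle inside the try block:

```python
def lookup(my_list, index_list, default='NO_ABC'):
    # Outside the try: a too-short index_list is treated as a bug
    # and its IndexError is allowed to propagate.
    position = index_list[3]
    try:
        return my_list[position]
    except IndexError:
        # Only a miss in my_list falls back to the default.
        return default
```

Whether a short index_list should really propagate is an assumption of this sketch; the point is that the choice becomes explicit.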

Question 7

Should using a try instead of an if be interpreted as an error passing silently? And if so, are you explicitly silencing it by using it in this way, therefore making it OK?

Using try is acknowledging that an error may pass, which is the opposite of having it pass silently. Using except is causing it not to pass at all.

Using try: except: is preferred in cases where if: else: logic is more complicated. Simple is better than complex; complex is better than complicated; and it’s easier to ask for forgiveness than permission.

What “errors should never pass silently” is warning about, is the case where code could raise an exception that you know about, and where your design admits the possibility, but you haven’t designed in a way to deal with the exception. Explicitly silencing an error, in my view, would be doing something like pass in an except block, which should only be done with an understanding that “doing nothing” really is the correct error handling in the particular situation. (This is one of the few times where I feel like a comment in well-written code is probably really needed.)
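A small sketch of explicit silencing, assuming Python 3.4+ for contextlib.suppress (equivalent to a try block whose except clause is just pass). The remove_if_present helper is hypothetical; the comment is the part the paragraph above argues for.

```python
import contextlib
import os
import tempfile

def remove_if_present(path):
    # Deliberately ignored: if the file is already gone, the goal
    # "make sure it does not exist" has been met.
    with contextlib.suppress(FileNotFoundError):
        os.remove(path)

fd, path = tempfile.mkstemp()
os.close(fd)
remove_if_present(path)  # deletes the temporary file
remove_if_present(path)  # nothing left to delete, and no exception
still_there = os.path.exists(path)
```

Only FileNotFoundError is suppressed, so a permissions problem would still surface.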

However, in your particular example, neither is appropriate:

x = myDict.get('ABC', 'NO_ABC')

The reason everyone is pointing this out – even though you acknowledge your desire to understand in general, and inability to come up with a better example – is that equivalent side-steps actually exist in quite a lot of cases, and looking for them is the first step in solving the problem.

Question 8

Whenever you use try/except for control flow, ask yourself:

Is it easy to see when the try block succeeds and when it fails?
Are you aware of all side effects inside the try block?
Are you aware of all cases in which the try block throws the exception?
If the implementation of the try block changes, will your control flow still behave as expected?

If the answer to one or more of these questions is ‘no’, there might be a lot of forgiveness to ask for; most likely from your future self.

An example. I recently saw code in a larger project that looked like this:

try:
    y = foo(x)
except ProgrammingError:
    y = bar(x)

Talking to the programmer, it turned out that the intended control flow was:

If x is an integer, do y = foo(x).

If x is a list of integers, do y = bar(x).

This worked because foo made a database query and the query would be successful if x was an integer and throw a ProgrammingError if x was a list.

Using try/except is a bad choice here:

The name of the exception, ProgrammingError, does not give away the actual problem (that x is not an integer), which makes it difficult to see what is going on.
The ProgrammingError is raised during a database call, which wastes time. Things would get truly horrible if it turned out that foo writes something to the database before it throws an exception, or alters the state of some other system.
It is unclear if ProgrammingError is only raised when x is a list of integers. Suppose for instance that there is a typo in foo's database query. This might also raise a ProgrammingError. The consequence is that bar(x) is now also called when x is an integer. This might raise cryptic exceptions or produce unforeseeable results.
The try/except block adds a requirement to all future implementations of foo. Whenever we change foo, we must now think about how it handles lists and make sure that it throws a ProgrammingError and not, say, an AttributeError or no error at all.
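A sketch of the intended control flow written out explicitly. The compute wrapper is an assumption, and foo and bar are passed in as parameters because their real implementations are not shown in the answer.

```python
def compute(x, foo, bar):
    # Dispatch on what x is, not on which error foo happens to raise.
    if isinstance(x, int):
        return foo(x)
    if isinstance(x, list) and all(isinstance(item, int) for item in x):
        return bar(x)
    raise TypeError(
        f'expected int or list of ints, got {type(x).__name__}'
    )

# Stand-ins for the real database-backed functions.
single = compute(3, foo=lambda n: n * 2, bar=sum)
several = compute([1, 2, 3], foo=lambda n: n * 2, bar=sum)
```

With this shape, a typo in foo's query surfaces as its own error instead of silently rerouting integers to bar.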

Question 9

For general guidance, you may consider reading Idioms and Anti-Idioms in Python: Exceptions.

In your particular case, as others stated, you should use dict.get():

get(key[, default])

Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.

Question 10

Python’s sum() function returns the sum of numbers in an iterable.

sum([3,4,5]) == 3 + 4 + 5 == 12

I’m looking for the function that returns the product instead.

somelib.somefunc([3,4,5]) == 3 * 4 * 5 == 60

I’m pretty sure such a function exists, but I can’t find it.

Question 11

Update:

In Python 3.8, the prod function was added to the math module. See: math.prod().
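A short usage sketch, assuming Python 3.8 or later (the variable names are arbitrary):

```python
import math

total = math.prod([3, 4, 5])            # 3 * 4 * 5
empty = math.prod([])                   # empty input gives the start value, 1
scaled = math.prod([3, 4, 5], start=2)  # start is a keyword-only multiplier
```

Here total is 60, empty is 1 and scaled is 120.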

Older info: Python 3.7 and prior

The function you’re looking for would be called prod() or product() but Python doesn’t have that function. So, you need to write your own (which is easy).

Pronouncement on prod()

Yes, that’s right. Guido rejected the idea for a built-in prod() function because he thought it was rarely needed.

Alternative with reduce()

As you suggested, it is not hard to make your own using reduce() and operator.mul():

from functools import reduce  # Required in Python 3
import operator
def prod(iterable):
    return reduce(operator.mul, iterable, 1)

>>> prod(range(1, 5))
24

Note, in Python 3, the reduce() function was moved to the functools module.

Specific case: Factorials

As a side note, the primary motivating use case for prod() is to compute factorials. We already have support for that in the math module:

>>> import math

>>> math.factorial(10)
3628800

Alternative with logarithms

If your data consists of floats, you can compute a product using sum() with exponents and logarithms:

>>> from math import log, exp

>>> data = [1.2, 1.5, 2.5, 0.9, 14.2, 3.8]
>>> exp(sum(map(log, data)))
218.53799999999993

>>> 1.2 * 1.5 * 2.5 * 0.9 * 14.2 * 3.8
218.53799999999998

Note, the use of log() requires that all the inputs are positive.

Question 12

Actually, Guido vetoed the idea: http://bugs.python.org/issue1093

But, as noted in that issue, you can make one pretty easily:

from functools import reduce # Valid in Python 2.6+, required in Python 3
import operator

reduce(operator.mul, (3, 4, 5), 1)

Question 13

There isn’t one built in, but it’s simple to roll your own, as demonstrated here:

from functools import reduce  # needed in Python 3; harmless in 2.6+
import operator
def prod(factors):
    return reduce(operator.mul, factors, 1)

See answers to this question:

Which Python module is suitable for data manipulation in a list?

Question 14

There’s a prod() in numpy that does what you’re asking for.

Question 15

Numeric.product, or:

reduce(lambda x, y: x * y, [3, 4, 5])

Question 16

Use this

def prod(iterable):
    p = 1
    for n in iterable:
        p *= n
    return p

Since there’s no built-in prod function.

Question 17

I prefer the answers a and b above using functools.reduce() and the answer using numpy.prod(), but here is yet another solution using itertools.accumulate():

import itertools
import operator
prod = list(itertools.accumulate((3, 4, 5), operator.mul))[-1]
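One difference worth knowing between the two approaches is how they treat empty input. The sketch below wraps each in a function (the names prod_accumulate and prod_reduce are assumptions):

```python
import itertools
import operator
from functools import reduce

def prod_accumulate(values):
    # Builds every running product, then keeps the last one.
    # On empty input the list is empty and [-1] raises IndexError.
    return list(itertools.accumulate(values, operator.mul))[-1]

def prod_reduce(values):
    # The initial value 1 makes the empty product well defined.
    return reduce(operator.mul, values, 1)
```

Both agree on (3, 4, 5); only the reduce version with an initial value handles an empty iterable.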

Question 18

Perhaps not a “builtin”, but I consider it one. Anyway, just use numpy:

import numpy 
prod_sum = numpy.prod(some_list)

Question 19

I get the pep8 warning E731 (“do not assign a lambda expression, use a def”) whenever I use lambda expressions. Are lambda expressions not recommended? If not, why?

Question 20

The recommendation in PEP-8 you are running into is:

Always use a def statement instead of an assignment statement that binds a lambda expression directly to a name.

Yes:
def f(x): return 2*x 
No:
f = lambda x: 2*x 
The first form means that the name of the resulting function object is specifically ‘f’ instead of the generic ‘<lambda>’. This is more useful for tracebacks and string representations in general. The use of the assignment statement eliminates the sole benefit a lambda expression can offer over an explicit def statement (i.e. that it can be embedded inside a larger expression)

Assigning lambdas to names basically just duplicates the functionality of def – and in general, it’s best to do something a single way to avoid confusion and increase clarity.
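The naming difference PEP 8 refers to is easy to see by inspecting the function objects. In this sketch f_lambda and f_def are illustrative names:

```python
f_lambda = lambda x: 2 * x  # the form E731 flags

def f_def(x):
    return 2 * x

# The def keeps its own name; the lambda only knows it is a lambda.
names = (f_lambda.__name__, f_def.__name__)
```

names comes out as ('<lambda>', 'f_def'), which is also what shows up in tracebacks and reprs.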

The legitimate use case for lambda is where you want to use a function without assigning it, e.g:

sorted(players, key=lambda player: player.rank)

In general, the main argument against doing this is that def statements will result in more lines of code. My main response to that would be: yes, and that is fine. Unless you are code golfing, minimising the number of lines isn’t something you should be doing: go for clear over short.

Question 21

Here is the story, I had a simple lambda function which I was using twice.

a = map(lambda x : x + offset, simple_list)
b = map(lambda x : x + offset, another_simple_list)

This is just for illustration; I have run into a couple of different versions of this.

Now, to keep things DRY, I start to reuse this common lambda.

f = lambda x : x + offset
a = map(f, simple_list)
b = map(f, another_simple_list)

At this point my code quality checker complains about lambda being a named function so I convert it into a function.

def f(x):
    return x + offset
a = map(f, simple_list)
b = map(f, another_simple_list)

Now the checker complains that a function has to be bounded by one blank line before and after.

def f(x):
    return x + offset

a = map(f, simple_list)
b = map(f, another_simple_list)

Here we have now 6 lines of code instead of original 2 lines with no increase in readability and no increase in being pythonic. At this point the code checker complains about the function not having docstrings.

In my opinion this rule is best ignored when breaking it makes sense; use your judgement.

Question 22

Lattyware is absolutely right: Basically PEP-8 wants you to avoid things like

f = lambda x: 2 * x

and instead use

def f(x):
    return 2 * x

However, as addressed in a recent bugreport (Aug 2014), statements such as the following are now compliant:

a.f = lambda x: 2 * x
a["f"] = lambda x: 2 * x

Since my PEP-8 checker doesn’t implement this correctly yet, I turned off E731 for the time being.

Question 23

I also encountered a situation in which it was even impossible to use a def(ined) function.

class SomeClass(object):
  # pep-8 does not allow this
  f = lambda x: x + 1  # NOQA

  def not_reachable(self, x):
    return x + 1

  @staticmethod
  def also_not_reachable(x):
    return x + 1

  @classmethod
  def also_not_reachable(cls, x):
    return x + 1

  some_mapping = {
      'object1': {'name': "Object 1", 'func': f},
      'object2': {'name': "Object 2", 'func': some_other_func},
  }

In this case, I really wanted to make a mapping which belonged to the class. Some objects in the mapping needed the same function. It would be illogical to put a named function outside of the class. I have not found a way to refer to a method (staticmethod, classmethod or normal) from inside the class body. SomeClass does not exist yet when the code is run. So referring to it from the class isn’t possible either.
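One possible workaround, offered as a sketch and not necessarily covering what this answer needed: a name bound earlier in a class body, including a plain def, can be referenced by later statements in the same body. The names _plus_one and some_other_func below are stand-ins.

```python
def some_other_func(x):  # stand-in for the function used in the answer
    return x - 1

class SomeClass(object):
    # A plain def in the class body is an ordinary function at this
    # point, so the dict below can store it by its local name.
    def _plus_one(x):
        return x + 1

    some_mapping = {
        'object1': {'name': "Object 1", 'func': _plus_one},
        'object2': {'name': "Object 2", 'func': some_other_func},
    }

result = SomeClass.some_mapping['object1']['func'](41)
```

The caveat is that _plus_one also becomes an attribute of the class, so calling it through an instance would pass self as x; it is only meant to be reached through the mapping.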

Question 24

Recently I started using Python3 and its lack of xrange hurts.

Simple example:

1) Python2:

from time import time as t
def count():
  st = t()
  [x for x in xrange(10000000) if x%4 == 0]
  et = t()
  print et-st
count()

2) Python3:

from time import time as t

def xrange(x):
    return iter(range(x))

def count():
    st = t()
    [x for x in xrange(10000000) if x%4 == 0]
    et = t()
    print(et-st)
count()

The results are, respectively:

1) 1.53888392448
2) 3.215819835662842

Why is that? I mean, why was xrange removed? It’s such a great tool to learn, especially for beginners like me, as we all were at some point. Why remove it? Can somebody point me to the proper PEP? I can’t find it.

Cheers.

Question 25

Some performance measurements, using timeit instead of trying to do it manually with time.

First, Apple 2.7.2 64-bit:

In [37]: %timeit collections.deque((x for x in xrange(10000000) if x%4 == 0), maxlen=0)
1 loops, best of 3: 1.05 s per loop

Now, python.org 3.3.0 64-bit:

In [83]: %timeit collections.deque((x for x in range(10000000) if x%4 == 0), maxlen=0)
1 loops, best of 3: 1.32 s per loop

In [84]: %timeit collections.deque((x for x in xrange(10000000) if x%4 == 0), maxlen=0)
1 loops, best of 3: 1.31 s per loop

In [85]: %timeit collections.deque((x for x in iter(range(10000000)) if x%4 == 0), maxlen=0) 
1 loops, best of 3: 1.33 s per loop

Apparently, 3.x range really is a bit slower than 2.x xrange. And the OP’s xrange function has nothing to do with it. (Not surprising, as a one-time call to the __iter__ slot isn’t likely to be visible among 10000000 calls to whatever happens in the loop, but someone brought it up as a possibility.)

But it’s only 30% slower. How did the OP get 2x as slow? Well, if I repeat the same tests with 32-bit Python, I get 1.58 vs. 3.12. So my guess is that this is yet another of those cases where 3.x has been optimized for 64-bit performance in ways that hurt 32-bit.

But does it really matter? Check this out, with 3.3.0 64-bit again:

In [86]: %timeit [x for x in range(10000000) if x%4 == 0]
1 loops, best of 3: 3.65 s per loop

So, building the list takes more than twice as long as the entire iteration.

And as for “consumes much more resources than Python 2.6+”, from my tests, it looks like a 3.x range is exactly the same size as a 2.x xrange—and, even if it were 10x as big, building the unnecessary list is still about 10000000x more of a problem than anything the range iteration could possibly do.

And what about an explicit for loop instead of the C loop inside deque?

In [87]: def consume(x):
   ....:     for i in x:
   ....:         pass
In [88]: %timeit consume(x for x in range(10000000) if x%4 == 0)
1 loops, best of 3: 1.85 s per loop

So, almost as much time wasted in the for statement as in the actual work of iterating the range.

If you’re worried about optimizing the iteration of a range object, you’re probably looking in the wrong place.

Meanwhile, you keep asking why xrange was removed, no matter how many times people tell you the same thing, but I’ll repeat it again: It was not removed: it was renamed to range, and the 2.x range is what was removed.

Here’s some proof that the 3.3 range object is a direct descendant of the 2.x xrange object (and not of the 2.x range function): the source to 3.3 range and 2.7 xrange. You can even see the change history (linked to, I believe, the change that replaced the last instance of the string “xrange” anywhere in the file).

So, why is it slower?

Well, for one, they’ve added a lot of new features. For another, they’ve done all kinds of changes all over the place (especially inside iteration) that have minor side effects. And there’d been a lot of work to dramatically optimize various important cases, even if it sometimes slightly pessimizes less important cases. Add this all up, and I’m not surprised that iterating a range as fast as possible is now a bit slower. It’s one of those less-important cases that nobody would ever care enough to focus on. No one is likely to ever have a real-life use case where this performance difference is the hotspot in their code.

Question 26

Python3’s range is Python2’s xrange. There’s no need to wrap an iter around it. To get an actual list in Python3, you need to use list(range(...))
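A quick sketch of what that means in practice on Python 3 (the variable names are arbitrary):

```python
r = range(10_000_000)

is_lazy = not isinstance(r, list)  # a range object, not a materialized list
length = len(r)                    # known without iterating
contains = 9_999_999 in r          # membership test without building a list
as_list = list(range(5))           # explicit conversion when a list is needed
```

Only the last line allocates a list; the others work on the range object directly.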

If you want something that works with Python2 and Python3, try this

try:
    xrange
except NameError:
    xrange = range

Question 27

Python 3’s range type works just like Python 2’s xrange. I’m not sure why you’re seeing a slowdown, since the iterator returned by your xrange function is exactly what you’d get if you iterated over range directly.

I’m not able to reproduce the slowdown on my system. Here’s how I tested:

Python 2, with xrange:

Python 2.7.3 (default, Apr 10 2012, 23:24:47) [MSC v.1500 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import timeit
>>> timeit.timeit("[x for x in xrange(1000000) if x%4]",number=100)
18.631936646865853

Python 3, with range is a tiny bit faster:

Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)] on win32
Type "copyright", "credits" or "license()" for more information.
>>> import timeit
>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=100)
17.31399508687869

I recently learned that Python 3’s range type has some other neat features, such as support for slicing: range(10,100,2)[5:25:5] is range(20, 60, 10)!
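The slicing claim can be checked directly. This sketch assumes Python 3.3 or later, where range objects compare equal when they describe the same sequence:

```python
sliced = range(10, 100, 2)[5:25:5]

same_range = sliced == range(20, 60, 10)  # slicing yields another range
values = list(sliced)                     # the elements it describes
```

values is [20, 30, 40, 50], and no intermediate list is built by the slice itself.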

Question 28

One way to fix up your python2 code is:

import sys

if sys.version_info >= (3, 0):
    def xrange(*args, **kwargs):
        return iter(range(*args, **kwargs))

Question 29

In Python 2, xrange is a lazy sequence type that produces its values on demand, while range is a function that builds a full list. I don’t know why the xrange name was dropped in Python 3.

Question 30

comp:~$ python
Python 2.7.6 (default, Jun 22 2015, 17:58:13) [GCC 4.8.2] on linux2

>>> import timeit
>>> timeit.timeit("[x for x in xrange(1000000) if x%4]",number=100)

5.656799077987671

>>> timeit.timeit("[x for x in xrange(1000000) if x%4]",number=100)

5.579368829727173

>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=100)

21.54827117919922

>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=100)

22.014557123184204

With timeit number=1 param:

>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=1)

0.2245171070098877

>>> timeit.timeit("[x for x in xrange(1000000) if x%4]",number=1)

0.10750913619995117

comp:~$ python3
Python 3.4.3 (default, Oct 14 2015, 20:28:29) [GCC 4.8.4] on linux

>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=100)

9.113872020003328

>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=100)

9.07014398300089

With number=1, 2, 3 and 4, Python 3 is quick and the time grows roughly linearly:

>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=1)

0.09329321900440846

>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=2)

0.18501482300052885

>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=3)

0.2703447980020428

>>> timeit.timeit("[x for x in range(1000000) if x%4]",number=4)

0.36209142999723554

So it seems that if we time a single pass, as in timeit.timeit("[x for x in range(1000000) if x%4]", number=1), which is closer to how real code runs, Python 3 is quick enough; over 100 repeated passes, though, Python 2’s xrange() beats Python 3’s range().
