The “improved” code looks more complicated, but it’s better because once you know subprocess.Popen(), you don’t need anything else. subprocess.Popen() replaces several other tools (os.system() is just one of those) that were scattered throughout three other Python modules.
If it helps, think of subprocess.Popen() as a very flexible os.system().
os.system is equivalent to the Unix system command, while subprocess is a helper module that provides many of the facilities offered by Popen with an easier and more controllable interface. It was designed along the lines of the Unix popen call.
system() executes a command specified in command by calling /bin/sh -c command, and returns after the command has been completed
Whereas:
The popen() function opens a process by creating a pipe, forking, and
invoking the shell.
If you are wondering which one to use, then definitely use subprocess, because it gives you all the facilities for execution, plus additional control over the process.
When running Python (CPython) on Windows, os.system executes _wsystem under the covers, while on a non-Windows OS it uses system.
That said, an important piece of advice comes from the os.system docs, which say:
The subprocess module provides more powerful facilities for spawning
new processes and retrieving their results; using that module is
preferable to using this function. See the Replacing Older Functions
with the subprocess Module section in the subprocess documentation for
some helpful recipes.
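For illustration, a minimal sketch of the kind of replacement those recipes describe (the command itself is just a placeholder):

import os
import subprocess

# old style: output goes straight to the terminal, you only get the exit status back
status = os.system("ls -l")

# subprocess equivalent: the same command, but with explicit control over the process
proc = subprocess.Popen(["ls", "-l"], stdout=subprocess.PIPE)
output, _ = proc.communicate()   # capture stdout instead of letting it print
print(proc.returncode, len(output))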
The first time it is called the default will work, but calls after that will update the existing list (with one “a” each call) and print the updated version.
So, what is the pythonic way to get the behavior I desire (a fresh list on each call)?
Other answers have already provided the direct solutions that were asked for; however, since this is a very common pitfall for new Python programmers, it’s worth adding an explanation of why Python behaves this way, which is nicely summarized in The Hitchhiker’s Guide to Python under Mutable Default Arguments:
Python’s default arguments are evaluated once when the function is defined, not each time the function is called (like it is in say, Ruby). This means that if you use a mutable default argument and mutate it, you will and have mutated that object for all future calls to the function as well.
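For example, with a toy function like the one in the question, the behaviour described above looks like this:

def myFunc(working_list=[]):   # the default list is created once, when the def statement runs
    working_list.append("a")
    print(working_list)

myFunc()   # ['a']
myFunc()   # ['a', 'a'] -- the same list object is reused across calls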
Not that it matters in this case, but you can use object identity to test for None:
if working_list is None: working_list = []
You could also take advantage of how the boolean operator or is defined in python:
working_list = working_list or []
Though this will behave unexpectedly if the caller gives you an empty list (which counts as false) as working_list and expects your function to modify the list they passed in.
(or in this simple case just print starting_list + ["a"] but I guess that was just a toy example)
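To make that caveat concrete, here is a small sketch (the function name append_a is just illustrative):

def append_a(working_list=None):
    working_list = working_list or []   # an empty list is falsy, so it gets replaced
    working_list.append("a")
    return working_list

caller_list = []
append_a(caller_list)
print(caller_list)   # [] -- the caller's empty list was not modified; a new list was used instead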
In general, mutating your arguments is bad style in Python. The only functions that are fully expected to mutate an object are methods of the object. It’s even rarer to mutate an optional argument — is a side effect that happens only in some calls really the best interface?
If you do it from the C habit of “output arguments”, that’s completely unnecessary – you can always return multiple values as a tuple.
If you do this to efficiently build a long list of results without building intermediate lists, consider writing it as a generator and using result_list.extend(myFunc()) when you are calling it. This way your calling convention remains very clean.
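A rough sketch of that pattern (generate_items is an illustrative name, not from the original answer):

def generate_items(n):
    # yield results one at a time instead of appending to a passed-in list
    for i in range(n):
        yield "item-%d" % i

result_list = []
result_list.extend(generate_items(3))   # the caller decides where the results go
print(result_list)                      # ['item-0', 'item-1', 'item-2']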
One pattern where mutating an optional arg is frequently done is a hidden “memo” arg in recursive functions:
def depth_first_walk_graph(graph, node, _visited=None):
    if _visited is None:
        _visited = set()  # create memo once in top-level call
    if node in _visited:
        return
    _visited.add(node)
    for neighbour in graph[node]:
        depth_first_walk_graph(graph, neighbour, _visited)
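Calling it is then just a matter of passing the graph and the start node; the hidden memo never appears at the call site (a small usage sketch with a made-up adjacency dict):

graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}
depth_first_walk_graph(graph, "a")   # _visited is created internally on the top-level call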
I might be off-topic, but remember that if you just want to pass a variable number of arguments, the Pythonic way is to use *args or **kwargs. These are optional and are better than the syntax myFunc([1, 2, 3]).
There have already been good and correct answers provided. I just wanted to give another syntax to write what you want to do which I find more beautiful when you for instance want to create a class with default empty lists:
class Node(object):
    def __init__(self, _id, val, parents=None, children=None):
        self.id = _id
        self.val = val
        self.parents = parents if parents is not None else []
        self.children = children if children is not None else []
This snippet makes use of the conditional expression (if/else operator) syntax. I like it especially because it’s a neat little one-liner without colons, etc. involved, and it nearly reads like a normal English sentence. :)
In your case you could write
def myFunc(working_list=None):
    working_list = [] if working_list is None else working_list
    working_list.append("a")
    print(working_list)
In python do you generally use PEP 8 — Style Guide for Python Code as your coding standards/guidelines? Are there any other formalized standards that you prefer?
As you mentioned, follow PEP 8 for the main text and PEP 257 for docstring conventions.
Along with the Python style guides, I suggest that you refer to the following:
I follow the Python Idioms and Efficiency guidelines by Rob Knight. I think they are exactly the same as PEP 8, but more concise and based on examples.
There are three specific things that I can’t be bothered to change to PEP-8.
Avoid extraneous whitespace immediately inside parentheses, brackets or braces.
Suggested: spam(ham[1], {eggs: 2})
I do this anyway: spam( ham[ 1 ], { eggs: 2 } )
Why? 30+ years of ingrained habit of snuggling ()’s up against function names or (in C) statement keywords, starting with Fortran IV in the 70’s.
Use spaces around arithmetic operators:
Suggested: x = x * 2 - 1
I do this anyway: x= x * 2 - 1
Why? Gries’ The Science of Programming suggested this as a way to emphasize the connection between assignment and the variable whose state is being changed.
It doesn’t work well for multiple assignment or augmented assignment, for that I use lots of spaces.
For function names, method names and instance variable names
Suggested: lowercase, with words separated by underscores as necessary to improve readability.
I do this anyway: camelCase
Why? 20+ years of ingrained habit of camelCase, starting with Pascal in the 80’s.
class Outer(object):
    outer_var = 1

    class Inner(object):
        @property
        def inner_var(self):
            return Outer.outer_var
This isn’t quite the same as how similar things work in other languages, and it uses global lookup instead of scoping the access to outer_var. (If you change which object the name Outer is bound to, then this code will use that object the next time it is executed.)
If you instead want all Inner objects to have a reference to an Outer because outer_var is really an instance attribute:
class Outer(object):
    def __init__(self):
        self.outer_var = 1

    def get_inner(self):
        return self.Inner(self)
        # "self.Inner" is because Inner is a class attribute of this class
        # "Outer.Inner" would also work, or move Inner to global scope
        # and then just use "Inner"

    class Inner(object):
        def __init__(self, outer):
            self.outer = outer

        @property
        def inner_var(self):
            return self.outer.outer_var
Note that nesting classes is somewhat uncommon in Python, and doesn’t automatically imply any sort of special relationship between the classes. You’re better off not nesting. (You can still set a class attribute on Outer to Inner, if you want.)
class OuterClass:
    outer_var = 1

    class InnerClass:
        pass
    InnerClass.inner_var = outer_var
The problem you encountered is due to this:
A block is a piece of Python program text that is executed as a unit.
The following are blocks: a module, a function body, and a class
definition.
(…)
A scope defines the visibility of a name within
a block.
(…)
The scope of names defined in a class block is
limited to the class block; it does not extend to the code blocks of
methods – this includes generator expressions since they are
implemented using a function scope. This means that the following will
fail:
class A:
    a = 42
    b = list(a + i for i in range(10))
The above means: a function body is a code block, and a method is a function, so names defined outside the function body but inside a class definition do not extend into the function body.
Paraphrasing this for your case: a class definition is a code block, so names defined in an outer class definition, outside the inner class definition, do not extend into the inner class definition.
Answer 2
You may be better off if you just don’t use nested classes. If you must nest, try this:
x = 1

class OuterClass:
    outer_var = x

    class InnerClass:
        inner_var = x
All the explanations can be found in the Python documentation, The Python Tutorial.
For your first error <type 'exceptions.NameError'>: name 'outer_var' is not defined. The explanation is:
There is no shorthand for referencing data attributes (or other methods!) from within methods. I find that this actually increases the readability of methods: there is no chance of confusing local variables and instance variables when glancing through a method.
quoted from The Python Tutorial 9.4
For your second error <type 'exceptions.NameError'>: name 'OuterClass' is not defined
When a class definition is left normally (via the end), a class object is created.
quoted from The Python Tutorial 9.3.1
So when you try inner_var = OuterClass.outer_var, OuterClass hasn’t been created yet, which is why name 'OuterClass' is not defined.
A more detailed but tedious explanation for your first error:
Although classes have access to enclosing functions’ scopes, though, they do not act
as enclosing scopes to code nested within the class: Python searches enclosing functions
for referenced names, but never any enclosing classes. That is, a class is a local scope
and has access to enclosing local scopes, but it does not serve as an enclosing local scope
to further nested code.
I was trying to remove unwanted characters from a given string using text.translate() in Python 3.4.
The minimal code is:
import sys
s = 'abcde12345@#@$#%$'
mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$')
print(s.translate(mapper))
It works as expected. However, the same program gives very different timings when executed in Python 3.4 and Python 3.5.
The code to calculate timings is
python3 -m timeit -s "import sys;s = 'abcde12345@#@$#%$'*1000 ; mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$'); " "s.translate(mapper)"
The Python 3.4 program takes 1.3ms whereas the same program in Python 3.5 takes only 26.4μs.
What has improved in Python 3.5 that makes it faster compared to Python 3.4?
Josh Rosenberg found that the str.translate() function is very slow compared to bytes.translate(); he raised an issue, stating that:
In Python 3, str.translate() is usually a performance pessimization, not optimization.
Why was str.translate() slow?
The main reason str.translate() was very slow is that the lookup used to be done in a Python dictionary. The use of maketrans made this problem worse. The similar approach using bytes builds a C array of 256 items for fast table lookup. Hence the use of a higher-level Python dict made str.translate() in Python 3.4 very slow.
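For context, a rough sketch of the two kinds of mapping involved (these are assumed equivalents for illustration, not code from the bug report):

s = 'abcde12345@#@$#%$'

# str path: the mapping is a Python dict keyed by code points
str_table = str.maketrans('', '', '@#$')   # {64: None, 35: None, 36: None}
print(s.translate(str_table))              # abcde12345%

# bytes path: the mapping is a fixed 256-entry table, so each lookup is a cheap C array index
b = s.encode('ascii')
print(b.translate(None, b'@#$'))           # b'abcde12345%'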
What happened now?
The first approach was to add a small patch, translate_writer; however, the speed increase was not that pleasing. Soon another patch, fast_translate, was tested, and it yielded very nice results of up to 55% speedup.
The main change as can be seen from the file is that the Python dictionary lookup is changed into a C level lookup.
The speeds now are almost the same as bytes
                                 unpatched             patched
str.translate                    4.55125927699919      0.7898181750006188
str.translate from bytes trans   1.8910855210015143    0.779950579000797
A small note here is that the performance enhancement is only prominent in ASCII strings.
As J.F. Sebastian mentions in a comment below, before 3.5, translate used to work in the same way for both the ASCII and non-ASCII cases. However, from 3.5 the ASCII case is much faster.
Earlier, ASCII vs non-ASCII used to be almost the same, but now we can see a great change in performance.
It can be an improvement from 71.6μs to 2.33μs as seen in this answer.
The following code demonstrates this
python3.5 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
100000 loops, best of 3: 2.3 usec per loop
python3.5 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 117 usec per loop
python3 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 91.2 usec per loop
python3 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
10000 loops, best of 3: 101 usec per loop
I’m currently playing with Hudson, but it is very Java-centric (although with this guide, I found it easier to set up than BuildBot, and it produced more info).
Basically: is there any Continuous Integration systems aimed at python, that produce lots of shiny graphs and the likes?
Update: Since this time the Jenkins project has replaced Hudson as the community version of the package. The original authors have moved to this project as well. Jenkins is now a standard package on Ubuntu/Debian, RedHat/Fedora/CentOS, and others. The following update is still essentially correct. The starting point to do this with Jenkins is different.
Update: After trying a few alternatives, I think I’ll stick with Hudson. Integrity was nice and simple, but quite limited. I think Buildbot is better suited to having numerous build-slaves, rather than everything running on a single machine like I was using it.
Setting Hudson up for a Python project was pretty simple:
Violations to parse the PyLint output (you can set up warning thresholds and graph the number of violations over each build)
Cobertura can parse the coverage.py output. nosetests can gather coverage while running your tests, using nosetests --with-coverage (this writes the output to **/coverage.xml)
Don’t know if it would do : Bitten is made by the guys who write Trac and is integrated with Trac. Apache Gump is the CI tool used by Apache. It is written in Python.
We’ve had great success with TeamCity as our CI server, using nose as our test runner. The TeamCity plugin for nosetests gives you a pass/fail count and a readable display of failed tests (which can be e-mailed). You can even see details of the test failures while your stack is running.
It of course supports things like running on multiple machines, and it’s much simpler to set up and maintain than buildbot.
I guess this thread is quite old but here is my take on it with hudson:
I decided to go with pip and set up a repo (the painful-to-get-working but nice-looking eggbasket), which hudson auto-uploads to on successful tests. Here is my rough and ready script for use with a hudson config execute script like: /var/lib/hudson/venv/main/bin/hudson_script.py -w $WORKSPACE -p my.package -v $BUILD_NUMBER, and just put **/coverage.xml, pylint.txt and nosetests.xml in the config bits:
#!/var/lib/hudson/venv/main/bin/python
import os
import re
import subprocess
import logging
import optparse

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')

#venvDir = "/var/lib/hudson/venv/main/bin/"

UPLOAD_REPO = "http://ldndev01:3442"

def call_command(command, cwd, ignore_error_code=False):
    try:
        logging.info("Running: %s" % command)
        status = subprocess.call(command, cwd=cwd, shell=True)
        if not ignore_error_code and status != 0:
            raise Exception("Last command failed")
        return status
    except:
        logging.exception("Could not run command %s" % command)
        raise

def main():
    usage = "usage: %prog [options]"
    parser = optparse.OptionParser(usage)
    parser.add_option("-w", "--workspace", dest="workspace",
                      help="workspace folder for the job")
    parser.add_option("-p", "--package", dest="package",
                      help="the package name i.e., back_office.reconciler")
    parser.add_option("-v", "--build_number", dest="build_number",
                      help="the build number, which will get put at the end of the package version")
    options, args = parser.parse_args()

    if not options.workspace or not options.package:
        raise Exception("Need both args, do --help for info")

    venvDir = options.package + "_venv/"

    #find out if venv is there
    if not os.path.exists(venvDir):
        #make it
        call_command("virtualenv %s --no-site-packages" % venvDir,
                     options.workspace)

    #install the venv/make sure its there plus install the local package
    call_command("%sbin/pip install -e ./ --extra-index %s" % (venvDir, UPLOAD_REPO),
                 options.workspace)

    #make sure pylint, nose and coverage are installed
    call_command("%sbin/pip install nose pylint coverage epydoc" % venvDir,
                 options.workspace)

    #make sure we have an __init__.py
    #this shouldn't be needed if the packages are set up correctly
    #modules = options.package.split(".")
    #if len(modules) > 1:
    #    call_command("touch '%s/__init__.py'" % modules[0],
    #                 options.workspace)

    #do the nosetests
    test_status = call_command("%sbin/nosetests %s --with-xunit --with-coverage --cover-package %s --cover-erase"
                               % (venvDir, options.package.replace(".", "/"), options.package),
                               options.workspace, True)

    #produce coverage report -i for ignore weird missing file errors
    call_command("%sbin/coverage xml -i" % venvDir,
                 options.workspace)

    #move it so that the code coverage plugin can find it
    call_command("mv coverage.xml %s" % (options.package.replace(".", "/")),
                 options.workspace)

    #run pylint
    call_command("%sbin/pylint --rcfile ~/pylint.rc -f parseable %s > pylint.txt"
                 % (venvDir, options.package),
                 options.workspace, True)

    #remove old dists so we only have the newest at the end
    call_command("rm -rfv %s" % (options.workspace + "/dist"),
                 options.workspace)

    #if the build passes upload the result to the egg_basket
    if test_status == 0:
        logging.info("Success - uploading egg")
        upload_bit = "upload -r %s/upload" % UPLOAD_REPO
    else:
        logging.info("Failure - not uploading egg")
        upload_bit = ""

    #create egg
    call_command("%sbin/python setup.py egg_info --tag-build=.0.%s --tag-svn-revision --tag-date sdist %s"
                 % (venvDir, options.build_number, upload_bit),
                 options.workspace)

    call_command("%sbin/epydoc --html --graph all %s" % (venvDir, options.package),
                 options.workspace)

    logging.info("Complete")

if __name__ == "__main__":
    main()
When it comes to deploying stuff, you can do something similar. This assumes you have a repo structure per package with a setup.py and dependencies all set up; then you can just check out the trunk and run this on it.
I hope this helps someone out.
Update: I’ve added epydoc, which fits in really nicely with hudson. Just add javadoc to your config with the html folder.
Note that pip doesn’t support the -E flag properly these days, so you have to create your venv separately
If you’re considering hosted CI solution, and doing open source, you should look into Travis CI as well – it has very nice integration with GitHub. While it started as a Ruby tool, they have added Python support a while ago.
Continuum’s Binstar is now able to trigger builds from GitHub and can compile for Linux, OS X and Windows (32/64-bit). The neat thing is that it really allows you to closely couple distribution and continuous integration. That’s crossing the t’s and dotting the i’s of integration. The site, workflow and tools are really polished, and AFAIK conda is the most robust and Pythonic way of distributing complex Python modules, where you need to wrap and distribute C/C++/Fortran libraries.
We have used bitten quite a bit. It is pretty and integrates well with Trac, but it is a pain in the butt to customize if you have any nonstandard workflow. Also there just aren’t as many plugins as there are for the more popular tools. Currently we are evaluating Hudson as a replacement.
Check rultor.com. As this article explains, it uses Docker for every build. Thanks to that, you can configure whatever you like inside your Docker image, including Python.
Little disclaimer: I’ve actually had to build a solution like this for a client that wanted a way to automatically test and deploy any code on a git push, plus manage the issue tickets via git notes. This also led to my work on the AIMS project.
One could easily just setup a bare node system that has a build user and manage their build through make(1), expect(1), crontab(1)/systemd.unit(5), and incrontab(1). One could even go a step further and use ansible and celery for distributed builds with a gridfs/nfs file store.
Although I would not expect anyone other than a greybeard UNIX guy or a principal-level engineer/architect to actually go this far. It just makes for a nice idea and a potential learning experience, since a build server is nothing more than a way to arbitrarily execute scripted tasks in an automated fashion.
Can you suggest a module function from numpy/scipy that can find local maxima/minima in a 1D numpy array? Obviously the simplest approach ever is to have a look at the nearest neighbours, but I would like to have an accepted solution that is part of the numpy distro.
You could also smooth your array before this step using numpy.convolve().
I don’t think there is a dedicated function for this.
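A hedged sketch of that idea, using the naive nearest-neighbour check after a little smoothing (this is my illustration, not an accepted numpy function):

import numpy as np

a = np.array([1.0, 3.0, 9.0, 5.0, 2.0, 5.0, 6.0, 9.0, 7.0])

# optional smoothing with a small moving-average kernel
kernel = np.ones(3) / 3.0
smoothed = np.convolve(a, kernel, mode='same')

# naive nearest-neighbour test for interior local maxima
maxima = [i for i in range(1, len(smoothed) - 1)
          if smoothed[i - 1] < smoothed[i] > smoothed[i + 1]]
print(maxima)   # [2, 7] for this data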
Answer 1
In SciPy >= 0.11:
import numpy as np
from scipy.signal import argrelextrema

x = np.random.random(12)

# for local maxima
argrelextrema(x, np.greater)

# for local minima
argrelextrema(x, np.less)
Produces:
>>> x
array([ 0.56660112,  0.76309473,  0.69597908,  0.38260156,  0.24346445,
        0.56021785,  0.24109326,  0.41884061,  0.35461957,  0.54398472,
        0.59572658,  0.92377974])
>>> argrelextrema(x, np.greater)
(array([1, 5, 7]),)
>>> argrelextrema(x, np.less)
(array([4, 6, 8]),)
Note, these are the indices of x that are local max/min. To get the values, try:
>>> x[argrelextrema(x, np.greater)[0]]
scipy.signal also provides argrelmax and argrelmin for finding maxima and minima respectively.
Answer 2
For curves with not too much noise, I recommend the following small code snippet:
from numpy import *

# example data with some peaks:
x = linspace(0, 4, 1000)
data = .2*sin(10*x) + exp(-abs(2-x)**2)

# that's the line you need:
a = diff(sign(diff(data))).nonzero()[0] + 1          # local min + max
b = (diff(sign(diff(data))) > 0).nonzero()[0] + 1    # local min
c = (diff(sign(diff(data))) < 0).nonzero()[0] + 1    # local max

# graphical output...
from pylab import *
plot(x, data)
plot(x[b], data[b], "o", label="min")
plot(x[c], data[c], "o", label="max")
legend()
show()
The +1 is important, because diff reduces the original index number.
Another approach (more words, less code) that may help:
The locations of local maxima and minima are also the locations of the zero crossings of the first derivative. It is generally much easier to find zero crossings than it is to directly find local maxima and minima.
Unfortunately, the first derivative tends to “amplify” noise, so when significant noise is present in the original data, the first derivative is best used only after the original data has had some degree of smoothing applied.
Since smoothing is, in the simplest sense, a low pass filter, the smoothing is often best (well, most easily) done by using a convolution kernel, and “shaping” that kernel can provide a surprising amount of feature-preserving/enhancing capability. The process of finding an optimal kernel can be automated using a variety of means, but the best may be simple brute force (plenty fast for finding small kernels). A good kernel will (as intended) massively distort the original data, but it will NOT affect the location of the peaks/valleys of interest.
Fortunately, quite often a suitable kernel can be created via a simple SWAG (“educated guess”). The width of the smoothing kernel should be a little wider than the widest expected “interesting” peak in the original data, and its shape will resemble that peak (a single-scaled wavelet). For mean-preserving kernels (what any good smoothing filter should be) the sum of the kernel elements should be precisely equal to 1.00, and the kernel should be symmetric about its center (meaning it will have an odd number of elements).
Given an optimal smoothing kernel (or a small number of kernels optimized for different data content), the degree of smoothing becomes a scaling factor for (the “gain” of) the convolution kernel.
Determining the “correct” (optimal) degree of smoothing (convolution kernel gain) can even be automated: compare the standard deviation of the first-derivative data with the standard deviation of the smoothed data. How the ratio of the two standard deviations changes with changes in the degree of smoothing can be used to predict effective smoothing values. A few manual data runs (that are truly representative) should be all that’s needed.
All the prior solutions posted above compute the first derivative, but they don’t treat it as a statistical measure, nor do they attempt to perform feature-preserving/enhancing smoothing (to help subtle peaks “leap above” the noise).
Finally, the bad news: Finding “real” peaks becomes a royal pain when the noise also has features that look like real peaks (overlapping bandwidth). The next more-complex solution is generally to use a longer convolution kernel (a “wider kernel aperture”) that takes into account the relationship between adjacent “real” peaks (such as minimum or maximum rates for peak occurrence), or to use multiple convolution passes using kernels having different widths (but only if it is faster: it is a fundamental mathematical truth that linear convolutions performed in sequence can always be convolved together into a single convolution). But it is often far easier to first find a sequence of useful kernels (of varying widths) and convolve them together than it is to directly find the final kernel in a single step.
Hopefully this provides enough info to let Google (and perhaps a good stats text) fill in the gaps. I really wish I had the time to provide a worked example, or a link to one. If anyone comes across one online, please post it here!
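In lieu of a worked example, here is a rough sketch of the pipeline described above under simple assumptions (a hand-rolled symmetric, mean-preserving kernel; downward zero crossings of the first derivative as the peak test):

import numpy as np

# noisy example signal (purely illustrative)
x = np.linspace(0, 4, 1000)
data = 0.2 * np.sin(10 * x) + np.exp(-abs(2 - x) ** 2) + 0.05 * np.random.randn(x.size)

# mean-preserving, symmetric smoothing kernel (odd length, elements sum to 1.0)
kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
kernel /= kernel.sum()
smoothed = np.convolve(data, kernel, mode='same')

# local maxima of the smoothed data = downward zero crossings of its first derivative
d = np.diff(smoothed)
maxima = np.nonzero((d[:-1] > 0) & (d[1:] < 0))[0] + 1
print(maxima)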
As of SciPy version 1.1, you can also use find_peaks. Below are two examples taken from the documentation itself.
Using the height argument, one can select all maxima above a certain threshold (in this example, all non-negative maxima; this can be very useful if one has to deal with a noisy baseline; if you want to find minima, just multiply your input by -1):
import matplotlib.pyplot as plt
from scipy.misc import electrocardiogram
from scipy.signal import find_peaks
import numpy as np
x = electrocardiogram()[2000:4000]
peaks, _ = find_peaks(x, height=0)
plt.plot(x)
plt.plot(peaks, x[peaks], "x")
plt.plot(np.zeros_like(x), "--", color="gray")
plt.show()
Another extremely helpful argument is distance, which defines the minimum distance between two peaks:
peaks, _ = find_peaks(x, distance=150)
# difference between peaks is >= 150
print(np.diff(peaks))
# prints [186 180 177 171 177 169 167 164 158 162 172]
plt.plot(x)
plt.plot(peaks, x[peaks], "x")
plt.show()
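And, as mentioned above, minima can be found by negating the input before calling find_peaks (a tiny sketch continuing the snippet above):

minima, _ = find_peaks(-x, distance=150)   # minima of x are maxima of -x
plt.plot(x)
plt.plot(minima, x[minima], "x")
plt.show()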
from scipy import signal
import numpy as np
#generate junk data (numpy 1D arr)
xs = np.arange(0, np.pi, 0.05)
data = np.sin(xs)
# maxima : use builtin function to find (max) peaks
max_peakind = signal.find_peaks_cwt(data, np.arange(1,10))
# inverse (in order to find minima)
inv_data = 1/data
# minima : use builtin function to find (min) peaks (use inverted data)
min_peakind = signal.find_peaks_cwt(inv_data, np.arange(1,10))
#show results
print "maxima", data[max_peakind]
print "minima", data[min_peakind]
Update:
I wasn’t happy with gradient so I found it more reliable to use numpy.diff. Please let me know if it does what you want.
Regarding the issue of noise: the mathematical problem is to locate maxima/minima; if we want to deal with noise, we can use something like convolve, which was mentioned earlier.
import numpy as np
from matplotlib import pyplot

a = np.array([10.3, 2, 0.9, 4, 5, 6, 7, 34, 2, 5, 25, 3, -26, -20, -29], dtype=np.float)

gradients = np.diff(a)
print gradients

maxima_num = 0
minima_num = 0
max_locations = []
min_locations = []
count = 0
for i in gradients[:-1]:
    count += 1
    if ((cmp(i, 0) > 0) & (cmp(gradients[count], 0) < 0) & (i != gradients[count])):
        maxima_num += 1
        max_locations.append(count)
    if ((cmp(i, 0) < 0) & (cmp(gradients[count], 0) > 0) & (i != gradients[count])):
        minima_num += 1
        min_locations.append(count)

turning_points = {'maxima_number': maxima_num, 'minima_number': minima_num,
                  'maxima_locations': max_locations, 'minima_locations': min_locations}

print turning_points

pyplot.plot(a)
pyplot.show()
Answer 7
While this question is really old, I believe there is a much simpler approach in numpy (a one-liner).
import numpy as np
list = [1,3,9,5,2,5,6,9,7]
np.diff(np.sign(np.diff(list))) #the one liner
#output
array([ 0, -2, 0, 2, 0, 0, -2])
To find a local max or min we essentially want to find when the difference between the values in the list (3-1, 9-3…) changes from positive to negative (max) or negative to positive (min). Therefore, first we find the difference. Then we find the sign, and then we find the changes in sign by taking the difference again. (Sort of like a first and second derivative in calculus, only we have discrete data and don’t have a continuous function.)
The output in my example does not contain the extrema (the first and last values in the list). Also, just like in calculus, if the second derivative is negative you have a max, and if it is positive you have a min.
Thus we have the following matchup:
[ 1,   3,   9,   5,   2,   5,   6,   9,   7]    # original list
[      0,  -2,   0,   2,   0,   0,  -2    ]     # np.diff(np.sign(np.diff(list)))
           Max       Min            Max
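If you want indices into the original list rather than the -2/+2 markers, you can combine the one-liner with np.where and the +1 offset mentioned in the earlier answer (my addition, not part of the original answer):

arr = np.array([1, 3, 9, 5, 2, 5, 6, 9, 7])
second = np.diff(np.sign(np.diff(arr)))
maxima = np.where(second < 0)[0] + 1   # array([2, 7]) -> the two 9s
minima = np.where(second > 0)[0] + 1   # array([4])    -> the 2
print(maxima, minima)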
Answer 8
None of these solutions worked for me since I wanted to find peaks in the center of repeating values as well. For example, in
ar = np.array([0,1,2,2,2,1,3,3,3,2,5,0])
the answer should be
array([ 3, 7, 10], dtype=int64)
I did this using a loop. I know it’s not super clean, but it gets the job done.
def findLocalMaxima(ar):
    # find local maxima of array, including centers of repeating elements
    maxInd = np.zeros_like(ar)
    peakVar = -np.inf
    i = -1
    while i < len(ar)-1:
    #for i in range(len(ar)):
        i += 1
        if peakVar < ar[i]:
            peakVar = ar[i]
            for j in range(i, len(ar)):
                if peakVar < ar[j]:
                    break
                elif peakVar == ar[j]:
                    continue
                elif peakVar > ar[j]:
                    peakInd = i + np.floor(abs(i-j)/2)
                    maxInd[peakInd.astype(int)] = 1
                    i = j
                    break
        peakVar = ar[i]
    maxInd = np.where(maxInd)[0]
    return maxInd
Answer 9
import numpy as np

x = np.array([6, 3, 5, 2, 1, 4, 9, 7, 8])
y = np.array([2, 1, 3, 5, 3, 9, 8, 10, 7])

sortId = np.argsort(x)
x = x[sortId]
y = y[sortId]

minm = np.array([])
maxm = np.array([])
length = len(y)  # note: length was never defined in the original snippet
i = 0
while i < length-1:
    if i < length - 1:
        while i < length-1 and y[i+1] >= y[i]:
            i += 1
        if i != 0 and i < length-1:
            maxm = np.append(maxm, i)
        i += 1
    if i < length - 1:
        while i < length-1 and y[i+1] <= y[i]:
            i += 1
        if i < length-1:
            minm = np.append(minm, i)
        i += 1

print minm
print maxm
minm and maxm contain indices of minima and maxima, respectively. For a huge data set it will give lots of maxima/minima, so in that case smooth the curve first and then apply this algorithm.
Answer 10
Another solution using the dilation operator:
import numpy as np
from scipy.ndimage import rank_filter

def find_local_maxima(x):
    x_dilate = rank_filter(x, -1, size=3)
    return x_dilate == x
And for minima:
def find_local_minima(x):
    x_erode = rank_filter(x, 0, size=3)
    return x_erode == x
Also, from scipy.ndimage you can replace rank_filter(x, -1, size=3) with grey_dilation and rank_filter(x, 0, size=3) with grey_erosion. This won’t require a local sort, so it is slightly faster.
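In other words, a small sketch of that substitution (assuming the same 3-element window):

from scipy.ndimage import grey_dilation, grey_erosion

def find_local_maxima(x):
    return grey_dilation(x, size=3) == x

def find_local_minima(x):
    return grey_erosion(x, size=3) == x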
Answer 11
Another one:
def local_maxima_mask(vec):
    """
    Get a mask of all points in vec which are local maxima
    :param vec: A real-valued vector
    :return: A boolean mask of the same size where True elements correspond to maxima.
    """
    mask = np.zeros(vec.shape, dtype=bool)
    greater_than_the_last = np.diff(vec) > 0  # N-1
    mask[1:] = greater_than_the_last
    mask[:-1] &= ~greater_than_the_last
    return mask
There is a new option as well: get it via pip! There is a package pypiwin32 with wheels available, so you can just install it with: pip install pypiwin32!
Edit: Per comment from @movermeyer, the main project now publishes wheels at pywin32, and so can be installed with pip install pywin32
I’ve found that UC Irvine has a great collection of python modules, pywin32 (win32api) being one of many listed there. I’m not sure how they do with keeping up with the latest versions of these modules but it hasn’t let me down yet.