The “improved” code looks more complicated, but it’s better because once you know subprocess.Popen(), you don’t need anything else. subprocess.Popen() replaces several other tools (os.system() is just one of those) that were scattered throughout three other Python modules.
If it helps, think of subprocess.Popen() as a very flexible os.system().
os.system is equivalent to the Unix system command, while subprocess is a helper module that provides many of the facilities offered by Popen with an easier and more controllable interface. It was designed along the lines of the Unix popen call.
system() executes a command specified in command by calling /bin/sh -c command, and returns after the command has been completed
Whereas:
The popen() function opens a process by creating a pipe, forking, and
invoking the shell.
If you are wondering which one to use, then definitely use subprocess, because it gives you all the facilities for execution, plus additional control over the process.
When running Python (CPython) on Windows, os.system executes _wsystem under the covers, while on a non-Windows OS it uses system.
That said, an important piece of advice comes from the os.system docs, which say:
The subprocess module provides more powerful facilities for spawning
new processes and retrieving their results; using that module is
preferable to using this function. See the Replacing Older Functions
with the subprocess Module section in the subprocess documentation for
some helpful recipes.
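For illustration, a minimal sketch of the kind of replacement those recipes describe (the command itself is just a placeholder):

import os
import subprocess

# old style: output goes straight to the terminal, you only get the exit status back
status = os.system("ls -l")

# subprocess equivalent: the same command, but with explicit control over the process
proc = subprocess.Popen(["ls", "-l"], stdout=subprocess.PIPE)
output, _ = proc.communicate()   # capture stdout instead of letting it print
print(proc.returncode, len(output))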
The first time it is called the default will work, but calls after that will update the existing list (with one “a” each call) and print the updated version.
So, what is the pythonic way to get the behavior I desire (a fresh list on each call)?
Other answers have already provided the direct solutions that were asked for; however, since this is a very common pitfall for new Python programmers, it’s worth adding an explanation of why Python behaves this way, which is nicely summarized in The Hitchhiker’s Guide to Python under Mutable Default Arguments:
Python’s default arguments are evaluated once when the function is defined, not each time the function is called (like it is in say, Ruby). This means that if you use a mutable default argument and mutate it, you will and have mutated that object for all future calls to the function as well.
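For example, with a toy function like the one in the question, the behaviour described above looks like this:

def myFunc(working_list=[]):   # the default list is created once, when the def statement runs
    working_list.append("a")
    print(working_list)

myFunc()   # ['a']
myFunc()   # ['a', 'a'] -- the same list object is reused across calls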
Not that it matters in this case, but you can use object identity to test for None:
if working_list is None: working_list = []
You could also take advantage of how the boolean operator or is defined in python:
working_list = working_list or []
Though this will behave unexpectedly if the caller gives you an empty list (which counts as false) as working_list and expects your function to modify the list they passed in.
(or in this simple case just print starting_list + ["a"] but I guess that was just a toy example)
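To make that caveat concrete, here is a small sketch (the function name append_a is just illustrative):

def append_a(working_list=None):
    working_list = working_list or []   # an empty list is falsy, so it gets replaced
    working_list.append("a")
    return working_list

caller_list = []
append_a(caller_list)
print(caller_list)   # [] -- the caller's empty list was not modified; a new list was used instead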
In general, mutating your arguments is bad style in Python. The only functions that are fully expected to mutate an object are methods of the object. It’s even rarer to mutate an optional argument — is a side effect that happens only in some calls really the best interface?
If you do it from the C habit of “output arguments”, that’s completely unnecessary – you can always return multiple values as a tuple.
If you do this to efficiently build a long list of results without building intermediate lists, consider writing it as a generator and using result_list.extend(myFunc()) when you are calling it. This way your calling convention remains very clean.
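A rough sketch of that pattern (generate_items is an illustrative name, not from the original answer):

def generate_items(n):
    # yield results one at a time instead of appending to a passed-in list
    for i in range(n):
        yield "item-%d" % i

result_list = []
result_list.extend(generate_items(3))   # the caller decides where the results go
print(result_list)                      # ['item-0', 'item-1', 'item-2']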
One pattern where mutating an optional arg is frequently done is a hidden “memo” arg in recursive functions:
def depth_first_walk_graph(graph, node, _visited=None):
    if _visited is None:
        _visited = set()  # create memo once in top-level call
    if node in _visited:
        return
    _visited.add(node)
    for neighbour in graph[node]:
        depth_first_walk_graph(graph, neighbour, _visited)
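Calling it is then just a matter of passing the graph and the start node; the hidden memo never appears at the call site (a small usage sketch with a made-up adjacency dict):

graph = {
    "a": ["b", "c"],
    "b": ["d"],
    "c": ["d"],
    "d": [],
}
depth_first_walk_graph(graph, "a")   # _visited is created internally on the top-level call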
I might be off-topic, but remember that if you just want to pass a variable number of arguments, the Pythonic way is to use *args or **kwargs. These are optional and are better than the syntax myFunc([1, 2, 3]).
There have already been good and correct answers provided. I just wanted to give another syntax to write what you want to do which I find more beautiful when you for instance want to create a class with default empty lists:
class Node(object):
    def __init__(self, _id, val, parents=None, children=None):
        self.id = _id
        self.val = val
        self.parents = parents if parents is not None else []
        self.children = children if children is not None else []
This snippet makes use of the conditional expression (if/else operator) syntax. I like it especially because it’s a neat little one-liner without colons, etc. involved, and it nearly reads like a normal English sentence. :)
In your case you could write
def myFunc(working_list=None):
    working_list = [] if working_list is None else working_list
    working_list.append("a")
    print(working_list)
In python do you generally use PEP 8 — Style Guide for Python Code as your coding standards/guidelines? Are there any other formalized standards that you prefer?
As you mentioned, follow PEP 8 for the main text and PEP 257 for docstring conventions.
Along with the Python style guides, I suggest that you refer to the following:
I follow the Python Idioms and Efficiency guidelines by Rob Knight. I think they are exactly the same as PEP 8, but more concise and based on examples.
There are three specific things that I can’t be bothered to change to PEP-8.
Avoid extraneous whitespace immediately inside parentheses, brackets or braces.
Suggested: spam(ham[1], {eggs: 2})
I do this anyway: spam( ham[ 1 ], { eggs: 2 } )
Why? 30+ years of ingrained habit of snuggling ()’s up against function names or (in C) statement keywords, starting with Fortran IV in the 70’s.
Use spaces around arithmetic operators:
Suggested: x = x * 2 - 1
I do this anyway: x= x * 2 - 1
Why? Gries’ The Science of Programming suggested this as a way to emphasize the connection between assignment and the variable whose state is being changed.
It doesn’t work well for multiple assignment or augmented assignment, for that I use lots of spaces.
For function names, method names and instance variable names
Suggested: lowercase, with words separated by underscores as necessary to improve readability.
I do this anyway: camelCase
Why? 20+ years of ingrained habit of camelCase, starting with Pascal in the 80’s.
class Outer(object):
    outer_var = 1

    class Inner(object):
        @property
        def inner_var(self):
            return Outer.outer_var
This isn’t quite the same as how similar things work in other languages, and it uses global lookup instead of scoping the access to outer_var. (If you change which object the name Outer is bound to, then this code will use that object the next time it is executed.)
If you instead want all Inner objects to have a reference to an Outer because outer_var is really an instance attribute:
class Outer(object):
    def __init__(self):
        self.outer_var = 1

    def get_inner(self):
        return self.Inner(self)
        # "self.Inner" is because Inner is a class attribute of this class
        # "Outer.Inner" would also work, or move Inner to global scope
        # and then just use "Inner"

    class Inner(object):
        def __init__(self, outer):
            self.outer = outer

        @property
        def inner_var(self):
            return self.outer.outer_var
Note that nesting classes is somewhat uncommon in Python, and doesn’t automatically imply any sort of special relationship between the classes. You’re better off not nesting. (You can still set a class attribute on Outer to Inner, if you want.)
class OuterClass:
    outer_var = 1

    class InnerClass:
        pass
    InnerClass.inner_var = outer_var
The problem you encountered is due to this:
A block is a piece of Python program text that is executed as a unit.
The following are blocks: a module, a function body, and a class
definition.
(…)
A scope defines the visibility of a name within
a block.
(…)
The scope of names defined in a class block is
limited to the class block; it does not extend to the code blocks of
methods – this includes generator expressions since they are
implemented using a function scope. This means that the following will
fail:
class A:
    a = 42
    b = list(a + i for i in range(10))
The above means: a function body is a code block, and a method is a function, so names defined outside the function body but inside a class definition do not extend into the function body.
Paraphrasing this for your case: a class definition is a code block, so names defined in an outer class definition, outside the inner class definition, do not extend into the inner class definition.
Answer 2
You may be better off if you just don’t use nested classes. If you must nest, try this:
x = 1

class OuterClass:
    outer_var = x

    class InnerClass:
        inner_var = x
All the explanations can be found in the Python documentation, The Python Tutorial.
For your first error <type 'exceptions.NameError'>: name 'outer_var' is not defined. The explanation is:
There is no shorthand for referencing data attributes (or other methods!) from within methods. I find that this actually increases the readability of methods: there is no chance of confusing local variables and instance variables when glancing through a method.
quoted from The Python Tutorial 9.4
For your second error <type 'exceptions.NameError'>: name 'OuterClass' is not defined
When a class definition is left normally (via the end), a class object is created.
quoted from The Python Tutorial 9.3.1
So when you try inner_var = OuterClass.outer_var, OuterClass hasn’t been created yet, which is why name 'OuterClass' is not defined.
A more detailed but tedious explanation for your first error:
Although classes have access to enclosing functions’ scopes, though, they do not act
as enclosing scopes to code nested within the class: Python searches enclosing functions
for referenced names, but never any enclosing classes. That is, a class is a local scope
and has access to enclosing local scopes, but it does not serve as an enclosing local scope
to further nested code.
I was trying to remove unwanted characters from a given string using text.translate() in Python 3.4.
The minimal code is:
import sys
s = 'abcde12345@#@$#%$'
mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$')
print(s.translate(mapper))
It works as expected. However, the same program gives very different timings when executed in Python 3.4 and Python 3.5.
The code to calculate timings is
python3 -m timeit -s "import sys;s = 'abcde12345@#@$#%$'*1000 ; mapper = dict.fromkeys(i for i in range(sys.maxunicode) if chr(i) in '@#$'); " "s.translate(mapper)"
The Python 3.4 program takes 1.3ms whereas the same program in Python 3.5 takes only 26.4μs.
What has improved in Python 3.5 that makes it faster compared to Python 3.4?
Josh Rosenberg found that the str.translate() function is very slow compared to bytes.translate(); he raised an issue, stating that:
In Python 3, str.translate() is usually a performance pessimization, not optimization.
Why was str.translate() slow?
The main reason str.translate() was very slow is that the lookup used to be done in a Python dictionary. The use of maketrans made this problem worse. The similar approach using bytes builds a C array of 256 items for fast table lookup. Hence the use of a higher-level Python dict made str.translate() in Python 3.4 very slow.
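For context, a rough sketch of the two kinds of mapping involved (these are assumed equivalents for illustration, not code from the bug report):

s = 'abcde12345@#@$#%$'

# str path: the mapping is a Python dict keyed by code points
str_table = str.maketrans('', '', '@#$')   # {64: None, 35: None, 36: None}
print(s.translate(str_table))              # abcde12345%

# bytes path: the mapping is a fixed 256-entry table, so each lookup is a cheap C array index
b = s.encode('ascii')
print(b.translate(None, b'@#$'))           # b'abcde12345%'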
What happened now?
The first approach was to add a small patch, translate_writer; however, the speed increase was not that pleasing. Soon another patch, fast_translate, was tested, and it yielded very nice results of up to 55% speedup.
The main change as can be seen from the file is that the Python dictionary lookup is changed into a C level lookup.
The speeds now are almost the same as bytes
                                 unpatched             patched
str.translate                    4.55125927699919      0.7898181750006188
str.translate from bytes trans   1.8910855210015143    0.779950579000797
A small note here is that the performance enhancement is only prominent in ASCII strings.
As J.F. Sebastian mentions in a comment below, before 3.5, translate used to work in the same way for both the ASCII and non-ASCII cases. However, from 3.5 the ASCII case is much faster.
Earlier, ASCII vs non-ASCII used to be almost the same, but now we can see a great change in performance.
It can be an improvement from 71.6μs to 2.33μs as seen in this answer.
The following code demonstrates this
python3.5 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
100000 loops, best of 3: 2.3 usec per loop
python3.5 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 117 usec per loop
python3 -m timeit -s "text = 'm\U0001F602ssissippi'*100; d={'\U0001F602': 'i'}" "text.translate(d)"
10000 loops, best of 3: 91.2 usec per loop
python3 -m timeit -s "text = 'mJssissippi'*100; d=dict(J='i')" "text.translate(d)"
10000 loops, best of 3: 101 usec per loop
I’m currently playing with Hudson, but it is very Java-centric (although with this guide, I found it easier to set up than BuildBot, and it produced more info).
Basically: is there any Continuous Integration systems aimed at python, that produce lots of shiny graphs and the likes?
Update: Since this time the Jenkins project has replaced Hudson as the community version of the package. The original authors have moved to this project as well. Jenkins is now a standard package on Ubuntu/Debian, RedHat/Fedora/CentOS, and others. The following update is still essentially correct. The starting point to do this with Jenkins is different.
Update: After trying a few alternatives, I think I’ll stick with Hudson. Integrity was nice and simple, but quite limited. I think Buildbot is better suited to having numerous build-slaves, rather than everything running on a single machine like I was using it.
Setting Hudson up for a Python project was pretty simple:
Violations to parse the PyLint output (you can set up warning thresholds and graph the number of violations over each build)
Cobertura can parse the coverage.py output. nosetests can gather coverage while running your tests, using nosetests --with-coverage (this writes the output to **/coverage.xml)
Don’t know if it would do : Bitten is made by the guys who write Trac and is integrated with Trac. Apache Gump is the CI tool used by Apache. It is written in Python.
We’ve had great success with TeamCity as our CI server, using nose as our test runner. The TeamCity plugin for nosetests gives you a pass/fail count and a readable display of failed tests (which can be e-mailed). You can even see details of the test failures while your stack is running.
It of course supports things like running on multiple machines, and it’s much simpler to set up and maintain than buildbot.
I guess this thread is quite old but here is my take on it with hudson:
I decided to go with pip and set up a repo (the painful-to-get-working but nice-looking eggbasket), which hudson auto-uploads to on successful tests. Here is my rough and ready script for use with a hudson config execute script like: /var/lib/hudson/venv/main/bin/hudson_script.py -w $WORKSPACE -p my.package -v $BUILD_NUMBER, and just put **/coverage.xml, pylint.txt and nosetests.xml in the config bits:
#!/var/lib/hudson/venv/main/bin/python
import os
import re
import subprocess
import logging
import optparse

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s %(levelname)s %(message)s')

#venvDir = "/var/lib/hudson/venv/main/bin/"

UPLOAD_REPO = "http://ldndev01:3442"

def call_command(command, cwd, ignore_error_code=False):
    try:
        logging.info("Running: %s" % command)
        status = subprocess.call(command, cwd=cwd, shell=True)
        if not ignore_error_code and status != 0:
            raise Exception("Last command failed")
        return status
    except:
        logging.exception("Could not run command %s" % command)
        raise

def main():
    usage = "usage: %prog [options]"
    parser = optparse.OptionParser(usage)
    parser.add_option("-w", "--workspace", dest="workspace",
                      help="workspace folder for the job")
    parser.add_option("-p", "--package", dest="package",
                      help="the package name i.e., back_office.reconciler")
    parser.add_option("-v", "--build_number", dest="build_number",
                      help="the build number, which will get put at the end of the package version")
    options, args = parser.parse_args()

    if not options.workspace or not options.package:
        raise Exception("Need both args, do --help for info")

    venvDir = options.package + "_venv/"

    #find out if venv is there
    if not os.path.exists(venvDir):
        #make it
        call_command("virtualenv %s --no-site-packages" % venvDir,
                     options.workspace)

    #install the venv/make sure its there plus install the local package
    call_command("%sbin/pip install -e ./ --extra-index %s" % (venvDir, UPLOAD_REPO),
                 options.workspace)

    #make sure pylint, nose and coverage are installed
    call_command("%sbin/pip install nose pylint coverage epydoc" % venvDir,
                 options.workspace)

    #make sure we have an __init__.py
    #this shouldn't be needed if the packages are set up correctly
    #modules = options.package.split(".")
    #if len(modules) > 1:
    #    call_command("touch '%s/__init__.py'" % modules[0],
    #                 options.workspace)

    #do the nosetests
    test_status = call_command("%sbin/nosetests %s --with-xunit --with-coverage --cover-package %s --cover-erase"
                               % (venvDir, options.package.replace(".", "/"), options.package),
                               options.workspace, True)

    #produce coverage report -i for ignore weird missing file errors
    call_command("%sbin/coverage xml -i" % venvDir,
                 options.workspace)

    #move it so that the code coverage plugin can find it
    call_command("mv coverage.xml %s" % (options.package.replace(".", "/")),
                 options.workspace)

    #run pylint
    call_command("%sbin/pylint --rcfile ~/pylint.rc -f parseable %s > pylint.txt"
                 % (venvDir, options.package),
                 options.workspace, True)

    #remove old dists so we only have the newest at the end
    call_command("rm -rfv %s" % (options.workspace + "/dist"),
                 options.workspace)

    #if the build passes upload the result to the egg_basket
    if test_status == 0:
        logging.info("Success - uploading egg")
        upload_bit = "upload -r %s/upload" % UPLOAD_REPO
    else:
        logging.info("Failure - not uploading egg")
        upload_bit = ""

    #create egg
    call_command("%sbin/python setup.py egg_info --tag-build=.0.%s --tag-svn-revision --tag-date sdist %s"
                 % (venvDir, options.build_number, upload_bit),
                 options.workspace)

    call_command("%sbin/epydoc --html --graph all %s" % (venvDir, options.package),
                 options.workspace)

    logging.info("Complete")

if __name__ == "__main__":
    main()
When it comes to deploying stuff, you can do something similar. This assumes you have a repo structure per package with a setup.py and dependencies all set up; then you can just check out the trunk and run this on it.
I hope this helps someone out.
Update: I’ve added epydoc, which fits in really nicely with hudson. Just add javadoc to your config with the html folder.
Note that pip doesn’t support the -E flag properly these days, so you have to create your venv separately
If you’re considering hosted CI solution, and doing open source, you should look into Travis CI as well – it has very nice integration with GitHub. While it started as a Ruby tool, they have added Python support a while ago.
Continuum’s Binstar is now able to trigger builds from GitHub and can compile for Linux, OS X and Windows (32/64-bit). The neat thing is that it really allows you to closely couple distribution and continuous integration. That’s crossing the t’s and dotting the i’s of integration. The site, workflow and tools are really polished, and AFAIK conda is the most robust and Pythonic way of distributing complex Python modules, where you need to wrap and distribute C/C++/Fortran libraries.
We have used bitten quite a bit. It is pretty and integrates well with Trac, but it is a pain in the butt to customize if you have any nonstandard workflow. Also there just aren’t as many plugins as there are for the more popular tools. Currently we are evaluating Hudson as a replacement.
Check rultor.com. As this article explains, it uses Docker for every build. Thanks to that, you can configure whatever you like inside your Docker image, including Python.
Little disclaimer: I’ve actually had to build a solution like this for a client that wanted a way to automatically test and deploy any code on a git push, plus manage the issue tickets via git notes. This also led to my work on the AIMS project.
One could easily just setup a bare node system that has a build user and manage their build through make(1), expect(1), crontab(1)/systemd.unit(5), and incrontab(1). One could even go a step further and use ansible and celery for distributed builds with a gridfs/nfs file store.
Although I would not expect anyone other than a greybeard UNIX guy or a principal-level engineer/architect to actually go this far. It just makes for a nice idea and a potential learning experience, since a build server is nothing more than a way to arbitrarily execute scripted tasks in an automated fashion.
Can you suggest a module function from numpy/scipy that can find local maxima/minima in a 1D numpy array? Obviously the simplest approach ever is to have a look at the nearest neighbours, but I would like to have an accepted solution that is part of the numpy distro.
You could also smooth your array before this step using numpy.convolve().
I don’t think there is a dedicated function for this.
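A hedged sketch of that idea, using the naive nearest-neighbour check after a little smoothing (this is my illustration, not an accepted numpy function):

import numpy as np

a = np.array([1.0, 3.0, 9.0, 5.0, 2.0, 5.0, 6.0, 9.0, 7.0])

# optional smoothing with a small moving-average kernel
kernel = np.ones(3) / 3.0
smoothed = np.convolve(a, kernel, mode='same')

# naive nearest-neighbour test for interior local maxima
maxima = [i for i in range(1, len(smoothed) - 1)
          if smoothed[i - 1] < smoothed[i] > smoothed[i + 1]]
print(maxima)   # [2, 7] for this data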
Answer 1
In SciPy >= 0.11:
import numpy as np
from scipy.signal import argrelextrema

x = np.random.random(12)

# for local maxima
argrelextrema(x, np.greater)

# for local minima
argrelextrema(x, np.less)
Produces:
>>> x
array([ 0.56660112,  0.76309473,  0.69597908,  0.38260156,  0.24346445,
        0.56021785,  0.24109326,  0.41884061,  0.35461957,  0.54398472,
        0.59572658,  0.92377974])
>>> argrelextrema(x, np.greater)
(array([1, 5, 7]),)
>>> argrelextrema(x, np.less)
(array([4, 6, 8]),)
Note, these are the indices of x that are local max/min. To get the values, try:
>>> x[argrelextrema(x, np.greater)[0]]
scipy.signal also provides argrelmax and argrelmin for finding maxima and minima respectively.
Answer 2
For curves with not too much noise, I recommend the following small code snippet:
from numpy import *

# example data with some peaks:
x = linspace(0, 4, 1000)
data = .2*sin(10*x) + exp(-abs(2-x)**2)

# that's the line you need:
a = diff(sign(diff(data))).nonzero()[0] + 1          # local min + max
b = (diff(sign(diff(data))) > 0).nonzero()[0] + 1    # local min
c = (diff(sign(diff(data))) < 0).nonzero()[0] + 1    # local max

# graphical output...
from pylab import *
plot(x, data)
plot(x[b], data[b], "o", label="min")
plot(x[c], data[c], "o", label="max")
legend()
show()
The +1 is important, because diff reduces the original index number.
Another approach (more words, less code) that may help:
The locations of local maxima and minima are also the locations of the zero crossings of the first derivative. It is generally much easier to find zero crossings than it is to directly find local maxima and minima.
Unfortunately, the first derivative tends to “amplify” noise, so when significant noise is present in the original data, the first derivative is best used only after the original data has had some degree of smoothing applied.
Since smoothing is, in the simplest sense, a low pass filter, the smoothing is often best (well, most easily) done by using a convolution kernel, and “shaping” that kernel can provide a surprising amount of feature-preserving/enhancing capability. The process of finding an optimal kernel can be automated using a variety of means, but the best may be simple brute force (plenty fast for finding small kernels). A good kernel will (as intended) massively distort the original data, but it will NOT affect the location of the peaks/valleys of interest.
Fortunately, quite often a suitable kernel can be created via a simple SWAG (“educated guess”). The width of the smoothing kernel should be a little wider than the widest expected “interesting” peak in the original data, and its shape will resemble that peak (a single-scaled wavelet). For mean-preserving kernels (what any good smoothing filter should be) the sum of the kernel elements should be precisely equal to 1.00, and the kernel should be symmetric about its center (meaning it will have an odd number of elements).
Given an optimal smoothing kernel (or a small number of kernels optimized for different data content), the degree of smoothing becomes a scaling factor for (the “gain” of) the convolution kernel.
Determining the “correct” (optimal) degree of smoothing (convolution kernel gain) can even be automated: compare the standard deviation of the first-derivative data with the standard deviation of the smoothed data. How the ratio of the two standard deviations changes with changes in the degree of smoothing can be used to predict effective smoothing values. A few manual data runs (that are truly representative) should be all that’s needed.
All the prior solutions posted above compute the first derivative, but they don’t treat it as a statistical measure, nor do they attempt to perform feature-preserving/enhancing smoothing (to help subtle peaks “leap above” the noise).
Finally, the bad news: Finding “real” peaks becomes a royal pain when the noise also has features that look like real peaks (overlapping bandwidth). The next more-complex solution is generally to use a longer convolution kernel (a “wider kernel aperture”) that takes into account the relationship between adjacent “real” peaks (such as minimum or maximum rates for peak occurrence), or to use multiple convolution passes using kernels having different widths (but only if it is faster: it is a fundamental mathematical truth that linear convolutions performed in sequence can always be convolved together into a single convolution). But it is often far easier to first find a sequence of useful kernels (of varying widths) and convolve them together than it is to directly find the final kernel in a single step.
Hopefully this provides enough info to let Google (and perhaps a good stats text) fill in the gaps. I really wish I had the time to provide a worked example, or a link to one. If anyone comes across one online, please post it here!
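In lieu of a worked example, here is a rough sketch of the pipeline described above under simple assumptions (a hand-rolled symmetric, mean-preserving kernel; downward zero crossings of the first derivative as the peak test):

import numpy as np

# noisy example signal (purely illustrative)
x = np.linspace(0, 4, 1000)
data = 0.2 * np.sin(10 * x) + np.exp(-abs(2 - x) ** 2) + 0.05 * np.random.randn(x.size)

# mean-preserving, symmetric smoothing kernel (odd length, elements sum to 1.0)
kernel = np.array([1.0, 4.0, 6.0, 4.0, 1.0])
kernel /= kernel.sum()
smoothed = np.convolve(data, kernel, mode='same')

# local maxima of the smoothed data = downward zero crossings of its first derivative
d = np.diff(smoothed)
maxima = np.nonzero((d[:-1] > 0) & (d[1:] < 0))[0] + 1
print(maxima)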
As of SciPy version 1.1, you can also use find_peaks. Below are two examples taken from the documentation itself.
Using the height argument, one can select all maxima above a certain threshold (in this example, all non-negative maxima; this can be very useful if one has to deal with a noisy baseline; if you want to find minima, just multiply your input by -1):
import matplotlib.pyplot as plt
from scipy.misc import electrocardiogram
from scipy.signal import find_peaks
import numpy as np
x = electrocardiogram()[2000:4000]
peaks, _ = find_peaks(x, height=0)
plt.plot(x)
plt.plot(peaks, x[peaks], "x")
plt.plot(np.zeros_like(x), "--", color="gray")
plt.show()
Another extremely helpful argument is distance, which defines the minimum distance between two peaks:
peaks, _ = find_peaks(x, distance=150)
# difference between peaks is >= 150
print(np.diff(peaks))
# prints [186 180 177 171 177 169 167 164 158 162 172]
plt.plot(x)
plt.plot(peaks, x[peaks], "x")
plt.show()
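And, as mentioned above, minima can be found by negating the input before calling find_peaks (a tiny sketch continuing the snippet above):

minima, _ = find_peaks(-x, distance=150)   # minima of x are maxima of -x
plt.plot(x)
plt.plot(minima, x[minima], "x")
plt.show()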
from scipy import signal
import numpy as np
#generate junk data (numpy 1D arr)
xs = np.arange(0, np.pi, 0.05)
data = np.sin(xs)
# maxima : use builtin function to find (max) peaks
max_peakind = signal.find_peaks_cwt(data, np.arange(1,10))
# inverse (in order to find minima)
inv_data = 1/data
# minima : use builtin function to find (min) peaks (use inverted data)
min_peakind = signal.find_peaks_cwt(inv_data, np.arange(1,10))
#show results
print "maxima", data[max_peakind]
print "minima", data[min_peakind]
Update:
I wasn’t happy with gradient so I found it more reliable to use numpy.diff. Please let me know if it does what you want.
Regarding the issue of noise: the mathematical problem is to locate maxima/minima; if we want to deal with noise, we can use something like convolve, which was mentioned earlier.
import numpy as np
from matplotlib import pyplot

a = np.array([10.3, 2, 0.9, 4, 5, 6, 7, 34, 2, 5, 25, 3, -26, -20, -29], dtype=np.float)

gradients = np.diff(a)
print gradients

maxima_num = 0
minima_num = 0
max_locations = []
min_locations = []
count = 0
for i in gradients[:-1]:
    count += 1
    if ((cmp(i, 0) > 0) & (cmp(gradients[count], 0) < 0) & (i != gradients[count])):
        maxima_num += 1
        max_locations.append(count)
    if ((cmp(i, 0) < 0) & (cmp(gradients[count], 0) > 0) & (i != gradients[count])):
        minima_num += 1
        min_locations.append(count)

turning_points = {'maxima_number': maxima_num, 'minima_number': minima_num,
                  'maxima_locations': max_locations, 'minima_locations': min_locations}

print turning_points

pyplot.plot(a)
pyplot.show()
Answer 7
While this question is really old, I believe there is a much simpler approach in numpy (a one-liner).
import numpy as np
list = [1,3,9,5,2,5,6,9,7]
np.diff(np.sign(np.diff(list))) #the one liner
#output
array([ 0, -2, 0, 2, 0, 0, -2])
To find a local max or min we essentially want to find when the difference between the values in the list (3-1, 9-3…) changes from positive to negative (max) or negative to positive (min). Therefore, first we find the difference. Then we find the sign, and then we find the changes in sign by taking the difference again. (Sort of like a first and second derivative in calculus, only we have discrete data and don’t have a continuous function.)
The output in my example does not contain the extrema (the first and last values in the list). Also, just like in calculus, if the second derivative is negative you have a max, and if it is positive you have a min.
Thus we have the following matchup:
[ 1,   3,   9,   5,   2,   5,   6,   9,   7]    # original list
[      0,  -2,   0,   2,   0,   0,  -2    ]     # np.diff(np.sign(np.diff(list)))
           Max       Min            Max
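If you want indices into the original list rather than the -2/+2 markers, you can combine the one-liner with np.where and the +1 offset mentioned in the earlier answer (my addition, not part of the original answer):

arr = np.array([1, 3, 9, 5, 2, 5, 6, 9, 7])
second = np.diff(np.sign(np.diff(arr)))
maxima = np.where(second < 0)[0] + 1   # array([2, 7]) -> the two 9s
minima = np.where(second > 0)[0] + 1   # array([4])    -> the 2
print(maxima, minima)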
Answer 8
None of these solutions worked for me since I wanted to find peaks in the center of repeating values as well. For example, in
ar = np.array([0,1,2,2,2,1,3,3,3,2,5,0])
the answer should be
array([ 3, 7, 10], dtype=int64)
I did this using a loop. I know it’s not super clean, but it gets the job done.
def findLocalMaxima(ar):
    # find local maxima of array, including centers of repeating elements
    maxInd = np.zeros_like(ar)
    peakVar = -np.inf
    i = -1
    while i < len(ar)-1:
    #for i in range(len(ar)):
        i += 1
        if peakVar < ar[i]:
            peakVar = ar[i]
            for j in range(i, len(ar)):
                if peakVar < ar[j]:
                    break
                elif peakVar == ar[j]:
                    continue
                elif peakVar > ar[j]:
                    peakInd = i + np.floor(abs(i-j)/2)
                    maxInd[peakInd.astype(int)] = 1
                    i = j
                    break
        peakVar = ar[i]
    maxInd = np.where(maxInd)[0]
    return maxInd
Answer 9
import numpy as np

x = np.array([6, 3, 5, 2, 1, 4, 9, 7, 8])
y = np.array([2, 1, 3, 5, 3, 9, 8, 10, 7])

sortId = np.argsort(x)
x = x[sortId]
y = y[sortId]

minm = np.array([])
maxm = np.array([])
length = len(y)  # note: length was never defined in the original snippet
i = 0
while i < length-1:
    if i < length - 1:
        while i < length-1 and y[i+1] >= y[i]:
            i += 1
        if i != 0 and i < length-1:
            maxm = np.append(maxm, i)
        i += 1
    if i < length - 1:
        while i < length-1 and y[i+1] <= y[i]:
            i += 1
        if i < length-1:
            minm = np.append(minm, i)
        i += 1

print minm
print maxm
minm and maxm contain indices of minima and maxima, respectively. For a huge data set it will give lots of maxima/minima, so in that case smooth the curve first and then apply this algorithm.
Answer 10
Another solution using the dilation operator:
import numpy as np
from scipy.ndimage import rank_filter

def find_local_maxima(x):
    x_dilate = rank_filter(x, -1, size=3)
    return x_dilate == x
And for minima:
def find_local_minima(x):
    x_erode = rank_filter(x, 0, size=3)
    return x_erode == x
Also, from scipy.ndimage you can replace rank_filter(x, -1, size=3) with grey_dilation and rank_filter(x, 0, size=3) with grey_erosion. This won’t require a local sort, so it is slightly faster.
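In other words, a small sketch of that substitution (assuming the same 3-element window):

from scipy.ndimage import grey_dilation, grey_erosion

def find_local_maxima(x):
    return grey_dilation(x, size=3) == x

def find_local_minima(x):
    return grey_erosion(x, size=3) == x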
Answer 11
Another one:
def local_maxima_mask(vec):
    """
    Get a mask of all points in vec which are local maxima
    :param vec: A real-valued vector
    :return: A boolean mask of the same size where True elements correspond to maxima.
    """
    mask = np.zeros(vec.shape, dtype=bool)
    greater_than_the_last = np.diff(vec) > 0  # N-1
    mask[1:] = greater_than_the_last
    mask[:-1] &= ~greater_than_the_last
    return mask
There is a new option as well: get it via pip! There is a package pypiwin32 with wheels available, so you can just install it with: pip install pypiwin32!
Edit: Per comment from @movermeyer, the main project now publishes wheels at pywin32, and so can be installed with pip install pywin32
I’ve found that UC Irvine has a great collection of python modules, pywin32 (win32api) being one of many listed there. I’m not sure how they do with keeping up with the latest versions of these modules but it hasn’t let me down yet.