Tag Archives: error-handling

How to return 0 with divide by zero

Question: How to return 0 with divide by zero


I'm trying to perform an element-wise divide in Python, but if a zero is encountered, I need the quotient to just be zero.

For example:

array1 = np.array([0, 1, 2])
array2 = np.array([0, 1, 1])

array1 / array2 # should be np.array([0, 1, 2])

I could always just use a for-loop through my data, but to really utilize numpy’s optimizations, I need the divide function to return 0 upon divide by zero errors instead of ignoring the error.

Unless I’m missing something, it doesn’t seem numpy.seterr() can return values upon errors. Does anyone have any other suggestions on how I could get the best out of numpy while setting my own divide by zero error handling?


Answer 0


In numpy v1.7+, you can take advantage of the “where” option for ufuncs. You can do things in one line and you don’t have to deal with the errstate context manager.

>>> a = np.array([-1, 0, 1, 2, 3], dtype=float)
>>> b = np.array([ 0, 0, 0, 2, 2], dtype=float)

# If you don't pass `out` the indices where (b == 0) will be uninitialized!
>>> c = np.divide(a, b, out=np.zeros_like(a), where=b!=0)
>>> print(c)
[ 0.   0.   0.   1.   1.5]

In this case, it does the divide calculation anywhere ‘where’ b does not equal zero. When b does equal zero, then it remains unchanged from whatever value you originally gave it in the ‘out’ argument.
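
A quick check against the arrays from the original question (cast to float here so the zero-filled `out` buffer has a floating dtype) might look like this:

import numpy as np

array1 = np.array([0, 1, 2], dtype=float)
array2 = np.array([0, 1, 1], dtype=float)

# Divide only where the denominator is non-zero; positions where
# array2 == 0 keep the value already present in `out` (zero here).
result = np.divide(array1, array2, out=np.zeros_like(array1), where=array2 != 0)
print(result)  # [0. 1. 2.]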


Answer 1


Building on @Franck Dernoncourt’s answer, fixing -1 / 0:

def div0( a, b ):
    """ ignore / 0, div0( [-1, 0, 1], 0 ) -> [0, 0, 0] """
    with np.errstate(divide='ignore', invalid='ignore'):
        c = np.true_divide( a, b )
        c[ ~ np.isfinite( c )] = 0  # -inf inf NaN
    return c

div0( [-1, 0, 1], 0 )
array([0, 0, 0])

Answer 2


Building on the other answers, and improving on them:

Code:

import numpy as np

a = np.array([0,0,1,1,2], dtype='float')
b = np.array([0,1,0,1,3], dtype='float')

with np.errstate(divide='ignore', invalid='ignore'):
    c = np.true_divide(a,b)
    c[c == np.inf] = 0
    c = np.nan_to_num(c)

print('c: {0}'.format(c))

Output:

c: [ 0.          0.          0.          1.          0.66666667]

Answer 3


One-liner (throws warning)

np.nan_to_num(array1 / array2)

Answer 4


Try doing it in two steps. Division first, then replace.

with numpy.errstate(divide='ignore'):
    result = numerator / denominator
    result[denominator == 0] = 0

The numpy.errstate line is optional, and just prevents numpy from telling you about the “error” of dividing by zero, since you’re already intending to do so, and handling that case.


Answer 5


You can also replace based on inf, only if the array dtypes are floats, as per this answer:

>>> a = np.array([1,2,3], dtype='float')
>>> b = np.array([0,1,3], dtype='float')
>>> c = a / b
>>> c
array([ inf,   2.,   1.])
>>> c[c == np.inf] = 0
>>> c
array([ 0.,  2.,  1.])

Answer 6


One answer I found searching a related question was to manipulate the output based upon whether the denominator was zero or not.

Suppose arrayA and arrayB have been initialized, but arrayB has some zeros. We could do the following if we want to compute arrayC = arrayA / arrayB safely.

In this case, whenever I have a divide by zero in one of the cells, I set the cell to be equal to myOwnValue, which in this case would be zero.

myOwnValue = 0
arrayC = np.zeros(arrayA.shape)  # .shape is an attribute, not a method
indNonZeros = np.where(arrayB != 0)
indZeros = np.where(arrayB == 0)  # comparison, not assignment

# division in two steps: first with nonzero cells, and then zero cells
arrayC[indNonZeros] = arrayA[indNonZeros] / arrayB[indNonZeros]
arrayC[indZeros] = myOwnValue # Look at footnote

Footnote: In retrospect, this line is unnecessary anyway, since arrayC[i] is instantiated to zero. But if it were the case that myOwnValue != 0, this operation would do something.


Answer 7


Another solution worth mentioning:

>>> a = np.array([1,2,3], dtype='float')
>>> b = np.array([0,1,3], dtype='float')
>>> b_inv = np.array([1/i if i!=0 else 0 for i in b])
>>> a*b_inv
array([0., 2., 1.])

Log an exception with traceback

Question: Log an exception with traceback


How can I log my Python errors?

try:
    do_something()
except:
    # How can I log my exception here, complete with its traceback?

Answer 0


Use logging.exception from within the except: handler/block to log the current exception along with the trace information, prepended with a message.

import logging
LOG_FILENAME = '/tmp/logging_example.out'
logging.basicConfig(filename=LOG_FILENAME, level=logging.DEBUG)

logging.debug('This message should go to the log file')

try:
    run_my_stuff()
except:
    logging.exception('Got exception on main handler')
    raise

Now looking at the log file, /tmp/logging_example.out:

DEBUG:root:This message should go to the log file
ERROR:root:Got exception on main handler
Traceback (most recent call last):
  File "/tmp/teste.py", line 9, in <module>
    run_my_stuff()
NameError: name 'run_my_stuff' is not defined

Answer 1


Using the exc_info option may be better, since it lets you keep your own warning or error heading:

try:
    # code in here
except Exception as e:
    logging.error(e, exc_info=True)

Answer 2


My job recently tasked me with logging all the tracebacks/exceptions from our application. I tried numerous techniques that others had posted online such as the one above but settled on a different approach. Overriding traceback.print_exception.

I have a write-up at http://www.bbarrows.com/ that would be much easier to read, but I'll paste it in here as well.

When tasked with logging all the exceptions that our software might encounter in the wild I tried a number of different techniques to log our python exception tracebacks. At first I thought that the python system exception hook, sys.excepthook would be the perfect place to insert the logging code. I was trying something similar to:

import traceback
import StringIO
import logging
import os, sys

logger = logging.getLogger(__name__)  # a logger must exist for the default argument below

def my_excepthook(excType, excValue, excTraceback, logger=logger):
    logger.error("Logging an uncaught exception",
                 exc_info=(excType, excValue, excTraceback))

sys.excepthook = my_excepthook

This worked for the main thread but I soon found that my sys.excepthook would not exist across any new threads my process started. This is a huge issue because most everything happens in threads in this project.

After googling and reading plenty of documentation the most helpful information I found was from the Python Issue tracker.

The first post on the thread shows a working example of the sys.excepthook NOT persisting across threads (as shown below). Apparently this is expected behavior.

import sys, threading

def log_exception(*args):
    print 'got exception %s' % (args,)
sys.excepthook = log_exception

def foo():
    a = 1 / 0

threading.Thread(target=foo).start()

The messages on this Python Issue thread really result in 2 suggested hacks. Either subclass Thread and wrap the run method in our own try except block in order to catch and log exceptions or monkey patch threading.Thread.run to run in your own try except block and log the exceptions.

The first method of subclassing Thread seems to me to be less elegant in your code as you would have to import and use your custom Thread class EVERYWHERE you wanted to have a logging thread. This ended up being a hassle because I had to search our entire code base and replace all normal Threads with this custom Thread. However, it was clear as to what this Thread was doing and would be easier for someone to diagnose and debug if something went wrong with the custom logging code. A custom logging thread might look like this:

class TracebackLoggingThread(threading.Thread):
    def run(self):
        try:
            super(TracebackLoggingThread, self).run()
        except (KeyboardInterrupt, SystemExit):
            raise
        except Exception, e:
            logger = logging.getLogger('')
            logger.exception("Logging an uncaught exception")

The second method of monkey patching threading.Thread.run is nice because I could just run it once right after __main__ and instrument my logging code in all exceptions. Monkey patching can be annoying to debug though as it changes the expected functionality of something. The suggested patch from the Python Issue tracker was:

def installThreadExcepthook():
    """
    Workaround for sys.excepthook thread bug
    From
http://spyced.blogspot.com/2007/06/workaround-for-sysexcepthook-bug.html

(https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1230540&group_id=5470).
    Call once from __main__ before creating any threads.
    If using psyco, call psyco.cannotcompile(threading.Thread.run)
    since this replaces a new-style class method.
    """
    init_old = threading.Thread.__init__
    def init(self, *args, **kwargs):
        init_old(self, *args, **kwargs)
        run_old = self.run
        def run_with_except_hook(*args, **kw):
            try:
                run_old(*args, **kw)
            except (KeyboardInterrupt, SystemExit):
                raise
            except:
                sys.excepthook(*sys.exc_info())
        self.run = run_with_except_hook
    threading.Thread.__init__ = init

It was not until I started testing my exception logging that I realized I was going about it all wrong.

To test I had placed a

raise Exception("Test")

somewhere in my code. However, wrapping the method that called this method was a try/except block that printed out the traceback and swallowed the exception. This was very frustrating because I saw the traceback being printed to STDOUT but not being logged. I then decided that a much easier method of logging the tracebacks was just to monkey patch the method that all Python code uses to print the tracebacks themselves, traceback.print_exception. I ended up with something similar to the following:

def add_custom_print_exception():
    old_print_exception = traceback.print_exception
    def custom_print_exception(etype, value, tb, limit=None, file=None):
        tb_output = StringIO.StringIO()
        traceback.print_tb(tb, limit, tb_output)
        logger = logging.getLogger('customLogger')
        logger.error(tb_output.getvalue())
        tb_output.close()
        old_print_exception(etype, value, tb, limit, file)  # forward the actual arguments
    traceback.print_exception = custom_print_exception

This code writes the traceback to a string buffer and logs it at the ERROR level. I have a custom logging handler set up on the 'customLogger' logger, which takes the ERROR-level logs and sends them home for analysis.


Answer 3


You can log all uncaught exceptions on the main thread by assigning a handler to sys.excepthook, perhaps using the exc_info parameter of Python’s logging functions:

import sys
import logging

logging.basicConfig(filename='/tmp/foobar.log')

def exception_hook(exc_type, exc_value, exc_traceback):
    logging.error(
        "Uncaught exception",
        exc_info=(exc_type, exc_value, exc_traceback)
    )

sys.excepthook = exception_hook

raise Exception('Boom')

If your program uses threads, however, then note that threads created using threading.Thread will not trigger sys.excepthook when an uncaught exception occurs inside them, as noted in Issue 1230540 on Python’s issue tracker. Some hacks have been suggested there to work around this limitation, like monkey-patching Thread.__init__ to overwrite self.run with an alternative run method that wraps the original in a try block and calls sys.excepthook from inside the except block. Alternatively, you could just manually wrap the entry point for each of your threads in try/except yourself.
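
As a rough sketch of that last suggestion (hand-written for illustration, not taken from the issue tracker), wrapping a thread's entry point yourself could look like this:

import logging
import threading

logging.basicConfig(filename='/tmp/foobar.log')

def worker():
    raise ValueError('Boom in a thread')  # stand-in for your real thread body

def logged(target):
    # Wrap a thread target so uncaught exceptions get logged instead of lost.
    def wrapper(*args, **kwargs):
        try:
            return target(*args, **kwargs)
        except Exception:
            logging.exception('Uncaught exception in thread %s',
                              threading.current_thread().name)
            raise
    return wrapper

t = threading.Thread(target=logged(worker))
t.start()
t.join()

On Python 3.8 and later there is also threading.excepthook, which can be assigned in the same spirit as sys.excepthook to handle uncaught exceptions in threads.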


Answer 4


Uncaught exception messages go to STDERR, so instead of implementing your logging in Python itself you could send STDERR to a file using whatever shell you’re using to run your Python script. In a Bash script, you can do this with output redirection, as described in the BASH guide.

Examples

Append errors to file, other output to the terminal:

./test.py 2>> mylog.log

Overwrite file with interleaved STDOUT and STDERR output:

./test.py &> mylog.log

Answer 5


What I was looking for:

import sys
import traceback

exc_type, exc_value, exc_traceback = sys.exc_info()
traceback_in_var = traceback.format_tb(exc_traceback)

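For context, sys.exc_info() only returns a live exception while one is being handled, so this snippet is meant to run inside an except block; a minimal sketch:

import sys
import traceback
import logging

try:
    1 / 0
except ZeroDivisionError:
    exc_type, exc_value, exc_traceback = sys.exc_info()
    traceback_in_var = traceback.format_tb(exc_traceback)  # list of formatted frame strings
    logging.error('Caught %s: %s\n%s', exc_type.__name__, exc_value,
                  ''.join(traceback_in_var))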


Answer 6


You can get the traceback using a logger, at any level (DEBUG, INFO, …). Note that using logging.exception, the level is ERROR.

# test_app.py
import sys
import logging

logging.basicConfig(level="DEBUG")

def do_something():
    raise ValueError(":(")

try:
    do_something()
except Exception:
    logging.debug("Something went wrong", exc_info=sys.exc_info())
DEBUG:root:Something went wrong
Traceback (most recent call last):
  File "test_app.py", line 10, in <module>
    do_something()
  File "test_app.py", line 7, in do_something
    raise ValueError(":(")
ValueError: :(

EDIT:

This works too (using Python 3.6):

logging.debug("Something went wrong", exc_info=True)

Answer 7


Here is a version that uses sys.excepthook

import logging
import traceback
import sys

logger = logging.getLogger()

def handle_excepthook(type, message, stack):
    logger.error(f'An unhandled exception occurred: {message}. Traceback: {traceback.format_tb(stack)}')

sys.excepthook = handle_excepthook

Answer 8


maybe not as stylish, but easier:

#!/bin/bash
log="/var/log/yourlog"
/path/to/your/script.py 2>&1 | (while read; do echo "$REPLY" >> $log; done)

Answer 9


Here's a simple example taken from the Python 2.6 documentation:

import logging
LOG_FILENAME = '/tmp/logging_example.out'
logging.basicConfig(filename=LOG_FILENAME,level=logging.DEBUG,)

logging.debug('This message should go to the log file')

"Inner exception" (with traceback) in Python?

Question: "Inner exception" (with traceback) in Python?


My background is in C# and I’ve just recently started programming in Python. When an exception is thrown I typically want to wrap it in another exception that adds more information, while still showing the full stack trace. It’s quite easy in C#, but how do I do it in Python?

Eg. in C# I would do something like this:

try
{
  ProcessFile(filePath);
}
catch (Exception ex)
{
  throw new ApplicationException("Failed to process file " + filePath, ex);
}

In Python I can do something similar:

try:
  ProcessFile(filePath)
except Exception as e:
  raise Exception('Failed to process file ' + filePath, e)

…but this loses the traceback of the inner exception!

Edit: I’d like to see both exception messages and both stack traces and correlate the two. That is, I want to see in the output that exception X occurred here and then exception Y there – same as I would in C#. Is this possible in Python 2.6? Looks like the best I can do so far (based on Glenn Maynard’s answer) is:

try:
  ProcessFile(filePath)
except Exception as e:
  raise Exception('Failed to process file' + filePath, e), None, sys.exc_info()[2]

This includes both the messages and both the tracebacks, but it doesn’t show which exception occurred where in the traceback.


Answer 0


Python 2

It’s simple; pass the traceback as the third argument to raise.

import sys
class MyException(Exception): pass

try:
    raise TypeError("test")
except TypeError, e:
    raise MyException(), None, sys.exc_info()[2]

Always do this when catching one exception and re-raising another.


Answer 1


Python 3

In python 3 you can do the following:

try:
    raise MyExceptionToBeWrapped("I have twisted my ankle")

except MyExceptionToBeWrapped as e:

    raise MyWrapperException("I'm not in a good shape") from e

This will produce something like this:

   Traceback (most recent call last):
   ...
   MyExceptionToBeWrapped: ("I have twisted my ankle")

The above exception was the direct cause of the following exception:

   Traceback (most recent call last):
   ...
   MyWrapperException: ("I'm not in a good shape")

Answer 2


Python 3 has the raise ... from clause to chain exceptions. Glenn's answer is great for Python 2.7, but it only uses the original exception's traceback and throws away the error message and other details. Here are some examples in Python 2.7 that add context information from the current scope into the original exception's error message, but keep other details intact.

Known Exception Type

try:
    sock_common = xmlrpclib.ServerProxy(rpc_url+'/common')
    self.user_id = sock_common.login(self.dbname, username, self.pwd)
except IOError:
    _, ex, traceback = sys.exc_info()
    message = "Connecting to '%s': %s." % (config['connection'],
                                           ex.strerror)
    raise IOError, (ex.errno, message), traceback

That flavour of raise statement takes the exception type as the first expression, the exception class constructor arguments in a tuple as the second expression, and the traceback as the third expression. If you’re running earlier than Python 2.2, see the warnings on sys.exc_info().

Any Exception Type

Here’s another example that’s more general purpose if you don’t know what kind of exceptions your code might have to catch. The downside is that it loses the exception type and just raises a RuntimeError. You have to import the traceback module.

except Exception:
    extype, ex, tb = sys.exc_info()
    formatted = traceback.format_exception_only(extype, ex)[-1]
    message = "Importing row %d, %s" % (rownum, formatted)
    raise RuntimeError, message, tb

Modify the Message

Here’s another option if the exception type will let you add context to it. You can modify the exception’s message and then reraise it.

import subprocess

try:
    final_args = ['lsx', '/home']
    s = subprocess.check_output(final_args)
except OSError as ex:
    ex.strerror += ' for command {}'.format(final_args)
    raise

That generates the following stack trace:

Traceback (most recent call last):
  File "/mnt/data/don/workspace/scratch/scratch.py", line 5, in <module>
    s = subprocess.check_output(final_args)
  File "/usr/lib/python2.7/subprocess.py", line 566, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/usr/lib/python2.7/subprocess.py", line 710, in __init__
    errread, errwrite)
  File "/usr/lib/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory for command ['lsx', '/home']

You can see that it shows the line where check_output() was called, but the exception message now includes the command line.


Answer 3


In Python 3.x:

raise Exception('Failed to process file ' + filePath).with_traceback(e.__traceback__)

or simply

except Exception:
    raise MyException()

which will propagate MyException but print both exceptions if it will not be handled.

In Python 2.x:

raise Exception, 'Failed to process file ' + filePath, e

You can prevent printing both exceptions by killing the __context__ attribute. Here I write a context manager using that to catch and change your exception on the fly (see http://docs.python.org/3.1/library/stdtypes.html for an explanation of how they work):

try: # Wrap the whole program into the block that will kill __context__.

    class Catcher(Exception):
        '''This context manager reraises an exception under a different name.'''

        def __init__(self, name):
            super().__init__('Failed to process code in {!r}'.format(name))

        def __enter__(self):
            return self

        def __exit__(self, exc_type, exc_val, exc_tb):
            if exc_type is not None:
                self.__traceback__ = exc_tb
                raise self

    ...


    with Catcher('class definition'):
        class a:
            def spam(self):
                # not really pass, but you get the idea
                pass

            lut = [1,
                   3,
                   17,
                   [12,34],
                   5,
                   _spam]


        assert a().lut[-1] == a.spam

    ...


except Catcher as e:
    e.__context__ = None
    raise
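
A side note that is not part of the original answer: on Python 3.3+, the implicit context can also be suppressed directly with raise ... from None, without touching __context__ by hand:

try:
    ProcessFile(filePath)  # ProcessFile/filePath as in the question
except Exception:
    # "from None" suppresses the implicit exception context (__context__),
    # so only this exception is printed if it goes unhandled.
    raise Exception('Failed to process file ' + filePath) from None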

Answer 4


I don’t think you can do this in Python 2.x, but something similar to this functionality is part of Python 3. From PEP 3134:

In today's Python implementation, exceptions are composed of three parts: the type, the value, and the traceback. The 'sys' module exposes the current exception in three parallel variables, exc_type, exc_value, and exc_traceback, the sys.exc_info() function returns a tuple of these three parts, and the 'raise' statement has a three-argument form accepting these three parts. Manipulating exceptions often requires passing these three things in parallel, which can be tedious and error-prone. Additionally, the 'except' statement can only provide access to the value, not the traceback. Adding the '__traceback__' attribute to exception values makes all the exception information accessible from a single place.

Comparison to C#:

Exceptions in C# contain a read-only 'InnerException' property that may point to another exception. Its documentation [10] says that "When an exception X is thrown as a direct result of a previous exception Y, the InnerException property of X should contain a reference to Y." This property is not set by the VM automatically; rather, all exception constructors take an optional 'innerException' argument to set it explicitly. The '__cause__' attribute fulfills the same purpose as InnerException, but this PEP proposes a new form of 'raise' rather than extending the constructors of all exceptions. C# also provides a GetBaseException method that jumps directly to the end of the InnerException chain; this PEP proposes no analog.

Note also that Java, Ruby and Perl 5 don’t support this type of thing either. Quoting again:

As for other languages, Java and Ruby both discard the original exception when another exception occurs in a ‘catch’/’rescue’ or ‘finally’/’ensure’ clause. Perl 5 lacks built-in structured exception handling. For Perl 6, RFC number 88 [9] proposes an exception mechanism that implicitly retains chained exceptions in an array named @@.


Answer 5


For maximum compatibility between Python 2 and 3, you can use raise_from in the six library. https://six.readthedocs.io/#six.raise_from . Here is your example (slightly modified for clarity):

import six

try:
  ProcessFile(filePath)
except Exception as e:
  six.raise_from(IOError('Failed to process file ' + repr(filePath)), e)

Answer 6


You could use my CausedException class to chain exceptions in Python 2.x (and even in Python 3 it can be useful in case you want to give more than one caught exception as cause to a newly raised exception). Maybe it can help you.


Answer 7


Maybe you could grab the relevant information and pass it up? I’m thinking something like:

import traceback
import sys
import StringIO

class ApplicationError:
    def __init__(self, value, e):
        s = StringIO.StringIO()
        traceback.print_exc(file=s)
        self.value = (value, s.getvalue())

    def __str__(self):
        return repr(self.value)

try:
    try:
        a = 1/0
    except Exception, e:
        raise ApplicationError("Failed to process file", e)
except Exception, e:
    print e

Answer 8


Assuming:

  • you need a solution, which works for Python 2 (for pure Python 3 see raise ... from solution)
  • just want to enrich the error message, e.g. providing some additional context
  • need the full stack trace

you can use a simple solution from the docs https://docs.python.org/3/tutorial/errors.html#raising-exceptions:

try:
    raise NameError('HiThere')
except NameError:
    print 'An exception flew by!' # print or log, provide details about context
    raise # reraise the original exception, keeping full stack trace

The output:

An exception flew by!
Traceback (most recent call last):
  File "<stdin>", line 2, in ?
NameError: HiThere

It looks like the key piece is the simplified ‘raise’ keyword that stands alone. That will re-raise the Exception in the except block.


Raise a warning in Python without interrupting the program

Question: Raise a warning in Python without interrupting the program


I am trying to raise a Warning in Python without making the program crash / stop / interrupt.

I use the following simple function to check whether the user passed a zero to it. If so, the program should warn them, but continue as per normal. It should work like the code below, but should use class Warning(), Error() or Exception() instead of printing the warning out manually.

def is_zero(i):
   if i != 0:
     print "OK"
   else:
     print "WARNING: the input is 0!"
   return i

If I use the code below and pass 0 to the function, the program crashes and the value is never returned. Instead, I want the program to continue normally and just inform the user that he passed 0 to the function.

def is_zero(i):
   if i != 0:
     print "OK"
   else:
     raise Warning("the input is 0!")
   return i

I want to be able to test that a warning has been thrown, testing it with unittest. If I simply print the message out, I am not able to test it with assertRaises in unittest.


Answer 0


You shouldn't raise the warning; you should use the warnings module. By raising it you're generating an error rather than a warning.


Answer 1


import warnings
warnings.warn("Warning...........Message")

See the Python documentation.


Answer 2


By default, unlike an exception, a warning doesn’t interrupt.

After import warnings, it is possible to specify a warning class when generating a warning. If one is not specified, it is literally UserWarning by default.

>>> warnings.warn('This is a default warning.')
<string>:1: UserWarning: This is a default warning.

To simply use a preexisting class instead, e.g. DeprecationWarning:

>>> warnings.warn('This is a particular warning.', DeprecationWarning)
<string>:1: DeprecationWarning: This is a particular warning.

Creating a custom warning class is similar to creating a custom exception class:

>>> class MyCustomWarning(UserWarning):
...     pass
... 
... warnings.warn('This is my custom warning.', MyCustomWarning)

<string>:1: MyCustomWarning: This is my custom warning.

For testing, consider assertWarns or assertWarnsRegex.
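
Since the question also asks about testing with unittest, here is a small sketch (reusing the is_zero function from the question, rewritten with warnings.warn) showing assertWarns in use:

import unittest
import warnings

def is_zero(i):
    if i != 0:
        print("OK")
    else:
        warnings.warn("the input is 0!")  # warn, but keep going
    return i

class TestIsZero(unittest.TestCase):
    def test_warns_on_zero(self):
        with self.assertWarns(UserWarning):
            self.assertEqual(is_zero(0), 0)  # the value is still returned

if __name__ == '__main__':
    unittest.main()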


As an alternative, especially for standalone applications, consider the logging module. It can log messages having a level of debug, info, warning, error, etc. Log messages having a level of warning or higher are by default printed to stderr.
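
For example, a minimal use of the logging alternative (message text mirrors the question):

import logging

logging.warning("the input is 0!")  # printed to stderr by default; execution continues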


Live output from a subprocess command

Question: Live output from a subprocess command


I'm using a python script as a driver for a hydrodynamics code. When it comes time to run the simulation, I use subprocess.Popen to run the code, collect the output from stdout and stderr into a subprocess.PIPE — then I can print (and save to a log-file) the output information, and check for any errors. The problem is, I have no idea how the code is progressing. If I run it directly from the command line, it gives me output about what iteration it's at, what time, what the next time-step is, etc.

Is there a way to both store the output (for logging and error checking), and also produce a live-streaming output?

The relevant section of my code:

ret_val = subprocess.Popen( run_command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True )
output, errors = ret_val.communicate()
log_file.write(output)
print output
if( ret_val.returncode ):
    print "RUN failed\n\n%s\n\n" % (errors)
    success = False

if( errors ): log_file.write("\n\n%s\n\n" % errors)

Originally I was piping the run_command through tee so that a copy went directly to the log-file, and the stream still output directly to the terminal — but that way I can't store any errors (to my knowledge).


Edit:

Temporary solution:

ret_val = subprocess.Popen( run_command, stdout=log_file, stderr=subprocess.PIPE, shell=True )
while not ret_val.poll():
    log_file.flush()

then, in another terminal, run tail -f log.txt (s.t. log_file = 'log.txt').


Answer 0


You have two ways of doing this, either by creating an iterator from the read or readline functions and do:

import subprocess
import sys
with open('test.log', 'w') as f:  # replace 'w' with 'wb' for Python 3
    process = subprocess.Popen(your_command, stdout=subprocess.PIPE)
    for c in iter(lambda: process.stdout.read(1), ''):  # replace '' with b'' for Python 3
        sys.stdout.write(c)
        f.write(c)

or

import subprocess
import sys
with open('test.log', 'w') as f:  # replace 'w' with 'wb' for Python 3
    process = subprocess.Popen(your_command, stdout=subprocess.PIPE)
    for line in iter(process.stdout.readline, ''):  # replace '' with b'' for Python 3
        sys.stdout.write(line)
        f.write(line)

Or you can create a reader and a writer file. Pass the writer to the Popen and read from the reader

import io
import time
import subprocess
import sys

filename = 'test.log'
with io.open(filename, 'wb') as writer, io.open(filename, 'rb', 1) as reader:
    process = subprocess.Popen(command, stdout=writer)
    while process.poll() is None:
        sys.stdout.write(reader.read())
        time.sleep(0.5)
    # Read the remaining
    sys.stdout.write(reader.read())

This way you will have the data written in the test.log as well as on the standard output.

The only advantage of the file approach is that your code doesn’t block. So you can do whatever you want in the meantime and read whenever you want from the reader in a non-blocking way. When you use PIPE, read and readline functions will block until either one character is written to the pipe or a line is written to the pipe respectively.


Answer 1


Executive Summary (or “tl;dr” version): it’s easy when there’s at most one subprocess.PIPE, otherwise it’s hard.

It may be time to explain a bit about how subprocess.Popen does its thing.

(Caveat: this is for Python 2.x, although 3.x is similar; and I’m quite fuzzy on the Windows variant. I understand the POSIX stuff much better.)

The Popen function needs to deal with zero-to-three I/O streams, somewhat simultaneously. These are denoted stdin, stdout, and stderr as usual.

You can provide:

  • None, indicating that you don’t want to redirect the stream. It will inherit these as usual instead. Note that on POSIX systems, at least, this does not mean it will use Python’s sys.stdout, just Python’s actual stdout; see demo at end.
  • An int value. This is a “raw” file descriptor (in POSIX at least). (Side note: PIPE and STDOUT are actually ints internally, but are “impossible” descriptors, -1 and -2.)
  • A stream—really, any object with a fileno method. Popen will find the descriptor for that stream, using stream.fileno(), and then proceed as for an int value.
  • subprocess.PIPE, indicating that Python should create a pipe.
  • subprocess.STDOUT (for stderr only): tell Python to use the same descriptor as for stdout. This only makes sense if you provided a (non-None) value for stdout, and even then, it is only needed if you set stdout=subprocess.PIPE. (Otherwise you can just provide the same argument you provided for stdout, e.g., Popen(..., stdout=stream, stderr=stream).)

The easiest cases (no pipes)

If you redirect nothing (leave all three as the default None value or supply explicit None), Popen has it quite easy. It just needs to spin off the subprocess and let it run. Or, if you redirect to a non-PIPE—an int or a stream's fileno()—it's still easy, as the OS does all the work. Python just needs to spin off the subprocess, connecting its stdin, stdout, and/or stderr to the provided file descriptors.

The still-easy case: one pipe

If you redirect only one stream, Popen still has things pretty easy. Let's pick one stream at a time and watch.

Suppose you want to supply some stdin, but let stdout and stderr go un-redirected, or go to a file descriptor. As the parent process, your Python program simply needs to use write() to send data down the pipe. You can do this yourself, e.g.:

proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
proc.stdin.write('here, have some data\n') # etc

or you can pass the stdin data to proc.communicate(), which then does the stdin.write shown above. There is no output coming back so communicate() has only one other real job: it also closes the pipe for you. (If you don’t call proc.communicate() you must call proc.stdin.close() to close the pipe, so that the subprocess knows there is no more data coming through.)

Suppose you want to capture stdout but leave stdin and stderr alone. Again, it’s easy: just call proc.stdout.read() (or equivalent) until there is no more output. Since proc.stdout() is a normal Python I/O stream you can use all the normal constructs on it, like:

for line in proc.stdout:

or, again, you can use proc.communicate(), which simply does the read() for you.

If you want to capture only stderr, it works the same as with stdout.

There’s one more trick before things get hard. Suppose you want to capture stdout, and also capture stderr but on the same pipe as stdout:

proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)

In this case, subprocess “cheats”! Well, it has to do this, so it’s not really cheating: it starts the subprocess with both its stdout and its stderr directed into the (single) pipe-descriptor that feeds back to its parent (Python) process. On the parent side, there’s again only a single pipe-descriptor for reading the output. All the “stderr” output shows up in proc.stdout, and if you call proc.communicate(), the stderr result (second value in the tuple) will be None, not a string.

The hard cases: two or more pipes

The problems all come about when you want to use at least two pipes. In fact, the subprocess code itself has this bit:

def communicate(self, input=None):
    ...
    # Optimization: If we are only using one pipe, or no pipe at
    # all, using select() or threads is unnecessary.
    if [self.stdin, self.stdout, self.stderr].count(None) >= 2:

But, alas, here we’ve made at least two, and maybe three, different pipes, so the count(None) returns either 1 or 0. We must do things the hard way.

On Windows, this uses threading.Thread to accumulate results for self.stdout and self.stderr, and has the parent thread deliver self.stdin input data (and then close the pipe).

On POSIX, this uses poll if available, otherwise select, to accumulate output and deliver stdin input. All this runs in the (single) parent process/thread.

Threads or poll/select are needed here to avoid deadlock. Suppose, for instance, that we’ve redirected all three streams to three separate pipes. Suppose further that there’s a small limit on how much data can be stuffed into a pipe before the writing process is suspended, waiting for the reading process to “clean out” the pipe from the other end. Let’s set that small limit to a single byte, just for illustration. (This is in fact how things work, except that the limit is much bigger than one byte.)

If the parent (Python) process tries to write several bytes, say 'go\n', to proc.stdin, the first byte goes in and then the second causes the Python process to suspend, waiting for the subprocess to read the first byte, emptying the pipe.

Meanwhile, suppose the subprocess decides to print a friendly “Hello! Don’t Panic!” greeting. The H goes into its stdout pipe, but the e causes it to suspend, waiting for its parent to read that H, emptying the stdout pipe.

Now we’re stuck: the Python process is asleep, waiting to finish saying “go”, and the subprocess is also asleep, waiting to finish saying “Hello! Don’t Panic!”.

The subprocess.Popen code avoids this problem with threading-or-select/poll. When bytes can go over the pipes, they go. When they can’t, only a thread (not the whole process) has to sleep—or, in the case of select/poll, the Python process waits simultaneously for “can write” or “data available”, writes to the process’s stdin only when there is room, and reads its stdout and/or stderr only when data are ready. The proc.communicate() code (actually _communicate where the hairy cases are handled) returns once all stdin data (if any) have been sent and all stdout and/or stderr data have been accumulated.

If you want to read both stdout and stderr on two different pipes (regardless of any stdin redirection), you will need to avoid deadlock too. The deadlock scenario here is different—it occurs when the subprocess writes something long to stderr while you’re pulling data from stdout, or vice versa—but it’s still there.
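
If you do not need the output incrementally, the simplest deadlock-free baseline is to let communicate() drive both pipes with the machinery described above. A minimal sketch (the command is only an example chosen to produce both stdout and stderr):

import subprocess

proc = subprocess.Popen(["ls", "no-such-file", "."],
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out, err = proc.communicate()   # safe even when either stream produces a lot of data
print("exit status:", proc.returncode)
print("stdout:", out)
print("stderr:", err)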


The Demo

I promised to demonstrate that, un-redirected, Python subprocesses write to the underlying stdout, not sys.stdout. So, here is some code:

from cStringIO import StringIO
import os
import subprocess
import sys

def show1():
    print 'start show1'
    save = sys.stdout
    sys.stdout = StringIO()
    print 'sys.stdout being buffered'
    proc = subprocess.Popen(['echo', 'hello'])
    proc.wait()
    in_stdout = sys.stdout.getvalue()
    sys.stdout = save
    print 'in buffer:', in_stdout

def show2():
    print 'start show2'
    save = sys.stdout
    sys.stdout = open(os.devnull, 'w')
    print 'after redirect sys.stdout'
    proc = subprocess.Popen(['echo', 'hello'])
    proc.wait()
    sys.stdout = save

show1()
show2()

When run:

$ python out.py
start show1
hello
in buffer: sys.stdout being buffered

start show2
hello

Note that the first routine will fail if you add stdout=sys.stdout, as a StringIO object has no fileno. The second will omit the hello if you add stdout=sys.stdout since sys.stdout has been redirected to os.devnull.

(If you redirect Python’s file-descriptor-1, the subprocess will follow that redirection. The open(os.devnull, 'w') call produces a stream whose fileno() is greater than 2.)
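
To make that last point concrete, here is a small sketch (not part of the original demo) that redirects the real file descriptor 1 with os.dup2; an un-redirected child then follows that redirection, unlike the sys.stdout swaps in show1 and show2:

import os
import subprocess
import sys

sys.stdout.flush()                          # don't lose anything Python has buffered
saved_fd = os.dup(1)                        # keep a copy of the real stdout
devnull_fd = os.open(os.devnull, os.O_WRONLY)
os.dup2(devnull_fd, 1)                      # descriptor 1 now points at /dev/null
subprocess.call(['echo', 'hello'])          # the child inherits fd 1, so this vanishes
os.dup2(saved_fd, 1)                        # restore the real stdout
os.close(devnull_fd)
os.close(saved_fd)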


回答 2

我们还可以使用默认的文件迭代器来读取stdout,而不是使用带有readline()的iter构造。

import subprocess
import sys
process = subprocess.Popen(your_command, stdout=subprocess.PIPE)
for line in process.stdout:
    sys.stdout.write(line)

We can also use the default file iterator for reading stdout, instead of using the iter construct with readline().

import subprocess
import sys
process = subprocess.Popen(your_command, stdout=subprocess.PIPE)
for line in process.stdout:
    sys.stdout.write(line)

回答 3

如果您可以使用第三方库,则可以使用类似的东西sarge(披露:我是它的维护者)。该库允许无阻塞地访问子流程的输出流-它位于subprocess模块之上。

If you’re able to use third-party libraries, you might be able to use something like sarge (disclosure: I’m its maintainer). This library allows non-blocking access to output streams from subprocesses – it’s layered over the subprocess module.


回答 4

解决方案1:实时并发记录stdoutstderr

一个简单的解决方案,可以同时逐行实时地同时将stdout和stderr 记录到日志文件中。

import subprocess as sp
from concurrent.futures import ThreadPoolExecutor


def log_popen_pipe(p, stdfile):

    with open("mylog.txt", "w") as f:

        while p.poll() is None:
            f.write(stdfile.readline())
            f.flush()

        # Write the rest from the buffer
        f.write(stdfile.read())


with sp.Popen(["ls"], stdout=sp.PIPE, stderr=sp.PIPE, text=True) as p:

    with ThreadPoolExecutor(2) as pool:
        r1 = pool.submit(log_popen_pipe, p, p.stdout)
        r2 = pool.submit(log_popen_pipe, p, p.stderr)
        r1.result()
        r2.result()

解决方案2:read_popen_pipes()允许您同时并行访问两个管道(stdout / stderr)的功能

import subprocess as sp
from queue import Queue, Empty
from concurrent.futures import ThreadPoolExecutor


def enqueue_output(file, queue):
    for line in iter(file.readline, ''):
        queue.put(line)
    file.close()


def read_popen_pipes(p):

    with ThreadPoolExecutor(2) as pool:
        q_stdout, q_stderr = Queue(), Queue()

        pool.submit(enqueue_output, p.stdout, q_stdout)
        pool.submit(enqueue_output, p.stderr, q_stderr)

        while True:

            if p.poll() is not None and q_stdout.empty() and q_stderr.empty():
                break

            out_line = err_line = ''

            try:
                out_line = q_stdout.get_nowait()
                err_line = q_stderr.get_nowait()
            except Empty:
                pass

            yield (out_line, err_line)

# The function in use:

with sp.Popen(my_cmd, stdout=sp.PIPE, stderr=sp.PIPE, text=True) as p:

    for out_line, err_line in read_popen_pipes(p):
        print(out_line, end='')
        print(err_line, end='')

    p.poll()

Solution 1: Log stdout AND stderr concurrently in realtime

A simple solution which logs both stdout AND stderr concurrently, line-by-line in realtime into a log file.

import subprocess as sp
from concurrent.futures import ThreadPoolExecutor


def log_popen_pipe(p, stdfile):

    with open("mylog.txt", "w") as f:

        while p.poll() is None:
            f.write(stdfile.readline())
            f.flush()

        # Write the rest from the buffer
        f.write(stdfile.read())


with sp.Popen(["ls"], stdout=sp.PIPE, stderr=sp.PIPE, text=True) as p:

    with ThreadPoolExecutor(2) as pool:
        r1 = pool.submit(log_popen_pipe, p, p.stdout)
        r2 = pool.submit(log_popen_pipe, p, p.stderr)
        r1.result()
        r2.result()

Solution 2: A function read_popen_pipes() that allows you to iterate over both pipes (stdout/stderr), concurrently in realtime

import subprocess as sp
from queue import Queue, Empty
from concurrent.futures import ThreadPoolExecutor


def enqueue_output(file, queue):
    for line in iter(file.readline, ''):
        queue.put(line)
    file.close()


def read_popen_pipes(p):

    with ThreadPoolExecutor(2) as pool:
        q_stdout, q_stderr = Queue(), Queue()

        pool.submit(enqueue_output, p.stdout, q_stdout)
        pool.submit(enqueue_output, p.stderr, q_stderr)

        while True:

            if p.poll() is not None and q_stdout.empty() and q_stderr.empty():
                break

            out_line = err_line = ''

            try:
                out_line = q_stdout.get_nowait()
                err_line = q_stderr.get_nowait()
            except Empty:
                pass

            yield (out_line, err_line)

# The function in use:

with sp.Popen(["ls"], stdout=sp.PIPE, stderr=sp.PIPE, text=True) as p:

    for out_line, err_line in read_popen_pipes(p):
        print(out_line, end='')
        print(err_line, end='')

    p.poll()


回答 5

一个好的但“重量级”的解决方案是使用Twisted-参见底部。

如果您只愿意接受标准输出,则应该遵循以下原则:

import subprocess
import sys
popenobj = subprocess.Popen(["ls", "-Rl"], stdout=subprocess.PIPE)
while not popenobj.poll():
   stdoutdata = popenobj.stdout.readline()
   if stdoutdata:
      sys.stdout.write(stdoutdata)
   else:
      break
print "Return code", popenobj.returncode

(如果使用read(),它将尝试读取无用的整个“文件”,我们在这里真正可以使用的是读取管道中所有数据的东西)

一个人也可以尝试通过线程来解决这个问题,例如:

import subprocess
import sys
import threading

popenobj = subprocess.Popen("ls", stdout=subprocess.PIPE, shell=True)

def stdoutprocess(o):
   while True:
      stdoutdata = o.stdout.readline()
      if stdoutdata:
         sys.stdout.write(stdoutdata)
      else:
         break

t = threading.Thread(target=stdoutprocess, args=(popenobj,))
t.start()
popenobj.wait()
t.join()
print "Return code", popenobj.returncode

现在,我们可以通过两个线程来添加stderr。

但是请注意,子流程文档不建议直接使用这些文件,建议使用communicate()(主要涉及死锁,我认为这不是上面的问题),解决方案有点笨拙,因此看来子流程模块似乎还不够用工作(另请参见:http : //www.python.org/dev/peps/pep-3145/),我们需要查看其他内容。

一个更复杂的解决方案是使用Twisted,如下所示:https : //twistedmatrix.com/documents/11.1.0/core/howto/process.html

使用Twisted进行此操作的方法是使用reactor.spawnprocess()并提供ProcessProtocol,然后异步处理输出来创建您的流程。Twisted示例Python代码在这里:https : //twistedmatrix.com/documents/11.1.0/core/howto/listings/process/process.py

A good but “heavyweight” solution is to use Twisted – see the bottom.

If you’re willing to live with only stdout, something along these lines should work:

import subprocess
import sys
popenobj = subprocess.Popen(["ls", "-Rl"], stdout=subprocess.PIPE)
while not popenobj.poll():
   stdoutdata = popenobj.stdout.readline()
   if stdoutdata:
      sys.stdout.write(stdoutdata)
   else:
      break
print "Return code", popenobj.returncode

(If you use read(), it tries to read the entire “file”, which isn’t useful; what we really could use here is something that reads all the data that’s currently in the pipe.)

One might also try to approach this with threading, e.g.:

import subprocess
import sys
import threading

popenobj = subprocess.Popen("ls", stdout=subprocess.PIPE, shell=True)

def stdoutprocess(o):
   while True:
      stdoutdata = o.stdout.readline()
      if stdoutdata:
         sys.stdout.write(stdoutdata)
      else:
         break

t = threading.Thread(target=stdoutprocess, args=(popenobj,))
t.start()
popenobj.wait()
t.join()
print "Return code", popenobj.returncode

Now we could potentially add stderr as well by having two threads.

Note, however, that the subprocess docs discourage using these files directly and recommend using communicate() instead (mostly because of deadlock concerns, which I think aren’t an issue above). The solutions are a little clunky, so it really seems like the subprocess module isn’t quite up to the job (also see: http://www.python.org/dev/peps/pep-3145/ ) and we need to look at something else.

A more involved solution is to use Twisted as shown here: https://twistedmatrix.com/documents/11.1.0/core/howto/process.html

The way you do this with Twisted is to create your process using reactor.spawnprocess() and providing a ProcessProtocol that then processes output asynchronously. The Twisted sample Python code is here: https://twistedmatrix.com/documents/11.1.0/core/howto/listings/process/process.py


回答 6

除了所有这些答案之外,一种简单的方法还可以如下:

process = subprocess.Popen(your_command, stdout=subprocess.PIPE)

while process.stdout.readable():
    line = process.stdout.readline()

    if not line:
        break

    print(line.strip())

只要可读流就循环遍历可读流,如果结果为空,则将其停止。

这里的关键是,只要有输出,就readline()返回一行(\n末尾带有),如果确实是末尾,则返回空。

希望这对某人有帮助。

In addition to all these answers, one simple approach could also be as follows:

process = subprocess.Popen(your_command, stdout=subprocess.PIPE)

while process.stdout.readable():
    line = process.stdout.readline()

    if not line:
        break

    print(line.strip())

Loop over the stream as long as it’s readable, and stop as soon as it yields an empty result.

The key here is that readline() returns a line (with a trailing \n) as long as there is output, and an empty string when the stream has really reached its end.

Hope this helps someone.


回答 7

基于以上所有内容,我建议您对版本进行略微修改(python3):

  • while循环调用readline(建议的iter解决方案似乎对我而言永远受阻-Python 3,Windows 7)
  • 经过结构化处理,因此在轮询返回后,不需要重复处理读数据-None
  • 将stderr传递到stdout,以便读取两个输出
  • 添加了代码以获取cmd的退出值。

码:

import subprocess
import time
proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT, universal_newlines=True)
while True:
    rd = proc.stdout.readline()
    print(rd, end='')  # and whatever you want to do...
    if not rd:  # EOF
        returncode = proc.poll()
        if returncode is not None:
            break
        time.sleep(0.1)  # cmd closed stdout, but not exited yet

# You may want to check on ReturnCode here

Based on all the above I suggest a slightly modified version (python3):

  • a while loop calling readline() (the suggested iter solution seemed to block forever for me – Python 3, Windows 7)
  • structured so that handling of the read data does not need to be duplicated after poll() returns not-None
  • stderr piped into stdout so both outputs are read
  • Added code to get the exit value of cmd.

Code:

import subprocess
import time
proc = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE,
                        stderr=subprocess.STDOUT, universal_newlines=True)
while True:
    rd = proc.stdout.readline()
    print(rd, end='')  # and whatever you want to do...
    if not rd:  # EOF
        returncode = proc.poll()
        if returncode is not None:
            break
        time.sleep(0.1)  # cmd closed stdout, but not exited yet

# You may want to check on ReturnCode here

回答 8

看起来行缓冲输出将为您工作,在这种情况下,可能适合以下情况。(注意:未经测试。)这只会实时提供子进程的标准输出。如果您想同时拥有stderr和stdout,则必须使用进行更复杂的操作select

proc = subprocess.Popen(run_command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
while proc.poll() is None:
    line = proc.stdout.readline()
    print line
    log_file.write(line + '\n')
# Might still be data on stdout at this point.  Grab any
# remainder.
for line in proc.stdout.read().split('\n'):
    print line
    log_file.write(line + '\n')
# Do whatever you want with proc.stderr here...

It looks like line-buffered output will work for you, in which case something like the following might suit. (Caveat: it’s untested.) This will only give the subprocess’s stdout in real time. If you want to have both stderr and stdout in real time, you’ll have to do something more complex with select.

proc = subprocess.Popen(run_command, stdout=subprocess.PIPE, stderr=subprocess.PIPE, shell=True)
while proc.poll() is None:
    line = proc.stdout.readline()
    print line
    log_file.write(line + '\n')
# Might still be data on stdout at this point.  Grab any
# remainder.
for line in proc.stdout.read().split('\n'):
    print line
    log_file.write(line + '\n')
# Do whatever you want with proc.stderr here...

回答 9

为什么不stdout直接设置为sys.stdout?而且,如果还需要输出到日志,则可以简单地覆盖f的write方法。

import sys
import subprocess

class SuperFile(open.__class__):

    def write(self, data):
        sys.stdout.write(data)
        super(SuperFile, self).write(data)

f = SuperFile("log.txt","w+")       
process = subprocess.Popen(command, stdout=f, stderr=f)

Why not set stdout directly to sys.stdout? And if you need to output to a log as well, then you can simply override the write method of f.

import sys
import subprocess

class SuperFile(open.__class__):

    def write(self, data):
        sys.stdout.write(data)
        super(SuperFile, self).write(data)

f = SuperFile("log.txt","w+")       
process = subprocess.Popen(command, stdout=f, stderr=f)

回答 10

我尝试过的所有上述解决方案都无法将stderr和stdout输出分开(多个管道),或者在OS管道缓冲区已满时永远阻塞,这在运行命令的命令输出速度太快时会发生(在python上有此警告poll()子流程手册)。我发现的唯一可靠方法是通过select,但这是仅posix的解决方案:

import subprocess
import sys
import os
import select
from errno import EINTR
# returns command exit status, stdout text, stderr text
# rtoutput: show realtime output while running
def run_script(cmd,rtoutput=0):
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    poller = select.poll()
    poller.register(p.stdout, select.POLLIN)
    poller.register(p.stderr, select.POLLIN)

    coutput=''
    cerror=''
    fdhup={}
    fdhup[p.stdout.fileno()]=0
    fdhup[p.stderr.fileno()]=0
    while sum(fdhup.values()) < len(fdhup):
        try:
            r = poller.poll(1)
        except select.error, err:
            if err.args[0] != EINTR:
                raise
            r=[]
        for fd, flags in r:
            if flags & (select.POLLIN | select.POLLPRI):
                c = os.read(fd, 1024)
                if rtoutput:
                    sys.stdout.write(c)
                    sys.stdout.flush()
                if fd == p.stderr.fileno():
                    cerror+=c
                else:
                    coutput+=c
            else:
                fdhup[fd]=1
    return p.poll(), coutput.strip(), cerror.strip()

All of the above solutions I tried failed either to separate stderr and stdout output (multiple pipes), or blocked forever when the OS pipe buffer was full, which happens when the command you are running produces output too fast (there is a warning about this in the Python subprocess manual for poll()). The only reliable way I found was through select, but this is a POSIX-only solution:

import subprocess
import sys
import os
import select
from errno import EINTR
# returns command exit status, stdout text, stderr text
# rtoutput: show realtime output while running
def run_script(cmd,rtoutput=0):
    p = subprocess.Popen(cmd, shell=True, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    poller = select.poll()
    poller.register(p.stdout, select.POLLIN)
    poller.register(p.stderr, select.POLLIN)

    coutput=''
    cerror=''
    fdhup={}
    fdhup[p.stdout.fileno()]=0
    fdhup[p.stderr.fileno()]=0
    while sum(fdhup.values()) < len(fdhup):
        try:
            r = poller.poll(1)
        except select.error, err:
            if err.args[0] != EINTR:
                raise
            r=[]
        for fd, flags in r:
            if flags & (select.POLLIN | select.POLLPRI):
                c = os.read(fd, 1024)
                if rtoutput:
                    sys.stdout.write(c)
                    sys.stdout.flush()
                if fd == p.stderr.fileno():
                    cerror+=c
                else:
                    coutput+=c
            else:
                fdhup[fd]=1
    return p.poll(), coutput.strip(), cerror.strip()

回答 11

与先前的答案类似,但是以下解决方案为我在使用Python3的Windows上提供了一种通用的实时打印和登录方法(get-realtime-output-using-python):

def print_and_log(command, logFile):
    with open(logFile, 'wb') as f:
        command = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True)

        while True:
            output = command.stdout.readline()
            if not output and command.poll() is not None:
                f.close()
                break
            if output:
                f.write(output)
                print(str(output.strip(), 'utf-8'), flush=True)
        return command.poll()

Similar to previous answers, but the following solution worked for me on Windows using Python 3, providing a common method to print and log in realtime (getting-realtime-output-using-python):

def print_and_log(command, logFile):
    with open(logFile, 'wb') as f:
        command = subprocess.Popen(command, stdout=subprocess.PIPE, shell=True)

        while True:
            output = command.stdout.readline()
            if not output and command.poll() is not None:
                f.close()
                break
            if output:
                f.write(output)
                print(str(output.strip(), 'utf-8'), flush=True)
        return command.poll()

回答 12

我认为该subprocess.communicate方法有点误导:它实际上填充了您在中指定的stdoutstderrsubprocess.Popen

但是,从subprocess.PIPE您可以提供给subprocess.Popenstdoutstderr参数中读取信息,最终将填满OS管道缓冲区并死锁您的应用程序(特别是如果您有多个必须使用的进程/线程)subprocess)。

我提出的解决方案是为stdoutstderr提供文件-并读取文件的内容,而不是从死锁中读取PIPE。这些文件可以是tempfile.NamedTemporaryFile()-在将它们写入时也可以访问以进行读取subprocess.communicate

以下是示例用法:

        try:
            with ProcessRunner(('python', 'task.py'), env=os.environ.copy(), seconds_to_wait=0.01) as process_runner:
                for out in process_runner:
                    print(out)
        except ProcessError as e:
            print(e.error_message)
            raise

这是准备使用的源代码与我可以用来解释其作用的注释:

如果您使用的是python 2,请确保首先从pypi 安装最新版本的subprocess32软件包。


import os
import sys
import threading
import time
import tempfile
import logging

if os.name == 'posix' and sys.version_info[0] < 3:
    # Support python 2
    import subprocess32 as subprocess
else:
    # Get latest and greatest from python 3
    import subprocess

logger = logging.getLogger(__name__)


class ProcessError(Exception):
    """Base exception for errors related to running the process"""


class ProcessTimeout(ProcessError):
    """Error that will be raised when the process execution will exceed a timeout"""


class ProcessRunner(object):
    def __init__(self, args, env=None, timeout=None, bufsize=-1, seconds_to_wait=0.25, **kwargs):
        """
        Constructor facade to subprocess.Popen that receives parameters which are more specifically required for the
        Process Runner. This is a class that should be used as a context manager - and that provides an iterator
        for reading captured output from subprocess.communicate in near realtime.

        Example usage:


        try:
            with ProcessRunner(('python', task_file_path), env=os.environ.copy(), seconds_to_wait=0.01) as process_runner:
                for out in process_runner:
                    print(out)
        except ProcessError as e:
            print(e.error_message)
            raise

        :param args: same as subprocess.Popen
        :param env: same as subprocess.Popen
        :param timeout: same as subprocess.communicate
        :param bufsize: same as subprocess.Popen
        :param seconds_to_wait: time to wait between each readline from the temporary file
        :param kwargs: same as subprocess.Popen
        """
        self._seconds_to_wait = seconds_to_wait
        self._process_has_timed_out = False
        self._timeout = timeout
        self._process_done = False
        self._std_file_handle = tempfile.NamedTemporaryFile()
        self._process = subprocess.Popen(args, env=env, bufsize=bufsize,
                                         stdout=self._std_file_handle, stderr=self._std_file_handle, **kwargs)
        self._thread = threading.Thread(target=self._run_process)
        self._thread.daemon = True

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._thread.join()
        self._std_file_handle.close()

    def __iter__(self):
        # read all output from stdout file that subprocess.communicate fills
        with open(self._std_file_handle.name, 'r') as stdout:
            # while process is alive, keep reading data
            while not self._process_done:
                out = stdout.readline()
                out_without_trailing_whitespaces = out.rstrip()
                if out_without_trailing_whitespaces:
                    # yield stdout data without trailing \n
                    yield out_without_trailing_whitespaces
                else:
                    # if there is nothing to read, then please wait a tiny little bit
                    time.sleep(self._seconds_to_wait)

            # this is a hack: terraform seems to write to buffer after process has finished
            out = stdout.read()
            if out:
                yield out

        if self._process_has_timed_out:
            raise ProcessTimeout('Process has timed out')

        if self._process.returncode != 0:
            raise ProcessError('Process has failed')

    def _run_process(self):
        try:
            # Start gathering information (stdout and stderr) from the opened process
            self._process.communicate(timeout=self._timeout)
            # Graceful termination of the opened process
            self._process.terminate()
        except subprocess.TimeoutExpired:
            self._process_has_timed_out = True
            # Force termination of the opened process
            self._process.kill()

        self._process_done = True

    @property
    def return_code(self):
        return self._process.returncode


I think that the subprocess.communicate method is a bit misleading: it actually fills the stdout and stderr that you specify in the subprocess.Popen.

Yet, reading from the subprocess.PIPE that you can provide to the subprocess.Popen’s stdout and stderr parameters will eventually fill up the OS pipe buffers and deadlock your app (especially if you have multiple processes/threads that must use subprocess).

My proposed solution is to provide the stdout and stderr with files – and read the files’ content instead of reading from the deadlocking PIPE. These files can be tempfile.NamedTemporaryFile() – which can also be accessed for reading while they’re being written into by subprocess.communicate.

Below is a sample usage:

        try:
            with ProcessRunner(('python', 'task.py'), env=os.environ.copy(), seconds_to_wait=0.01) as process_runner:
                for out in process_runner:
                    print(out)
        except ProcessError as e:
            print(e.error_message)
            raise

And this is the source code, ready to be used, with as many comments as I could provide to explain what it does:

If you’re using python 2, please make sure to first install the latest version of the subprocess32 package from pypi.


import os
import sys
import threading
import time
import tempfile
import logging

if os.name == 'posix' and sys.version_info[0] < 3:
    # Support python 2
    import subprocess32 as subprocess
else:
    # Get latest and greatest from python 3
    import subprocess

logger = logging.getLogger(__name__)


class ProcessError(Exception):
    """Base exception for errors related to running the process"""


class ProcessTimeout(ProcessError):
    """Error that will be raised when the process execution will exceed a timeout"""


class ProcessRunner(object):
    def __init__(self, args, env=None, timeout=None, bufsize=-1, seconds_to_wait=0.25, **kwargs):
        """
        Constructor facade to subprocess.Popen that receives parameters which are more specifically required for the
        Process Runner. This is a class that should be used as a context manager - and that provides an iterator
        for reading captured output from subprocess.communicate in near realtime.

        Example usage:


        try:
            with ProcessRunner(('python', task_file_path), env=os.environ.copy(), seconds_to_wait=0.01) as process_runner:
                for out in process_runner:
                    print(out)
        except ProcessError as e:
            print(e.error_message)
            raise

        :param args: same as subprocess.Popen
        :param env: same as subprocess.Popen
        :param timeout: same as subprocess.communicate
        :param bufsize: same as subprocess.Popen
        :param seconds_to_wait: time to wait between each readline from the temporary file
        :param kwargs: same as subprocess.Popen
        """
        self._seconds_to_wait = seconds_to_wait
        self._process_has_timed_out = False
        self._timeout = timeout
        self._process_done = False
        self._std_file_handle = tempfile.NamedTemporaryFile()
        self._process = subprocess.Popen(args, env=env, bufsize=bufsize,
                                         stdout=self._std_file_handle, stderr=self._std_file_handle, **kwargs)
        self._thread = threading.Thread(target=self._run_process)
        self._thread.daemon = True

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc_val, exc_tb):
        self._thread.join()
        self._std_file_handle.close()

    def __iter__(self):
        # read all output from stdout file that subprocess.communicate fills
        with open(self._std_file_handle.name, 'r') as stdout:
            # while process is alive, keep reading data
            while not self._process_done:
                out = stdout.readline()
                out_without_trailing_whitespaces = out.rstrip()
                if out_without_trailing_whitespaces:
                    # yield stdout data without trailing \n
                    yield out_without_trailing_whitespaces
                else:
                    # if there is nothing to read, then please wait a tiny little bit
                    time.sleep(self._seconds_to_wait)

            # this is a hack: terraform seems to write to buffer after process has finished
            out = stdout.read()
            if out:
                yield out

        if self._process_has_timed_out:
            raise ProcessTimeout('Process has timed out')

        if self._process.returncode != 0:
            raise ProcessError('Process has failed')

    def _run_process(self):
        try:
            # Start gathering information (stdout and stderr) from the opened process
            self._process.communicate(timeout=self._timeout)
            # Graceful termination of the opened process
            self._process.terminate()
        except subprocess.TimeoutExpired:
            self._process_has_timed_out = True
            # Force termination of the opened process
            self._process.kill()

        self._process_done = True

    @property
    def return_code(self):
        return self._process.returncode




回答 13

这是我在一个项目中使用的类。它将子流程的输出重定向到日志。刚开始,我尝试简单地重写写方法,但是由于子进程将永远不会调用它而无法工作(重定向发生在文件描述符级别)。因此,我使用自己的管道,类似于subprocess-module中的管道。这具有将所有日志记录/打印逻辑封装在适配器中的优点,并且您只需将记录器的实例传递给Popensubprocess.Popen("/path/to/binary", stderr = LogAdapter("foo"))

import logging
import os
import threading


class LogAdapter(threading.Thread):

    def __init__(self, logname, level = logging.INFO):
        super().__init__()
        self.log = logging.getLogger(logname)
        self.readpipe, self.writepipe = os.pipe()

        logFunctions = {
            logging.DEBUG: self.log.debug,
            logging.INFO: self.log.info,
            logging.WARN: self.log.warn,
            logging.ERROR: self.log.warn,
        }

        try:
            self.logFunction = logFunctions[level]
        except KeyError:
            self.logFunction = self.log.info

    def fileno(self):
        #when fileno is called this indicates the subprocess is about to fork => start thread
        self.start()
        return self.writepipe

    def finished(self):
       """If the write-filedescriptor is not closed this thread will
       prevent the whole program from exiting. You can use this method
       to clean up after the subprocess has terminated."""
       os.close(self.writepipe)

    def run(self):
        inputFile = os.fdopen(self.readpipe)

        while True:
            line = inputFile.readline()

            if len(line) == 0:
                #no new data was added
                break

            self.logFunction(line.strip())

如果您不需要日志记录而只想使用print()它,则可以明显地删除大部分代码并使该类更短。您还可以通过__enter__and __exit__方法将其展开并调用finished__exit__以便可以轻松地将其用作上下文。

Here is a class which I’m using in one of my projects. It redirects output of a subprocess to the log. At first I tried simply overwriting the write-method but that doesn’t work as the subprocess will never call it (redirection happens on filedescriptor level). So I’m using my own pipe, similar to how it’s done in the subprocess-module. This has the advantage of encapsulating all logging/printing logic in the adapter and you can simply pass instances of the logger to Popen: subprocess.Popen("/path/to/binary", stderr = LogAdapter("foo"))

import logging
import os
import threading


class LogAdapter(threading.Thread):

    def __init__(self, logname, level = logging.INFO):
        super().__init__()
        self.log = logging.getLogger(logname)
        self.readpipe, self.writepipe = os.pipe()

        logFunctions = {
            logging.DEBUG: self.log.debug,
            logging.INFO: self.log.info,
            logging.WARN: self.log.warn,
            logging.ERROR: self.log.warn,
        }

        try:
            self.logFunction = logFunctions[level]
        except KeyError:
            self.logFunction = self.log.info

    def fileno(self):
        #when fileno is called this indicates the subprocess is about to fork => start thread
        self.start()
        return self.writepipe

    def finished(self):
       """If the write-filedescriptor is not closed this thread will
       prevent the whole program from exiting. You can use this method
       to clean up after the subprocess has terminated."""
       os.close(self.writepipe)

    def run(self):
        inputFile = os.fdopen(self.readpipe)

        while True:
            line = inputFile.readline()

            if len(line) == 0:
                #no new data was added
                break

            self.logFunction(line.strip())

If you don’t need logging but simply want to use print() you can obviously remove large portions of the code and keep the class shorter. You could also extend it with __enter__ and __exit__ methods, calling finished in __exit__, so that you could easily use it as a context manager.


回答 14

没有Pythonic解决方案对我有用。事实证明,proc.stdout.read()类似的行为可能永远存在。

因此,我这样使用tee

subprocess.run('./my_long_running_binary 2>&1 | tee -a my_log_file.txt && exit ${PIPESTATUS}', shell=True, check=True, executable='/bin/bash')

如果您已经在使用此解决方案,将非常方便shell=True

${PIPESTATUS}捕获整个命令链的成功状态(仅在Bash中可用)。如果我省略&& exit ${PIPESTATUS},则它将始终返回零,因为tee从不失败。

unbuffer可能需要立即将每行打印到终端中,而不是等待太久直到“管道缓冲区”填满。但是,unbuffer吞没了assert(SIG Abort)的退出状态。

2>&1 还将stderror记录到文件中。

None of the Pythonic solutions worked for me. It turned out that proc.stdout.read() or similar may block forever.

Therefore, I use tee like this:

subprocess.run('./my_long_running_binary 2>&1 | tee -a my_log_file.txt && exit ${PIPESTATUS}', shell=True, check=True, executable='/bin/bash')

This solution is convenient if you are already using shell=True.

${PIPESTATUS} captures the success status of the entire command chain (only available in Bash). If I omitted the && exit ${PIPESTATUS}, then this would always return zero since tee never fails.

unbuffer might be necessary for printing each line immediately into the terminal, instead of waiting way too long until the “pipe buffer” gets filled. However, unbuffer swallows the exit status of assert (SIG Abort)…

2>&1 also logs stderr to the file.


为什么“ except:pass”是不好的编程习惯?

问题:为什么“ except:pass”是不好的编程习惯?

我经常看到有关except: pass不鼓励使用的其他Stack Overflow问题的评论。为什么这样不好?有时我只是不在乎错误是什么,我只想继续编写代码。

try:
    something
except:
    pass

为什么使用except: pass积木不好?是什么让它不好?是我pass出错还是我except出错了?

I often see comments on other Stack Overflow questions about how the use of except: pass is discouraged. Why is this bad? Sometimes I just don’t care what the errors are, and I want to just continue with the code.

try:
    something
except:
    pass

Why is using an except: pass block bad? What makes it bad? Is it the fact that I pass on an error or that I except any error?


回答 0

正如您正确猜测的那样,它有两个方面:通过在之后不指定任何异常类型来捕获任何错误except,并在不采取任何操作的情况下简单地传递它。

我的解释是“更长”的时间,所以tl; dr可以细分为:

  1. 不要发现任何错误。始终指定您准备从中恢复的异常,并且仅捕获这些异常。
  2. 尽量避免传入除了blocks。除非明确要求,否则通常不是一个好兆头。

但是,让我们详细介绍一下:

不要发现任何错误

使用try块时,通常这样做是因为您知道有可能引发异常。这样,您还已经大概知道了哪些会中断,哪些异常会引发。在这种情况下,您会捕获异常,因为您可以从中积极地恢复过来。这意味着您已为exceptions做好了准备,并有一些替代计划,在发生这种exceptions时将遵循该计划。

例如,当您要求用户输入数字时,您可以使用int()可能引起的转换输入ValueError。您可以简单地要求用户再试一次,从而轻松地恢复它,因此捕获ValueError并再次提示用户将是一个适当的计划。一个不同的例子是,如果您想从文件中读取某些配置,而该文件恰好不存在。因为它是一个配置文件,所以您可能具有一些默认配置作为后备,因此该文件并非完全必要。因此,FileNotFoundError在此处捕获并简单地应用默认配置将是一个不错的计划。现在,在这两种情况下,我们都期望有一个非常具体的exceptions,并且有一个同样具体的计划可以从中恢复。因此,在每种情况下,我们只明确except 某些 exceptions。

但是,如果我们要抓住一切,那么除了准备好从那些异常中恢复过来,我们还有机会获得我们没有想到的异常,而我们确实无法从中恢复。或不应从中恢复。

让我们以上面的配置文件示例为例。如果文件丢失,我们将应用默认配置,并可能在以后决定自动保存配置(因此下次该文件存在)。现在想象我们得到一个IsADirectoryError或一个PermissionError代替。在这种情况下,我们可能不想继续。我们仍然可以应用默认配置,但是以后将无法保存文件。而且用户可能也打算具有自定义配置,因此可能不需要使用默认值。因此,我们希望立即将其告知用户,并且可能也中止程序执行。但这不是我们想要在某些小代码部分的深处做的事情。这在应用程序级别上很重要,因此应该在顶部进行处理-因此让异常冒出来。

Python 2习惯用法文档中还提到了另一个简单的示例。在这里,代码中存在一个简单的错字,导致它中断。因为我们正在捕获每个异常,所以我们也捕获了NameErrorsSyntaxErrors。两者都是编程时我们所有人都会遇到的错误。两者都是我们在交付代码时绝对不希望包含的错误。但是,因为我们也抓住了它们,所以我们甚至都不知道它们在那里发生,并且失去了正确调试它的任何帮助。

但是,还有一些危险的exceptions情况,我们不太可能为此做好准备。例如,SystemError通常很少发生,我们无法真正计划。这意味着发生了一些更复杂的事情,有可能阻止我们继续当前的任务。

无论如何,您几乎不可能为代码中的一小部分做好一切准备,因此,实际上,您应该只捕获准备好的那些异常。有人建议至少要赶上Exception它,因为它不会包含类似的内容,SystemExitKeyboardInterrupt这些内容在设计上是要终止您的应用程序的,但是我认为这仍然过于不确定。我个人只在一个地方接受捕捞活动,Exception或者在任何地方异常,并且在单个全局应用程序级异常处理程序中,该异常处理程序的唯一目的是记录我们没有准备好的任何异常。这样,我们仍然可以保留有关意外异常的尽可能多的信息,然后我们可以使用这些信息来扩展代码以显式处理这些异常(如果可以从异常中恢复),或者在发生错误的情况下创建测试用例以确保它不会再发生。但是,当然,只有当我们只捕获到我们已经期望的异常时,这才起作用,所以我们没有想到的异常自然会冒出来。

尽量避免传入除了块

当显式地捕获少量特定异常时,在许多情况下,只要不执行任何操作就可以了。在这种情况下,拥有except SomeSpecificException: pass就好。不过,在大多数情况下,情况并非如此,因为我们可能需要一些与恢复过程相关的代码(如上所述)。例如,这可以是重试该操作的内容,也可以是设置默认值的内容。

但是,如果不是这种情况,例如因为我们的代码已经被构造为可以重复执行直到成功,那么传递就足够了。从上面的例子中,我们可能想要求用户输入一个数字。因为我们知道用户不想按照我们的要求去做,所以我们可能首先将其放入循环中,因此看起来可能像这样:

def askForNumber ():
    while True:
        try:
            return int(input('Please enter a number: '))
        except ValueError:
            pass

因为我们一直努力直到没有异常抛出,所以我们不需要在except块中做任何特殊的事情,所以这很好。但是,当然,有人可能会认为我们至少要向用户显示一些错误消息,以告诉他为什么他必须重复输入。

但是,在许多其他情况下,仅传递except一个信号就表明我们并未真正为所捕获的异常做好准备。除非这些异常很简单(如ValueErrorTypeError),并且我们可以通过的原因很明显,否则请尝试避免仅通过。如果真的无事可做(您对此绝对有把握),则考虑添加评论,为什么会这样;否则,展开except块以实际包括一些恢复代码。

except: pass

不过,最严重的罪犯是两者的结合。这意味着我们乐于捕捉任何错误,尽管我们绝对没有为此做好准备,并且我们也不对此做任何事情。您至少要记录该错误,还可能重新引发该错误以仍然终止应用程序(在出现MemoryError后,您不太可能像往常一样继续操作)。只是传递信息不仅可以使应用程序保持一定的生命力(当然,还取决于您捕获的位置),而且还会丢弃所有信息,从而无法发现错误-如果您不是发现错误的人,则尤其如此。


因此,底线是:仅捕获您真正期望并准备从中恢复的异常;其他所有问题都可能是您应纠正的错误,或者您没有准备好应对。如果您真的不需要对异常进行处理,则传递特定的异常很好。在其他所有情况下,这只是推定和懒惰的标志。您肯定想解决该问题。

As you correctly guessed, there are two sides to it: Catching any error by specifying no exception type after except, and simply passing it without taking any action.

My explanation is “a bit” longer—so tl;dr it breaks down to this:

  1. Don’t catch any error. Always specify which exceptions you are prepared to recover from and only catch those.
  2. Try to avoid passing in except blocks. Unless explicitly desired, this is usually not a good sign.

But let’s go into detail:

Don’t catch any error

When using a try block, you usually do this because you know that there is a chance of an exception being thrown. As such, you also already have an approximate idea of what can break and what exception can be thrown. In such cases, you catch an exception because you can positively recover from it. That means that you are prepared for the exception and have some alternative plan which you will follow in case of that exception.

For example, when you ask for the user to input a number, you can convert the input using int() which might raise a ValueError. You can easily recover that by simply asking the user to try it again, so catching the ValueError and prompting the user again would be an appropriate plan. A different example would be if you want to read some configuration from a file, and that file happens to not exist. Because it is a configuration file, you might have some default configuration as a fallback, so the file is not exactly necessary. So catching a FileNotFoundError and simply applying the default configuration would be a good plan here. Now in both these cases, we have a very specific exception we expect and have an equally specific plan to recover from it. As such, in each case, we explicitly only except that certain exception.
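
A minimal sketch of the configuration example (the file name and defaults below are made up for illustration): only the one exception we planned for is caught, and the recovery plan is explicit.

import json

DEFAULT_CONFIG = {"verbose": False}

def load_config(path="config.json"):
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:        # the one case we are prepared for
        return dict(DEFAULT_CONFIG)  # anything else (permissions, bad JSON) bubbles up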

However, if we were to catch everything, then—in addition to those exceptions we are prepared to recover from—there is also a chance that we get exceptions that we didn’t expect, and which we indeed cannot recover from; or shouldn’t recover from.

Let’s take the configuration file example from above. In case of a missing file, we just applied our default configuration, and might decide at a later point to automatically save the configuration (so next time, the file exists). Now imagine we get an IsADirectoryError or a PermissionError instead. In such cases, we probably do not want to continue; we could still apply our default configuration, but we later won’t be able to save the file. And it’s likely that the user meant to have a custom configuration too, so using the default values is likely not desired. So we would want to tell the user about it immediately, and probably abort the program execution too. But that’s not something we want to do somewhere deep within some small code part; this is something of application-level importance, so it should be handled at the top—so let the exception bubble up.

Another simple example is also mentioned in the Python 2 idioms document. Here, a simple typo exists in the code which causes it to break. Because we are catching every exception, we also catch NameErrors and SyntaxErrors. Both are mistakes that happen to us all while programming; and both are mistakes we absolutely don’t want to include when shipping the code. But because we also caught those, we won’t even know that they occurred there and lose any help to debug it correctly.

But there are also more dangerous exceptions which we are unlikely prepared for. For example SystemError is usually something that happens rarely and which we cannot really plan for; it means there is something more complicated going on, something that likely prevents us from continuing the current task.

In any case, it’s very unlikely that you are prepared for everything in a small scale part of the code, so that’s really where you should only catch those exceptions you are prepared for. Some people suggest to at least catch Exception as it won’t include things like SystemExit and KeyboardInterrupt which by design are to terminate your application, but I would argue that this is still far too unspecific. There is only one place where I personally accept catching Exception or just any exception, and that is in a single global application-level exception handler which has the single purpose to log any exception we were not prepared for. That way, we can still retain as much information about unexpected exceptions, which we then can use to extend our code to handle those explicitly (if we can recover from them) or—in case of a bug—to create test cases to make sure it won’t happen again. But of course, that only works if we only ever caught those exceptions we were already expecting, so the ones we didn’t expect will naturally bubble up.
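
A hedged sketch of such a single application-level handler (the structure is illustrative, not prescriptive): everything below main() catches only specific exceptions, and the very top merely records whatever nobody was prepared for before letting the process die.

import logging
import sys

logging.basicConfig(filename="app.log", level=logging.INFO)

def main():
    ...  # real work, with narrow except clauses wherever recovery is actually possible

if __name__ == "__main__":
    try:
        main()
    except Exception:
        logging.exception("Unexpected error, aborting")
        sys.exit(1)  # terminate anyway; the handler only records what happened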

Try to avoid passing in except blocks

When explicitly catching a small selection of specific exceptions, there are many situations in which we will be fine by simply doing nothing. In such cases, just having except SomeSpecificException: pass is just fine. Most of the time though, this is not the case as we likely need some code related to the recovery process (as mentioned above). This can be for example something that retries the action again, or to set up a default value instead.

If that’s not the case though, for example because our code is already structured to repeat until it succeeds, then just passing is good enough. Taking our example from above, we might want to ask the user to enter a number. Because we know that users like to not do what we ask them for, we might just put it into a loop in the first place, so it could look like this:

def askForNumber ():
    while True:
        try:
            return int(input('Please enter a number: '))
        except ValueError:
            pass

Because we keep trying until no exception is thrown, we don’t need to do anything special in the except block, so this is fine. But of course, one might argue that we at least want to show the user some error message to tell him why he has to repeat the input.

In many other cases though, just passing in an except is a sign that we weren’t really prepared for the exception we are catching. Unless those exceptions are simple (like ValueError or TypeError), and the reason why we can pass is obvious, try to avoid just passing. If there’s really nothing to do (and you are absolutely sure about it), then consider adding a comment why that’s the case; otherwise, expand the except block to actually include some recovery code.
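
For example, the earlier number-asking loop could be expanded as suggested (a small variant, renamed ask_for_number here) so the except block carries a bit of feedback instead of a bare pass:

def ask_for_number():
    while True:
        try:
            return int(input('Please enter a number: '))
        except ValueError:
            print('That was not a number, please try again.')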

except: pass

The worst offender though is the combination of both. This means that we are willingly catching any error although we are absolutely not prepared for it and we also don’t do anything about it. You at least want to log the error and also likely reraise it to still terminate the application (it’s unlikely you can continue like normal after a MemoryError). Just passing though will not only keep the application somewhat alive (depending where you catch of course), but also throw away all the information, making it impossible to discover the error—which is especially true if you are not the one discovering it.
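
As a sketch of that minimum (do_something is just a placeholder for a failing operation): record the exception and re-raise it, so the failure stays visible instead of silently disappearing.

import logging

def do_something():
    raise RuntimeError("boom")              # stand-in for an operation that fails

try:
    do_something()
except Exception:
    logging.exception("do_something failed")
    raise                                   # keep the traceback; let callers decide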


So the bottom line is: Catch only exceptions you really expect and are prepared to recover from; all others are likely either mistakes you should fix, or something you are not prepared for anyway. Passing specific exceptions is fine if you really don’t need to do something about them. In all other cases, it’s just a sign of presumption and being lazy. And you definitely want to fix that.


回答 1

这里的主要问题是它会忽略所有错误:内存不足,CPU正在燃烧,用户想要停止,程序想要退出,Jabberwocky正在杀死用户。

这太多了。在您的脑海中,您正在思考“我想忽略此网络错误”。如果出乎意料的地方出了问题,那么您的代码将以无人能及的方式以无法预测的方式静默继续并中断。

这就是为什么您应该将自己限制为仅忽略某些错误,而让其余错误通过。

The main problem here is that it ignores all and any error: Out of memory, CPU is burning, user wants to stop, program wants to exit, Jabberwocky is killing users.

This is way too much. In your head, you’re thinking “I want to ignore this network error”. If something unexpected goes wrong, then your code silently continues and breaks in completely unpredictable ways that no one can debug.

That’s why you should limit yourself to ignoring specifically only some errors and let the rest pass.


回答 2

从字面上执行伪代码甚至不会给出任何错误:

try:
    something
except:
    pass

就像是一段完全有效的代码,而不是抛出NameError。我希望这不是您想要的。

Executing your pseudo code literally does not even give any error:

try:
    something
except:
    pass

as if it is a perfectly valid piece of code, instead of throwing a NameError. I hope this is not what you want.


回答 3

为什么“ except:pass”是不好的编程习惯?

为什么这样不好?

try:
    something
except:
    pass

这会捕获所有可能的异常,包括GeneratorExitKeyboardInterruptSystemExit-这是您可能不打算捕获的异常。和赶上一样BaseException

try:
    something
except BaseException:
    pass

版本的文档说

由于Python中的每个错误都会引发一个异常,因此使用except:可能会使许多编程错误看起来像运行时问题,从而阻碍了调试过程。

Python异常层次结构

如果捕获父异常类,那么还将捕获其所有子类。仅捕获您准备处理的异常要优雅得多。

这是Python 3 异常层次结构 -您是否真的想抓住一切?:

BaseException
 +-- SystemExit
 +-- KeyboardInterrupt
 +-- GeneratorExit
 +-- Exception
      +-- StopIteration
      +-- StopAsyncIteration
      +-- ArithmeticError
      |    +-- FloatingPointError
      |    +-- OverflowError
      |    +-- ZeroDivisionError
      +-- AssertionError
      +-- AttributeError
      +-- BufferError
      +-- EOFError
      +-- ImportError
           +-- ModuleNotFoundError
      +-- LookupError
      |    +-- IndexError
      |    +-- KeyError
      +-- MemoryError
      +-- NameError
      |    +-- UnboundLocalError
      +-- OSError
      |    +-- BlockingIOError
      |    +-- ChildProcessError
      |    +-- ConnectionError
      |    |    +-- BrokenPipeError
      |    |    +-- ConnectionAbortedError
      |    |    +-- ConnectionRefusedError
      |    |    +-- ConnectionResetError
      |    +-- FileExistsError
      |    +-- FileNotFoundError
      |    +-- InterruptedError
      |    +-- IsADirectoryError
      |    +-- NotADirectoryError
      |    +-- PermissionError
      |    +-- ProcessLookupError
      |    +-- TimeoutError
      +-- ReferenceError
      +-- RuntimeError
      |    +-- NotImplementedError
      |    +-- RecursionError
      +-- SyntaxError
      |    +-- IndentationError
      |         +-- TabError
      +-- SystemError
      +-- TypeError
      +-- ValueError
      |    +-- UnicodeError
      |         +-- UnicodeDecodeError
      |         +-- UnicodeEncodeError
      |         +-- UnicodeTranslateError
      +-- Warning
           +-- DeprecationWarning
           +-- PendingDeprecationWarning
           +-- RuntimeWarning
           +-- SyntaxWarning
           +-- UserWarning
           +-- FutureWarning
           +-- ImportWarning
           +-- UnicodeWarning
           +-- BytesWarning
           +-- ResourceWarning

不要这样

如果您使用这种形式的异常处理:

try:
    something
except: # don't just do a bare except!
    pass

这样,您将无法something使用Ctrl-C 中断您的代码块。您的程序将忽略try代码块内的所有可能的Exception 。

这是另一个具有相同不良行为的示例:

except BaseException as e: # don't do this either - same as bare!
    logging.info(e)

相反,请尝试仅捕获您要查找的特定异常。例如,如果您知道转换可能会产生价值错误:

try:
    foo = operation_that_includes_int(foo)
except ValueError as e:
    if fatal_condition(): # You can raise the exception if it's bad,
        logging.info(e)   # but if it's fatal every time,
        raise             # you probably should just not catch it.
    else:                 # Only catch exceptions you are prepared to handle.
        foo = 0           # Here we simply assign foo to 0 and continue. 

另一个示例的进一步说明

您之所以这样做,是因为您一直在爬网并说a UnicodeError,但是由于使用了最广泛的Exception catch,您的代码(可能有其他基本缺陷)将尝试运行至完成,浪费带宽,处理时间,设备的磨损,内存不足,收集垃圾数据等。

如果其他人要求您完成操作,以便他们可以依靠您的代码,那么我理解被迫仅处理所有事情。但是,如果您愿意在开发过程中大声失败,那么您将有机会纠正可能会间歇性出现的问题,但这将是长期的代价高昂的错误。

通过更精确的错误处理,您的代码可以更强大。

Why is “except: pass” a bad programming practice?

Why is this bad?

try:
    something
except:
    pass

This catches every possible exception, including GeneratorExit, KeyboardInterrupt, and SystemExit – which are exceptions you probably don’t intend to catch. It’s the same as catching BaseException.

try:
    something
except BaseException:
    pass

Older versions of the documentation say:

Since every error in Python raises an exception, using except: can make many programming errors look like runtime problems, which hinders the debugging process.

Python Exception Hierarchy

If you catch a parent exception class, you also catch all of their child classes. It is much more elegant to only catch the exceptions you are prepared to handle.

Here’s the Python 3 exception hierarchy – do you really want to catch ’em all?:

BaseException
 +-- SystemExit
 +-- KeyboardInterrupt
 +-- GeneratorExit
 +-- Exception
      +-- StopIteration
      +-- StopAsyncIteration
      +-- ArithmeticError
      |    +-- FloatingPointError
      |    +-- OverflowError
      |    +-- ZeroDivisionError
      +-- AssertionError
      +-- AttributeError
      +-- BufferError
      +-- EOFError
      +-- ImportError
           +-- ModuleNotFoundError
      +-- LookupError
      |    +-- IndexError
      |    +-- KeyError
      +-- MemoryError
      +-- NameError
      |    +-- UnboundLocalError
      +-- OSError
      |    +-- BlockingIOError
      |    +-- ChildProcessError
      |    +-- ConnectionError
      |    |    +-- BrokenPipeError
      |    |    +-- ConnectionAbortedError
      |    |    +-- ConnectionRefusedError
      |    |    +-- ConnectionResetError
      |    +-- FileExistsError
      |    +-- FileNotFoundError
      |    +-- InterruptedError
      |    +-- IsADirectoryError
      |    +-- NotADirectoryError
      |    +-- PermissionError
      |    +-- ProcessLookupError
      |    +-- TimeoutError
      +-- ReferenceError
      +-- RuntimeError
      |    +-- NotImplementedError
      |    +-- RecursionError
      +-- SyntaxError
      |    +-- IndentationError
      |         +-- TabError
      +-- SystemError
      +-- TypeError
      +-- ValueError
      |    +-- UnicodeError
      |         +-- UnicodeDecodeError
      |         +-- UnicodeEncodeError
      |         +-- UnicodeTranslateError
      +-- Warning
           +-- DeprecationWarning
           +-- PendingDeprecationWarning
           +-- RuntimeWarning
           +-- SyntaxWarning
           +-- UserWarning
           +-- FutureWarning
           +-- ImportWarning
           +-- UnicodeWarning
           +-- BytesWarning
           +-- ResourceWarning

Don’t Do this

If you’re using this form of exception handling:

try:
    something
except: # don't just do a bare except!
    pass

Then you won’t be able to interrupt your something block with Ctrl-C. Your program will overlook every possible Exception inside the try code block.
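
A tiny demonstration of that Ctrl-C point (run it in a terminal to see it): because the bare except also catches KeyboardInterrupt, the loop can never be broken from the keyboard.

import time

while True:
    try:
        time.sleep(1)
    except:   # also swallows KeyboardInterrupt, so Ctrl-C cannot stop this loop
        pass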

Here’s another example that will have the same undesirable behavior:

except BaseException as e: # don't do this either - same as bare!
    logging.info(e)

Instead, try to only catch the specific exception you know you’re looking for. For example, if you know you might get a ValueError on a conversion:

try:
    foo = operation_that_includes_int(foo)
except ValueError as e:
    if fatal_condition(): # You can raise the exception if it's bad,
        logging.info(e)   # but if it's fatal every time,
        raise             # you probably should just not catch it.
    else:                 # Only catch exceptions you are prepared to handle.
        foo = 0           # Here we simply assign foo to 0 and continue. 

Further Explanation with another example

You might be doing it because you’ve been web-scraping and getting, say, a UnicodeError, but because you’ve used the broadest possible exception catch, your code, which may have other fundamental flaws, will attempt to run to completion, wasting bandwidth, processing time, wear and tear on your equipment, running out of memory, collecting garbage data, and so on.

If other people are asking you to finish the job so that they can rely on your code, I understand feeling compelled to just handle everything. But if you’re willing to fail noisily as you develop, you will have the opportunity to correct problems that might only pop up intermittently, but which would be costly long-term bugs.

With more precise error handling, your code can be more robust.


回答 4

>>> import this

提姆·彼得斯(Tim Peters)撰写的《 Python之禅》

美丽胜于丑陋。
显式胜于隐式。
简单胜于复杂。
复杂胜于复杂。
扁平比嵌套更好。
稀疏胜于密集。
可读性很重要。
特殊情况还不足以打破规则。
尽管实用性胜过纯度。
错误绝不能默默传递。
除非明确地保持沉默。
面对模棱两可的想法,拒绝猜测的诱惑。
应该有一种-最好只有一种-显而易见的方法。
尽管除非您是荷兰人,否则一开始这种方式可能并不明显。
现在总比没有好。
虽然从来没有比这更好正确的现在。
如果实现难以解释,那是个坏主意。
如果实现易于解释,则可能是个好主意。
命名空间是一个很棒的主意-让我们做更多这些吧!

所以,这是我的看法。每当发现错误时,都应该采取措施进行处理,即将其写入日志文件或其他内容。至少,它通知您以前曾经有错误。

>>> import this

The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren’t special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one– and preferably only one –obvious way to do it.
Although that way may not be obvious at first unless you’re Dutch.
Now is better than never.
Although never is often better than right now.
If the implementation is hard to explain, it’s a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea — let’s do more of those!

So, here is my opinion. Whenever you find an error, you should do something to handle it, i.e. write it to a logfile or something else. At least, it informs you that an error occurred.


回答 5

您至少except Exception:应避免捕获诸如SystemExit或的系统异常KeyboardInterrupt。这里是文档链接

通常,应明确定义要捕获的异常,以避免捕获不需要的异常。您应该知道忽略了哪些异常。

You should use at least except Exception: to avoid catching system exceptions like SystemExit or KeyboardInterrupt. Here’s a link to the docs.

In general you should define explicitly exceptions you want to catch, to avoid catching unwanted exceptions. You should know what exceptions you ignore.


回答 6

首先,它违反了Python Zen的两个原则:

  • 显式胜于隐式
  • 错误绝不能默默传递

这意味着您故意使错误静默地通过。而且,您不知道确切发生了哪个错误,因为except: pass它将捕获任何异常。

其次,如果我们试图从Python的Zen中抽象出来,并以理智的眼光说话,您应该知道,使用except:pass会使您在系统中没有知识和控制力。经验法则是在发生错误时引发异常,并采取适当的措施。如果您事先不知道该怎么做,请至少将错误记录在某个地方(并最好重新引发该异常):

try:
    something
except:
    logger.exception('Something happened')

但是,通常,如果您尝试捕获任何异常,则可能是在做错什么!

First, it violates two principles of Zen of Python:

  • Explicit is better than implicit
  • Errors should never pass silently

What this means is that you intentionally make your error pass silently. Moreover, you don’t even know which error exactly occurred, because except: pass will catch any exception.

Second, if we try to abstract away from the Zen of Python and speak in terms of plain sanity, you should know that using except: pass leaves you with no knowledge of, and no control over, your system. The rule of thumb is to raise an exception if an error happens, and take appropriate action. If you don’t know in advance what those actions should be, at least log the error somewhere (and preferably re-raise the exception):

try:
    something
except:
    logger.exception('Something happened')

But, usually, if you try to catch any exception, you are probably doing something wrong!
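
For example, a minimal sketch (load_config and its path parameter are illustrative, not from the answer) of logging the full traceback and then re-raising so the caller still sees the failure:

import logging

logger = logging.getLogger(__name__)

def load_config(path):
    try:
        with open(path) as fh:
            return fh.read()
    except OSError:
        logger.exception('could not read config %s', path)
        raise  # re-raise: logging the error is not the same as handling it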


回答 7

except: pass 结构实质上会让 try: 块中的代码在运行时出现的所有异常情况都被静默忽略。

之所以说这是不良习惯,是因为这通常并不是您真正想要的。更常见的情况是,您只想让某个特定情况静默,而 except: pass 是一件过于粗暴的工具。它可以完成工作,但也会掩盖您未曾预料到、却很可能希望以其他方式处理的其他错误情况。

在Python中这一点尤其重要的是,通过这种语言的习惯用法,异常不一定是error。当然,就像大多数语言一样,通常也以这种方式使用它们。但是特别是Python偶尔会使用它们来实现一些代码任务的替代退出路径,这实际上并不是正常运行情况的一部分,但仍然不时出现,并且在大多数情况下甚至可以预期。SystemExit已经作为一个旧示例被提及,但是如今最常见的示例可能是StopIteration。这种使用异常的方式引起了很多争议,尤其是在迭代器和生成器首次引入Python时,但最终这个想法盛行。

The except:pass construct essentially silences any and all exceptional conditions that come up while the code covered in the try: block is being run.

What makes this bad practice is that it usually isn’t what you really want. More often, some specific condition is coming up that you want to silence, and except:pass is too much of a blunt instrument. It will get the job done, but it will also mask other error conditions that you likely haven’t anticipated, but may very well want to deal with in some other way.

What makes this particularly important in Python is that by the idioms of this language, exceptions are not necessarily errors. They’re often used this way, of course, just as in most languages. But Python in particular has occasionally used them to implement an alternative exit path from some code tasks which isn’t really part of the normal running case, but is still known to come up from time to time and may even be expected in most cases. SystemExit has already been mentioned as an old example, but the most common example nowadays may be StopIteration. Using exceptions this way caused a lot of controversy, especially when iterators and generators were first introduced to Python, but eventually the idea prevailed.
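
For illustration, a minimal sketch of that idiom: StopIteration signalling "no more items" rather than an error, and being caught by name instead of with a bare except (the first_item helper is a made-up example).

def first_item(iterable, default=None):
    """Return the first element, or `default` for an empty iterable."""
    try:
        return next(iter(iterable))
    except StopIteration:
        # Not an error: the iterable was simply empty.
        return default

print(first_item([10, 20]))  # 10
print(first_item([]))        # None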


回答 8

已经说明了#1原因-它隐藏了您没有想到的错误。

(#2)- 它使您的代码难以被他人阅读和理解。如果在尝试读取文件时捕获到FileNotFoundError,那么对于另一个开发人员而言,“catch”块应具有的功能非常明显。如果未指定异常,则需要附加注释以说明该块应执行的操作。

(#3)- 演示了惰性编程。如果使用通用的try / catch,则表明您不了解程序中可能出现的运行时错误,或者您不知道Python中可能出现的异常。捕获特定错误表明您既了解程序又了解Python引发的错误范围。这更有可能使其他开发人员和代码审阅者信任您的工作。

The #1 reason has already been stated – it hides errors that you did not expect.

(#2) – It makes your code difficult for others to read and understand. If you catch a FileNotFoundError when you are trying to read a file, then it is pretty obvious to another developer what functionality the ‘catch’ block should have. If you do not specify an exception, then you need additional commenting to explain what the block should do.

(#3) – It demonstrates lazy programming. If you use the generic try/catch, it indicates either that you do not understand the possible run-time errors in your program, or that you do not know what exceptions are possible in Python. Catching a specific error shows that you understand both your program and the range of errors that Python throws. This is more likely to make other developers and code-reviewers trust your work.


回答 9

那么,此代码产生什么输出?

fruits = [ 'apple', 'pear', 'carrot', 'banana' ]

found = False
try:
     for i in range(len(fruit)):
         if fruits[i] == 'apple':
             found = true
except:
     pass

if found:
    print "Found an apple"
else:
    print "No apples in list"

现在,假设这个 try…except 块包含了对复杂对象层次结构的数百行调用,而它本身又处于一个大型程序调用树的中间。当程序出错时,您该从哪里开始查找?

So, what output does this code produce?

fruits = [ 'apple', 'pear', 'carrot', 'banana' ]

found = False
try:
     for i in range(len(fruit)):
         if fruits[i] == 'apple':
             found = true
except:
     pass

if found:
    print "Found an apple"
else:
    print "No apples in list"

Now imagine the try…except block is hundreds of lines of calls to a complex object hierarchy, and is itself called in the middle of a large program’s call tree. When the program goes wrong, where do you start looking?
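
For the record, the snippet prints "No apples in list": range(len(fruit)) raises a NameError because the list is actually named fruits, the bare except swallows it, and found never becomes True. (The lowercase true on the next line would be a second NameError, but execution never reaches it.) Without the blanket handler the typo would surface immediately; a corrected Python 3 sketch of the same logic, shown only for comparison:

fruits = ['apple', 'pear', 'carrot', 'banana']

# No try/except needed: a typo such as `fruit` would now fail loudly
# at the offending line instead of silently flipping the result.
found = any(item == 'apple' for item in fruits)

print('Found an apple' if found else 'No apples in list')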


回答 10

通常,您可以将任何错误/异常分为以下三种类别之一

  • 致命的:不是您的错,您无法阻止它们,也无法从中恢复。您当然不应该忽略它们并继续运行,并使程序保持未知状态。只要让错误终止您的程序,您就无能为力了。

  • 骨头:您自己的错误,很可能是由于疏忽,错误或编程错误所致。您应该修复该错误。同样,您当然应该不忽略并继续。

  • 外生的:在特殊情况下(例如找不到文件、连接中断),您可能会遇到这些错误。您应该显式地处理这些错误,并且仅处理这些错误。

在任何情况下,except: pass都只会使程序处于未知状态,在这种状态下可能会造成更大的破坏。

In general, you can classify any error/exception in one of three categories:

  • Fatal: Not your fault, you cannot prevent them, you cannot recover from them. You should certainly not ignore them and continue, and leave your program in an unknown state. Just let the error terminate your program, there is nothing you can do.

  • Boneheaded: Your own fault, most likely due to an oversight, bug or programming error. You should fix the bug. Again, you should most certainly not ignore and continue.

  • Exogenous: You can expect these errors in exceptional situations, such as file not found or connection terminated. You should explicitly handle these errors, and only these.

In all cases except: pass will only leave your program in an unknown state, where it can cause more damage.
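
For illustration, a minimal sketch (the read_settings helper and its fall-back-to-defaults policy are assumptions made for the example) of handling only the exogenous case and letting fatal or boneheaded errors propagate:

import json

def read_settings(path):
    """Handle only the exogenous failure we can actually recover from."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}  # a missing file is expected: fall back to defaults
    # Bugs (e.g. TypeError) and fatal conditions (e.g. MemoryError) still propagate.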


回答 11

简而言之,如果引发异常或错误,则说明存在问题。可能不是很不对劲,但是仅仅为了使用goto语句而创建,抛出和捕获错误和异常并不是一个好主意,而且很少这样做。99%的时间,某处出现问题。

需要解决的问题。就像生活中的情况一样,在编程中,如果您只是将问题搁置一旁并尝试忽略它们,那么它们就不会多次自行消失。相反,它们变得更大并成倍增加。为防止问题在您身上蔓延并进一步打击您,您可以1)消除它,然后清理残局,或者2)遏制它,然后清理残局。

只是忽略异常和错误并放任不管,是“体验”内存泄漏、未释放的数据库连接、不必要的文件权限锁定等问题的好方法。

在极少数情况下,问题是如此的微小,琐碎,并且-除了需要try … catch块之外- 自包含的,以至于事后确实没有任何需要清理的地方。在这些情况下,这些最佳做法不一定适用。以我的经验,这通常意味着代码所做的任何事情基本上都是小巧的和可忽略的,而重试尝试或特殊消息之类的东西既不值得其复杂性也不值得其坚持下去。

在我公司,规则是几乎总要在 catch 块中做点什么;如果您什么都不做,则必须始终留下注释,说明非常充分的理由。只要还有事情可做,就绝不能直接 pass 或留下空的 catch 块。

Simply put, if an exception or error is thrown, something’s wrong. It may not be something very wrong, but creating, throwing, and catching errors and exceptions just for the sake of using goto statements is not a good idea, and it’s rarely done. 99% of the time, there was a problem somewhere.

Problems need to be dealt with. Just like how it is in life, in programming, if you just leave problems alone and try to ignore them, they don’t just go away on their own a lot of times; instead they get bigger and multiply. To prevent a problem from growing on you and striking again further down the road, you either 1) eliminate it and clean up the mess afterwards, or 2) contain it and clean up the mess afterwards.

Just ignoring exceptions and errors and leaving them be like that is a good way to experience memory leaks, outstanding database connections, needless locks on file permissions, etc.

On rare occasions, the problem is so minuscule, trivial, and – aside from needing a try…catch block – self-contained, that there really is just no mess to be cleaned up afterwards. These are the only occasions when this best practice doesn’t necessarily apply. In my experience, this has generally meant that whatever the code is doing is basically petty and can be forgone, and something like retry attempts or special messages is worth neither the added complexity nor holding the thread up for.

At my company, the rule is to almost always do something in a catch block, and if you don’t do anything, then you must always place a comment with a very good reason why not. You must never pass or leave an empty catch block when there is anything to be done.
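
For illustration, a minimal sketch (the remove_if_present helper is a made-up example) of the only acceptable shape for an empty handler under such a rule: a narrow exception plus a comment explaining why doing nothing is correct.

import os

def remove_if_present(path):
    try:
        os.remove(path)
    except FileNotFoundError:
        # Intentionally empty: the file is already gone, which is exactly
        # the state we wanted, so there is nothing to clean up or report.
        pass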


回答 12

在我看来,错误的出现都是有原因的。这听起来可能很蠢,但事实就是如此。良好的编程只在您必须处理错误时才引发错误。另外,正如我前段时间读到的,“pass 语句表示稍后会在此处插入代码”,因此,如果您想写一个空的 except 语句,尽管去做,但对于一个好的程序来说,这样就缺了一部分,因为您没有处理本应处理的事情。出现的异常让您有机会修正输入数据或调整数据结构,使这些异常不再发生;但在大多数情况下(网络异常、一般输入异常),异常表明程序的后续部分将无法正常执行。例如,NetworkException 可能表示网络连接已断开,程序在接下来的步骤中无法发送/接收数据。

但是,只在其中一个 except 块里使用 pass 是可以的,因为您仍然区分了异常类型;换个角度看,如果把所有 except 块合并成一个,它并不是空的:

try:
    #code here
except Error1:
    #exception handle1

except Error2:
    #exception handle2
#and so on

可以这样重写:

try:
    #code here
except BaseException as e:
    if isinstance(e, Error1):
        #exception handle1

    elif isinstance(e, Error2):
        #exception handle2

    ...

    else:
        raise

因此,即使是多个带有 pass 语句的 except 块,也可以构成按特定异常类型进行处理的代码结构。

In my opinion, errors have a reason to appear. That may sound stupid, but that’s the way it is. Good programming only raises errors when you have to handle them. Also, as I read some time ago, “the pass statement is a statement that shows code will be inserted later”, so if you want to have an empty except statement, feel free to do so, but in a good program a part will then be missing, because you don’t handle the things you should have. Exceptions that appear give you the chance to correct input data or to change your data structure so that these exceptions don’t occur again, but in most cases (network exceptions, general input exceptions) an exception indicates that the next parts of the program won’t execute well. For example, a NetworkException can indicate a broken network connection, and the program can’t send/receive data in the next program steps.

But using a pass block for only one exception block is valid, because you still differentiate between the types of exceptions; seen that way, if you put all the exception blocks into one, it is not empty:

try:
    #code here
except Error1:
    #exception handle1

except Error2:
    #exception handle2
#and so on

can be rewritten that way:

try:
    #code here
except BaseException as e:
    if isinstance(e, Error1):
        #exception handle1

    elif isinstance(e, Error2):
        #exception handle2

    ...

    else:
        raise

So even multiple except blocks with pass statements can result in code whose structure handles specific types of exceptions.
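
As a concrete, purely illustrative instance of the same pattern (parse_port and DEFAULT_PORT are made-up names, not from the original answer):

DEFAULT_PORT = 8080

def parse_port(text):
    try:
        return int(text)
    except BaseException as e:
        if isinstance(e, ValueError):
            return DEFAULT_PORT  # "oops" is not a number: fall back
        elif isinstance(e, TypeError):
            raise TypeError('port must be a string or a number') from e
        else:
            raise  # everything else is not ours to handle

print(parse_port('8000'))  # 8000
print(parse_port('oops'))  # 8080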


回答 13

到目前为止提出的所有评论均有效。在可能的情况下,您需要指定要忽略的异常。在可能的情况下,您需要分析导致异常的原因,只忽略您要忽略的内容,而不要忽略其余的内容。如果异常导致应用程序“严重崩溃”,那么就这样吧,因为比起掩盖曾经发生过的问题,了解意外事件在发生时更为重要。

综上所述,不要把任何一条编程实践奉为至高无上的准则,那样很愚蠢。总会有适合使用“忽略所有异常”块的时机和场合。

把规则奉为至高无上的另一个愚蠢例子是对 goto 语句的态度。我上学时,教授教我们 goto 语句,只是为了告诫我们永远不要使用它。不要相信那些告诉您某个东西永远不该用、在任何场景下都不会有用的人。总会有例外的场景。

All comments brought up so far are valid. Where possible you need to specify exactly which exception you want to ignore. Where possible you need to analyze what caused the exception, and only ignore what you meant to ignore, and not the rest. If an exception causes the application to “crash spectacularly”, then so be it, because it’s much more important to know the unexpected happened when it happened than to conceal that the problem ever occurred.

With all that said, do not take any programming practice as paramount. This is stupid. There is always a time and place for an ignore-all-exceptions block.

Another example of such idiotic dogma is the usage of the goto operator. When I was in school, our professor taught us the goto operator just to mention that thou shalt not use it, EVER. Don’t believe people telling you that xyz should never be used and that there cannot be a scenario when it is useful. There always is.


回答 14

错误处理在编程中非常重要。您确实需要向用户展示出了什么问题。只有在极少数情况下您才可以忽略这些错误;除此之外,忽略错误是非常糟糕的编程习惯。

Handling errors is very important in programming. You do need to show the user what went wrong. Only in very few cases can you ignore the errors; otherwise it is very bad programming practice.


回答 15

由于尚未有人提及:更好的写法是使用 contextlib.suppress:

from contextlib import suppress
import os
with suppress(FileNotFoundError):
    os.remove('somefile.tmp')

请注意,在提供的示例中,无论是否发生异常,程序状态均保持不变。也就是说,somefile.tmp总是不存在。

Since it hasn’t been mentioned yet, it’s better style to use contextlib.suppress:

from contextlib import suppress
import os
with suppress(FileNotFoundError):
    os.remove('somefile.tmp')

Notice that in the example provided, the program state remains the same, whether or not the exception occurs. That is to say, somefile.tmp always becomes non-existent.


如何在Python中打印异常?

问题:如何在Python中打印异常?

try:
    something here
except:
    print('the whatever error occurred.')

如何在except:块中打印错误/异常?

try:
    something here
except:
    print('the whatever error occurred.')

How can I print the error/exception in my except: block?


回答 0

对于Python 2.6和更高版本以及Python 3.x:

except Exception as e: print(e)

对于Python 2.5及更早版本,请使用:

except Exception,e: print str(e)

For Python 2.6 and later and Python 3.x:

except Exception as e: print(e)

For Python 2.5 and earlier, use:

except Exception,e: print str(e)

回答 1

traceback模块提供了格式化和打印异常及其回溯的方法,例如,它将像默认处理程序那样打印异常:

import traceback

try:
    1/0
except Exception:
    traceback.print_exc()

输出:

Traceback (most recent call last):
  File "C:\scripts\divide_by_zero.py", line 4, in <module>
    1/0
ZeroDivisionError: division by zero

The traceback module provides methods for formatting and printing exceptions and their tracebacks, e.g. this would print exception like the default handler does:

import traceback

try:
    1/0
except Exception:
    traceback.print_exc()

Output:

Traceback (most recent call last):
  File "C:\scripts\divide_by_zero.py", line 4, in <module>
    1/0
ZeroDivisionError: division by zero

回答 2

在Python 2.6或更高版本中,它更干净一些:

except Exception as e: print(e)

在旧版本中,它仍然很可读:

except Exception, e: print e

In Python 2.6 or greater it’s a bit cleaner:

except Exception as e: print(e)

In older versions it’s still quite readable:

except Exception, e: print e

回答 3

如果您想传递错误字符串,这是错误和异常(Python 2.6)中的示例

>>> try:
...    raise Exception('spam', 'eggs')
... except Exception as inst:
...    print type(inst)     # the exception instance
...    print inst.args      # arguments stored in .args
...    print inst           # __str__ allows args to be printed directly
...    x, y = inst          # __getitem__ allows args to be unpacked directly
...    print 'x =', x
...    print 'y =', y
...
<type 'exceptions.Exception'>
('spam', 'eggs')
('spam', 'eggs')
x = spam
y = eggs

In case you want to pass error strings, here is an example from Errors and Exceptions (Python 2.6)

>>> try:
...    raise Exception('spam', 'eggs')
... except Exception as inst:
...    print type(inst)     # the exception instance
...    print inst.args      # arguments stored in .args
...    print inst           # __str__ allows args to be printed directly
...    x, y = inst          # __getitem__ allows args to be unpacked directly
...    print 'x =', x
...    print 'y =', y
...
<type 'exceptions.Exception'>
('spam', 'eggs')
('spam', 'eggs')
x = spam
y = eggs
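
Note that the tuple-unpacking line x, y = inst only works in Python 2, where the exception’s __getitem__ exposes its args; in Python 3 exceptions are not iterable, so unpack inst.args instead. A minimal Python 3 version of the same example:

try:
    raise Exception('spam', 'eggs')
except Exception as inst:
    print(type(inst))   # <class 'Exception'>
    print(inst.args)    # ('spam', 'eggs')
    print(inst)         # ('spam', 'eggs')
    x, y = inst.args    # unpack via .args in Python 3
    print('x =', x)
    print('y =', y)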

回答 4

(我打算将其作为对@jldupont答案的评论,但我没有足够的声誉。)

我在其他地方也看到过类似@jldupont的答案的答案。FWIW,我认为必须注意以下几点:

except Exception as e:
    print(e)

默认情况下会将错误输出打印到 sys.stdout。通常,更合适的错误处理方法是:

except Exception as e:
    print(e, file=sys.stderr)

(请注意,您必须先 import sys 才能这样做。)这样,错误会被打印到 STDERR 而不是 STDOUT,从而可以正确地对输出进行解析/重定向等。我知道问题只是关于“打印错误”的,但在这里指出最佳实践似乎很重要,而不是略去这个细节,让还没学到更好做法的人写出不规范的代码。

我没有像 Cat Plus Plus 的答案那样使用 traceback 模块,也许那才是最好的方法,但我想还是把这个方法写在这里。

(I was going to leave this as a comment on @jldupont’s answer, but I don’t have enough reputation.)

I’ve seen answers like @jldupont’s answer in other places as well. FWIW, I think it’s important to note that this:

except Exception as e:
    print(e)

will print the error output to sys.stdout by default. A more appropriate approach to error handling in general would be:

except Exception as e:
    print(e, file=sys.stderr)

(Note that you have to import sys for this to work.) This way, the error is printed to STDERR instead of STDOUT, which allows for the proper output parsing/redirection/etc. I understand that the question was strictly about ‘printing an error’, but it seems important to point out the best practice here rather than leave out this detail that could lead to non-standard code for anyone who doesn’t eventually learn better.

I haven’t used the traceback module as in Cat Plus Plus’s answer, and maybe that’s the best way, but I thought I’d throw this out there.


回答 5

Python 3: logging

除了使用基本的 print() 函数之外,还可以使用更灵活的 logging 模块来记录异常。logging 模块提供了许多额外的功能,例如将消息记录到指定的日志文件、为消息加上时间戳,以及记录发生日志记录的位置等附加信息。(有关更多信息,请查看官方文档。)

可以使用模块级函数 logging.exception() 来记录异常,如下所示:

import logging

try:
    1/0
except BaseException:
    logging.exception("An exception was thrown!")

输出:

ERROR:root:An exception was thrown!
Traceback (most recent call last):
  File ".../Desktop/test.py", line 4, in <module>
    1/0
ZeroDivisionError: division by zero 

笔记:

  • 函数 logging.exception() 只应从异常处理程序中调用

  • logging模块不应在日志记录处理程序中使用,以免出现RecursionError(感谢@PrakharPandey)


备用日志级别

也可以通过关键字参数 exc_info=True 将异常记录在其他日志级别,如下所示:

logging.debug("An exception was thrown!", exc_info=True)
logging.info("An exception was thrown!", exc_info=True)
logging.warning("An exception was thrown!", exc_info=True)

Python 3: logging

Instead of using the basic print() function, the more flexible logging module can be used to log the exception. The logging module offers a lot of extra functionality, e.g. logging messages into a given log file, logging messages with timestamps and additional information about where the logging happened. (For more information check out the official documentation.)

Logging an exception can be done with the module-level function logging.exception() like so:

import logging

try:
    1/0
except BaseException:
    logging.exception("An exception was thrown!")

Output:

ERROR:root:An exception was thrown!
Traceback (most recent call last):
  File ".../Desktop/test.py", line 4, in <module>
    1/0
ZeroDivisionError: division by zero 

Notes:

  • the function logging.exception() should only be called from an exception handler

  • the logging module should not be used inside a logging handler to avoid a RecursionError (thanks @PrakharPandey)


Alternative log-levels

It’s also possible to log the exception with another log-level by using the keyword argument exc_info=True like so:

logging.debug("An exception was thrown!", exc_info=True)
logging.info("An exception was thrown!", exc_info=True)
logging.warning("An exception was thrown!", exc_info=True)

回答 6

如果这正是您想做的,可以用 assert 语句实现单行的错误抛出。这将帮助您编写可静态检查修复的代码,并及早发现错误。

assert type(A) is type(""), "requires a string"

One-liner error raising can be done with assert statements, if that’s what you want to do. This will help you write statically fixable code and check errors early.

assert type(A) is type(""), "requires a string"
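
As a side note (not from the original answer): assert statements are stripped entirely when Python runs with the -O flag, so they suit internal sanity checks rather than user-facing validation, and isinstance is the more idiomatic type check. A small sketch under those assumptions:

def shout(text):
    assert isinstance(text, str), "requires a string"  # removed under `python -O`
    return text.upper()

print(shout("hello"))  # HELLO
# shout(42) would raise AssertionError: requires a string (when asserts are enabled)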

回答 7

在捕获异常时,您几乎可以完全控制要显示/记录回溯中的哪些信息。

以下代码

with open("not_existing_file.txt", 'r') as text:
    pass

将产生以下回溯:

Traceback (most recent call last):
  File "exception_checks.py", line 19, in <module>
    with open("not_existing_file.txt", 'r') as text:
FileNotFoundError: [Errno 2] No such file or directory: 'not_existing_file.txt'

打印/记录完整的追溯

正如其他人已经提到的那样,您可以使用traceback模块捕获整个traceback:

import traceback
try:
    with open("not_existing_file.txt", 'r') as text:
        pass
except Exception as exception:
    traceback.print_exc()

这将产生以下输出:

Traceback (most recent call last):
  File "exception_checks.py", line 19, in <module>
    with open("not_existing_file.txt", 'r') as text:
FileNotFoundError: [Errno 2] No such file or directory: 'not_existing_file.txt'

您可以通过使用日志记录来实现相同目的:

# 假定已配置好 logger,例如:logger = logging.getLogger(__name__)
try:
    with open("not_existing_file.txt", 'r') as text:
        pass
except Exception as exception:
    logger.error(exception, exc_info=True)

输出:

__main__: 2020-05-27 12:10:47-ERROR- [Errno 2] No such file or directory: 'not_existing_file.txt'
Traceback (most recent call last):
  File "exception_checks.py", line 27, in <module>
    with open("not_existing_file.txt", 'r') as text:
FileNotFoundError: [Errno 2] No such file or directory: 'not_existing_file.txt'

仅打印/记录错误名称/消息

您可能对整个回溯不感兴趣,而只关心最重要的信息(例如异常名称和异常消息),这时请使用:

try:
    with open("not_existing_file.txt", 'r') as text:
        pass
except Exception as exception:
    print("Exception: {}".format(type(exception).__name__))
    print("Exception message: {}".format(exception))

输出:

Exception: FileNotFoundError
Exception message: [Errno 2] No such file or directory: 'not_existing_file.txt'

One has pretty much full control over which information from the traceback is displayed/logged when catching exceptions.

The code

with open("not_existing_file.txt", 'r') as text:
    pass

would produce the following traceback:

Traceback (most recent call last):
  File "exception_checks.py", line 19, in <module>
    with open("not_existing_file.txt", 'r') as text:
FileNotFoundError: [Errno 2] No such file or directory: 'not_existing_file.txt'

Print/Log the full traceback

As others already mentioned, you can catch the whole traceback by using the traceback module:

import traceback
try:
    with open("not_existing_file.txt", 'r') as text:
        pass
except Exception as exception:
    traceback.print_exc()

This will produce the following output:

Traceback (most recent call last):
  File "exception_checks.py", line 19, in <module>
    with open("not_existing_file.txt", 'r') as text:
FileNotFoundError: [Errno 2] No such file or directory: 'not_existing_file.txt'

You can achieve the same by using logging:

# assumes a configured logger, e.g. logger = logging.getLogger(__name__)
try:
    with open("not_existing_file.txt", 'r') as text:
        pass
except Exception as exception:
    logger.error(exception, exc_info=True)

Output:

__main__: 2020-05-27 12:10:47-ERROR- [Errno 2] No such file or directory: 'not_existing_file.txt'
Traceback (most recent call last):
  File "exception_checks.py", line 27, in <module>
    with open("not_existing_file.txt", 'r') as text:
FileNotFoundError: [Errno 2] No such file or directory: 'not_existing_file.txt'

Print/log error name/message only

You might not be interested in the whole traceback, but only in the most important information, such as the exception name and the exception message. In that case, use:

try:
    with open("not_existing_file.txt", 'r') as text:
        pass
except Exception as exception:
    print("Exception: {}".format(type(exception).__name__))
    print("Exception message: {}".format(exception))

Output:

Exception: FileNotFoundError
Exception message: [Errno 2] No such file or directory: 'not_existing_file.txt'