Python 实用宝典

Question 1

Is there a Pythonic way to have only one instance of a program running?

The only reasonable solution I’ve come up with is trying to run it as a server on some port, then second program trying to bind to same port – fails. But it’s not really a great idea, maybe there’s something more lightweight than this?

(Take into consideration that program is expected to fail sometimes, i.e. segfault – so things like “lock file” won’t work)

Question 2

The following code should do the job, it is cross-platform and runs on Python 2.4-3.2. I tested it on Windows, OS X and Linux.

from tendo import singleton
me = singleton.SingleInstance() # will sys.exit(-1) if other instance is running

The latest code version is available singleton.py. Please file bugs here.

You can install tend using one of the following methods:

easy_install tendo
pip install tendo
manually by getting it from http://pypi.python.org/pypi/tendo

Question 3

Simple, ~~cross-platform~~ solution, found in another question by zgoda:

import fcntl
import os
import sys

def instance_already_running(label="default"):
    """
    Detect if an an instance with the label is already running, globally
    at the operating system level.

    Using `os.open` ensures that the file pointer won't be closed
    by Python's garbage collector after the function's scope is exited.

    The lock will be released when the program exits, or could be
    released if the file pointer were closed.
    """

    lock_file_pointer = os.open(f"/tmp/instance_{label}.lock", os.O_WRONLY)

    try:
        fcntl.lockf(lock_file_pointer, fcntl.LOCK_EX | fcntl.LOCK_NB)
        already_running = False
    except IOError:
        already_running = True

    return already_running

A lot like S.Lott’s suggestion, but with the code.

Question 4

This code is Linux specific. It uses ‘abstract’ UNIX domain sockets, but it is simple and won’t leave stale lock files around. I prefer it to the solution above because it doesn’t require a specially reserved TCP port.

try:
    import socket
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    ## Create an abstract socket, by prefixing it with null. 
    s.bind( '\0postconnect_gateway_notify_lock') 
except socket.error as e:
    error_code = e.args[0]
    error_string = e.args[1]
    print "Process already running (%d:%s ). Exiting" % ( error_code, error_string) 
    sys.exit (0)

The unique string postconnect_gateway_notify_lock can be changed to allow multiple programs that need a single instance enforced.

Question 5

I don’t know if it’s pythonic enough, but in the Java world listening on a defined port is a pretty widely used solution, as it works on all major platforms and doesn’t have any problems with crashing programs.

Another advantage of listening to a port is that you could send a command to the running instance. For example when the users starts the program a second time, you could send the running instance a command to tell it to open another window (that’s what Firefox does, for example. I don’t know if they use TCP ports or named pipes or something like that, ‘though).

Question 6

Never written python before, but this is what I’ve just implemented in mycheckpoint, to prevent it being started twice or more by crond:

import os
import sys
import fcntl
fh=0
def run_once():
    global fh
    fh=open(os.path.realpath(__file__),'r')
    try:
        fcntl.flock(fh,fcntl.LOCK_EX|fcntl.LOCK_NB)
    except:
        os._exit(0)

run_once()

Found Slava-N’s suggestion after posting this in another issue (http://stackoverflow.com/questions/2959474). This one is called as a function, locks the executing scripts file (not a pid file) and maintains the lock until the script ends (normal or error).

Question 7

Use a pid file. You have some known location, “/path/to/pidfile” and at startup you do something like this (partially pseudocode because I’m pre-coffee and don’t want to work all that hard):

import os, os.path
pidfilePath = """/path/to/pidfile"""
if os.path.exists(pidfilePath):
   pidfile = open(pidfilePath,"r")
   pidString = pidfile.read()
   if <pidString is equal to os.getpid()>:
      # something is real weird
      Sys.exit(BADCODE)
   else:
      <use ps or pidof to see if the process with pid pidString is still running>
      if  <process with pid == 'pidString' is still running>:
          Sys.exit(ALREADAYRUNNING)
      else:
          # the previous server must have crashed
          <log server had crashed>
          <reopen pidfilePath for writing>
          pidfile.write(os.getpid())
else:
    <open pidfilePath for writing>
    pidfile.write(os.getpid())

So, in other words, you’re checking if a pidfile exists; if not, write your pid to that file. If the pidfile does exist, then check to see if the pid is the pid of a running process; if so, then you’ve got another live process running, so just shut down. If not, then the previous process crashed, so log it, and then write your own pid to the file in place of the old one. Then continue.

Question 8

You already found reply to similar question in another thread, so for completeness sake see how to achieve the same on Windows uning named mutex.

http://code.activestate.com/recipes/474070/

Question 9

This may work.

Attempt create a PID file to a known location. If you fail, someone has the file locked, you’re done.
When you finish normally, close and remove the PID file, so someone else can overwrite it.

You can wrap your program in a shell script that removes the PID file even if your program crashes.

You can, also, use the PID file to kill the program if it hangs.

Question 10

Using a lock-file is a quite common approach on unix. If it crashes, you have to clean up manually. You could stor the PID in the file, and on startup check if there is a process with this PID, overriding the lock-file if not. (However, you also need a lock around the read-file-check-pid-rewrite-file). You will find what you need for getting and checking pid in the os-package. The common way of checking if there exists a process with a given pid, is to send it a non-fatal signal.

Other alternatives could be combining this with flock or posix semaphores.

Opening a network socket, as saua proposed, would probably be the easiest and most portable.

Question 11

For anybody using wxPython for their application, you can use the function wx.SingleInstanceChecker documented here.

I personally use a subclass of wx.App which makes use of wx.SingleInstanceChecker and returns False from OnInit() if there is an existing instance of the app already executing like so:

import wx

class SingleApp(wx.App):
    """
    class that extends wx.App and only permits a single running instance.
    """

    def OnInit(self):
        """
        wx.App init function that returns False if the app is already running.
        """
        self.name = "SingleApp-%s".format(wx.GetUserId())
        self.instance = wx.SingleInstanceChecker(self.name)
        if self.instance.IsAnotherRunning():
            wx.MessageBox(
                "An instance of the application is already running", 
                "Error", 
                 wx.OK | wx.ICON_WARNING
            )
            return False
        return True

This is a simple drop-in replacement for wx.App that prohibits multiple instances. To use it simply replace wx.App with SingleApp in your code like so:

app = SingleApp(redirect=False)
frame = wx.Frame(None, wx.ID_ANY, "Hello World")
frame.Show(True)
app.MainLoop()

Question 12

Here is my eventual Windows-only solution. Put the following into a module, perhaps called ‘onlyone.py’, or whatever. Include that module directly into your __ main __ python script file.

import win32event, win32api, winerror, time, sys, os
main_path = os.path.abspath(sys.modules['__main__'].__file__).replace("\\", "/")

first = True
while True:
        mutex = win32event.CreateMutex(None, False, main_path + "_{<paste YOUR GUID HERE>}")
        if win32api.GetLastError() == 0:
            break
        win32api.CloseHandle(mutex)
        if first:
            print "Another instance of %s running, please wait for completion" % main_path
            first = False
        time.sleep(1)

Explanation

The code attempts to create a mutex with name derived from the full path to the script. We use forward-slashes to avoid potential confusion with the real file system.

Advantages

No configuration or ‘magic’ identifiers needed, use it in as many different scripts as needed.
No stale files left around, the mutex dies with you.
Prints a helpful message when waiting

Question 13

The best solution for this on windows is to use mutexes as suggested by @zgoda.

import win32event
import win32api
from winerror import ERROR_ALREADY_EXISTS

mutex = win32event.CreateMutex(None, False, 'name')
last_error = win32api.GetLastError()

if last_error == ERROR_ALREADY_EXISTS:
   print("App instance already running")

Some answers use fctnl (included also in @sorin tendo package) which is not available on windows and should you try to freeze your python app using a package like pyinstaller which does static imports, it throws an error.

Also, using the lock file method, creates a read-only problem with database files( experienced this with sqlite3).

Question 14

I’m posting this as an answer because I’m a new user and Stack Overflow won’t let me vote yet.

Sorin Sbarnea’s solution works for me under OS X, Linux and Windows, and I am grateful for it.

However, tempfile.gettempdir() behaves one way under OS X and Windows and another under other some/many/all(?) *nixes (ignoring the fact that OS X is also Unix!). The difference is important to this code.

OS X and Windows have user-specific temp directories, so a tempfile created by one user isn’t visible to another user. By contrast, under many versions of *nix (I tested Ubuntu 9, RHEL 5, OpenSolaris 2008 and FreeBSD 8), the temp dir is /tmp for all users.

That means that when the lockfile is created on a multi-user machine, it’s created in /tmp and only the user who creates the lockfile the first time will be able to run the application.

A possible solution is to embed the current username in the name of the lock file.

It’s worth noting that the OP’s solution of grabbing a port will also misbehave on a multi-user machine.

Question 15

I use single_process on my gentoo;

pip install single_process

example:

from single_process import single_process

@single_process
def main():
    print 1

if __name__ == "__main__":
    main()

refer: https://pypi.python.org/pypi/single_process/1.0

Question 16

I keep suspecting there ought to be a good POSIXy solution using process groups, without having to hit the file system, but I can’t quite nail it down. Something like:

On startup, your process sends a ‘kill -0’ to all processes in a particular group. If any such processes exist, it exits. Then it joins the group. No other processes use that group.

However, this has a race condition – multiple processes could all do this at precisely the same time and all end up joining the group and running simultaneously. By the time you’ve added some sort of mutex to make it watertight, you no longer need the process groups.

This might be acceptable if your process only gets started by cron, once every minute or every hour, but it makes me a bit nervous that it would go wrong precisely on the day when you don’t want it to.

I guess this isn’t a very good solution after all, unless someone can improve on it?

Question 17

I ran into this exact problem last week, and although I did find some good solutions, I decided to make a very simple and clean python package and uploaded it to PyPI. It differs from tendo in that it can lock any string resource name. Although you could certainly lock __file__ to achieve the same effect.

Install with: pip install quicklock

Using it is extremely simple:

[nate@Nates-MacBook-Pro-3 ~/live] python
Python 2.7.6 (default, Sep  9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from quicklock import singleton
>>> # Let's create a lock so that only one instance of a script will run
...
>>> singleton('hello world')
>>>
>>> # Let's try to do that again, this should fail
...
>>> singleton('hello world')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/nate/live/gallery/env/lib/python2.7/site-packages/quicklock/quicklock.py", line 47, in singleton
    raise RuntimeError('Resource <{}> is currently locked by <Process {}: "{}">'.format(resource, other_process.pid, other_process.name()))
RuntimeError: Resource <hello world> is currently locked by <Process 24801: "python">
>>>
>>> # But if we quit this process, we release the lock automatically
...
>>> ^D
[nate@Nates-MacBook-Pro-3 ~/live] python
Python 2.7.6 (default, Sep  9 2014, 15:04:36)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.39)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> from quicklock import singleton
>>> singleton('hello world')
>>>
>>> # No exception was thrown, we own 'hello world'!

Take a look: https://pypi.python.org/pypi/quicklock

Question 18

Building upon Roberto Rosario’s answer, I come up with the following function:

SOCKET = None
def run_single_instance(uniq_name):
    try:
        import socket
        global SOCKET
        SOCKET = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        ## Create an abstract socket, by prefixing it with null.
        # this relies on a feature only in linux, when current process quits, the
        # socket will be deleted.
        SOCKET.bind('\0' + uniq_name)
        return True
    except socket.error as e:
        return False

We need to define global SOCKET vaiable since it will only be garbage collected when the whole process quits. If we declare a local variable in the function, it will go out of scope after the function exits, thus the socket be deleted.

All the credit should go to Roberto Rosario, since I only clarify and elaborate upon his code. And this code will work only on Linux, as the following quoted text from https://troydhanson.github.io/network/Unix_domain_sockets.html explains:

Linux has a special feature: if the pathname for a UNIX domain socket begins with a null byte \0, its name is not mapped into the filesystem. Thus it won’t collide with other names in the filesystem. Also, when a server closes its UNIX domain listening socket in the abstract namespace, its file is deleted; with regular UNIX domain sockets, the file persists after the server closes it.

Question 19

linux example

This method is based on the creation of a temporary file automatically deleted after you close the application. the program launch we verify the existence of the file; if the file exists ( there is a pending execution) , the program is closed ; otherwise it creates the file and continues the execution of the program.

from tempfile import *
import time
import os
import sys


f = NamedTemporaryFile( prefix='lock01_', delete=True) if not [f  for f in     os.listdir('/tmp') if f.find('lock01_')!=-1] else sys.exit()

YOUR CODE COMES HERE

Question 20

On a Linux system one could also ask pgrep -a for the number of instances, the script is found in the process list (option -a reveals the full command line string). E.g.

import os
import sys
import subprocess

procOut = subprocess.check_output( "/bin/pgrep -u $UID -a python", shell=True, 
                                   executable="/bin/bash", universal_newlines=True)

if procOut.count( os.path.basename(__file__)) > 1 :        
    sys.exit( ("found another instance of >{}<, quitting."
              ).format( os.path.basename(__file__)))

Remove -u $UID if the restriction should apply to all users. Disclaimer: a) it is assumed that the script’s (base)name is unique, b) there might be race conditions.

Question 21

import sys,os

# start program
try:  # (1)
    os.unlink('lock')  # (2)
    fd=os.open("lock", os.O_CREAT|os.O_EXCL) # (3)  
except: 
    try: fd=os.open("lock", os.O_CREAT|os.O_EXCL) # (4) 
    except:  
        print "Another Program running !.."  # (5)
        sys.exit()  

# your program  ...
# ...

# exit program
try: os.close(fd)  # (6)
except: pass
try: os.unlink('lock')  
except: pass
sys.exit()

Question 22

Is there a way to check to see if a pid corresponds to a valid process? I’m getting a pid from a different source other than from os.getpid() and I need to check to see if a process with that pid doesn’t exist on the machine.

I need it to be available in Unix and Windows. I’m also checking to see if the PID is NOT in use.

Question 23

Sending signal 0 to a pid will raise an OSError exception if the pid is not running, and do nothing otherwise.

import os

def check_pid(pid):        
    """ Check For the existence of a unix pid. """
    try:
        os.kill(pid, 0)
    except OSError:
        return False
    else:
        return True

Question 24

Have a look at the psutil module:

psutil (python system and process utilities) is a cross-platform library for retrieving information on running processes and system utilization (CPU, memory, disks, network) in Python. […] It currently supports Linux, Windows, OSX, FreeBSD and Sun Solaris, both 32-bit and 64-bit architectures, with Python versions from 2.6 to 3.4 (users of Python 2.4 and 2.5 may use 2.1.3 version). PyPy is also known to work.

It has a function called pid_exists() that you can use to check whether a process with the given pid exists.

Here’s an example:

import psutil
pid = 12345
if psutil.pid_exists(pid):
    print("a process with pid %d exists" % pid)
else:
    print("a process with pid %d does not exist" % pid)

For reference:

Question 25

mluebke code is not 100% correct; kill() can also raise EPERM (access denied) in which case that obviously means a process exists. This is supposed to work:

(edited as per Jason R. Coombs comments)

import errno
import os

def pid_exists(pid):
    """Check whether pid exists in the current process table.
    UNIX only.
    """
    if pid < 0:
        return False
    if pid == 0:
        # According to "man 2 kill" PID 0 refers to every process
        # in the process group of the calling process.
        # On certain systems 0 is a valid PID but we have no way
        # to know that in a portable fashion.
        raise ValueError('invalid PID 0')
    try:
        os.kill(pid, 0)
    except OSError as err:
        if err.errno == errno.ESRCH:
            # ESRCH == No such process
            return False
        elif err.errno == errno.EPERM:
            # EPERM clearly means there's a process to deny access to
            return True
        else:
            # According to "man 2 kill" possible error values are
            # (EINVAL, EPERM, ESRCH)
            raise
    else:
        return True

You can’t do this on Windows unless you use pywin32, ctypes or a C extension module. If you’re OK with depending from an external lib you can use psutil:

>>> import psutil
>>> psutil.pid_exists(2353)
True

Question 26

The answers involving sending ‘signal 0’ to the process will work only if the process in question is owned by the user running the test. Otherwise you will get an OSError due to permissions, even if the pid exists in the system.

In order to bypass this limitation you can check if /proc/<pid> exists:

import os

def is_running(pid):
    if os.path.isdir('/proc/{}'.format(pid)):
        return True
    return False

This applies to linux based systems only, obviously.

Question 27

In Python 3.3+, you could use exception names instead of errno constants. Posix version:

import os

def pid_exists(pid): 
    if pid < 0: return False #NOTE: pid == 0 returns True
    try:
        os.kill(pid, 0) 
    except ProcessLookupError: # errno.ESRCH
        return False # No such process
    except PermissionError: # errno.EPERM
        return True # Operation not permitted (i.e., process exists)
    else:
        return True # no error, we can send a signal to the process

Question 28

Look here for windows-specific way of getting full list of running processes with their IDs. It would be something like

from win32com.client import GetObject
def get_proclist():
    WMI = GetObject('winmgmts:')
    processes = WMI.InstancesOf('Win32_Process')
    return [process.Properties_('ProcessID').Value for process in processes]

You can then verify pid you get against this list. I have no idea about performance cost, so you’d better check this if you’re going to do pid verification often.

For *NIx, just use mluebke’s solution.

Question 29

Building upon ntrrgc’s I’ve beefed up the windows version so it checks the process exit code and checks for permissions:

def pid_exists(pid):
    """Check whether pid exists in the current process table."""
    if os.name == 'posix':
        import errno
        if pid < 0:
            return False
        try:
            os.kill(pid, 0)
        except OSError as e:
            return e.errno == errno.EPERM
        else:
            return True
    else:
        import ctypes
        kernel32 = ctypes.windll.kernel32
        HANDLE = ctypes.c_void_p
        DWORD = ctypes.c_ulong
        LPDWORD = ctypes.POINTER(DWORD)
        class ExitCodeProcess(ctypes.Structure):
            _fields_ = [ ('hProcess', HANDLE),
                ('lpExitCode', LPDWORD)]

        SYNCHRONIZE = 0x100000
        process = kernel32.OpenProcess(SYNCHRONIZE, 0, pid)
        if not process:
            return False

        ec = ExitCodeProcess()
        out = kernel32.GetExitCodeProcess(process, ctypes.byref(ec))
        if not out:
            err = kernel32.GetLastError()
            if kernel32.GetLastError() == 5:
                # Access is denied.
                logging.warning("Access is denied to get pid info.")
            kernel32.CloseHandle(process)
            return False
        elif bool(ec.lpExitCode):
            # print ec.lpExitCode.contents
            # There is an exist code, it quit
            kernel32.CloseHandle(process)
            return False
        # No exit code, it's running.
        kernel32.CloseHandle(process)
        return True

Question 30

Combining Giampaolo Rodolà’s answer for POSIX and mine for Windows I got this:

import os
if os.name == 'posix':
    def pid_exists(pid):
        """Check whether pid exists in the current process table."""
        import errno
        if pid < 0:
            return False
        try:
            os.kill(pid, 0)
        except OSError as e:
            return e.errno == errno.EPERM
        else:
            return True
else:
    def pid_exists(pid):
        import ctypes
        kernel32 = ctypes.windll.kernel32
        SYNCHRONIZE = 0x100000

        process = kernel32.OpenProcess(SYNCHRONIZE, 0, pid)
        if process != 0:
            kernel32.CloseHandle(process)
            return True
        else:
            return False

Question 31

In Windows, you can do it in this way:

import ctypes
PROCESS_QUERY_INFROMATION = 0x1000
def checkPid(pid):
    processHandle = ctypes.windll.kernel32.OpenProcess(PROCESS_QUERY_INFROMATION, 0,pid)
    if processHandle == 0:
        return False
    else:
        ctypes.windll.kernel32.CloseHandle(processHandle)
    return True

First of all, in this code you try to get a handle for process with pid given. If the handle is valid, then close the handle for process and return True; otherwise, you return False. Documentation for OpenProcess: https://msdn.microsoft.com/en-us/library/windows/desktop/ms684320%28v=vs.85%29.aspx

Question 32

This will work for Linux, for example if you want to check if banshee is running… (banshee is a music player)

import subprocess

def running_process(process):
    "check if process is running. < process > is the name of the process."

    proc = subprocess.Popen(["if pgrep " + process + " >/dev/null 2>&1; then echo 'True'; else echo 'False'; fi"], stdout=subprocess.PIPE, shell=True)

    (Process_Existance, err) = proc.communicate()
    return Process_Existance

# use the function
print running_process("banshee")

Question 33

I’d say use the PID for whatever purpose you’re obtaining it and handle the errors gracefully. Otherwise, it’s a classic race (the PID may be valid when you check it’s valid, but go away an instant later)

Question 34

I have a python daemon running as a part of my web app/ How can I quickly check (using python) if my daemon is running and, if not, launch it?

I want to do it that way to fix any crashes of the daemon, and so the script does not have to be run manually, it will automatically run as soon as it is called and then stay running.

How can i check (using python) if my script is running?

Question 35

Drop a pidfile somewhere (e.g. /tmp). Then you can check to see if the process is running by checking to see if the PID in the file exists. Don’t forget to delete the file when you shut down cleanly, and check for it when you start up.

#/usr/bin/env python

import os
import sys

pid = str(os.getpid())
pidfile = "/tmp/mydaemon.pid"

if os.path.isfile(pidfile):
    print "%s already exists, exiting" % pidfile
    sys.exit()
file(pidfile, 'w').write(pid)
try:
    # Do some actual work here
finally:
    os.unlink(pidfile)

Then you can check to see if the process is running by checking to see if the contents of /tmp/mydaemon.pid are an existing process. Monit (mentioned above) can do this for you, or you can write a simple shell script to check it for you using the return code from ps.

ps up `cat /tmp/mydaemon.pid ` >/dev/null && echo "Running" || echo "Not running"

For extra credit, you can use the atexit module to ensure that your program cleans up its pidfile under any circumstances (when killed, exceptions raised, etc.).

Question 36

A technique that is handy on a Linux system is using domain sockets:

import socket
import sys
import time

def get_lock(process_name):
    # Without holding a reference to our socket somewhere it gets garbage
    # collected when the function exits
    get_lock._lock_socket = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)

    try:
        # The null byte (\0) means the socket is created 
        # in the abstract namespace instead of being created 
        # on the file system itself.
        # Works only in Linux
        get_lock._lock_socket.bind('\0' + process_name)
        print 'I got the lock'
    except socket.error:
        print 'lock exists'
        sys.exit()


get_lock('running_test')
while True:
    time.sleep(3)

It is atomic and avoids the problem of having lock files lying around if your process gets sent a SIGKILL

You can read in the documentation for socket.close that sockets are automatically closed when garbage collected.

Question 37

The pid library can do exactly this.

from pid import PidFile

with PidFile():
  do_something()

It will also automatically handle the case where the pidfile exists but the process is not running.

Question 38

Of course the example from Dan will not work as it should be.

Indeed, if the script crash, rise an exception, or does not clean pid file, the script will be run multiple times.

I suggest the following based from another website:

This is to check if there is already a lock file existing

\#/usr/bin/env python
import os
import sys
if os.access(os.path.expanduser("~/.lockfile.vestibular.lock"), os.F_OK):
        #if the lockfile is already there then check the PID number
        #in the lock file
        pidfile = open(os.path.expanduser("~/.lockfile.vestibular.lock"), "r")
        pidfile.seek(0)
        old_pid = pidfile.readline()
        # Now we check the PID from lock file matches to the current
        # process PID
        if os.path.exists("/proc/%s" % old_pid):
                print "You already have an instance of the program running"
                print "It is running as process %s," % old_pid
                sys.exit(1)
        else:
                print "File is there but the program is not running"
                print "Removing lock file for the: %s as it can be there because of the program last time it was run" % old_pid
                os.remove(os.path.expanduser("~/.lockfile.vestibular.lock"))

This is part of code where we put a PID file in the lock file

pidfile = open(os.path.expanduser("~/.lockfile.vestibular.lock"), "w")
pidfile.write("%s" % os.getpid())
pidfile.close()

This code will check the value of pid compared to existing running process., avoiding double execution.

I hope it will help.

Question 39

There are very good packages for restarting processes on UNIX. One that has a great tutorial about building and configuring it is monit. With some tweaking you can have a rock solid proven technology keeping up your daemon.

Question 40

My solution is to check for the process and command line arguments Tested on windows and ubuntu linux

import psutil
import os

def is_running(script):
    for q in psutil.process_iter():
        if q.name().startswith('python'):
            if len(q.cmdline())>1 and script in q.cmdline()[1] and q.pid !=os.getpid():
                print("'{}' Process is already running".format(script))
                return True

    return False


if not is_running("test.py"):
    n = input("What is Your Name? ")
    print ("Hello " + n)

Question 41

There are a myriad of options. One method is using system calls or python libraries that perform such calls for you. The other is simply to spawn out a process like:

ps ax | grep processName

and parse the output. Many people choose this approach, it isn’t necessarily a bad approach in my view.

Question 42

Came across this old question looking for solution myself.

Use psutil:

import psutil
import sys
from subprocess import Popen

for process in psutil.process_iter():
    if process.cmdline() == ['python', 'your_script.py']:
        sys.exit('Process found: exiting.')

print('Process not found: starting it.')
Popen(['python', 'your_script.py'])

Question 43

I’m a big fan of Supervisor for managing daemons. It’s written in Python, so there are plenty of examples of how to interact with or extend it from Python. For your purposes the XML-RPC process control API should work nicely.

Question 44

Try this other version

def checkPidRunning(pid):        
    '''Check For the existence of a unix pid.
    '''
    try:
        os.kill(pid, 0)
    except OSError:
        return False
    else:
        return True

# Entry point
if __name__ == '__main__':
    pid = str(os.getpid())
    pidfile = os.path.join("/", "tmp", __program__+".pid")

    if os.path.isfile(pidfile) and checkPidRunning(int(file(pidfile,'r').readlines()[0])):
            print "%s already exists, exiting" % pidfile
            sys.exit()
    else:
        file(pidfile, 'w').write(pid)

    # Do some actual work here
    main()

    os.unlink(pidfile)

Question 45

Rather than developing your own PID file solution (which has more subtleties and corner cases than you might think), have a look at supervisord — this is a process control system that makes it easy to wrap job control and daemon behaviors around an existing Python script.

Question 46

The other answers are great for things like cron jobs, but if you’re running a daemon you should monitor it with something like daemontools.

Question 47

ps ax | grep processName

if yor debug script in pycharm always exit

pydevd.py --multiproc --client 127.0.0.1 --port 33882 --file processName

Question 48

try this:

#/usr/bin/env python
import os, sys, atexit

try:
    # Set PID file
    def set_pid_file():
        pid = str(os.getpid())
        f = open('myCode.pid', 'w')
        f.write(pid)
        f.close()

    def goodby():
        pid = str('myCode.pid')
        os.remove(pid)

    atexit.register(goodby)
    set_pid_file()
    # Place your code here

except KeyboardInterrupt:
    sys.exit(0)

Question 49

Here is more useful code (with checking if exactly python executes the script):

#! /usr/bin/env python

import os
from sys import exit


def checkPidRunning(pid):
    global script_name
    if pid<1:
        print "Incorrect pid number!"
        exit()
    try:
        os.kill(pid, 0)
    except OSError:
        print "Abnormal termination of previous process."
        return False
    else:
        ps_command = "ps -o command= %s | grep -Eq 'python .*/%s'" % (pid,script_name)
        process_exist = os.system(ps_command)
        if process_exist == 0:
            return True
        else:
            print "Process with pid %s is not a Python process. Continue..." % pid
            return False


if __name__ == '__main__':
    script_name = os.path.basename(__file__)
    pid = str(os.getpid())
    pidfile = os.path.join("/", "tmp/", script_name+".pid")
    if os.path.isfile(pidfile):
        print "Warning! Pid file %s existing. Checking for process..." % pidfile
        r_pid = int(file(pidfile,'r').readlines()[0])
        if checkPidRunning(r_pid):
            print "Python process with pid = %s is already running. Exit!" % r_pid
            exit()
        else:
            file(pidfile, 'w').write(pid)
    else:
        file(pidfile, 'w').write(pid)

# main programm
....
....

os.unlink(pidfile)

Here is string:

ps_command = "ps -o command= %s | grep -Eq 'python .*/%s'" % (pid,script_name)

returns 0 if “grep” is successful, and the process “python” is currently running with the name of your script as a parameter .

Question 50

A simple example if you only are looking for a process name exist or not:

import os

def pname_exists(inp):
    os.system('ps -ef > /tmp/psef')
    lines=open('/tmp/psef', 'r').read().split('\n')
    res=[i for i in lines if inp in i]
    return True if res else False

Result:
In [21]: pname_exists('syslog')
Out[21]: True

In [22]: pname_exists('syslog_')
Out[22]: False

Question 51

Consider the following example to solve your problem:

#!/usr/bin/python
# -*- coding: latin-1 -*-

import os, sys, time, signal

def termination_handler (signum,frame):
    global running
    global pidfile
    print 'You have requested to terminate the application...'
    sys.stdout.flush()
    running = 0
    os.unlink(pidfile)

running = 1
signal.signal(signal.SIGINT,termination_handler)

pid = str(os.getpid())
pidfile = '/tmp/'+os.path.basename(__file__).split('.')[0]+'.pid'

if os.path.isfile(pidfile):
    print "%s already exists, exiting" % pidfile
    sys.exit()
else:
    file(pidfile, 'w').write(pid)

# Do some actual work here

while running:
  time.sleep(10)

I suggest this script because it can be executed one time only.

Question 52

Using bash to look for a process with the current script’s name. No extra file.

import commands
import os
import time
import sys

def stop_if_already_running():
    script_name = os.path.basename(__file__)
    l = commands.getstatusoutput("ps aux | grep -e '%s' | grep -v grep | awk '{print $2}'| awk '{print $2}'" % script_name)
    if l[1]:
        sys.exit(0);

To test, add

stop_if_already_running()
print "running normally"
while True:
    time.sleep(3)

Question 53

This is what I use in Linux to avoid starting a script if already running:

import os
import sys


script_name = os.path.basename(__file__)
pidfile = os.path.join("/tmp", os.path.splitext(script_name)[0]) + ".pid"


def create_pidfile():
    if os.path.exists(pidfile):
        with open(pidfile, "r") as _file:
            last_pid = int(_file.read())

        # Checking if process is still running
        last_process_cmdline = "/proc/%d/cmdline" % last_pid
        if os.path.exists(last_process_cmdline):
            with open(last_process_cmdline, "r") as _file:
                cmdline = _file.read()
            if script_name in cmdline:
                raise Exception("Script already running...")

    with open(pidfile, "w") as _file:
        pid = str(os.getpid())
        _file.write(pid)


def main():
    """Your application logic goes here"""


if __name__ == "__main__":
    create_pidfile()
    main()

This approach works good without any dependency on an external module.

Question 54

I am learning how to use the threading and the multiprocessing modules in Python to run certain operations in parallel and speed up my code.

I am finding this hard (maybe because I don’t have any theoretical background about it) to understand what the difference is between a threading.Thread() object and a multiprocessing.Process() one.

Also, it is not entirely clear to me how to instantiate a queue of jobs and having only 4 (for example) of them running in parallel, while the other wait for resources to free before being executed.

I find the examples in the documentation clear, but not very exhaustive; as soon as I try to complicate things a bit, I receive a lot of weird errors (like a method that can’t be pickled, and so on).

So, when should I use the threading and multiprocessing modules?

Can you link me to some resources that explain the concepts behind these two modules and how to use them properly for complex tasks?

Question 55

What Giulio Franco says is true for multithreading vs. multiprocessing in general.

However, Python^* has an added issue: There’s a Global Interpreter Lock that prevents two threads in the same process from running Python code at the same time. This means that if you have 8 cores, and change your code to use 8 threads, it won’t be able to use 800% CPU and run 8x faster; it’ll use the same 100% CPU and run at the same speed. (In reality, it’ll run a little slower, because there’s extra overhead from threading, even if you don’t have any shared data, but ignore that for now.)

There are exceptions to this. If your code’s heavy computation doesn’t actually happen in Python, but in some library with custom C code that does proper GIL handling, like a numpy app, you will get the expected performance benefit from threading. The same is true if the heavy computation is done by some subprocess that you run and wait on.

More importantly, there are cases where this doesn’t matter. For example, a network server spends most of its time reading packets off the network, and a GUI app spends most of its time waiting for user events. One reason to use threads in a network server or GUI app is to allow you to do long-running “background tasks” without stopping the main thread from continuing to service network packets or GUI events. And that works just fine with Python threads. (In technical terms, this means Python threads give you concurrency, even though they don’t give you core-parallelism.)

But if you’re writing a CPU-bound program in pure Python, using more threads is generally not helpful.

Using separate processes has no such problems with the GIL, because each process has its own separate GIL. Of course you still have all the same tradeoffs between threads and processes as in any other languages—it’s more difficult and more expensive to share data between processes than between threads, it can be costly to run a huge number of processes or to create and destroy them frequently, etc. But the GIL weighs heavily on the balance toward processes, in a way that isn’t true for, say, C or Java. So, you will find yourself using multiprocessing a lot more often in Python than you would in C or Java.

Meanwhile, Python’s “batteries included” philosophy brings some good news: It’s very easy to write code that can be switched back and forth between threads and processes with a one-liner change.

If you design your code in terms of self-contained “jobs” that don’t share anything with other jobs (or the main program) except input and output, you can use the concurrent.futures library to write your code around a thread pool like this:

with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    executor.submit(job, argument)
    executor.map(some_function, collection_of_independent_things)
    # ...

You can even get the results of those jobs and pass them on to further jobs, wait for things in order of execution or in order of completion, etc.; read the section on Future objects for details.

Now, if it turns out that your program is constantly using 100% CPU, and adding more threads just makes it slower, then you’re running into the GIL problem, so you need to switch to processes. All you have to do is change that first line:

with concurrent.futures.ProcessPoolExecutor(max_workers=4) as executor:

The only real caveat is that your jobs’ arguments and return values have to be pickleable (and not take too much time or memory to pickle) to be usable cross-process. Usually this isn’t a problem, but sometimes it is.

But what if your jobs can’t be self-contained? If you can design your code in terms of jobs that pass messages from one to another, it’s still pretty easy. You may have to use threading.Thread or multiprocessing.Process instead of relying on pools. And you will have to create queue.Queue or multiprocessing.Queue objects explicitly. (There are plenty of other options—pipes, sockets, files with flocks, … but the point is, you have to do something manually if the automatic magic of an Executor is insufficient.)

But what if you can’t even rely on message passing? What if you need two jobs to both mutate the same structure, and see each others’ changes? In that case, you will need to do manual synchronization (locks, semaphores, conditions, etc.) and, if you want to use processes, explicit shared-memory objects to boot. This is when multithreading (or multiprocessing) gets difficult. If you can avoid it, great; if you can’t, you will need to read more than someone can put into an SO answer.

From a comment, you wanted to know what’s different between threads and processes in Python. Really, if you read Giulio Franco’s answer and mine and all of our links, that should cover everything… but a summary would definitely be useful, so here goes:

Threads share data by default; processes do not.
As a consequence of (1), sending data between processes generally requires pickling and unpickling it.^**
As another consequence of (1), directly sharing data between processes generally requires putting it into low-level formats like Value, Array, and ctypes types.
Processes are not subject to the GIL.
On some platforms (mainly Windows), processes are much more expensive to create and destroy.
There are some extra restrictions on processes, some of which are different on different platforms. See Programming guidelines for details.
The threading module doesn’t have some of the features of the multiprocessing module. (You can use multiprocessing.dummy to get most of the missing API on top of threads, or you can use higher-level modules like concurrent.futures and not worry about it.)

_{* It’s not actually Python, the language, that has this issue, but CPython, the “standard” implementation of that language. Some other implementations don’t have a GIL, like Jython.}

_{** If you’re using the fork start method for multiprocessing—which you can on most non-Windows platforms—each child process gets any resources the parent had when the child was started, which can be another way to pass data to children.}

Question 56

Multiple threads can exist in a single process. The threads that belong to the same process share the same memory area (can read from and write to the very same variables, and can interfere with one another). On the contrary, different processes live in different memory areas, and each of them has its own variables. In order to communicate, processes have to use other channels (files, pipes or sockets).

If you want to parallelize a computation, you’re probably going to need multithreading, because you probably want the threads to cooperate on the same memory.

Speaking about performance, threads are faster to create and manage than processes (because the OS doesn’t need to allocate a whole new virtual memory area), and inter-thread communication is usually faster than inter-process communication. But threads are harder to program. Threads can interfere with one another, and can write to each other’s memory, but the way this happens is not always obvious (due to several factors, mainly instruction reordering and memory caching), and so you are going to need synchronization primitives to control access to your variables.

Question 57

I believe this link answers your question in an elegant way.

To be short, if one of your sub-problems has to wait while another finishes, multithreading is good (in I/O heavy operations, for example); by contrast, if your sub-problems could really happen at the same time, multiprocessing is suggested. However, you won’t create more processes than your number of cores.

Question 58

Python documentation quotes

I’ve highlighted the key Python documentation quotes about Process vs Threads and the GIL at: What is the global interpreter lock (GIL) in CPython?

Process vs thread experiments

I did a bit of benchmarking in order to show the difference more concretely.

In the benchmark, I timed CPU and IO bound work for various numbers of threads on an 8 hyperthread CPU. The work supplied per thread is always the same, such that more threads means more total work supplied.

The results were:

Plot data.

Conclusions:

for CPU bound work, multiprocessing is always faster, presumably due to the GIL
for IO bound work. both are exactly the same speed
threads only scale up to about 4x instead of the expected 8x since I’m on an 8 hyperthread machine.

Contrast that with a C POSIX CPU-bound work which reaches the expected 8x speedup: What do ‘real’, ‘user’ and ‘sys’ mean in the output of time(1)?

TODO: I don’t know the reason for this, there must be other Python inefficiencies coming into play.

Test code:

#!/usr/bin/env python3

import multiprocessing
import threading
import time
import sys

def cpu_func(result, niters):
    '''
    A useless CPU bound function.
    '''
    for i in range(niters):
        result = (result * result * i + 2 * result * i * i + 3) % 10000000
    return result

class CpuThread(threading.Thread):
    def __init__(self, niters):
        super().__init__()
        self.niters = niters
        self.result = 1
    def run(self):
        self.result = cpu_func(self.result, self.niters)

class CpuProcess(multiprocessing.Process):
    def __init__(self, niters):
        super().__init__()
        self.niters = niters
        self.result = 1
    def run(self):
        self.result = cpu_func(self.result, self.niters)

class IoThread(threading.Thread):
    def __init__(self, sleep):
        super().__init__()
        self.sleep = sleep
        self.result = self.sleep
    def run(self):
        time.sleep(self.sleep)

class IoProcess(multiprocessing.Process):
    def __init__(self, sleep):
        super().__init__()
        self.sleep = sleep
        self.result = self.sleep
    def run(self):
        time.sleep(self.sleep)

if __name__ == '__main__':
    cpu_n_iters = int(sys.argv[1])
    sleep = 1
    cpu_count = multiprocessing.cpu_count()
    input_params = [
        (CpuThread, cpu_n_iters),
        (CpuProcess, cpu_n_iters),
        (IoThread, sleep),
        (IoProcess, sleep),
    ]
    header = ['nthreads']
    for thread_class, _ in input_params:
        header.append(thread_class.__name__)
    print(' '.join(header))
    for nthreads in range(1, 2 * cpu_count):
        results = [nthreads]
        for thread_class, work_size in input_params:
            start_time = time.time()
            threads = []
            for i in range(nthreads):
                thread = thread_class(work_size)
                threads.append(thread)
                thread.start()
            for i, thread in enumerate(threads):
                thread.join()
            results.append(time.time() - start_time)
        print(' '.join('{:.6e}'.format(result) for result in results))

GitHub upstream + plotting code on same directory.

Tested on Ubuntu 18.10, Python 3.6.7, in a Lenovo ThinkPad P51 laptop with CPU: Intel Core i7-7820HQ CPU (4 cores / 8 threads), RAM: 2x Samsung M471A2K43BB1-CRC (2x 16GiB), SSD: Samsung MZVLB512HAJQ-000L7 (3,000 MB/s).

Visualize which threads are running at a given time

This post https://rohanvarma.me/GIL/ taught me that you can run a callback whenever a thread is scheduled with the target= argument of threading.Thread and the same for multiprocessing.Process.

This allows us to view exactly which thread runs at each time. When this is done, we would see something like (I made this particular graph up):

            +--------------------------------------+
            + Active threads / processes           +
+-----------+--------------------------------------+
|Thread   1 |********     ************             |
|         2 |        *****            *************|
+-----------+--------------------------------------+
|Process  1 |***  ************** ******  ****      |
|         2 |** **** ****** ** ********* **********|
+-----------+--------------------------------------+
            + Time -->                             +
            +--------------------------------------+

which would show that:

threads are fully serialized by the GIL
processes can run in parallel

Question 59

Here’s some performance data for python 2.6.x that calls to question the notion that threading is more performant that multiprocessing in IO-bound scenarios. These results are from a 40-processor IBM System x3650 M4 BD.

IO-Bound Processing : Process Pool performed better than Thread Pool

>>> do_work(50, 300, 'thread','fileio')
do_work function took 455.752 ms

>>> do_work(50, 300, 'process','fileio')
do_work function took 319.279 ms

CPU-Bound Processing : Process Pool performed better than Thread Pool

>>> do_work(50, 2000, 'thread','square')
do_work function took 338.309 ms

>>> do_work(50, 2000, 'process','square')
do_work function took 287.488 ms

These aren’t rigorous tests, but they tell me that multiprocessing isn’t entirely unperformant in comparison to threading.

Code used in the interactive python console for the above tests

from multiprocessing import Pool
from multiprocessing.pool import ThreadPool
import time
import sys
import os
from glob import glob

text_for_test = str(range(1,100000))

def fileio(i):
 try :
  os.remove(glob('./test/test-*'))
 except : 
  pass
 f=open('./test/test-'+str(i),'a')
 f.write(text_for_test)
 f.close()
 f=open('./test/test-'+str(i),'r')
 text = f.read()
 f.close()


def square(i):
 return i*i

def timing(f):
 def wrap(*args):
  time1 = time.time()
  ret = f(*args)
  time2 = time.time()
  print '%s function took %0.3f ms' % (f.func_name, (time2-time1)*1000.0)
  return ret
 return wrap

result = None

@timing
def do_work(process_count, items, process_type, method) :
 pool = None
 if process_type == 'process' :
  pool = Pool(processes=process_count)
 else :
  pool = ThreadPool(processes=process_count)
 if method == 'square' : 
  multiple_results = [pool.apply_async(square,(a,)) for a in range(1,items)]
  result = [res.get()  for res in multiple_results]
 else :
  multiple_results = [pool.apply_async(fileio,(a,)) for a in range(1,items)]
  result = [res.get()  for res in multiple_results]


do_work(50, 300, 'thread','fileio')
do_work(50, 300, 'process','fileio')

do_work(50, 2000, 'thread','square')
do_work(50, 2000, 'process','square')

Question 60

Well, most of the question is answered by Giulio Franco. I will further elaborate on the consumer-producer problem, which I suppose will put you on the right track for your solution to using a multithreaded app.

fill_count = Semaphore(0) # items produced
empty_count = Semaphore(BUFFER_SIZE) # remaining space
buffer = Buffer()

def producer(fill_count, empty_count, buffer):
    while True:
        item = produceItem()
        empty_count.down();
        buffer.push(item)
        fill_count.up()

def consumer(fill_count, empty_count, buffer):
    while True:
        fill_count.down()
        item = buffer.pop()
        empty_count.up()
        consume_item(item)

You could read more on the synchronization primitives from:

 http://linux.die.net/man/7/sem_overview
 http://docs.python.org/2/library/threading.html

The pseudocode is above. I suppose you should search the producer-consumer-problem to get more references.

Question 61

I’m trying to port a shell script to the much more readable python version. The original shell script starts several processes (utilities, monitors, etc.) in the background with “&”. How can I achieve the same effect in python? I’d like these processes not to die when the python scripts complete. I am sure it’s related to the concept of a daemon somehow, but I couldn’t find how to do this easily.

Question 62

Note: This answer is less current than it was when posted in 2009. Using the subprocess module shown in other answers is now recommended in the docs

(Note that the subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using these functions.)

If you want your process to start in the background you can either use system() and call it in the same way your shell script did, or you can spawn it:

import os
os.spawnl(os.P_DETACH, 'some_long_running_command')

(or, alternatively, you may try the less portable os.P_NOWAIT flag).

See the documentation here.

Question 63

While jkp‘s solution works, the newer way of doing things (and the way the documentation recommends) is to use the subprocess module. For simple commands its equivalent, but it offers more options if you want to do something complicated.

Example for your case:

import subprocess
subprocess.Popen(["rm","-r","some.file"])

This will run rm -r some.file in the background. Note that calling .communicate() on the object returned from Popen will block until it completes, so don’t do that if you want it to run in the background:

import subprocess
ls_output=subprocess.Popen(["sleep", "30"])
ls_output.communicate()  # Will block for 30 seconds

See the documentation here.

Also, a point of clarification: “Background” as you use it here is purely a shell concept; technically, what you mean is that you want to spawn a process without blocking while you wait for it to complete. However, I’ve used “background” here to refer to shell-background-like behavior.

Question 64

You probably want the answer to “How to call an external command in Python”.

The simplest approach is to use the os.system function, e.g.:

import os
os.system("some_command &")

Basically, whatever you pass to the system function will be executed the same as if you’d passed it to the shell in a script.

问题：确保只运行一个程序实例

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

说明

优点

Explanation

Advantages

回答 11

回答 12

回答 13

回答 14

回答 15

回答 16

回答 17

回答 18

回答 19

问题：如何检查Python中是否存在具有给定pid的进程？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

问题：检查python脚本是否正在运行

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

回答 11

回答 12

回答 13

回答 14

回答 15

回答 16

回答 17

回答 18

问题：线程和多处理模块之间有什么区别？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：如何在Python中启动后台进程？

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

有趣好用的Python教程