Tag Archives: pipe

Multiprocessing - Pipe vs Queue

Question: Multiprocessing - Pipe vs Queue


What are the fundamental differences between queues and pipes in Python’s multiprocessing package?

In what scenarios should one choose one over the other? When is it advantageous to use Pipe()? When is it advantageous to use Queue()?


Answer 0

  • A Pipe() can only have two endpoints.

  • A Queue() can have multiple producers and consumers.

When to use them

If you need more than two points to communicate, use a Queue().

If you need absolute performance, a Pipe() is much faster because Queue() is built on top of Pipe().
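
To make the multiple-endpoint case concrete, here is a minimal sketch (the process counts and names are illustrative, not from the original answer) of several producers and consumers sharing a single Queue():

from multiprocessing import Process, Queue

def producer(q, pid, count):
    for ii in range(count):
        q.put((pid, ii))          # many producers may put() into the same Queue()

def consumer(q):
    while True:
        msg = q.get()             # many consumers may get() from the same Queue()
        if msg == 'DONE':
            break

if __name__ == '__main__':
    q = Queue()
    producers = [Process(target=producer, args=(q, pid, 1000)) for pid in range(3)]
    consumers = [Process(target=consumer, args=(q,)) for _ in range(2)]
    for proc in producers + consumers:
        proc.start()
    for proc in producers:
        proc.join()
    for _ in consumers:
        q.put('DONE')             # one sentinel per consumer
    for proc in consumers:
        proc.join()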

Performance Benchmarking

Let’s assume you want to spawn two processes and send messages between them as quickly as possible. These are the timing results of a drag race between similar tests using Pipe() and Queue()… This is on a Thinkpad T61 running Ubuntu 11.10 and Python 2.7.2.

FYI, I threw in results for JoinableQueue() as a bonus; JoinableQueue() accounts for tasks when queue.task_done() is called (it doesn’t even know about the specific task, it just counts unfinished tasks in the queue), so that queue.join() knows the work is finished.

The code for each is at the bottom of this answer…

mpenning@mpenning-T61:~$ python multi_pipe.py 
Sending 10000 numbers to Pipe() took 0.0369849205017 seconds
Sending 100000 numbers to Pipe() took 0.328398942947 seconds
Sending 1000000 numbers to Pipe() took 3.17266988754 seconds
mpenning@mpenning-T61:~$ python multi_queue.py 
Sending 10000 numbers to Queue() took 0.105256080627 seconds
Sending 100000 numbers to Queue() took 0.980564117432 seconds
Sending 1000000 numbers to Queue() took 10.1611330509 seconds
mpenning@mpenning-T61:~$ python multi_joinablequeue.py 
Sending 10000 numbers to JoinableQueue() took 0.172781944275 seconds
Sending 100000 numbers to JoinableQueue() took 1.5714070797 seconds
Sending 1000000 numbers to JoinableQueue() took 15.8527247906 seconds
mpenning@mpenning-T61:~$

In summary, Pipe() is about three times faster than a Queue(). Don’t even think about the JoinableQueue() unless you really must have its benefits.

BONUS MATERIAL 2

Multiprocessing introduces subtle changes in information flow that make debugging hard unless you know some shortcuts. For instance, you might have a script that works fine when indexing through a dictionary under many conditions, but infrequently fails with certain inputs.

Normally we get clues to the failure when the entire python process crashes; however, you don’t get unsolicited crash tracebacks printed to the console if the multiprocessing function crashes. Tracking down unknown multiprocessing crashes is hard without a clue to what crashed the process.

The simplest way I have found to track down multiprocessing crash information is to wrap the entire multiprocessing function in a try / except and use traceback.print_exc():

import traceback
def run(self, args):
    try:
        # Insert stuff to be multiprocessed here
        return args[0]['that']
    except:
        print "FATAL: reader({0}) exited while multiprocessing".format(args) 
        traceback.print_exc()

Now, when you find a crash you see something like:

FATAL: reader([{'crash': 'this'}]) exited while multiprocessing
Traceback (most recent call last):
  File "foo.py", line 19, in __init__
    self.run(args)
  File "foo.py", line 46, in run
    KeyError: 'that'

Source Code:


"""
multi_pipe.py
"""
from multiprocessing import Process, Pipe
import time

def reader_proc(pipe):
    ## Read from the pipe; this will be spawned as a separate Process
    p_output, p_input = pipe
    p_input.close()    # We are only reading
    while True:
        msg = p_output.recv()    # Read from the output pipe and do nothing
        if msg=='DONE':
            break

def writer(count, p_input):
    for ii in xrange(0, count):
        p_input.send(ii)             # Write 'count' numbers into the input pipe
    p_input.send('DONE')

if __name__=='__main__':
    for count in [10**4, 10**5, 10**6]:
        # A Pipe() has two endpoints; data flows one way here:  p_input ------> p_output
        p_output, p_input = Pipe()  # writer() writes to p_input from _this_ process
        reader_p = Process(target=reader_proc, args=((p_output, p_input),))
        reader_p.daemon = True
        reader_p.start()     # Launch the reader process

        p_output.close()       # We no longer need this part of the Pipe()
        _start = time.time()
        writer(count, p_input) # Send a lot of stuff to reader_proc()
        p_input.close()
        reader_p.join()
        print("Sending {0} numbers to Pipe() took {1} seconds".format(count,
            (time.time() - _start)))

"""
multi_queue.py
"""

from multiprocessing import Process, Queue
import time
import sys

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()         # Read from the queue and do nothing
        if (msg == 'DONE'):
            break

def writer(count, queue):
    ## Write to the queue
    for ii in range(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue
    queue.put('DONE')

if __name__=='__main__':
    pqueue = Queue() # writer() writes to pqueue from _this_ process
    for count in [10**4, 10**5, 10**6]:             
        ### reader_proc() reads from pqueue as a separate process
        reader_p = Process(target=reader_proc, args=((pqueue),))
        reader_p.daemon = True
        reader_p.start()        # Launch reader_proc() as a separate python process

        _start = time.time()
        writer(count, pqueue)    # Send a lot of stuff to reader()
        reader_p.join()         # Wait for the reader to finish
        print("Sending {0} numbers to Queue() took {1} seconds".format(count, 
            (time.time() - _start)))

"""
multi_joinablequeue.py
"""
from multiprocessing import Process, JoinableQueue
import time

def reader_proc(queue):
    ## Read from the queue; this will be spawned as a separate Process
    while True:
        msg = queue.get()         # Read from the queue and do nothing
        queue.task_done()

def writer(count, queue):
    for ii in xrange(0, count):
        queue.put(ii)             # Write 'count' numbers into the queue

if __name__=='__main__':
    for count in [10**4, 10**5, 10**6]:
        jqueue = JoinableQueue() # writer() writes to jqueue from _this_ process
        # reader_proc() reads from jqueue as a different process...
        reader_p = Process(target=reader_proc, args=((jqueue),))
        reader_p.daemon = True
        reader_p.start()     # Launch the reader process
        _start = time.time()
        writer(count, jqueue) # Send a lot of stuff to reader_proc() (in different process)
        jqueue.join()         # Wait for the reader to finish
        print("Sending {0} numbers to JoinableQueue() took {1} seconds".format(count, 
            (time.time() - _start)))

Answer 1


One additional feature of Queue() that is worth noting is the feeder thread. The documentation notes: “When a process first puts an item on the queue a feeder thread is started which transfers objects from a buffer into the pipe.” An unlimited number of items (or maxsize items, if set) can be inserted into Queue() without any calls to queue.put() blocking. This allows you to store multiple items in a Queue() until your program is ready to process them.

Pipe(), on the other hand, has a finite amount of storage for items that have been sent to one connection but have not been received from the other connection. After this storage is used up, calls to connection.send() will block until there is space to write the entire item. This will stall the thread doing the writing until some other thread reads from the pipe. Connection objects give you access to the underlying file descriptor. On *nix systems, you can prevent connection.send() calls from blocking using the os.set_blocking() function. However, this will cause problems if you try to send a single item that does not fit in the pipe’s buffer. Recent versions of Linux allow you to increase the capacity of a pipe, but the maximum allowed size varies with system configuration. You should therefore never rely on Pipe() to buffer data. Calls to connection.send could block until data gets read from the pipe somewhere else.

In conclusion, a Queue() is a better choice than a Pipe() when you need to buffer data, even when you only need to communicate between two points.
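
As a minimal sketch of that buffering behavior (assuming Python 3; the item count is arbitrary), every put() below returns immediately because the feeder thread drains an unbounded internal buffer into the pipe, so the whole workload can be enqueued before any consumer exists:

from multiprocessing import Process, Queue

def consumer(q):
    while True:
        if q.get() == 'DONE':
            break

if __name__ == '__main__':
    q = Queue()
    for ii in range(100000):
        q.put(ii)                 # never blocks: items wait in the feeder buffer
    q.put('DONE')
    reader = Process(target=consumer, args=(q,))
    reader.start()                # the consumer starts after everything is queued
    reader.join()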


How to open every file in a folder?

Question: How to open every file in a folder?


I have a Python script, parse.py, which opens a file, say file1, and then does something, maybe printing out the total number of characters:

filename = 'file1'
f = open(filename, 'r')
content = f.read()
print filename, len(content)

Right now, I am using stdout to direct the result to my output file – output

python parse.py >> output

However, I don’t want to do this manually file by file. Is there a way to handle every single file automatically? Something like:

ls | awk '{print}' | python parse.py >> output 

Then the problem is how I could read the file name from standard input. Or are there already some built-in functions that do the ls and that kind of work easily?

Thanks!


Answer 0


Os

You can list all files in the current directory using os.listdir:

import os
for filename in os.listdir(os.getcwd()):
   with open(os.path.join(os.getcwd(), filename), 'r') as f: # open in readonly mode
      pass # do your stuff

Glob

Or you can list only some files, depending on the file pattern, using the glob module:

import glob
for filename in glob.glob('*.txt'):
   with open(filename, 'r') as f: # open in readonly mode
      pass # do your stuff

It doesn’t have to be the current directory; you can list files in any path you want:

path = '/some/path/to/file'
for filename in glob.glob(os.path.join(path, '*.txt')):
   with open(filename, 'r') as f: # glob already returned the full path here
      pass # do your stuff

Pipe

Or you can even use the pipe, as you specified, with fileinput:

import fileinput
for line in fileinput.input():
    pass # do your stuff

And then use it with piping:

ls -1 | python parse.py
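
For completeness, a hedged sketch of what parse.py itself might look like when driven this way (it assumes one filename per line on stdin, which is what `ls -1` produces):

import fileinput

for line in fileinput.input():
    filename = line.strip()       # each stdin line is taken to be a filename
    with open(filename, 'r') as f:
        content = f.read()
    print(filename, len(content))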

Answer 1


You should try using os.walk:

yourpath = 'path'

import os
for root, dirs, files in os.walk(yourpath, topdown=False):
    for name in files:
        print(os.path.join(root, name))
        # do your stuff with each file here
    for name in dirs:
        print(os.path.join(root, name))
        # do your stuff with each directory here

Answer 2


I was looking for this answer:

import os,glob
folder_path = '/some/path/to/file'
for filename in glob.glob(os.path.join(folder_path, '*.htm')):
  with open(filename, 'r') as f:
    text = f.read()
    print (filename)
    print (len(text))

You can also choose '*.txt' or another filename ending.


Answer 3


You can actually just use the os module to do both:

  1. list all files in a folder
  2. sort files by file type, file name etc.

Here’s a simple example:

import os #os module imported here
location = os.getcwd() # get present working directory location here
counter = 0 #keep a count of all files found
csvfiles = [] #list to store all csv files found at location
filebeginwithhello = [] # list to keep all files that begin with 'hello'
otherfiles = [] #list to keep any other file that do not match the criteria

for file in os.listdir(location):
    try:
        if file.startswith("hello") and file.endswith(".csv"): # check this first, because some files may start with hello and also be csv files
            print "csv file found:\t", file
            csvfiles.append(str(file))
            counter = counter+1

        elif file.endswith(".csv"):
            print "csv file found:\t", file
            csvfiles.append(str(file))
            counter = counter+1

        elif file.startswith("hello"):
            print "hello files found: \t", file
            filebeginwithhello.append(file)
            counter = counter+1

        else:
            otherfiles.append(file)
            counter = counter+1
    except Exception as e:
        print "Error while examining files!"
        raise e

print "Total files found:\t", counter

Now you have not only listed all the files in a folder but also have them (optionally) sorted by starting name, file type, and so on. Just iterate over each list and do your stuff.


Answer 4

# This script collects every file path under the target directory, then uses
# GUI automation to open each file in Adobe Acrobat (Ctrl+O), print it
# (Ctrl+P), and close it again (Ctrl+W).
import pyautogui
import keyboard
import time
import os
import pyperclip

os.chdir("target directory")

# get the current directory
cwd=os.getcwd()

files=[]

for i in os.walk(cwd):
    for j in i[2]:                           # i[2] is the list of filenames in directory i[0]
        files.append(os.path.join(i[0], j))  # build the full path to each file

os.startfile(r"C:\Program Files (x86)\Adobe\Acrobat 11.0\Acrobat\Acrobat.exe")
time.sleep(1)


for i in files:
    print(i)
    pyperclip.copy(i)
    keyboard.press('ctrl')
    keyboard.press_and_release('o')
    keyboard.release('ctrl')
    time.sleep(1)

    keyboard.press('ctrl')
    keyboard.press_and_release('v')
    keyboard.release('ctrl')
    time.sleep(1)
    keyboard.press_and_release('enter')
    keyboard.press('ctrl')
    keyboard.press_and_release('p')
    keyboard.release('ctrl')
    keyboard.press_and_release('enter')
    time.sleep(3)
    keyboard.press('ctrl')
    keyboard.press_and_release('w')
    keyboard.release('ctrl')
    pyperclip.copy('')

Answer 5


The code below reads any text files available in the directory from which we run the script. It then opens every text file and stores the words of each line in a list. After storing the words, we print each word line by line:

import os, fnmatch

listOfFiles = os.listdir('.')
pattern = "*.txt"
store = []
for entry in listOfFiles:
    if fnmatch.fnmatch(entry, pattern):
        _fileName = open(entry,"r")
        if _fileName.mode == "r":
            content = _fileName.read()
            contentList = content.split(" ")
            for i in contentList:
                if i != '\n' and i != "\r\n":
                    store.append(i)

for i in store:
    print(i)

How to use a `subprocess` command with a pipe

Question: How to use a `subprocess` command with a pipe


I want to use subprocess.check_output() with ps -A | grep 'process_name'. I tried various solutions, but so far nothing worked. Can someone guide me on how to do it?


Answer 0


To use a pipe with the subprocess module, you have to pass shell=True.
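
For illustration, a hedged sketch of that shell=True form ('process_name' is a placeholder):

import subprocess

# The whole pipeline is handed to the shell; shell=True is what makes the | work
output = subprocess.check_output("ps -A | grep 'process_name'", shell=True)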

However, this isn’t really advisable for various reasons, not least of which is security. Instead, create the ps and grep processes separately, and pipe the output from one into the other, like so:

import subprocess

ps = subprocess.Popen(('ps', '-A'), stdout=subprocess.PIPE)
output = subprocess.check_output(('grep', 'process_name'), stdin=ps.stdout)
ps.wait()

In your particular case, however, the simple solution is to call subprocess.check_output(('ps', '-A')) and then str.find on the output.
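
A hedged sketch of that simpler approach ('process_name' is again a placeholder):

import subprocess

output = subprocess.check_output(('ps', '-A'))
# On Python 3, check_output() returns bytes, so search with a bytes pattern
if output.find(b'process_name') != -1:
    print('process_name is running')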


Answer 1


Or you can always use the communicate method on the subprocess object:

cmd = "ps -A|grep 'process_name'"
ps = subprocess.Popen(cmd,shell=True,stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
output = ps.communicate()[0]
print(output)

The communicate method returns a tuple of the standard output and the standard error.
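
As a small usage note, instead of indexing [0] you can unpack the tuple; a sketch of the same call:

out, err = ps.communicate()
# err is None here because stderr was merged into stdout via stderr=subprocess.STDOUT
print(out)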


Answer 2


See the documentation on setting up a pipeline using subprocess: http://docs.python.org/2/library/subprocess.html#replacing-shell-pipeline

I haven’t tested the following code example but it should be roughly what you want:

query = "process_name"
ps_process = Popen(["ps", "-A"], stdout=PIPE)
grep_process = Popen(["grep", query], stdin=ps_process.stdout, stdout=PIPE)
ps_process.stdout.close()  # Allow ps_process to receive a SIGPIPE if grep_process exits.
output = grep_process.communicate()[0]

Answer 3


JKALAVIS’s solution is good; however, I would add an improvement: use shlex instead of shell=True. Below I’m grepping out query times:

#!/usr/bin/env python
import subprocess
import shlex

cmd = "dig @8.8.4.4 +notcp www.google.com|grep 'Query'"
ps = subprocess.Popen(cmd,shell=True,stdout=subprocess.PIPE,stderr=subprocess.STDOUT)
output = ps.communicate()[0]
print(output)

Answer 4


Also, try using the pgrep command instead of ps -A | grep 'process_name'.
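
A hedged sketch of that suggestion ('process_name' is a placeholder):

import subprocess

try:
    pids = subprocess.check_output(['pgrep', 'process_name'])  # one PID per line
except subprocess.CalledProcessError:
    pids = b''  # pgrep exits with status 1 when nothing matches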


Answer 5


You can try the pipe functionality in sh.py:

import sh
print sh.grep(sh.ps("-ax"), "process_name")

Answer 6


After Python 3.5 you can also use:

import subprocess

f = open('test.txt', 'w')
process = subprocess.run(['ls', '-la'], stdout=subprocess.PIPE, universal_newlines=True)
f.write(process.stdout)
f.close()

The execution of the command is blocking and the output will be in process.stdout.
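
Building on that, a hedged sketch of the original ps | grep pipeline with subprocess.run(), feeding one command's output to the next via the input argument ('process_name' is a placeholder):

import subprocess

ps = subprocess.run(['ps', '-A'], stdout=subprocess.PIPE, universal_newlines=True)
grep = subprocess.run(['grep', 'process_name'], input=ps.stdout,
                      stdout=subprocess.PIPE, universal_newlines=True)
print(grep.stdout)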