Running Bash commands in Python

Question: Running Bash commands in Python

On my local machine, I run a python script which contains this line

bashCommand = "cwm --rdf test.rdf --ntriples > test.nt"
os.system(bashCommand)

This works fine.

Then I run the same code on a server and I get the following error message

'import site' failed; use -v for traceback
Traceback (most recent call last):
File "/usr/bin/cwm", line 48, in <module>
from swap import  diag
ImportError: No module named swap

So what I did then was insert a print bashCommand statement, which prints the command in the terminal before it is run with os.system().

Of course, I get the error again (caused by os.system(bashCommand)), but before that error it prints the command in the terminal. Then I just copied that output, pasted it into the terminal, hit Enter, and it worked…

Does anyone have a clue what’s going on?


Answer 0

Don’t use os.system. It has been deprecated in favor of subprocess. From the docs: “This module intends to replace several older modules and functions: os.system, os.spawn“.

Like in your case:

bashCommand = "cwm --rdf test.rdf --ntriples > test.nt"
import subprocess
process = subprocess.Popen(bashCommand.split(), stdout=subprocess.PIPE)
output, error = process.communicate()
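
Note that bashCommand.split() hands > and test.nt to cwm as literal arguments, because without a shell nothing performs the redirection. A minimal sketch that keeps the no-shell approach but writes the output file from Python instead (assuming cwm is on the PATH and the file names from the question):

import subprocess

# Run cwm without a shell and perform the > test.nt redirection ourselves
with open('test.nt', 'wb') as outfile:
    process = subprocess.Popen(['cwm', '--rdf', 'test.rdf', '--ntriples'],
                               stdout=outfile)
    process.communicate()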

Answer 1

To somewhat expand on the earlier answers here, there are a number of details which are commonly overlooked.

  • Prefer subprocess.run() over subprocess.check_call() and friends over subprocess.call() over subprocess.Popen() over os.system() over os.popen()
  • Understand and probably use text=True, aka universal_newlines=True.
  • Understand the meaning of shell=True or shell=False and how it changes quoting and the availability of shell conveniences.
  • Understand differences between sh and Bash
  • Understand how a subprocess is separate from its parent, and generally cannot change the parent.
  • Avoid running the Python interpreter as a subprocess of Python.

These topics are covered in some more detail below.

Prefer subprocess.run() or subprocess.check_call()

The subprocess.Popen() function is a low-level workhorse but it is tricky to use correctly and you end up copy/pasting multiple lines of code … which conveniently already exist in the standard library as a set of higher-level wrapper functions for various purposes, which are presented in more detail in the following.

Here’s a paragraph from the documentation:

The recommended approach to invoking subprocesses is to use the run() function for all use cases it can handle. For more advanced use cases, the underlying Popen interface can be used directly.

Unfortunately, the availability of these wrapper functions differs between Python versions.

  • subprocess.run() was officially introduced in Python 3.5. It is meant to replace all of the following.
  • subprocess.check_output() was introduced in Python 2.7 / 3.1. It is basically equivalent to subprocess.run(..., check=True, stdout=subprocess.PIPE).stdout
  • subprocess.check_call() was introduced in Python 2.5. It is basically equivalent to subprocess.run(..., check=True)
  • subprocess.call() was introduced in Python 2.4 in the original subprocess module (PEP-324). It is basically equivalent to subprocess.run(...).returncode
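
As a rough sketch of the equivalences listed above (my own toy commands, assuming a Unix-like system with echo, true and false on the PATH):

import subprocess

# check_output(...) is basically run(..., check=True, stdout=PIPE).stdout
out_a = subprocess.check_output(['echo', 'hello'])
out_b = subprocess.run(['echo', 'hello'],
                       check=True, stdout=subprocess.PIPE).stdout
assert out_a == out_b == b'hello\n'   # bytes, since text=True was not passed

# check_call(...) is basically run(..., check=True)
subprocess.check_call(['true'])

# call(...) is basically run(...).returncode
status = subprocess.call(['false'])   # returns 1 rather than raising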

High-level API vs subprocess.Popen()

The refactored and extended subprocess.run() is more logical and more versatile than the legacy functions it replaces. It returns a CompletedProcess object whose attributes let you retrieve the exit status, the standard output, and a few other results and status indicators from the finished subprocess.
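
For instance, a minimal sketch of inspecting a CompletedProcess (my own example, assuming an ls command on a Unix-like system):

import subprocess

result = subprocess.run(['ls', '-l'], stdout=subprocess.PIPE, text=True)
print(result.returncode)   # exit status of the child process
print(result.stdout)       # captured standard output, a str because of text=True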

subprocess.run() is the way to go if you simply need a program to run and return control to Python. For more involved scenarios (background processes, perhaps with interactive I/O with the Python parent program) you still need to use subprocess.Popen() and take care of all the plumbing yourself. This requires a fairly intricate understanding of all the moving parts and should not be undertaken lightly. The simpler Popen object represents the (possibly still-running) process which needs to be managed from your code for the remainder of the lifetime of the subprocess.

It should perhaps be emphasized that just subprocess.Popen() merely creates a process. If you leave it at that, you have a subprocess running concurrently alongside with Python, so a “background” process. If it doesn’t need to do input or output or otherwise coordinate with you, it can do useful work in parallel with your Python program.
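
A minimal sketch of that "background process" pattern (my own example, using sleep as a stand-in for a long-running command on a Unix-like system):

import subprocess

# Start the child and get control back in Python immediately
proc = subprocess.Popen(['sleep', '5'])
print('child started with pid', proc.pid)

# ... do other useful work here while the child runs in the background ...

proc.wait()                       # eventually block until the child finishes
print('child exited with status', proc.returncode)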

Avoid os.system() and os.popen()

Since time eternal (well, since Python 2.5) the os module documentation has contained the recommendation to prefer subprocess over os.system():

The subprocess module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function.

The problems with system() are that it’s obviously system-dependent and doesn’t offer ways to interact with the subprocess. It simply runs, with standard output and standard error outside of Python’s reach. The only information Python receives back is the exit status of the command (zero means success, though the meaning of non-zero values is also somewhat system-dependent).
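
A tiny sketch of that limitation (my own example; the directory name is made up):

import os

# The child's output goes straight to the terminal, outside of Python's reach;
# all Python gets back is a system-dependent encoding of the exit status.
status = os.system('ls /no/such/directory')
print(status)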

PEP-324 (which was already mentioned above) contains a more detailed rationale for why os.system is problematic and how subprocess attempts to solve those issues.

os.popen() used to be even more strongly discouraged:

Deprecated since version 2.6: This function is obsolete. Use the subprocess module.

However, since sometime in Python 3, it has been reimplemented to simply use subprocess, and redirects to the subprocess.Popen() documentation for details.

Understand and usually use check=True

You’ll also notice that subprocess.call() has many of the same limitations as os.system(). In regular use, you should generally check whether the process finished successfully, which subprocess.check_call() and subprocess.check_output() do (where the latter also returns the standard output of the finished subprocess). Similarly, you should usually use check=True with subprocess.run() unless you specifically need to allow the subprocess to return an error status.

In practice, with check=True or subprocess.check_*, Python will throw a CalledProcessError exception if the subprocess returns a nonzero exit status.

A common error with subprocess.run() is to omit check=True and be surprised when downstream code fails if the subprocess failed.

On the other hand, a common problem with check_call() and check_output() was that users who blindly used these functions were surprised when the exception was raised e.g. when grep did not find a match. (You should probably replace grep with native Python code anyway, as outlined below.)

All things counted, you need to understand how shell commands return an exit code, and under what conditions they will return a non-zero (error) exit code, and make a conscious decision how exactly it should be handled.
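
As one hedged sketch of making that decision explicit (my own example; the file name is hypothetical, and grep's convention of exiting 1 on "no match" and >1 on real errors is relied upon):

import subprocess

try:
    matches = subprocess.check_output(
        ['grep', 'pattern', 'somefile.txt'], text=True)
    print(matches)
except subprocess.CalledProcessError as exc:
    if exc.returncode == 1:
        print('no matches found')   # expected here, not a real failure
    else:
        raise                       # genuine error, e.g. missing file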

Understand and probably use text=True aka universal_newlines=True

Since Python 3, strings internal to Python are Unicode strings. But there is no guarantee that a subprocess generates Unicode output, or strings at all.

(If the differences are not immediately obvious, Ned Batchelder’s Pragmatic Unicode is recommended, if not outright obligatory, reading. There is a 36-minute video presentation behind the link if you prefer, though reading the page yourself will probably take significantly less time.)

Deep down, Python has to fetch a bytes buffer and interpret it somehow. If it contains a blob of binary data, it shouldn’t be decoded into a Unicode string, because that’s error-prone and bug-inducing behavior – precisely the sort of pesky behavior which riddled many Python 2 scripts, before there was a way to properly distinguish between encoded text and binary data.

With text=True, you tell Python that you, in fact, expect back textual data in the system’s default encoding, and that it should be decoded into a Python (Unicode) string to the best of Python’s ability (usually UTF-8 on any moderately up to date system, except perhaps Windows?)

If that’s not what you request back, Python will just give you bytes strings in the stdout and stderr strings. Maybe at some later point you do know that they were text strings after all, and you know their encoding. Then, you can decode them.

normal = subprocess.run([external, arg],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    check=True,
    text=True)
print(normal.stdout)

convoluted = subprocess.run([external, arg],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
    check=True)
# You have to know (or guess) the encoding
print(convoluted.stdout.decode('utf-8'))

Python 3.7 introduced the shorter and more descriptive and understandable alias text for the keyword argument which was previously somewhat misleadingly called universal_newlines.

Understand shell=True vs shell=False

With shell=True you pass a single string to your shell, and the shell takes it from there.

With shell=False you pass a list of arguments to the OS, bypassing the shell.

When you don’t have a shell, you save a process and get rid of a fairly substantial amount of hidden complexity, which may or may not harbor bugs or even security problems.

On the other hand, when you don’t have a shell, you don’t have redirection, wildcard expansion, job control, and a large number of other shell features.

A common mistake is to use shell=True and then still pass Python a list of tokens, or vice versa. This happens to work in some cases, but is really ill-defined and could break in interesting ways.

# XXX AVOID THIS BUG
buggy = subprocess.run('dig +short stackoverflow.com')

# XXX AVOID THIS BUG TOO
broken = subprocess.run(['dig', '+short', 'stackoverflow.com'],
    shell=True)

# XXX DEFINITELY AVOID THIS
pathological = subprocess.run(['dig +short stackoverflow.com'],
    shell=True)

correct = subprocess.run(['dig', '+short', 'stackoverflow.com'],
    # Probably don't forget these, too
    check=True, text=True)

# XXX Probably better avoid shell=True
# but this is nominally correct
fixed_but_fugly = subprocess.run('dig +short stackoverflow.com',
    shell=True,
    # Probably don't forget these, too
    check=True, text=True)

The common retort “but it works for me” is not a useful rebuttal unless you understand exactly under what circumstances it could stop working.

Refactoring Example

Very often, the features of the shell can be replaced with native Python code. Simple Awk or sed scripts should probably simply be translated to Python instead.

To partially illustrate this, here is a typical but slightly silly example which involves many shell features.

cmd = '''while read -r x;
   do ping -c 3 "$x" | grep 'round-trip min/avg/max'
   done <hosts.txt'''

# Trivial but horrible
results = subprocess.run(
    cmd, shell=True, universal_newlines=True, check=True,
    stdout=subprocess.PIPE)  # capture output so results.stdout is not None
print(results.stdout)

# Reimplement with shell=False
with open('hosts.txt') as hosts:
    for host in hosts:
        host = host.rstrip('\n')  # drop newline
        ping = subprocess.run(
             ['ping', '-c', '3', host],
             text=True,
             stdout=subprocess.PIPE,
             check=True)
        for line in ping.stdout.split('\n'):
             if 'round-trip min/avg/max' in line:
                 print('{}: {}'.format(host, line))

Some things to note here:

  • With shell=False you don’t need the quoting that the shell requires around strings. Putting quotes anyway is probably an error.
  • It often makes sense to run as little code as possible in a subprocess. This gives you more control over execution from within your Python code.
  • Having said that, complex shell pipelines are tedious and sometimes challenging to reimplement in Python.

The refactored code also illustrates just how much the shell really does for you with a very terse syntax, for better or for worse. Python says explicit is better than implicit, but the Python code is rather verbose and arguably looks more complex than it really is. On the other hand, it offers a number of points where you can grab control in the middle of something else, as trivially exemplified by the enhancement that we can easily include the host name along with the shell command output. (This is by no means challenging to do in the shell, either, but at the expense of yet another diversion and perhaps another process.)

Common Shell Constructs

For completeness, here are brief explanations of some of these shell features, and some notes on how they can perhaps be replaced with native Python facilities.

  • Globbing aka wildcard expansion can be replaced with glob.glob() or very often with simple Python string comparisons like for file in os.listdir('.'): if not file.endswith('.png'): continue. Bash has various other expansion facilities like .{png,jpg} brace expansion and {1..100} as well as tilde expansion (~ expands to your home directory, and more generally ~account to the home directory of another user)
  • Shell variables like $SHELL or $my_exported_var can sometimes simply be replaced with Python variables. Exported shell variables are available as e.g. os.environ['SHELL'] (the meaning of export is to make the variable available to subprocesses — a variable which is not available to subprocesses will obviously not be available to Python running as a subprocess of the shell, or vice versa. The env= keyword argument to subprocess methods allows you to define the environment of the subprocess as a dictionary, so that’s one way to make a Python variable visible to a subprocess). With shell=False you will need to understand how to remove any quotes; for example, cd "$HOME" is equivalent to os.chdir(os.environ['HOME']) without quotes around the directory name. (Very often cd is not useful or necessary anyway, and many beginners omit the double quotes around the variable and get away with it until one day …)
  • Redirection allows you to read from a file as your standard input, and write your standard output to a file. grep 'foo' <inputfile >outputfile opens outputfile for writing and inputfile for reading, and passes its contents as standard input to grep, whose standard output then lands in outputfile. This is not generally hard to replace with native Python code.
  • Pipelines are a form of redirection. echo foo | nl runs two subprocesses, where the standard output of echo is the standard input of nl (on the OS level, in Unix-like systems, this is a single file handle). If you cannot replace one or both ends of the pipeline with native Python code, perhaps think about using a shell after all, especially if the pipeline has more than two or three processes (though look at the pipes module in the Python standard library or a number of more modern and versatile third-party competitors).
  • Job control lets you interrupt jobs, run them in the background, return them to the foreground, etc. The basic Unix signals to stop and continue a process are of course available from Python, too. But jobs are a higher-level abstraction in the shell which involve process groups etc which you have to understand if you want to do something like this from Python.
  • Quoting in the shell is potentially confusing until you understand that everything is basically a string. So ls -l / is equivalent to 'ls' '-l' '/' but the quoting around literals is completely optional. Unquoted strings which contain shell metacharacters undergo parameter expansion, whitespace tokenization and wildcard expansion; double quotes prevent whitespace tokenization and wildcard expansion but allow parameter expansions (variable substitution, command substitution, and backslash processing). This is simple in theory but can get bewildering, especially when there are several layers of interpretation (a remote shell command, for example).
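
To make the redirection and pipeline items above concrete, here is a minimal sketch (my own example; the file names are hypothetical and Unix-like tools are assumed):

import subprocess

# Redirection: grep 'foo' <inputfile >outputfile, without a shell
with open('inputfile') as infile, open('outputfile', 'w') as outfile:
    subprocess.run(['grep', 'foo'], stdin=infile, stdout=outfile)

# Pipeline: echo foo | nl, without a shell
echo = subprocess.Popen(['echo', 'foo'], stdout=subprocess.PIPE)
nl = subprocess.Popen(['nl'], stdin=echo.stdout,
                      stdout=subprocess.PIPE, text=True)
echo.stdout.close()              # so echo gets SIGPIPE if nl exits early
output, _ = nl.communicate()
print(output)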

Understand differences between sh and Bash

subprocess runs your shell commands with /bin/sh unless you specifically request otherwise (except of course on Windows, where it uses the value of the COMSPEC variable). This means that various Bash-only features like arrays, [[ etc are not available.

If you need to use Bash-only syntax, you can pass in the path to the shell as executable='/bin/bash' (where of course if your Bash is installed somewhere else, you need to adjust the path).

subprocess.run('''
    # This for loop syntax is Bash only
    for((i=1;i<=$#;i++)); do
        # Arrays are Bash-only
        array[i]+=123
    done''',
    shell=True, check=True,
    executable='/bin/bash')

A subprocess is separate from its parent, and cannot change it

A somewhat common mistake is doing something like

subprocess.run('cd /tmp', shell=True)
subprocess.run('pwd', shell=True)  # Oops, doesn't print /tmp

The same thing will happen if the first subprocess tries to set an environment variable, which of course will have disappeared when you run another subprocess, etc.

A child process runs completely separate from Python, and when it finishes, Python has no idea what it did (apart from the vague indicators that it can infer from the exit status and output from the child process). A child generally cannot change the parent’s environment; it cannot set a variable, change the working directory, or, in so many words, communicate with its parent without cooperation from the parent.

The immediate fix in this particular case is to run both commands in a single subprocess;

subprocess.run('cd /tmp; pwd', shell=True)

though obviously this particular use case isn’t very useful; instead, use the cwd keyword argument, or simply os.chdir() before running the subprocess. Similarly, for setting a variable, you can manipulate the environment of the current process (and thus also its children) via

os.environ['foo'] = 'bar'

or pass an environment setting to a child process with

subprocess.run('echo "$foo"', shell=True, env={'foo': 'bar'})

(not to mention the obvious refactoring subprocess.run(['echo', 'bar']); but echo is a poor example of something to run in a subprocess in the first place, of course).
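
A minimal sketch of the cwd= and os.chdir() alternatives mentioned above (my own example, assuming /tmp exists, as on most Unix-like systems):

import os
import subprocess

# Run a single child in a different working directory
subprocess.run(['pwd'], cwd='/tmp', check=True)

# Or change the parent's directory, which subsequent children then inherit
os.chdir('/tmp')
subprocess.run(['pwd'], check=True)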

Don’t run Python from Python

This is slightly dubious advice; there are certainly situations where it does make sense or is even an absolute requirement to run the Python interpreter as a subprocess from a Python script. But very frequently, the correct approach is simply to import the other Python module into your calling script and call its functions directly.

If the other Python script is under your control, and it isn’t a module, consider turning it into one. (This answer is too long already so I will not delve into details here.)

If you need parallelism, you can run Python functions in subprocesses with the multiprocessing module. There is also threading which runs multiple tasks in a single process (which is more lightweight and gives you more control, but also more constrained in that threads within a process are tightly coupled, and bound to a single GIL.)
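
A brief sketch of the multiprocessing alternative (my own toy worker function, not anything from the question):

from multiprocessing import Pool

def work(n):
    # Placeholder for a CPU-bound Python function
    return n * n

if __name__ == '__main__':
    with Pool(4) as pool:
        print(pool.map(work, range(10)))   # runs work() in child processes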


Answer 2

Call it with subprocess

import subprocess
subprocess.Popen("cwm --rdf test.rdf --ntriples > test.nt")

The error you are getting seems to be because there is no swap module on the server; you should install swap on the server and then run the script again.


Answer 3

It is possible to use the bash program, with the parameter -c, to execute the commands:

bashCommand = "cwm --rdf test.rdf --ntriples > test.nt"
output = subprocess.check_output(['bash','-c', bashCommand])

Answer 4

You can use subprocess, but I always felt that it was not a ‘Pythonic’ way of doing it. So I created Sultan (shameless plug) that makes it easy to run command line functions.

https://github.com/aeroxis/sultan


Answer 5

According to the error, you are missing a package named swap on the server. The script /usr/bin/cwm requires it. If you’re on Ubuntu/Debian, install python-swap using aptitude.


Answer 6

You can also use os.popen. Example:

import os

command = os.popen('ls -al')
print(command.read())
print(command.close())

Output:

total 16
drwxr-xr-x 2 root root 4096 ago 13 21:53 .
drwxr-xr-x 4 root root 4096 ago 13 01:50 ..
-rw-r--r-- 1 root root 1278 ago 13 21:12 bot.py
-rw-r--r-- 1 root root   77 ago 13 21:53 test.py

None

Answer 7

To run the command without a shell, pass the command as a list and implement the redirection in Python using the subprocess module:

#!/usr/bin/env python
import subprocess

with open('test.nt', 'wb', 0) as file:
    subprocess.check_call("cwm --rdf test.rdf --ntriples".split(),
                          stdout=file)

Note: no > test.nt at the end. stdout=file implements the redirection.


To run the command using the shell in Python, pass the command as a string and enable shell=True:

#!/usr/bin/env python
import subprocess

subprocess.check_call("cwm --rdf test.rdf --ntriples > test.nt",
                      shell=True)

Here, the shell is responsible for the output redirection (> test.nt is in the command).


To run a bash command that uses bashisms, specify the bash executable explicitly e.g., to emulate bash process substitution:

#!/usr/bin/env python
import subprocess

subprocess.check_call('program <(command) <(another-command)',
                      shell=True, executable='/bin/bash')

Answer 8

The Pythonic way of doing this is to use subprocess.Popen.

subprocess.Popen takes a list where the first element is the command to be run followed by any command line arguments.

As an example:

import subprocess

args = ['echo', 'Hello!']
subprocess.Popen(args)  # same as running `echo Hello!` on the command line

args2 = ['echo', '-v', '"Hello Again"']
subprocess.Popen(args2)  # same as running `echo -v "Hello Again"` on the command line