Why use Python’s os module methods instead of executing shell commands directly?

Question:

I am trying to understand the motivation behind using Python’s library functions for OS-specific tasks, such as creating files/directories or changing file attributes, instead of just executing those commands via os.system() or subprocess.call().

For example, why would I want to use os.chmod instead of doing os.system("chmod...")?

I understand that it is more “pythonic” to use Python’s available library methods as much as possible instead of just executing shell commands directly. But, is there any other motivation behind doing this from a functionality point of view?

I am only talking about executing simple one-line shell commands here. When we need more control over the execution of the task, I understand that using the subprocess module makes more sense, for example.


Answer 0

  1. It’s faster. os.system and subprocess.call create new processes, which is unnecessary for something this simple. In fact, os.system, like subprocess.call with the shell argument, usually creates at least two new processes: the first one being the shell, and the second one being the command that you’re running (if it’s not a shell built-in like test).

  2. Some commands are useless in a separate process. For example, if you run os.system("cd dir/"), it will change the current working directory of the child process, but not of the Python process. You need to use os.chdir for that.

  3. You don’t have to worry about special characters interpreted by the shell. os.chmod(path, mode) will work no matter what the filename is, whereas os.system("chmod 777 " + path) will fail horribly if the filename is something like ; rm -rf ~. (Note that you can work around this if you use subprocess.call without the shell argument.)

  4. You don’t have to worry about filenames that begin with a dash. os.chmod("--quiet", mode) will change the permissions of the file named --quiet, but os.system("chmod 777 --quiet") will fail, as --quiet is interpreted as an option. This is true even for subprocess.call(["chmod", "777", "--quiet"]); there you have to insert "--" before the filename to mark the end of the options.

  5. You have fewer cross-platform and cross-shell concerns, as Python’s standard library is supposed to deal with that for you. Does your system have a chmod command? Is it installed? Does it support the parameters that you expect it to support? The os module tries to be as cross-platform as possible and documents when that is not possible.

  6. If the command you’re running has output that you care about, you need to parse it, which is trickier than it sounds, as you may forget about corner-cases (filenames with spaces, tabs and newlines in them), even when you don’t care about portability.
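Points 3 and 4 are easy to check for yourself. The following is a minimal sketch, assuming a POSIX system with a chmod binary on PATH and Python 3.7+; the helper names set_mode_builtin, set_mode_external and current_mode are hypothetical. It creates files literally named ; rm -rf ~ and --quiet in a temporary directory and changes their permissions both ways.

```python
import os
import stat
import subprocess
import tempfile


def set_mode_builtin(path: str, mode: int) -> None:
    # os.chmod passes the path straight to the chmod(2) system call:
    # no shell parses it and no command-line option parser ever sees it.
    os.chmod(path, mode)


def set_mode_external(directory: str, name: str, mode: int) -> None:
    # Argument list without shell=True, so ';', spaces and '~' are not special.
    # '--' ends option parsing, so a name such as '--quiet' is read as a file.
    subprocess.run(["chmod", format(mode, "o"), "--", name],
                   cwd=directory, check=True)


def current_mode(path: str) -> int:
    # Permission bits only (e.g. 0o600), without the file-type bits.
    return stat.S_IMODE(os.stat(path).st_mode)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ["; rm -rf ~", "--quiet"]:
            path = os.path.join(d, name)
            open(path, "w").close()  # create an empty file with that exact name
            set_mode_builtin(path, 0o600)
            print(repr(name), oct(current_mode(path)))
            set_mode_external(d, name, 0o640)
            print(repr(name), oct(current_mode(path)))
```

Dropping the "--" from the argument list makes the external version read --quiet as an option, which is exactly the pitfall described in point 4; the os.chmod version needs no such precaution.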


Answer 1

It is safer. To give you an idea, here is an example script:

import os
file = input("Please enter a file: ")  # raw_input() in Python 2
os.system("chmod 777 " + file)  # UNSAFE: the shell interprets whatever was typed

If the input from the user were test; rm -rf ~, this would delete the home directory.

This is why it is safer to use the built-in function.

This is also why you should use subprocess (with an argument list rather than shell=True) instead of os.system.
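For comparison, here is a minimal sketch of the same script built on os.chmod instead of os.system (Python 3; make_world_writable is a hypothetical helper name). So that it can run unattended, the demo creates a file literally named test; rm -rf ~ in a temporary directory instead of reading from input().

```python
import os
import stat
import tempfile


def make_world_writable(filename: str) -> None:
    # The text goes to the chmod(2) system call as a path and nothing else.
    # No shell ever parses it, so ';' and '~' have no special meaning.
    os.chmod(filename, 0o777)


if __name__ == "__main__":
    # In the interactive script this would be:
    #     make_world_writable(input("Please enter a file: "))
    with tempfile.TemporaryDirectory() as d:
        odd_name = os.path.join(d, "test; rm -rf ~")
        open(odd_name, "w").close()  # a real file with that odd name
        make_world_writable(odd_name)
        print(oct(stat.S_IMODE(os.stat(odd_name).st_mode)))  # 0o777

        try:
            # Same kind of text, but no such file: os.chmod only raises an error.
            make_world_writable(os.path.join(d, "test; rm -rf ~ (missing)"))
        except FileNotFoundError as exc:
            print("nothing was run, only an error raised:", exc)
```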


Answer 2

There are four strong cases for preferring Python’s more-specific methods in the os module over using os.system or the subprocess module when executing a command:

  • Redundancy – spawning another process is redundant and wastes time and resources.
  • Portability – Many of the methods in the os module are available on multiple platforms, while many shell commands are OS-specific.
  • Understanding the results – Spawning a process to execute arbitrary commands forces you to parse the results from the output and understand if and why a command has done something wrong.
  • Safety – A process can potentially execute any command it’s given. This is a weak design and it can be avoided by using specific methods in the os module.

Redundancy (see redundant code):

You’re actually executing a redundant “middle-man” on your way to the eventual system calls (chmod in your example). This middle man is a new process or sub-shell.

From os.system:

Execute the command (a string) in a subshell …

And subprocess is just a module to spawn new processes.

You can do what you need without spawning these processes.

Portability (see source code portability):

The os module’s aim is to provide generic operating-system services, and its description starts with:

This module provides a portable way of using operating system dependent functionality.

You can use os.listdir on both Windows and Unix. Trying to use os.system / subprocess for this functionality will force you to maintain two calls (for ls / dir) and check which operating system you’re on. This is not as portable and will cause even more frustration later on (see Handling Output).
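A minimal sketch of the contrast, assuming Python 3.9+; list_dir and list_dir_via_shell are hypothetical helpers. Only the POSIX branch of the shell version is exercised in the demo; the Windows branch illustrates the second code path you end up maintaining. Note that dir /b hides hidden and system files by default, unlike os.listdir.

```python
import os
import subprocess
import sys
import tempfile


def list_dir(path: str = ".") -> list[str]:
    # One call that behaves the same on Windows, Linux and macOS.
    return sorted(os.listdir(path))


def list_dir_via_shell(path: str = ".") -> list[str]:
    # The shell-based equivalent needs a branch per platform.
    if sys.platform == "win32":
        # 'dir' is a cmd.exe built-in, so cmd has to be started (and it parses
        # the path); /b prints bare names but omits hidden files by default.
        cmd = ["cmd", "/c", "dir", "/b", path]
    else:
        # -A includes dotfiles but not '.' and '..', matching os.listdir.
        cmd = ["ls", "-A", "--", path]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    # Line-based parsing: this breaks for names that contain a newline.
    return sorted(out.splitlines())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ["notes.txt", ".hidden"]:
            open(os.path.join(d, name), "w").close()
        print(list_dir(d))
        print(list_dir_via_shell(d))
```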

Understanding the command’s results:

Suppose you want to list the files in a directory.

If you’re using os.system("ls") / subprocess.call(['ls']), the return value is only the exit status; even if you capture the output (for example with subprocess.check_output), what you get back is basically one big string with the file names.

How can you tell a file with a space in its name from two files?

What if you have no permission to list the files?

How should you map the data to python objects?

These are only the questions off the top of my head. There are solutions to these problems, but why solve again a problem that has already been solved for you?

This is an example of following the Don’t Repeat Yourself principle (often referred to as “DRY”) by not repeating an implementation that already exists and is freely available to you.
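To make the space question concrete, here is a minimal sketch assuming a POSIX ls; names_from_ls, names_from_os and describe are hypothetical helpers, and names_from_ls deliberately splits on whitespace the way a quick script often does. A directory holding one file named a b and a directory holding files a and b give the same parsed result from ls but different results from os.listdir. os.scandir also answers the "map the data to Python objects" question by returning entries with typed attributes.

```python
import os
import subprocess
import tempfile


def names_from_ls(path: str) -> list[str]:
    out = subprocess.run(["ls", "--", path], capture_output=True,
                         text=True, check=True).stdout
    # Naive whitespace split: "a b" becomes two names.
    return sorted(out.split())


def names_from_os(path: str) -> list[str]:
    # Exact names as strings, whatever characters they contain.
    return sorted(os.listdir(path))


def describe(path: str) -> list[tuple[str, bool, int]]:
    # os.scandir yields DirEntry objects with typed attributes.
    with os.scandir(path) as it:
        return sorted((e.name, e.is_dir(), e.stat().st_size) for e in it)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as one, tempfile.TemporaryDirectory() as two:
        open(os.path.join(one, "a b"), "w").close()
        for name in ["a", "b"]:
            open(os.path.join(two, name), "w").close()
        print(names_from_ls(one), names_from_ls(two))  # indistinguishable
        print(names_from_os(one), names_from_os(two))  # ['a b'] ['a', 'b']
        print(describe(one))                           # [('a b', False, 0)]
```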

Safety:

os.system and subprocess are powerful. That is good when you need this power, but dangerous when you don’t. When you use os.listdir, you know it cannot do anything other than list files or raise an error. When you use os.system or subprocess to achieve the same behaviour, you can potentially end up doing something you did not mean to do.
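One practical consequence is that the failure modes of os.listdir are ordinary Python exceptions you can catch individually, whereas a subprocess only gives you an exit status and free-form (often locale-dependent) text on stderr. A minimal sketch, assuming a POSIX ls; try_list and try_list_via_ls are hypothetical helper names.

```python
import os
import subprocess
import tempfile


def try_list(path: str) -> str:
    try:
        entries = os.listdir(path)
    except FileNotFoundError:
        return "missing"
    except NotADirectoryError:
        return "not a directory"
    except PermissionError:
        return "permission denied"
    return f"{len(entries)} entries"


def try_list_via_ls(path: str) -> tuple[int, str]:
    # All you get back is an exit status plus human-readable text on stderr;
    # telling "missing" from "permission denied" means parsing that text.
    result = subprocess.run(["ls", "--", path], capture_output=True, text=True)
    return result.returncode, result.stderr.strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        missing = os.path.join(d, "no-such-dir")
        print(try_list(d))               # 0 entries
        print(try_list(missing))         # missing
        print(try_list_via_ls(missing))  # non-zero status and an error message
```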

Injection Safety (see shell injection examples):

If you use input from the user as part of a new command, you’ve basically given them a shell. This is much like SQL injection, which gives the user a shell into the database.

An example would be a command of the form:

# ... read some user input
os.system(user_input + " some continuation")

This can easily be exploited to run arbitrary code with the input NASTY COMMAND;#, which produces the final call:

os.system("NASTY COMMAND; # some continuation")

There are many such commands that can put your system at risk.
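The usual defences are to keep the shell out of the picture entirely or, if a shell is genuinely needed, to quote untrusted input with shlex.quote. A minimal sketch, assuming a POSIX system; echo stands in for whatever real command you would run, and run_without_shell / run_with_quoted_shell are hypothetical helper names.

```python
import shlex
import subprocess


def run_without_shell(user_input: str) -> str:
    # Argument list, no shell: the whole string is one argv entry for echo.
    return subprocess.run(["echo", user_input], capture_output=True,
                          text=True, check=True).stdout


def run_with_quoted_shell(user_input: str) -> str:
    # If a shell is unavoidable, quote every piece of untrusted input.
    command = "echo " + shlex.quote(user_input)
    return subprocess.run(command, shell=True, capture_output=True,
                          text=True, check=True).stdout


if __name__ == "__main__":
    hostile = "NASTY COMMAND; # some continuation"
    print(shlex.quote(hostile))  # 'NASTY COMMAND; # some continuation'
    print(run_without_shell(hostile), end="")
    print(run_with_quoted_shell(hostile), end="")
```

In both cases the ; and # reach echo as plain characters, so the "command" is printed rather than executed.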


Answer 3

For a simple reason: when you call a shell function, it creates a sub-shell which is destroyed after your command exits, so if you change directory in that shell, it does not affect your environment in Python.
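A minimal sketch of the effect, assuming a POSIX sh; cd_in_subshell is a hypothetical helper. The path is passed as a positional parameter ($1) instead of being pasted into the command string, so the shell does not re-parse it.

```python
import os
import subprocess
import tempfile


def cd_in_subshell(path: str) -> None:
    # 'cd' runs inside a throwaway /bin/sh; the new directory dies with it.
    # The path arrives as "$1", so the shell never re-parses its characters.
    subprocess.run(["sh", "-c", 'cd "$1"', "sh", path], check=True)


if __name__ == "__main__":
    start = os.getcwd()
    with tempfile.TemporaryDirectory() as d:
        cd_in_subshell(d)
        print(os.getcwd() == start)              # True: Python did not move
        os.chdir(d)
        print(os.path.samefile(os.getcwd(), d))  # True: os.chdir moves this process
        os.chdir(start)                          # go back before d is deleted
```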

Besides, creating a sub-shell is time-consuming, so running OS commands through the shell will hurt your performance.

EDIT

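I ran some timing tests (this was a Python 2 session, so the octal literal 0755 is written 0o755 in Python 3, and call is presumably subprocess.call):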

In [379]: %timeit os.chmod('Documents/recipes.txt', 0755)
10000 loops, best of 3: 215 us per loop

In [380]: %timeit os.system('chmod 0755 Documents/recipes.txt')
100 loops, best of 3: 2.47 ms per loop

In [382]: %timeit call(['chmod', '0755', 'Documents/recipes.txt'])
100 loops, best of 3: 2.93 ms per loop

The built-in function runs more than 10 times faster.
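The same measurement can be repeated on Python 3 with the timeit module. This is a minimal sketch (POSIX only, assuming chmod is on PATH; compare is a hypothetical helper name) that reports the average time per call for each approach. Absolute figures depend heavily on the machine and OS, so no particular numbers are claimed here.

```python
import os
import shlex
import subprocess
import tempfile
import timeit


def compare(number: int = 100) -> dict[str, float]:
    """Average seconds per call for three ways of setting mode 0o644."""
    with tempfile.NamedTemporaryFile() as f:
        path = f.name
        totals = {
            # Direct wrapper around the chmod(2) system call, no new process.
            "os.chmod": timeit.timeit(lambda: os.chmod(path, 0o644), number=number),
            # One child process: the chmod binary itself.
            "subprocess.run": timeit.timeit(
                lambda: subprocess.run(["chmod", "644", path], check=True),
                number=number),
            # Starts /bin/sh, which then runs chmod (some shells exec it in place).
            "os.system": timeit.timeit(
                lambda: os.system("chmod 644 " + shlex.quote(path)),
                number=number),
        }
    return {name: total / number for name, total in totals.items()}


if __name__ == "__main__":
    for name, seconds in compare().items():
        print(f"{name:>15}: {seconds * 1e6:10.1f} us per call")
```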

EDIT2

There may be cases where invoking an external executable yields better results than a Python package. I just remembered an email from a colleague of mine saying that gzip called through subprocess performed much better than the Python package he used. But that is certainly not the case for standard OS packages that emulate standard OS commands.


Answer 4

Shell calls are OS-specific, whereas Python os module functions, in most cases, are not. Using them also avoids spawning a subprocess.


Answer 5

It’s far more efficient. The “shell” is just another OS binary which contains a lot of system calls. Why incur the overhead of creating the whole shell process just for that single system call?

The situation is even worse when you use os.system for something that’s not a shell built-in. You start a shell process, which in turn starts an executable, which then (two processes away) makes the system call. At least subprocess (without shell=True) removes the need for the intermediary shell process.

This is not specific to Python. systemd improved Linux startup times for the same reason: it makes the necessary system calls itself instead of spawning a thousand shells.