How to implement common bash idioms in Python? [closed]

Question: How to implement common bash idioms in Python? [closed]

I currently do my textfile manipulation through a bunch of badly remembered AWK, sed, Bash and a tiny bit of Perl.

I’ve seen it mentioned in a few places that Python is good for this kind of thing. How can I use Python to replace shell scripting, AWK, sed and friends?


Answer 0

Any shell has several sets of features.

  • The Essential Linux/Unix commands. All of these are available through the subprocess library. This isn’t always the best first choice for doing all external commands. Look also at shutil for some commands that are separate Linux commands, but you could probably implement directly in your Python scripts. Another huge batch of Linux commands are in the os library; you can do these more simply in Python.

    And — bonus! — more quickly. Each separate Linux command in the shell (with a few exceptions) forks a subprocess. By using Python shutil and os modules, you don’t fork a subprocess.

  • The shell environment features. This includes stuff that sets a command’s environment (current directory and environment variables and what-not). You can easily manage this from Python directly.

  • The shell programming features. This is all the process status code checking, the various logic commands (if, while, for, etc.), the test command and all of its relatives, and the function definition stuff. This is all much, much easier in Python. This is one of the huge victories in getting rid of bash and doing it in Python.

  • Interaction features. This includes command history and what-not. You don’t need this for writing shell scripts. This is only for human interaction, and not for script-writing.

  • The shell file management features. This includes redirection and pipelines. This is trickier. Much of this can be done with subprocess. But some things that are easy in the shell are unpleasant in Python. Specifically stuff like (a | b; c ) | something >result. This runs two processes in parallel (with output of a as input to b), followed by a third process. The output from that sequence is run in parallel with something and the output is collected into a file named result. That’s just complex to express in any other language.
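As a rough illustration of the last bullet, here is one way (a | b; c) | something >result might be approximated with subprocess. The four commands are placeholders: small Python one-liners launched through sys.executable stand in for real programs, and the helper names py and run_group are made up for this sketch.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

def py(code):
    """Build an argv that runs a Python one-liner (stand-in for a real command)."""
    return [sys.executable, "-c", code]

# Placeholder commands; replace with real argv lists such as ["grep", "x"].
A = py("print('b\\na')")                                        # emits two lines
B = py("import sys; sys.stdout.writelines(sorted(sys.stdin))")  # behaves like sort
C = py("print('c')")
SOMETHING = py("import sys; sys.stdout.write(sys.stdin.read().upper())")

def run_group(result_path):
    # a | b: connect a's stdout to b's stdin so both run concurrently
    a = subprocess.Popen(A, stdout=subprocess.PIPE)
    b = subprocess.Popen(B, stdin=a.stdout, stdout=subprocess.PIPE)
    a.stdout.close()                 # so a notices if b exits early
    ab_out = b.communicate()[0]
    a.wait()
    # ; c: starts only after the a | b pipeline has finished
    c_out = subprocess.run(C, stdout=subprocess.PIPE, check=True).stdout
    # ( ... ) | something > result
    with open(result_path, "wb") as result:
        subprocess.run(SOMETHING, input=ab_out + c_out, stdout=result, check=True)

result_file = Path(tempfile.mkdtemp()) / "result"
run_group(result_file)
```

Unlike the shell version, this sketch buffers the grouped output in memory before handing it to the last command instead of streaming it, which is simpler but only reasonable for modest amounts of data.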

Specific programs (awk, sed, grep, etc.) can often be rewritten as Python modules. Don’t go overboard. Replace what you need and evolve your “grep” module. Don’t start out writing a Python module that replaces “grep”.
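A grep that grows with your needs can start very small. The sketch below is one possible starting point, not a full replacement; the function name grep and its invert parameter are choices made for this illustration.

```python
import re

def grep(pattern, lines, invert=False):
    """Yield the lines that match (or, with invert=True, do not match) pattern."""
    regex = re.compile(pattern)
    for line in lines:
        if bool(regex.search(line)) != invert:
            yield line

sample = ["alpha 1\n", "beta 2\n", "gamma 3\n"]
matches = list(grep(r"^[ab]", sample))
others = list(grep(r"^[ab]", sample, invert=True))
```

Because it accepts any iterable of lines, the same function works on an open file, on sys.stdin, or on the output of another generator.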

The best thing is that you can do this in steps.

  1. Replace AWK and PERL with Python. Leave everything else alone.
  2. Look at replacing GREP with Python. This can be a bit more complex, but your version of GREP can be tailored to your processing needs.
  3. Look at replacing FIND with Python loops that use os.walk. This is a big win because you don’t spawn as many processes.
  4. Look at replacing common shell logic (loops, decisions, etc.) with Python scripts.
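Step 3 can be sketched as follows. This is roughly what find top -name '*.txt' does for regular files, under the assumption that matching on the file name is all you need; the function name find and the demonstration tree are made up for the example.

```python
import fnmatch
import os
import tempfile

def find(top, pattern="*"):
    """Yield paths under top whose file name matches a shell-style pattern."""
    for dirpath, dirnames, filenames in os.walk(top):
        for name in fnmatch.filter(filenames, pattern):
            yield os.path.join(dirpath, name)

# Small demonstration tree in a temporary directory.
demo = tempfile.mkdtemp()
os.makedirs(os.path.join(demo, "sub"))
for rel in ("a.txt", "b.log", os.path.join("sub", "c.txt")):
    with open(os.path.join(demo, rel), "w") as fh:
        fh.write("x\n")

found = sorted(os.path.relpath(p, demo) for p in find(demo, "*.txt"))
```

No process is spawned here; the whole walk happens inside the Python interpreter.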

Answer 1

Yes, of course :)

Take a look at these libraries which help you Never write shell scripts again (Plumbum’s motto).

Also, if you want to replace awk, sed and grep with something Python-based, then I recommend pyp:

“The Pyed Piper”, or pyp, is a linux command line text manipulation tool similar to awk or sed, but which uses standard python string and list methods as well as custom functions evolved to generate fast results in an intense production environment.


Answer 2

I just discovered how to combine the best parts of bash and ipython. Up to now this seems more comfortable to me than using subprocess and so on. You can easily copy big parts of existing bash scripts and e.g. add error handling in the python way :) And here is my result:

#!/usr/bin/env ipython3

# *** How to have the most comfortable scripting experience of your life ***
# ######################################################################
#
# … by using ipython for scripting combined with subcommands from bash!
#
# 1. echo '#!/usr/bin/env ipython3' > scriptname.ipy    # creates new ipy-file
#
# 2. chmod +x scriptname.ipy                            # make it executable
#
# 3. starting with line 2, write normal python or do some of
#    the ! magic of ipython, so that you can use unix commands
#    within python and even assign their output to a variable via
#    var = !cmd1 | cmd2 | cmd3                          # enjoy ;)
#
# 4. run via ./scriptname.ipy - if it fails with recognizing % and !
#    but parses raw python fine, please check again for the .ipy suffix

# ugly example, please go and find more in the wild
files = !ls *.* | grep "y"
for file in files:
  !echo $file | grep "p"
# sorry for this nonsense example ;)

See the IPython docs on system shell commands and on using it as a system shell.
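For comparison, the same nonsense example can be sketched in plain Python without spawning any process. The helper name names_containing and the demonstration files are invented for this illustration.

```python
import glob
import os
import tempfile

def names_containing(directory, first, second):
    """Names matching *.* in directory that contain both substrings."""
    files = [os.path.basename(p) for p in glob.glob(os.path.join(directory, "*.*"))]
    with_first = [name for name in files if first in name]        # ls *.* | grep "y"
    return sorted(name for name in with_first if second in name)  # echo $file | grep "p"

demo = tempfile.mkdtemp()
for name in ("script.py", "notes.txt", "yaml.cfg", "README"):
    open(os.path.join(demo, name), "w").close()

picked = names_containing(demo, "y", "p")
```

The IPython form is shorter to type, while this form runs under a stock interpreter and keeps everything as ordinary Python lists.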


Answer 3

As of 2015 and Python 3.4’s release, there’s now a reasonably complete user-interactive shell available at: http://xon.sh/ or https://github.com/scopatz/xonsh

The demonstration video does not show pipes being used, but they ARE supported when in the default shell mode.

Xonsh (‘conch’) tries very hard to emulate bash, so things you’ve already gained muscle memory for, like

env | uniq | sort -r | grep PATH

or

my-web-server 2>&1 | my-log-sorter

will still work fine.

The tutorial is quite lengthy and seems to cover a significant amount of the functionality someone would generally expect at an ash or bash prompt:

  • Compiles, Evaluates, & Executes!
  • Command History and Tab Completion
  • Help & Superhelp with ? & ??
  • Aliases & Customized Prompts
  • Executes Commands and/or *.xsh Scripts which can also be imported
  • Environment Variables including Lookup with ${}
  • Input/Output Redirection and Combining
  • Background Jobs & Job Control
  • Nesting Subprocesses, Pipes, and Coprocesses
  • Subprocess-mode when a command exists, Python-mode otherwise
  • Captured Subprocess with $(), Uncaptured Subprocess with $[], Python Evaluation with @()
  • Filename Globbing with * or Regular Expression Filename Globbing with Backticks

Answer 4

  • If you want to use Python as a shell, why not have a look at IPython? It is also a good way to learn the language interactively.
  • If you do a lot of text manipulation, and if you use Vim as a text editor, you can also write plugins for Vim directly in Python. Just type “:help python” in Vim and follow the instructions, or have a look at this presentation. It is so easy and powerful to write functions that you will use directly in your editor!

Answer 5

In the beginning there was sh, sed, and awk (and find, and grep, and…). It was good. But awk can be an odd little beast and hard to remember if you don’t use it often. Then the great camel created Perl. Perl was a system administrator’s dream. It was like shell scripting on steroids. Text processing, including regular expressions, was just part of the language. Then it got ugly… People tried to make big applications with Perl. Now, don’t get me wrong, Perl can be an application, but it can (can!) look like a mess if you’re not really careful. Then there is all this flat data business. It’s enough to drive a programmer nuts.

Enter Python, Ruby, et al. These are really very good general purpose languages. They support text processing, and do it well (though perhaps not as tightly entwined in the basic core of the language). But they also scale up very well, and still have nice looking code at the end of the day. They also have developed pretty hefty communities with plenty of libraries for most anything.

Now, much of the negativity towards Perl is a matter of opinion, and certainly some people can write very clean Perl, but with this many people complaining about it being too easy to create obfuscated code, you know some grain of truth is there. The question then really becomes: are you ever going to use this language for more than simple bash script replacements? If not, learn some more Perl; it is absolutely fantastic for that. If, on the other hand, you want a language that will grow with you as you want to do more, may I suggest Python or Ruby.

Either way, good luck!


Answer 6

I suggest the awesome online book Dive Into Python. It’s how I learned the language originally.

Beyond teaching you the basic structure of the language, and a whole lot of useful data structures, it has a good chapter on file handling and subsequent chapters on regular expressions and more.


Answer 7

Adding to previous answers: check the pexpect module for dealing with interactive commands (adduser, passwd etc.)


Answer 8

One reason I love Python is that it is much better standardized than the POSIX tools. With those, I have to double and triple check that each bit is compatible with other operating systems. A program written on a Linux system might not work the same on a BSD system or on OS X. With Python, I just have to check that the target system has a sufficiently modern version of Python.

Even better, a program written in standard Python will even run on Windows!


Answer 9

Here is my opinion, based on experience:

For shell:

  • Shell can very easily produce write-only code. Write it, and when you come back to it, you will never figure out what you did. It’s very easy to end up there.
  • Shell can do A LOT of text processing, splitting, etc. in one line with pipes.
  • It is the best glue language when it comes to integrating calls to programs written in different programming languages.

For python:

  • If you want portability that includes Windows, use Python.
  • Python can be better when you must manipulate more than just text, such as collections of numbers. For this, I recommend Python.
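To make the "collections of numbers" point concrete, here is a small sketch that pulls one column out of text lines, much like awk's $2, and then treats it as numbers. The sample lines and the function name column are invented for the illustration.

```python
import statistics

def column(lines, index, sep=None):
    """Extract one whitespace- (or sep-) separated column as floats."""
    return [float(line.split(sep)[index]) for line in lines if line.strip()]

log = ["job1 12.5 ok", "job2 7.5 ok", "job3 10 failed"]
durations = column(log, 1)
total = sum(durations)
mean = statistics.mean(durations)
```

Once the values are in a list, sorting, grouping, or further statistics are ordinary Python operations rather than extra pipeline stages.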

I usually choose bash for most things, but when I have something that must cross the Windows boundary, I just use Python.


Answer 10

pythonpy is a tool that provides easy access to many of the features from awk and sed, but using python syntax:

$ echo me2 | py -x 're.sub("me", "you", x)'
you2
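If installing a tool is not an option, the same substitution can be sketched with only the standard library. This is not how pythonpy is implemented; the function name substitute is made up, and io.StringIO stands in for sys.stdin so the example runs without piped input.

```python
import io
import re

def substitute(pattern, repl, lines):
    """Apply re.sub to every line, like sed 's/pattern/repl/g'."""
    regex = re.compile(pattern)
    for line in lines:
        yield regex.sub(repl, line)

# io.StringIO stands in for sys.stdin here.
fake_stdin = io.StringIO("me2\nme and me\n")
output = list(substitute("me", "you", fake_stdin))
```

In a real script you would pass sys.stdin instead and write each resulting line to sys.stdout.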

Answer 11

I have built semi-long shell scripts (300-500 lines) and Python code that provides similar functionality. When many external commands are being executed, I find the shell is easier to use. Perl is also a good option when there is lots of text manipulation.


Answer 12

While researching this topic, I found this proof-of-concept code (via a comment at http://jlebar.com/2010/2/1/Replacing_Bash.html) that lets you “write shell-like pipelines in Python using a terse syntax, and leveraging existing system tools where they make sense”:

for line in sh("cat /tmp/junk2") | cut(d=',',f=1) | 'sort' | uniq:
    sys.stdout.write(line)
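The | syntax in that snippet relies on operator overloading. The following is a minimal pure-Python sketch of the idea, not the linked proof-of-concept code: the class names Stage and Lines are invented here, and unlike the original it never calls external tools.

```python
import itertools

class Lines:
    """An iterable of lines that can appear on the left of `|`."""
    def __init__(self, iterable):
        self.iterable = iterable

    def __iter__(self):
        return iter(self.iterable)

class Stage:
    """Wrap a function that maps an iterable of lines to an iterable of lines."""
    def __init__(self, func):
        self.func = func

    def __ror__(self, upstream):
        # Python falls back to this for `upstream | stage`
        # when upstream itself does not implement `|` for a Stage.
        return Lines(self.func(upstream))

def cut(d, f):
    """Keep field f (1-based) of each line split on delimiter d."""
    return Stage(lambda lines: (line.rstrip("\n").split(d)[f - 1] for line in lines))

sort = Stage(sorted)
uniq = Stage(lambda lines: (key for key, _ in itertools.groupby(lines)))

data = ["b,2\n", "a,1\n", "b,3\n"]
result = list(data | cut(d=",", f=1) | sort | uniq)
```

Each stage consumes the previous one lazily where it can; sort is the exception, since sorting needs all lines before it can emit the first one.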

Answer 13

Your best bet is a tool that is specifically geared towards your problem. If it’s processing text files, then Sed, Awk and Perl are the top contenders. Python is a general-purpose dynamic language. As with any general-purpose language, there’s support for file manipulation, but that isn’t its core purpose. I would consider Python or Ruby if I had a requirement for a dynamic language in particular.

In short, learn Sed and Awk really well, plus all the other goodies that come with your flavour of *nix (All the Bash built-ins, grep, tr and so forth). If it’s text file processing you’re interested in, you’re already using the right stuff.


Answer 14

You can use python instead of bash with the ShellPy library.

Here is an example that downloads the avatar of the Python user from GitHub:

import json
import os
import tempfile

# get the api answer with curl
answer = `curl https://api.github.com/users/python
# syntactic sugar for checking returncode of executed process for zero
if answer:
    answer_json = json.loads(answer.stdout)
    avatar_url = answer_json['avatar_url']

    destination = os.path.join(tempfile.gettempdir(), 'python.png')

    # execute curl once again, this time to get the image
    result = `curl {avatar_url} > {destination}
    if result:
        # if there were no problems show the file
        p`ls -l {destination}
    else:
        print('Failed to download avatar')

    print('Avatar downloaded')
else:
    print('Failed to access github api')

As you can see, all expressions that follow a grave accent ( ` ) symbol are executed in the shell. In the Python code, you can capture the result of this execution and act on it. For example:

log = `git log --pretty=oneline --grep='Create'

This line will first execute git log --pretty=oneline --grep='Create' in shell and then assign the result to the log variable. The result has the following properties:

  • stdout: the whole text from stdout of the executed process
  • stderr: the whole text from stderr of the executed process
  • returncode: the return code of the execution

This is a general overview of the library; a more detailed description with examples can be found here.
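For readers who prefer to stay within the standard library, subprocess.run returns an object with the same three attribute names. The sketch below is plain Python rather than ShellPy syntax, and it uses a stand-in command launched through sys.executable so that it runs anywhere.

```python
import subprocess
import sys

# Stand-in command; replace with e.g.
# ["git", "log", "--pretty=oneline", "--grep=Create"].
command = [sys.executable, "-c", "print('hello')"]

result = subprocess.run(command, capture_output=True, text=True)

if result.returncode == 0:        # explicit form of ShellPy's `if result:`
    first_line = result.stdout.splitlines()[0]
else:
    first_line = None
    print(result.stderr, file=sys.stderr)
```

The trade-off is verbosity: there is no backtick shorthand, but nothing has to be installed or preprocessed.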


Answer 15

If your textfile manipulation usually is one-time, possibly done on the shell-prompt, you will not get anything better from python.

On the other hand, if you usually have to do the same (or similar) task over and over, and you have to write your scripts for doing that, then python is great – and you can easily create your own libraries (you can do that with shell scripts too, but it’s more cumbersome).

A very simple example to give you a feeling for it:

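import subprocess

# popen2 was removed in Python 3; subprocess.Popen is its replacement
proc = subprocess.Popen("your-shell-command-here", shell=True,
                        stdout=subprocess.PIPE, text=True)
for line in proc.stdout:
    if line.startswith("#"):
        continue  # skip comment lines
    # take the second word of the first comma-separated field, e.g. "<1234>"
    job_id = int(line.split(",")[0].split()[1].lstrip("<").rstrip(">"))
    # do something with job_id
proc.wait()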

Also check the sys and getopt modules; they are the first ones you will need.
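A minimal sketch of what getopt usage can look like follows. The option names (-v/--verbose and -o/--output) and the function name parse_args are chosen only for illustration.

```python
import getopt
import sys

def parse_args(argv):
    """Parse -v / --verbose and -o FILE / --output=FILE."""
    opts, args = getopt.getopt(argv, "vo:", ["verbose", "output="])
    settings = {"verbose": False, "output": None}
    for flag, value in opts:
        if flag in ("-v", "--verbose"):
            settings["verbose"] = True
        elif flag in ("-o", "--output"):
            settings["output"] = value
    return settings, args

# In a real script this would be parse_args(sys.argv[1:]).
settings, inputs = parse_args(["-v", "--output=out.txt", "a.log", "b.log"])
```

The standard library also ships argparse, a higher-level alternative that generates help messages for you; getopt stays closer to what shell and C programmers already know.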


Answer 16

I have published a package on PyPI: ez.
Use pip install ez to install it.

It wraps common shell commands, and the library uses basically the same syntax as the shell. For example, cp(source, destination) can handle both files and folders (it wraps shutil.copy and shutil.copytree and decides which one to use). Even better, it supports vectorization like R!
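To show the idea behind such a wrapper, here is a small sketch written from the description above. It is not the ez implementation, and it assumes the destination is an existing directory; passing a list of sources mimics the vectorized behaviour.

```python
import os
import shutil
import tempfile

def cp(source, destination):
    """Copy a file or a directory tree into an existing directory.

    A list or tuple of sources is copied item by item.
    """
    if isinstance(source, (list, tuple)):
        return [cp(item, destination) for item in source]
    if os.path.isdir(source):
        # copytree needs a target that does not exist yet
        name = os.path.basename(os.path.normpath(source))
        return shutil.copytree(source, os.path.join(destination, name))
    return shutil.copy(source, destination)

# Demonstration with one file and one folder in a temporary directory.
work = tempfile.mkdtemp()
src_file = os.path.join(work, "a.txt")
src_dir = os.path.join(work, "pkg")
dest = os.path.join(work, "backup")
os.makedirs(src_dir)
os.makedirs(dest)
for path in (src_file, os.path.join(src_dir, "b.txt")):
    with open(path, "w") as fh:
        fh.write("data\n")

copied = cp([src_file, src_dir], dest)
```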

Another example: instead of os.walk, use fls(path, regex) to recursively find files and filter them with a regular expression; it returns a list of files with or without the full path.

Final example: you can combine them to write very simple scripts:
files = fls('.','py$'); cp(files, myDir)

Definitely check it out! It has cost me hundreds of hours to write/improve it!