Parse a .py file, read the AST, modify it, then write back the modified source code

Question: Parse a .py file, read the AST, modify it, then write back the modified source code

I want to programmatically edit Python source code. Basically, I want to read a .py file, generate the AST, and then write back the modified Python source code (i.e. another .py file).

There are ways to parse/compile Python source code using standard Python modules, such as ast or compiler. However, I don’t think any of them support modifying the source code (e.g. deleting this function declaration) and then writing back the modified Python source code.

UPDATE: The reason I want to do this is that I’d like to write a mutation testing library for Python, mostly by deleting statements/expressions, rerunning the tests, and seeing what breaks.


Answer 0

Pythoscope does this to the test cases it automatically generates, as does the 2to3 tool for Python 2.6 (it converts Python 2.x source into Python 3.x source).

Both of these tools use the lib2to3 library, which is an implementation of the Python parser/compiler machinery that can preserve comments in source when it’s round-tripped from source -> AST -> source.

The rope project may meet your needs if you want to do more refactoring-like transforms.

The ast module is your other option, and there’s an older example of how to “unparse” syntax trees back into code (using the parser module). But the ast module is more useful when doing an AST transform on code that is then transformed into a code object.
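On Python 3.9 and later the standard library can do the “unparse” step itself with ast.unparse(). A minimal round-trip sketch (round_trip and the sample source are illustrative names, not part of the answer), which also shows what gets lost: comments and the original layout are not kept, only an equivalent tree.

```python
import ast


def round_trip(source: str) -> str:
    """Parse source into an AST and regenerate source text (Python 3.9+)."""
    tree = ast.parse(source)
    return ast.unparse(tree)


original = '''
def add(a, b):   # a comment that the AST does not keep
    return (a +
            b)
'''

regenerated = round_trip(original)
print(regenerated)
```

The regenerated code parses to the same tree as the original, but the comment and the manual line break are gone.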

The redbaron project may also be a good fit (h/t Xavier Combelle).


Answer 1

The built-in ast module doesn’t seem to have a method to convert back to source. However, the codegen module provides a pretty printer for the ast that would enable you to do so, e.g.:

import ast
import codegen

expr="""
def foo():
   print("hello world")
"""
p=ast.parse(expr)

p.body[0].body = [ ast.parse("return 42").body[0] ] # Replace function body with "return 42"

print(codegen.to_source(p))

This will print:

def foo():
    return 42

Note that you may lose the exact formatting and comments, as these are not preserved.

However, you may not need to. If all you require is to execute the modified AST, you can simply call compile() on the AST and pass the resulting code object to exec().
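A minimal sketch of that route using only the standard library: it applies the same body replacement as the codegen example and then runs the modified tree without ever turning it back into text. The names source and namespace are illustrative.

```python
import ast

source = '''
def foo():
    print("hello world")
'''

tree = ast.parse(source)

# Replace the body of foo() with "return 42", as in the codegen example.
tree.body[0].body = [ast.parse("return 42").body[0]]

# Nodes created by hand may lack line/column info; fill it in before compiling.
ast.fix_missing_locations(tree)

# Compile the modified AST straight to a code object and run it in a fresh namespace.
code = compile(tree, filename="<ast>", mode="exec")
namespace = {}
exec(code, namespace)

print(namespace["foo"]())  # -> 42
```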


Answer 2

In a different answer I suggested using the astor package, but I have since found a more up-to-date AST un-parsing package called astunparse:

>>> import ast
>>> import astunparse
>>> print(astunparse.unparse(ast.parse('def foo(x): return 2 * x')))


def foo(x):
    return (2 * x)

I have tested this on Python 3.5.


Answer 3

You might not need to re-generate source code. That’s a bit dangerous for me to say, of course, since you have not actually explained why you think you need to generate a .py file full of code; but:

  • If you want to generate a .py file that people will actually use, maybe so that they can fill out a form and get a useful .py file to insert into their project, then you don’t want to change it into an AST and back, because you’ll lose all formatting (think of the blank lines that make Python so readable by grouping related sets of lines together) and all comments (ast nodes only have lineno and col_offset attributes to record positions). Instead, you’ll probably want to use a templating engine (the Django template language, for example, is designed to make templating even text files easy) to customize the .py file, or else use Rick Copeland’s MetaPython extension.

  • If you are trying to make a change during compilation of a module, note that you don’t have to go all the way back to text; you can just compile the AST directly instead of turning it back into a .py file.

  • But in almost any and every case, you are probably trying to do something dynamic that a language like Python actually makes very easy, without writing new .py files! If you expand your question to let us know what you actually want to accomplish, new .py files will probably not be involved in the answer at all; I have seen hundreds of Python projects doing hundreds of real-world things, and not a single one of them ever needed to write a .py file. So, I must admit, I’m a bit of a skeptic that you’ve found the first good use-case. :-)

Update: now that you’ve explained what you’re trying to do, I’d be tempted to just operate on the AST anyway. You will want to mutate by removing whole statements, not lines of a file (which could leave half-statements that simply die with a SyntaxError), and what better place to do that than in the AST?
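A minimal sketch of statement-deletion mutants with the standard library (Python 3.9+ for ast.unparse). DeleteStatement and mutants are hypothetical names for illustration; this version only deletes statements that sit directly in (non-async) function bodies and ignores nested blocks such as if/for bodies.

```python
import ast
import copy


class DeleteStatement(ast.NodeTransformer):
    """Drop the statement with index `target` among all function-body statements.

    Statements are counted across every (non-async) function body in visit
    order; nested function bodies are counted before the enclosing body.
    """

    def __init__(self, target):
        self.target = target
        self.counter = 0

    def visit_FunctionDef(self, node):
        self.generic_visit(node)  # visit nested functions first
        kept = []
        for stmt in node.body:
            if self.counter != self.target:
                kept.append(stmt)
            self.counter += 1
        # A function body may not be empty, so fall back to a bare "pass".
        node.body = kept or [ast.Pass()]
        return node


def mutants(source):
    """Yield (index, mutated_source) pairs, one per deletable statement."""
    tree = ast.parse(source)
    total = sum(
        len(n.body) for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)
    )
    for index in range(total):
        # Work on a fresh copy so each mutant removes exactly one statement.
        mutated = DeleteStatement(index).visit(copy.deepcopy(tree))
        ast.fix_missing_locations(mutated)
        yield index, ast.unparse(mutated)


source = '''
def clamp(x, lo, hi):
    if x < lo:
        x = lo
    if x > hi:
        x = hi
    return x
'''

for index, mutant in mutants(source):
    print(f"--- mutant {index} ---")
    print(mutant)
```

Each mutant is valid Python because whole statements are removed; a mutation tester would then write it out (or compile it) and rerun the test suite.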


Answer 4

ast模块的帮助下,解析和修改代码结构当然是可能的,我将在稍后的示例中进行演示。但是,ast单独使用模块无法写回修改后的源代码。还有其他可用于此工作的模块,例如此处的一个。

注意:以下示例可被视为有关ast模块用法的入门教程,但是有关使用ast模块的更全面指南,可从Green Tree snakes教程有关ast模块的官方文档中获得

简介ast

>>> import ast
>>> tree = ast.parse("print 'Hello Python!!'")
>>> exec(compile(tree, filename="<ast>", mode="exec"))
Hello Python!!

您可以通过简单地调用API来解析python代码(以字符串表示)ast.parse()。这将句柄返回到抽象语法树(AST)结构。有趣的是,您可以编译该结构并执行它,如上所示。

另一个非常有用的API是以ast.dump()字符串形式转储整个AST。它可用于检查树结构,并且在调试中非常有帮助。例如,

在Python 2.7上:

>>> import ast
>>> tree = ast.parse("print 'Hello Python!!'")
>>> ast.dump(tree)
"Module(body=[Print(dest=None, values=[Str(s='Hello Python!!')], nl=True)])"

在Python 3.5上:

>>> import ast
>>> tree = ast.parse("print ('Hello Python!!')")
>>> ast.dump(tree)
"Module(body=[Expr(value=Call(func=Name(id='print', ctx=Load()), args=[Str(s='Hello Python!!')], keywords=[]))])"

请注意,Python 2.7与Python 3.5中的print语句在语法上的差异以及相应树中AST节点类型的差异。


如何使用ast以下方式修改代码:

现在,让我们看一下按ast模块修改python代码的示例。修改AST结构的主要工具是ast.NodeTransformer类。每当需要修改AST时,他/她都需要从AST中继承子类并相应地编写Node Transformation。

对于我们的示例,让我们尝试编写一个简单的实用程序,将Python 2的print语句转换为Python 3函数调用。

打印语句到Fun呼叫转换器实用程序:print2to3.py:

#!/usr/bin/env python
'''
This utility converts the python (2.7) statements to Python 3 alike function calls before running the code.

USAGE:
     python print2to3.py <filename>
'''
import ast
import sys

class P2to3(ast.NodeTransformer):
    def visit_Print(self, node):
        new_node = ast.Expr(value=ast.Call(func=ast.Name(id='print', ctx=ast.Load()),
            args=node.values,
            keywords=[], starargs=None, kwargs=None))
        ast.copy_location(new_node, node)
        return new_node

def main(filename=None):
    if not filename:
        return

    with open(filename, 'r') as fp:
        data = fp.readlines()
    data = ''.join(data)
    tree = ast.parse(data)

    print "Converting python 2 print statements to Python 3 function calls"
    print "-" * 35
    P2to3().visit(tree)
    ast.fix_missing_locations(tree)
    # print ast.dump(tree)

    exec(compile(tree, filename="p23", mode="exec"))

if __name__ == '__main__':
    if len(sys.argv) <=1:
        print ("\nUSAGE:\n\t print2to3.py <filename>")
        sys.exit(1)
    else:
        main(sys.argv[1])

可以在较小的示例文件(例如下面的示例文件)上尝试使用该实用程序,并且应该可以正常工作。

测试输入文件:py2.py

class A(object):
    def __init__(self):
        pass

def good():
    print "I am good"

main = good

if __name__ == '__main__':
    print "I am in main"
    main()

请注意,以上转换仅用于ast教程目的,在实际情况下,您必须查看所有不同的情况,例如print " x is %s" % ("Hello Python")

Parsing and modifying the code structure is certainly possible with the help of the ast module, and I will show it in an example in a moment. However, writing back the modified source code is not possible with the ast module alone. There are other modules available for this job.

NOTE: The example below can be treated as an introductory tutorial on the usage of the ast module, but a more comprehensive guide is available in the Green Tree Snakes tutorial and the official documentation on the ast module.

Introduction to ast:

>>> import ast
>>> tree = ast.parse("print 'Hello Python!!'")
>>> exec(compile(tree, filename="<ast>", mode="exec"))
Hello Python!!

You can parse Python code (represented as a string) simply by calling ast.parse(). This returns the root of the Abstract Syntax Tree (AST). Interestingly, you can compile this structure back and execute it, as shown above. (That snippet is Python 2.7 code; on Python 3 the string would be print('Hello Python!!').)

Another very useful API is ast.dump(), which dumps the whole AST in string form. It can be used to inspect the tree structure and is very helpful in debugging. For example,

On Python 2.7:

>>> import ast
>>> tree = ast.parse("print 'Hello Python!!'")
>>> ast.dump(tree)
"Module(body=[Print(dest=None, values=[Str(s='Hello Python!!')], nl=True)])"

On Python 3.5:

>>> import ast
>>> tree = ast.parse("print ('Hello Python!!')")
>>> ast.dump(tree)
"Module(body=[Expr(value=Call(func=Name(id='print', ctx=Load()), args=[Str(s='Hello Python!!')], keywords=[]))])"

Notice the difference in syntax for print statement in Python 2.7 vs. Python 3.5 and the difference in type of AST node in respective trees.
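On newer versions the output differs again: since Python 3.8, string literals are parsed as Constant nodes rather than Str, and since Python 3.9 ast.dump() accepts an indent argument for a readable multi-line dump. A small sketch (the variable call is illustrative):

```python
import ast

tree = ast.parse("print('Hello Python!!')")

# indent= (Python 3.9+) pretty-prints the dump, one nested node per line.
print(ast.dump(tree, indent=2))

# The call node sits at tree.body[0].value (an Expr statement wrapping a Call).
call = tree.body[0].value
print(type(call).__name__, call.func.id, type(call.args[0]).__name__)
```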


How to modify code using ast:

Now, let’s have a look at an example of modifying Python code with the ast module. The main tool for modifying the AST structure is the ast.NodeTransformer class. Whenever you need to modify the AST, you subclass it and write the node transformation(s) accordingly.

For our example, let’s write a simple utility that transforms Python 2 print statements into Python 3 function calls. Because only the Python 2 parser understands print statements (and produces Print nodes), the utility itself has to be run with Python 2.7.

Print statement to function call converter utility, print2to3.py:

#!/usr/bin/env python
'''
This utility converts Python 2.7 print statements into Python 3-style function calls before running the code.

USAGE:
     python print2to3.py <filename>
'''
import ast
import sys

class P2to3(ast.NodeTransformer):
    def visit_Print(self, node):
        new_node = ast.Expr(value=ast.Call(func=ast.Name(id='print', ctx=ast.Load()),
            args=node.values,
            keywords=[], starargs=None, kwargs=None))
        ast.copy_location(new_node, node)
        return new_node

def main(filename=None):
    if not filename:
        return

    with open(filename, 'r') as fp:
        data = fp.readlines()
    data = ''.join(data)
    tree = ast.parse(data)

    print "Converting python 2 print statements to Python 3 function calls"
    print "-" * 35
    P2to3().visit(tree)
    ast.fix_missing_locations(tree)
    # print ast.dump(tree)

    exec(compile(tree, filename="p23", mode="exec"))

if __name__ == '__main__':
    if len(sys.argv) <=1:
        print ("\nUSAGE:\n\t print2to3.py <filename>")
        sys.exit(1)
    else:
        main(sys.argv[1])

This utility can be tried on a small example file, such as the one below, and it should work fine.

Test input file: py2.py

class A(object):
    def __init__(self):
        pass

def good():
    print "I am good"

main = good

if __name__ == '__main__':
    print "I am in main"
    main()

Please note that the above transformation is only for tutorial purposes; in a real-world scenario, one would have to handle all the different cases, such as print " x is %s" % ("Hello Python").
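The utility above only runs under Python 2.7. For comparison, a minimal Python 3.9+ sketch of a NodeTransformer that also writes the modified source back out with ast.unparse(). PrintToLog, transform_source, and the target function name log are hypothetical; the transform simply renames print(...) calls while keeping their arguments.

```python
import ast


class PrintToLog(ast.NodeTransformer):
    """Rewrite every print(...) call into log(...); 'log' is a hypothetical name."""

    def visit_Call(self, node):
        self.generic_visit(node)  # transform calls nested in the arguments first
        if isinstance(node.func, ast.Name) and node.func.id == "print":
            new_node = ast.Call(
                func=ast.Name(id="log", ctx=ast.Load()),
                args=node.args,
                keywords=node.keywords,
            )
            return ast.copy_location(new_node, node)
        return node


def transform_source(source: str) -> str:
    tree = ast.parse(source)
    tree = PrintToLog().visit(tree)
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)  # comments and original formatting are not preserved


py3_source = '''
def good():
    print("I am good")

if __name__ == "__main__":
    print(" x is %s" % ("Hello Python"))
    good()
'''

print(transform_source(py3_source))
```

Because the transformer reuses the original argument nodes, an expression such as " x is %s" % ("Hello Python") is carried over unchanged.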


Answer 5

I’ve recently created a quite stable (the core is really well tested) and extensible piece of code that generates code from an ast tree: https://github.com/paluh/code-formatter

I’m using my project as a base for a small vim plugin (which I’m using every day), so my goal is to generate really nice and readable python code.

P.S. I’ve tried to extend codegen, but its architecture is based on the ast.NodeVisitor interface, so formatters (visitor_ methods) are just functions. I’ve found this structure quite limiting and hard to optimize (in the case of long and nested expressions it’s easier to keep an object tree and cache some partial results; otherwise you can hit exponential complexity if you want to search for the best layout). BUT codegen, like every piece of mitsuhiko’s work (that I’ve read), is very well written and concise.


Answer 6

One of the other answers recommends codegen, which seems to have been superseded by astor. The version of astor on PyPI (version 0.5 as of this writing) seems to be a little outdated as well, so you can install the development version of astor as follows:

pip install git+https://github.com/berkerpeksag/astor.git#egg=astor

Then you can use astor.to_source to convert a Python AST to human-readable Python source code:

>>> import ast
>>> import astor
>>> print(astor.to_source(ast.parse('def foo(x): return 2 * x')))
def foo(x):
    return 2 * x

I have tested this on Python 3.5.


Answer 7

If you are looking at this in 2019, you can use the libcst package. Its API is similar to ast’s, it works like a charm, and it preserves the code structure. It’s especially helpful for projects where you have to preserve comments, whitespace, newlines, etc.

If you don’t care about preserving comments, whitespace and the like, then the combination of ast and astor works well.


Answer 8

We had a similar need, which wasn’t solved by other answers here. So we created a library for this, ASTTokens, which takes an AST tree produced with the ast or astroid modules, and marks it with the ranges of text in the original source code.

It doesn’t do modifications of code directly, but that’s not hard to add on top, since it does tell you the range of text you need to modify.

For example, this wraps a function call in WRAP(...), preserving comments and everything else:

example = """
def foo(): # Test
  '''My func'''
  log("hello world")  # Print
"""

import ast, asttokens
atok = asttokens.ASTTokens(example, parse=True)

call = next(n for n in ast.walk(atok.tree) if isinstance(n, ast.Call))
start, end = atok.get_text_range(call)
print(atok.text[:start] + ('WRAP(%s)' % atok.text[start:end])  + atok.text[end:])

Produces:

def foo(): # Test
  '''My func'''
  WRAP(log("hello world"))  # Print
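On Python 3.8 and later, ast nodes also carry end_lineno and end_col_offset, so a similar comment-preserving text edit can be sketched with only the standard library. This is not the ASTTokens API: wrap_first_call is a hypothetical helper, and it assumes ASCII source with ordinary newlines (the column offsets are UTF-8 byte offsets, not character offsets).

```python
import ast

example = """
def foo(): # Test
  '''My func'''
  log("hello world")  # Print
"""


def wrap_first_call(source: str, wrapper: str = "WRAP") -> str:
    """Wrap the first Call node found (breadth-first) in wrapper(...)."""
    tree = ast.parse(source)
    call = next(n for n in ast.walk(tree) if isinstance(n, ast.Call))

    # Absolute offset of the start of each line (ast line numbers are 1-based).
    line_starts = [0]
    for line in source.splitlines(keepends=True):
        line_starts.append(line_starts[-1] + len(line))

    start = line_starts[call.lineno - 1] + call.col_offset
    end = line_starts[call.end_lineno - 1] + call.end_col_offset

    # Only the call's text range is rewritten; comments and layout stay intact.
    return source[:start] + f"{wrapper}({source[start:end]})" + source[end:]


print(wrap_first_call(example))
```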

Hope this helps!


Answer 9

A Program Transformation System is a tool that parses source text, builds ASTs, and allows you to modify them using source-to-source transformations (“if you see this pattern, replace it by that pattern”). Such tools are ideal for mutating existing source code, since each mutation is just “if you see this pattern, replace it by a pattern variant”.

Of course, you need a program transformation engine that can parse the language of interest to you, and still do the pattern-directed transformations. Our DMS Software Reengineering Toolkit is a system that can do that, and handles Python, and a variety of other languages.

See this SO answer for an example of a DMS-parsed AST for Python capturing comments accurately. DMS can make changes to the AST and regenerate valid text, including the comments. You can ask it to prettyprint the AST using its own formatting conventions (you can change these), or do “fidelity printing”, which uses the original line and column information to maximally preserve the original layout (some change in layout where new code is inserted is unavoidable).

To implement a “mutation” rule for Python with DMS, you could write the following:

rule mutate_addition(s:sum, p:product):sum->sum =
  " \s + \p " -> " \s - \p"
 if mutate_this_place(s);

This rule replaces “+” with “-” in a syntactically correct way; it operates on the AST and thus won’t touch strings or comments that happen to look right. The extra condition, mutate_this_place, lets you control how often this occurs; you don’t want to mutate every place in the program.

You’d obviously want a bunch more rules like this that detect various code structures, and replace them by the mutated versions. DMS is happy to apply a set of rules. The mutated AST is then prettyprinted.
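For comparison, roughly the same mutation can be sketched with Python’s own ast module (3.9+ for ast.unparse). This is not DMS: MutateAddition is a hypothetical name, and a simple counter stands in for mutate_this_place so that exactly one “+” site is mutated per run. Unlike DMS fidelity printing, ast.unparse drops comments and the original layout.

```python
import ast


class MutateAddition(ast.NodeTransformer):
    """Replace the `target`-th binary '+' (in visit order) with '-'.

    It matches BinOp nodes, so a '+' inside a string literal is never touched.
    """

    def __init__(self, target):
        self.target = target
        self.count = 0

    def visit_BinOp(self, node):
        self.generic_visit(node)  # inner operands are counted before this node
        if isinstance(node.op, ast.Add):
            # Stand-in for mutate_this_place(): mutate only the chosen site.
            if self.count == self.target:
                node.op = ast.Sub()
            self.count += 1
        return node


source = '''
def total(a, b, c):
    label = "a + b"
    return a + b + c
'''

# "a + b + c" parses as (a + b) + c, so site 0 is the inner '+', site 1 the outer.
for site in range(2):
    tree = MutateAddition(site).visit(ast.parse(source))
    print(ast.unparse(tree))
```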


Answer 10

I used to use baron for this, but have now switched to parso because it’s up to date with modern Python. It works great.

I also needed this for a mutation tester. It’s really quite simple to make one with parso, check out my code at https://github.com/boxed/mutmut