Can you add new statements to Python’s syntax?

Question:


Can you add new statements (like print, raise, with) to Python’s syntax?

Say, to allow:

mystatement "Something"

Or,

new_if True:
    print "example"

Not so much whether you should, but rather whether it’s possible (short of modifying the Python interpreter’s code).


Answer 0

You may find this useful – Python internals: adding a new statement to Python, quoted here:


This article is an attempt to better understand how the front-end of Python works. Just reading documentation and source code may be a bit boring, so I’m taking a hands-on approach here: I’m going to add an until statement to Python.

All the coding for this article was done against the cutting-edge Py3k branch in the Python Mercurial repository mirror.

The until statement

Some languages, like Ruby, have an until statement, which is the complement to while (until num == 0 is equivalent to while num != 0). In Ruby, I can write:

num = 3
until num == 0 do
  puts num
  num -= 1
end

And it will print:

3
2
1

So, I want to add a similar capability to Python. That is, being able to write:

num = 3
until num == 0:
  print(num)
  num -= 1

A language-advocacy digression

This article doesn’t attempt to suggest the addition of an until statement to Python. Although I think such a statement would make some code clearer, and this article displays how easy it is to add, I completely respect Python’s philosophy of minimalism. All I’m trying to do here, really, is gain some insight into the inner workings of Python.

Modifying the grammar

Python uses a custom parser generator named pgen, which produces an LL(1) parser that converts Python source code into a parse tree. The input to the parser generator is the file Grammar/Grammar[1]. This is a simple text file that specifies the grammar of Python.

[1]: From here on, references to files in the Python source are given relative to the root of the source tree, which is the directory where you run configure and make to build Python.

Two modifications have to be made to the grammar file. The first is to add a definition for the until statement. I found where the while statement was defined (while_stmt), and added until_stmt below [2]:

compound_stmt: if_stmt | while_stmt | until_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | decorated
if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
while_stmt: 'while' test ':' suite ['else' ':' suite]
until_stmt: 'until' test ':' suite

[2]: This demonstrates a common technique I use when modifying source code I’m not familiar with: work by similarity. This principle won’t solve all your problems, but it can definitely ease the process. Since everything that has to be done for while also has to be done for until, it serves as a pretty good guideline.

Note that I’ve decided to exclude the else clause from my definition of until, just to make it a little bit different (and because frankly I dislike the else clause of loops and don’t think it fits well with the Zen of Python).

The second change is to modify the rule for compound_stmt to include until_stmt, as you can see in the snippet above. It’s right after while_stmt, again.

When you run make after modifying Grammar/Grammar, notice that the pgen program is run to re-generate Include/graminit.h and Python/graminit.c, and then several files get re-compiled.
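You can see why this grammar change is needed by asking an unmodified interpreter about the new statement. The sketch below is plain Python 3, and the helper names are illustrative. It checks that `until` is not a keyword, that the tokenizer reports it as an ordinary NAME token, and that the stock grammar rejects the loop with a SyntaxError. Recent CPython versions (3.9 and later) default to a PEG-based parser (PEP 617) instead of pgen, so the files involved differ there, but the observable behaviour is the same.

```python
import io
import keyword
import token
import tokenize

SRC = "until num == 0:\n    num -= 1\n"

def first_token(src):
    """Return (token type name, text) for the first token of src."""
    tok = next(tokenize.generate_tokens(io.StringIO(src).readline))
    return token.tok_name[tok.type], tok.string

def stock_grammar_accepts(src):
    """True if the unmodified interpreter can compile src."""
    try:
        compile(src, "<until-demo>", "exec")
    except SyntaxError:
        return False
    return True

print(keyword.iskeyword("until"))   # False: not a reserved word
print(first_token(SRC))             # ('NAME', 'until')
print(stock_grammar_accepts(SRC))   # False: the grammar has no until rule
```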

Modifying the AST generation code

After the Python parser has created a parse tree, this tree is converted into an AST, since ASTs are much simpler to work with in subsequent stages of the compilation process.

So, we’re going to visit Parser/Python.asdl which defines the structure of Python’s ASTs and add an AST node for our new until statement, again right below the while:

| While(expr test, stmt* body, stmt* orelse)
| Until(expr test, stmt* body)

If you now run make, notice that before compiling a bunch of files, Parser/asdl_c.py is run to generate C code from the AST definition file. This (like Grammar/Grammar) is another example of the Python source-code using a mini-language (in other words, a DSL) to simplify programming. Also note that since Parser/asdl_c.py is a Python script, this is a kind of bootstrapping – to build Python from scratch, Python already has to be available.
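The same Parser/Python.asdl description also backs the stdlib `ast` module, so you can inspect the generated node classes from Python itself. The sketch below runs on an unmodified interpreter, so only `While` exists there and the article’s `Until` does not. It shows that the ASDL fields map directly onto node attributes:

```python
import ast

# The ASDL line `While(expr test, stmt* body, stmt* orelse)` becomes a
# node class whose _fields follow the same names and order.
print(ast.While._fields)            # ('test', 'body', 'orelse')

tree = ast.parse("while num != 0:\n    num -= 1\n")
loop = tree.body[0]
print(type(loop).__name__)          # While
print(type(loop.test).__name__)     # Compare  (the `expr test` field)
print(len(loop.body), loop.orelse)  # 1 []     (`stmt* body`, empty `stmt* orelse`)
```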

While Parser/asdl_c.py generated the code to manage our newly defined AST node (into the files Include/Python-ast.h and Python/Python-ast.c), we still have to write the code that converts a relevant parse-tree node into it by hand. This is done in the file Python/ast.c. There, a function named ast_for_stmt converts parse tree nodes for statements into AST nodes. Again, guided by our old friend while, we jump right into the big switch for handling compound statements and add a clause for until_stmt:

case while_stmt:
    return ast_for_while_stmt(c, ch);
case until_stmt:
    return ast_for_until_stmt(c, ch);

Now we should implement ast_for_until_stmt. Here it is:

static stmt_ty
ast_for_until_stmt(struct compiling *c, const node *n)
{
    /* until_stmt: 'until' test ':' suite */
    REQ(n, until_stmt);

    if (NCH(n) == 4) {
        expr_ty expression;
        asdl_seq *suite_seq;

        expression = ast_for_expr(c, CHILD(n, 1));
        if (!expression)
            return NULL;
        suite_seq = ast_for_suite(c, CHILD(n, 3));
        if (!suite_seq)
            return NULL;
        return Until(expression, suite_seq, LINENO(n), n->n_col_offset, c->c_arena);
    }

    PyErr_Format(PyExc_SystemError,
                 "wrong number of tokens for 'until' statement: %d",
                 NCH(n));
    return NULL;
}

Again, this was coded while closely looking at the equivalent ast_for_while_stmt, with the difference that for until I’ve decided not to support the else clause. As expected, the AST is created recursively, using other AST creating functions like ast_for_expr for the condition expression and ast_for_suite for the body of the until statement. Finally, a new node named Until is returned.

Note that we access the parse-tree node n using some macros like NCH and CHILD. These are worth understanding – their code is in Include/node.h.

Digression: AST composition

I chose to create a new type of AST node for the until statement, but actually this isn’t necessary. I could’ve saved some work and implemented the new functionality using a composition of existing AST nodes, since:

until condition:
   # do stuff

Is functionally equivalent to:

while not condition:
  # do stuff

Instead of creating the Until node in ast_for_until_stmt, I could have created a While node whose test is the condition negated by a Not unary operation. Since the AST compiler already knows how to handle these nodes, the next steps of the process could be skipped.
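You can try the composition approach without rebuilding anything by assembling the AST by hand with the stdlib `ast` module. This sketch assumes `until` has already been parsed into a test expression and a body, and `make_until` is a hypothetical helper name. It wraps the condition in a `UnaryOp` with a `Not` operator and places it in an ordinary `While` node, which the stock compiler already handles:

```python
import ast

def make_until(test, body):
    """Desugar `until test: body` into `while not test: body`
    using only existing node types (hypothetical helper)."""
    return ast.While(
        test=ast.UnaryOp(op=ast.Not(), operand=test),
        body=body,
        orelse=[],  # this design of until has no else clause
    )

setup = ast.parse("num = 3\nout = []").body
cond = ast.parse("num == 0", mode="eval").body     # the until condition
body = ast.parse("out.append(num)\nnum -= 1").body

module = ast.Module(body=setup + [make_until(cond, body)], type_ignores=[])
ast.fix_missing_locations(module)  # hand-built nodes need line/column info

ns = {}
exec(compile(module, "<until-demo>", "exec"), ns)
print(ns["out"])  # [3, 2, 1]
```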

Compiling ASTs into bytecode

The next step is compiling the AST into Python bytecode. The compilation has an intermediate result which is a CFG (Control Flow Graph), but since the same code handles it I will ignore this detail for now and leave it for another article.

The code we will look at next is Python/compile.c. Following the lead of while, we find the function compiler_visit_stmt, which is responsible for compiling statements into bytecode. We add a clause for Until:

case While_kind:
    return compiler_while(c, s);
case Until_kind:
    return compiler_until(c, s);

If you wonder what Until_kind is, it’s a constant (actually a value of the _stmt_kind enumeration) automatically generated from the AST definition file into Include/Python-ast.h. Anyway, we call compiler_until which, of course, still doesn’t exist. I’ll get to it in a moment.

If you’re curious like me, you’ll notice that compiler_visit_stmt is peculiar. No amount of grep-ping the source tree reveals where it is called. When this is the case, only one option remains – C macro-fu. Indeed, a short investigation leads us to the VISIT macro defined in Python/compile.c:

#define VISIT(C, TYPE, V) {\
    if (!compiler_visit_ ## TYPE((C), (V))) \
        return 0; \
}

It’s used to invoke compiler_visit_stmt in compiler_body. Back to our business, however…
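The stdlib `ast.NodeVisitor` uses the same naming trick at run time. Its `visit()` method looks up a method called `'visit_' + node.__class__.__name__` with `getattr`, much as `VISIT(c, stmt, s)` pastes together `compiler_visit_stmt` at compile time. Here is a small illustration, where `LoopCounter` is just an example class:

```python
import ast

class LoopCounter(ast.NodeVisitor):
    """Counts while loops; visit() dispatches to visit_While by name."""

    def __init__(self):
        self.loops = 0

    def visit_While(self, node):
        self.loops += 1
        self.generic_visit(node)  # keep walking into nested statements

counter = LoopCounter()
counter.visit(ast.parse("while a:\n    while b:\n        pass\n"))
print(counter.loops)  # 2
```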

As promised, here’s compiler_until:

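static int
compiler_until(struct compiler *c, stmt_ty s)
{
    basicblock *loop, *end, *anchor = NULL;
    /* expr_constant() returns 1 for a constant true test, 0 for a
       constant false test and -1 when the value isn't known statically. */
    int constant = expr_constant(s->v.Until.test);

    /* "until <true>:" can never run its body, so emit nothing. */
    if (constant == 1) {
        return 1;
    }
    loop = compiler_new_block(c);
    end = compiler_new_block(c);
    if (constant == -1) {
        /* Target of the exit jump taken when the test becomes true. */
        anchor = compiler_new_block(c);
        if (anchor == NULL)
            return 0;
    }
    if (loop == NULL || end == NULL)
        return 0;

    /* Push a loop block whose exit target is `end`. */
    ADDOP_JREL(c, SETUP_LOOP, end);
    compiler_use_next_block(c, loop);
    if (!compiler_push_fblock(c, LOOP, loop))
        return 0;
    if (constant == -1) {
        /* Evaluate the test and leave the loop as soon as it is true. */
        VISIT(c, expr, s->v.Until.test);
        ADDOP_JABS(c, POP_JUMP_IF_TRUE, anchor);
    }
    /* For a constant false test no check is emitted: only break exits. */
    VISIT_SEQ(c, stmt, s->v.Until.body);
    ADDOP_JABS(c, JUMP_ABSOLUTE, loop);   /* back to the test */

    if (constant == -1) {
        compiler_use_next_block(c, anchor);
        ADDOP(c, POP_BLOCK);              /* normal exit: pop the loop block */
    }
    compiler_pop_fblock(c, LOOP, loop);
    compiler_use_next_block(c, end);

    return 1;
}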

I have a confession to make: this code wasn’t written based on a deep understanding of Python bytecode. Like the rest of the article, it was done in imitation of its close relative, the compiler_while function. By reading it carefully, however, keeping in mind that the Python VM is stack-based, and glancing into the documentation of the dis module, which has a list of Python bytecodes with descriptions, it’s possible to understand what’s going on.

That’s it, we’re done… Aren’t we?

After making all the changes and running make, we can run the newly compiled Python and try our new until statement:

>>> num = 3
>>> until num == 0:
...   print(num)
...   num -= 1
...
3
2
1

Voila, it works! Let’s see the bytecode created for the new statement by using the dis module as follows:

import dis

def myfoo(num):
    until num == 0:
        print(num)
        num -= 1

dis.dis(myfoo)

Here’s the result:

4           0 SETUP_LOOP              36 (to 39)
      >>    3 LOAD_FAST                0 (num)
            6 LOAD_CONST               1 (0)
            9 COMPARE_OP               2 (==)
           12 POP_JUMP_IF_TRUE        38

5          15 LOAD_NAME                0 (print)
           18 LOAD_FAST                0 (num)
           21 CALL_FUNCTION            1
           24 POP_TOP

6          25 LOAD_FAST                0 (num)
           28 LOAD_CONST               2 (1)
           31 INPLACE_SUBTRACT
           32 STORE_FAST               0 (num)
           35 JUMP_ABSOLUTE            3
      >>   38 POP_BLOCK
      >>   39 LOAD_CONST               0 (None)
           42 RETURN_VALUE

The most interesting operation is number 12: if the condition is true, we jump to offset 38, past the loop body. This is the correct semantics for until. If the jump isn’t taken, the loop body runs, and the JUMP_ABSOLUTE at operation 35 jumps back to the condition at offset 3.
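For comparison, an unmodified interpreter can compile the equivalent `while not` loop. CPython’s compiler typically folds the `not` into the conditional jump, so the listing has much the same shape as the `until` one above. Opcode names and offsets vary between versions, though: newer releases no longer have SETUP_LOOP or JUMP_ABSOLUTE, for example. A sketch:

```python
import dis

def countdown(num):
    out = []
    while not num == 0:   # stock spelling of `until num == 0:`
        out.append(num)
        num -= 1
    return out

# Collect the jump instructions; their exact names depend on the version.
jump_ops = [ins.opname for ins in dis.get_instructions(countdown)
            if "JUMP" in ins.opname]

print(countdown(3))  # [3, 2, 1]
print(jump_ops)
dis.dis(countdown)   # full listing, comparable to the one above
```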

Feeling good about my change, I then tried running the function (executing myfoo(3)) instead of showing its bytecode. The result was less than encouraging:

Traceback (most recent call last):
  File "zy.py", line 9, in <module>
    myfoo(3)
  File "zy.py", line 5, in myfoo
    print(num)
SystemError: no locals when loading 'print'

Whoa… this can’t be good. So what went wrong?

The case of the missing symbol table

One of the steps the Python compiler performs when compiling the AST is to create a symbol table for the code it compiles. The call to PySymtable_Build in PyAST_Compile calls into the symbol table module (Python/symtable.c), which walks the AST in a manner similar to the code generation functions. Having a symbol table for each scope helps the compiler figure out some key information, such as which variables are global and which are local to a scope.
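The standard `symtable` module exposes these per-scope tables from Python. That makes it easy to see what the missing visitor clause would have contributed. In the dis listing above, `print` is loaded with LOAD_NAME rather than LOAD_GLOBAL, which presumably follows from the function’s symbol table never having seen the names used in the until body. The sketch below runs on an unmodified interpreter, with `while not` standing in for `until`:

```python
import symtable

SRC = """
def myfoo(num):
    while not num == 0:      # stock spelling of `until num == 0:`
        print(num)
        num -= 1
"""

top = symtable.symtable(SRC, "<demo>", "exec")
func = top.get_children()[0]     # the scope created for myfoo
num = func.lookup("num")
prn = func.lookup("print")

print(func.get_name())                      # myfoo
print(num.is_parameter(), num.is_local())   # True True
print(prn.is_global(), prn.is_local())      # True False
```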

To fix the problem, we have to modify the symtable_visit_stmt function in Python/symtable.c, adding code for handling until statements, after the similar code for while statements [3]:

case While_kind:
    VISIT(st, expr, s->v.While.test);
    VISIT_SEQ(st, stmt, s->v.While.body);
    if (s->v.While.orelse)
        VISIT_SEQ(st, stmt, s->v.While.orelse);
    break;
case Until_kind:
    VISIT(st, expr, s->v.Until.test);
    VISIT_SEQ(st, stmt, s->v.Until.body);
    break;

[3]: By the way, without this code there’s a compiler warning for Python/symtable.c. The compiler notices that the Until_kind enumeration value isn’t handled in the switch statement of symtable_visit_stmt and complains. It’s always important to check for compiler warnings!

And now we really are done. Compiling the source after this change makes the execution of myfoo(3) work as expected.

Conclusion

In this article I’ve demonstrated how to add a new statement to Python. Albeit requiring quite a bit of tinkering in the code of the Python compiler, the change wasn’t difficult to implement, because I used a similar and existing statement as a guideline.

The Python compiler is a sophisticated chunk of software, and I don’t claim to be an expert in it. However, I am really interested in the internals of Python, and particularly its front-end. Therefore, I found this exercise a very useful companion to theoretical study of the compiler’s principles and source code. It will serve as a base for future articles that will get deeper into the compiler.

References

I used a few excellent references for the construction of this article. Here they are, in no particular order:

  • PEP 339: Design of the CPython compiler – probably the most important and comprehensive piece of official documentation for the Python compiler. Being very short, it painfully displays the scarcity of good documentation of the internals of Python.
  • “Python Compiler Internals” – an article by Thomas Lee
  • “Python: Design and Implementation” – a presentation by Guido van Rossum
  • Python (2.5) Virtual Machine, A guided tour – a presentation by Peter Tröger

original source


Answer 1

One way to do things like this is to preprocess the source and modify it, translating your added statement to Python. This approach brings various problems, and I wouldn’t recommend it for general usage, but for experimentation with the language, or specific-purpose metaprogramming, it can occasionally be useful.

For instance, let’s say we want to introduce a “myprint” statement that, instead of printing to the screen, logs to a specific file, i.e.:

myprint "This gets logged to file"

would be equivalent to

print >>open('/tmp/logfile.txt','a'), "This gets logged to file"

There are various options for how to do the replacement, from regex substitution to generating an AST to writing your own parser, depending on how closely your syntax matches existing Python. A good intermediate approach is to use the tokenize module. This should allow you to add new keywords, control structures, etc. while interpreting the source similarly to the Python interpreter, thus avoiding the breakage crude regex solutions would cause. For the above “myprint”, you could write the following transformation code:

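import tokenize

LOGFILE = '/tmp/log.txt'

def translate(readline):
    # Re-emit the token stream, expanding each `myprint` into the
    # Python 2 statement prefix `print >>open(LOGFILE, 'a'),`
    for toktype, name, _, _, _ in tokenize.generate_tokens(readline):
        if toktype == tokenize.NAME and name == 'myprint':
            yield tokenize.NAME, 'print'
            yield tokenize.OP, '>>'
            yield tokenize.NAME, "open"
            yield tokenize.OP, "("
            yield tokenize.STRING, repr(LOGFILE)
            yield tokenize.OP, ","
            yield tokenize.STRING, "'a'"
            yield tokenize.OP, ")"
            yield tokenize.OP, ","
        else:
            # Other tokens pass through unchanged; yielding 2-tuples makes
            # tokenize.untokenize() work in its "compat" mode
            yield toktype, name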

(This does make myprint effectively a keyword, so use as a variable elsewhere will likely cause problems)
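The translator above targets Python 2, since `print >>f, ...` is a syntax error in Python 3. A Python 3 sketch of the same tokenize-based idea has to emit a closing parenthesis. It therefore remembers that it is inside a `myprint` statement and closes the call at the NEWLINE token that ends the logical line. `translate_py3` and its `logfile` parameter are illustrative names. The sketch assumes one `myprint` per logical line and no `;`-separated statements:

```python
import io
import tokenize

def translate_py3(readline, logfile="/tmp/log.txt"):
    """Rewrite `myprint ARGS` into `print(ARGS, file=open(logfile, 'a'))`."""
    pending = False  # True while inside a myprint statement
    for tok in tokenize.generate_tokens(readline):
        if tok.type == tokenize.NAME and tok.string == "myprint":
            yield tokenize.NAME, "print"
            yield tokenize.OP, "("
            pending = True
        elif tok.type == tokenize.NEWLINE and pending:
            # End of the logical line: add the file= argument and close the call.
            yield from [(tokenize.OP, ","), (tokenize.NAME, "file"),
                        (tokenize.OP, "="), (tokenize.NAME, "open"),
                        (tokenize.OP, "("), (tokenize.STRING, repr(logfile)),
                        (tokenize.OP, ","), (tokenize.STRING, "'a'"),
                        (tokenize.OP, ")"), (tokenize.OP, ")")]
            yield tok.type, tok.string
            pending = False
        else:
            yield tok.type, tok.string  # 2-tuples: untokenize "compat" mode

src = "myprint 'hello', 42\n"
print(tokenize.untokenize(translate_py3(io.StringIO(src).readline)))
```

The output spacing is untokenize’s compat-mode spacing, which is valid Python even though it is not pretty.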

The problem then is how to use it so that your code is usable from Python. One way would be to write your own import function and use it to load code written in your custom language, i.e.:

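import tokenize, types

def myimport(filename):
    # Create an empty module object to receive the translated code
    mod = types.ModuleType(filename)
    with open(filename) as f:
        data = tokenize.untokenize(translate(f.readline))
    exec data in mod.__dict__   # Python 2 exec statement
    return mod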

This requires you to handle your customised code differently from normal Python modules, however, i.e. “some_mod = myimport("some_mod.py")” rather than “import some_mod”.
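On Python 3 the `new` module and the `exec ... in` statement are gone. A rough equivalent uses `types.ModuleType` and the `exec()` function. This sketch assumes a translator generator like `translate()` above is passed in, and the `load_translated` name and `translate` parameter are illustrative:

```python
import io
import tokenize
import types

def load_translated(name, source, translate):
    """Run translated source text in a fresh module object and return it."""
    mod = types.ModuleType(name)
    readline = io.StringIO(source).readline
    code = tokenize.untokenize(translate(readline))
    # Passing the original name to compile() makes tracebacks mention it,
    # although line numbers only match if the translation preserves them.
    exec(compile(code, name, "exec"), mod.__dict__)
    return mod

def myimport(filename, translate):
    with open(filename, encoding="utf-8") as f:
        return load_translated(filename, f.read(), translate)

# tokenize.generate_tokens itself works as a no-op "translator":
demo = load_translated("demo", "x = 1 + 2\n", tokenize.generate_tokens)
print(demo.x)  # 3
```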

Another fairly neat (albeit hacky) solution is to create a custom encoding (See PEP 263) as this recipe demonstrates. You could implement this as:

import codecs, cStringIO, encodings, tokenize
from encodings import utf_8
# translate() is the tokenize-based function defined above

class StreamReader(utf_8.StreamReader):
    def __init__(self, *args, **kwargs):
        codecs.StreamReader.__init__(self, *args, **kwargs)
        data = tokenize.untokenize(translate(self.stream.readline))
        self.stream = cStringIO.StringIO(data)

def search_function(s):
    if s!='mylang': return None
    utf8=encodings.search_function('utf8') # Assume utf8 encoding
    return codecs.CodecInfo(
        name='mylang',
        encode = utf8.encode,
        decode = utf8.decode,
        incrementalencoder=utf8.incrementalencoder,
        incrementaldecoder=utf8.incrementaldecoder,
        streamreader=StreamReader,
        streamwriter=utf8.streamwriter)

codecs.register(search_function)

Now, after this code gets run (e.g. you could place it in your .pythonrc or site.py), any code starting with the comment “# coding: mylang” will automatically be translated through the above preprocessing step, e.g.:

# coding: mylang
myprint "this gets logged to file"
for i in range(10):
    myprint "so does this : ", i, "times"
myprint ("works fine" "with arbitrary" + " syntax" 
  "and line continuations")

Caveats:

There are problems with the preprocessor approach, as you’ll probably know if you’ve worked with the C preprocessor. The main one is debugging. All Python sees is the preprocessed file, which means that text printed in the stack trace, etc. will refer to that. If you’ve performed significant translation, this may be very different from your source text. The example above doesn’t change line numbers etc., so it won’t be too different, but the more you change, the harder it will be to figure out.
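The custom-encoding trick still has a hook on Python 3. The name in a PEP 263 coding cookie is looked up in the codec registry, so translation can happen at decode time. The sketch below only demonstrates registration and the decode path. Its `mylang` rule is a deliberately crude, line-based rewrite of `until COND:` into `while not (COND):`, an illustrative stand-in for a tokenize-based translator. A version used for real source files would most likely also need an incremental decoder, which is not shown:

```python
import codecs
import io
import re
import tokenize

# Crude line-based rule (illustrative only): `until COND:` -> `while not (COND):`
_UNTIL = re.compile(r"^([ \t]*)until[ \t]+(.+?):[ \t]*$", re.MULTILINE)

def _decode(data, errors="strict"):
    text, consumed = codecs.utf_8_decode(data, errors, True)
    return _UNTIL.sub(r"\1while not (\2):", text), consumed

def _search(name):
    if name != "mylang":
        return None
    utf8 = codecs.lookup("utf-8")
    # Translation happens in decode; encoding is plain UTF-8.
    return codecs.CodecInfo(encode=utf8.encode, decode=_decode, name="mylang")

codecs.register(_search)

src = b"# coding: mylang\nuntil num == 0:\n    num -= 1\n"
enc, _ = tokenize.detect_encoding(io.BytesIO(src).readline)
print(enc)                         # mylang
print(codecs.decode(src, "mylang"))
```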


Answer 2

Yes, to some extent it is possible. There is a module out there that uses sys.settrace() to implement goto and comefrom “keywords”:

from goto import goto, label
for i in range(1, 10):
  for j in range(1, 20):
    print i, j
    if j == 3:
      goto .end # breaking out from nested loop
label .end
print "Finished"
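This kind of module relies on a tracing feature. A trace function installed with `sys.settrace()` may assign to `frame.f_lineno` during a `'line'` event, and the interpreter then continues at that line. The following is a minimal sketch of that underlying trick, not the goto module’s own code. `run_skipping` and `demo` are illustrative names:

```python
import sys

def demo():
    out = []
    out.append("a")
    out.append("skipped")   # the trace function jumps over this line
    out.append("b")
    return out

def run_skipping(func, offset):
    """Call func() while jumping over the line `offset` lines below its def."""
    skip = func.__code__.co_firstlineno + offset

    def local_trace(frame, event, arg):
        if event == "line" and frame.f_lineno == skip:
            frame.f_lineno = skip + 1   # assigning f_lineno performs the jump
        return local_trace

    def global_trace(frame, event, arg):
        # Only trace frames that are running func's code object.
        return local_trace if frame.f_code is func.__code__ else None

    previous = sys.gettrace()
    sys.settrace(global_trace)
    try:
        return func()
    finally:
        sys.settrace(previous)   # restore any tracer that was active before

print(run_skipping(demo, 3))  # ['a', 'b']
```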

Answer 3

Short of changing and recompiling the source code (which is possible with open source), changing the base language is not really possible.

Even if you do recompile the source, it wouldn’t be python, just your hacked-up changed version which you need to be very careful not to introduce bugs into.

However, I’m not sure why you’d want to. Python’s object-oriented features make it quite simple to achieve similar results with the language as it stands.
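For example, a small iterator class can give loop code an `until`-like spelling without touching the grammar. The `Until` class below is a hypothetical helper, not a standard API:

```python
class Until:
    """Iterate until predicate() returns true (a library-level `until`)."""

    def __init__(self, predicate):
        self.predicate = predicate

    def __iter__(self):
        return self

    def __next__(self):
        if self.predicate():   # condition met: stop the for loop
            raise StopIteration
        return None

def countdown(num):
    out = []
    # The lambda closes over `num`, so it sees each new value.
    for _ in Until(lambda: num == 0):
        out.append(num)
        num -= 1
    return out

print(countdown(3))  # [3, 2, 1]
```

The cost compared with a real statement is the `lambda` around the condition, which is needed so the test is re-evaluated on every iteration.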


Answer 4

General answer: you need to preprocess your source files.

More specific answer: install EasyExtend, and go through the following steps

i) Create a new langlet ( extension language )

import EasyExtend
EasyExtend.new_langlet("mystmts", prompt = "my> ", source_ext = "mypy")

Without additional specification a bunch of files shall be created under EasyExtend/langlets/mystmts/ .

ii) Open mystmts/parsedef/Grammar.ext and add the following lines

small_stmt: (expr_stmt | print_stmt  | del_stmt | pass_stmt | flow_stmt |
             import_stmt | global_stmt | exec_stmt | assert_stmt | my_stmt )

my_stmt: 'mystatement' expr

This is sufficient to define the syntax of your new statement. The small_stmt non-terminal is part of the Python grammar and it’s the place where the new statement is hooked in. The parser will now recognize the new statement i.e. a source file containing it will be parsed. The compiler will reject it though because it still has to be transformed into valid Python.

iii) Next, add the semantics of the statement. To do this, edit mystmts/langlet.py and add a my_stmt node visitor:

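def call_my_stmt(expression):
    "defines behaviour for my_stmt"
    print "my stmt called with", expression

class LangletTransformer(Transformer):
    # Called for every my_stmt node the parser produces; the node is
    # replaced by the plain call  call_my_stmt(<expr>)
    @transform
    def my_stmt(self, node):
        _expr = find_node(node, symbol.expr)
        return any_stmt(CST_CallFunc("call_my_stmt", [_expr]))

# names made available to code running in the langlet
__publish__ = ["call_my_stmt"]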

iv) cd into langlets/mystmts and run

python run_mystmts.py

This starts an interactive session in which the newly defined statement can be used:

__________________________________________________________________________________

 mystmts

 On Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)]
 __________________________________________________________________________________

 my> mystatement 40+2
 my stmt called with 42

Quite a few steps to arrive at a trivial statement, right? There isn’t an API yet that lets one define simple things without having to care about grammars. But EE is very reliable, modulo some bugs, so it’s just a matter of time before an API emerges that lets programmers define convenient things like infix operators or small statements using plain OO programming. For more complex things, like embedding whole languages in Python by building a langlet, there is no way around a full grammar approach.
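The general answer above (preprocess your source files) can also be illustrated without EasyExtend. The following is a minimal Python 3 sketch and not EasyExtend's API. A hypothetical, line-based rewrite rule turns `mystatement <expr>` into a call to an ordinary function before the code is compiled. The names `preprocess`, `run` and `call_my_stmt` are made up for illustration. A real tool would work on tokens or a grammar rather than a regular expression.

```python
import re

# Hypothetical rule for the new statement: a line of the form
#     mystatement <expr>
# is rewritten to the plain function call
#     call_my_stmt(<expr>)
# The rewrite is purely line-based. It does not handle multi-line
# expressions, and it would also rewrite such a line inside a
# triple-quoted string.
_MY_STMT = re.compile(r"^([ \t]*)mystatement[ \t]+(.+?)[ \t]*$", re.MULTILINE)


def call_my_stmt(expression):
    """Runtime behaviour of the new statement (illustrative only)."""
    print("my stmt called with", expression)


def preprocess(source):
    """Translate the extended syntax into ordinary Python source."""
    return _MY_STMT.sub(r"\1call_my_stmt(\2)", source)


def run(source, filename="<mystmts>"):
    """Preprocess, compile and execute source; return the resulting namespace."""
    namespace = {"call_my_stmt": call_my_stmt}
    exec(compile(preprocess(source), filename, "exec"), namespace)
    return namespace


if __name__ == "__main__":
    run("mystatement 40+2\n")  # prints: my stmt called with 42
```

Because the output of `preprocess` is plain Python, the interpreter itself stays unmodified. That is the same trade-off EasyExtend makes, just with a far cruder front end.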


Answer 5

Here’s a very simple but crappy way to add new statements, in interactive mode only. I’m using it for little one-letter commands for editing gene annotations, using only sys.displayhook, but just so I could answer this question I added sys.excepthook for the syntax errors as well. The latter is really ugly: it fetches the raw code from the readline buffer. The benefit is that it’s trivially easy to add new statements this way.


jcomeau@intrepid:~/$ cat demo.py; ./demo.py
#!/usr/bin/python -i
'load everything needed under "package", such as package.common.normalize()'
import os, sys, readline, traceback
if __name__ == '__main__':
    class t:
        @staticmethod
        def localfunction(*args):
            print 'this is a test'
            if args:
                print 'ignoring %s' % repr(args)

    def displayhook(whatever):
        if hasattr(whatever, 'localfunction'):
            return whatever.localfunction()
        else:
            # fall back to the default hook (prints repr, skips None, sets _)
            sys.__displayhook__(whatever)

    def excepthook(exctype, value, tb):
        if exctype is SyntaxError:
            # the offending line is only available from the readline history
            index = readline.get_current_history_length()
            item = readline.get_history_item(index) or ''
            command = item.split()
            print 'command:', command
            if command and len(command[0]) == 1:
                try:
                    eval(command[0]).localfunction(*command[1:])
                    return
                except Exception:
                    pass
        # anything not handled above is reported as usual
        traceback.print_exception(exctype, value, tb)

    sys.displayhook = displayhook
    sys.excepthook = excepthook
>>> t
this is a test
>>> t t
command: ['t', 't']
this is a test
ignoring ('t',)
>>> ^D
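For comparison, here is a minimal Python 3 sketch of just the displayhook half of this trick. The `Command` class and `install` function are hypothetical names. This half can only handle a bare name such as `t`. Input like `t t` is a SyntaxError and never reaches the display hook, which is why the answer above also needs the excepthook/readline workaround.

```python
import sys


class Command:
    """An object that runs an action when typed as a bare name at the prompt."""

    def __init__(self, action):
        self.action = action

    def __call__(self, *args):
        return self.action(*args)


def displayhook(value):
    """Run Command objects; defer everything else to the default hook."""
    if isinstance(value, Command):
        value()
    else:
        # sys.__displayhook__ prints repr(value), skips None and sets builtins._
        sys.__displayhook__(value)


def install():
    """Replace the display hook; it only affects expression echo at the prompt."""
    sys.displayhook = displayhook


# Hypothetical one-letter command, mirroring `t` in the answer above.
t = Command(lambda: print("this is a test"))

if __name__ == "__main__":
    install()
```

Running the file with `python -i` installs the hook before the prompt appears, so typing `t` at `>>>` prints `this is a test`.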


Answer 6

I’ve found a guide on adding new statements:

https://troeger.eu/files/teaching/pythonvm08lab.pdf

Basically, to add new statements, you must edit Python/ast.c (among other things) and recompile the python binary.

While it’s possible, don’t. You can achieve almost everything via functions and classes (which won’t require people to recompile Python just to run your script).


Answer 7

It’s possible to do this using EasyExtend:

EasyExtend (EE) is a preprocessor generator and metaprogramming framework written in pure Python and integrated with CPython. The main purpose of EasyExtend is the creation of extension languages i.e. adding custom syntax and semantics to Python.


Answer 8

It’s not exactly adding new statements to the language syntax, but macros are a powerful tool: https://github.com/lihaoyi/macropy
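MacroPy's own API is not reproduced here. The idea it builds on, rewriting the AST before it is compiled, can be sketched with the standard `ast` module alone. The parser still only accepts valid Python, so the sketch borrows existing syntax: a hypothetical `with until(<cond>):` block is rewritten into `while not <cond>:`. The names `UntilTransformer` and `run_with_until` are made up for this example.

```python
import ast


class UntilTransformer(ast.NodeTransformer):
    """Rewrite `with until(<cond>): <body>` into `while not <cond>: <body>`."""

    def visit_With(self, node):
        self.generic_visit(node)  # rewrite nested blocks first
        if len(node.items) == 1 and node.items[0].optional_vars is None:
            call = node.items[0].context_expr
            if (isinstance(call, ast.Call)
                    and isinstance(call.func, ast.Name)
                    and call.func.id == "until"
                    and len(call.args) == 1
                    and not call.keywords):
                loop = ast.While(
                    test=ast.UnaryOp(op=ast.Not(), operand=call.args[0]),
                    body=node.body,
                    orelse=[],
                )
                return ast.copy_location(loop, node)
        return node  # any other with-statement is left untouched


def run_with_until(source):
    """Parse, transform, compile and execute source; return its namespace."""
    tree = UntilTransformer().visit(ast.parse(source))
    ast.fix_missing_locations(tree)  # give the new UnaryOp node a position
    namespace = {}
    exec(compile(tree, "<until>", "exec"), namespace)
    return namespace


if __name__ == "__main__":
    run_with_until(
        "num = 3\n"
        "with until(num == 0):\n"
        "    print(num)\n"
        "    num -= 1\n"
    )  # prints 3, 2 and 1 on separate lines
```

Without the transformer, `until` is simply an undefined name. The new "statement" only exists because the tree is rewritten between parsing and compiling.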


Answer 9

Not without modifying the interpreter. I know a lot of languages in the past several years have been described as “extensible”, but not in the way you’re describing. You extend Python by adding functions and classes.


Answer 10

There is a language based on Python called Logix with which you CAN do such things. It hasn’t been actively developed for a while, but the features you asked for do work with the latest version.


Answer 11

Some things can be done with decorators. Let’s assume, for example, that Python had no with statement. We could then implement similar behaviour like this:

# ====== Implementation of "mywith" decorator ======

def mywith(stream):
    def decorator(function):
        # Run the decorated function immediately, always closing the stream.
        try:
            function(stream)
        finally:
            stream.close()
        # Nothing is returned, so the decorated name ends up bound to None.
    return decorator

# ====== Using the decorator ======

@mywith(open("test.py", "r"))
def _(infile):
    for line in infile:
        print(">>", line.rstrip())

As done here, however, it is a pretty unclean solution. In particular, the behaviour where the decorator calls the function and sets _ to None is unexpected. For clarification, this decorator is equivalent to writing

def _(infile): ...
_ = mywith(open(...))(_) # mywith returns None.

and decorators are normally expected to modify, not to execute, functions.

I used such a method before in a script where I had to temporarily set the working directory for several functions.
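For the working-directory case, the conventional tool is a context manager rather than a decorator that executes the function. The sketch below assumes Python 3.2 or later, where objects created by `contextlib.contextmanager` can also be used as decorators that wrap a function instead of running it. The name `working_directory` is hypothetical. Python 3.11 ships a similar `contextlib.chdir`.

```python
import contextlib
import os


@contextlib.contextmanager
def working_directory(path):
    """Temporarily chdir into path; always restore the previous directory.

    The working directory is process-wide state, so this is not thread-safe.
    """
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield path
    finally:
        os.chdir(previous)


# As a with-statement around a block of code:
#     with working_directory("/tmp"):
#         ...
#
# As a decorator: the function is wrapped, not executed, and every call
# runs inside the directory.
#     @working_directory("/tmp")
#     def list_files():
#         return os.listdir(".")
```

Unlike `mywith`, the decorated name stays bound to a callable function, which is what readers expect from a decorator.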


Answer 12

Ten years ago you couldn’t, and I doubt that’s changed. However, it wasn’t that hard to modify the syntax back then if you were prepared to recompile python, and I doubt that’s changed, either.