小马(ORM)如何发挥作用?

问题:小马(ORM)如何发挥作用?

Pony ORM很好地把生成器表达式转换成SQL。例:

>>> select(p for p in Person if p.name.startswith('Paul'))
        .order_by(Person.name)[:2]

SELECT "p"."id", "p"."name", "p"."age"
FROM "Person" "p"
WHERE "p"."name" LIKE "Paul%"
ORDER BY "p"."name"
LIMIT 2

[Person[3], Person[1]]
>>>

我知道Python具有出色的自省和内置元编程功能,但是该库如何能够在不进行预处理的情况下转换生成器表达式?看起来像魔术。

[更新]

搅拌器写道:

这是您要查找的文件。似乎可以使用一些自省向导来重构生成器。我不确定它是否支持100%的Python语法,但这很酷。- 搅拌机

我以为他们正在研究生成器表达协议中的某些功能,但正在查看此文件并看到其中ast涉及的模块…不,他们不是在动态检查程序源,是吗?令人振奋…

@BrenBarn:如果我尝试在select函数调用之外调用生成器,则结果为:

>>> x = (p for p in Person if p.age > 20)
>>> x.next()
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "<interactive input>", line 1, in <genexpr>
  File "C:\Python27\lib\site-packages\pony\orm\core.py", line 1822, in next
    % self.entity.__name__)
  File "C:\Python27\lib\site-packages\pony\utils.py", line 92, in throw
    raise exc
TypeError: Use select(...) function or Person.select(...) method for iteration
>>>

好像他们在做更多不可思议的事情,例如检查select函数调用和动态处理Python抽象语法语法树。

我仍然希望看到有人对此进行解释,其来源远远超出了我的巫术水平。

Pony ORM does the nice trick of converting a generator expression into SQL. Example:

>>> select(p for p in Person if p.name.startswith('Paul'))
        .order_by(Person.name)[:2]

SELECT "p"."id", "p"."name", "p"."age"
FROM "Person" "p"
WHERE "p"."name" LIKE "Paul%"
ORDER BY "p"."name"
LIMIT 2

[Person[3], Person[1]]
>>>

I know Python has wonderful introspection and metaprogramming builtin, but how this library is able to translate the generator expression without preprocessing? It looks like magic.

[update]

Blender wrote:

Here is the file that you’re after. It seems to reconstruct the generator using some introspection wizardry. I’m not sure if it supports 100% of Python’s syntax, but this is pretty cool. – Blender

I was thinking they were exploring some feature from the generator expression protocol, but looking this file, and seeing the ast module involved… No, they are not inspecting the program source on the fly, are they? Mind-blowing…

@BrenBarn: If I try to call the generator outside the select function call, the result is:

>>> x = (p for p in Person if p.age > 20)
>>> x.next()
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "<interactive input>", line 1, in <genexpr>
  File "C:\Python27\lib\site-packages\pony\orm\core.py", line 1822, in next
    % self.entity.__name__)
  File "C:\Python27\lib\site-packages\pony\utils.py", line 92, in throw
    raise exc
TypeError: Use select(...) function or Person.select(...) method for iteration
>>>

Seems like they are doing more arcane incantations like inspecting the select function call and processing the Python abstract syntax grammar tree on the fly.

I still would like to see someone explaining it, the source is way beyond my wizardry level.


回答 0

小马ORM作者在这里。

Pony通过三个步骤将Python生成器转换为SQL查询:

  1. 反编译生成器字节码并重建生成器AST(抽象语法树)
  2. 将Python AST转换为“抽象SQL”-SQL查询的基于列表的通用表示形式
  3. 将抽象SQL表示转换为特定于数据库的SQL方言

最复杂的部分是第二步,其中Pony必须了解Python表达式的“含义”。似乎您对第一步最感兴趣,所以让我解释一下反编译的工作原理。

让我们考虑以下查询:

>>> from pony.orm.examples.estore import *
>>> select(c for c in Customer if c.country == 'USA').show()

将其转换为以下SQL:

SELECT "c"."id", "c"."email", "c"."password", "c"."name", "c"."country", "c"."address"
FROM "Customer" "c"
WHERE "c"."country" = 'USA'

下面是该查询的结果,将其打印出来:

id|email              |password|name          |country|address  
--+-------------------+--------+--------------+-------+---------
1 |john@example.com   |***     |John Smith    |USA    |address 1
2 |matthew@example.com|***     |Matthew Reed  |USA    |address 2
4 |rebecca@example.com|***     |Rebecca Lawson|USA    |address 4

select()函数接受python生成器作为参数,然后分析其字节码。我们可以使用标准的python dis模块获取此生成器的字节码指令:

>>> gen = (c for c in Customer if c.country == 'USA')
>>> import dis
>>> dis.dis(gen.gi_frame.f_code)
  1           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                26 (to 32)
              6 STORE_FAST               1 (c)
              9 LOAD_FAST                1 (c)
             12 LOAD_ATTR                0 (country)
             15 LOAD_CONST               0 ('USA')
             18 COMPARE_OP               2 (==)
             21 POP_JUMP_IF_FALSE        3
             24 LOAD_FAST                1 (c)
             27 YIELD_VALUE         
             28 POP_TOP             
             29 JUMP_ABSOLUTE            3
        >>   32 LOAD_CONST               1 (None)
             35 RETURN_VALUE

Pony ORM decompile()在模块内pony.orm.decompiling具有可以从字节码恢复AST 的功能:

>>> from pony.orm.decompiling import decompile
>>> ast, external_names = decompile(gen)

在这里,我们可以看到AST节点的文本表示形式:

>>> ast
GenExpr(GenExprInner(Name('c'), [GenExprFor(AssName('c', 'OP_ASSIGN'), Name('.0'),
[GenExprIf(Compare(Getattr(Name('c'), 'country'), [('==', Const('USA'))]))])]))

现在让我们看看该decompile()函数是如何工作的。

decompile()函数创建一个Decompiler对象,该对象实现了Visitor模式。反编译器实例一一获取字节码指令。对于每条指令,反编译器对象都会调用其自己的方法。该方法的名称等于当前字节码指令的名称。

Python计算表达式时,它使用堆栈,该堆栈存储中间的计算结果。反编译器对象也有自己的堆栈,但是该堆栈不存储表达式计算的结果,而是存储表达式的AST节点。

当调用下一个字节码指令的反编译器方法时,它将从堆栈中取出AST节点,将它们组合成一个新的AST节点,然后将该节点放在堆栈的顶部。

例如,让我们看看如何c.country == 'USA'计算子表达式。相应的字节码片段为:

              9 LOAD_FAST                1 (c)
             12 LOAD_ATTR                0 (country)
             15 LOAD_CONST               0 ('USA')
             18 COMPARE_OP               2 (==)

因此,反编译器对象执行以下操作:

  1. 来电decompiler.LOAD_FAST('c')。此方法将Name('c')节点放在反编译器堆栈的顶部。
  2. 来电decompiler.LOAD_ATTR('country')。此方法Name('c')从堆栈中取出节点,创建该Geattr(Name('c'), 'country')节点并将其放在堆栈顶部。
  3. 来电decompiler.LOAD_CONST('USA')。此方法将Const('USA')节点放在堆栈顶部。
  4. 来电decompiler.COMPARE_OP('==')。此方法从堆栈中获取两个节点(Getattr和Const),然后将其Compare(Getattr(Name('c'), 'country'), [('==', Const('USA'))]) 放在堆栈的顶部。

在处理完所有字节码指令之后,反编译器堆栈将包含一个与整个生成器表达式相对应的AST节点。

由于Pony ORM仅需要反编译生成器和lambda,因此并没有那么复杂,因为生成器的指令流相对简单-它只是一堆嵌套循环。

目前,Pony ORM涵盖了整个生成器指令集,但以下两点除外:

  1. 内联if表达式: a if b else c
  2. 复合比较: a < b < c

如果Pony遇到此类表达,则会引发NotImplementedError异常。但是即使在这种情况下,您也可以通过将生成器表达式作为字符串传递来使其工作。当您将生成器作为字符串传递时,Pony不使用反编译器模块。相反,它使用标准Python compiler.parse函数获取AST 。

希望这能回答您的问题。

Pony ORM author is here.

Pony translates Python generator into SQL query in three steps:

  1. Decompiling of generator bytecode and rebuilding generator AST (abstract syntax tree)
  2. Translation of Python AST into “abstract SQL” — universal list-based representation of a SQL query
  3. Converting abstract SQL representation into specific database-dependent SQL dialect

The most complex part is the second step, where Pony must understand the “meaning” of Python expressions. Seems you are most interested in the first step, so let me explain how decompiling works.

Let’s consider this query:

>>> from pony.orm.examples.estore import *
>>> select(c for c in Customer if c.country == 'USA').show()

Which will be translated into the following SQL:

SELECT "c"."id", "c"."email", "c"."password", "c"."name", "c"."country", "c"."address"
FROM "Customer" "c"
WHERE "c"."country" = 'USA'

And below is the result of this query which will be printed out:

id|email              |password|name          |country|address  
--+-------------------+--------+--------------+-------+---------
1 |john@example.com   |***     |John Smith    |USA    |address 1
2 |matthew@example.com|***     |Matthew Reed  |USA    |address 2
4 |rebecca@example.com|***     |Rebecca Lawson|USA    |address 4

The select() function accepts a python generator as argument, and then analyzes its bytecode. We can get bytecode instructions of this generator using standard python dis module:

>>> gen = (c for c in Customer if c.country == 'USA')
>>> import dis
>>> dis.dis(gen.gi_frame.f_code)
  1           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                26 (to 32)
              6 STORE_FAST               1 (c)
              9 LOAD_FAST                1 (c)
             12 LOAD_ATTR                0 (country)
             15 LOAD_CONST               0 ('USA')
             18 COMPARE_OP               2 (==)
             21 POP_JUMP_IF_FALSE        3
             24 LOAD_FAST                1 (c)
             27 YIELD_VALUE         
             28 POP_TOP             
             29 JUMP_ABSOLUTE            3
        >>   32 LOAD_CONST               1 (None)
             35 RETURN_VALUE

Pony ORM has the function decompile() within module pony.orm.decompiling which can restore an AST from the bytecode:

>>> from pony.orm.decompiling import decompile
>>> ast, external_names = decompile(gen)

Here, we can see the textual representation of the AST nodes:

>>> ast
GenExpr(GenExprInner(Name('c'), [GenExprFor(AssName('c', 'OP_ASSIGN'), Name('.0'),
[GenExprIf(Compare(Getattr(Name('c'), 'country'), [('==', Const('USA'))]))])]))

Let’s now see how the decompile() function works.

The decompile() function creates a Decompiler object, which implements the Visitor pattern. The decompiler instance gets bytecode instructions one-by-one. For each instruction the decompiler object calls its own method. The name of this method is equal to the name of current bytecode instruction.

When Python calculates an expression, it uses stack, which stores an intermediate result of calculation. The decompiler object also has its own stack, but this stack stores not the result of expression calculation, but AST node for the expression.

When decompiler method for the next bytecode instruction is called, it takes AST nodes from the stack, combines them into a new AST node, and then puts this node on the top of the stack.

For example, let’s see how the subexpression c.country == 'USA' is calculated. The corresponding bytecode fragment is:

              9 LOAD_FAST                1 (c)
             12 LOAD_ATTR                0 (country)
             15 LOAD_CONST               0 ('USA')
             18 COMPARE_OP               2 (==)

So, the decompiler object does the following:

  1. Calls decompiler.LOAD_FAST('c'). This method puts the Name('c') node on the top of the decompiler stack.
  2. Calls decompiler.LOAD_ATTR('country'). This method takes the Name('c') node from the stack, creates the Geattr(Name('c'), 'country') node and puts it on the top of the stack.
  3. Calls decompiler.LOAD_CONST('USA'). This method puts the Const('USA') node on top of the stack.
  4. Calls decompiler.COMPARE_OP('=='). This method takes two nodes (Getattr and Const) from the stack, and then puts Compare(Getattr(Name('c'), 'country'), [('==', Const('USA'))]) on the top of the stack.

After all bytecode instructions are processed, the decompiler stack contains a single AST node which corresponds to the whole generator expression.

Since Pony ORM needs to decompile generators and lambdas only, this is not that complex, because the instruction flow for a generator is relatively straightforward – it is just a bunch of nested loops.

Currently Pony ORM covers the whole generator instructions set except two things:

  1. Inline if expressions: a if b else c
  2. Compound comparisons: a < b < c

If Pony encounters such expression it raises the NotImplementedError exception. But even in this case you can make it work by passing the generator expression as a string. When you pass a generator as a string Pony doesn’t use the decompiler module. Instead it gets the AST using the standard Python compiler.parse function.

Hope this answers your question.