Python 实用宝典

Question 1

Pony ORM does the nice trick of converting a generator expression into SQL. Example:

>>> select(p for p in Person if p.name.startswith('Paul'))
        .order_by(Person.name)[:2]

SELECT "p"."id", "p"."name", "p"."age"
FROM "Person" "p"
WHERE "p"."name" LIKE "Paul%"
ORDER BY "p"."name"
LIMIT 2

[Person[3], Person[1]]
>>>

I know Python has wonderful introspection and metaprogramming builtin, but how this library is able to translate the generator expression without preprocessing? It looks like magic.

[update]

Blender wrote:

Here is the file that you’re after. It seems to reconstruct the generator using some introspection wizardry. I’m not sure if it supports 100% of Python’s syntax, but this is pretty cool. – Blender

I was thinking they were exploring some feature from the generator expression protocol, but looking this file, and seeing the ast module involved… No, they are not inspecting the program source on the fly, are they? Mind-blowing…

@BrenBarn: If I try to call the generator outside the select function call, the result is:

>>> x = (p for p in Person if p.age > 20)
>>> x.next()
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
  File "<interactive input>", line 1, in <genexpr>
  File "C:\Python27\lib\site-packages\pony\orm\core.py", line 1822, in next
    % self.entity.__name__)
  File "C:\Python27\lib\site-packages\pony\utils.py", line 92, in throw
    raise exc
TypeError: Use select(...) function or Person.select(...) method for iteration
>>>

Seems like they are doing more arcane incantations like inspecting the select function call and processing the Python abstract syntax grammar tree on the fly.

I still would like to see someone explaining it, the source is way beyond my wizardry level.

Question 2

Pony ORM author is here.

Pony translates Python generator into SQL query in three steps:

Decompiling of generator bytecode and rebuilding generator AST (abstract syntax tree)
Translation of Python AST into “abstract SQL” — universal list-based representation of a SQL query
Converting abstract SQL representation into specific database-dependent SQL dialect

The most complex part is the second step, where Pony must understand the “meaning” of Python expressions. Seems you are most interested in the first step, so let me explain how decompiling works.

Let’s consider this query:

>>> from pony.orm.examples.estore import *
>>> select(c for c in Customer if c.country == 'USA').show()

Which will be translated into the following SQL:

SELECT "c"."id", "c"."email", "c"."password", "c"."name", "c"."country", "c"."address"
FROM "Customer" "c"
WHERE "c"."country" = 'USA'

And below is the result of this query which will be printed out:

id|email              |password|name          |country|address  
--+-------------------+--------+--------------+-------+---------
1 |john@example.com   |***     |John Smith    |USA    |address 1
2 |matthew@example.com|***     |Matthew Reed  |USA    |address 2
4 |rebecca@example.com|***     |Rebecca Lawson|USA    |address 4

The select() function accepts a python generator as argument, and then analyzes its bytecode. We can get bytecode instructions of this generator using standard python dis module:

>>> gen = (c for c in Customer if c.country == 'USA')
>>> import dis
>>> dis.dis(gen.gi_frame.f_code)
  1           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                26 (to 32)
              6 STORE_FAST               1 (c)
              9 LOAD_FAST                1 (c)
             12 LOAD_ATTR                0 (country)
             15 LOAD_CONST               0 ('USA')
             18 COMPARE_OP               2 (==)
             21 POP_JUMP_IF_FALSE        3
             24 LOAD_FAST                1 (c)
             27 YIELD_VALUE         
             28 POP_TOP             
             29 JUMP_ABSOLUTE            3
        >>   32 LOAD_CONST               1 (None)
             35 RETURN_VALUE

Pony ORM has the function decompile() within module pony.orm.decompiling which can restore an AST from the bytecode:

>>> from pony.orm.decompiling import decompile
>>> ast, external_names = decompile(gen)

Here, we can see the textual representation of the AST nodes:

>>> ast
GenExpr(GenExprInner(Name('c'), [GenExprFor(AssName('c', 'OP_ASSIGN'), Name('.0'),
[GenExprIf(Compare(Getattr(Name('c'), 'country'), [('==', Const('USA'))]))])]))

Let’s now see how the decompile() function works.

The decompile() function creates a Decompiler object, which implements the Visitor pattern. The decompiler instance gets bytecode instructions one-by-one. For each instruction the decompiler object calls its own method. The name of this method is equal to the name of current bytecode instruction.

When Python calculates an expression, it uses stack, which stores an intermediate result of calculation. The decompiler object also has its own stack, but this stack stores not the result of expression calculation, but AST node for the expression.

When decompiler method for the next bytecode instruction is called, it takes AST nodes from the stack, combines them into a new AST node, and then puts this node on the top of the stack.

For example, let’s see how the subexpression c.country == 'USA' is calculated. The corresponding bytecode fragment is:

              9 LOAD_FAST                1 (c)
             12 LOAD_ATTR                0 (country)
             15 LOAD_CONST               0 ('USA')
             18 COMPARE_OP               2 (==)

So, the decompiler object does the following:

Calls decompiler.LOAD_FAST('c'). This method puts the Name('c') node on the top of the decompiler stack.
Calls decompiler.LOAD_ATTR('country'). This method takes the Name('c') node from the stack, creates the Geattr(Name('c'), 'country') node and puts it on the top of the stack.
Calls decompiler.LOAD_CONST('USA'). This method puts the Const('USA') node on top of the stack.
Calls decompiler.COMPARE_OP('=='). This method takes two nodes (Getattr and Const) from the stack, and then puts Compare(Getattr(Name('c'), 'country'), [('==', Const('USA'))]) on the top of the stack.

After all bytecode instructions are processed, the decompiler stack contains a single AST node which corresponds to the whole generator expression.

Since Pony ORM needs to decompile generators and lambdas only, this is not that complex, because the instruction flow for a generator is relatively straightforward – it is just a bunch of nested loops.

Currently Pony ORM covers the whole generator instructions set except two things:

Inline if expressions: a if b else c
Compound comparisons: a < b < c

If Pony encounters such expression it raises the NotImplementedError exception. But even in this case you can make it work by passing the generator expression as a string. When you pass a generator as a string Pony doesn’t use the decompiler module. Instead it gets the AST using the standard Python compiler.parse function.

Hope this answers your question.

Question 3

From something like this:

print(get_indentation_level())

    print(get_indentation_level())

        print(get_indentation_level())

I would like to get something like this:

1
2
3

Can the code read itself in this way?

All I want is the output from the more nested parts of the code to be more nested. In the same way that this makes code easier to read, it would make the output easier to read.

Of course I could implement this manually, using e.g. .format(), but what I had in mind was a custom print function which would print(i*' ' + string) where i is the indentation level. This would be a quick way to make readable output on my terminal.

Is there a better way to do this which avoids painstaking manual formatting?

Question 4

If you want indentation in terms of nesting level rather than spaces and tabs, things get tricky. For example, in the following code:

if True:
    print(
get_nesting_level())

the call to get_nesting_level is actually nested one level deep, despite the fact that there is no leading whitespace on the line of the get_nesting_level call. Meanwhile, in the following code:

print(1,
      2,
      get_nesting_level())

the call to get_nesting_level is nested zero levels deep, despite the presence of leading whitespace on its line.

In the following code:

if True:
  if True:
    print(get_nesting_level())

if True:
    print(get_nesting_level())

the two calls to get_nesting_level are at different nesting levels, despite the fact that the leading whitespace is identical.

In the following code:

if True: print(get_nesting_level())

is that nested zero levels, or one? In terms of INDENT and DEDENT tokens in the formal grammar, it’s zero levels deep, but you might not feel the same way.

If you want to do this, you’re going to have to tokenize the whole file up to the point of the call and count INDENT and DEDENT tokens. The tokenize module would be very useful for such a function:

import inspect
import tokenize

def get_nesting_level():
    caller_frame = inspect.currentframe().f_back
    filename, caller_lineno, _, _, _ = inspect.getframeinfo(caller_frame)
    with open(filename) as f:
        indentation_level = 0
        for token_record in tokenize.generate_tokens(f.readline):
            token_type, _, (token_lineno, _), _, _ = token_record
            if token_lineno > caller_lineno:
                break
            elif token_type == tokenize.INDENT:
                indentation_level += 1
            elif token_type == tokenize.DEDENT:
                indentation_level -= 1
        return indentation_level

Question 5

Yeah, that’s definitely possible, here’s a working example:

import inspect

def get_indentation_level():
    callerframerecord = inspect.stack()[1]
    frame = callerframerecord[0]
    info = inspect.getframeinfo(frame)
    cc = info.code_context[0]
    return len(cc) - len(cc.lstrip())

if 1:
    print get_indentation_level()
    if 1:
        print get_indentation_level()
        if 1:
            print get_indentation_level()

Question 6

You can use sys.current_frame.f_lineno in order to get the line number. Then in order to find the number of indentation level you need to find the previous line with zero indentation then be subtracting the current line number from that line’s number you’ll get the number of indentation:

import sys
current_frame = sys._getframe(0)

def get_ind_num():
    with open(__file__) as f:
        lines = f.readlines()
    current_line_no = current_frame.f_lineno
    to_current = lines[:current_line_no]
    previous_zoro_ind = len(to_current) - next(i for i, line in enumerate(to_current[::-1]) if not line[0].isspace())
    return current_line_no - previous_zoro_ind

Demo:

if True:
    print get_ind_num()
    if True:
        print(get_ind_num())
        if True:
            print(get_ind_num())
            if True: print(get_ind_num())
# Output
1
3
5
6

If you want the number of the indentation level based on the previouse lines with : you can just do it with a little change:

def get_ind_num():
    with open(__file__) as f:
        lines = f.readlines()

    current_line_no = current_frame.f_lineno
    to_current = lines[:current_line_no]
    previous_zoro_ind = len(to_current) - next(i for i, line in enumerate(to_current[::-1]) if not line[0].isspace())
    return sum(1 for line in lines[previous_zoro_ind-1:current_line_no] if line.strip().endswith(':'))

Demo:

if True:
    print get_ind_num()
    if True:
        print(get_ind_num())
        if True:
            print(get_ind_num())
            if True: print(get_ind_num())
# Output
1
2
3
3

And as an alternative answer here is a function for getting the number of indentation (whitespace):

import sys
from itertools import takewhile
current_frame = sys._getframe(0)

def get_ind_num():
    with open(__file__) as f:
        lines = f.readlines()
    return sum(1 for _ in takewhile(str.isspace, lines[current_frame.f_lineno - 1]))

Question 7

To solve the ”real” problem that lead to your question you could implement a contextmanager which keeps track of the indention level and make the with block structure in the code correspond to the indentation levels of the output. This way the code indentation still reflects the output indentation without coupling both too much. It is still possible to refactor the code into different functions and have other indentations based on code structure not messing with the output indentation.

#!/usr/bin/env python
# coding: utf8
from __future__ import absolute_import, division, print_function


class IndentedPrinter(object):

    def __init__(self, level=0, indent_with='  '):
        self.level = level
        self.indent_with = indent_with

    def __enter__(self):
        self.level += 1
        return self

    def __exit__(self, *_args):
        self.level -= 1

    def print(self, arg='', *args, **kwargs):
        print(self.indent_with * self.level + str(arg), *args, **kwargs)


def main():
    indented = IndentedPrinter()
    indented.print(indented.level)
    with indented:
        indented.print(indented.level)
        with indented:
            indented.print('Hallo', indented.level)
            with indented:
                indented.print(indented.level)
            indented.print('and back one level', indented.level)


if __name__ == '__main__':
    main()

Output:

0
  1
    Hallo 2
      3
    and back one level 2

Question 8

>>> import inspect
>>> help(inspect.indentsize)
Help on function indentsize in module inspect:

indentsize(line)
    Return the indent size, in spaces, at the start of a line of text.

Python 实用宝典

标签归档：metaprogramming

小马（ORM）如何发挥作用？

问题：小马（ORM）如何发挥作用？

回答 0

一行Python代码可以知道其缩进嵌套级别吗？

问题：一行Python代码可以知道其缩进嵌套级别吗？

回答 0

回答 1

回答 2

回答 3

回答 4

来自对象字段的Python字典

问题：来自对象字段的Python字典

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

回答 6

回答 7

回答 8

回答 9

回答 10

PYTHON 3：

PYTHON 3:

回答 11

有趣好用的Python教程