Tag archive: compiler-construction

What kinds of patterns could I enforce on the code to make it easier to translate to another programming language? [closed]

Question: What kinds of patterns could I enforce on the code to make it easier to translate to another programming language? [closed]

I am setting out to do a side project that has the goal of translating code from one programming language to another. The languages I am starting with are PHP and Python (Python to PHP should be easier to start with), but ideally I would be able to add other languages with (relative) ease. The plan is:

  • This is geared towards web development. The original and target code will be sitting on top of frameworks (which I will also have to write). These frameworks will embrace an MVC design pattern and follow strict coding conventions. This should make translation somewhat easier.

  • I am also looking at IoC and dependency injection, as they might make the translation process easier and less error-prone.

  • I’ll make use of Python’s parser module, which lets me fiddle with the Abstract Syntax Tree. Apparently the closest I can get with PHP is token_get_all(), which is a start.

  • From then on I can build the AST, symbol tables and control flow.

Then I believe I can start outputting code. I don’t need a perfect translation. I’ll still have to review the generated code and fix problems. Ideally the translator should flag problematic translations.

Before you ask “What the hell is the point of this?” The answer is… It’ll be an interesting learning experience. If you have any insights on how to make this less daunting, please let me know.


EDIT:

I am more interested in knowing what kinds of patterns I could enforce on the code to make it easier to translate (e.g., IoC, SOA?) than in how to do the translation itself.


Answer 0

I’ve been building tools (DMS Software Reengineering Toolkit) to do general purpose program manipulation (with language translation being a special case) since 1995, supported by a strong team of computer scientists. DMS provides generic parsing, AST building, symbol tables, control and data flow analysis, application of translation rules, regeneration of source text with comments, etc., all parameterized by explicit definitions of computer languages.

The amount of machinery you need to do this well is vast (especially if you want to be able to do this for multiple languages in a general way), and then you need reliable parsers for languages with unreliable definitions (PHP is a perfect example of this).

There’s nothing wrong with you thinking about building a language-to-language translator or attempting it, but I think you’ll find this a much bigger task for real languages than you expect. We have some 100 man-years invested in just DMS, and another 6-12 months in each “reliable” language definition (including the one we painfully built for PHP), much more for nasty languages such as C++. It will be a “hell of a learning experience”; it has been for us. (You might find the technical Papers section at the above website interesting to jump start that learning).

People often attempt to build some kind of generalized machinery by starting with some piece of technology with which they are familiar and that does a part of the job. (Python ASTs are a great example.) The good news is that part of the job is done. The bad news is that the machinery has a zillion assumptions built into it, most of which you won’t discover until you try to wrestle it into doing something else. At that point you find out the machinery is wired to do what it originally does, and will really, really resist your attempt to make it do something else. (I suspect trying to get the Python AST to model PHP is going to be a lot of fun.)

The reason I started to build DMS originally was to build foundations that had very few such assumptions built in. It has some that give us headaches. So far, no black holes. (The hardest part of my job over the last 15 years has been trying to prevent such assumptions from creeping in.)

Lots of folks also make the mistake of assuming that if they can parse (and perhaps get an AST), they are well on the way to doing something complicated. One of the hard lessons is that you need symbol tables and flow analysis to do good program analysis or transformation. ASTs are necessary but not sufficient. This is the reason that Aho&Ullman’s compiler book doesn’t stop at chapter 2. (The OP has this right in that he is planning to build additional machinery beyond the AST). For more on this topic, see Life After Parsing.
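
To make the "ASTs are necessary but not sufficient" point concrete, here is a minimal sketch using Python's standard `ast` and `symtable` modules. It is only an illustration, not how DMS works: the sample source and the `describe` helper are made up for the example. The AST shows that `total` is assigned inside `add`, but it takes the symbol table to say which scope each name belongs to.

```python
import ast
import symtable

# Hypothetical sample program, invented for this illustration.
SOURCE = """
total = 0

def add(amount):
    global total
    total = total + amount
    return total
"""

# The AST gives structure: a module containing an assignment and a function.
tree = ast.parse(SOURCE)
func = tree.body[1]
print(type(func).__name__, func.name)

# The symbol table answers a question the tree alone does not:
# which scope does each name inside `add` belong to?
module_table = symtable.symtable(SOURCE, "<example>", "exec")
add_table = module_table.get_children()[0]


def describe(table):
    """Map each name in a scope to 'global' or 'local' (illustrative helper)."""
    return {
        sym.get_name(): ("global" if sym.is_global() else "local")
        for sym in table.get_symbols()
    }


scopes = describe(add_table)
print(scopes)
```

A translator targeting PHP would need exactly this kind of information to decide, for instance, whether an assignment refers to a function-local variable or to a global one.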

The remark about “I don’t need a perfect translation” is troublesome. What weak translators do is convert the “easy” 80% of the code, leaving the hard 20% to do by hand. If the application you intend to convert is pretty small, and you only intend to convert it once, well, then that 20% is OK. If you want to convert many applications (or even the same one with minor changes over time), this is not nice. If you attempt to convert 100K SLOC then 20% is 20,000 original lines of code that are hard to translate, understand and modify in the context of another 80,000 lines of translated program you already don’t understand. That takes a huge amount of effort. At the million line level, this is simply impossible in practice. (Amazingly there are people that distrust automated tools and insist on translating million line systems by hand; that’s even harder and they normally find out painfully with long time delays, high costs and often outright failure.)

To translate large-scale systems, you have to shoot for conversion rates in the high nineties (percent); otherwise you probably won’t be able to complete the manual part of the translation activity.

Another key consideration is size of code to be translated. It takes a lot of energy to build a working, robust translator, even with good tools. While it seems sexy and cool to build a translator instead of simply doing a manual conversion, for small code bases (e.g., up to about 100K SLOC in our experience) the economics simply don’t justify it. Nobody likes this answer, but if you really have to translate just 10K SLOC of code, you are probably better off just biting the bullet and doing it. And yes, that’s painful.

I consider our tools to be extremely good (but then, I’m pretty biased). And it is still very hard to build a good translator; it takes us about 1.5-2 man-years and we know how to use our tools. The difference is that with this much machinery, we succeed considerably more often than we fail.


Answer 1

My answer will address the specific task of parsing Python in order to translate it to another language, and not the higher-level aspects which Ira addressed well in his answer.

In short: do not use the parser module, there’s an easier way.

The ast module, available since Python 2.6, is much more suitable for your needs, since it gives you a ready-made AST to work with. I wrote an article on this last year, but in short, use the parse function of ast to parse Python source code into an AST. The parser module will give you a parse tree, not an AST. Be wary of the difference.
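
As a minimal sketch of what that looks like in practice (the sample statement is invented for illustration, and the snippet assumes a current Python 3 interpreter), `ast.parse` returns a tree of typed nodes that can be inspected or walked directly:

```python
import ast

# Hypothetical one-line program to parse.
source = "total = price * quantity + 1"

tree = ast.parse(source)
assign = tree.body[0]

# ast.dump gives a textual view of the node structure.
print(ast.dump(assign))

# ast.walk visits every node, so collecting all variable names is a one-liner.
names = sorted(node.id for node in ast.walk(tree) if isinstance(node, ast.Name))
print(names)
```

The nodes here are already abstract (an `Assign` whose value is a `BinOp`), with no leftover grammar productions to strip away, which is the practical difference from a parse tree.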

Now, since Python’s ASTs are quite detailed, given an AST the front-end job isn’t terribly hard. I suppose you can have a simple prototype for some parts of the functionality ready quite quickly. However, getting to a complete solution will take more time, mainly because the semantics of the languages are different. A simple subset of the language (functions, basic types and so on) can be readily translated, but once you get into the more complex layers, you’ll need heavy machinery to emulate one language’s core in another. For example consider Python’s generators and list comprehensions which don’t exist in PHP (to my best knowledge, which is admittedly poor when PHP is involved).

To give you one final tip, consider the 2to3 tool created by the Python devs to translate Python 2 code to Python 3 code. Front-end-wise, it has most of the elements you need to translate Python to something. However, since the cores of Python 2 and 3 are similar, no emulation machinery is required there.


Answer 2

Writing a translator isn’t impossible, especially considering that Joel’s Intern did it over a summer.

If you want to do one language, it’s easy. If you want to do more, it’s a little more difficult, but not too much. The hardest part is that, while any Turing-complete language can do what another Turing-complete language does, built-in data types can change what a language does phenomenally.

For instance:

word = 'This is not a word'
print word[::-2]

takes a lot of C++ code to duplicate (ok, well you can do it fairly short with some looping constructs, but still).

That’s a bit of an aside, I guess.

Have you ever written a tokenizer/parser based on a language grammar? You’ll probably want to learn how to do that if you haven’t, because that’s the main part of this project. What I would do is come up with a basic Turing-complete syntax: something fairly similar to Python bytecode. Then you create a lexer/parser that takes a language grammar (perhaps using BNF), and based on the grammar, compiles the language into your intermediate language. Then what you’ll want to do is the reverse: create a translator from your intermediate language into target languages, based on their grammars.

The most obvious problem I see is that at first you’ll probably create horribly inefficient code, especially in more powerful* languages like Python.

But if you do it this way then you’ll probably be able to figure out ways to optimize the output as you go along. To summarize:

  • read provided grammar
  • compile program into intermediate (but also Turing complete) syntax
  • compile intermediate program into final language (based on provided grammar)
  • …?
  • Profit!(?)
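
The steps above can be sketched in a few lines, under heavy simplifying assumptions: the source language is restricted to arithmetic expressions, Python's own `ast` module stands in for a grammar-driven parser, and the intermediate form is a made-up stack-machine instruction list. The names `to_ir` and `emit_php` are hypothetical, chosen for this illustration.

```python
import ast

# Operator tables for the tiny subset handled here.
IR_OPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}
PHP_OPS = {"ADD": "+", "SUB": "-", "MUL": "*"}


def to_ir(source):
    """Front end: compile a Python arithmetic expression into stack-machine IR."""
    ir = []

    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in IR_OPS:
            walk(node.left)
            walk(node.right)
            ir.append((IR_OPS[type(node.op)],))
        elif isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            ir.append(("PUSH", node.value))
        elif isinstance(node, ast.Name):
            ir.append(("LOAD", node.id))
        else:
            # Anything outside the subset is flagged rather than guessed at.
            raise NotImplementedError(type(node).__name__)

    walk(ast.parse(source, mode="eval").body)
    return ir


def emit_php(ir):
    """Back end: rebuild a PHP-style expression string from the IR."""
    stack = []
    for instr in ir:
        if instr[0] == "PUSH":
            stack.append(repr(instr[1]))
        elif instr[0] == "LOAD":
            stack.append("$" + instr[1])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append("(%s %s %s)" % (left, PHP_OPS[instr[0]], right))
    return stack.pop()


program_ir = to_ir("a + 2 * b")
print(program_ir)
print(emit_php(program_ir))
```

Adding a second source or target language then means writing another front end or back end against the same intermediate form, which is the appeal of the design; the cost is that everything the intermediate form cannot express directly has to be emulated.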

*by powerful I mean that this takes 4 lines:

myinput = raw_input("Enter something: ")
print myinput.replace('a', 'A')
print sum(ord(c) for c in myinput)
print myinput[::-1]

Show me another language that can do something like that in 4 lines, and I’ll show you a language that’s as powerful as Python.


Answer 3

There are a couple answers telling you not to bother. Well, how helpful is that? You want to learn? You can learn. This is compilation. It just so happens that your target language isn’t machine code, but another high-level language. This is done all the time.

There’s a relatively easy way to get started. First, go get http://sourceforge.net/projects/lime-php/ (if you want to work in PHP) or some such and go through the example code. Next, you can write a lexical analyzer using a sequence of regular expressions and feed tokens to the parser you generate. Your semantic actions can either output code directly in another language or build up some data structure (think objects, man) that you can massage and traverse to generate output code.

You’re lucky with PHP and Python because in many respects they are the same language as each other, but with different syntax. The hard part is getting over the semantic differences between the grammar forms and data structures. For example, Python has lists and dictionaries, while PHP only has assoc arrays.

The “learner” approach is to build something that works OK for a restricted subset of the language (such as only print statements, simple math, and variable assignment), and then progressively remove limitations. That’s basically what the “big” guys in the field all did.

Oh, and since you don’t have static types in Python, it might be best to write and rely on PHP functions like “python_add” which adds numbers, strings, or objects according to the way Python does it.
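
Here is a rough sketch of that idea from the translator's side, assuming the translator is written in Python on top of the `ast` module. The class name `PhpExpr` and the function `translate` are invented for this example, and `python_add` is assumed to be a helper you would write yourself in a PHP runtime library. Because the operand types are unknown at translation time, every `+` is emitted as a call to that helper instead of a PHP operator.

```python
import ast


class PhpExpr(ast.NodeVisitor):
    """Emit PHP text for a tiny subset of Python expressions (illustrative only)."""

    def visit_BinOp(self, node):
        left = self.visit(node.left)
        right = self.visit(node.right)
        if isinstance(node.op, ast.Add):
            # Types are unknown statically, so defer to a runtime helper.
            return "python_add(%s, %s)" % (left, right)
        raise NotImplementedError(type(node.op).__name__)

    def visit_Name(self, node):
        return "$" + node.id

    def visit_Constant(self, node):
        value = node.value
        if isinstance(value, str):
            # Single-quoted PHP string with minimal escaping.
            return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            return repr(value)
        raise NotImplementedError(type(value).__name__)

    def generic_visit(self, node):
        # Flag anything the subset does not cover instead of emitting nonsense.
        raise NotImplementedError(type(node).__name__)


def translate(source):
    return PhpExpr().visit(ast.parse(source, mode="eval").body)


print(translate("a + 'x' + 1"))
```

The generated code is slower than a native `+`, but it stays correct whether the operands turn out to be numbers, strings, or objects at run time.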

Obviously, this can get much bigger if you let it.


Answer 4

对于使用ast.parse而不是解析器(我以前不知道)的观点,我将第二个@EliBendersky的观点。我也热烈建议您查看他的博客。我使用ast.parse做Python-> JavaScript转换器(@ https://bitbucket.org/amirouche/pythonium)。我通过一些审查其他实现并自己尝试来提出Pythonium设计。我从也是我开始的https://github.com/PythonJS/PythonJS分叉了Pythonium ,它实际上是一个完整的重写。整体设计灵感来自PyPy和http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-89-1.pdf文件。

我尝试过的所有事情,从开始到最佳解决方案,即使看起来像是Pythonium营销,实际上也不是(不要犹豫告诉我,网络礼仪是否看起来不正确):

  • 使用原型继承在Plain Old JavaScript中实现Python语义:AFAIK无法使用JS原型对象系统实现Python多重继承。后来我确实尝试使用其他技巧来做到这一点(参见getattribute)。据我所知,JavaScript中没有实现Python多重继承,最好的是单一继承+ mixins,但我不确定它们是否可以处理钻石继承。类似于Skulpt,但没有Google Clojure。

  • 我尝试过使用Google clojure,就像Skulpt(编译器)一样,而不是实际阅读Skulpt代码#fail。无论如何因为基于JS原型的对象系统仍然是不可能的。创建绑定非常困难,您需要编写JavaScript和大量样板代码(请参阅https://github.com/skulpt/skulpt/issues/50,其中我是幽灵)。那时,还没有明确的方法将绑定集成到构建系统中。我认为Skulpt是一个库,您只需要在html中包含.py文件即可执行,开发人员无需进行任何编译阶段。

  • 尝试过pyjaco(编译器),但是创建绑定(从Python代码调用Javascript代码)非常困难,每次创建的样板代码太多。现在,我认为pyjaco更接近Pythonium。pyjaco是用Python编写的(也是ast.parse),但是很多是用JavaScript编写的,并且使用原型继承。

我从未真正成功运行过睡衣#fail,也从未尝试再次读取代码#fail。但是在我看来,睡衣正在执行API-> API转换(或框架到框架),而不是Python到JavaScript的转换。JavaScript框架使用页面中已经存在的数据或来自服务器的数据。Python代码只是“管道”。之后,我发现睡衣实际上是一个真正的python-> js转换器。

我仍然认为可以进行API-> API(或框架->框架)转换,这基本上是我在Pythonium中所做的,但级别较低。睡衣可能使用与Pythonium相同的算法…

然后,我发现brython完全用Javascript编写,例如Skulpt,不需要编译和大量的绒毛…而是用JavaScript编写。

自从在该项目的过程中编写了第一行代码以来,我就了解PyPy,甚至包括PyPy的JavaScript后端。是的,如果找到它,您可以直接从PyPy用JavaScript生成Python解释器。人们说,那是一场灾难。我没有读到为什么。但是我认为原因是它们用于实现解释器的中间语言RPython是为转换为C(也许是asm)而定制的Python子集。艾拉·巴克斯特(Ira Baxter)说,在构建某些东西时,您总是会做一些假设,并且可能会对其进行微调,使其在PyPy:Python-> C转换的情况下达到最佳效果。这些假设在其他情况下可能不相关,更糟糕的是,它们可以推断出开销,否则,说直接翻译很可能总是会更好。

用Python编写解释器听起来是一个(非常)好主意。但是出于性能原因,我对编译器更感兴趣,实际上将Python编译为JavaScript比解释它更容易。

我以将可以轻松转换为JavaScript的Python子集组合在一起的想法开始了PythonJS。起初,由于过去的经验,我什至没有去实施OO系统。我实现的翻译成JavaScript的Python子集是:

  • 在定义和调用中具有全参数语义的函数。这是我最引以为傲的部分。
  • while / if / elif / else
  • Python类型已转换为JavaScript类型(没有任何类型的python类型)
  • for只能迭代Javascript数组(对于in数组)
  • 透明访问JavaScript:如果您使用Python代码编写Array,它将被转换为JavaScript中的Array。就可用性而言,这是其竞争对手的最大成就。
  • 您可以将Python源代码中定义的函数传递给javascript函数。默认参数将被考虑在内。
  • 它添加了一个名为new的特殊功能,该功能被转换为JavaScript new,例如:new(Python)(1,2,spam,“ egg”)被转换为“ new Python(1,2,spam,” egg“)。
  • 翻译人员会自动处理“ var”。(来自Brett(PythonJS贡献者)的发现非常好。
  • 全局关键字
  • 关闭
  • Lambdas
  • 清单理解
  • 通过requirejs支持导入
  • 单类继承+通过classyjs的mixin

与Python的完整语义相比,这看起来很多,但实际上非常狭窄。它实际上是带有Python语法的JavaScript。

生成的JS是完美的,即。没有开销,无法通过进一步编辑来改善性能。如果您可以改善生成的代码,也可以从Python源文件中完成。此外,编译器也不依赖您可以在http://superherojs.com/编写的.js中找到的JS技巧。,因此它非常易于阅读。

PythonJS这部分的直接后代是Pythonium Veloce模式。完整的实现可以在@ https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced3​​92f1c369afd746c25d7/pythonium/veloce/veloce.py?at=master中找到 //bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced3​​92f1c369afd746c25d7/pythonium/veloce/veloce.py?at master 793 SLOC +大约100 SLOC与其他翻译器共享的代码。

可以在Veloce模式下翻译pystones.py的改编版本。https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced3​​92f1c369afd746c25d7/pystone/?at=master

设置基本的Python-> JavaScript转换后,我选择了另一条路径将完整的Python转换为JavaScript。除了目标语言外,glib进行基于对象的基于类的代码的方式是JS,因此您可以访问数组,类似地图的对象和许多其他技巧,而所有这些部分都是用Python编写的。IIRC没有Pythonium转换器编写的javascript代码。获得单一继承并不困难,以下是使Pythonium完全兼容Python的困难部分:

  • spam.egg 在Python中总是翻译为 getattribute(spam, "egg")我没有特别描述的内容,但我认为它会浪费很多时间,并且我不确定是否可以使用asm.js或其他任何方式对其进行改进。
  • 方法解析顺序:即使使用Python编写的算法,将其翻译成Python Veloce兼容代码也是一项巨大的努力。
  • getattributre:实际的getattribute解析算法有点棘手,它仍然不支持数据描述符
  • 基于元类的类:我知道在哪里插入代码,但仍然…
  • 最后一点不是最重要的:some_callable(…)始终转换为“ call(some_callable)”。AFAIK转换程序根本不使用推理,因此,每次调用时,都需要检查调用该对象的方式,以及调用该对象的方式。

这部分在https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced3​​92f1c369afd746c25d7/pythonium/compatible/runtime.py?at=master中进行了分解它是用Python编写的,与Python Veloce兼容。

实际的兼容翻译器https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced3​​92f1c369afd746c25d7/pythonium/compatible/compatible.py?at=master不会直接生成JavaScript代码,最重要的是不会进行ast-> ast转换。我尝试过ast-> ast事情,即使ast.NodeTransformer比cst都好,但也无法使用ast。> NodeTransformer,更重要的是,我不需要做ast-> ast。

就我而言,至少对python ast做python ast可能会提高性能,因为我有时会在生成与块相关的代码之前检查块的内容,例如:

  • var / global:要能够var某些东西,我必须知道我需要什么,而不是var。无需生成跟踪在给定块中创建哪个变量并将其插入到生成的功能块顶部的块,而是在进入该块之前实际访问子节点以生成相关代码之前,我只是寻找启示性的变量分配。
  • 到目前为止,生成器在JS中具有特殊的语法,因此当我要编写“ var my_generator = function”时,我需要知道哪个Python函数是生成器

因此,对于翻译的每个阶段,我都不会真正访问每个节点。

整个过程可以描述为:

Python source code -> Python ast -> Python source code compatible with Veloce mode -> Python ast -> JavaScript source code

Python内置函数是用Python代码(!)编写的,IIRC有一些与引导类型相关的限制,但是您可以访问所有可以在兼容模式下转换Pythonium的内容。看看https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced3​​92f1c369afd746c25d7/pythonium/compatible/builtins/?at=master

可以理解从pythonium兼容生成的JS代码的阅读,但是源映射将有很大帮助。

根据这种经验,我可以给您的宝贵建议是老屁:

  • 无论是在文献上还是在现有项目中,都对该主题进行了广泛的审查,这些项目是封闭的或免费的。当我回顾现有的不同项目时,我应该给它更多的时间和动力。
  • 问问题!如果我事先知道PyPy后端是无用的,那是由于C / Javascript语义不匹配导致的开销。我可能会在6个月前或3年前提出Pythonium的想法。
  • 知道你想做什么,有一个目标。对于这个项目,我有不同的目标:使用一点点javascript,学习更多Python知识,并能够编写将在浏览器中运行的Python代码(更多内容以及下面的内容)。
  • 失败就是经验
  • 一小步就是一步
  • 从小开始
  • 远大的梦想
  • 做演示
  • 重复

仅使用Python Veloce模式,我感到非常高兴!但是一直以来,我发现我真正想要的是将我和其他人从Javascript中解放出来,但更重要的是能够以舒适的方式进行创建。这使我了解了Scheme,DSL,模型以及最终特定于域的模型(请参阅http://dsmforum.org/)。

关于Ira Baxter的回应:

估计完全没有帮助。我花了大约6个月的空闲时间来使用PythonJS和Pythonium。所以我可以期望从6个月的全职工作中得到更多。我想我们都知道在企业环境中100人年意味着什么,而根本没有意思…

当某人说某事很难解决或更经常是不可能的事情时,我回答说“只花时间找到不可能解决的问题的解决方案”,否则就说没有什么是不可能的,除非在这种情况下证明是不可能的。

如果没有证明不可能的话,那么它就有想象力的余地:

  • 寻找证明是不可能的

  • 如果这是不可能的,则可能存在可以解决的“劣等”问题。

要么

  • 如果不是不可能,那就找到解决办法

不只是乐观的想法。当我启动Python-> Javascript时,每个人都说这是不可能的。PyPy不可能。元类太难了。等…我认为,唯一使PyPy超过Scheme-> C纸(已有25年历史)的革命是一些自动JIT生成(基于我认为是用RPython解释器编写的提示)。

大多数说某事“困难”或“不可能”的人没有提供原因。C ++很难解析?我知道,它们仍然是(免费的)C ++解析器。细节是邪恶的吗?我知道。仅仅说不可能是没有帮助的,它比令人沮丧的“没有帮助”还要糟糕,而且有些人会劝阻其他人。我通过听说了这个问题 /programming/22621164/how-to-automatically-generate-a-parser-code-to-code-translator-from-a-corpus

什么对您来说是完美?这样便可以定义下一个目标,甚至可以达到整体目标。

我更想知道我可以对代码强制执行哪种类型的模式,而不是如何进行翻译,从而使代码的翻译(即:IoC,SOA?)更容易。

我看不到至少不能以一种不太完美的方式将一种语言不能翻译成另一种语言的模式。由于可以进行语言到语言的翻译,因此您最好首先瞄准。从那以后,我认为是根据http://en.wikipedia.org/wiki/Graph_isomorphism_problem两种计算机语言之间的翻译是树或DAG同构。即使我们已经知道他们都将完成学习,所以…

我最好将API-> API转换可视化为Framework-> Framework,但您可能仍要牢记这些内容,以改进生成的代码。例如:Prolog是非常特定的语法,但是您仍然可以通过在Python中描述相同的图形来像计算一样进行Prolog …如果我要实现从Prolog到Python的转换器,我不会在Python中实现统一,而是在C库中实现带有“ Python语法”,这对于Python编写者来说非常容易理解。最后,语法只是我们赋予其含义的“绘画”(这就是我开始使用scheme的原因)。语言的细节是邪恶的,我不是在谈论语法。语言中使用的概念 getattribute钩子(您可以没有它),但是所需的VM功能(如尾递归优化)可能很难处理。您不必担心初始程序是否不使用尾部递归,即使目标语言中没有尾部递归,也可以使用greenlets / event循环来模拟它。

对于目标语言和源语言,请查找:

  • 大而具体的想法
  • 微小且共同的想法

由此将出现:

  • 容易翻译的东西
  • 难以翻译的事物

您也许还可以知道将翻译成快速和慢速代码的内容。

还有stdlib或任何库的问题,但没有明确的答案,这取决于您的目标。

成语代码或可读的生成代码也有解决方案…

因为可以提供慢速和/或关键路径的C实现,所以针对PHP之类的平台比针对浏览器要容易得多。

鉴于您的第一个项目是将Python转换为PHP,至少对于我所知道的PHP3子集,自定义veloce.py是最好的选择。如果您可以为PHP实现veloce.py,则可能可以运行兼容模式…同样,如果您可以将PHP转换为可以用php_veloce.py生成的PHP子集,则意味着您可以将PHP转换为veloce.py可以使用的Python子集,这意味着您可以将PHP转换为Javascript。只是说…

您还可以查看这些库:

另外,您可能对此博客文章(和评论)感兴趣:https : //www.rfk.id.au/blog/entry/pypy-js-poc-jit/

I second @EliBendersky’s point of view regarding using ast.parse instead of parser (which I did not know about before). I also warmly recommend that you review his blog. I used ast.parse to write a Python->JavaScript translator (@https://bitbucket.org/amirouche/pythonium). I came up with the Pythonium design by reviewing other implementations and trying them on my own. I forked Pythonium from https://github.com/PythonJS/PythonJS, which I also started; it’s actually a complete rewrite. The overall design is inspired by PyPy and the http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-89-1.pdf paper.

Here is everything I tried, from the beginning to the best solution. Even if it looks like Pythonium marketing, it really isn’t (don’t hesitate to tell me if something doesn’t seem to follow netiquette):

  • Implement Python semantics in Plain Old JavaScript using prototype inheritance: AFAIK it’s impossible to implement Python multiple inheritance using the JS prototype object system. I did try to do it using other tricks later (cf. getattribute). As far as I know there is no implementation of Python multiple inheritance in JavaScript; the best that exists is single inheritance + mixins, and I’m not sure they handle diamond inheritance. Kind of similar to Skulpt but without Google Closure.

  • I tried with Google Closure, just like Skulpt (compiler), instead of actually reading the Skulpt code #fail. Anyway, because of the JS prototype-based object system, it was still impossible. Creating bindings was very, very difficult: you need to write JavaScript and a lot of boilerplate code (cf. https://github.com/skulpt/skulpt/issues/50 where I am the ghost). At that time there was no clear way to integrate the bindings in the build system. I think that Skulpt is a library and you just have to include your .py files in the html to be executed, with no compilation phase required from the developer.

  • Tried pyjaco (compiler), but creating bindings (calling JavaScript code from Python code) was very difficult; there was too much boilerplate code to create every time. Now I think pyjaco is the one nearest to Pythonium. pyjaco is written in Python (ast.parse too) but a lot is written in JavaScript and it uses prototype inheritance.
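
To show what "diamond inheritance" demands of a target object system, here is a small plain-Python sketch (the class names are arbitrary and not taken from Pythonium). With a single prototype chain each object has one parent, whereas Python linearizes the diamond so that every class appears exactly once and `super()` follows that linear order.

```python
class A:
    def who(self):
        return "A"


class B(A):
    def who(self):
        return "B > " + super().who()


class C(A):
    def who(self):
        return "C > " + super().who()


class D(B, C):
    pass


# The method resolution order flattens the diamond D -> (B, C) -> A.
mro_names = [cls.__name__ for cls in D.__mro__]
print(mro_names)

# super() inside B does not go to A directly: it goes to the next class
# in D's linearization, which is C.
result = D().who()
print(result)
```

Reproducing this in JavaScript means computing and storing such a linearization per class and routing attribute lookups through it, which is roughly what a getattribute-style runtime helper has to do.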

I never actually succeeded at running Pyjamas #fail and never tried to read the code #fail again. But in my mind Pyjamas was doing API->API translation (or framework to framework) and not Python to JavaScript translation. The JavaScript framework consumes data that is already in the page or data from the server. Python code is only “plumbing”. After that I discovered that Pyjamas was actually a real Python->JS translator.

Still, I think it’s possible to do API->API (or framework->framework) translation, and that’s basically what I do in Pythonium but at a lower level. Pyjamas probably uses the same algorithm as Pythonium…

Then I discovered brython, fully written in JavaScript like Skulpt, with no need for compilation and a lot of fluff… but written in JavaScript.

Since the first line written in the course of this project, I knew about PyPy, and even about the JavaScript backend for PyPy. Yes, you can, if you find it, directly generate a Python interpreter in JavaScript from PyPy. People say it was a disaster. I have read nowhere why. But I think the reason is that the intermediate language they use to implement the interpreter, RPython, is a subset of Python tailored to be translated to C (and maybe asm). Ira Baxter says you always make assumptions when you build something, and you probably fine-tune it to be the best at what it’s meant to do, in the case of PyPy: Python->C translation. Those assumptions might not be relevant in another context; worse, they can incur overhead. In other words, direct translation will most likely always be better.

Having the interpreter written in Python sounded like a (very) good idea. But I was more interested in a compiler for performance reasons; also, it’s actually easier to compile Python to JavaScript than to interpret it.

I started PythonJS with the idea of putting together a subset of Python that I could easily translate to JavaScript. At first I didn’t even bother to implement an OO system because of past experience. The subset of Python that I managed to translate to JavaScript is:

  • functions with full parameter semantics, both in definition and calling. This is the part I am most proud of.
  • while/if/elif/else
  • Python types were converted to JavaScript types (there are no Python types of any kind)
  • for could iterate over JavaScript arrays only (for a in array)
  • Transparent access to JavaScript: if you write Array in the Python code it will be translated to Array in JavaScript. This is the biggest achievement in terms of usability over its competitors.
  • You can pass functions defined in Python source to JavaScript functions. Default arguments will be taken into account.
  • It adds a special function called new which is translated to JavaScript new, e.g. new(Python)(1, 2, spam, "egg") is translated to new Python(1, 2, spam, "egg").
  • “var” declarations are automatically handled by the translator (a very nice finding from Brett, a PythonJS contributor).
  • global keyword
  • closures
  • lambdas
  • list comprehensions
  • imports are supported via requirejs
  • single class inheritance + mixin via classyjs

This seems like a lot, but it is actually very narrow compared to the full-blown semantics of Python. It’s really JavaScript with a Python syntax.

The generated JS is perfect, i.e., there is no overhead and it cannot be improved in terms of performance by further editing it. If you can improve the generated code, you can do it from the Python source file too. Also, the compiler did not rely on any JS tricks that you can find in .js written by http://superherojs.com/, so it’s very readable.

The direct descendant of this part of PythonJS is the Pythonium Veloce mode. The full implementation can be found @ https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/veloce/veloce.py?at=master (793 SLOC + around 100 SLOC of code shared with the other translator).

An adapted version of pystones.py can be translated in Veloce mode cf. https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pystone/?at=master

After having set up basic Python->JavaScript translation I chose another path to translate full Python to JavaScript. It works the way GLib does object-oriented, class-based code, except that the target language is JS, so you have access to arrays, map-like objects and many other tricks, and all that part was written in Python. IIRC there is no JavaScript code written in the Pythonium translator. Getting single inheritance is not difficult; here are the difficult parts of making Pythonium fully compliant with Python:

  • spam.egg in Python is always translated to getattribute(spam, "egg"). I did not profile this in particular, but I think that’s where it loses a lot of time, and I’m not sure I can improve upon it with asm.js or anything else.
  • method resolution order: even with the algorithm written in Python, translating it to Python Veloce compatible code was a big endeavour.
  • getattribute: the actual getattribute resolution algorithm is kind of tricky and it still doesn’t support data descriptors
  • metaclass-based classes: I know where to plug the code, but still…
  • last but not least: some_callable(…) is always translated to “call(some_callable)”. AFAIK the translator doesn’t use inference at all, so every time you do a call you need to check which kind of object it is to call it the way it’s meant to be called.
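
The first and last points can be sketched as follows. This is not Pythonium's actual code: the class `CompliantExpr` and the function `translate` are hypothetical, written against Python's standard `ast` module only to show the shape of the rewrite. Attribute access and calls are never emitted as native JavaScript operations; they are routed through runtime helpers named `getattribute` and `call`, as in the text.

```python
import ast


class CompliantExpr(ast.NodeVisitor):
    """Emit JS-like text where attribute access and calls go through helpers."""

    def visit_Attribute(self, node):
        # spam.egg -> getattribute(spam, "egg")
        return 'getattribute(%s, "%s")' % (self.visit(node.value), node.attr)

    def visit_Call(self, node):
        if node.keywords:
            # Keyword arguments are outside this tiny sketch.
            raise NotImplementedError("keyword arguments")
        # f(a, b) -> call(f, a, b)
        parts = [self.visit(node.func)] + [self.visit(arg) for arg in node.args]
        return "call(%s)" % ", ".join(parts)

    def visit_Name(self, node):
        return node.id

    def visit_Constant(self, node):
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return repr(node.value)
        raise NotImplementedError(type(node.value).__name__)

    def generic_visit(self, node):
        raise NotImplementedError(type(node).__name__)


def translate(source):
    return CompliantExpr().visit(ast.parse(source, mode="eval").body)


print(translate("spam.egg"))
print(translate("spam.egg(1, x)"))
```

Every attribute access and every call therefore pays for a helper invocation at run time, which is consistent with the overhead described above; some form of type inference would be needed to emit the direct form safely.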

This part is factored out in https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/runtime.py?at=master. It’s written in Python compatible with Python Veloce.

The actual compliant translator https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/compliant.py?at=master doesn’t generate JavaScript code directly and, most importantly, doesn’t do ast->ast transformation. I tried the ast->ast approach, and an AST, even if nicer than a CST, is not nice to work with, even with ast.NodeTransformer; more importantly, I don’t need to do ast->ast.

Doing Python ast to Python ast would, in my case at least, maybe be a performance improvement, since I sometimes inspect the content of a block before generating the code associated with it, for instance:

  • var/global: to be able to var something I must know what I need to var and what I must not. Instead of generating a block, tracking which variables are created in it, and inserting the declarations at the top of the generated function block, I just look for relevant variable assignments when I enter the block, before actually visiting the child nodes to generate the associated code.
  • yield: generators have, as of yet, a special syntax in JS, so I need to know which Python function is a generator when I want to write the “var my_generator = function”
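
Both pre-scans can be illustrated with one small function. This is a sketch under assumptions, not Pythonium's implementation: `scan_function` and the sample source are invented for the example, and the handling of nested scopes is deliberately crude (nested functions, classes and lambdas are skipped, comprehension targets are not treated specially).

```python
import ast

# Hypothetical function to translate.
SOURCE = """
def counter(limit):
    global calls
    calls = calls + 1
    current = 0
    while current < limit:
        yield current
        current = current + 1
"""


def scan_function(func):
    """Look ahead in a function body before emitting any code for it.

    Returns the names that would need a `var` declaration and whether
    the function is a generator.
    """
    declared_global = set()
    assigned = set()
    is_generator = False
    pending = list(func.body)
    while pending:
        node = pending.pop()
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                             ast.ClassDef, ast.Lambda)):
            continue  # nested scopes would be scanned on their own
        if isinstance(node, ast.Global):
            declared_global.update(node.names)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            assigned.add(node.id)
        elif isinstance(node, (ast.Yield, ast.YieldFrom)):
            is_generator = True
        pending.extend(ast.iter_child_nodes(node))
    params = {arg.arg for arg in func.args.args}
    return sorted(assigned - declared_global - params), is_generator


func = ast.parse(SOURCE).body[0]
var_names, is_generator = scan_function(func)
print(var_names, is_generator)
```

With that information in hand, the code generator can write the `var` line and pick the generator function syntax before it visits the body, at the price of walking some nodes more than once.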

So I don’t really visit each node once for each phase of the translation.

The overall process can be described as:

Python source code -> Python ast -> Python source code compatible with Veloce mode -> Python ast -> JavaScript source code

Python builtins are written in Python code (!). IIRC there are a few restrictions related to bootstrapping types, but you have access to everything that Pythonium can translate in compliant mode. Have a look at https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/builtins/?at=master

The JS code generated by Pythonium’s compliant mode can be read and understood, but source maps would greatly help.

The valuable advice I can give you in the light of this experience are kind old farts:

  • extensively review the subject both in literature and existing projects closed source or free. When I reviewed the different existing projects I should have given it way more time and motivation.
  • ask questions! If I knew beforehand that PyPy backend was useless because of the overhead due to C/Javascript semantic mismatch. I would maybe had Pythonium idea way before 6 month ago maybe 3 years ago.
  • know what you want to do, have a target. For this project I had different objectives: pratice a bit a javascript, learn more of Python and be able to write Python code that would run in the browser (more and that below).
  • failure is experience
  • a small step is a step
  • start small
  • dream big
  • do demos
  • iterate

With Python Veloce mode only, I’m very happy! But along the way I discovered that what I was really looking for was liberating myself and others from JavaScript and, more importantly, being able to create in a comfortable way. This led me to Scheme, DSLs, models and eventually domain-specific models (cf. http://dsmforum.org/).

About Ira Baxter’s response:

The estimations are not helpful at all. It took me more or less 6 months of free time for both PythonJS and Pythonium, so I can expect more from 6 months of full-time work. I think we all know what 100 man-years in an enterprise context can mean and not mean at all…

When someone says something is hard or, more often, impossible, I answer that “it only takes time to find a solution for a problem that is impossible”. In other words, nothing is impossible unless it’s proven impossible, in this case by a mathematical proof…

If it’s not proven impossible then it leaves room for imagination:

  • finding a proof proving it’s impossible

and

  • If it is impossible there may be an “inferior” problem that can have a solution.

or

  • if it’s not impossible, finding a solution

It’s not just optimistic thinking. When I started Python->JavaScript everybody was saying it was impossible. PyPy impossible. Metaclasses too hard. Etc. I think that the only revolution PyPy brings over the Scheme->C paper (which is 25 years old) is some automatic JIT generation (based on hints written in the RPython interpreter, I think).

Most people who say that a thing is “hard” or “impossible” don’t provide the reasons. C++ is hard to parse? I know that, yet there are (free) C++ parsers. Evil is in the detail? I know that. Saying it’s impossible, on its own, is not helpful. It’s even worse than “not helpful”: it’s discouraging, and some people mean to discourage others. I heard about this question via https://stackoverflow.com/questions/22621164/how-to-automatically-generate-a-parser-code-to-code-translator-from-a-corpus.

What would be perfection for you? That’s how you define the next goal and maybe reach the overall goal.

I am more interested in knowing what kinds of patterns I could enforce on the code to make it easier to translate (ie: IoC, SOA ?) the code than how to do the translation.

I see no patterns that cannot be translated from one language to another, at least in a less than perfect way. Since language to language translation is possible, you’d better aim for this first. I think that, according to http://en.wikipedia.org/wiki/Graph_isomorphism_problem, translation between two computer languages is a tree or DAG isomorphism. Besides, we already know that they are both Turing complete, so…

Framework->Framework, which I better visualize as API->API translation, might still be something to keep in mind as a way to improve the generated code. E.g. Prolog has a very specific syntax, but you can still do Prolog-like computation by describing the same graph in Python… If I were to implement a Prolog to Python translator, I wouldn’t implement unification in Python but in a C library, and come up with a “Python syntax” that is very readable for a Pythonist. In the end, syntax is only “painting” to which we give a meaning (that’s why I started Scheme). Evil is in the detail of the language, and I’m not talking about the syntax. Some concepts used in a language, like the getattribute hook, you can live without, but required VM features like tail-recursion optimisation can be difficult to deal with. You don’t care if the initial program doesn’t use tail recursion, and even if there is no tail recursion in the target language you can emulate it using greenlets or an event loop.
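As a rough illustration of emulating tail calls in a target language that lacks them, here is a trampoline. Note that this is a simpler, related technique and not the greenlet/event-loop approach mentioned above; `trampoline` and `countdown` are names invented for this sketch.

```python
def trampoline(fn, *args):
    """Call fn, then keep calling the thunks it returns until a plain value appears."""
    result = fn(*args)
    while callable(result):
        result = result()
    return result


def countdown(n, acc=0):
    # A translator would emit a thunk where the source had the tail call
    # `return countdown(n - 1, acc + n)`, so the Python stack never grows.
    if n == 0:
        return acc
    return lambda: countdown(n - 1, acc + n)


print(trampoline(countdown, 100000))  # 5000050000, without hitting the recursion limit
```

One limitation of this sketch: a function that legitimately returns a callable as its final value would be mistaken for a pending tail call, so a real translator would need a dedicated wrapper type for thunks.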

For target and source languages, look for:

  • Big and specific ideas
  • Tiny and common shared ideas

From this will emerge:

  • Things that are easy to translate
  • Things that are difficult to translate

You will also probably be able to know what will be translated to fast and slow code.

There is also the question of the stdlib or any library, but there is no clear answer; it depends on your goals.

Idiomatic code or readable generated code also have solutions…

Targeting a platform like PHP is much easier than targeting browsers, since you can provide C implementations of slow and/or critical paths.

Given that your first project is translating Python to PHP, at least for the PHP3 subset I know of, customising veloce.py is your best bet. If you can implement veloce.py for PHP then you will probably be able to run the compliant mode… Also, if you can translate PHP to the subset of PHP you can generate with php_veloce.py, it means that you can translate PHP to the subset of Python that veloce.py can consume, which would mean that you can translate PHP to JavaScript. Just saying…

You can also have a look at those libraries:

Also you might be interested by this blog post (and comments): https://www.rfk.id.au/blog/entry/pypy-js-poc-jit/


Answer 5

You could take a look at the Vala compiler, which translates Vala (a C#-like language) into C.


Parse a .py file, read the AST, modify it, then write back the modified source code

Question: Parse a .py file, read the AST, modify it, then write back the modified source code

I want to programmatically edit python source code. Basically I want to read a .py file, generate the AST, and then write back the modified python source code (i.e. another .py file).

There are ways to parse/compile python source code using standard python modules, such as ast or compiler. However, I don’t think any of them support ways to modify the source code (e.g. delete this function declaration) and then write back the modified python source code.

UPDATE: The reason I want to do this is I’d like to write a Mutation testing library for python, mostly by deleting statements / expressions, rerunning tests and seeing what breaks.


Answer 0

Pythoscope does this to the test cases it automatically generates as does the 2to3 tool for python 2.6 (it converts python 2.x source into python 3.x source).

Both these tools use the lib2to3 library, which is an implementation of the python parser/compiler machinery that can preserve comments in source when it’s round tripped from source -> AST -> source.

The rope project may meet your needs if you want to do more refactoring like transforms.

The ast module is your other option, and there’s an older example of how to “unparse” syntax trees back into code (using the parser module). But the ast module is more useful when doing an AST transform on code that is then transformed into a code object.

The redbaron project also may be a good fit (ht Xavier Combelle)


Answer 1

The builtin ast module doesn’t seem to have a method to convert back to source. However, the codegen module here provides a pretty printer for the ast that would enable you to do so, e.g.:

import ast
import codegen

expr="""
def foo():
   print("hello world")
"""
p=ast.parse(expr)

p.body[0].body = [ ast.parse("return 42").body[0] ] # Replace function body with "return 42"

print(codegen.to_source(p))

This will print:

def foo():
    return 42

Note that you may lose the exact formatting and comments, as these are not preserved.

However, you may not need to. If all you require is to execute the replaced AST, you can do so simply by calling compile() on the ast, and execing the resulting code object.
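A minimal sketch of that last route, using only the standard library (Python 3): the modified tree is compiled and executed without ever being turned back into text. The `namespace` dictionary is just a convenience for this example.

```python
import ast

source = '''
def foo():
    print("hello world")
'''
tree = ast.parse(source)

# Replace the function body with a single `return 42` statement.
tree.body[0].body = [ast.Return(value=ast.Constant(value=42))]
# Newly created nodes have no line/column info; compile() requires it.
ast.fix_missing_locations(tree)

namespace = {}
exec(compile(tree, filename="<ast>", mode="exec"), namespace)
print(namespace["foo"]())  # 42
```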


Answer 2

In a different answer I suggested using the astor package, but I have since found a more up-to-date AST un-parsing package called astunparse:

>>> import ast
>>> import astunparse
>>> print(astunparse.unparse(ast.parse('def foo(x): return 2 * x')))


def foo(x):
    return (2 * x)

I have tested this on Python 3.5.
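For readers on Python 3.9 or later, the standard library's own ast.unparse covers the same round trip without a third-party package. This is an addition to the answer, not something it tested; like the packages above, it does not preserve comments or the original formatting.

```python
import ast

tree = ast.parse("def foo(x): return 2 * x")
regenerated = ast.unparse(tree)  # requires Python 3.9+
print(regenerated)
```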


Answer 3

You might not need to re-generate source code. That’s a bit dangerous for me to say, of course, since you have not actually explained why you think you need to generate a .py file full of code; but:

  • If you want to generate a .py file that people will actually use, maybe so that they can fill out a form and get a useful .py file to insert into their project, then you don’t want to change it into an AST and back, because you’ll lose all formatting (think of the blank lines that make Python so readable by grouping related sets of lines together) and comments, although ast nodes do have lineno and col_offset attributes. Instead, you’ll probably want to use a templating engine (the Django template language, for example, is designed to make templating even text files easy) to customize the .py file, or else use Rick Copeland’s MetaPython extension.

  • If you are trying to make a change during compilation of a module, note that you don’t have to go all the way back to text; you can just compile the AST directly instead of turning it back into a .py file.

  • But in almost any and every case, you are probably trying to do something dynamic that a language like Python actually makes very easy, without writing new .py files! If you expand your question to let us know what you actually want to accomplish, new .py files will probably not be involved in the answer at all; I have seen hundreds of Python projects doing hundreds of real-world things, and not a single one of them ever needed to write a .py file. So, I must admit, I’m a bit of a skeptic that you’ve found the first good use-case. :-)

Update: now that you’ve explained what you’re trying to do, I’d be tempted to just operate on the AST anyway. You will want to mutate by removing, not lines of a file (which could result in half-statements that simply die with a SyntaxError), but whole statements — and what better place to do that than in the AST?


Answer 4

Parsing and modifying the code structure is certainly possible with the help of ast module and I will show it in an example in a moment. However, writing back the modified source code is not possible with ast module alone. There are other modules available for this job such as one here.

NOTE: Example below can be treated as an introductory tutorial on the usage of ast module but a more comprehensive guide on using ast module is available here at Green Tree snakes tutorial and official documentation on ast module.

Introduction to ast:

>>> import ast
>>> tree = ast.parse("print 'Hello Python!!'")
>>> exec(compile(tree, filename="<ast>", mode="exec"))
Hello Python!!

You can parse the python code (represented in string) by simply calling the API ast.parse(). This returns the handle to Abstract Syntax Tree (AST) structure. Interestingly you can compile back this structure and execute it as shown above.

Another very useful API is ast.dump() which dumps the whole AST in a string form. It can be used to inspect the tree structure and is very helpful in debugging. For example,

On Python 2.7:

>>> import ast
>>> tree = ast.parse("print 'Hello Python!!'")
>>> ast.dump(tree)
"Module(body=[Print(dest=None, values=[Str(s='Hello Python!!')], nl=True)])"

On Python 3.5:

>>> import ast
>>> tree = ast.parse("print ('Hello Python!!')")
>>> ast.dump(tree)
"Module(body=[Expr(value=Call(func=Name(id='print', ctx=Load()), args=[Str(s='Hello Python!!')], keywords=[]))])"

Notice the difference in syntax for print statement in Python 2.7 vs. Python 3.5 and the difference in type of AST node in respective trees.
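Because the exact text printed by ast.dump varies between Python releases, it can be more robust to check node types directly. A small sketch for Python 3.8 or later, where string literals are parsed as Constant nodes rather than Str:

```python
import ast

tree = ast.parse("print('Hello Python!!')")
stmt = tree.body[0]

# In Python 3, print is an ordinary call wrapped in an expression statement.
print(type(stmt).__name__)                 # Expr
print(type(stmt.value).__name__)           # Call
print(stmt.value.func.id)                  # print
print(type(stmt.value.args[0]).__name__)   # Constant (on Python 3.8+)
```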


How to modify code using ast:

Now, let’s have a look at an example of modifying python code with the ast module. The main tool for modifying the AST structure is the ast.NodeTransformer class. Whenever one needs to modify the AST, one subclasses it and writes the node transformation(s) accordingly.

For our example, let’s try to write a simple utility which transforms Python 2 print statements into Python 3 function calls.

Print statement to function call converter utility: print2to3.py:

#!/usr/bin/env python
'''
This utility converts the python (2.7) statements to Python 3 alike function calls before running the code.

USAGE:
     python print2to3.py <filename>
'''
import ast
import sys

class P2to3(ast.NodeTransformer):
    def visit_Print(self, node):
        new_node = ast.Expr(value=ast.Call(func=ast.Name(id='print', ctx=ast.Load()),
            args=node.values,
            keywords=[], starargs=None, kwargs=None))
        ast.copy_location(new_node, node)
        return new_node

def main(filename=None):
    if not filename:
        return

    with open(filename, 'r') as fp:
        data = fp.readlines()
    data = ''.join(data)
    tree = ast.parse(data)

    print "Converting python 2 print statements to Python 3 function calls"
    print "-" * 35
    P2to3().visit(tree)
    ast.fix_missing_locations(tree)
    # print ast.dump(tree)

    exec(compile(tree, filename="p23", mode="exec"))

if __name__ == '__main__':
    if len(sys.argv) <=1:
        print ("\nUSAGE:\n\t print2to3.py <filename>")
        sys.exit(1)
    else:
        main(sys.argv[1])

This utility can be tried on small example file, such as one below, and it should work fine.

Test Input file : py2.py

class A(object):
    def __init__(self):
        pass

def good():
    print "I am good"

main = good

if __name__ == '__main__':
    print "I am in main"
    main()

Please note that above transformation is only for ast tutorial purpose and in real case scenario one will have to look at all different scenarios such as print " x is %s" % ("Hello Python").


Answer 5

I recently created a quite stable (the core is really well tested) and extensible piece of code which generates code from an ast tree: https://github.com/paluh/code-formatter .

I’m using my project as a base for a small vim plugin (which I’m using every day), so my goal is to generate really nice and readable python code.

P.S. I’ve tried to extend codegen, but its architecture is based on the ast.NodeVisitor interface, so formatters (visitor_ methods) are just functions. I’ve found this structure quite limiting and hard to optimize (in the case of long and nested expressions it’s easier to keep an object tree and cache some partial results; otherwise you can hit exponential complexity if you want to search for the best layout). BUT codegen, like every piece of mitsuhiko’s work (that I’ve read), is very well written and concise.


Answer 6

One of the other answers recommends codegen, which seems to have been superseded by astor. The version of astor on PyPI (version 0.5 as of this writing) seems to be a little outdated as well, so you can install the development version of astor as follows.

pip install git+https://github.com/berkerpeksag/astor.git#egg=astor

Then you can use astor.to_source to convert a Python AST to human-readable Python source code:

>>> import ast
>>> import astor
>>> print(astor.to_source(ast.parse('def foo(x): return 2 * x')))
def foo(x):
    return 2 * x

I have tested this on Python 3.5.


Answer 7

If you are looking at this in 2019, then you can use the libcst package. It has a syntax similar to ast. It works like a charm and preserves the code structure. It’s mainly helpful for projects where you have to preserve comments, whitespace, newlines etc.

If you don’t need to care about the preserving comments, whitespace and others, then the combination of ast and astor works well.


Answer 8

We had a similar need, which wasn’t solved by other answers here. So we created a library for this, ASTTokens, which takes an AST tree produced with the ast or astroid modules, and marks it with the ranges of text in the original source code.

It doesn’t do modifications of code directly, but that’s not hard to add on top, since it does tell you the range of text you need to modify.

For example, this wraps a function call in WRAP(...), preserving comments and everything else:

example = """
def foo(): # Test
  '''My func'''
  log("hello world")  # Print
"""

import ast, asttokens
atok = asttokens.ASTTokens(example, parse=True)

call = next(n for n in ast.walk(atok.tree) if isinstance(n, ast.Call))
start, end = atok.get_text_range(call)
print(atok.text[:start] + ('WRAP(%s)' % atok.text[start:end])  + atok.text[end:])

Produces:

def foo(): # Test
  '''My func'''
  WRAP(log("hello world"))  # Print

Hope this helps!
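For comparison only, and not as a replacement for ASTTokens: on Python 3.8 or later, ast nodes carry end_lineno and end_col_offset, so for simple cases a similar text-range edit can be sketched with the standard library alone. The helper `wrap_first_call` is a hypothetical name, and the sketch assumes ASCII source, because col_offset counts UTF-8 bytes rather than characters.

```python
import ast


def wrap_first_call(source, wrapper="WRAP"):
    """Wrap the first call expression found in source, leaving all other text untouched."""
    tree = ast.parse(source)
    call = next(n for n in ast.walk(tree) if isinstance(n, ast.Call))
    lines = source.splitlines(keepends=True)

    def offset(lineno, col):
        # Turn a 1-based line number and a column into an index into source.
        return sum(len(line) for line in lines[:lineno - 1]) + col

    start = offset(call.lineno, call.col_offset)
    end = offset(call.end_lineno, call.end_col_offset)
    return source[:start] + "%s(%s)" % (wrapper, source[start:end]) + source[end:]


example = """
def foo(): # Test
  '''My func'''
  log("hello world")  # Print
"""

print(wrap_first_call(example))
```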


Answer 9

A Program Transformation System is a tool that parses source text, builds ASTs, and allows you to modify them using source-to-source transformations (“if you see this pattern, replace it by that pattern”). Such tools are ideal for mutating existing source code, since mutations are just “if you see this pattern, replace it by a pattern variant”.

Of course, you need a program transformation engine that can parse the language of interest to you, and still do the pattern-directed transformations. Our DMS Software Reengineering Toolkit is a system that can do that, and handles Python, and a variety of other languages.

See this SO answer for an example of a DMS-parsed AST for Python capturing comments accurately. DMS can make changes to the AST, and regenerate valid text, including the comments. You can ask it to prettyprint the AST, using its own formatting conventions (you can change these), or do “fidelity printing”, which uses the original line and column information to maximally preserve the original layout (some change in layout where new code is inserted is unavoidable).

To implement a “mutation” rule for Python with DMS, you could write the following:

rule mutate_addition(s:sum, p:product):sum->sum =
  " \s + \p " -> " \s - \p"
 if mutate_this_place(s);

This rule replaces “+” with “-” in a syntactically correct way; it operates on the AST and thus won’t touch strings or comments that happen to look right. The extra condition on “mutate_this_place” is to let you control how often this occurs; you don’t want to mutate every place in the program.

You’d obviously want a bunch more rules like this that detect various code structures, and replace them by the mutated versions. DMS is happy to apply a set of rules. The mutated AST is then prettyprinted.
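For readers without DMS, the same idea can be approximated with Python's own ast.NodeTransformer. This is a sketch of the technique and not DMS's rule language; the class name `MutateAddition` is invented, and `mutate_this_place` is passed in as an ordinary predicate.

```python
import ast


class MutateAddition(ast.NodeTransformer):
    """Turn `a + b` into `a - b` wherever the predicate allows it."""

    def __init__(self, mutate_this_place):
        self.mutate_this_place = mutate_this_place

    def visit_BinOp(self, node):
        self.generic_visit(node)  # handle nested operands first
        if isinstance(node.op, ast.Add) and self.mutate_this_place(node):
            node.op = ast.Sub()
        return node


source = 'total = price + tax  # "a + b" in a comment\nlabel = "x + y"\n'
tree = MutateAddition(lambda node: True).visit(ast.parse(source))

namespace = {"price": 10, "tax": 3}
exec(compile(tree, "<mutant>", "exec"), namespace)
print(namespace["total"], namespace["label"])  # 7 x + y
```

Because the rewrite happens on the tree, the "+" inside the string literal and the one in the comment are left alone, which mirrors the point made about the DMS rule.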


Answer 10

I used to use baron for this, but have now switched to parso because it’s up to date with modern python. It works great.

I also needed this for a mutation tester. It’s really quite simple to make one with parso, check out my code at https://github.com/boxed/mutmut


Is it more efficient to use if-return-return or if-else-return?

Question: Is it more efficient to use if-return-return or if-else-return?

Suppose I have an if statement with a return. From the efficiency perspective, should I use

if(A > B):
    return A+1
return A-1

or

if(A > B):
    return A+1
else:
    return A-1

Should I prefer one or another when using a compiled language (C) or a scripted one (Python)?


Answer 0

Since the return statement terminates the execution of the current function, the two forms are equivalent (although the second one is arguably more readable than the first).

The efficiency of both forms is comparable, the underlying machine code has to perform a jump if the if condition is false anyway.

Note that Python supports a syntax that allows you to use only one return statement in your case:

return A+1 if A > B else A-1

Answer 1

From Chromium’s style guide:

Don’t use else after return:

# Bad
if (foo)
  return 1
else
  return 2

# Good
if (foo)
  return 1
return 2

return 1 if foo else 2

Answer 2

Regarding coding style:

Most coding standards, whatever the language, ban multiple return statements from a single function as bad practice.

(Although personally I would say there are several cases where multiple return statements do make sense: text/data protocol parsers, functions with extensive error handling etc)

The consensus from all those industry coding standards is that the expression should be written as:

int result;

if(A > B)
{
  result = A+1;
}
else
{
  result = A-1;
}
return result;

Regarding efficiency:

The above example and the two examples in the question are all completely equivalent in terms of efficiency. The machine code in all these cases has to compare A > B, then branch to either the A+1 or the A-1 calculation, then store the result of that in a CPU register or on the stack.

EDIT :

Sources:

  • MISRA-C:2004 rule 14.7, which in turn cites…:
  • IEC 61508-3. Part 3, table B.9.
  • IEC 61508-7. C.2.9.

Answer 3

With any sensible compiler, you should observe no difference; they should be compiled to identical machine code as they’re equivalent.
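For CPython specifically, one way to check this rather than guess is to list the bytecode of both forms with the standard dis module. The function names below are made up for the sketch, and no expected listing is shown because opcode names and layout vary between interpreter versions.

```python
import dis


def early_return(a, b):
    if a > b:
        return a + 1
    return a - 1


def if_else(a, b):
    if a > b:
        return a + 1
    else:
        return a - 1


# Print the opcode sequence of each form so they can be compared side by side.
for func in (early_return, if_else):
    print(func.__name__, [ins.opname for ins in dis.get_instructions(func)])
```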


Answer 4

This is a question of style (or preference) since the interpreter does not care. Personally I would try not to make the final statement of a function which returns a value at an indent level other than the function base. The else in example 1 obscures, if only slightly, where the end of the function is.

By preference I use:

return A+1 if (A > B) else A-1

As it obeys both the good convention of having a single return statement as the last statement in the function (as already mentioned) and the good functional programming paradigm of avoiding imperative style intermediate results.

For more complex functions I prefer to break the function into multiple sub-functions to avoid premature returns if possible. Otherwise I revert to using an imperative style variable called rval. I try not to use multiple return statements unless the function is trivial or the return statement before the end is as a result of an error. Returning prematurely highlights the fact that you cannot go on. For complex functions that are designed to branch off into multiple subfunctions I try to code them as case statements (driven by a dict for instance).
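A small sketch of what "case statements driven by a dict" might look like. The shape example, the handler names and `AREA_HANDLERS` are invented for illustration; the only early exit is the error case, in line with the preference described above.

```python
import math


def _circle_area(shape):
    return math.pi * shape["r"] ** 2


def _rect_area(shape):
    return shape["w"] * shape["h"]


# The dict plays the role of a case statement: one entry per branch.
AREA_HANDLERS = {
    "circle": _circle_area,
    "rect": _rect_area,
}


def area(shape):
    handler = AREA_HANDLERS.get(shape.get("kind"))
    if handler is None:
        # Leaving early only because we cannot go on.
        raise ValueError("unknown shape kind: %r" % shape.get("kind"))
    return handler(shape)


print(area({"kind": "rect", "w": 2, "h": 3}))  # 6
```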

Some posters have mentioned speed of operation. Run-time speed is secondary for me, since if you need speed of execution Python is not the best language to use. I use Python because it’s the efficiency of coding (i.e. writing error-free code) that matters to me.


Answer 5

I personally avoid else blocks when possible. See the Anti-if Campaign

Also, they don’t charge ‘extra’ for the line, you know :p

“Simple is better than complex” & “Readability is king”

delta = 1 if (A > B) else -1
return A + delta

Answer 6

Version A is simpler and that’s why I would use it.

And if you turn on all compiler warnings in Java, you will get a warning on the second version because the else is unnecessary and increases code complexity.


Answer 7

I know the question is tagged python, but it mentions dynamic languages so thought I should mention that in ruby the if statement actually has a return type so you can do something like

def foo
  rv = if (A > B)
         A+1
       else
         A-1
       end
  return rv 
end

Or because it also has implicit return simply

def foo 
  if (A>B)
    A+1
  else 
    A-1
  end
end

which gets around the style issue of not having multiple returns quite nicely.


Compiled vs. Interpreted Languages

Question: Compiled vs. Interpreted Languages

I’m trying to get a better understanding of the difference. I’ve found a lot of explanations online, but they tend towards the abstract differences rather than the practical implications.

Most of my programming experience has been with CPython (dynamic, interpreted), and Java (static, compiled). However, I understand that there are other kinds of interpreted and compiled languages. Aside from the fact that executable files can be distributed from programs written in compiled languages, are there any advantages/disadvantages to each type? Oftentimes, I hear people arguing that interpreted languages can be used interactively, but I believe that compiled languages can have interactive implementations as well, correct?


Answer 0

A compiled language is one where the program, once compiled, is expressed in the instructions of the target machine. For example, an addition “+” operation in your source code could be translated directly to the “ADD” instruction in machine code.

An interpreted language is one where the instructions are not directly executed by the target machine, but instead read and executed by some other program (which normally is written in the language of the native machine). For example, the same “+” operation would be recognised by the interpreter at run time, which would then call its own “add(a,b)” function with the appropriate arguments, which would then execute the machine code “ADD” instruction.
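
The interpreter side of this description can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the names add, OPERATIONS and interpret are hypothetical, and the "language" is a single space-separated expression such as "2 + 3". The point is that the "+" is recognised at run time and routed to the interpreter's own function rather than being translated ahead of time.

```python
import operator

# Hypothetical helper: the interpreter's own "add(a, b)" routine.
def add(a, b):
    return a + b

# Dispatch table mapping source-level operators to interpreter functions.
OPERATIONS = {"+": add, "-": operator.sub, "*": operator.mul}

def interpret(expression):
    """Evaluate a space-separated 'left op right' expression, e.g. '2 + 3'."""
    left, op, right = expression.split()
    handler = OPERATIONS[op]  # the operator is recognised at run time
    return handler(int(left), int(right))

result = interpret("2 + 3")
print(result)  # 5
```

Every call to interpret repeats the lookup and dispatch, which is the overhead a compiler would pay once at translation time.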

You can do anything that you can do in an interpreted language in a compiled language and vice-versa – they are both Turing complete. Both however have advantages and disadvantages for implementation and use.

I’m going to completely generalise (purists forgive me!) but, roughly, here are the advantages of compiled languages:

  • Faster performance by directly using the native code of the target machine
  • Opportunity to apply quite powerful optimisations during the compile stage

And here are the advantages of interpreted languages:

  • Easier to implement (writing good compilers is very hard!!)
  • No need to run a compilation stage: can execute code directly “on the fly”
  • Can be more convenient for dynamic languages

Note that modern techniques such as bytecode compilation add some extra complexity – what happens here is that the compiler targets a “virtual machine” which is not the same as the underlying hardware. These virtual machine instructions can then be compiled again at a later stage to get native code (e.g. as done by the Java JVM JIT compiler).
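
The bytecode step can be observed directly in CPython, used here only as a convenient stand-in for the JVM mentioned above. The sketch below relies on the built-in compile() and the standard dis module; the exact instruction names it prints differ between CPython versions, so no particular listing is shown.

```python
import dis

source = "a + b"
# compile() turns source text into a code object holding CPython bytecode.
code_object = compile(source, "<example>", "eval")

# Print the virtual machine instructions; opcode names vary by CPython version.
dis.dis(code_object)

# The same code object can then be executed by the bytecode interpreter.
value = eval(code_object, {"a": 2, "b": 3})
print(value)  # 5
```

Here the bytecode is interpreted rather than compiled again to native code; that later step is what a JIT compiler adds.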


Answer 1

A language itself is neither compiled nor interpreted, only a specific implementation of a language is. Java is a perfect example. There is a bytecode-based platform (the JVM), a native compiler (gcj) and an interpreter for a superset of Java (bsh). So what is Java now? Bytecode-compiled, native-compiled or interpreted?

Other languages that are compiled as well as interpreted include Scala, Haskell and OCaml. Each of these languages has an interactive interpreter, as well as a compiler to byte-code or native machine code.

So generally categorizing languages by “compiled” and “interpreted” doesn’t make much sense.


Answer 2

Start thinking in terms of a blast from the past.

Once upon a time, long long ago, there lived in the land of computing interpreters and compilers. All kinds of fuss ensued over the merits of one over the other. The general opinion at that time was something along the lines of:

  • Interpreter: Fast to develop (edit and run). Slow to execute because each statement had to be interpreted into machine code every time it was executed (think of what this meant for a loop executed thousands of times).
  • Compiler: Slow to develop (edit, compile, link and run. The compile/link steps could take serious time). Fast to execute. The whole program was already in native machine code.

A one or two order of magnitude difference in the runtime performance existed between an interpreted program and a compiled program. Other distinguishing points, run-time mutability of the code for example, were also of some interest but the major distinction revolved around the run-time performance issues.

Today the landscape has evolved to such an extent that the compiled/interpreted distinction is pretty much irrelevant. Many compiled languages call upon run-time services that are not completely machine code based. Also, most interpreted languages are “compiled” into byte-code before execution. Byte-code interpreters can be very efficient and rival some compiler generated code from an execution speed point of view.

The classic difference is that compilers generated native machine code, whereas interpreters read source code and generated machine code on the fly using some sort of run-time system. Today there are very few classic interpreters left – almost all of them compile into byte-code (or some other semi-compiled state) which then runs on a virtual “machine”.


Answer 3

The extreme and simple cases:

  • A compiler will produce a binary executable in the target machine’s native executable format. This binary file contains all required resources except for system libraries; it’s ready to run with no further preparation and processing and it runs like lightning because the code is the native code for the CPU on the target machine.

  • An interpreter will present the user with a prompt in a loop where he can enter statements or code, and upon hitting RUN or the equivalent the interpreter will examine, scan, parse and interpretatively execute each line until the program runs to a stopping point or an error. Because each line is treated on its own and the interpreter doesn’t “learn” anything from having seen the line before, the effort of converting human-readable language to machine instructions is incurred every time for every line, so it’s dog slow. On the bright side, the user can inspect and otherwise interact with his program in all kinds of ways: Changing variables, changing code, running in trace or debug modes… whatever.

With those out of the way, let me explain that life ain’t so simple any more. For instance,

  • Many interpreters will pre-compile the code they’re given so the translation step doesn’t have to be repeated again and again.
  • Some compilers compile not to CPU-specific machine instructions but to bytecode, a kind of artificial machine code for a fictitious machine. This makes the compiled program a bit more portable, but requires a bytecode interpreter on every target system.
  • The bytecode interpreters (I’m looking at Java here) recently tend to re-compile the bytecode they get for the CPU of the target machine just before execution (called JIT). To save time, this is often only done for code that runs often (hotspots).
  • Some systems that look and act like interpreters (Clojure, for instance) compile any code they get, immediately, but allow interactive access to the program’s environment. That’s basically the convenience of interpreters with the speed of binary compilation.
  • Some compilers don’t really compile, they just pre-digest and compress code. I heard a while back that’s how Perl works. So sometimes the compiler is just doing a bit of the work and most of it is still interpretation.
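
The bytecode idea from the list above can be sketched with a toy stack machine. Everything here is an assumption made for illustration: the instruction set (PUSH, ADD, SUB), the function names compile_expression and run, and the restriction to flat left-to-right arithmetic with no operator precedence.

```python
def compile_expression(tokens):
    """Compile a flat 'n op n op n ...' token list into stack-machine bytecode.

    Hypothetical instruction set: ("PUSH", n), ("ADD",), ("SUB",).
    Operators are applied left to right; there is no precedence.
    """
    opcodes = {"+": "ADD", "-": "SUB"}
    program = [("PUSH", int(tokens[0]))]
    for op, operand in zip(tokens[1::2], tokens[2::2]):
        program.append(("PUSH", int(operand)))
        program.append((opcodes[op],))
    return program

def run(program):
    """Bytecode interpreter: a loop that dispatches on each instruction."""
    stack = []
    for instruction in program:
        if instruction[0] == "PUSH":
            stack.append(instruction[1])
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if instruction[0] == "ADD" else left - right)
    return stack.pop()

bytecode = compile_expression("1 + 2 - 4".split())
print(bytecode)
print(run(bytecode))  # -1
```

The compile step runs once; the resulting list of instructions is portable to any machine that has the run loop, which is the trade-off described for bytecode.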

In the end, these days, interpreting vs. compiling is a trade-off, with time spent (once) compiling often being rewarded by better runtime performance, but an interpretative environment giving more opportunities for interaction. Compiling vs. interpreting is mostly a matter of how the work of “understanding” the program is divided up between different processes, and the line is a bit blurry these days as languages and products try to offer the best of both worlds.


Answer 4

From http://www.quora.com/What-is-the-difference-between-compiled-and-interpreted-programming-languages

There is no difference, because “compiled programming language” and “interpreted programming language” aren’t meaningful concepts. Any programming language, and I really mean any, can be interpreted or compiled. Thus, interpretation and compilation are implementation techniques, not attributes of languages.

Interpretation is a technique whereby another program, the interpreter, performs operations on behalf of the program being interpreted in order to run it. If you can imagine reading a program and doing what it says to do step-by-step, say on a piece of scratch paper, that’s just what an interpreter does as well. A common reason to interpret a program is that interpreters are relatively easy to write. Another reason is that an interpreter can monitor what a program tries to do as it runs, to enforce a policy, say, for security.

Compilation is a technique whereby a program written in one language (the “source language”) is translated into a program in another language (the “object language”), which hopefully means the same thing as the original program. While doing the translation, it is common for the compiler to also try to transform the program in ways that will make the object program faster (without changing its meaning!). A common reason to compile a program is that there’s some good way to run programs in the object language quickly and without the overhead of interpreting the source language along the way.
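
As a rough illustration of translating a program while transforming it without changing its meaning, the sketch below uses Python's standard ast module to fold constant arithmetic. It is a simplification: source and object language are both Python here, the ConstantFolder and fold names are hypothetical, only + and * on numeric constants are handled, and ast.unparse requires Python 3.9 or later.

```python
import ast

class ConstantFolder(ast.NodeTransformer):
    """Replace 'constant + constant' and 'constant * constant' with the result."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        if isinstance(node.left, ast.Constant) and isinstance(node.right, ast.Constant):
            left, right = node.left.value, node.right.value
            if isinstance(left, (int, float)) and isinstance(right, (int, float)):
                if isinstance(node.op, ast.Add):
                    return ast.copy_location(ast.Constant(value=left + right), node)
                if isinstance(node.op, ast.Mult):
                    return ast.copy_location(ast.Constant(value=left * right), node)
        return node  # anything else is left untouched

def fold(source):
    """Parse the source, fold constants, and emit the transformed program text."""
    tree = ast.parse(source)
    tree = ast.fix_missing_locations(ConstantFolder().visit(tree))
    return ast.unparse(tree)  # requires Python 3.9+

print(fold("seconds = 60 * 60 * 24 + offset"))  # seconds = 86400 + offset
```

The multiplication is done once during translation instead of on every run, while the part that depends on offset is left for run time.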

You may have guessed, based on the above definitions, that these two implementation techniques are not mutually exclusive, and may even be complementary. Traditionally, the object language of a compiler was machine code or something similar, which refers to any number of programming languages understood by particular computer CPUs. The machine code would then run “on the metal” (though one might see, if one looks closely enough, that the “metal” works a lot like an interpreter). Today, however, it’s very common to use a compiler to generate object code that is meant to be interpreted; for example, this is how Java used to (and sometimes still does) work. There are compilers that translate other languages to JavaScript, which is then often run in a web browser, which might interpret the JavaScript, or compile it to virtual machine or native code. We also have interpreters for machine code, which can be used to emulate one kind of hardware on another. Or, one might use a compiler to generate object code that is then the source code for another compiler, which might even compile code in memory just in time for it to run, which in turn . . . you get the idea. There are many ways to combine these concepts.


Answer 5

The biggest advantage of interpreted source code over compiled source code is PORTABILITY.

If your source code is compiled, you need to compile a different executable for each type of processor and/or platform that you want your program to run on (e.g. one for Windows x86, one for Windows x64, one for Linux x64, and so on). Furthermore, unless your code is completely standards compliant and does not use any platform-specific functions/libraries, you will actually need to write and maintain multiple code bases!

If your source code is interpreted, you only need to write it once and it can be interpreted and executed by an appropriate interpreter on any platform! It’s portable! Note that an interpreter itself is an executable program that is written and compiled for a specific platform.

An advantage of compiled code is that it hides the source code from the end user (which might be intellectual property) because instead of deploying the original human-readable source code, you deploy an obscure binary executable file.


Answer 6

A compiler and an interpreter do the same job: translating a programming language to another programming language, usually closer to the hardware, often directly executable machine code.

Traditionally, “compiled” means that this translation happens all in one go, is done by a developer, and the resulting executable is distributed to users. Pure example: C++. Compilation usually takes pretty long and tries to do lots of expensive optimization so that the resulting executable runs faster. End users don’t have the tools and knowledge to compile stuff themselves, and the executable often has to run on a variety of hardware, so you can’t do many hardware-specific optimizations. During development, the separate compilation step means a longer feedback cycle.

Traditionally, “interpreted” means that the translation happens “on the fly”, when the user wants to run the program. Pure example: vanilla PHP. A naive interpreter has to parse and translate every piece of code every time it runs, which makes it very slow. It can’t do complex, costly optimizations because they’d take longer than the time saved in execution. But it can fully use the capabilities of the hardware it runs on. The lack of a separate compilation step reduces feedback time during development.

But nowadays “compiled vs. interpreted” is not a black-or-white issue; there are shades in between. Naive, simple interpreters are pretty much extinct. Many languages use a two-step process where the high-level code is translated to a platform-independent bytecode (which is much faster to interpret). Then you have “just in time compilers” which compile code at most once per program run, sometimes cache results, and even intelligently decide to interpret code that’s run rarely, and do powerful optimizations for code that runs a lot. During development, debuggers are capable of switching code inside a running program even for traditionally compiled languages.
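
The "interpret what runs rarely, compile what runs a lot" policy can be sketched as a counter plus a cache. This is only an analogy under loud assumptions: TinyRuntime and HOT_THRESHOLD are hypothetical names, and "compiling" here means producing a CPython code object once with the built-in compile(), not native machine code as a real JIT would.

```python
HOT_THRESHOLD = 3  # hypothetical cut-off for "runs a lot"

class TinyRuntime:
    """Evaluate cold expressions from source; compile hot ones once and cache them."""

    def __init__(self):
        self.run_counts = {}
        self.compiled_cache = {}

    def run(self, source, variables):
        if source in self.compiled_cache:
            # Fast path: reuse the code object produced earlier.
            return eval(self.compiled_cache[source], {}, variables)
        self.run_counts[source] = self.run_counts.get(source, 0) + 1
        if self.run_counts[source] >= HOT_THRESHOLD:
            # The expression is now "hot": compile once, reuse from now on.
            self.compiled_cache[source] = compile(source, "<hot>", "eval")
        # Slow path: eval() on a string parses and compiles the text on every call.
        return eval(source, {}, variables)

runtime = TinyRuntime()
results = [runtime.run("x * x + 1", {"x": x}) for x in range(5)]
print(results)                      # [1, 2, 5, 10, 17]
print(len(runtime.compiled_cache))  # 1
```

The first calls pay the translation cost every time; from the third call on, the cached result is reused, which mirrors the "at most once per program run" behaviour described above.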


Answer 7

First, a clarification: Java is not fully statically compiled and linked in the way C++ is. It is compiled into bytecode, which is then interpreted by a JVM. The JVM can go and do just-in-time compilation to the native machine language, but doesn’t have to do it.

More to the point: I think interactivity is the main practical difference. Since everything is interpreted, you can take a small excerpt of code, parse and run it against the current state of the environment. Thus, if you had already executed code that initialized a variable, you would have access to that variable, etc. It really lends itself to things like the functional style.
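
Running a small excerpt against the current state of the environment can be sketched with Python's built-in exec() and a shared dictionary. The names environment and run_snippet are hypothetical; a real interactive prompt does more (reading input, printing results, error handling), but the state-sharing idea is the same.

```python
# One shared namespace plays the role of "the current state of the environment".
environment = {}

def run_snippet(source):
    """Execute a small excerpt of code against the shared environment."""
    exec(source, environment)

run_snippet("counter = 10")           # earlier code initialises a variable
run_snippet("counter = counter + 5")  # a later excerpt can still see it
run_snippet("def double(n): return n * 2")

print(environment["double"](environment["counter"]))  # 30
```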

Interpretation, however, costs a lot, especially when you have a large system with a lot of references and context. By definition, it is wasteful because identical code may have to be interpreted and optimized twice (although most runtimes have some caching and optimizations for that). Still, you pay a runtime cost and often need a runtime environment. You are also less likely to see complex interprocedural optimizations because at present their performance is not sufficiently interactive.

Therefore, for large systems that are not going to change much, and for certain languages, it makes more sense to precompile and prelink everything, do all the optimizations that you can do. This ends up with a very lean runtime that is already optimized for the target machine.

As for generating executables, that has little to do with it, IMHO. You can often create an executable from a compiled language. But you can also create an executable from an interpreted language, except that the interpreter and runtime are already packaged in the executable and hidden from you. This means that you generally still pay the runtime costs (although I am sure that for some language there are ways to translate everything to a tree executable).

I disagree that all languages could be made interactive. Certain languages, like C, are so tied to the machine and the entire link structure that I’m not sure you can build a meaningful fully-fledged interactive version.


Answer 8

It’s rather difficult to give a practical answer because the difference is about the language definition itself. It’s possible to build an interpreter for every compiled language, but it’s not possible to build a compiler for every interpreted language. It’s very much about the formal definition of a language. So it’s that theoretical informatics stuff nobody likes at university.


Answer 9

The Python Book (© 2015 Imagine Publishing Ltd) distinguishes the two with the following hint on page 10:

An interpreted language such as Python is one where the source code is converted to machine code and then executed each time the program runs. This is different from a compiled language such as C, where the source code is only converted to machine code once – the resulting machine code is then executed each time the program runs.


Answer 10

Compiling is the process of creating an executable program from code written in a compiled programming language. Compiling allows the computer to run and understand the program without the need of the programming software used to create it. When a program is compiled, it is often compiled for a specific platform (e.g. the IBM platform) that works with IBM-compatible computers, but not other platforms (e.g. the Apple platform). The first compiler is generally credited to Grace Hopper, who developed the A-0 system for the UNIVAC I in the early 1950s. Today, most high-level languages include their own compiler or have toolkits available that can be used to compile the program. A good example of a compiler used with Java is the one built into the Eclipse IDE, and an example of a compiler used with C and C++ is the gcc command. Depending on how big the program is, it should take a few seconds or minutes to compile, and if no errors are encountered while compiling, an executable file is created.


Answer 11

Short (imprecise) definition:

Compiled language: Entire program is translated to machine code at once, then the machine code is run by the CPU.

Interpreted language: Program is read line-by-line and as soon as a line is read the machine instructions for that line are executed by the CPU.

But really, few languages these days are purely compiled or purely interpreted; it is often a mix. For a more detailed description with pictures, see this thread:

What is the difference between compilation and interpretation?

Or my later blog post:

https://orangejuiceliberationfront.com/the-difference-between-compiler-and-interpreter/