Question: What kinds of patterns could I enforce on the code to make it easier to translate to another programming language? [closed]

I am setting out to do a side project that has the goal of translating code from one programming language to another. The languages I am starting with are PHP and Python (Python to PHP should be easier to start with), but ideally I would be able to add other languages with (relative) ease. The plan is:

  • This is geared towards web development. The original and target code will be sitting on top of frameworks (which I will also have to write). These frameworks will embrace an MVC design pattern and follow strict coding conventions. This should make translation somewhat easier.

  • I am also looking at IoC and dependency injection, as they might make the translation process easier and less error-prone.

  • I’ll make use of Python’s parser module, which lets me fiddle with the Abstract Syntax Tree. Apparently the closest I can get with PHP is token_get_all(), which is a start.

  • From then on I can build the AST, symbol tables and control flow.

Then I believe I can start outputting code. I don’t need a perfect translation. I’ll still have to review the generated code and fix problems. Ideally the translator should flag problematic translations.

Before you ask “What the hell is the point of this?” The answer is… It’ll be an interesting learning experience. If you have any insights on how to make this less daunting, please let me know.


EDIT:

I am more interested in knowing what kinds of patterns (e.g. IoC, SOA?) I could enforce on the code to make it easier to translate than in how to do the translation itself.


Answer 0

I’ve been building tools (DMS Software Reengineering Toolkit) to do general purpose program manipulation (with language translation being a special case) since 1995, supported by a strong team of computer scientists. DMS provides generic parsing, AST building, symbol tables, control and data flow analysis, application of translation rules, regeneration of source text with comments, etc., all parameterized by explicit definitions of computer languages.

The amount of machinery you need to do this well is vast (especially if you want to be able to do this for multiple languages in a general way), and then you need reliable parsers for languages with unreliable definitions (PHP is a perfect example of this).

There’s nothing wrong with you thinking about building a language-to-language translator or attempting it, but I think you’ll find this a much bigger task for real languages than you expect. We have some 100 man-years invested in just DMS, and another 6-12 months in each “reliable” language definition (including the one we painfully built for PHP), much more for nasty languages such as C++. It will be a “hell of a learning experience”; it has been for us. (You might find the technical Papers section at the above website interesting to jump start that learning).

People often attempt to build some kind of generalized machinery by starting with some piece of technology with which they are familiar and that does a part of the job. (Python ASTs are a great example.) The good news is that part of the job is done. The bad news is that the machinery has a zillion assumptions built into it, most of which you won’t discover until you try to wrestle it into doing something else. At that point you find out the machinery is wired to do what it originally does, and will really, really resist your attempt to make it do something else. (I suspect trying to get the Python AST to model PHP is going to be a lot of fun.)

The reason I started to build DMS originally was to build foundations that had very few such assumptions built in. It has some that give us headaches. So far, no black holes. (The hardest part of my job over the last 15 years is to try to prevent such assumptions from creeping in).

Lots of folks also make the mistake of assuming that if they can parse (and perhaps get an AST), they are well on the way to doing something complicated. One of the hard lessons is that you need symbol tables and flow analysis to do good program analysis or transformation. ASTs are necessary but not sufficient. This is the reason that Aho&Ullman’s compiler book doesn’t stop at chapter 2. (The OP has this right in that he is planning to build additional machinery beyond the AST). For more on this topic, see Life After Parsing.

The remark about “I don’t need a perfect translation” is troublesome. What weak translators do is convert the “easy” 80% of the code, leaving the hard 20% to do by hand. If the application you intend to convert is pretty small, and you only intend to convert it once, then that 20% is OK. If you want to convert many applications (or even the same one with minor changes over time), this is not nice. If you attempt to convert 100K SLOC, then 20% is 20,000 original lines of code that are hard to translate, understand and modify in the context of another 80,000 lines of translated program you already don’t understand. That takes a huge amount of effort. At the million-line level, this is simply impossible in practice. (Amazingly, there are people who distrust automated tools and insist on translating million-line systems by hand; that’s even harder, and they normally find out painfully, with long delays, high costs and often outright failure.)

What you have to shoot for to translate large-scale systems is high nineties percentage conversion rates, or it is likely that you can’t complete the manual part of the translation activity.

Another key consideration is size of code to be translated. It takes a lot of energy to build a working, robust translator, even with good tools. While it seems sexy and cool to build a translator instead of simply doing a manual conversion, for small code bases (e.g., up to about 100K SLOC in our experience) the economics simply don’t justify it. Nobody likes this answer, but if you really have to translate just 10K SLOC of code, you are probably better off just biting the bullet and doing it. And yes, that’s painful.

I consider our tools to be extremely good (but then, I’m pretty biased). And it is still very hard to build a good translator; it takes us about 1.5-2 man-years and we know how to use our tools. The difference is that with this much machinery, we succeed considerably more often than we fail.


Answer 1

My answer will address the specific task of parsing Python in order to translate it to another language, and not the higher-level aspects which Ira addressed well in his answer.

In short: do not use the parser module, there’s an easier way.

The ast module, available since Python 2.6, is much more suitable for your needs, since it gives you a ready-made AST to work with. I wrote an article on this last year, but in short, use the parse function of ast to parse Python source code into an AST. The parser module will give you a parse tree, not an AST. Be wary of the difference.
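As a rough illustration of what "a ready-made AST" means, here is a minimal sketch using the standard library ast module on Python 3. The source string and variable names are made up for the example.

```python
import ast

source = "total = price * quantity + 1"

# ast.parse returns a Module node whose body is a list of statement nodes.
tree = ast.parse(source)
assign = tree.body[0]

# Each node is a typed object, so a translator can dispatch on the class.
print(type(assign).__name__)        # class of the statement node
print(type(assign.value).__name__)  # class of the right-hand side expression
print(ast.dump(assign.value))       # full structure of the expression
```

The point is that operator precedence and grouping are already resolved in the tree, which is what a translator wants to work from.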

Now, since Python’s ASTs are quite detailed, given an AST the front-end job isn’t terribly hard. I suppose you can have a simple prototype for some parts of the functionality ready quite quickly. However, getting to a complete solution will take more time, mainly because the semantics of the languages are different. A simple subset of the language (functions, basic types and so on) can be readily translated, but once you get into the more complex layers, you’ll need heavy machinery to emulate one language’s core in another. For example consider Python’s generators and list comprehensions which don’t exist in PHP (to my best knowledge, which is admittedly poor when PHP is involved).
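Since the question asks for a translator that flags problematic translations, one possible starting point is to walk the AST and report the constructs that have no direct counterpart in the target. This is only a sketch: the HARD_NODES tuple and the flag_hard_constructs helper are hypothetical, and which nodes count as "hard" depends on the target language.

```python
import ast

# Node types this hypothetical translator cannot map directly to the target.
HARD_NODES = (ast.ListComp, ast.GeneratorExp, ast.Yield, ast.YieldFrom)

def flag_hard_constructs(source):
    """Return sorted (line, node type name) pairs that need manual review."""
    flagged = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, HARD_NODES):
            flagged.append((node.lineno, type(node).__name__))
    return sorted(flagged)

sample = """
def squares(n):
    for i in range(n):
        yield i * i

evens = [x for x in squares(10) if x % 2 == 0]
"""

report = flag_hard_constructs(sample)
for line, kind in report:
    print(f"line {line}: {kind} needs emulation or manual translation")
```

Everything not flagged can go through the simple path, and the report tells you where the emulation machinery (or a human) is needed.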

To give you one final tip, consider the 2to3 tool created by the Python devs to translate Python 2 code to Python 3 code. Front-end-wise, it has most of the elements you need to translate Python to something. However, since the cores of Python 2 and 3 are similar, no emulation machinery is required there.


Answer 2

Writing a translator isn’t impossible, especially considering that Joel’s Intern did it over a summer.

If you want to do one language, it’s easy. If you want to do more, it’s a little more difficult, but not too much. The hardest part is that, while any Turing-complete language can do what another Turing-complete language does, built-in data types can change what a language does phenomenally.

For instance:

word = 'This is not a word'
print word[::-2]

takes a lot of C++ code to duplicate (ok, well you can do it fairly short with some looping constructs, but still).

That’s a bit of an aside, I guess.

Have you ever written a tokenizer/parser based on a language grammar? You’ll probably want to learn how to do that if you haven’t, because that’s the main part of this project. What I would do is come up with a basic Turing-complete intermediate syntax, something fairly similar to Python bytecode. Then you create a lexer/parser that takes a language grammar (perhaps using BNF) and, based on the grammar, compiles the language into your intermediate language. Then you’ll want to do the reverse: create a generator that turns your intermediate language into the target languages, based on their grammars.

The most obvious problem I see is that at first you’ll probably create horribly inefficient code, especially in more powerful* languages like Python.

But if you do it this way then you’ll probably be able to figure out ways to optimize the output as you go along. To summarize:

  • read provided grammar
  • compile program into intermediate (but also Turing complete) syntax
  • compile intermediate program into final language (based on provided grammar)
  • …?
  • Profit!(?)
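The pipeline summarized above can be sketched on a very small scale. The example below cheats on the first step by reusing Python's own parser instead of reading a grammar, and the stack-based intermediate form (PUSH_CONST, PUSH_VAR, ADD, ...) is invented for the illustration. It only handles numeric constants, names and three arithmetic operators.

```python
import ast

# Hypothetical stack-based intermediate form: a list of (opcode, argument) pairs.
BINOPS = {ast.Add: "ADD", ast.Sub: "SUB", ast.Mult: "MUL"}

def to_ir(node):
    """Front end: lower a Python expression node into stack instructions."""
    if isinstance(node, ast.Constant):
        return [("PUSH_CONST", node.value)]
    if isinstance(node, ast.Name):
        return [("PUSH_VAR", node.id)]
    if isinstance(node, ast.BinOp) and type(node.op) in BINOPS:
        return to_ir(node.left) + to_ir(node.right) + [(BINOPS[type(node.op)], None)]
    raise NotImplementedError(type(node).__name__)

def ir_to_php(ir):
    """Back end: rebuild a PHP-style expression from the stack instructions."""
    symbols = {"ADD": "+", "SUB": "-", "MUL": "*"}
    stack = []
    for opcode, arg in ir:
        if opcode == "PUSH_CONST":
            stack.append(repr(arg))      # numeric constants only in this sketch
        elif opcode == "PUSH_VAR":
            stack.append("$" + arg)      # PHP variables carry a $ prefix
        else:
            right = stack.pop()
            left = stack.pop()
            stack.append(f"({left} {symbols[opcode]} {right})")
    return stack.pop()

expr = ast.parse("a + b * 2", mode="eval").body
ir = to_ir(expr)
php = ir_to_php(ir)
print(ir)
print(php)
```

Adding a second source or target language then means writing one more front end or back end against the same intermediate form, which is the appeal of this design. The cost is that everything the intermediate form cannot express gets lost or has to be emulated.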

*by powerful I mean that this takes 4 lines:

myinput = raw_input("Enter something: ")
print myinput.replace('a', 'A')
print sum(ord(c) for c in myinput)
print myinput[::-1]

Show me another language that can do something like that in 4 lines, and I’ll show you a language that’s as powerful as Python.


Answer 3

There are a couple answers telling you not to bother. Well, how helpful is that? You want to learn? You can learn. This is compilation. It just so happens that your target language isn’t machine code, but another high-level language. This is done all the time.

There’s a relatively easy way to get started. First, go get http://sourceforge.net/projects/lime-php/ (if you want to work in PHP) or some such and go through the example code. Next, you can write a lexical analyzer using a sequence of regular expressions and feed tokens to the parser you generate. Your semantic actions can either output code directly in another language or build up some data structure (think objects, man) that you can massage and traverse to generate output code.

You’re lucky with PHP and Python because in many respects they are the same language as each other, but with different syntax. The hard part is getting over the semantic differences between the grammar forms and data structures. For example, Python has lists and dictionaries, while PHP only has associative arrays.

The “learner” approach is to build something that works OK for a restricted subset of the language (such as only print statements, simple math, and variable assignment), and then progressively remove limitations. That’s basically what the “big” guys in the field all did.

Oh, and since you don’t have static types in Python, it might be best to write and rely on PHP functions like “python_add” which adds numbers, strings, or objects according to the way Python does it.
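To make the python_add idea concrete, here is a minimal sketch of a translator for a restricted subset (single-name assignments, numbers, strings, names and +) that emits PHP text. The emit_expr, emit_stmt and translate helpers are hypothetical names, and python_add is assumed to be a PHP function you would write yourself as part of the runtime support library.

```python
import ast

def emit_expr(node):
    """Emit a PHP expression string for a tiny subset of Python expressions."""
    if isinstance(node, ast.Constant) and type(node.value) in (int, float):
        return repr(node.value)
    if isinstance(node, ast.Constant) and isinstance(node.value, str):
        escaped = node.value.replace("\\", "\\\\").replace("'", "\\'")
        return "'" + escaped + "'"
    if isinstance(node, ast.Name):
        return "$" + node.id
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
        # Operand types are unknown at translation time, so defer the choice
        # between numeric addition and concatenation to a runtime helper.
        return f"python_add({emit_expr(node.left)}, {emit_expr(node.right)})"
    raise NotImplementedError(f"unsupported expression: {type(node).__name__}")

def emit_stmt(node):
    """Emit one PHP statement; only simple assignments are supported."""
    if (isinstance(node, ast.Assign) and len(node.targets) == 1
            and isinstance(node.targets[0], ast.Name)):
        return f"${node.targets[0].id} = {emit_expr(node.value)};"
    raise NotImplementedError(f"unsupported statement: {type(node).__name__}")

def translate(source):
    return "\n".join(emit_stmt(stmt) for stmt in ast.parse(source).body)

php = translate("greeting = 'Hello, ' + name\ntotal = count + 1")
print(php)
```

Raising NotImplementedError for anything outside the subset is deliberate: it gives you the list of limitations to remove next, which fits the "learner" approach described above.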

Obviously, this can get much bigger if you let it.


Answer 4

大多数说某事“困难”或“不可能”的人没有提供原因。C ++很难解析?我知道,它们仍然是(免费的)C ++解析器。细节是邪恶的吗?我知道。仅仅说不可能是没有帮助的,它比令人沮丧的“没有帮助”还要糟糕,而且有些人会劝阻其他人。我通过听说了这个问题 /programming/22621164/how-to-automatically-generate-a-parser-code-to-code-translator-from-a-corpus

什么对您来说是完美?这样便可以定义下一个目标,甚至可以达到整体目标。

我更想知道我可以对代码强制执行哪种类型的模式,而不是如何进行翻译,从而使代码的翻译(即:IoC,SOA?)更容易。

我看不到至少不能以一种不太完美的方式将一种语言不能翻译成另一种语言的模式。由于可以进行语言到语言的翻译,因此您最好首先瞄准。从那以后,我认为是根据http://en.wikipedia.org/wiki/Graph_isomorphism_problem两种计算机语言之间的翻译是树或DAG同构。即使我们已经知道他们都将完成学习,所以…

我最好将API-> API转换可视化为Framework-> Framework,但您可能仍要牢记这些内容,以改进生成的代码。例如:Prolog是非常特定的语法,但是您仍然可以通过在Python中描述相同的图形来像计算一样进行Prolog …如果我要实现从Prolog到Python的转换器,我不会在Python中实现统一,而是在C库中实现带有“ Python语法”,这对于Python编写者来说非常容易理解。最后,语法只是我们赋予其含义的“绘画”(这就是我开始使用scheme的原因)。语言的细节是邪恶的,我不是在谈论语法。语言中使用的概念 getattribute钩子(您可以没有它),但是所需的VM功能(如尾递归优化)可能很难处理。您不必担心初始程序是否不使用尾部递归,即使目标语言中没有尾部递归,也可以使用greenlets / event循环来模拟它。

对于目标语言和源语言,请查找:

  • 大而具体的想法
  • 微小且共同的想法

由此将出现:

  • 容易翻译的东西
  • 难以翻译的事物

您也许还可以知道将翻译成快速和慢速代码的内容。

还有stdlib或任何库的问题,但没有明确的答案,这取决于您的目标。

成语代码或可读的生成代码也有解决方案…

因为可以提供慢速和/或关键路径的C实现,所以针对PHP之类的平台比针对浏览器要容易得多。

鉴于您的第一个项目是将Python转换为PHP,至少对于我所知道的PHP3子集,自定义veloce.py是最好的选择。如果您可以为PHP实现veloce.py,则可能可以运行兼容模式…同样,如果您可以将PHP转换为可以用php_veloce.py生成的PHP子集,则意味着您可以将PHP转换为veloce.py可以使用的Python子集,这意味着您可以将PHP转换为Javascript。只是说…

您还可以查看这些库:

另外,您可能对此博客文章(和评论)感兴趣:https : //www.rfk.id.au/blog/entry/pypy-js-poc-jit/

I will second @EliBendersky’s point of view regarding using ast.parse instead of parser (which I did not know about before). I also warmly recommend that you review his blog. I used ast.parse to write a Python->JavaScript translator (@https://bitbucket.org/amirouche/pythonium). I came up with the Pythonium design by reviewing other implementations and trying them on my own. I forked Pythonium from https://github.com/PythonJS/PythonJS which I also started; it’s actually a complete rewrite. The overall design is inspired by PyPy and the http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-89-1.pdf paper.

Here is everything I tried, from the beginning to the best solution. Even if it looks like Pythonium marketing, it really isn’t (don’t hesitate to tell me if something doesn’t seem to fit the netiquette):

  • Implement Python semantics in Plain Old JavaScript using prototype inheritance: AFAIK it’s impossible to implement Python multiple inheritance using the JS prototype object system. I did try to do it using other tricks later (cf. getattribute). As far as I know there is no implementation of Python multiple inheritance in JavaScript; the best that exists is single inheritance + mixins, and I’m not sure they handle diamond inheritance. Kind of similar to Skulpt but without Google Closure.

  • I tried with Google Closure, just like Skulpt (compiler), instead of actually reading the Skulpt code #fail. Anyway, because of the JS prototype-based object system it was still impossible. Creating bindings was very, very difficult: you need to write JavaScript and a lot of boilerplate code (cf. https://github.com/skulpt/skulpt/issues/50 where I am the ghost). At that time there was no clear way to integrate the bindings into the build system. I think that Skulpt is a library and you just have to include your .py files in the HTML to have them executed, with no compilation phase required from the developer.

  • Tried pyjaco (compiler), but creating bindings (calling JavaScript code from Python code) was very difficult; there was too much boilerplate code to create every time. Now I think pyjaco is the one nearest to Pythonium. pyjaco is written in Python (ast.parse too), but a lot is written in JavaScript and it uses prototype inheritance.

I never actually succeeded in running Pyjamas #fail and never tried to read the code #fail again. But in my mind Pyjamas was doing API->API translation (or framework to framework) and not Python to JavaScript translation. The JavaScript framework consumes data that is already in the page or data from the server. Python code is only “plumbing”. After that I discovered that Pyjamas was actually a real Python->JS translator.

Still, I think it’s possible to do API->API (or framework->framework) translation, and that’s basically what I do in Pythonium but at a lower level. Pyjamas probably uses the same algorithm as Pythonium…

Then I discovered Brython, fully written in JavaScript like Skulpt: no need for compilation and a lot of fluff… but written in JavaScript.

Since the first line written in the course of this project, I knew about PyPy, even the JavaScript backend for PyPy. Yep, you can, if you find it, directly generate a Python interpreter in JavaScript from PyPy. People say it was a disaster. I read nowhere why. But I think the reason is that the intermediate language they use to implement the interpreter, RPython, is a subset of Python tailored to be translated to C (and maybe asm). Ira Baxter says you always make assumptions when you build something, and you probably fine-tune it to be the best at what it’s meant to do, in the case of PyPy: Python->C translation. Those assumptions might not be relevant in another context; worse, they can incur overhead. In other words, direct translation will most likely always be better.

Having the interpreter written in Python sounded like a (very) good idea. But I was more interested in a compiler for performance reasons; also, it’s actually easier to compile Python to JavaScript than to interpret it.

I started PythonJS with the idea of putting together a subset of Python that I could easily translate to JavaScript. At first I didn’t even bother to implement an OO system because of past experience. The subset of Python that I managed to translate to JavaScript is:

  • functions with full parameter semantics both in definition and calling. This is the part I am most proud of.
  • while/if/elif/else
  • Python types were converted to JavaScript types (there are no Python types of any kind)
  • for could iterate over JavaScript arrays only (for a in array)
  • Transparent access to JavaScript: if you write Array in the Python code it will be translated to Array in JavaScript. This is the biggest achievement in terms of usability over its competitors.
  • You can pass functions defined in Python source to JavaScript functions. Default arguments will be taken into account.
  • It has a special function called new which is translated to JavaScript new, e.g. new(Python)(1, 2, spam, “egg”) is translated to new Python(1, 2, spam, “egg”).
  • “var” is automatically handled by the translator (a very nice finding from Brett, a PythonJS contributor).
  • global keyword
  • closures
  • lambdas
  • list comprehensions
  • imports are supported via requirejs
  • single class inheritance + mixin via classyjs

This seems like a lot, but it is actually very narrow compared to the full-blown semantics of Python. It’s really JavaScript with a Python syntax.

The generated JS is perfect, i.e. there is no overhead; it cannot be improved in terms of performance by further editing it. If you can improve the generated code, you can do it from the Python source file too. Also, the compiler did not rely on any of the JS tricks that you can find in .js written by http://superherojs.com/, so it’s very readable.

The direct descendant of this part of PythonJS is the Pythonium Veloce mode. The full implementation can be found @ https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/veloce/veloce.py?at=master (793 SLOC + around 100 SLOC of code shared with the other translator).

An adapted version of pystones.py can be translated in Veloce mode cf. https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pystone/?at=master

After having set up basic Python->JavaScript translation I chose another path to translate full Python to JavaScript: the way glib does object-oriented, class-based code, except that the target language is JS, so you have access to arrays, map-like objects and many other tricks, and all of that part was written in Python. IIRC there is no JavaScript code written in the Pythonium translator. Getting single inheritance is not difficult; here are the difficult parts of making Pythonium fully compliant with Python:

  • spam.egg in Python is always translated to getattribute(spam, "egg"). I did not profile this in particular, but I think that is where it loses a lot of time, and I’m not sure I can improve upon it with asm.js or anything else.
  • method resolution order: even with the algorithm written in Python, translating it to Python Veloce compatible code was a big endeavour.
  • getattribute: the actual getattribute resolution algorithm is kind of tricky and it still doesn’t support data descriptors
  • metaclass class based: I know where to plug the code, but still…
  • last but not least: some_callable(…) is always translated to “call(some_callable)”. AFAIK the translator doesn’t use inference at all, so every time you do a call you need to check which kind of object it is in order to call it the way it’s meant to be called.
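The first item in the list above, routing every attribute read through a getattribute helper, can be illustrated on the Python side with ast.NodeTransformer. This is not how Pythonium does it (the answer says it avoids ast->ast transformation); it is only a sketch of the mapping itself, and AttributeToCall is a hypothetical class name. It needs Python 3.9+ for ast.unparse.

```python
import ast

class AttributeToCall(ast.NodeTransformer):
    """Rewrite attribute reads `obj.name` into `getattribute(obj, "name")`."""

    def visit_Attribute(self, node):
        self.generic_visit(node)  # rewrite nested attributes first
        if not isinstance(node.ctx, ast.Load):
            return node  # attribute assignment and deletion are left alone here
        call = ast.Call(
            func=ast.Name(id="getattribute", ctx=ast.Load()),
            args=[node.value, ast.Constant(value=node.attr)],
            keywords=[],
        )
        return ast.copy_location(call, node)

tree = AttributeToCall().visit(ast.parse("result = spam.egg.size"))
ast.fix_missing_locations(tree)
rewritten = ast.unparse(tree)
print(rewritten)
```

Every dotted access becomes a function call, which shows why this is a likely hot spot at runtime: the cost is paid on each attribute read, whether or not the object needs the full lookup algorithm.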

This part is factored out in https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/runtime.py?at=master and is written in Python compatible with Python Veloce.

The actual compliant translator https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/compliant.py?at=master doesn’t generate JavaScript code directly and, most importantly, doesn’t do ast->ast transformation. I tried the ast->ast thing, and ast, even if nicer than cst, is not nice to work with even with ast.NodeTransformer; more importantly, I don’t need to do ast->ast.

Doing Python ast to Python ast would, in my case at least, maybe be a performance improvement, since I sometimes inspect the content of a block before generating the code associated with it, for instance:

  • var/global: to be able to var something I must know what I need to var and what not to. Instead of generating a block, tracking which variables are created in it and inserting the declarations on top of the generated function block, I just look for relevant variable assignments when I enter the block, before actually visiting the child nodes to generate the associated code.
  • yield: generators have, as of yet, a special syntax in JS, so I need to know which Python function is a generator when I want to write the “var my_generator = function”
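Both look-ahead checks described above can be sketched as small pre-scans over a function node. This is an illustration under simplifying assumptions, not Pythonium's code: local_names and is_generator are hypothetical helpers, only plain positional parameters are considered, and comprehension variables are not treated as a separate scope.

```python
import ast

NESTED_SCOPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.Lambda, ast.ClassDef)

def local_names(func):
    """Names assigned directly in `func`, minus its parameters and globals."""
    declared_global = set()
    assigned = []

    def scan(node):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, NESTED_SCOPES):
                continue  # nested scopes declare their own variables
            if isinstance(child, ast.Global):
                declared_global.update(child.names)
            elif isinstance(child, ast.Name) and isinstance(child.ctx, ast.Store):
                if child.id not in assigned:
                    assigned.append(child.id)
            scan(child)

    scan(func)
    params = {arg.arg for arg in func.args.args}
    return [n for n in assigned if n not in declared_global and n not in params]

def is_generator(func):
    """True if the function body contains a yield outside nested scopes."""
    pending = list(ast.iter_child_nodes(func))
    while pending:
        node = pending.pop()
        if isinstance(node, NESTED_SCOPES):
            continue
        if isinstance(node, (ast.Yield, ast.YieldFrom)):
            return True
        pending.extend(ast.iter_child_nodes(node))
    return False

source = """
def count_up(limit):
    global calls
    calls = calls + 1
    current = 0
    while current < limit:
        step = 1
        yield current
        current = current + step
"""
func = ast.parse(source).body[0]
hoisted = local_names(func)
print("var " + ", ".join(hoisted) + ";", "| generator:", is_generator(func))
```

The list of names is what would go into a single var declaration at the top of the generated JavaScript function, and the generator flag decides which function syntax to emit.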

So I don’t really visit each node once for each phase of the translation.

The overall process can be described as:

Python source code -> Python ast -> Python source code compatible with Veloce mode -> Python ast -> JavaScript source code

Python builtins are written in Python code (!). IIRC there are a few restrictions related to bootstrapping types, but you have access to everything that Pythonium can translate in compliant mode. Have a look at https://bitbucket.org/amirouche/pythonium/src/33898da731ee2d768ced392f1c369afd746c25d7/pythonium/compliant/builtins/?at=master

JS code generated by Pythonium in compliant mode can be read and understood, but source maps would greatly help.

The valuable advice I can give you in the light of this experience is the kind old farts give:

  • extensively review the subject, both in the literature and in existing projects, closed source or free. When I reviewed the different existing projects I should have given it way more time and motivation.
  • ask questions! If I had known beforehand that the PyPy backend was useless because of the overhead due to the C/JavaScript semantic mismatch, I would maybe have had the Pythonium idea way before 6 months ago, maybe 3 years ago.
  • know what you want to do, have a target. For this project I had different objectives: practice a bit of JavaScript, learn more Python, and be able to write Python code that would run in the browser (more on that below).
  • failure is experience
  • a small step is a step
  • start small
  • dream big
  • do demos
  • iterate

With Python Veloce mode only, I’m very happy! But along the way I discovered that what I was really looking for was liberating myself and others from JavaScript and, more importantly, being able to create in a comfortable way. This led me to Scheme, DSLs, models and eventually domain-specific models (cf. http://dsmforum.org/).

About Ira Baxter's response:

The estimates are not helpful at all. It took me more or less 6 months of free time for both PythonJS and Pythonium, so I can expect more from 6 months of full-time work. I think we all know what 100 man-years in an enterprise context can mean and not mean at all…

When someone says something is hard or, more often, impossible, I answer that “it only takes time to find a solution for a problem that is impossible”. Put another way, nothing is impossible unless it is proven impossible, in this case by a math proof…

If it’s not proven impossible then it leaves room for imagination:

  • finding a proof proving it’s impossible

and

  • If it is impossible there may be an “inferior” problem that can have a solution.

or

  • if it’s not impossible, finding a solution

It’s not just optimistic thinking. When I started Python->JavaScript everybody was saying it was impossible. PyPy: impossible. Metaclasses: too hard. Etc… I think that the only revolution that PyPy brings over the Scheme->C paper (which is 25 years old) is some automatic JIT generation (based on hints written in the RPython interpreter, I think).

Most people who say that a thing is “hard” or “impossible” don’t provide the reasons. C++ is hard to parse? I know that; still, there are (free) C++ parsers. The devil is in the details? I know that. Saying it’s impossible alone is not helpful. It’s even worse than “not helpful”: it’s discouraging, and some people mean to discourage others. I heard about this question via https://stackoverflow.com/questions/22621164/how-to-automatically-generate-a-parser-code-to-code-translator-from-a-corpus.

What would be perfection for you? That’s how you define the next goal and maybe reach the overall goal.

I am more interested in knowing what kinds of patterns I could enforce on the code to make it easier to translate (ie: IoC, SOA ?) the code than how to do the translation.

I see no pattern that cannot be translated from one language to another, at least in a less than perfect way. Since language-to-language translation is possible, you had better aim for that first. I think that, in the terms of http://en.wikipedia.org/wiki/Graph_isomorphism_problem, translation between two computer languages is a tree or DAG isomorphism. And we already know that both languages are Turing complete, so…

Framework->Framework translation, which I better visualize as API->API translation, might still be something to keep in mind as a way to improve the generated code. E.g. Prolog has a very specific syntax, but you can still do Prolog-like computation by describing the same graph in Python… If I were to implement a Prolog to Python translator, I wouldn’t implement unification in Python but in a C library, and I would come up with a “Python syntax” that is very readable for a Pythonist. In the end, syntax is only “painting” to which we give a meaning (that’s why I started Scheme). The devil is in the details of the language, and I’m not talking about the syntax. Some concepts used in a language, like the getattribute hook, you can live without, but required VM features like tail-recursion optimisation can be difficult to deal with. You don’t care if the initial program doesn’t use tail recursion, and even if there is no tail recursion in the target language you can emulate it using greenlets or an event loop.
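To make the tail-recursion point concrete, here is a minimal sketch of emulating tail calls in a target language that lacks them. Note that it uses a trampoline, which is a simpler and different technique from the greenlets or event loop mentioned above; the names TailCall, trampoline and count_down are made up for the illustration.

```python
class TailCall:
    """Hypothetical marker object: 'call func(*args) next' instead of calling it now."""

    def __init__(self, func, *args):
        self.func = func
        self.args = args


def trampoline(func, *args):
    """Run func, then keep running the tail calls it returns, in a loop.

    Each step returns before the next one starts, so the Python stack
    does not grow with the depth of the emulated recursion.
    """
    result = func(*args)
    while isinstance(result, TailCall):
        result = result.func(*result.args)
    return result


def count_down(n, acc=0):
    # What a translator could emit for a tail-recursive source function:
    # the recursive call in tail position becomes a TailCall object.
    if n == 0:
        return acc
    return TailCall(count_down, n - 1, acc + n)


# Far deeper than Python's default recursion limit, yet no RecursionError.
print(trampoline(count_down, 100000))  # 5000050000
```

The price is an extra object allocation and an indirect call per step, which is the kind of fast-versus-slow trade-off a translator has to weigh.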

For target and source languages, look for:

  • Big and specific ideas
  • Tiny and common shared ideas

From this will emerge:

  • Things that are easy to translate
  • Things that are difficult to translate

You will also probably be able to tell what will translate to fast code and what will translate to slow code.

There is also the question of the stdlib or any other library, but there is no clear answer; it depends on your goals.

Idiomatic code and readable generated code also have solutions…

Targeting a platform like PHP is much easier than targeting browsers, since you can provide C implementations of slow and/or critical paths.

Given that your first project is translating Python to PHP, at least for the PHP3 subset I know of, customising veloce.py is your best bet. If you can implement veloce.py for PHP, then you will probably be able to run the compliant mode… Also, if you can translate PHP to the subset of PHP you can generate with php_veloce.py, it means that you can translate PHP to the subset of Python that veloce.py can consume, which would mean that you can translate PHP to JavaScript. Just saying…

You can also have a look at those libraries:

You might also be interested in this blog post (and its comments): https://www.rfk.id.au/blog/entry/pypy-js-poc-jit/


Answer 5

You could take a look at the Vala compiler, which translates Vala (a C#-like language) into C.

