Question: Java “Virtual Machine” vs. Python “Interpreter” parlance?

It seems rare to read of a Python “virtual machine” while in Java “virtual machine” is used all the time.

Both interpret byte codes; why call one a virtual machine and the other an interpreter?


Answer 0

A virtual machine is a virtual computing environment with a specific set of atomic, well-defined instructions that are supported independently of any specific language; it is generally thought of as a sandbox unto itself. The VM is analogous to the instruction set of a specific CPU and tends to work at a more fundamental level, with very basic building blocks of such instructions (or byte codes) that are independent of one another. An instruction executes deterministically based only on the current state of the virtual machine and does not depend on information elsewhere in the instruction stream at that point in time.
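To make the idea of instructions that execute "based only on the current state" concrete, here is a minimal sketch of a toy stack machine. The opcodes (`PUSH`, `ADD`, `MUL`) and the `run` function are invented for illustration and do not correspond to the JVM or to CPython.

```python
from typing import List, Optional, Tuple

Instruction = Tuple[str, Optional[int]]  # (opcode, operand), both hypothetical


def run(program: List[Instruction]) -> int:
    """Execute (opcode, operand) pairs on a toy stack machine."""
    stack: List[int] = []
    pc = 0  # program counter: index of the next instruction
    while pc < len(program):
        op, arg = program[pc]
        # Each instruction looks only at the machine state (stack and pc),
        # never at neighbouring instructions.
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc += 1
    return stack.pop()


# (2 + 3) * 4 expressed as a flat instruction stream
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
result = run(program)
print(result)
```

Any compiler, for any source language, could target this instruction list, which is the language-independence point this answer makes about VMs.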

An interpreter, on the other hand, is more sophisticated in that it is tailored to parse a stream of some syntax that belongs to a specific language with a specific grammar, and that must be decoded in the context of the surrounding tokens. You can’t look at each byte or even each line in isolation and know exactly what to do next. The tokens in the language can’t be taken in isolation the way the instructions (byte codes) of a VM can.

A Java compiler converts the Java language into a byte-code stream, much as a C compiler converts C language programs into assembly code. An interpreter, on the other hand, doesn’t really convert the program into any well-defined intermediate form; it simply performs the program’s actions as part of the process of interpreting the source.

Another test of the difference between a VM and an interpreter is whether you think of it as being language independent. What we know as the Java VM is not really Java specific. You could write compilers for other languages that produce byte codes that can be run on the JVM. On the other hand, I don’t think we would really think of “compiling” some language other than Python into Python for interpretation by the Python interpreter.

Because of the sophistication of the interpretation process, it can be relatively slow, specifically in parsing and identifying the language tokens and in understanding the context of the source well enough to carry out execution within the interpreter. To help accelerate such interpreted languages, we can define intermediate forms of pre-parsed, pre-tokenized source code that are more readily interpreted directly. This sort of binary form is still interpreted at execution time; it simply starts from a much less human-readable form to improve performance. However, the logic executing that form is not a virtual machine, because those codes still can’t be taken in isolation: the context of the surrounding tokens still matters; they are just in a different, more computer-efficient form now.


Answer 1

In this post, “virtual machine” refers to process virtual machines, not to system virtual machines like Qemu or Virtualbox. A process virtual machine is simply a program which provides a general programming environment — a program which can be programmed.

Java has an interpreter as well as a virtual machine, and Python has a virtual machine as well as an interpreter. The reason “virtual machine” is a more common term in Java and “interpreter” is a more common term in Python has a lot to do with the major difference between the two languages: static typing (Java) vs dynamic typing (Python). In this context, “type” refers to primitive data types — types which suggest the in-memory storage size of the data. The Java virtual machine has it easy. It requires the programmer to specify the primitive data type of each variable. This provides sufficient information for Java bytecode not only to be interpreted and executed by the Java virtual machine, but even to be compiled into machine instructions. The Python virtual machine is more complex in the sense that it takes on the additional task of pausing before the execution of each operation to determine the primitive data types for each variable or data structure involved in the operation. Python frees the programmer from thinking in terms of primitive data types, and allows operations to be expressed at a higher level. The price of this freedom is performance. “Interpreter” is the preferred term for Python because it has to pause to inspect data types, and also because the comparatively concise syntax of dynamically-typed languages is a good fit for interactive interfaces. There’s no technical barrier to building an interactive Java interface, but trying to write any statically-typed code interactively would be tedious, so it just isn’t done that way.
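As a small illustration of the point about types being resolved at run time, the sketch below calls one Python function with operands of different types. The function `add` is hypothetical; `__code__.co_code` is CPython's raw bytecode for the function, and the comparison only shows that the compiled form stays the same regardless of which operand types arrive.

```python
def add(a, b):
    # No types are declared: "+" is resolved when the call happens.
    return a + b


code_before = add.__code__.co_code  # raw bytecode, fixed at compile time

int_result = add(1, 2)            # integer addition
str_result = add("by", "te")      # string concatenation
list_result = add([1], [2])       # list concatenation

code_after = add.__code__.co_code

print(int_result, str_result, list_result)
print(code_before == code_after)
```

The same instruction stream serves all three calls, so the work of deciding what `+` means for the operands falls to the virtual machine on every execution.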

In the Java world, the virtual machine steals the show because it runs programs written in a language which can actually be compiled into machine instructions, and the result is speed and resource efficiency. Java bytecode can be executed by the Java virtual machine with performance approaching that of compiled programs, relatively speaking. This is due to the presence of primitive data type information in the bytecode. The Java virtual machine puts Java in a category of its own:

portable interpreted statically-typed language

The next closest thing is LLVM, but LLVM operates at a different level:

portable interpreted assembly language

The term “bytecode” is used in both Java and Python, but not all bytecode is created equal. Bytecode is just the generic term for intermediate languages used by compilers/interpreters. Even C compilers like gcc use an intermediate language (or several) to get the job done. Java bytecode contains information about primitive data types, whereas Python bytecode does not. In this respect, the Python (and Bash, Perl, Ruby, etc.) virtual machine truly is fundamentally slower than the Java virtual machine, or rather, it simply has more work to do. It is useful to consider what information is contained in different bytecode formats:

  • LLVM: CPU registers
  • Java: primitive data types
  • Python: user-defined types

To draw a real-world analogy: LLVM works with atoms, the Java virtual machine works with molecules, and the Python virtual machine works with materials. Since everything must eventually decompose into subatomic particles (real machine operations), the Python virtual machine has the most complex task.

Interpreters/compilers of statically-typed languages just don’t have the same baggage that interpreters/compilers of dynamically-typed languages have. Programmers of statically-typed languages have to take up the slack, for which the payoff is performance. However, just as all nondeterministic functions are secretly deterministic, so are all dynamically-typed languages secretly statically-typed. Performance differences between the two language families should therefore level out around the time Python changes its name to HAL 9000.

The virtual machines of dynamic languages like Python implement some idealized logical machine, and don’t necessarily correspond very closely to any real physical hardware. The Java virtual machine, in contrast, is more similar in functionality to a classical C compiler, except that instead of emitting machine instructions, it executes built-in routines. In Python, an integer is a Python object with a bunch of attributes and methods attached to it. In Java, an int is a designated number of bits, usually 32. It’s not really a fair comparison. Python integers should really be compared to the Java Integer class. Java’s “int” primitive data type can’t be compared to anything in the Python language, because the Python language simply lacks this layer of primitives, and so does Python bytecode.
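A quick way to see that a Python integer is a full object rather than a bare machine word is the sketch below. The exact byte count reported by `sys.getsizeof` is a CPython implementation detail and varies by platform and version, so no particular value is assumed.

```python
import sys

n = 42

is_object = isinstance(n, object)   # every int is an instance of object
bits = n.bit_length()               # ints carry methods: 42 is 0b101010
size_in_bytes = sys.getsizeof(n)    # object header + digits (CPython-specific)

print(is_object, bits, size_in_bytes)
```

On typical CPython builds the reported size is well above the 4 bytes of a Java `int`, which fits the comparison with the boxed `Integer` class made above.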

Because Java variables are explicitly typed, one can reasonably expect something like Jython performance to be in the same ballpark as CPython. On the other hand, a Java virtual machine implemented in Python is almost guaranteed to be slower than mud. And don’t expect Ruby, Perl, etc., to fare any better. They weren’t designed to do that. They were designed for “scripting”, which is what programming in a dynamic language is called.

Every operation that takes place in a virtual machine eventually has to hit real hardware. Virtual machines contain pre-compiled routines which are general enough to execute any combination of logical operations. A virtual machine may not be emitting new machine instructions, but it certainly is executing its own routines over and over in arbitrarily complex sequences. The Java virtual machine, the Python virtual machine, and all the other general-purpose virtual machines out there are equal in the sense that they can be coaxed into performing any logic you can dream up, but they are different in terms of what tasks they take on, and what tasks they leave to the programmer.

Psyco for Python is not a full Python virtual machine, but a just-in-time compiler that hijacks the regular Python virtual machine at points where it thinks it can compile a few lines of code, mainly loops where it thinks the primitive type of some variable will remain constant even if the value changes with each iteration. In that case, it can forgo some of the incessant type determination of the regular virtual machine. You have to be a little careful, though, lest you pull the type out from under Psyco’s feet. Psyco, however, usually knows to just fall back to the regular virtual machine if it isn’t completely confident the type won’t change.

The moral of the story is that primitive data type information is really helpful to a compiler/virtual machine.

Finally, to put it all in perspective, consider this: a Python program executed by a Python interpreter/virtual machine implemented in Java, running on a Java interpreter/virtual machine implemented in LLVM, running in a qemu virtual machine, running on an iPhone.


Answer 2

Probably one reason for the different terminology is that one normally thinks of feeding the Python interpreter raw human-readable source code and not worrying about bytecode and all that.

In Java, you have to explicitly compile to bytecode and then run just the bytecode, not the source code, on the VM.

Even though Python uses a virtual machine under the covers, from a user’s perspective, one can ignore this detail most of the time.


Answer 3

An interpreter translates source code into some efficient intermediate representation (code) and immediately executes it.

A virtual machine explicitly executes stored, pre-compiled code built by a compiler that is part of the interpreter system.

A very important characteristic of a virtual machine is that the software running inside it is limited to the resources provided by the virtual machine. To be precise, it cannot break out of its virtual world. Think of the secure execution of remote code, such as Java applets.

In the case of Python, if we keep .pyc files, as mentioned in the comments on this post, the mechanism becomes more like a VM, and this bytecode executes faster; it is still interpreted, but from a much more computer-friendly form. If we look at this as a whole, the PVM is the last step of the Python interpreter.
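For readers who have not seen a .pyc file being produced, here is a minimal sketch using the standard-library `py_compile` module. The file names `hello.py` and `hello.pyc` and the temporary directory are made up for the example.

```python
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "hello.py")
    with open(src, "w", encoding="utf-8") as f:
        f.write("print('hello')\n")

    # Compile the source to a stored bytecode file, the form that is
    # later loaded and executed without re-parsing the text.
    pyc_path = py_compile.compile(
        src, cfile=os.path.join(tmp, "hello.pyc"), doraise=True
    )
    pyc_exists = os.path.exists(pyc_path)
    pyc_size = os.path.getsize(pyc_path)

print(pyc_exists, pyc_size > 0)
```

CPython normally does this automatically for imported modules and caches the result in a `__pycache__` directory, so most users never run this step by hand.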

The bottom line is that when we refer to the Python interpreter, we mean it as a whole, and when we say PVM, we are talking about just one part of the Python interpreter, the runtime environment. Similarly, in Java we refer to the different parts by different names: JRE, JVM, JDK, etc.

For more, see the Wikipedia entries on Interpreter and Virtual Machine, as well as the Comparison of application virtual machines. They help in understanding the difference between compilers, interpreters, and VMs.


Answer 4

The term interpreter is a legacy term dating back to earlier shell scripting languages. As “scripting languages” have evolved into full-featured languages and their corresponding platforms have become more sophisticated and sandboxed, the distinction between a virtual machine and an interpreter (in the Python sense) has become very small or non-existent.

The Python interpreter still functions in the same way as a shell script, in the sense that it can be executed without a separate compile step. Beyond that, the differences between Python’s interpreter (or Perl or Ruby’s) and Java’s virtual machine are mostly implementation details. (One could argue that Java is more fully sandboxed than Python, but both ultimately provide access to the underlying architecture via a native C interface.)


Answer 5

To provide a deep answer to the question “Why Java Virtual Machine, but Python interpreter?” let’s go back to the field of compilation theory as the starting point of the discussion.

The typical process of program compilation includes the following steps:

  1. Lexical analysis. Splits the program text into meaningful “words” called tokens (as part of the process, all comments, spaces, newlines, etc. are removed, because they do not affect program behavior). The result is an ordered stream of tokens.
  2. Syntax analysis. Builds the so-called Abstract Syntax Tree (AST) from the stream of tokens. The AST establishes relations between tokens and, as a consequence, defines the order of evaluation of the program.
  3. Semantic analysis. Verifies the semantic correctness of the AST using information about types and a set of semantic rules of the programming language. (For example, a = b + c is a correct statement from the syntactic point of view, but completely incorrect from the semantic point of view if a was declared as a constant object.)
  4. Intermediate code generation. Serializes the AST into a linearly ordered stream of machine-independent “primitive” operations. In effect, the code generator traverses the AST and logs the order of evaluation steps. As a result, from the tree-like representation of the program we obtain a much simpler list-like representation in which the order of program evaluation is preserved.
  5. Machine code generation. The program, in the form of machine-independent “primitive” bytecode, is translated into the machine code of a particular processor architecture.
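Steps 1, 2, and 4 above can be observed directly in CPython through its standard-library modules `tokenize` and `ast` and the built-in `compile`. This is only a sketch of how those stages look for the statement `a = b + c`; the values given to `b` and `c` are arbitrary, and steps 3 and 5 have no direct counterpart here.

```python
import ast
import io
import tokenize

source = "a = b + c\n"

# Step 1, lexical analysis: split the text into tokens.
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.type in (tokenize.NAME, tokenize.OP)
]

# Step 2, syntax analysis: build the abstract syntax tree.
tree = ast.parse(source)

# Step 4, intermediate code generation: flatten the tree into bytecode.
code = compile(tree, filename="<example>", mode="exec")

# Running the resulting code object with some arbitrary inputs.
namespace = {"b": 2, "c": 3}
exec(code, namespace)

print(tokens)
print(type(tree.body[0]).__name__)
print(namespace["a"])
```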

OK. Let’s now define the terms.

An interpreter, in the classical meaning of the word, executes a program by evaluating the AST produced directly from the program text. In that case, the program is distributed in the form of source code, and the interpreter is fed the program text, frequently in a dynamic way (statement by statement or line by line). For each input statement, the interpreter builds its AST and immediately evaluates it, changing the “state” of the program. This is the typical behavior of scripting languages; consider, for example, Bash, Windows CMD, etc. Conceptually, Python works this way too.
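A classical interpreter in this sense can be sketched in a few lines: parse the text into an AST, then walk the tree and evaluate it immediately, with no bytecode in between. The functions `evaluate` and `interpret` are hypothetical names, and the sketch handles only arithmetic expressions; it borrows Python's own `ast` parser purely for convenience.

```python
import ast
import operator

# Map AST operator node types to the functions that implement them.
OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def evaluate(node, env):
    """Recursively evaluate an expression AST against a variable environment."""
    if isinstance(node, ast.Expression):
        return evaluate(node.body, env)
    if isinstance(node, ast.Constant):
        return node.value
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.BinOp) and type(node.op) in OPS:
        left = evaluate(node.left, env)
        right = evaluate(node.right, env)
        return OPS[type(node.op)](left, right)
    raise ValueError(f"unsupported syntax: {type(node).__name__}")


def interpret(text, env):
    """Text in, value out: build the AST and evaluate it straight away."""
    return evaluate(ast.parse(text, mode="eval"), env)


value = interpret("(x + 3) * 4", {"x": 2})
print(value)
```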

If we replace the AST-based execution step in the interpreter with the generation of intermediate, machine-independent binary bytecode, we split the entire process of program execution into two separate phases: compilation and execution. In that case, what was previously an interpreter becomes a bytecode compiler, which transforms the program from text into some binary form. The program is then distributed in that binary form rather than as source code. On the user’s machine, that bytecode is fed into a new entity, the virtual machine, which in fact interprets the bytecode. Because of this, virtual machines are also called bytecode interpreters. But pay attention here! A classical interpreter is a text interpreter, whereas a virtual machine is a binary interpreter! This is the approach taken by Java and C#.
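The two-phase split can be imitated in Python itself, as a rough sketch: `compile` plays the role of the bytecode compiler, `marshal` turns the code object into a binary blob that could be stored or shipped, and `exec` on the reloaded object is the execution phase. The one-line program is arbitrary, and the `marshal` format is specific to the CPython version that wrote it.

```python
import marshal

source = "result = sum(range(5))\n"

# Phase 1, compilation: text in, binary form out.
code = compile(source, "<example>", "exec")
blob = marshal.dumps(code)  # bytes that could be written to disk

# Phase 2, execution: only the binary form is needed from here on.
loaded = marshal.loads(blob)
namespace = {}
exec(loaded, namespace)

print(type(blob).__name__, namespace["result"])
```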

Finally, if we add machine code generation to the bytecode compiler, the result is what we call a classical compiler. A classical compiler converts the program source code into the machine code of a particular processor. That machine code can then be executed directly on the target processor without any additional mediation (without any kind of interpreter, neither a text interpreter nor a binary interpreter).

Let’s now go back to the original question and consider Java vs Python.

Java was initially designed to have as few implementation dependencies as possible. Its design is based on the principle “write once, run anywhere” (WORA). To achieve this, Java was designed as a programming language that compiles into machine-independent binary bytecode, which can then be executed on all platforms that support Java without recompilation. You can think of Java as a WORA-based C++. Indeed, Java is closer to C++ than to scripting languages like Python. But in contrast to C++, Java was designed to be compiled into binary bytecode that is then executed in the environment of a virtual machine, while C++ was designed to be compiled into machine code and then executed directly by the target processor.

Python was initially designed as a kind of scripting programming language that interprets scripts (programs in the form of text written in accordance with the rules of the programming language). Because of this, Python has supported dynamic interpretation of one-line commands or statements from the start, as Bash and Windows CMD do. For the same reason, the initial implementations of Python did not have any kind of bytecode compiler or virtual machine for executing such bytecode; instead, from the start Python required an interpreter capable of understanding and evaluating Python program text.

Because of this, historically, Java developers tended to talk about the Java Virtual Machine (because Java initially came as a package of a Java bytecode compiler and a bytecode interpreter, the JVM), and Python developers tended to talk about the Python interpreter (because Python initially had no virtual machine and was a kind of classical text interpreter that executed program text directly, without any sort of compilation or transformation into any form of binary code).

Currently, Python also has a virtual machine under the hood and can compile and interpret Python bytecode. That fact adds to the confusion behind “Why Java Virtual Machine, but Python interpreter?”, because it seems that implementations of both languages contain virtual machines. But! Even now, interpretation of program text is the primary way Python programs are executed. Python implementations use virtual machines under the hood exclusively as an optimization technique: interpreting binary bytecode in the virtual machine is much more efficient than directly interpreting the original program text. At the same time, the presence of the virtual machine in Python is completely transparent to both Python language designers and Python program developers. The same language can be implemented in interpreters with and without a virtual machine. In the same way, the same programs can be executed in interpreters with and without a virtual machine, and those programs will demonstrate exactly the same behavior and produce the same output from the same input. The only observable difference will be the speed of program execution and the amount of memory consumed by the interpreter. Thus, the virtual machine in Python is not an unavoidable part of the language design, but just an optional extension of the main Python interpreter.

Java can be considered in a similar way. Under the hood, Java has a JIT compiler and can selectively compile the methods of a Java class into machine code for the target platform and then execute it directly. But! Java still uses bytecode interpretation as the primary way of executing Java programs. Just as Python implementations use virtual machines under the hood exclusively as an optimization technique, Java virtual machines use just-in-time compilers exclusively for optimization purposes, simply because direct execution of machine code is at least ten times faster than interpretation of Java bytecode. And as in the case of Python, the presence of a JIT compiler under the hood of the JVM is completely transparent to both Java language designers and Java program developers. The same Java programming language can be implemented by a JVM with or without a JIT compiler. In the same way, the same programs can be executed in JVMs with and without a JIT, and they will demonstrate exactly the same behavior and produce the same output from the same input on both JVMs. As in the case of Python, the only observable difference between them will be in the speed of execution and the amount of memory consumed by the JVM. And finally, as in the case of Python, the JIT in Java is not an unavoidable part of the language design, but just an optional extension of the major JVM implementations.

From the point of view of the design and implementation of their virtual machines, Java and Python differ significantly, while (attention!) both are still virtual machines. The JVM is an example of a low-level virtual machine with simple basic operations and a high instruction dispatch cost. The Python VM, in turn, is a high-level virtual machine whose instructions exhibit complex behavior and for which the instruction dispatch cost is not so significant. Java operates at a very low level of abstraction. The JVM operates on a small, well-defined set of primitive types and has a very tight correspondence (typically one to one) between bytecode instructions and native machine code instructions. By contrast, the Python virtual machine operates at a high level of abstraction: it works with complex data types (objects) and supports ad hoc polymorphism, and its bytecode instructions exhibit complex behavior that may be represented by a series of multiple native machine code instructions. For example, Python supports unbounded-range mathematics. Thus the Python VM is forced to use long arithmetic for potentially big integers, for which the result of an operation can overflow the machine word. Hence, one bytecode instruction for arithmetic in Python can expand into a function call inside the Python VM, while in the JVM an arithmetic operation expands into a simple operation expressed by one or a few native machine instructions.
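The unbounded-integer point is easy to check. In the sketch below, `wrap_int32` is a hypothetical helper that imitates how a fixed-width 32-bit signed integer (such as Java's `int`) wraps around on overflow, while Python's own `+` simply keeps growing the number.

```python
def wrap_int32(x):
    """Imitate 32-bit two's-complement wraparound (as for a Java int)."""
    x &= 0xFFFFFFFF
    return x - 0x1_0000_0000 if x >= 0x8000_0000 else x


INT_MAX = 2**31 - 1

python_sum = INT_MAX + 1                 # arbitrary precision: no overflow
java_like_sum = wrap_int32(INT_MAX + 1)  # wraps to the most negative value

big = 2**100                             # far beyond any machine word

print(python_sum, java_like_sum, big.bit_length())
```

Supporting the first behaviour is why a single arithmetic bytecode in CPython may end up calling multi-word arithmetic routines instead of one machine add.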

As a result, we can draw the following conclusions. It is “Java Virtual Machine” but “Python interpreter” because:

  1. The term virtual machine assumes binary bytecode interpretation, while the term interpreter assumes program text interpretation.
  2. Historically, Java was designed and implemented for binary bytecode interpretation, and Python was initially designed and implemented for program text interpretation. Thus, the term “Java Virtual Machine” is historical and well established in the Java community. Similarly, the term “Python Interpreter” is historical and well established in the Python community. People tend to continue the tradition and use the same terms that were used long before.
  3. Finally, for Java, binary bytecode interpretation is currently the primary way of executing programs, while JIT compilation is just an optional and transparent optimization. And for Python, program text interpretation is currently the primary way of executing programs, while compilation into Python VM bytecode is just an optional and transparent optimization.

Therefore, both Java and Python have virtual machines that are binary bytecode interpreters, which can lead to confusion such as “Why Java Virtual Machine, but Python interpreter?”. The key point here is that for Python, a virtual machine is not a primary or necessary means of program execution; it is just an optional extension of the classical text interpreter. On the other hand, a virtual machine is a core and unavoidable part of the Java program execution ecosystem. The choice of static or dynamic typing in a programming language’s design mainly affects the abstraction level of the virtual machine, but does not dictate whether or not a virtual machine is needed. Languages using either typing system can be designed to be compiled, interpreted, or executed within the environment of a virtual machine, depending on their desired execution model.


Answer 6

There’s no real difference between them; people just follow the conventions the creators chose.


Answer 7

Don’t forget that Python has JIT compilers available for x86, further confusing the issue. (See Psyco.)

A stricter interpretation of “interpreted language” only becomes useful when discussing performance issues of the VM. For example, Ruby was (is?) considered to be slower than Python because it is an interpreted language, unlike Python. In other words, context is everything.


Answer 8

Python can interpret code without compiling it to bytecode. Java can’t.

Python is an interpreted language, as opposed to a compiled one, though the distinction can be blurry because of the presence of the bytecode compiler. This means that source files can be run directly without explicitly creating an executable which is then run.

(from the documentation).

In Java, every single file has to be compiled to a .class file, which then runs on the JVM. In contrast, Python does that only for the modules that are imported by your main script, to help speed up subsequent uses of those files.
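
The bytecode caching of imported modules can be sketched as follows. This is a minimal illustration, assuming CPython's standard `py_compile` and `importlib.util` modules; the module name `helper.py` is hypothetical, and the compile step is triggered explicitly here, whereas an ordinary `import` normally performs it implicitly.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # A throwaway module standing in for something your main script imports.
    source_path = os.path.join(tmp, "helper.py")
    with open(source_path, "w") as f:
        f.write("VALUE = 42\n")

    # Where the import system would look for this module's cached bytecode.
    expected_cache = importlib.util.cache_from_source(source_path)

    # Compile explicitly and get back the path of the file that was written.
    written_cache = py_compile.compile(source_path, doraise=True)
    cache_exists = os.path.exists(written_cache)
    cache_name = os.path.basename(written_cache)

print(expected_cache)
print(cache_name, cache_exists)
```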

However, in the typical case, most Python code (at least in CPython) runs in an emulated stack machine whose instructions are nearly identical to those of the JVM, so there’s no great difference.
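
To make the idea of an emulated stack machine concrete, here is a toy sketch. It is not CPython's or the JVM's actual instruction set; the opcodes `PUSH`, `ADD`, and `MUL` and the function `run` are hypothetical names chosen for illustration.

```python
def run(program):
    """Execute a list of (opcode, argument) pairs on a value stack."""
    stack = []
    for opcode, arg in program:
        if opcode == "PUSH":
            stack.append(arg)
        elif opcode == "ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif opcode == "MUL":
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        else:
            raise ValueError(f"unknown opcode: {opcode}")
    # The result of the whole program is whatever is left on top of the stack.
    return stack.pop()

# Encodes the expression (2 + 3) * 4.
program = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
result = run(program)
print(result)
```

Both real VMs follow this general shape: operands are pushed onto a stack, and each instruction pops its inputs and pushes its result.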

The real reason for the distinction, however, is that from the beginning Java branded itself as “portable, executable bytecode”, while Python branded itself as a dynamic, interpreted language with a REPL. Names stick!


Answer 9

First of all, you should understand that programming, or computer science in general, is not mathematics, and we don’t have rigorous definitions for most of the terms we often use.

Now to your question:

What is an interpreter (in computer science)?

It translates source code one smallest executable unit at a time and then executes that unit.

What is a virtual machine?

In the case of the JVM, the virtual machine is software that contains an interpreter, class loaders, a garbage collector, a thread scheduler, a JIT compiler, and many other things.

As you can see, the interpreter is a part of the JVM, and the whole JVM cannot be called an interpreter because it contains many other components.

Why use the word “interpreter” when talking about Python?

With Java, the compilation step is explicit. Python, on the other hand, is not as explicit as Java about its compilation and interpretation process; from the end user’s perspective, interpretation is the only mechanism used to execute Python programs.


Answer 10

No, they don’t both interpret byte code.

Python only interprets bytecode if you are running with pypy. Otherwise it is compiled into C and interpreted at that level.

Java compiles to bytecode.


Answer 11

I think the lines between the two are blurred. People mostly argue about the meaning of the word “interpreter” and how close the language stands to each end of the “interpreter…compiler” spectrum. Neither sits 100% at either end, however. I think it is easy to write a Java or Python implementation that sits at any point on the spectrum.

Currently both Java and Python have virtual machines and bytecode, though one operates on concrete value sizes (like a 32-bit integer) while the other has to determine the size for each call, which in my opinion doesn’t define the border between the terms.

The argument that Python doesn’t have officially defined bytecode, and that the bytecode exists only in memory, also doesn’t convince me, because I am planning to develop devices that will recognize only Python bytecode, with the compilation part done in the browser’s JS machine.

Performance is only about the concrete implementation. We don’t need to know the size of an object to be able to work with it, and in most cases we work with structures, not basic types. It is possible to optimize the Python VM so that it no longer needs to create a new object each time during expression evaluation, by reusing an existing one. Once that is done, there would be no overall performance difference in calculating the sum of two integers, which is where Java shines.

There is no killer difference between the two, only some implementation nuances and missing optimizations, which are irrelevant to the end user, perhaps up to the point where she starts to notice performance lags. But again, that is an implementation issue, not an architecture issue.


Answer 12

For posts that mention that Python does not need to generate bytecode, I’m not sure that’s true. It seems that all callables in Python must have a .__code__.co_code attribute, which contains the bytecode. I don’t see a meaningful reason to call Python “not compiled” just because the compiled artifacts may not be saved. They often aren’t saved, by design: for example, every comprehension compiles new bytecode for its input, which is the reason comprehension variable scope is not consistent between compile(mode='exec', ...) and compile(mode='single', ...), such as between running a Python script and using pdb.
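
The bytecode attribute mentioned above can be inspected directly. A minimal sketch, assuming CPython and a function defined in Python source; the names `greet` and `<example>` are hypothetical.

```python
def greet(name):
    return "hello " + name

# A function defined in Python carries its compiled bytecode with it.
function_bytecode = greet.__code__.co_code

# compile() turns source text into a code object without saving any file.
code_object = compile("x = 1 + 2", "<example>", "exec")

# The code object can then be executed in a namespace of our choosing.
namespace = {}
exec(code_object, namespace)

print(type(function_bytecode), len(function_bytecode))
print(namespace["x"])
```

Nothing is written to disk in this sketch, yet the source is still compiled before it runs, which is the point the answer makes.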

