I know that print(e) (where e is an Exception) prints the exception that occurred,
but I was trying to find the Python equivalent of Java’s e.printStackTrace(), which traces the exception to the exact line where it occurred and prints the entire trace.
Could anyone please tell me the equivalent of e.printStackTrace() in Python?
import logging
...
try:
    g()
except Exception as ex:
    logging.exception("Something awful happened!")
    # will print this message followed by traceback
Output:
ERROR 2007-09-18 23:30:19,913 error 1294 Something awful happened!
Traceback (most recent call last):
  File "b.py", line 22, in f
    g()
  File "b.py", line 14, in g
    1/0
ZeroDivisionError: integer division or modulo by zero
Prints this throwable and its backtrace to the standard error stream…
This is used like this:
try
{
    // code that may raise an error
}
catch (IOException e)
{
    // exception handling
    e.printStackTrace();
}
In Java, the Standard Error stream is unbuffered so that output arrives immediately.
The same semantics in Python 2 are:
import traceback
import sys
try:  # code that may raise an error
    pass
except IOError as e:  # exception handling
    # in Python 2, stderr is also unbuffered
    print >> sys.stderr, traceback.format_exc()
    # in Python 2, you can also from __future__ import print_function
    print(traceback.format_exc(), file=sys.stderr)
    # or as the top answer here demonstrates, use:
    traceback.print_exc()
    # which also uses stderr.
Python 3
In Python 3, we can get the traceback directly from the exception object (which likely behaves better for threaded code).
Also, stderr is line-buffered, but the print function takes a flush argument, so this would be immediately printed to stderr:
# format_exception returns a list of strings, so join them before printing
print("".join(traceback.format_exception(None,  # <- type(e) by docs, but ignored
                                         e, e.__traceback__)),
      file=sys.stderr, flush=True)
Conclusion:
In Python 3, therefore, traceback.print_exc(), although it uses sys.stderr by default, would buffer the output, and you may possibly lose it. So to get semantics as close as possible to Java’s, in Python 3, use print with flush=True.
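As a minimal sketch of the Python 3 approach described above: the helper name print_stack_trace and the sample function divide are made up for illustration, and only traceback.format_exception is taken from the standard library. The sketch writes to an in-memory stream so the result can be inspected; by default it would write to sys.stderr and flush.

```python
import io
import sys
import traceback


def print_stack_trace(exc, stream=None):
    """Write exc and its traceback to stream (default: sys.stderr), then flush."""
    if stream is None:
        stream = sys.stderr
    # format_exception returns a list of strings, each ending in a newline.
    lines = traceback.format_exception(type(exc), exc, exc.__traceback__)
    stream.write("".join(lines))
    stream.flush()


def divide(a, b):
    return a / b


# Capture the output in memory so the result can be inspected.
captured = io.StringIO()
try:
    divide(1, 0)
except ZeroDivisionError as e:
    print_stack_trace(e, stream=captured)

report = captured.getvalue()
```

Calling print_stack_trace(e) with no stream argument sends the same text to sys.stderr, which is the closest match to e.printStackTrace() in this sketch.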
Adding to the other great answers, we can use the Python logging library’s debug(), info(), warning(), error(), and critical() methods. Quoting from the docs for Python 3.7.4,
There are three keyword arguments in kwargs which are inspected: exc_info which, if it does not evaluate as false, causes exception information to be added to the logging message.
What this means is, you can use the Python logging library to output a debug(), or other type of message, and the logging library will include the stack trace in its output. With this in mind, we can do the following:
import logging
logger = logging.getLogger()
logger.setLevel(logging.DEBUG)

def f():
    a = { 'foo': None }
    # the following line will raise KeyError
    b = a['bar']

def g():
    f()

try:
    g()
except Exception as e:
    logger.error(str(e), exc_info=True)
And it will output:
'bar'
Traceback (most recent call last):
  File "<ipython-input-2-8ae09e08766b>", line 18, in <module>
    g()
  File "<ipython-input-2-8ae09e08766b>", line 14, in g
    f()
  File "<ipython-input-2-8ae09e08766b>", line 10, in f
    b = a['bar']
KeyError: 'bar'
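To make the exc_info behaviour easier to check, here is a small sketch that logs to an in-memory stream instead of the console. The logger name "stacktrace_demo", the format string, and the lookup function are assumptions chosen for the example; the logging calls themselves are standard library.

```python
import io
import logging

# Hypothetical logger name and in-memory stream, used only for this sketch.
log_stream = io.StringIO()
handler = logging.StreamHandler(log_stream)
handler.setFormatter(logging.Formatter("%(levelname)s %(message)s"))

logger = logging.getLogger("stacktrace_demo")
logger.setLevel(logging.DEBUG)
logger.addHandler(handler)
logger.propagate = False  # keep the demo output out of the root logger


def lookup():
    return {"foo": None}["bar"]  # raises KeyError


try:
    lookup()
except KeyError:
    # Any level accepts exc_info=True ...
    logger.warning("lookup failed", exc_info=True)
    # ... and logger.exception() is error() with exc_info=True already set.
    logger.exception("lookup failed again")

log_output = log_stream.getvalue()
```

Both records end up with the full traceback appended after the message, so the choice between them is mostly about which level the record should carry.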
A virtual machine is a virtual computing environment with a specific set of atomic well defined instructions that are supported independent of any specific language and it is generally thought of as a sandbox unto itself. The VM is analogous to an instruction set of a specific CPU and tends to work at a more fundamental level with very basic building blocks of such instructions (or byte codes) that are independent of the next. An instruction executes deterministically based only on the current state of the virtual machine and does not depend on information elsewhere in the instruction stream at that point in time.
An interpreter, on the other hand, is more sophisticated in that it is tailored to parse a stream of syntax belonging to a specific language with a specific grammar, which must be decoded in the context of the surrounding tokens. You can’t look at each byte or even each line in isolation and know exactly what to do next. The tokens in the language can’t be taken in isolation the way the instructions (byte codes) of a VM can.
A Java compiler converts Java language into a byte-code stream, much as a C compiler converts C language programs into assembly code. An interpreter, on the other hand, doesn’t really convert the program into any well-defined intermediate form; it just carries out the program’s actions as part of the process of interpreting the source.
Another test of the difference between a VM and an interpreter is whether you think of it as being language independent. What we know as the Java VM is not really Java specific. You could make a compiler from other languages that result in byte codes that can be run on the JVM. On the other hand, I don’t think we would really think of “compiling” some other language other than Python into Python for interpretation by the Python interpreter.
Because of the sophistication of the interpretation process, it can be relatively slow, specifically the parsing and identifying of language tokens and the understanding of the source context needed to carry out execution within the interpreter. To help accelerate such interpreted languages, we can define intermediate forms of pre-parsed, pre-tokenized source code that are more readily interpreted directly. This sort of binary form is still interpreted at execution time; it just starts from a much less human-readable form to improve performance. However, the logic executing that form is not a virtual machine, because those codes still can’t be taken in isolation: the context of the surrounding tokens still matters, and the tokens are simply in a different, more computer-efficient form.
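The idea of a virtual machine as a set of self-contained instructions can be sketched as a toy stack machine. The opcode names (PUSH, ADD, MUL, JUMP_IF_ZERO, HALT) and the run function are invented for this illustration and do not correspond to the JVM or to CPython.

```python
def run(program):
    """Execute a list of (opcode, operand) pairs on a tiny stack machine."""
    stack = []
    pc = 0  # program counter
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        # Each opcode is handled using only the current machine state
        # (stack and program counter), never the neighbouring instructions.
        if op == "PUSH":
            stack.append(arg)
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        elif op == "JUMP_IF_ZERO":
            if stack.pop() == 0:
                pc = arg
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown opcode: {op}")
    return stack


# (2 + 3) * 4
program = [
    ("PUSH", 2),
    ("PUSH", 3),
    ("ADD", None),
    ("PUSH", 4),
    ("MUL", None),
    ("HALT", None),
]
result = run(program)
```

Nothing in run knows which source language produced the program list, which is the language independence the answer describes.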
In this post, “virtual machine” refers to process virtual machines, not to
system virtual machines like Qemu or Virtualbox. A process virtual machine is
simply a program which provides a general programming environment — a program
which can be programmed.
Java has an interpreter as well as a virtual machine, and Python has a virtual
machine as well as an interpreter. The reason “virtual machine” is a more
common term in Java and “interpreter” is a more common term in Python has a lot
to do with the major difference between the two languages: static typing
(Java) vs dynamic typing (Python). In this context, “type” refers to
primitive data types — types which suggest the in-memory storage size of
the data. The Java virtual machine has it easy. It requires the programmer to
specify the primitive data type of each variable. This provides sufficient
information for Java bytecode not only to be interpreted and executed by the
Java virtual machine, but even to be compiled into machine instructions.
The Python virtual machine is more complex in the sense that it takes on the
additional task of pausing before the execution of each operation to determine
the primitive data types for each variable or data structure involved in the
operation. Python frees the programmer from thinking in terms of primitive data
types, and allows operations to be expressed at a higher level. The price of
this freedom is performance. “Interpreter” is the preferred term for Python
because it has to pause to inspect data types, and also because the
comparatively concise syntax of dynamically-typed languages is a good fit for
interactive interfaces. There’s no technical barrier to building an interactive
Java interface, but trying to write any statically-typed code interactively
would be tedious, so it just isn’t done that way.
In the Java world, the virtual machine steals the show because it runs programs
written in a language which can actually be compiled into machine instructions,
and the result is speed and resource efficiency. Java bytecode can be executed
by the Java virtual machine with performance approaching that of compiled
programs, relatively speaking. This is due to the presence of primitive data
type information in the bytecode. The Java virtual machine puts Java in a
category of its own:
portable interpreted statically-typed language
The next closest thing is LLVM, but LLVM operates at a different level:
portable interpreted assembly language
The term “bytecode” is used in both Java and Python, but not all bytecode is
created equal. Bytecode is just the generic term for intermediate languages
used by compilers/interpreters. Even C compilers like gcc use an intermediate
language (or several) to get the job done. Java bytecode contains
information about primitive data types, whereas Python bytecode does not. In
this respect, the Python (and Bash, Perl, Ruby, etc.) virtual machine truly is
fundamentally slower than the Java virtual machine, or rather, it simply has
more work to do. It is useful to consider what information is contained in
different bytecode formats:
LLVM: CPU registers
Java: primitive data types
Python: user-defined types
To draw a real-world analogy: LLVM works with atoms, the Java virtual machine
works with molecules, and the Python virtual machine works with materials.
Since everything must eventually decompose into subatomic particles (real
machine operations), the Python virtual machine has the most complex task.
Interpreters/compilers of statically-typed languages just don’t have the same
baggage that interpreters/compilers of dynamically-typed languages have.
Programmers of statically-typed languages have to take up the slack, for which
the payoff is performance. However, just as all nondeterministic functions are
secretly deterministic, so are all dynamically-typed languages secretly
statically-typed. Performance differences between the two language families
should therefore level out around the time Python changes its name to HAL 9000.
The virtual machines of dynamic languages like Python implement some idealized
logical machine, and don’t necessarily correspond very closely to any real
physical hardware. The Java virtual machine, in contrast, is more similar in
functionality to a classical C compiler, except that instead of emitting
machine instructions, it executes built-in routines. In Python, an integer is
a Python object with a bunch of attributes and methods attached to it. In
Java, an int is a designated number of bits, usually 32. It’s not really a
fair comparison. Python integers should really be compared to the Java
Integer class. Java’s “int” primitive data type can’t be compared to anything in
the Python language, because the Python language simply lacks this layer of
primitives, and so does Python bytecode.
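The absence of primitive type information in Python bytecode can be seen with the standard dis module. This is only a sketch: the function add is an example, and the exact opcode names vary between CPython versions, so none are listed here.

```python
import dis


def add(x, y):
    return x + y


# The opcode names compiled for add(); no operand types are recorded.
opnames = [instr.opname for instr in dis.get_instructions(add)]

# The same bytecode serves integers, floats, and strings alike; the
# virtual machine inspects the operand types each time it runs.
results = [add(1, 2), add(1.5, 2.5), add("ab", "cd")]
```

Printing opnames on a given interpreter shows a single generic binary operation for the +, whatever the arguments turn out to be at run time.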
Because Java variables are explicitly typed, one can reasonably expect
something like Jython performance to be in the same ballpark as
cPython. On the other hand, a Java virtual machine implemented in Python
is almost guaranteed to be slower than mud. And don’t expect Ruby, Perl, etc.,
to fare any better. They weren’t designed to do that. They were designed for
“scripting”, which is what programming in a dynamic language is called.
Every operation that takes place in a virtual machine eventually has to hit real hardware. Virtual machines contain pre-compiled routines which are general enough to execute any combination of logical operations. A virtual machine may not be emitting new machine instructions, but it certainly is executing its own routines over and over in arbitrarily complex sequences. The Java virtual machine, the Python virtual machine, and all the other general-purpose virtual machines out there are equal in the sense that they can be coaxed into performing any logic you can dream up, but they are different in terms of what tasks they take on, and what tasks they leave to the programmer.
Psyco for Python is not a full Python virtual machine, but a just-in-time
compiler that hijacks the regular Python virtual machine at points where it
thinks it can compile a few lines of code, mainly loops where it thinks the
primitive type of some variable will remain constant even if the value is
changing with each iteration. In that case, it can forgo some of the incessant
type determination of the regular virtual machine. You have to be a little
careful, though, lest you pull the type out from under Psyco’s feet. Psyco,
however, usually knows to just fall back to the regular virtual machine if it
isn’t completely confident the type won’t change.
The moral of the story is that primitive data type information is really
helpful to a compiler/virtual machine.
Finally, to put it all in perspective consider this: a Python program executed
by a Python interpreter/virtual machine implemented in Java running on a Java
interpreter/virtual machine implemented in LLVM running in a qemu virtual
machine running on an iPhone.
Probably one reason for the different terminology is that one normally thinks of feeding the python interpreter raw human-readable source code and not worrying about bytecode and all that.
In Java, you have to explicitly compile to bytecode and then run just the bytecode, not source code on the VM.
Even though Python uses a virtual machine under the covers, from a user’s perspective, one can ignore this detail most of the time.
Interpreter: translates source code into some efficient intermediate representation (code) and immediately executes it.
Virtual machine: explicitly executes stored, pre-compiled code built by a compiler which is part of the interpreter system.
A very important characteristic of a virtual machine is that the software running inside is limited to the resources provided by the virtual machine. More precisely, it cannot break out of its virtual world. Think of secure execution of remote code, such as Java applets.
In the case of Python, if we are keeping .pyc files, as mentioned in a comment on this post, then the mechanism becomes more like a VM, and the program starts faster because the compile step is skipped. The bytecode is still interpreted, but from a much more computer-friendly form. If we look at this as a whole, the PVM is the last step of the Python interpreter.
The bottom line is that when we refer to the Python interpreter, we are referring to it as a whole, and when we say PVM, we are talking about just one part of the Python interpreter, the runtime environment. Similarly, in Java we refer to the different parts by different names: JRE, JVM, JDK, etc.
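The split between the compiling part and the executing part of the Python interpreter can be sketched with the built-in compile and exec functions plus the marshal module. The variable names and the "<demo>" filename are placeholders for this example.

```python
import marshal

source = "answer = 6 * 7\n"

# Step 1: compile source text to a code object (the bytecode compiler).
code = compile(source, "<demo>", "exec")

# Step 2: serialize the code object; a .pyc file stores marshalled code
# after a small header, which this sketch leaves out.
blob = marshal.dumps(code)

# Step 3: later, load the bytes back and execute them without the source.
namespace = {}
exec(marshal.loads(blob), namespace)
answer = namespace["answer"]
```

Step 3 is roughly the part the answer calls the PVM: it never sees the source text, only the code object.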
The term interpreter is a legacy term dating back to earlier shell scripting languages. As “scripting languages” have evolved into full featured languages and their corresponding platforms have become more sophisticated and sandboxed, the distinction between a virtual machine and an interpreter (in the Python sense), is very small or non-existent.
The Python interpreter still functions in the same way as a shell script, in the sense that it can be executed without a separate compile step. Beyond that, the differences between Python’s interpreter (or Perl or Ruby’s) and Java’s virtual machine are mostly implementation details. (One could argue that Java is more fully sandboxed than Python, but both ultimately provide access to the underlying architecture via a native C interface.)
To give a deeper answer to the question “Why Java Virtual Machine, but Python interpreter?”, let’s go back to compilation theory as the starting point of the discussion.
The typical process of program compilation includes the following steps:
Lexical analysis. Splits the program text into meaningful “words” called tokens (as part of the process all comments, spaces, new-lines etc. are removed, because they do not affect program behavior). The result is an ordered stream of tokens.
Syntax analysis. Builds the so-called Abstract Syntax Tree (AST) from the stream of tokens. The AST establishes relations between tokens and, as a consequence, defines the order of evaluation of the program.
Semantic analysis. Verifies the semantic correctness of the AST using information about types and a set of semantic rules of the programming language. (For example, a = b + c is a correct statement from the syntactic point of view, but completely incorrect from the semantic point of view if a was declared as a constant object.)
Intermediate code generation. Serializes the AST into a linearly ordered stream of machine-independent “primitive” operations. In effect, the code generator traverses the AST and logs the order of evaluation steps. As a result, from the tree-like representation of the program we obtain a much simpler list-like representation in which the order of program evaluation is preserved.
Machine code generation. The program in the form of machine-independent “primitive” bytecode is translated into machine code for a particular processor architecture.
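The first four steps above can be observed from within CPython using standard library modules. This is a rough sketch: the one-line source string is made up, and CPython’s own pipeline differs in detail from the textbook description (it has no machine code generation step, for instance).

```python
import ast
import io
import tokenize

source = "a = 2 + 3  # a comment\n"

# 1. Lexical analysis: source text -> stream of tokens.
tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
token_strings = [
    tok.string
    for tok in tokens
    if tok.type in (tokenize.NAME, tokenize.OP, tokenize.NUMBER)
]

# 2. Syntax analysis: build the abstract syntax tree.
tree = ast.parse(source)
root_statement = type(tree.body[0]).__name__

# 3-4. compile() performs its (limited) semantic checks and emits a code
# object holding machine-independent bytecode.
code = compile(tree, "<demo>", "exec")

# Execution by the bytecode interpreter (there is no machine-code step here).
namespace = {}
exec(code, namespace)
```

After this runs, token_strings holds the meaningful tokens with the comment dropped, and namespace["a"] holds the evaluated result.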
OK. Let’s now define the terms.
An interpreter, in the classical meaning of the word, executes a program by evaluating the AST produced directly from the program text. In that case, a program is distributed in the form of source code and the interpreter is fed program text, frequently in a dynamic way (statement-by-statement or line-by-line). For each input statement, the interpreter builds its AST and immediately evaluates it, changing the “state” of the program. This is the typical behavior of scripting languages. Consider for example Bash, Windows CMD etc. Conceptually, Python takes this route too.
If we replace the AST-based execution step in the interpreter with generation of intermediate machine-independent binary bytecode, we split the entire process of program execution into two separate phases: compilation and execution. In that case what previously was an interpreter becomes a bytecode compiler, which transforms the program from text into some binary form. The program is then distributed in that binary form rather than as source code. On the user’s machine, that bytecode is fed into a new entity, the virtual machine, which in fact interprets that bytecode. Because of this, virtual machines are also called bytecode interpreters. But note the difference: a classical interpreter is a text interpreter, while a virtual machine is a binary interpreter! This is the approach taken by Java and C#.
Finally, if we add machine code generation to the bytecode compiler, we get what we call a classical compiler. A classical compiler converts the program source code into the machine code of a particular processor. That machine code can then be executed directly on the target processor without any additional mediation (without any kind of interpreter, neither text nor binary).
Let’s now go back to the original question and consider Java vs Python.
Java was initially designed to have as few implementation dependencies as possible. Its design is based on the principle “write once, run anywhere” (WORA). To implement it, Java was designed as a programming language that compiles into machine-independent binary bytecode, which can then be executed on all platforms that support Java without recompilation. You can think of Java as a WORA-based C++. Actually, Java is closer to C++ than to scripting languages like Python. But in contrast to C++, Java was designed to be compiled into binary bytecode that is then executed in the environment of the virtual machine, while C++ was designed to be compiled into machine code and then executed directly by the target processor.
Python was initially designed as a kind of scripting programming language which interprets scripts (programs in the form of text written in accordance with the programming language rules). Because of this, Python has supported dynamic interpretation of one-line commands or statements from the start, as Bash or Windows CMD do. For the same reason, initial implementations of Python did not have any kind of bytecode compiler or virtual machine for executing such bytecode, but from the start Python required an interpreter capable of understanding and evaluating Python program text.
Because of this, historically, Java developers tended to talk about the Java Virtual Machine (because Java initially came as a package of a Java bytecode compiler and a bytecode interpreter, the JVM), and Python developers tended to talk about the Python interpreter (because Python initially had no virtual machine and was a kind of classical text interpreter that executes program text directly, without any sort of compilation or transformation into any form of binary code).
Currently, Python also has a virtual machine under the hood and can compile and interpret Python bytecode. That fact adds to the confusion behind “Why Java Virtual Machine, but Python interpreter?”, because it seems that implementations of both languages contain virtual machines.
But! Even now, interpretation of program text is the primary way of executing Python programs. Python implementations use virtual machines under the hood exclusively as an optimization technique: interpreting binary bytecode in the virtual machine is much more efficient than directly interpreting the original program text. At the same time, the presence of the virtual machine in Python is completely transparent for both Python language designers and Python program developers. The same language can be implemented in interpreters with and without a virtual machine. In the same way, the same programs can be executed in interpreters with and without a virtual machine, and those programs will demonstrate exactly the same behavior and produce the same output from the same input. The only observable difference will be the speed of program execution and the amount of memory consumed by the interpreter. Thus, the virtual machine in Python is not an unavoidable part of the language design, but just an optional extension of the main Python interpreter.
Java can be considered in a similar way. Java has a JIT compiler under the hood and can selectively compile methods of a Java class into machine code for the target platform and then execute it directly. But! Java still uses bytecode interpretation as the primary way of executing Java programs. Like Python implementations, which use virtual machines under the hood exclusively as an optimization technique, Java virtual machines use just-in-time compilers exclusively for optimization purposes, simply because direct execution of machine code is at least ten times faster than interpretation of Java bytecode. And, as with Python, the presence of the JIT compiler under the hood of the JVM is completely transparent for both Java language designers and Java program developers. The same Java programming language can be implemented by a JVM with or without a JIT compiler. In the same way, the same programs can be executed in JVMs with and without a JIT, and they will demonstrate exactly the same behavior and produce the same output from the same input on both. As with Python, the only observable difference between them will be the speed of execution and the amount of memory consumed by the JVM. And finally, as with Python, the JIT in Java is not an unavoidable part of the language design, but just an optional extension of the major JVM implementations.
In terms of design and implementation, the virtual machines of Java and Python differ significantly, while (attention!) both are still virtual machines. The JVM is an example of a low-level virtual machine with simple basic operations and a high instruction dispatch cost. Python’s, in turn, is a high-level virtual machine, whose instructions demonstrate complex behavior and for which instruction dispatch cost is not so significant. Java operates at a very low abstraction level. The JVM operates on a small, well-defined set of primitive types and has a very tight correspondence (typically one to one) between bytecode instructions and native machine code instructions. In contrast, the Python virtual machine operates at a high abstraction level: it works with complex data types (objects) and supports ad-hoc polymorphism, while its bytecode instructions exhibit complex behavior that can be represented by a series of multiple native machine code instructions. For example, Python supports unbounded-range arithmetic. Thus the Python VM is forced to use long arithmetic for potentially big integers, for which the result of an operation can overflow the machine word. Hence, one bytecode instruction for arithmetic in Python can expand into a function call inside the Python VM, while in the JVM an arithmetic operation will expand into a simple operation expressed by one or a few native machine instructions.
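The unbounded-range arithmetic mentioned above can be contrasted with fixed-width arithmetic in a few lines. The helper add_wrapping_u64 is a made-up stand-in that imitates a 64-bit unsigned register; it is not how any particular VM implements addition.

```python
MASK_64 = (1 << 64) - 1  # largest unsigned 64-bit value


def add_wrapping_u64(a, b):
    """Add the way a fixed-width 64-bit unsigned register would (wraps around)."""
    return (a + b) & MASK_64


big = MASK_64
python_sum = big + 1                     # Python int: grows as needed
wrapped_sum = add_wrapping_u64(big, 1)   # fixed width: overflows to 0
```

The Python result needs 65 bits, so the VM has to allocate a larger integer object behind a single addition, whereas the fixed-width version stays within one machine word and silently wraps.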
As a result, we can draw the following conclusions. It is “Java Virtual Machine” but “Python interpreter” because:
The term virtual machine assumes binary bytecode interpretation, while the term interpreter assumes program text interpretation.
Historically, Java was designed and implemented for binary bytecode interpretation and Python was initially designed and implemented for program text interpretation. Thus, the term “Java Virtual Machine” is historical and well established in the Java community. And similarly, the term “Python Interpreter” is historical and well established in the Python community. People tend to continue the tradition and use the same terms that were used long before.
Finally, for Java, binary bytecode interpretation is currently the primary way of executing programs, while JIT compilation is just an optional and transparent optimization. And for Python, program text interpretation is currently the primary way of executing programs, while compilation into Python VM bytecode is just an optional and transparent optimization.
Therefore, both Java and Python have virtual machines that are binary bytecode interpreters, which can lead to confusion such as “Why Java Virtual Machine, but Python interpreter?”. The key point here is that for Python, a virtual machine is not a primary or necessary means of program execution; it is just an optional extension of the classical text interpreter. On the other hand, a virtual machine is a core and unavoidable part of the Java program execution ecosystem. The choice of static or dynamic typing in a programming language’s design mainly affects the abstraction level of the virtual machine, but does not dictate whether or not a virtual machine is needed. Languages using either typing system can be designed to be compiled, interpreted, or executed within the environment of a virtual machine, depending on their desired execution model.
Don’t forget that Python has JIT compilers available for x86, further confusing the issue. (See psyco).
A stricter interpretation of ‘interpreted language’ only becomes useful when discussing performance issues of the VM. For example, compared with Python, Ruby was (is?) considered to be slower because it is an interpreted language, unlike Python. In other words, context is everything.
Python can interpret code without compiling it to bytecode. Java can’t.
Python is an interpreted language, as opposed to a compiled one, though the distinction can be blurry because of the presence of the bytecode compiler. This means that source files can be run directly without explicitly creating an executable which is then run.
(from the documentation).
In Java, every single file has to be compiled to a .class file, which then runs on the JVM. Python, by contrast, only does this for files that are imported by your main script, to help speed up subsequent uses of those files.
However, in the typical case, most of the python (at least, CPython) code runs in an emulated stack machine, which has nearly identical instructions to those of the JVM, so there’s no great difference.
The real reason for the distinction, however, is that from the beginning Java branded itself as “portable, executable bytecode” and Python branded itself as a dynamic, interpreted language with a REPL. Names stick!
First of all, you should understand that programming, or computer science in general, is not mathematics, and we don’t have rigorous definitions for most of the terms we use often.
Now to your question:
What is an interpreter (in computer science)?
It translates source code by the smallest executable unit and then executes that unit.
What is a virtual machine?
In the case of the JVM, the virtual machine is software which contains an interpreter, class loaders, a garbage collector, a thread scheduler, a JIT compiler and many other things.
As you can see, the interpreter is a part of the JVM, and the whole JVM cannot be called an interpreter because it contains many other components.
Why use the word “interpreter” when talking about Python?
With Java, the compilation part is explicit.
Python, on the other hand, is not as explicit as Java about its compilation and interpretation process; from the end user’s perspective, interpretation is the only mechanism used to execute Python programs.
I think the lines between the two are blurred; people mostly argue about the meaning of the word “interpreter” and how close the language stands to each side of the “interpreter…compiler” spectrum. Neither is 100% one or the other, however. I think it would be easy to write a Java or Python implementation that sits at any point on the spectrum.
Currently both Java and Python have virtual machines and bytecode, though one operates on concrete value sizes (like a 32-bit integer) while the other has to determine the size for each call, which in my opinion doesn’t define the border between the terms.
The argument that Python doesn’t have officially defined bytecode and it exists only in memory also doesn’t convince me, just because I am planning to develop devices which will recognize only Python bytecode and the compilation part will be done in browser JS machine.
Performance is only about the concrete implementation. We don’t need to know the size of the object to be able to work with it, and finally, in most cases, we work with structures, not basic types. It is possible to optimize Python VM in the way that it will eliminate the need of creating new object each time during expression calculation, by reusing existing one. Once it is done, there is no global performance difference between calculating sum of two integers, which is where Java shines.
There is no killer difference between the two, only some implementation nuances and missing optimizations which are irrelevant to the end user, except perhaps at the point where they start to notice performance lags, but again that is an implementation issue, not an architecture issue.
For posts that mention that Python does not need to generate bytecode, I’m not sure that’s true. It seems that every function defined in Python has a .__code__.co_code attribute which contains the bytecode. I don’t see a meaningful reason to call Python “not compiled” just because the compiled artifacts may not be saved, and often aren’t saved by design. For example, every comprehension compiles new bytecode for its input; this is the reason comprehension variable scope is not consistent between compile(mode='exec', ...) and compile(mode='single', ...), such as between running a Python script and using pdb.
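A minimal sketch of that point, assuming CPython (the function name add and the source string are made up for illustration). It only shows that a plain function carries compiled bytecode and that a comprehension in source text is compiled before it runs; it says nothing about the exact instructions, which vary between versions.

```python
def add(a, b):
    return a + b

# A function defined in Python carries a code object with raw bytecode.
bytecode = add.__code__.co_code
print(type(bytecode).__name__, len(bytecode) > 0)

# Source containing a comprehension is compiled to a code object first,
# and only then executed; nothing is written to disk here.
source = "squares = [n * n for n in range(4)]"
module_code = compile(source, "<example>", "exec")
namespace = {}
exec(module_code, namespace)
print(namespace["squares"])
```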
I think I understand strong typing, but every time I look for examples for what is weak typing I end up finding examples of programming languages that simply coerce/convert types automatically.
1 + "1"
Traceback (most recent call last):
File "", line 1, in ?
TypeError: unsupported operand type(s) for +: 'int' and 'str'
However, such thing is possible in Java and in C#, and we do not consider them weakly typed just for that.
Java
int a = 10;
String b = "b";
String result = a + b;
System.out.println(result);
C#
int a = 10;
string b = "b";
string c = a + b;
Console.WriteLine(c);
In another article, named Weakly Type Languages, the author says that Perl is weakly typed simply because I can concatenate a string to a number and vice versa without any explicit conversion.
Perl
$a=10;
$b="a";
$c=$a.$b;
print $c; #10a
So the same example makes Perl weakly typed, but not Java and C#?
Gee, this is confusing
The authors seem to imply that a language that prevents the application of certain operations on values of different types is strongly typed and the contrary means weakly typed.
Therefore, at some point I have felt prompted to believe that a language that provides a lot of automatic conversions or coercions between types (as Perl does) may end up being considered weakly typed, whereas other languages that provide only a few conversions may end up being considered strongly typed.
I am inclined to believe, though, that I must be wrong in this interpretation; I just do not know why or how to explain it.
So, my questions are:
What does it really mean for a language to be truly weakly typed?
Could you mention any good examples of weak typing that are not related to automatic conversion/automatic coercion done by the language?
Can a language be weakly typed and strongly typed at the same time?
What does it really mean for a language to be “weakly typed”?
It means “this language uses a type system that I find distasteful”. A “strongly typed” language by contrast is a language with a type system that I find pleasant.
The terms are essentially meaningless and you should avoid them. Wikipedia lists eleven different meanings for “strongly typed”, several of which are contradictory. This indicates that the odds of confusion being created are high in any conversation involving the term “strongly typed” or “weakly typed”.
All that you can really say with any certainty is that a “strongly typed” language under discussion has some additional restriction in the type system, either at runtime or compile time, that a “weakly typed” language under discussion lacks. What that restriction might be cannot be determined without further context.
Instead of using “strongly typed” and “weakly typed”, you should describe in detail what kind of type safety you mean. For example, C# is a statically typed language and a type safe language and a memory safe language, for the most part. C# allows all three of those forms of “strong” typing to be violated. The cast operator violates static typing; it says to the compiler “I know more about the runtime type of this expression than you do”. If the developer is wrong, then the runtime will throw an exception in order to protect type safety. If the developer wishes to break type safety or memory safety, they can do so by turning off the type safety system by making an “unsafe” block. In an unsafe block you can use pointer magic to treat an int as a float (violating type safety) or to write to memory you do not own. (Violating memory safety.)
C# imposes type restrictions that are checked at both compile-time and at runtime, thereby making it a “strongly typed” language compared to languages that do less compile-time checking or less runtime checking. C# also allows you to in special circumstances do an end-run around those restrictions, making it a “weakly typed” language compared with languages which do not allow you to do such an end-run.
Which is it really? It is impossible to say; it depends on the point of view of the speaker and their attitude towards the various language features.
As others have noted, the terms “strongly typed” and “weakly typed” have so many different meanings that there’s no single answer to your question. However, since you specifically mentioned Perl in your question, let me try to explain in what sense Perl is weakly typed.
The point is that, in Perl, there is no such thing as an “integer variable”, a “float variable”, a “string variable” or a “boolean variable”. In fact, as far as the user can (usually) tell, there aren’t even integer, float, string or boolean values: all you have are “scalars”, which are all of these things at the same time. So you can, for example, write:
Of course, as you correctly note, all of this can be seen as just type coercion. But the point is that, in Perl, types are always coerced. In fact, it’s quite hard for a user to tell what the internal “type” of a variable might be: at line 2 in my example above, asking whether the value of $bar is the string "9" or the number 9 is pretty much meaningless, since, as far as Perl is concerned, those are the same thing. Indeed, it’s even possible for a Perl scalar to internally have both a string and a numeric value at the same time, as is e.g. the case for $foo after line 2 above.
The flip side of all this is that, since Perl variables are untyped (or, rather, don’t expose their internal type to the user), operators cannot be overloaded to do different things for different types of arguments; you can’t just say “this operator will do X for numbers and Y for strings”, because the operator can’t (won’t) tell which kind of values its arguments are.
Thus, for example, Perl has and needs both a numeric addition operator (+) and a string concatenation operator (.): as you saw above, it’s perfectly fine to add strings ("1" + "2" == "3") or to concatenate numbers (1 . 2 == 12). Similarly, the numeric comparison operators ==, !=, <, >, <=, >= and <=> compare the numeric values of their arguments, while the string comparison operators eq, ne, lt, gt, le, ge and cmp compare them lexicographically as strings. So 2 < 10, but 2 gt 10 (but "02" lt 10, while "02" == 2). (Mind you, certain other languages, like JavaScript, try to accommodate Perl-like weak typing while also doing operator overloading. This often leads to ugliness, like the loss of associativity for +.)
(The fly in the ointment here is that, for historical reasons, Perl 5 does have a few corner cases, like the bitwise logical operators, whose behavior depends on the internal representation of their arguments. Those are generally considered an annoying design flaw, since the internal representation can change for surprising reasons, and so predicting just what those operators do in a given situation can be tricky.)
All that said, one could argue that Perl does have strong types; they’re just not the kind of types you might expect. Specifically, in addition to the “scalar” type discussed above, Perl also has two structured types: “array” and “hash”. Those are very distinct from scalars, to the point where Perl variables have different sigils indicating their type ($ for scalars, @ for arrays, % for hashes)1. There are coercion rules between these types, so you can write e.g. %foo = @bar, but many of them are quite lossy: for example, $foo = @bar assigns the length of the array @bar to $foo, not its contents. (Also, there are a few other strange types, like typeglobs and I/O handles, that you don’t often see exposed.)
Also, a slight chink in this nice design is the existence of reference types, which are a special kind of scalars (and which can be distinguished from normal scalars, using the ref operator). It’s possible to use references as normal scalars, but their string/numeric values are not particularly useful, and they tend to lose their special reference-ness if you modify them using normal scalar operations. Also, any Perl variable2 can be blessed to a class, turning it into an object of that class; the OO class system in Perl is somewhat orthogonal to the primitive type (or typelessness) system described above, although it’s also “weak” in the sense of following the duck typing paradigm. The general opinion is that, if you find yourself checking the class of an object in Perl, you’re doing something wrong.
1 Actually, the sigil denotes the type of the value being accessed, so that e.g. the first scalar in the array @foo is denoted $foo[0]. See perlfaq4 for more details.
2 Objects in Perl are (normally) accessed through references to them, but what actually gets blessed is the (possibly anonymous) variable the reference points to. However, the blessing is indeed a property of the variable, not of its value, so e.g. that assigning the actual blessed variable to another one just gives you a shallow, unblessed copy of it. See perlobj for more details.
In addition to what Eric has said, consider the following C code:
void f(void* x);
f(42);
f("hello");
In contrast to languages such as Python, C#, Java or whatnot, the above is weakly typed because we lose type information. Eric correctly pointed out that in C# we can circumvent the compiler by casting, effectively telling it “I know more about the type of this variable than you”.
But even then, the runtime will still check the type! If the cast is invalid, the runtime system will catch it and throw an exception.
With type erasure, this doesn’t happen – type information is thrown away. A cast to void* in C does exactly that. In this regard, the above is fundamentally different from a C# method declaration such as void f(Object x).
(Technically, C# also allows type erasure through unsafe code or marshalling.)
This is as weakly typed as it gets. Everything else is just a matter of static vs. dynamic type checking, i.e. of the time when a type is checked.
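As a rough illustration of what “operating on the raw representation” looks like, here is a sketch in Python using the standard struct module. It is only an analogy: Python makes you ask for the reinterpretation explicitly, whereas a void* in C loses the type silently. The helper name reinterpret_int_as_float is made up for this example.

```python
import struct

def reinterpret_int_as_float(bits: int) -> float:
    """Read the 32-bit pattern of an unsigned integer as an IEEE 754 float."""
    raw = struct.pack("<I", bits)       # four raw bytes, no type attached
    return struct.unpack("<f", raw)[0]  # the same bytes, viewed as a float

value = reinterpret_int_as_float(0x3F800000)
print(value)  # 1.0, because 0x3F800000 is the single-precision encoding of 1.0
```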
Generally strong typing implies that the programming language places severe restrictions on the intermixing that is permitted to occur.
Weak Typing
a = 2
b = "2"
concatenate(a, b) # returns "22"
add(a, b) # returns 4
Strong Typing
a = 2
b = "2"
concatenate(a, b) # Type Error
add(a, b) # Type Error
concatenate(str(a), b) #Returns "22"
add(a, int(b)) # Returns 4
Notice that a weakly typed language can intermix different types without errors. A strongly typed language requires the input types to be the expected types. In a strongly typed language a type can be converted (str(a) converts an integer to a string) or cast (int(b)).
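The “Strong Typing” pseudocode above maps fairly directly onto real Python. A small sketch, where try_add is a helper invented here just to capture the error instead of crashing:

```python
a = 2
b = "2"

def try_add(x, y):
    """Return x + y, or a description of the TypeError if the types don't mix."""
    try:
        return x + y
    except TypeError as exc:
        return f"TypeError: {exc}"

mixed = try_add(a, b)        # int + str is rejected, nothing is coerced
concatenated = str(a) + b    # explicit conversion to str
added = a + int(b)           # explicit conversion to int

print(mixed)
print(concatenated, added)
```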
I would like to contribute to the discussion with my own research on the subject. As others have commented and contributed, I have been reading their answers and following their references, and I have found interesting information. As suggested, it is probable that most of this would be better discussed in the Programmers forum, since it appears to be more theoretical than practical.
A type may be viewed as a set of clothes (or a suit of armor) that
protects an underlying untyped representation from arbitrary or
unintended use. It provides a protective covering that hides the
underlying representation and constrains the way objects may interact
with other objects. In an untyped system untyped objects are naked
in that the underlying representation is exposed for all to see.
Violating the type system involves removing the protective set of
clothing and operating directly on the naked representation.
This statement seems to suggest that weak typing would let us access the inner structure of a type and manipulate it as if it were something else (another type). Perhaps this is what we could do with unsafe code (mentioned by Eric) or with the C type-erased pointers mentioned by Konrad.
The article continues…
Languages in which all expressions are type-consistent are called
strongly typed languages. If a language is strongly typed its compiler
can guarantee that the programs it accepts will execute without type
errors. In general, we should strive for strong typing, and adopt
static typing whenever possible. Note that every statically typed
language is strongly typed but the converse is not necessarily true.
As such, strong typing means the absence of type errors, I can only assume that weak typing means the contrary: the likely presence of type errors. At runtime or compile time? Seems irrelevant here.
Funny thing, as per this definition, a language with powerful type coercions like Perl would be considered strongly typed, because the system is not failing, but it is dealing with the types by coercing them into appropriate and well defined equivalences.
On the other hand, could I say that the allowance of ClassCastException and ArrayStoreException (in Java) and InvalidCastException, ArrayTypeMismatchException (in C#) would indicate a level of weak typing, at least at compile time? Eric’s answer seems to agree with this.
In a second article, named Typeful Programming, found through the references in one of the answers to this question, Luca Cardelli delves into the concept of type violations:
Most system programming languages allow arbitrary type violations,
some indiscriminately, some only in restricted parts of a program.
Operations that involve type violations are called unsound. Type
violations fall in several classes [among which we can mention]:
Basic-value coercions: These include conversions between integers, booleans, characters, sets, etc. There is no need for type violations
here, because built-in interfaces can be provided to carry out the
coercions in a type-sound way.
As such, type coercions like those provided by operators could be considered type violations, but unless they break the consistency of the type system, we might say that they do not lead to a weakly typed system.
Based on this, none of Python, Perl, Java or C# is weakly typed.
Cardelli mentions two type violations that I would very well consider cases of truly weak typing:
Address arithmetic. If necessary, there should be a built-in (unsound) interface, providing the adequate operations on addresses
and type conversions. Various situations involve pointers into the
heap (very dangerous with relocating collectors), pointers to the
stack, pointers to static areas, and pointers into other address
spaces. Sometimes array indexing can replace address arithmetic.
Memory mapping. This involves looking at an area of memory as an unstructured array, although it contains structured data. This is
typical of memory allocators and collectors.
These kinds of things, possible in languages like C (mentioned by Konrad) or through unsafe code in .NET (mentioned by Eric), would truly imply weak typing.
I believe the best answer so far is Eric’s, because the definition of these concepts is very theoretical, and when it comes to a particular language, the interpretations of all these concepts may lead to different debatable conclusions.
Weak typing does indeed mean that a high percentage of types can be implicitly coerced, attempting to guess what the coder intended.
Strong typing means that types are not coerced, or at least coerced less.
Static typing means your variables’ types are determined at compile time.
Many people have recently been confusing “manifestly typed” with “strongly typed”. “Manifestly typed” means that you declare your variables’ types explicitly.
Python is mostly strongly typed, though you can use almost anything in a boolean context, and booleans can be used in an integer context, and you can use an integer in a float context. It is not manifestly typed, because you don’t need to declare your types (except for Cython, which isn’t entirely python, albeit interesting). It is also not statically typed.
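The exceptions mentioned for Python are easy to see in a few lines. A small sketch of the implicit conversions Python does allow (the variable names are just for illustration):

```python
# bool is a subclass of int, so booleans work in an integer context.
is_subclass = issubclass(bool, int)
total = True + 1

# An int is widened to float when mixed with a float.
mixed = 1 + 2.0

# Almost anything can be used in a boolean context; empty containers are falsy.
empty_is_falsy = not []

print(is_subclass, total, mixed, empty_is_falsy)
```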
C and C++ are manifestly typed, statically typed, and somewhat strongly typed, because you declare your types, types are determined at compile time, and you can mix integers and pointers, or integers and doubles, or even cast a pointer to one type into a pointer to another type.
Haskell is an interesting example, because it is not manifestly typed, but it’s also statically and strongly typed.
The strong <=> weak typing is not only about the continuum on how much or how little of the values are coerced automatically by the language for one datatype to another, but how strongly or weakly the actual values are typed. In Python and Java, and mostly in C#, the values have their types set in stone. In Perl, not so much – there are really only a handful of different valuetypes to store in a variable.
Let’s open the cases one by one.
Python
In the Python example 1 + "1", the + operator calls __add__ of type int, giving it the string "1" as an argument; however, this results in NotImplemented:
>>> (1).__add__('1')
NotImplemented
Next, the interpreter tries the __radd__ of str:
>>> '1'.__radd__(1)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'str' object has no attribute '__radd__'
As it fails, the + operator fails with the result TypeError: unsupported operand type(s) for +: 'int' and 'str'. As such, the exception does not say much about strong typing, but the fact that the operator + does not coerce its arguments automatically to the same type is a pointer to the fact that Python is not the most weakly typed language in the continuum.
On the other hand, in Python 'a' * 5 is implemented:
>>> 'a' * 5
'aaaaa'
That is,
>>> 'a'.__mul__(5)
'aaaaa'
The fact that the operation is different requires some strong typing – however the opposite of * coercing the values to numbers before multiplying still would not necessarily make the values weakly typed.
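To make the __add__ / __radd__ lookup described above concrete, here is a sketch with a made-up Metres class. It is not from the original answer; it only shows that a type opts in to mixed-type addition itself, and that returning NotImplemented is what eventually turns into a TypeError.

```python
class Metres:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        if isinstance(other, (int, float)):
            return Metres(self.value + other)
        return NotImplemented  # tell the interpreter to try the other operand

    def __radd__(self, other):
        # Tried for `other + self` after type(other).__add__ gave up.
        return self.__add__(other)

left = Metres(3) + 2    # uses Metres.__add__
right = 2 + Metres(3)   # int.__add__ returns NotImplemented, so __radd__ runs

error = None
try:
    "2" + Metres(3)     # neither side knows how to handle this pairing
except TypeError as exc:
    error = str(exc)

print(left.value, right.value, error is not None)
```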
Java
The Java example, String result = "1" + 1; works only because as a fact of convenience, the operator + is overloaded for strings. The Java + operator replaces the sequence with creating a StringBuilder (see this):
String result = a + b;
// becomes something like
String result = new StringBuilder().append(a).append(b).toString();
This is rather an example of very static typing, with no actual coercion – StringBuilder has a method append(Object) that is specifically used here. The documentation says the following:
Appends the string representation of the Object argument.
The overall effect is exactly as if the argument were converted to a
string by the method String.valueOf(Object), and the characters of
that string were then appended to this character sequence.
Returns the string representation of the Object argument.
[Returns] if the argument is null, then a string equal to "null"; otherwise, the value of obj.toString() is returned.
Thus this is a case of absolutely no coercion by the language – every concern is delegated to the objects themselves.
C#
According to the Jon Skeet answer here, operator + is not even overloaded for the string class – akin to Java, this is just convenience generated by the compiler, thanks to both static and strong typing.
Perl has three built-in data types: scalars, arrays of scalars, and associative arrays of scalars, known as “hashes”. A scalar is a single string (of any size, limited only by the available memory), number, or a reference to something (which will be discussed in perlref). Normal arrays are ordered lists of scalars indexed by number, starting with 0. Hashes are unordered collections of scalar values indexed by their associated string key.
Perl however does not have a separate data type for numbers, booleans, strings, nulls, undefineds, references to other objects etc – it just has one type for all of these, the scalar type; 0 is a scalar value as much as “0” is. A scalar variable that was set as a string can really change into a number, and from there on behave differently from “just a string”, if it is accessed in a numerical context. The scalar can hold anything in Perl; it is as much the object as exists in the system. Whereas in Python the names just refer to the objects, in Perl the scalar values in the names are changeable objects. Furthermore, the object-oriented type system is glued on top of this: there are just 3 datatypes in Perl – scalars, lists and hashes. A user-defined object in Perl is a reference (that is, a pointer to any of the 3 previous) blessed to a package – you can take any such value and bless it to any class at any instant you want.
Perl even allows you to change the classes of values at whim – this is not possible in Python, where to create a value of some class you need to explicitly construct the value belonging to that class with object.__new__ or similar. In Python you cannot really change the essence of the object after its creation; in Perl you can do pretty much anything:
package Foo;
package Bar;
my $val = 42;
# $val is now a scalar value set from double
bless \$val, Foo;
# all references to $val now belong to class Foo
my $obj = \$val;
# now $obj refers to the SV stored in $val
# thus this prints: Foo=SCALAR(0x1c7d8c8)
print \$val, "\n";
# all references to $val now belong to class Bar
bless \$val, Bar;
# thus this prints Bar=SCALAR(0x1c7d8c8)
print \$val, "\n";
# we change the value stored in $val from number to a string
$val = 'abc';
# yet still the SV is blessed: Bar=SCALAR(0x1c7d8c8)
print \$val, "\n";
# and of course, the $obj now refers to a "Bar" even though
# at the time of copying it did refer to a "Foo".
print $obj, "\n";
Thus the type identity is weakly bound to the variable, and it can be changed through any reference on the fly. In fact, if you do
my $another = $val;
\$another does not have the class identity, even though \$val will still give the blessed reference.
TL;DR
There is much more to weak typing in Perl than just automatic coercions; it is more that the types of the values themselves are not set in stone, unlike Python, which is a dynamically yet very strongly typed language. That Python gives a TypeError on 1 + "1" is an indication that the language is strongly typed, even though doing the contrary, something useful as in Java or C#, does not preclude those from being strongly typed languages.
As many others have expressed, the entire notion of “strong” vs “weak” typing is problematic.
As an archetype, Smalltalk is very strongly typed — it will always raise an exception if an operation between two objects is incompatible. However, I suspect few on this list would call Smalltalk a strongly-typed language, because it is dynamically typed.
I find the notion of “static” versus “dynamic” typing more useful than “strong” versus “weak.” A statically-typed language has all the types figured out at compile-time, and the programmer has to explicitly declare if otherwise.
Contrast with a dynamically-typed language, where typing is performed at run-time. This is typically a requirement for polymorphic languages, so that decisions about whether an operation between two objects is legal does not have to be decided by the programmer in advance.
In polymorphic, dynamically-typed languages (like Smalltalk and Ruby), it’s more useful to think of a “type” as a “conformance to protocol.” If an object obeys a protocol the same way another object does — even if the two objects do not share any inheritance or mixins or other voodoo — they are considered the same “type” by the run-time system. More correctly, an object in such systems is autonomous, and can decide if it makes sense to respond to any particular message referring to any particular argument.
Want an object that can make some meaningful response to the message “+” with an object argument that describes the colour blue? You can do that in dynamically-typed languages, but it is a pain in statically-typed languages.
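Python follows the same “conformance to protocol” idea, so here is a sketch in Python rather than Smalltalk or Ruby. All class and function names (Duck, Robot, Colour, Paint, make_it_speak) are invented for the example.

```python
class Duck:
    def speak(self):
        return "Quack"

class Robot:
    def speak(self):
        return "Beep"

def make_it_speak(thing):
    # No shared base class is needed; responding to speak() is enough.
    return thing.speak()

class Colour:
    def __init__(self, name):
        self.name = name

class Paint:
    def __init__(self, colours):
        self.colours = list(colours)

    def __add__(self, other):
        # The object decides for itself whether "+" with this argument makes sense.
        if hasattr(other, "name"):
            return Paint(self.colours + [other.name])
        return NotImplemented

sounds = [make_it_speak(x) for x in (Duck(), Robot())]
mixed = Paint(["red"]) + Colour("blue")
print(sounds, mixed.colours)
```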
I like @Eric Lippert’s answer, but to address the question – strongly typed languages typically have explicit knowledge of the types of variables at each point of the program. Weakly typed languages do not, so they can attempt to perform an operation that may not be possible for a particular type.
I think the easiest way to see this is in a function.
C++:
void func(string a) {...}
The variable a is known to be of type string and any incompatible operation will be caught at compile time.
Python:
def func(a):
    ...
The variable a could be anything and we can have code that calls an invalid method, which will only get caught at runtime.
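A small sketch of what “only caught at runtime” means in practice. The body of func is made up here, since the original leaves it elided:

```python
def func(a):
    # Nothing checks the type of `a` until this line actually runs.
    return a.upper()

ok = func("hello")

failure = None
try:
    func(42)  # an int has no upper() method
except AttributeError as exc:
    failure = str(exc)

print(ok)
print(failure)
```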
I’m hoping someone can provide some insight as to what’s fundamentally different about the Java Virtual Machine that allows it to implement threads nicely without the need for a Global Interpreter Lock (GIL), while Python necessitates such an evil.
Python (the language) doesn’t need a GIL (which is why it can perfectly be implemented on JVM [Jython] and .NET [IronPython], and those implementations multithread freely). CPython (the popular implementation) has always used a GIL for ease of coding (esp. the coding of the garbage collection mechanisms) and of integration of non-thread-safe C-coded libraries (there used to be a ton of those around;-).
The Unladen Swallow project, among other ambitious goals, does plan a GIL-free virtual machine for Python — to quote that site, “In addition, we intend to remove the GIL and fix the state of multithreading in Python. We believe this is possible through the implementation of a more sophisticated GC system, something like IBM’s Recycler (Bacon et al, 2001).”
The JVM (at least HotSpot) does have a concept similar to the “GIL”; it’s just much finer in its lock granularity. Most of this comes from the GCs in HotSpot, which are more advanced.
In CPython it’s one big lock (probably not quite true, but good enough for argument’s sake); in the JVM it’s more spread about, with different concepts depending on where it is used.
Take a look at, for example, vm/runtime/safepoint.hpp in the hotspot code, which is effectively a barrier. Once at a safepoint the entire VM has stopped with regard to java code, much like the python VM stops at the GIL.
In the Java world such VM pausing events are known as “stop-the-world”, at these points only native code that is bound to certain criteria is free running, the rest of the VM has been stopped.
Also, the lack of a coarse lock in Java makes JNI much more difficult to write, as the JVM makes fewer guarantees about its environment for FFI calls, one of the things that CPython makes fairly easy (although not as easy as using ctypes).
There is a comment down below in this blog post http://www.grouplens.org/node/244 that hints at the reason why it was so easy to dispense with a GIL for IronPython or Jython: CPython uses reference counting, whereas the other 2 VMs have garbage collectors.
The exact mechanics of why this is so I don’t get, but it does sound like a plausible reason.
… “Parts of the Interpreter aren’t threadsafe, though mostly because making them all threadsafe by massive lock usage would slow single-threaded extremely (source). This seems to be related to the CPython garbage collector using reference counting (the JVM and CLR don’t, and therefore don’t need to lock/release a reference count every time). But even if someone thought of an acceptable solution and implemented it, third party libraries would still have the same problems.”
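To see the reference counting that the quote refers to, here is a CPython-specific sketch using sys.getrefcount. The absolute numbers depend on the interpreter, so it only looks at the change caused by one extra name; refcount_delta is a made-up helper.

```python
import sys

def refcount_delta():
    """Return how much the reference count grows when one alias is created."""
    obj = object()
    before = sys.getrefcount(obj)
    alias = obj  # a second name bound to the same object
    after = sys.getrefcount(obj)
    return after - before

delta = refcount_delta()
print(delta)
```

Every such assignment updates a counter stored on the object, which is why unsynchronised updates from several threads would be a problem without some kind of lock.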
Python lacks JIT/AOT compilation, and at the time it was written multithreaded processors didn’t exist. Alternatively, you could rewrite everything in Julia, which lacks a GIL, and gain some speed over your Python code. Also, Jython is slower than both CPython and Java. If you want to stick with Python, consider using parallel plugins; you won’t gain an instant speed boost, but you can do parallel programming with the right plugin.
I’m trying to get a better understanding of the difference. I’ve found a lot of explanations online, but they tend towards the abstract differences rather than the practical implications.
Most of my programming experience has been with CPython (dynamic, interpreted) and Java (static, compiled). However, I understand that there are other kinds of interpreted and compiled languages. Aside from the fact that executable files can be distributed from programs written in compiled languages, are there any advantages/disadvantages to each type? Oftentimes, I hear people arguing that interpreted languages can be used interactively, but I believe that compiled languages can have interactive implementations as well, correct?
A compiled language is one where the program, once compiled, is expressed in the instructions of the target machine. For example, an addition “+” operation in your source code could be translated directly to the “ADD” instruction in machine code.
An interpreted language is one where the instructions are not directly executed by the target machine, but instead read and executed by some other program (which normally is written in the language of the native machine). For example, the same “+” operation would be recognised by the interpreter at run time, which would then call its own “add(a,b)” function with the appropriate arguments, which would then execute the machine code “ADD” instruction.
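A toy sketch of that description, written in Python. The tuple-based expression format and the names OPERATIONS and interpret are invented for the example; a real interpreter would parse source text first.

```python
def add(a, b):
    # The interpreter's own "add(a, b)" routine; the host runtime performs
    # the actual machine-level addition.
    return a + b

OPERATIONS = {"+": add}

def interpret(expression):
    """Evaluate a nested tuple such as ("+", 1, ("+", 2, 3))."""
    if isinstance(expression, (int, float)):
        return expression
    operator, left, right = expression
    # The "+" is recognised at run time and dispatched to add().
    return OPERATIONS[operator](interpret(left), interpret(right))

result = interpret(("+", 1, ("+", 2, 3)))
print(result)  # 6
```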
You can do anything that you can do in an interpreted language in a compiled language and vice-versa – they are both Turing complete. Both however have advantages and disadvantages for implementation and use.
I’m going to completely generalise (purists forgive me!) but, roughly, here are the advantages of compiled languages:
Faster performance by directly using the native code of the target machine
Opportunity to apply quite powerful optimisations during the compile stage
And here are the advantages of interpreted languages:
Easier to implement (writing good compilers is very hard!!)
No need to run a compilation stage: can execute code directly “on the fly”
Can be more convenient for dynamic languages
Note that modern techniques such as bytecode compilation add some extra complexity – what happens here is that the compiler targets a “virtual machine” which is not the same as the underlying hardware. These virtual machine instructions can then be compiled again at a later stage to get native code (e.g. as done by the Java JVM JIT compiler).
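CPython is a handy place to watch the bytecode step happen, using the built-in compile and the standard dis module. A minimal sketch; the instruction names differ between CPython versions, so it prints them rather than assuming any particular listing.

```python
import dis

# Step 1: compile source text to a code object for the Python virtual machine.
code = compile("a + b", "<expression>", "eval")

# Step 2: the virtual machine executes that code object.
result = eval(code, {"a": 2, "b": 3})

# The virtual machine instructions themselves (version dependent).
instructions = [ins.opname for ins in dis.get_instructions(code)]
print(result, instructions)
```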
A language itself is neither compiled nor interpreted, only a specific implementation of a language is. Java is a perfect example. There is a bytecode-based platform (the JVM), a native compiler (gcj) and an interpreter for a superset of Java (bsh). So what is Java now? Bytecode-compiled, native-compiled or interpreted?
Other languages, which are compiled as well as interpreted, are Scala, Haskell or Ocaml. Each of these languages has an interactive interpreter, as well as a compiler to byte-code or native machine code.
So generally categorizing languages by “compiled” and “interpreted” doesn’t make much sense.
Once upon a time, long long ago, there lived in the land of computing
interpreters and compilers. All kinds of fuss ensued over the merits of
one over the other. The general opinion at that time was something along the lines of:
Interpreter: Fast to develop (edit and run). Slow to execute because each statement had to be interpreted into
machine code every time it was executed (think of what this meant for a loop executed thousands of times).
Compiler: Slow to develop (edit, compile, link and run. The compile/link steps could take serious time). Fast
to execute. The whole program was already in native machine code.
A one or two order of magnitude difference in the runtime
performance existed between an interpreted program and a compiled program. Other distinguishing
points, run-time mutability of the code for example, were also of some interest but the major
distinction revolved around the run-time performance issues.
Today the landscape has evolved to such an extent that the compiled/interpreted distinction is
pretty much irrelevant. Many
compiled languages call upon run-time services that are not
completely machine code based. Also, most interpreted languages are “compiled” into byte-code
before execution. Byte-code interpreters can be very efficient and rival some compiler generated
code from an execution speed point of view.
The classic difference is that compilers generated native machine code, interpreters read source code and
generated machine code on the fly using some sort of run-time system.
Today there are very few classic interpreters left – almost all of them
compile into byte-code (or some other semi-compiled state) which then runs on a virtual “machine”.
A compiler will produce a binary executable in the target machine’s native executable format. This binary file contains all required resources except for system libraries; it’s ready to run with no further preparation and processing and it runs like lightning because the code is the native code for the CPU on the target machine.
An interpreter will present the user with a prompt in a loop where he can enter statements or code, and upon hitting RUN or the equivalent the interpreter will examine, scan, parse and interpretatively execute each line until the program runs to a stopping point or an error. Because each line is treated on its own and the interpreter doesn’t “learn” anything from having seen the line before, the effort of converting human-readable language to machine instructions is incurred every time for every line, so it’s dog slow. On the bright side, the user can inspect and otherwise interact with his program in all kinds of ways: Changing variables, changing code, running in trace or debug modes… whatever.
With those out of the way, let me explain that life ain’t so simple any more. For instance,
Many interpreters will pre-compile the code they’re given so the translation step doesn’t have to be repeated again and again.
Some compilers compile not to CPU-specific machine instructions but to bytecode, a kind of artificial machine code for a fictitious machine. This makes the compiled program a bit more portable, but requires a bytecode interpreter on every target system.
The bytecode interpreters (I’m looking at Java here) recently tend to re-compile the bytecode they get for the CPU of the target machine just before execution (called JIT). To save time, this is often only done for code that runs often (hotspots).
Some systems that look and act like interpreters (Clojure, for instance) compile any code they get, immediately, but allow interactive access to the program’s environment. That’s basically the convenience of interpreters with the speed of binary compilation.
Some compilers don’t really compile, they just pre-digest and compress code. I heard a while back that’s how Perl works. So sometimes the compiler is just doing a bit of the work and most of it is still interpretation.
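The first point in that list (pre-compiling so the translation step isn't repeated) can be seen with Python's built-in `compile()` and `eval()`. This is only a small sketch; the expression and the function names `run_naive` and `run_precompiled` are invented for the example.

```python
source = "x * x + 1"

def run_naive(x):
    # The source text is parsed and translated again on every call.
    return eval(source, {"x": x})

# Translate the source once into a code object (Python bytecode)...
code_object = compile(source, "<example>", "eval")

def run_precompiled(x):
    # ...and reuse that code object on every call.
    return eval(code_object, {"x": x})

print(run_naive(3), run_precompiled(3))  # 10 10
```

Both functions give the same answer; the second one just skips the repeated parsing, which is where the savings come from when the same code runs many times.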
In the end, these days, interpreting vs. compiling is a trade-off, with time spent (once) compiling often being rewarded by better runtime performance, but an interpretative environment giving more opportunities for interaction. Compiling vs. interpreting is mostly a matter of how the work of “understanding” the program is divided up between different processes, and the line is a bit blurry these days as languages and products try to offer the best of both worlds.
There is no difference, because “compiled programming language” and
“interpreted programming language” aren’t meaningful concepts. Any
programming language, and I really mean any, can be interpreted or
compiled. Thus, interpretation and compilation are implementation
techniques, not attributes of languages.
Interpretation is a technique whereby another program, the
interpreter, performs operations on behalf of the program being
interpreted in order to run it. If you can imagine reading a program
and doing what it says to do step-by-step, say on a piece of scratch
paper, that’s just what an interpreter does as well. A common reason
to interpret a program is that interpreters are relatively easy to
write. Another reason is that an interpreter can monitor what a
program tries to do as it runs, to enforce a policy, say, for
security.
Compilation is a technique whereby a program written in one language
(the “source language”) is translated into a program in another
language (the “object language”), which hopefully means the same thing
as the original program. While doing the translation, it is common for
the compiler to also try to transform the program in ways that will
make the object program faster (without changing its meaning!). A
common reason to compile a program is that there’s some good way to
run programs in the object language quickly and without the overhead
of interpreting the source language along the way.
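Here is a small sketch of compilation in this general sense: a source-to-source translator that also performs one classic meaning-preserving transformation, constant folding. It uses Python's standard `ast` module (`ast.unparse` needs Python 3.9 or later); the class `ConstantFolder` and the function `optimise` are names made up for this example.

```python
import ast

class ConstantFolder(ast.NodeTransformer):
    """Replace + and * applied to two literal numbers with the computed result."""

    def visit_BinOp(self, node):
        self.generic_visit(node)  # fold the operands first
        left, right = node.left, node.right
        both_numbers = (
            isinstance(left, ast.Constant) and isinstance(right, ast.Constant)
            and isinstance(left.value, (int, float))
            and isinstance(right.value, (int, float))
        )
        if both_numbers and isinstance(node.op, ast.Add):
            folded = ast.Constant(value=left.value + right.value)
        elif both_numbers and isinstance(node.op, ast.Mult):
            folded = ast.Constant(value=left.value * right.value)
        else:
            return node  # nothing to fold here
        return ast.copy_location(folded, node)

def optimise(source):
    """Translate source text into equivalent source text with constants folded."""
    tree = ConstantFolder().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    return ast.unparse(tree)

print(optimise("seconds = 60 * 60 * 24 + offset"))  # seconds = 86400 + offset
```

Here both the source language and the object language happen to be Python, which underlines the point that a compiler is simply a translator; the object program means the same thing but does less work at run time.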
You may have guessed, based on the above definitions, that these two
implementation techniques are not mutually exclusive, and may even be
complementary. Traditionally, the object language of a compiler was
machine code or something similar, which refers to any number of
programming languages understood by particular computer CPUs. The
machine code would then run “on the metal” (though one might see, if
one looks closely enough, that the “metal” works a lot like an
interpreter). Today, however, it’s very common to use a compiler to
generate object code that is meant to be interpreted—for example, this
is how Java used to (and sometimes still does) work. There are
compilers that translate other languages to JavaScript, which is then
often run in a web browser, which might interpret the JavaScript, or
compile it to a virtual machine or native code. We also have interpreters
for machine code, which can be used to emulate one kind of hardware on
another. Or, one might use a compiler to generate object code that is
then the source code for another compiler, which might even compile
code in memory just in time for it to run, which in turn . . . you get
the idea. There are many ways to combine these concepts.
The biggest advantage of interpreted source code over compiled source code is PORTABILITY.
If your source code is compiled, you need to compile a different executable for each type of processor and/or platform that you want your program to run on (e.g. one for Windows x86, one for Windows x64, one for Linux x64, and so on). Furthermore, unless your code is completely standards compliant and does not use any platform-specific functions/libraries, you will actually need to write and maintain multiple code bases!
If your source code is interpreted, you only need to write it once and it can be interpreted and executed by an appropriate interpreter on any platform! It’s portable! Note that an interpreter itself is an executable program that is written and compiled for a specific platform.
An advantage of compiled code is that it hides the source code from the end user (which might be intellectual property) because instead of deploying the original human-readable source code, you deploy an obscure binary executable file.
A compiler and an interpreter do the same job: translating a programming language to another programming language, usually closer to the hardware, often directly executable machine code.
Traditionally, “compiled” means that this translation happens all in one go, is done by a developer, and the resulting executable is distributed to users. Pure example: C++.
Compilation usually takes pretty long and tries to do lots of expensive optimization so that the resulting executable runs faster. End users don’t have the tools and knowledge to compile stuff themselves, and the executable often has to run on a variety of hardware, so you can’t do many hardware-specific optimizations. During development, the separate compilation step means a longer feedback cycle.
Traditionally, “interpreted” means that the translation happens “on the fly”, when the user wants to run the program. Pure example: vanilla PHP. A naive interpreter has to parse and translate every piece of code every time it runs, which makes it very slow. It can’t do complex, costly optimizations because they’d take longer than the time saved in execution. But it can fully use the capabilities of the hardware it runs on. The lack of a separate compilation step reduces feedback time during development.
But nowadays “compiled vs. interpreted” is not a black-or-white issue; there are shades in between. Naive, simple interpreters are pretty much extinct. Many languages use a two-step process where the high-level code is translated to a platform-independent bytecode (which is much faster to interpret). Then you have “just in time compilers” which compile code at most once per program run, sometimes cache results, and even intelligently decide to interpret code that’s run rarely, and do powerful optimizations for code that runs a lot. During development, debuggers are capable of switching code inside a running program even for traditionally compiled languages.
First, a clarification: Java is not fully statically compiled and linked in the way C++ is. It is compiled into bytecode, which is then interpreted by a JVM. The JVM can go and do just-in-time compilation to the native machine language, but doesn’t have to do it.
More to the point: I think interactivity is the main practical difference. Since everything is interpreted, you can take a small excerpt of code, parse and run it against the current state of the environment. Thus, if you had already executed code that initialized a variable, you would have access to that variable, etc. It really lends itself well to things like the functional style.
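A rough sketch of what that interactivity looks like, using Python's built-in `exec()` with a shared namespace. The names `environment` and `run_snippet` are made up for this example, and a real REPL does a good deal more (input handling, printing results, error recovery).

```python
# One dictionary plays the role of "the current state of the environment".
environment = {}

def run_snippet(source):
    """Parse and run a small excerpt of code against the shared environment."""
    exec(source, environment)

run_snippet("counter = 10")           # an earlier snippet initialises a variable
run_snippet("counter = counter + 5")  # a later snippet can still see it
print(environment["counter"])  # 15
```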
Interpretation, however, costs a lot, especially when you have a large system with a lot of references and context. By definition, it is wasteful because identical code may have to be interpreted and optimized twice (although most runtimes have some caching and optimizations for that). Still, you pay a runtime cost and often need a runtime environment. You are also less likely to see complex interprocedural optimizations because at present their performance is not sufficiently interactive.
Therefore, for large systems that are not going to change much, and for certain languages, it makes more sense to precompile and prelink everything, do all the optimizations that you can do. This ends up with a very lean runtime that is already optimized for the target machine.
As for generating executables, that has little to do with it, IMHO. You can often create an executable from a compiled language. But you can also create an executable from an interpreted language, except that the interpreter and runtime are already packaged in the executable and hidden from you. This means that you generally still pay the runtime costs (although I am sure that for some languages there are ways to translate everything to a true executable).
I disagree that all languages could be made interactive. Certain languages, like C, are so tied to the machine and the entire link structure that I’m not sure you can build a meaningful fully-fledged interactive version.
It’s rather difficult to give a practical answer because the difference is about the language definition itself. It’s possible to build an interpreter for every compiled language, but it’s not possible to build a compiler for every interpreted language. It’s very much about the formal definition of a language. So, that theoretical informatics stuff nobody likes at university.
An interpreted language such as Python is one where the source code is translated and then executed each time the program runs (CPython, for instance, compiles the source to bytecode and interprets that, rather than producing machine code). This is different from a compiled language such as C, where the source code is only converted to machine code once – the resulting machine code is then executed each time the program runs.
Compiling is the process of creating an executable program from code written in a compiled programming language. Compiling allows the computer to run and understand the program without the need of the programming software used to create it. When a program is compiled it is often compiled for a specific platform (e.g. IBM platform) that works with IBM compatible computers, but not other platforms (e.g. Apple platform).
The first compiler is generally credited to Grace Hopper, who developed the A-0 system for the UNIVAC I in the early 1950s. Today, most high-level languages include their own compiler or have toolkits available that can be used to compile programs. A good example of a compiler used with Java is javac (IDEs such as Eclipse also ship their own Java compiler), and an example of a compiler used with C and C++ is the gcc command. Depending on how big the program is, it should take a few seconds or minutes to compile, and if no errors are encountered while compiling, an executable file is created.
Compiled language: Entire program is translated to machine code at once, then the machine code is run by the CPU.
Interpreted language: Program is read line-by-line and as soon as a line is read the machine instructions for that line are executed by the CPU.
But really, few languages these days are purely compiled or purely interpreted, it often is a mix. For a more detailed description with pictures, see this thread:
cd impls/haxe
# Neko
make all-neko
neko ./stepX_YYY.n
# Python
make all-python
python3 ./stepX_YYY.py
# C++
make all-cpp
./cpp/stepX_YYY
# JavaScript
make all-js
node ./stepX_YYY.js
Hy
The Hy implementation of mal has been tested with Hy 0.13.0.
cd impls/hy
./stepX_YYY.hy
Io
The Io implementation of mal has been tested with Io version 20110905.
cd impls/io
io ./stepX_YYY.io
Janet
The Janet implementation of mal has been tested with Janet version 1.12.2.
cd impls/janet
janet ./stepX_YYY.janet
Java 1.7
The Java implementation of mal requires maven2 to build.
cd impls/java
mvn compile
mvn -quiet exec:java -Dexec.mainClass=mal.stepX_YYY
# OR
mvn -quiet exec:java -Dexec.mainClass=mal.stepX_YYY -Dexec.args="CMDLINE_ARGS"
Java, using Truffle for GraalVM
This Java implementation will run on OpenJDK, but thanks to the Truffle framework it can run as much as 30x faster on GraalVM. It has been tested with OpenJDK 11, GraalVM CE 20.1.0, and GraalVM CE 21.1.0.
cd impls/java-truffle
./gradlew build
STEP=stepX_YYY ./run
JavaScript/Node
cd impls/js
npm install
node stepX_YYY.js
Julia
The Julia implementation of mal requires Julia 0.4.
cd impls/julia
julia stepX_YYY.jl
jq
Tested against version 1.6, with a lot of cheating in the IO department.
cd impls/jq
STEP=stepA_YYY ./run
# with Debug
DEBUG=true STEP=stepA_YYY ./run
Kotlin
The Kotlin implementation of mal has been tested with Kotlin 1.0.
cd impls/kotlin
make
java -jar stepX_YYY.jar
LiveScript
The LiveScript implementation of mal has been tested with LiveScript 1.5.
cd impls/livescript
make
node_modules/.bin/lsc stepX_YYY.ls
Logo
The Logo implementation of mal has been tested with UCBLogo 6.0.
cd impls/logo
logo stepX_YYY.lg
Lua
The Lua implementation of mal has been tested with Lua 5.3.5. The implementation requires luarocks to be installed.
cd impls/lua
make # to build and link linenoise.so and rex_pcre.so
./stepX_YYY.lua
cd impls/miniMAL
# Download miniMAL and dependencies
npm install
export PATH=`pwd`/node_modules/minimal-lisp/:$PATH
# Now run mal implementation in miniMAL
miniMAL ./stepX_YYY
make MAL_IMPL=IMPL "test^mal^step2"
# e.g.
make "test^mal^step2" # js is default
make MAL_IMPL=ruby "test^mal^step2"
make MAL_IMPL=python "test^mal^step2"
Starting the REPL
To start the REPL of an implementation in a specific step, run:
make "repl^IMPL^stepX"
# e.g
make "repl^ruby^step3"
make "repl^ps^step4"
If you omit the step, then stepA is used:
make "repl^IMPL"
# e.g
make "repl^ruby"
make "repl^ps"