
Is it possible to decompile a compiled .pyc file into a .py file?

Is it possible to get some information out of the .pyc file that is generated from a .py file?


Answer 0

Uncompyle6 works for Python 3.x and 2.7. It is the recommended option because it is the most recent tool, aims to unify earlier forks, and focuses on automated unit testing. The GitHub page has more details.

  • If you use Python 3.7+, you could also try decompile3, a fork of Uncompyle6 focusing on 3.7 and higher.
  • Do raise GitHub issues on these projects if needed; both run unit test suites on a range of Python versions.

With these tools, you get your code back including variable names and docstrings, but without the comments.

The older Uncompyle2 supports Python 2.7 only. This worked well for me some time ago to decompile the .pyc bytecode into .py, whereas unpyclib crashed with an exception.
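If you only want to see what information survives compilation, the standard library can already load the code object stored in a .pyc. The sketch below assumes the CPython 3.7+ file layout from PEP 552 (a 16-byte header followed by a marshal-serialized code object). The helper names and the demo source are hypothetical. It illustrates the point above: function names, local variable names, string constants and docstrings are stored in the bytecode, but comments are not.

```python
import importlib.util
import marshal
import os
import py_compile
import tempfile
import types


def load_pyc_code(pyc_path):
    """Return the top-level code object stored in a CPython 3.7+ .pyc file."""
    with open(pyc_path, "rb") as f:
        data = f.read()
    # Bytes 0-3: magic number identifying the bytecode format (Python release).
    if data[:4] != importlib.util.MAGIC_NUMBER:
        raise ValueError("this .pyc was written by a different Python version")
    # Bytes 4-15: flags plus either source mtime/size or a source hash (PEP 552).
    return marshal.loads(data[16:])


def summarize(code):
    """Map each top-level function to its variable names and string constants."""
    info = {}
    for const in code.co_consts:
        if isinstance(const, types.CodeType):  # a compiled function body
            info[const.co_name] = {
                "varnames": const.co_varnames,
                "strings": [c for c in const.co_consts if isinstance(c, str)],
            }
    return info


SOURCE = '''
def greet(name):
    """Say hello."""
    # this comment never reaches the bytecode
    message = "Hello, " + name
    return message
'''

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo.py")
        with open(src, "w") as f:
            f.write(SOURCE)
        pyc = py_compile.compile(src, cfile=os.path.join(tmp, "demo.pyc"), doraise=True)
        print(summarize(load_pyc_code(pyc)))
```

Decompilers start from this same kind of code object and try to rebuild readable source from its instructions.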


Answer 1

Yes, you can get it with unpyclib, which can be found on PyPI.

$ pip install unpyclib

Then you can decompile your .pyc file:

$ python -m unpyclib.application -Dq path/to/file.pyc

Answer 2

You may try Easy Python Decompiler. It’s based on Decompyle++ and Uncompyle2. It supports decompiling Python versions 1.0-3.3.

Note: I am the author of the above tool.


If Python is interpreted, what are .pyc files?

I’ve been given to understand that Python is an interpreted language…
However, when I look at my Python source code I see .pyc files, which Windows identifies as “Compiled Python Files”.

Where do these come in?


Answer 0

They contain byte code, which is what the Python interpreter compiles the source to. This code is then executed by Python’s virtual machine.

Python’s documentation explains the definition like this:

Python is an interpreted language, as opposed to a compiled one, though the distinction can be blurry because of the presence of the bytecode compiler. This means that source files can be run directly without explicitly creating an executable which is then run.
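You can look at the bytecode the compiler produces with the standard dis module. A minimal sketch; the add function is a made-up example, and the exact opcode names differ between CPython versions (for instance BINARY_ADD before 3.11 and BINARY_OP from 3.11 on).

```python
import dis


def add(a, b):
    # Compiled into a few stack-based instructions for CPython's virtual machine.
    return a + b


def opcode_names(func):
    """Return the names of the bytecode instructions in func's code object."""
    return [instr.opname for instr in dis.get_instructions(func)]


if __name__ == "__main__":
    dis.dis(add)  # human-readable listing of what the VM will execute
    print(opcode_names(add))
```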


Answer 1

I’ve been given to understand that Python is an interpreted language…

This popular meme is incorrect, or, rather, constructed upon a misunderstanding of (natural) language levels: a similar mistake would be to say “the Bible is a hardcover book”. Let me explain that simile…

“The Bible” is “a book” in the sense of being a class of (actual, physical objects identified as) books; the books identified as “copies of the Bible” are supposed to have something fundamental in common (the contents, although even those can be in different languages, with different acceptable translations, levels of footnotes and other annotations) — however, those books are perfectly well allowed to differ in a myriad of aspects that are not considered fundamental — kind of binding, color of binding, font(s) used in the printing, illustrations if any, wide writable margins or not, numbers and kinds of builtin bookmarks, and so on, and so forth.

It’s quite possible that a typical printing of the Bible would indeed be in hardcover binding — after all, it’s a book that’s typically meant to be read over and over, bookmarked at several places, thumbed through looking for given chapter-and-verse pointers, etc, etc, and a good hardcover binding can make a given copy last longer under such use. However, these are mundane (practical) issues that cannot be used to determine whether a given actual book object is a copy of the Bible or not: paperback printings are perfectly possible!

Similarly, Python is “a language” in the sense of defining a class of language implementations which must all be similar in some fundamental respects (syntax, and most semantics except the parts where they’re explicitly allowed to differ) but are fully allowed to differ in just about every “implementation” detail: how they deal with the source files they’re given, whether they compile the sources to some lower-level form (and, if so, which form, and whether they save such compiled forms to disk or elsewhere), how they execute said forms, and so forth.

The classical implementation, CPython, is often called just “Python” for short — but it’s just one of several production-quality implementations, side by side with Microsoft’s IronPython (which compiles to CLR codes, i.e., “.NET”), Jython (which compiles to JVM codes), PyPy (which is written in Python itself and can compile to a huge variety of “back-end” forms including “just-in-time” generated machine language). They’re all Python (==”implementations of the Python language”) just like many superficially different book objects can all be Bibles (==”copies of The Bible”).

If you’re interested in CPython specifically: it compiles the source files into a Python-specific lower-level form (known as “bytecode”). It does so automatically when needed (when there is no bytecode file corresponding to a source file, when the bytecode file is out of date relative to the source, or when it was compiled by a different Python version), and it usually saves the bytecode files to disk to avoid recompiling them in the future. By contrast, IronPython will typically compile to CLR code (saving it to disk or not, depending) and Jython to JVM code (saving it to disk or not; it will use the .class extension if it does save it).
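A simplified sketch of that decision for CPython 3.7+, assuming the PEP 552 header layout. The function name and its return labels are hypothetical, and the real logic in importlib handles more cases. Note that CPython does not compare the two files’ dates directly: it compares the source mtime and size recorded in the .pyc header with the current source file. In Python 3 each release also writes to its own file name (the cache tag), so the magic-number check mostly guards against bytecode from a different release reusing the same path.

```python
import importlib.util
import os
import struct


def bytecode_status(source_path):
    """Classify the cached .pyc for source_path (simplified CPython 3.7+ logic)."""
    cached = importlib.util.cache_from_source(source_path)
    if not os.path.exists(cached):
        return "missing"  # nothing cached yet: an import would compile the source
    with open(cached, "rb") as f:
        header = f.read(16)
    if header[:4] != importlib.util.MAGIC_NUMBER:
        return "other version"  # written by an incompatible interpreter
    (flags,) = struct.unpack("<I", header[4:8])
    if flags & 0b1:
        return "hash-based"  # validated against a source hash, not timestamps
    recorded_mtime, recorded_size = struct.unpack("<II", header[8:16])
    st = os.stat(source_path)
    if (recorded_mtime != (int(st.st_mtime) & 0xFFFFFFFF)
            or recorded_size != (st.st_size & 0xFFFFFFFF)):
        return "stale"  # source changed since compilation: recompile
    return "fresh"  # the cached bytecode can be loaded directly
```

The 32-bit masks mirror the fact that the header stores mtime and size as 4-byte fields.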

These lower level forms are then executed by appropriate “virtual machines” also known as “interpreters” — the CPython VM, the .Net runtime, the Java VM (aka JVM), as appropriate.

So, in this sense (what do typical implementations do), Python is an “interpreted language” if and only if C# and Java are: all of them have a typical implementation strategy of producing bytecode first, then executing it via a VM/interpreter.

More likely the focus is on how “heavy”, slow, and high-ceremony the compilation process is. CPython is designed to compile as fast as possible, as lightweight as possible, with as little ceremony as feasible: the compiler does very little error checking and optimization, so it can run fast and in small amounts of memory, which in turn lets it be run automatically and transparently whenever needed, most of the time without the user even needing to be aware that a compilation is going on. Java and C# typically accept more work during compilation (and therefore don’t perform automatic compilation) in order to check errors more thoroughly and perform more optimizations. It’s a continuum of gray scales, not a black-or-white situation, and it would be utterly arbitrary to put a threshold at some given level and say that only above that level you call it “compilation”!-)
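One way to see how little checking CPython’s compiler does: the built-in compile() rejects syntax errors, but a reference to an undefined name compiles without complaint and only fails when the bytecode runs. A small sketch; the check function is hypothetical.

```python
def check(source):
    """Report whether source fails at compile time, at run time, or not at all."""
    try:
        code = compile(source, "<example>", "exec")  # bytecode compilation only
    except SyntaxError:
        return "rejected at compile time"
    try:
        exec(code, {})  # the VM executes the bytecode in a fresh namespace
    except NameError:
        return "compiled fine, failed at run time"
    return "ok"


if __name__ == "__main__":
    print(check("x = ("))
    print(check("print(undefined_name)"))
    print(check("y = 1 + 1"))
```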


Answer 2

There is no such thing as an interpreted language. Whether an interpreter or a compiler is used is purely a trait of the implementation and has absolutely nothing whatsoever to do with the language.

Every language can be implemented by either an interpreter or a compiler. The vast majority of languages have at least one implementation of each type. (For example, there are interpreters for C and C++ and there are compilers for JavaScript, PHP, Perl, Python and Ruby.) Besides, the majority of modern language implementations actually combine both an interpreter and a compiler (or even multiple compilers).

A language is just a set of abstract mathematical rules. An interpreter is one of several concrete implementation strategies for a language. Those two live on completely different abstraction levels. If English were a typed language, the term “interpreted language” would be a type error. The statement “Python is an interpreted language” is not just false (because being false would imply that the statement even makes sense, even if it is wrong), it just plain doesn’t make sense, because a language can never be defined as “interpreted.”

In particular, if you look at the currently existing Python implementations, these are the implementation strategies they are using:

  • IronPython: compiles to DLR trees which the DLR then compiles to CIL bytecode. What happens to the CIL bytecode depends upon which CLI VES you are running on, but Microsoft .NET, GNU Portable.NET and Novell Mono will eventually compile it to native machine code.
  • Jython: interprets Python source code until it identifies the hot code paths, which it then compiles to JVML bytecode. What happens to the JVML bytecode depends upon which JVM you are running on. Maxine will directly compile it to un-optimized native code until it identifies the hot code paths, which it then recompiles to optimized native code. HotSpot will first interpret the JVML bytecode and then eventually compile the hot code paths to optimized machine code.
  • PyPy: compiles to PyPy bytecode, which then gets interpreted by the PyPy VM until it identifies the hot code paths which it then compiles into native code, JVML bytecode or CIL bytecode depending on which platform you are running on.
  • CPython: compiles to CPython bytecode which it then interprets.
  • Stackless Python: compiles to CPython bytecode which it then interprets.
  • Unladen Swallow: compiles to CPython bytecode which it then interprets until it identifies the hot code paths which it then compiles to LLVM IR which the LLVM compiler then compiles to native machine code.
  • Cython: compiles Python code to portable C code, which is then compiled with a standard C compiler.
  • Nuitka: compiles Python code to machine-dependent C++ code, which is then compiled with a standard C compiler.

You might notice that every single one of the implementations in that list (plus some others I didn’t mention, like tinypy, Shedskin or Psyco) has a compiler. In fact, as far as I know, there is currently no Python implementation which is purely interpreted, there is no such implementation planned and there never has been such an implementation.

Not only does the term “interpreted language” not make sense; even if you interpret it as meaning “language with interpreted implementation”, it is clearly not true. Whoever told you that obviously doesn’t know what he is talking about.

In particular, the .pyc files you are seeing are cached bytecode files produced by CPython, Stackless Python or Unladen Swallow.
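If you are unsure which implementation you are running, the standard library can tell you. A minimal sketch; the function name is illustrative. sys.implementation.cache_tag is the tag embedded in cached bytecode file names, and the documentation states that a value of None means module caching is disabled.

```python
import platform
import sys


def describe_implementation():
    """Report the running Python implementation and its bytecode cache tag."""
    return {
        "name": sys.implementation.name,  # lowercase, e.g. 'cpython' or 'pypy'
        "display_name": platform.python_implementation(),  # e.g. 'CPython'
        "cache_tag": sys.implementation.cache_tag,  # e.g. 'cpython-312'; None disables caching
    }


if __name__ == "__main__":
    print(describe_implementation())
```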


Answer 3

These are created by the Python interpreter when a .py file is imported, and they contain the “compiled bytecode” of the imported module/program. The idea is that the “translation” from source code to bytecode (which only needs to be done once) can be skipped on subsequent imports as long as the .pyc is still up to date with the corresponding .py file, which speeds up startup a little. But the bytecode is still interpreted.
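A small sketch of that behavior: import a throwaway module and check where its bytecode was cached. The module name demo_cached_mod is hypothetical. module.__spec__.cached is the path the import system uses for the cached .pyc; the file is not written when bytecode writing is disabled (python -B or PYTHONDONTWRITEBYTECODE).

```python
import importlib
import os
import sys
import tempfile


def import_and_report(directory, name="demo_cached_mod"):
    """Create and import a throwaway module; return its .pyc path and whether it exists."""
    path = os.path.join(directory, name + ".py")
    with open(path, "w") as f:
        f.write("GREETING = 'hi'\n")
    sys.path.insert(0, directory)
    importlib.invalidate_caches()  # let the path finders notice the new file
    try:
        module = importlib.import_module(name)
    finally:
        sys.path.remove(directory)
    cached = module.__spec__.cached  # where the import system caches the bytecode
    return cached, os.path.exists(cached)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(import_and_report(tmp))
```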


Answer 4

To speed up loading modules, Python caches the compiled content of modules in .pyc files.

CPython compiles its source code into “byte code” and, for performance reasons, caches this byte code on the file system, refreshing it whenever the source file has changed. This makes loading Python modules faster because the compilation phase can be bypassed. In Python 2, when your source file is foo.py, CPython caches the byte code in a foo.pyc file right next to the source.

In Python 3 (since 3.2, per PEP 3147), Python’s import machinery writes and searches for byte code cache files in a single directory inside every Python package directory. This directory is called __pycache__, and the cached files carry an interpreter tag in their names, e.g. foo.cpython-312.pyc.
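importlib.util can compute these cache locations for you. A sketch using a hypothetical path (nothing is read or written); the exact tag such as cpython-312 depends on the interpreter, and the .opt-N suffix for optimized bytecode comes from PEP 488. If PYTHONPYCACHEPREFIX is set, the directory part points under that prefix instead of __pycache__.

```python
import importlib.util
import sys


def cache_locations(source):
    """Return the cached bytecode paths CPython would use for a source file."""
    return {
        "default": importlib.util.cache_from_source(source, optimization=""),
        "-OO": importlib.util.cache_from_source(source, optimization=2),
    }


if __name__ == "__main__":
    paths = cache_locations("/project/pkg/foo.py")  # hypothetical source path
    print(sys.implementation.cache_tag)
    for level, path in paths.items():
        print(level, path)  # .../__pycache__/foo.<cache_tag>[.opt-2].pyc
    print(importlib.util.source_from_cache(paths["default"]))  # maps back to foo.py
```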

[Figure: flow chart describing how modules are loaded]

For more information:

  • PEP 3147: PYC Repository Directories
  • “Compiled” Python files (Python tutorial, Modules chapter)


Answer 5

This is for beginners:

Python automatically compiles your script to compiled code, so-called byte code, before running it.

Running a script is not considered an import, so no .pyc will be created for the script itself.

For example, if you have a script file abc.py that imports another module xyz.py, when you run abc.py, xyz.pyc will be created since xyz is imported, but no abc.pyc file will be created since abc.py isn’t being imported.
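This run-versus-import difference is easy to check. The sketch writes two hypothetical files, main.py (the script) and helper.py (the module it imports), runs the script in a child interpreter, and lists the __pycache__ directory afterwards. The -E flag makes the child ignore PYTHON* environment variables such as PYTHONDONTWRITEBYTECODE, so the result does not depend on your shell settings.

```python
import os
import subprocess
import sys
import tempfile


def run_script_and_list_cache(directory):
    """Run main.py as a script; return its output and the files in __pycache__."""
    with open(os.path.join(directory, "helper.py"), "w") as f:
        f.write("def answer():\n    return 42\n")
    with open(os.path.join(directory, "main.py"), "w") as f:
        f.write("import helper\nprint(helper.answer())\n")
    result = subprocess.run(
        [sys.executable, "-E", "main.py"],  # -E: ignore PYTHON* environment variables
        cwd=directory, capture_output=True, text=True, check=True,
    )
    cache_dir = os.path.join(directory, "__pycache__")
    cached = sorted(os.listdir(cache_dir)) if os.path.isdir(cache_dir) else []
    return result.stdout.strip(), cached


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(run_script_and_list_cache(tmp))  # only helper.<tag>.pyc is cached
```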

If you need to create a .pyc file for a module that is not imported, you can use the py_compile and compileall modules.

The py_compile module can manually compile any module. One way is to use the py_compile.compile function in that module interactively:

>>> import py_compile
>>> py_compile.compile('abc.py')

In Python 3 this writes the .pyc into the __pycache__ directory next to abc.py (in Python 2 it wrote abc.pyc in the same location as abc.py); you can override the location with the optional parameter cfile.

You can also automatically compile all files in a directory or directories using the compileall module.

python -m compileall .

If the directory name (the current directory in this example) is omitted, the module compiles everything found on sys.path.
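The same can be done from Python code. A minimal sketch using compileall.compile_dir on a hypothetical directory; the precompile helper is illustrative, and it uses importlib.util.cache_from_source only to list the .pyc files that should now exist.

```python
import compileall
import importlib.util
import os
import tempfile


def precompile(directory):
    """Byte-compile every .py file under directory and return the .pyc paths."""
    if not compileall.compile_dir(directory, quiet=1):  # quiet=1: only report errors
        raise RuntimeError("some files failed to compile")
    pyc_paths = []
    for root, _dirs, files in os.walk(directory):
        for name in files:
            if name.endswith(".py"):
                pyc_paths.append(importlib.util.cache_from_source(os.path.join(root, name)))
    return pyc_paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        with open(os.path.join(tmp, "top.py"), "w") as f:
            f.write("X = 1\n")
        print(precompile(tmp))
```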


Answer 6

Python (at least its most common implementation) follows a pattern of compiling the original source to byte codes, then interpreting the byte codes on a virtual machine. This means that (again, in the most common implementation) it is neither a pure interpreter nor a pure compiler.

The other side of this is, however, that the compilation process is mostly hidden — the .pyc files are basically treated like a cache; they speed things up, but you normally don’t have to be aware of them at all. It automatically invalidates and re-loads them (re-compiles the source code) when necessary based on file time/date stamps.

About the only time I’ve seen a problem with this was when a compiled bytecode file somehow got a timestamp well into the future, which meant it always looked newer than the source file. Since it looked newer, the source file was never recompiled, so no matter what changes you made, they were ignored…
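Since CPython 3.7 (PEP 552) there is also a different technique that avoids timestamps altogether: a checked hash-based .pyc records a hash of the source instead of its modification time, and the import system re-hashes the source to validate it, so a timestamp in the future cannot make stale bytecode look current. A minimal sketch with py_compile; the helper name and the demo file are hypothetical.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile


def compile_checked_hash(source_path):
    """Compile source_path to a checked hash-based .pyc; return (flags, stored hash)."""
    pyc = py_compile.compile(
        source_path,
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )
    with open(pyc, "rb") as f:
        header = f.read(16)
    (flags,) = struct.unpack("<I", header[4:8])
    # Flag bit 0: hash-based pyc. Bit 1: verify the hash against the source on import.
    return flags, header[8:16]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "demo_hash.py")
        with open(src, "w") as f:
            f.write("VALUE = 1\n")
        flags, stored = compile_checked_hash(src)
        with open(src, "rb") as f:
            print(flags, stored == importlib.util.source_hash(f.read()))
```

The same mode is available from the command line via python -m compileall --invalidation-mode checked-hash.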


Answer 7

A Python *.py file is just a text file in which you write some lines of code. Suppose you execute this file with, say, “python filename.py”.

This command invokes the Python Virtual Machine (PVM), which has two components: a “compiler” and an “interpreter”. The interpreter cannot directly read the text in the *.py file, so the text is first converted into byte code targeted at the PVM (not at the hardware), and the PVM then executes this byte code. A *.pyc file is also generated for each module imported along the way, whether the import happens in the interactive shell or in another file.

If that *.pyc file has already been generated, then the next time the module is imported (and its source has not changed), Python loads the *.pyc file directly without recompiling it, which saves some processor cycles.

Once the *.pyc file is generated, the *.py file is not needed to run it, unless you edit it. In Python 3, however, this only holds for a .pyc placed where the .py would be (for example, one written by python -m compileall -b); a .pyc inside __pycache__ is ignored when its .py source is missing.
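Whether a .pyc can be imported without its .py depends on where it lives. A sketch with two hypothetical modules: both are compiled and their .py files deleted, but only the one whose .pyc sits in the legacy location next to where the source was (as python -m compileall -b, or py_compile’s cfile argument, would write it) can still be imported. Per PEP 3147, a __pycache__ entry is not used when the source is missing.

```python
import importlib
import os
import py_compile
import sys
import tempfile


def import_without_source(directory):
    """Compile two modules, delete their sources, and report which still import."""
    layouts = {"demo_pycache_only": False, "demo_legacy_pyc": True}
    for name, legacy in layouts.items():
        src = os.path.join(directory, name + ".py")
        with open(src, "w") as f:
            f.write("VALUE = 'loaded from bytecode'\n")
        # Legacy layout: write name.pyc beside the source instead of into __pycache__.
        cfile = os.path.join(directory, name + ".pyc") if legacy else None
        py_compile.compile(src, cfile=cfile, doraise=True)
        os.remove(src)  # keep only the bytecode
    sys.path.insert(0, directory)
    importlib.invalidate_caches()
    results = {}
    try:
        for name in layouts:
            try:
                results[name] = importlib.import_module(name).VALUE
            except ModuleNotFoundError:
                results[name] = None  # the __pycache__ copy is ignored without source
    finally:
        sys.path.remove(directory)
    return results


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(import_without_source(tmp))
```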


Answer 8

Python code goes through two stages. The first step compiles the code into bytecode, which is cached in .pyc files for imported modules. Then this bytecode is interpreted by the CPython interpreter. Please refer to this link, where the process of code compilation and execution is explained in easy terms.
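The two stages can be reproduced by hand with the built-in compile() and exec() functions. This sketch keeps the bytecode in memory; a .pyc file holds this same kind of code object serialized with the marshal module, plus a small header. The function name and sample source are illustrative.

```python
def two_stage_run(source):
    """Stage 1: compile source to a code object. Stage 2: execute that bytecode."""
    code = compile(source, "<two-stage demo>", "exec")
    namespace = {}
    exec(code, namespace)  # CPython's interpreter loop runs the compiled bytecode
    return code, namespace


if __name__ == "__main__":
    code, namespace = two_stage_run("result = sum(range(5))")
    print(type(code).__name__, namespace["result"])
```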