Tag archive: python-2.x

How to run multiple Python versions on Windows

Question: How to run multiple Python versions on Windows

I had two versions of Python installed on my machine (versions 2.6 and 2.5). I want to run 2.6 for one project and 2.5 for another.

How can I specify which I want to use?

I am working on Windows XP SP2.


Answer 0

Running a different copy of Python is as easy as starting the correct executable. You mention that you’ve started a python instance, from the command line, by simply typing python.

Under Windows, this searches the directories listed in the %PATH% environment variable for an executable that matches the name given: a batch file (.bat), a command file (.cmd), or some other executable type (the recognized extensions are controlled by the PATHEXT environment variable). The first matching file found is the one that runs.

Now, if you’ve installed two Python versions, 2.5 and 2.6, the path will have both of their directories in it, something like PATH=c:\python\2.5;c:\python\2.6, but Windows will stop examining the path when it finds a match.

What you really need to do is to explicitly call one or both of the applications, such as c:\python\2.5\python.exe or c:\python\2.6\python.exe.

The other alternative is to create a shortcut to the respective python.exe calling one of them python25 and the other python26; you can then simply run python25 on your command line.
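
To make the lookup order concrete, here is a minimal Python 3 sketch of the first-match rule described in this answer. It is a simplified illustration, not how Windows is implemented: the helper name first_match and the temporary directories are made up for the example, and it ignores details such as the current directory and case-insensitive file names.

```python
import os
import tempfile


def first_match(name, path_dirs, pathext=(".com", ".exe", ".bat", ".cmd")):
    """Return the first file named name + extension found in path_dirs, else None.

    Hypothetical helper: directories are tried in order and the search
    stops at the first hit, mirroring the first-match rule.
    """
    for directory in path_dirs:
        for ext in pathext:
            candidate = os.path.join(directory, name + ext)
            if os.path.isfile(candidate):
                return candidate
    return None


def demo():
    """Build two fake installs and show that the earlier directory wins."""
    root = tempfile.mkdtemp()
    py25 = os.path.join(root, "python25")
    py26 = os.path.join(root, "python26")
    for directory in (py25, py26):
        os.makedirs(directory)
        # An empty placeholder file stands in for the real interpreter.
        open(os.path.join(directory, "python.exe"), "w").close()
    found_25_first = first_match("python", [py25, py26])
    found_26_first = first_match("python", [py26, py25])
    return py25, py26, found_25_first, found_26_first


if __name__ == "__main__":
    print(demo())
```

Swapping the order of the two directories changes which python.exe is returned, which is why calling the full path (or a uniquely named shortcut) is the more predictable option.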


Answer 1

Adding two more solutions to the problem:

  • Use pylauncher (if you have Python 3.3 or newer there’s no need to install it as it comes with Python already) and either add shebang lines to your scripts;

#! c:\[path to Python 2.5]\python.exe – for scripts you want to be run with Python 2.5
#! c:\[path to Python 2.6]\python.exe – for scripts you want to be run with Python 2.6

or, instead of running the python command, run the pylauncher command (py), specifying which version of Python you want;

py -2.6 – version 2.6
py -2 – latest installed version 2.x
py -3.4 – version 3.4
py -3 – latest installed version 3.x

  • Install virtualenv and create two virtualenvs;

virtualenv -p c:\[path to Python 2.5]\python.exe [path where you want to have virtualenv using Python 2.5 created]\[name of virtualenv]

virtualenv -p c:\[path to Python 2.6]\python.exe [path where you want to have virtualenv using Python 2.6 created]\[name of virtualenv]

for example

virtualenv -p c:\python2.5\python.exe c:\venvs\2.5

virtualenv -p c:\python2.6\python.exe c:\venvs\2.6

then you can activate the first and work with Python 2.5 like this

c:\venvs\2.5\Scripts\activate

and when you want to switch to Python 2.6 you do

deactivate
c:\venvs\2.6\Scripts\activate
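
Once an environment is activated, a project script can double-check that it is running under the interpreter it expects. The following is only a sketch of that idea; the helper name running_expected_python is hypothetical and not part of virtualenv.

```python
import sys


def running_expected_python(major, minor, version_info=None):
    """Return True if the interpreter's major.minor matches the request.

    version_info defaults to the running interpreter; passing a tuple
    explicitly is only there to make the helper easy to try out.
    """
    info = sys.version_info if version_info is None else version_info
    return (info[0], info[1]) == (major, minor)
```

A script meant for the 2.5 environment could, for example, call sys.exit with a short message when running_expected_python(2, 5) returns False.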

Answer 2

From Python 3.3 on, there is an official Python launcher for Windows (http://www.python.org/dev/peps/pep-0397/). You can now use a #!pythonX line on Windows as well to select the interpreter version. See my other comment for more details, or read PEP 397.

Summary: py script.py launches the Python version stated in the #! line, or Python 2 if the #! line is missing. py -3 script.py launches Python 3.
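
As a rough illustration of how a version can be read from a #! line, here is a small Python 3 sketch. It is not the launcher's actual code: the function name requested_version is made up, it only recognizes the virtual commands listed in PEP 397 (python, /usr/bin/python, /usr/local/bin/python, /usr/bin/env python), and it ignores any arguments after the command.

```python
import re

# Virtual commands from PEP 397, each optionally followed by a version suffix.
_SHEBANG = re.compile(
    r"^#!\s*(?:/usr/bin/env\s+|/usr/bin/|/usr/local/bin/)?"
    r"python(\d+(?:\.\d+)?)?\s*$"
)


def requested_version(first_line):
    """Return the version suffix ('3', '2.6', ...) or None if none is given.

    None is also returned for lines that are not a recognized virtual
    command, such as a full path to a specific python.exe.
    """
    match = _SHEBANG.match(first_line.strip())
    if match is None:
        return None
    return match.group(1)
```

For instance, requested_version("#!python3") gives "3", while a line without a version suffix gives None and leaves the choice to the launcher's default.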


Answer 3

As per @alexander, you can make a set of symbolic links like the ones below. Put them somewhere that is included in your path so they can be easily invoked:

> cd c:\bin
> mklink python25.exe c:\python25\python.exe
> mklink python26.exe c:\python26\python.exe

As long as c:\bin (or wherever you placed them) is in your path, you can now run

> python25

Answer 4

  1. install python

    • C:\Python27
    • C:\Python36
  2. environment variable

    • PYTHON2_HOME: C:\Python27
    • PYTHON3_HOME: C:\Python36
    • Path: %PYTHON2_HOME%;%PYTHON2_HOME%\Scripts;%PYTHON3_HOME%;%PYTHON3_HOME%\Scripts;
  3. file rename

    • C:\Python27\python.exe → C:\Python27\python2.exe
    • C:\Python36\python.exe → C:\Python36\python3.exe
  4. pip

    • python2 -m pip install package
    • python3 -m pip install package

Answer 5

For example, for version 3.6, type py -3.6. If you have both 32-bit and 64-bit versions installed, you can type py -3.6-64 or py -3.6-32.
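
If the launcher is started from another Python program, the version flag can be assembled the same way. This is a minimal Python 3 sketch; the helper name launcher_command is hypothetical, and it only builds the argument list from the flags shown in this answer.

```python
def launcher_command(version, bits=None, script=None):
    """Build an argument list such as ['py', '-3.6-64', 'script.py']."""
    flag = "-" + version
    if bits is not None:
        # Append the -32 / -64 suffix described in the answer.
        flag += "-%d" % bits
    command = ["py", flag]
    if script is not None:
        command.append(script)
    return command
```

On a Windows machine with the launcher installed, the resulting list could be handed to subprocess.call.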


Answer 6

When you install Python, it will not overwrite other installs of other major versions. So installing Python 2.5.x will not overwrite Python 2.6.x, although installing 2.6.6 will overwrite 2.6.5.

So you can just install it. Then you call the Python version you want. For example:

C:\Python2.5\Python.exe

for Python 2.5 on windows and

C:\Python2.6\Python.exe

for Python 2.6 on windows, or

/usr/local/bin/python-2.5

or

/usr/local/bin/python-2.6

on Unix (including Linux and OS X).

When you install on Unix (including Linux and OS X), you will get a generic python command installed, which will be the last one you installed. This is mostly not a problem, as most scripts will explicitly call /usr/local/bin/python2.5 or something similar just to protect against that. But if you don’t want that to happen, and you probably don’t, you can install it like this:

./configure
make
sudo make altinstall

Note the “altinstall”: it means Python will be installed, but it will not replace the python command.

On Windows you don’t get a global python command as far as I know so that’s not an issue.


Answer 7

I strongly recommend the pyenv-win project.

Thanks to kirankotari’s work, we now have a Windows version of pyenv.


Answer 8

Here’s a quick hack:

  1. Go to the directory of the version of python you want to run
  2. Right click on python.exe
  3. Select ‘Create Shortcut’
  4. Give that shortcut a name to call it by (I use p27, p33, etc.)
  5. Move that shortcut to your home directory (C:\Users\Your name)
  6. Open a command prompt and enter name_of_your_shortcut.lnk (I use p27.lnk)

Answer 9

Copy c:\python27\bin\python.exe as python2.7.exe

Copy c:\python34\bin\python.exe as python3.4.exe

They are all in the system path, so choose the version you want to run:

C:\Users\username>python2.7
Python 2.7.8 (default, Jun 30 2014, 16:03:49) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

C:\Users\username>python3.4
Python 3.4.1 (v3.4.1:c0e311e010fc, May 18 2014, 10:38:22) [MSC v.1600 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>>

Answer 10

Using a batch file to switch is easy and efficient on Windows 7. I use this:

In the environment variable dialog (C:\Windows\System32\SystemPropertiesAdvanced.exe),

In the user variables section:

  1. added %pathpython% to the path environment variable

  2. removed any references to Python paths

In the system variables section:

  1. removed any references to Python paths

I created a batch file for every Python installation (example for 3.4 x64):

Name = SetPathPython34x64 !!! ToExecuteAsAdmin.bat ;-) just to remember.

Content of the file =

     Set PathPython=C:\Python36AMD64\Scripts\;C:\Python36AMD64\;C:\Tcl\bin

     setx PathPython %PathPython%

To switch between versions, I execute the batch file in admin mode.

!!!!! The changes take effect only in command prompt windows opened AFTERWARDS. !!!

So I have exact control over it.


Answer 11

The easiest way to run multiple versions of Python on Windows is described below:

1) Download the latest versions of Python from python.org/downloads, selecting the relevant version for your system.

2) Run the Python 3 installer and tick the Add python 3.x to the path checkbox so that the path is set automatically. For Python 2, open the Python 2 installer and select whatever preferences you want, but remember to set Add python.exe to path to Will be installed on local hard drive. Then click Next and wait for the installer to finish.

3) When both installations are complete, right-click My Computer, go to Properties, select Advanced system settings, and go to Environment Variables. Click New under System variables and add a new system variable named PY_PYTHON with the value 3. Click OK and you should be done.

4) To test this, open the command prompt and type python or py. It should open Python 3.

5) Exit Python 3 by typing exit(). Now type py -2; it should open Python 2.

If none of this works, restart the computer. If the problem still persists, uninstall everything and repeat the steps.

Thanks.


Answer 12

You can create different Python development environments graphically from Anaconda Navigator. I had the same problem while working with different Python versions, so I used Anaconda Navigator to create separate development environments, each with a different Python version.

Here is the help documentation for this.

https://docs.anaconda.com/anaconda/navigator/tutorials/manage-environments/


Answer 13

Using the Rapid Environment Editor, you can push the directory of the desired Python installation to the top. For example, to start python from the c:\Python27 directory, ensure that the c:\Python27 directory comes before (above) the c:\Python36 directory in the Path environment variable. In my experience, the first python executable found in the Path is the one that is executed. For example, I have MSYS2 installed with Python27, and since I’ve added C:\MSYS2 to the path before C:\Python36, the python.exe from the C:\MSYS2…. folder is the one being executed.


Answer 14

Just call the correct executable


Why shouldn’t we use sys.setdefaultencoding(“utf-8”) in a py script?

Question: Why shouldn’t we use sys.setdefaultencoding(“utf-8”) in a py script?

I have seen a few py scripts which use this at the top of the script. In what cases should one use it?

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

Answer 0

As per the documentation: This allows you to switch from the default ASCII to other encodings such as UTF-8, which the Python runtime will use whenever it has to decode a string buffer to unicode.

This function is only available at Python start-up time, when Python scans the environment. It has to be called in a system-wide module, sitecustomize.py. After this module has been evaluated, the setdefaultencoding() function is removed from the sys module.

The only way to actually use it is with a reload hack that brings the attribute back.

Also, the use of sys.setdefaultencoding() has always been discouraged, and it is no longer available in py3k. The default encoding of py3k is hard-wired to “utf-8”, and attempting to change it raises an error.

I suggest some pointers for reading:


Answer 1

tl;dr

The answer is NEVER! (unless you really know what you’re doing)

9 times out of 10, the problem can be resolved with a proper understanding of encoding/decoding.

1 in 10 people have an incorrectly defined locale or environment and need to set:

PYTHONIOENCODING="UTF-8"  

in their environment to fix console printing problems.

What does it do?

sys.setdefaultencoding("utf-8") (struck through to avoid re-use) changes the default encoding/decoding used whenever Python 2.x needs to convert a unicode() to a str() (and vice versa) and the encoding is not given, i.e.:

str(u"\u20AC")
unicode("€")
"{}".format(u"\u20AC") 

In Python 2.x, the default encoding is set to ASCII and the above examples will fail with:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

(My console is configured as UTF-8, so "€" = '\xe2\x82\xac', hence exception on \xe2)

or

UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)

sys.setdefaultencoding("utf-8") will allow these to work for me, but won’t necessarily work for people who don’t use UTF-8. The default of ASCII ensures that assumptions about encoding are not baked into code.
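
The Python 2 snippets above cannot be run under Python 3, but the same failures can be reproduced there by making the conversions explicit. The following Python 3 sketch is only an illustration under that assumption; the helper name fails_with is made up for the example.

```python
euro = "\u20ac"  # the euro sign used in the examples above

# Explicit UTF-8 encoding always works for this character.
utf8_bytes = euro.encode("utf-8")


def fails_with(exc_type, func):
    """Return True if calling func() raises exc_type."""
    try:
        func()
    except exc_type:
        return True
    return False


# ASCII cannot represent the character, in either direction.
ascii_encode_fails = fails_with(UnicodeEncodeError, lambda: euro.encode("ascii"))
ascii_decode_fails = fails_with(UnicodeDecodeError, lambda: utf8_bytes.decode("ascii"))
```

Naming the codec at each conversion, as done here with "utf-8", is the explicit alternative to changing the process-wide default.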

Console

sys.setdefaultencoding("utf-8") also has a side effect of appearing to fix sys.stdout.encoding, used when printing characters to the console. Python uses the user’s locale (Linux/OS X/Un*x) or codepage (Windows) to set this. Occasionally, a user’s locale is broken and just requires PYTHONIOENCODING to fix the console encoding.

Example:

$ export LANG=en_GB.gibberish
$ python
>>> import sys
>>> sys.stdout.encoding
'ANSI_X3.4-1968'
>>> print u"\u20AC"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)
>>> exit()

$ PYTHONIOENCODING=UTF-8 python
>>> import sys
>>> sys.stdout.encoding
'UTF-8'
>>> print u"\u20AC"
€

What’s so bad with sys.setdefaultencoding(“utf-8”)?

People have been developing against Python 2.x for 16 years on the understanding that the default encoding is ASCII. UnicodeError exception handling methods have been written to handle string to Unicode conversions on strings that are found to contain non-ASCII.

From https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/

def welcome_message(byte_string):
    try:
        return u"%s runs your business" % byte_string
    except UnicodeError:
        return u"%s runs your business" % unicode(byte_string,
            encoding=detect_encoding(byte_string))

print(welcome_message(u"Angstrom (Å®)".encode("latin-1")))

Before setting defaultencoding, this code would be unable to decode the “Å” in the ascii encoding and would then enter the exception handler to guess the encoding and properly turn it into unicode, printing: Angstrom (Å®) runs your business. Once you’ve set the defaultencoding to utf-8, the code will find that the byte_string can be interpreted as utf-8, so it will mangle the data and return this instead: Angstrom (Ů) runs your business.
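
The mangling can be reproduced in Python 3 with explicit decode calls. This is a sketch rather than the quoted author's code: the helper decode_ascii_first and its fixed latin-1 fallback are assumptions that stand in for the detect_encoding step in the example above.

```python
original = "Angstrom (\u00c5\u00ae)"      # "Angstrom (Å®)"
byte_string = original.encode("latin-1")   # the two symbols become b"\xc5\xae"


def decode_ascii_first(data, fallback="latin-1"):
    """Try ASCII, then fall back to an assumed encoding on failure."""
    try:
        return data.decode("ascii")
    except UnicodeDecodeError:
        return data.decode(fallback)


# ASCII fails, so the fallback runs and the text survives intact.
careful = decode_ascii_first(byte_string)

# Treating the same bytes as UTF-8 succeeds silently but yields "Ů".
mangled = byte_string.decode("utf-8")
```

The bytes 0xC5 0xAE happen to form one valid UTF-8 sequence (U+016E), which is why no error is raised and the wrong text slips through.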

Changing what should be a constant will have dramatic effects on modules you depend upon. It’s better to just fix the data coming in and out of your code.

Example problem

While the setting of defaultencoding to UTF-8 isn’t the root cause in the following example, it shows how problems are masked and how, when the input encoding changes, the code breaks in an unobvious way: UnicodeDecodeError: ‘utf8’ codec can’t decode byte 0x80 in position 3131: invalid start byte


Answer 2

#!/usr/bin/env python
#-*- coding: utf-8 -*-
u = u'moçambique'
print u.encode("utf-8")
print u

chmod +x test.py
./test.py
moçambique
moçambique

./test.py > output.txt
Traceback (most recent call last):
  File "./test.py", line 5, in <module>
    print u
UnicodeEncodeError: 'ascii' codec can't encode character 
u'\xe7' in position 2: ordinal not in range(128)

It works in the shell, but not when stdout is redirected, so encoding explicitly is one workaround for writing to stdout.

I took another approach: the script refuses to run if sys.stdout.encoding is not defined; in other words, you need to export PYTHONIOENCODING=UTF-8 first in order to write to stdout.

import sys
if (sys.stdout.encoding is None):            
    print >> sys.stderr, "please set python env PYTHONIOENCODING=UTF-8, example: export PYTHONIOENCODING=UTF-8, when write to stdout." 
    exit(1)


So, using the same example:

export PYTHONIOENCODING=UTF-8
./test.py > output.txt

will work.
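
An alternative to aborting is to pick a fallback codec when the stream reports no encoding. Below is a minimal Python 3 sketch of that idea; the helper name encode_for_stream and the UTF-8 default are assumptions, and the stream's encoding is passed in as a plain argument so the logic can be tried without redirecting anything.

```python
def encode_for_stream(text, stream_encoding, default="utf-8"):
    """Encode text with the stream's encoding, or a default if it has none.

    stream_encoding plays the role of sys.stdout.encoding, which the
    check above treats as possibly being None.
    """
    return text.encode(stream_encoding or default)
```

With stream_encoding set to None the text is written as UTF-8; with a real encoding such as "latin-1" that encoding is used instead.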


Answer 3

  • The first danger lies in reload(sys).

    When you reload a module, you actually get two copies of the module in your runtime. The old module is a Python object like everything else, and stays alive as long as there are references to it. So, half of the objects will be pointing to the old module, and half to the new one. When you make some change, you will never see it coming when some random object doesn’t see the change:

    (This is IPython shell)
    
    In [1]: import sys
    
    In [2]: sys.stdout
    Out[2]: <colorama.ansitowin32.StreamWrapper at 0x3a2aac8>
    
    In [3]: reload(sys)
    <module 'sys' (built-in)>
    
    In [4]: sys.stdout
    Out[4]: <open file '<stdout>', mode 'w' at 0x00000000022E20C0>
    
    In [11]: import IPython.terminal
    
    In [14]: IPython.terminal.interactiveshell.sys.stdout
    Out[14]: <colorama.ansitowin32.StreamWrapper at 0x3a9aac8>
    
  • Now, sys.setdefaultencoding() proper

    All that it affects is implicit conversion str<->unicode. Now, utf-8 is the sanest encoding on the planet (backward-compatible with ASCII and all), the conversion now “just works”, what could possibly go wrong?

    Well, anything. And that is the danger.

    • There may be some code that relies on the UnicodeError being thrown for non-ASCII input, or does the transcoding with an error handler, which now produces an unexpected result. And since all code is tested with the default setting, you’re strictly on “unsupported” territory here, and no-one gives you guarantees about how their code will behave.
    • The transcoding may produce unexpected or unusable results if not everything on the system uses UTF-8 because Python 2 actually has multiple independent “default string encodings”. (Remember, a program must work for the customer, on the customer’s equipment.)
      • Again, the worst thing is you will never know that because the conversion is implicit — you don’t really know when and where it happens. (Python Zen, koan 2 ahoy!) You will never know why (and if) your code works on one system and breaks on another. (Or better yet, works in IDE and breaks in console.)

Why doesn’t print work in a lambda?

Question: Why doesn’t print work in a lambda?

Why doesn’t this work?

lambda: print "x"

Is this not a single statement, or is it something else? The documentation seems a little sparse on what is allowed in a lambda…


Answer 0

A lambda’s body has to be a single expression. In Python 2.x, print is a statement. However, in Python 3, print is a function (and a function application is an expression, so it will work in a lambda). You can (and should, for forward compatibility :) use the back-ported print function if you are using the latest Python 2.x:

In [1324]: from __future__ import print_function

In [1325]: f = lambda x: print(x)

In [1326]: f("HI")
HI
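
For comparison, here is the same idea as a self-contained Python 3 script, where no __future__ import is needed. It is a small sketch; the names show and captured are made up, and the output is captured in a buffer only so that the result can be inspected.

```python
import contextlib
import io

# Valid in Python 3: print(...) is an ordinary function call, i.e. an expression.
show = lambda x: print(x)

captured = io.StringIO()
with contextlib.redirect_stdout(captured):
    result = show("HI")  # print returns None, so the lambda does too
```

After this runs, captured holds the printed line and result is None.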

Answer 1

In cases where I am using this for simple stubbing out I use this:

import sys

fn = lambda x: sys.stdout.write(str(x) + "\n")

which works perfectly.


Answer 2

What you’ve written is equivalent to

def anon():
    return print "x"

which also results in a SyntaxError: Python 2.x doesn’t let you use print as a value. In Python 3 you could say

lambda: print('hi')

and it would work because they’ve changed print to be a function instead of a statement.


Answer 3

The body of a lambda has to be an expression that returns a value. print, being a statement, doesn’t return anything, not even None. Similarly, you can’t assign the result of print to a variable:

>>> x = print "hello"
  File "<stdin>", line 1
    x = print "hello"
            ^
SyntaxError: invalid syntax

You also can’t put a variable assignment in a lambda, since assignments are statements:

>>> lambda y: (x = y)
  File "<stdin>", line 1
    lambda y: (x = y)
                 ^
SyntaxError: invalid syntax
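
One way to check whether a piece of source is an expression (and therefore allowed as a lambda body) is to compile it in "eval" mode, which accepts only expressions. This is a Python 3 sketch; the helper name is_expression is made up for the example.

```python
def is_expression(source):
    """Return True if source compiles as a single expression."""
    try:
        # "eval" mode rejects statements such as assignments.
        compile(source, "<example>", "eval")
    except SyntaxError:
        return False
    return True
```

Under Python 3, a lambda containing an assignment is rejected in the same way as the interactive examples above, while a lambda that calls print('x') is accepted.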

Answer 4

You can do something like this.

Create a function to transform print statement into a function:

def printf(text):
   print text

And print it:

lambda: printf("Testing")

Answer 5

With Python 3.x, print CAN work in a lambda, without changing the semantics of the lambda.

Used in a special way this is very handy for debugging. I post this ‘late answer’, because it’s a practical trick that I often use.

Suppose your ‘uninstrumented’ lambda is:

lambda: 4

Then your ‘instrumented’ lambda is:

lambda: (print (3), 4) [1]
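
A quick Python 3 check shows that the instrumented version still returns the original value. The names plain, instrumented and log are made up for this sketch, and the output is captured in a buffer only so that it can be inspected.

```python
import contextlib
import io

plain = lambda: 4
# The tuple is built left to right, so print runs first; [1] then selects 4.
instrumented = lambda: (print(3), 4)[1]

log = io.StringIO()
with contextlib.redirect_stdout(log):
    value = instrumented()
```

The trick relies on print returning None as the first tuple element, which is simply discarded by the index.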

Answer 6

The body of a lambda has to be a single expression. print is a statement, so it’s out, unfortunately.


Answer 7

Here you can see an answer to your question: it says that print is not an expression in Python.


Python: Using .format() on a Unicode-escaped string

Question: Python: Using .format() on a Unicode-escaped string

I am using Python 2.6.5. My code requires the use of the “more than or equal to” sign. Here it goes:

>>> s = u'\u2265'
>>> print s
≥
>>> print "{0}".format(s)
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)

Why do I get this error? Is there a right way to do this? I need to use the .format() function.


Answer 0

Just make the second string also a unicode string

>>> s = u'\u2265'
>>> print s
≥
>>> print "{0}".format(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)
>>> print u"{0}".format(s)
≥
>>> 

Answer 1

Unicode strings need unicode format strings.

>>> print u'{0}'.format(s)
≥

Answer 2

A bit more information on why that happens.

>>> s = u'\u2265'
>>> print s

works because print automatically uses the system encoding for your environment, which was likely set to UTF-8. (You can check by doing import sys; print sys.stdout.encoding)

>>> print "{0}".format(s)

fails because format returns the same type as the string it is called on (I couldn’t find documentation on this, but this is the behavior I’ve noticed). A plain string literal is a byte string (str) in Python 2, so format has to convert s to str, which implicitly encodes it with the default codec, ASCII, and that raises the exception. Observe:

>>> s = u'\u2265'
>>> s.encode('ascii')
Traceback (most recent call last):
  File "<input>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2265' in position 0: ordinal not in range(128)

So that is basically why these approaches work:

>>> s = u'\u2265'
>>> print u'{}'.format(s)
≥
>>> print '{}'.format(s.encode('utf-8'))
≥

The source character set is defined by the encoding declaration; it is ASCII if no encoding declaration is given in the source file (https://docs.python.org/2/reference/lexical_analysis.html#string-literals)
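
The two working approaches can be put side by side in a short script. This is only a sketch, not part of the original answer: it uses `from __future__ import unicode_literals` so the same lines run on Python 2.6+ and Python 3, and the variable names are made up for illustration.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

s = '\u2265'  # GREATER-THAN OR EQUAL TO, a unicode string on Python 2 and 3

# Approach 1: keep the format string unicode, so nothing is encoded implicitly.
as_text = '{0}'.format(s)

# Approach 2: encode explicitly, choosing the codec yourself.
as_bytes = as_text.encode('utf-8')

# What the implicit conversion amounts to: an ASCII encode that cannot succeed.
try:
    s.encode('ascii')
    ascii_error = None
except UnicodeEncodeError as exc:
    ascii_error = str(exc)
```

The point of the sketch is that the error disappears as soon as the codec is either never needed (approach 1) or named explicitly (approach 2).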


Why does Python print unicode characters when the default encoding is ASCII?

Question: Why does Python print unicode characters when the default encoding is ASCII?

From the Python 2.6 shell:

>>> import sys
>>> print sys.getdefaultencoding()
ascii
>>> print u'\xe9'
é
>>> 

I expected to have either some gibberish or an Error after the print statement, since the “é” character isn’t part of ASCII and I haven’t specified an encoding. I guess I don’t understand what ASCII being the default encoding means.

EDIT

I moved the edit to the Answers section and accepted it as suggested.


Answer 0

Thanks to bits and pieces from various replies, I think we can stitch up an explanation.

By trying to print a unicode string, u'\xe9', Python implicitly tries to encode that string using the encoding scheme currently stored in sys.stdout.encoding. Python actually picks up this setting from the environment it was started from. Only if it can’t find a proper encoding in the environment does it revert to its default, ASCII.

For example, I use a bash shell whose encoding defaults to UTF-8. If I start Python from it, it picks up and uses that setting:

$ python

>>> import sys
>>> print sys.stdout.encoding
UTF-8

Let’s for a moment exit the Python shell and set bash’s environment with some bogus encoding:

$ export LC_CTYPE=klingon
# we should get some error message here, just ignore it.

Then start the python shell again and verify that it does indeed revert to its default ascii encoding.

$ python

>>> import sys
>>> print sys.stdout.encoding
ANSI_X3.4-1968

Bingo!

If you now try to output some unicode character outside of ascii you should get a nice error message

>>> print u'\xe9'
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe9' 
in position 0: ordinal not in range(128)

Let’s exit Python and discard the bash shell.

We’ll now observe what happens after Python outputs strings. For this we’ll first start a bash shell within a graphical terminal (I use Gnome Terminal) and set the terminal to decode output with ISO-8859-1, aka latin-1 (graphical terminals usually have an option to Set Character Encoding in one of their dropdown menus). Note that this doesn’t change the actual shell environment’s encoding; it only changes the way the terminal itself decodes the output it’s given, a bit like a web browser does. You can therefore change the terminal’s encoding independently of the shell’s environment. Let’s then start Python from the shell and verify that sys.stdout.encoding is set to the shell environment’s encoding (UTF-8 for me):

$ python

>>> import sys

>>> print sys.stdout.encoding
UTF-8

>>> print '\xe9' # (1)
é
>>> print u'\xe9' # (2)
Ã©
>>> print u'\xe9'.encode('latin-1') # (3)
é
>>>

(1) python outputs binary string as is, terminal receives it and tries to match its value with latin-1 character map. In latin-1, 0xe9 or 233 yields the character “é” and so that’s what the terminal displays.

(2) python attempts to implicitly encode the Unicode string with whatever scheme is currently set in sys.stdout.encoding, in this instance it’s “UTF-8”. After UTF-8 encoding, the resulting binary string is ‘\xc3\xa9’ (see later explanation). Terminal receives the stream as such and tries to decode 0xc3a9 using latin-1, but latin-1 goes from 0 to 255 and so, only decodes streams 1 byte at a time. 0xc3a9 is 2 bytes long, latin-1 decoder therefore interprets it as 0xc3 (195) and 0xa9 (169) and that yields 2 characters: Ã and ©.

(3) python encodes the unicode code point u'\xe9' (233) with the latin-1 scheme. It turns out that latin-1 code points range from 0 to 255 and point to exactly the same characters as Unicode within that range. Therefore, Unicode code points in that range yield the same value when encoded in latin-1. So u'\xe9' (233) encoded in latin-1 also yields the binary string '\xe9'. The terminal receives that value and tries to match it on the latin-1 character map. Just like case (1), it yields “é” and that’s what’s displayed.

Let’s now change the terminal’s encoding settings to UTF-8 from the dropdown menu (like you would change your web browser’s encoding settings). No need to stop Python or restart the shell. The terminal’s encoding now matches Python’s. Let’s try printing again:

>>> print '\xe9' # (4)

>>> print u'\xe9' # (5)
é
>>> print u'\xe9'.encode('latin-1') # (6)

>>>

(4) python outputs a binary string as is. Terminal attempts to decode that stream with UTF-8. But UTF-8 doesn’t understand the value 0xe9 (see later explanation) and is therefore unable to convert it to a unicode code point. No code point found, no character printed.

(5) python attempts to implicitly encode the Unicode string with whatever’s in sys.stdout.encoding. Still “UTF-8”. The resulting binary string is ‘\xc3\xa9’. Terminal receives the stream and attempts to decode 0xc3a9 also using UTF-8. It yields back code value 0xe9 (233), which on the Unicode character map points to the symbol “é”. Terminal displays “é”.

(6) python encodes unicode string with latin-1, it yields a binary string with the same value ‘\xe9’. Again, for the terminal this is pretty much the same as case (4).

Conclusions:

  • Python outputs non-unicode strings as raw data, without considering its default encoding. The terminal just happens to display them if its current encoding matches the data.
  • Python outputs Unicode strings after encoding them using the scheme specified in sys.stdout.encoding.
  • Python gets that setting from the shell’s environment.
  • The terminal displays output according to its own encoding settings.
  • The terminal’s encoding is independent of the shell’s.
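
The six cases can be imitated without touching any terminal settings, by encoding on the “Python side” and decoding on the “terminal side”. This is a rough sketch under that assumption: a real terminal is free to show invalid bytes however it likes, so the `'replace'` error handler below only stands in for that behaviour, and the variable names are illustrative.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

text = '\xe9'  # U+00E9, LATIN SMALL LETTER E WITH ACUTE

utf8_bytes = text.encode('utf-8')      # sent when sys.stdout.encoding is UTF-8
latin1_bytes = text.encode('latin-1')  # the raw byte 0xe9 of cases (1), (3), (4), (6)

# Case (2): UTF-8 bytes decoded by a latin-1 terminal give two characters.
case2 = utf8_bytes.decode('latin-1')
# Case (3): latin-1 bytes decoded by a latin-1 terminal give the expected one.
case3 = latin1_bytes.decode('latin-1')
# Cases (4)/(6): a lone 0xe9 byte is not valid UTF-8.
case4 = latin1_bytes.decode('utf-8', 'replace')
# Case (5): UTF-8 bytes decoded by a UTF-8 terminal.
case5 = utf8_bytes.decode('utf-8')
```

Only the pairs where the encoder and the decoder agree (cases 3 and 5 here) give back the original character.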


More details on unicode, UTF-8 and latin-1:

Unicode is basically a table of characters where some keys (code points) have been conventionally assigned to point to some symbols. For example, by convention it’s been decided that key 0xe9 (233) is the value pointing to the symbol ‘é’. ASCII and Unicode use the same code points from 0 to 127, as do latin-1 and Unicode from 0 to 255. That is, 0x41 points to ‘A’ in ASCII, latin-1 and Unicode, 0xdc points to ‘Ü’ in latin-1 and Unicode, and 0xe9 points to ‘é’ in latin-1 and Unicode.

When working with electronic devices, Unicode code points need an efficient way to be represented electronically. That’s what encoding schemes are about. Various Unicode encoding schemes exist (utf7, UTF-8, UTF-16, UTF-32). The most intuitive and straight forward encoding approach would be to simply use a code point’s value in the Unicode map as its value for its electronic form, but Unicode currently has over a million code points, which means that some of them require 3 bytes to be expressed. To work efficiently with text, a 1 to 1 mapping would be rather impractical, since it would require that all code points be stored in exactly the same amount of space, with a minimum of 3 bytes per character, regardless of their actual need.

Most encoding schemes have shortcomings regarding space requirements. The most economical ones don’t cover all unicode code points: ascii, for example, only covers the first 128, while latin-1 covers the first 256. Others that try to be more comprehensive end up being wasteful, since they require more bytes than necessary, even for common “cheap” characters. UTF-16, for instance, uses a minimum of 2 bytes per character, including those in the ascii range (‘B’, which is 66, still requires 2 bytes of storage in UTF-16). UTF-32 is even more wasteful as it stores all characters in 4 bytes.
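
The space cost of each scheme is easy to measure. The following is a small sketch, not taken from the original answer; it uses the `-le` variants of UTF-16 and UTF-32 so that no byte-order mark is counted, and the dictionary names are illustrative.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals

# Bytes needed for the ASCII letter 'B' under several codecs.
sizes_b = {}
for codec in ('ascii', 'latin-1', 'utf-8', 'utf-16-le', 'utf-32-le'):
    sizes_b[codec] = len('B'.encode(codec))

# Bytes needed for U+00E9, which ASCII cannot represent at all.
sizes_e_acute = {}
for codec in ('latin-1', 'utf-8', 'utf-16-le', 'utf-32-le'):
    sizes_e_acute[codec] = len('\xe9'.encode(codec))
```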

UTF-8 happens to have cleverly resolved the dilemma, with a scheme able to store code points with a variable amount of byte spaces. As part of its encoding strategy, UTF-8 laces code points with flag bits that indicate (presumably to decoders) their space requirements and their boundaries.

UTF-8 encoding of unicode code points in the ascii range (0-127):

0xxx xxxx  (in binary)
  • the x’s show the actual space reserved to “store” the code point during encoding
  • The leading 0 is a flag that indicates to the UTF-8 decoder that this code point will only require 1 byte.
  • upon encoding, UTF-8 doesn’t change the value of code points in that specific range (i.e. 65 encoded in UTF-8 is also 65). Considering that Unicode and ASCII are also compatible in the same range, it incidentally makes UTF-8 and ASCII also compatible in that range.

e.g. Unicode code point for ‘B’ is ‘0x42’ or 0100 0010 in binary (as we said, it’s the same in ASCII). After encoding in UTF-8 it becomes:

0xxx xxxx  <-- UTF-8 encoding for Unicode code points 0 to 127
*100 0010  <-- Unicode code point 0x42
0100 0010  <-- UTF-8 encoded (exactly the same)

UTF-8 encoding of Unicode code points above 127 (non-ascii):

110x xxxx 10xx xxxx            <-- (from 128 to 2047)
1110 xxxx 10xx xxxx 10xx xxxx  <-- (from 2048 to 65535)
  • the leading bits ‘110’ indicate to the UTF-8 decoder the beginning of a code point encoded in 2 bytes, whereas ‘1110’ indicates 3 bytes, 11110 would indicate 4 bytes and so forth.
  • the inner ’10’ flag bits are used to signal the beginning of an inner byte.
  • again, the x’s mark the space where the Unicode code point value is stored after encoding.

e.g. ‘é’ Unicode code point is 0xe9 (233).

1110 1001    <-- 0xe9

When UTF-8 encodes this value, it determines that the value is larger than 127 and less than 2048, therefore should be encoded in 2 bytes:

110x xxxx 10xx xxxx   <-- UTF-8 encoding for Unicode 128-2047
***0 0011 **10 1001   <-- 0xe9
1100 0011 1010 1001   <-- 'é' after UTF-8 encoding
C    3    A    9

After UTF-8 encoding, the Unicode code point 0xe9 becomes 0xc3a9, which is exactly how the terminal receives it. If your terminal is set to decode strings using latin-1 (one of the non-unicode legacy encodings), you’ll see Ã©, because it just so happens that 0xc3 in latin-1 points to Ã and 0xa9 to ©.
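
The bit layouts above translate almost directly into code. Here is a hand-rolled sketch of the 1-, 2- and 3-byte cases; `utf8_encode` is a hypothetical helper written for this explanation (real code should simply call `.encode('utf-8')`), and it deliberately ignores 4-byte sequences and the surrogate range.

```python
def utf8_encode(code_point):
    """Encode one Unicode code point below 0x10000 into UTF-8 bytes."""
    if code_point < 0x80:
        # 0xxx xxxx: the value is stored unchanged in one byte.
        return bytearray([code_point])
    if code_point < 0x800:
        # 110x xxxx 10xx xxxx: top 5 bits, then low 6 bits.
        return bytearray([0xC0 | (code_point >> 6),
                          0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 1110 xxxx 10xx xxxx 10xx xxxx: top 4 bits, middle 6, low 6.
        return bytearray([0xE0 | (code_point >> 12),
                          0x80 | ((code_point >> 6) & 0x3F),
                          0x80 | (code_point & 0x3F)])
    raise ValueError('4-byte sequences are left out of this sketch')


encoded_b = bytes(utf8_encode(0x42))        # 'B'
encoded_e_acute = bytes(utf8_encode(0xE9))  # 'é'
```

For 0xe9 the helper produces the two bytes C3 A9 derived by hand above, and for 0x42 it returns the single byte 42 unchanged.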


Answer 1

When Unicode characters are printed to stdout, sys.stdout.encoding is used. A non-Unicode character is assumed to be in sys.stdout.encoding and is just sent to the terminal. On my system (Python 2):

>>> import unicodedata as ud
>>> import sys
>>> sys.stdout.encoding
'cp437'
>>> ud.name(u'\xe9') # U+00E9 Unicode codepoint
'LATIN SMALL LETTER E WITH ACUTE'
>>> ud.name('\xe9'.decode('cp437')) 
'GREEK CAPITAL LETTER THETA'
>>> '\xe9'.decode('cp437') # byte E9 decoded using code page 437 is U+0398.
u'\u0398'
>>> ud.name(u'\u0398')
'GREEK CAPITAL LETTER THETA'
>>> print u'\xe9' # Unicode is encoded to CP437 correctly
é
>>> print '\xe9'  # Byte is just sent to terminal and assumed to be CP437.
Θ

sys.getdefaultencoding() is only used when Python doesn’t have another option.
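
The same observation can be made without a CP437 console by decoding the byte explicitly. This is a sketch that runs on Python 2.6+ and Python 3; the variable names are illustrative and the codec names are the standard ones shipped with Python.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import unicodedata

byte_e9 = b'\xe9'

# The same byte means different characters under different code pages.
name_in_cp437 = unicodedata.name(byte_e9.decode('cp437'))
name_in_latin1 = unicodedata.name(byte_e9.decode('latin-1'))

# Going the other way: U+00E9 lives at a different byte value in CP437.
e_acute_in_cp437 = '\xe9'.encode('cp437')
```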

Note that Python 3.6 or later ignores encodings on Windows and uses Unicode APIs to write Unicode to the terminal. No UnicodeEncodeError warnings and the correct character is displayed if the font supports it. Even if the font doesn’t support it the characters can still be cut-n-pasted from the terminal to an application with a supporting font and it will be correct. Upgrade!


Answer 2

The Python REPL tries to pick up what encoding to use from your environment. If it finds something sane then it all Just Works. It’s when it can’t figure out what’s going on that it bugs out.

>>> print sys.stdout.encoding
UTF-8

Answer 3

You have specified an encoding by entering an explicit Unicode string. Compare the results of not using the u prefix.

>>> import sys
>>> sys.getdefaultencoding()
'ascii'
>>> '\xe9'
'\xe9'
>>> u'\xe9'
u'\xe9'
>>> print u'\xe9'
é
>>> print '\xe9'

>>> 

In the case of '\xe9', Python assumes your default encoding (ASCII), thus printing … something blank.


Answer 4

It works for me:

import sys
stdin, stdout = sys.stdin, sys.stdout
reload(sys)
sys.stdin, sys.stdout = stdin, stdout
sys.setdefaultencoding('utf-8')

Answer 5

As per Python default/implicit string encodings and conversions:

  • When printing unicode, it’s encoded with <file>.encoding.
    • When the encoding is not set, the unicode is implicitly converted to str (since the codec for that is sys.getdefaultencoding(), i.e. ascii, any national characters would cause a UnicodeEncodeError).
    • For standard streams, the encoding is inferred from the environment. It’s typically set for tty streams (from the terminal’s locale settings), but is likely not to be set for pipes.
      • So a print u'\xe9' is likely to succeed when the output goes to a terminal, and fail if it’s redirected. A solution is to encode() the string with the desired encoding before printing.
  • When printing str, the bytes are sent to the stream as is. What glyphs the terminal shows will depend on its locale settings.
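
The “encode before printing” advice can be wrapped in a small helper. This is only a sketch: `encode_for` and its `fallback` parameter are hypothetical names introduced here, and an in-memory `io.BytesIO` stands in for a redirected stdout that reports no encoding.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import io


def encode_for(stream, text, fallback='utf-8'):
    """Return text as bytes in the stream's encoding, or in fallback
    when the stream reports no encoding (for example, a pipe)."""
    encoding = getattr(stream, 'encoding', None) or fallback
    return text.encode(encoding, 'replace')


pipe_like = io.BytesIO()  # stands in for stdout redirected to a file or pipe
pipe_like.write(encode_for(pipe_like, '\xe9'))
written = pipe_like.getvalue()
```

Picking UTF-8 as the fallback is a design choice of this sketch; any codec works as long as whatever reads the pipe uses the same one.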

Writing Unicode text to a text file?

Question: Writing Unicode text to a text file?

I’m pulling data out of a Google doc, processing it, and writing it to a file (that eventually I will paste into a WordPress page).

It has some non-ASCII symbols. How can I convert these safely to symbols that can be used in HTML source?

Currently I’m converting everything to Unicode on the way in, joining it all together in a Python string, then doing:

import codecs
f = codecs.open('out.txt', mode="w", encoding="iso-8859-1")
f.write(all_html.encode("iso-8859-1", "replace"))

There is an encoding error on the last line:

UnicodeDecodeError: ‘ascii’ codec can’t decode byte 0xa0 in position 12286: ordinal not in range(128)

Partial solution:

This Python runs without an error:

row = [unicode(x.strip()) if x is not None else u'' for x in row]
all_html = row[0] + "<br/>" + row[1]
f = open('out.txt', 'w')
f.write(all_html.encode("utf-8"))

But then if I open the actual text file, I see lots of symbols like:

Qur’an 

Maybe I need to write to something other than a text file?


Answer 0

Deal exclusively with unicode objects as much as possible by decoding things to unicode objects when you first get them and encoding them as necessary on the way out.

If your string is actually a unicode object, you’ll need to encode it to a byte string (a str object, UTF-8 here) before writing it to a file:

foo = u'Δ, Й, ק, ‎ م, ๗, あ, 叶, 葉, and 말.'
f = open('test', 'w')
f.write(foo.encode('utf8'))
f.close()

When you read that file again, you’ll get an encoded byte string that you can decode to a unicode object:

f = file('test', 'r')
print f.read().decode('utf8')

Answer 1

In Python 2.6+, you can use io.open(), which is the default (builtin open()) on Python 3:

import io

with io.open(filename, 'w', encoding=character_encoding) as file:
    file.write(unicode_text)

It might be more convenient if you need to write the text incrementally (you don’t need to call unicode_text.encode(character_encoding) multiple times). Unlike the codecs module, the io module has proper universal newlines support.
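
A complete round trip with io.open() might look like the sketch below. The sample text and the temporary file are made up for illustration; the calls themselves are the standard `io`, `os` and `tempfile` APIs and behave the same on Python 2.6+ and Python 3.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import io
import os
import tempfile

unicode_text = 'Qur\u2019an \u2265 caf\xe9\n'  # hypothetical sample text
path = os.path.join(tempfile.mkdtemp(), 'out.txt')

# Write text; io.open() does the encoding, so never pass pre-encoded bytes.
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(unicode_text)

# Read it back as text, decoding with the same codec.
with io.open(path, 'r', encoding='utf-8') as f:
    round_tripped = f.read()

# Peek at the raw bytes that actually landed on disk.
with io.open(path, 'rb') as f:
    raw = f.read()
```

If a viewer later shows something like ’ in place of the apostrophe, the file is fine and the viewer is decoding UTF-8 bytes with a single-byte code page.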


Answer 2

Unicode string handling is already standardized in Python 3.

  1. Strings are already stored as Unicode code points in memory.
  2. You only need to open the file with the utf-8 encoding
    (the conversion from in-memory Unicode to variable-byte-length utf-8 is performed automatically when writing to the file).

    out1 = "(嘉南大圳 ㄐㄧㄚ ㄋㄢˊ ㄉㄚˋ ㄗㄨㄣˋ )"
    fobj = open("t1.txt", "w", encoding="utf-8")
    fobj.write(out1)
    fobj.close()
    

Answer 3

The file opened by codecs.open is a file that takes unicode data, encodes it in iso-8859-1 and writes it to the file. However, what you try to write isn’t unicode; you take unicode and encode it in iso-8859-1 yourself. That’s what the unicode.encode method does, and the result of encoding a unicode string is a bytestring (a str type.)

You should either use normal open() and encode the unicode yourself, or (usually a better idea) use codecs.open() and not encode the data yourself.
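
The second option, letting codecs.open() do the encoding, could be sketched as follows. The sample `all_html` value and the temporary path are assumptions made for the example; passing `errors='replace'` to codecs.open() takes over the role of the "replace" argument in the question’s own encode() call.

```python
# -*- coding: utf-8 -*-
from __future__ import unicode_literals
import codecs
import os
import tempfile

# Hypothetical content: a no-break space (in latin-1) and a euro sign (not in latin-1).
all_html = 'price:\xa010\u20ac<br/>'
path = os.path.join(tempfile.mkdtemp(), 'out.txt')

f = codecs.open(path, mode='w', encoding='iso-8859-1', errors='replace')
try:
    f.write(all_html)  # hand over unicode; the writer encodes it exactly once
finally:
    f.close()

with open(path, 'rb') as fh:
    raw = fh.read()
```

The no-break space survives as the single byte 0xa0, while the euro sign, which latin-1 cannot represent, is written as a question mark.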


Answer 4

Preface: will your viewer work?

Make sure your viewer/editor/terminal (however you are interacting with your utf-8 encoded file) can read the file. This is frequently an issue on Windows, for example, Notepad.

Writing Unicode text to a text file?

In Python 2, use open from the io module (this is the same as the builtin open in Python 3):

import io

As a general best practice, use UTF-8 for writing to files (we don’t even have to worry about byte order with utf-8).

encoding = 'utf-8'

utf-8 is the most modern and universally usable encoding – it works in all web browsers, most text-editors (see your settings if you have issues) and most terminals/shells.

On Windows, you might try utf-16le if you’re limited to viewing output in Notepad (or another limited viewer).

encoding = 'utf-16le' # sorry, Windows users... :(

And just open it with the context manager and write your unicode characters out:

with io.open(filename, 'w', encoding=encoding) as f:
    f.write(unicode_object)

Example using many Unicode characters

Here’s an example that attempts to map every possible character up to three bytes wide (4 is the max, but that would be going a bit far) from the digital representation (in integers) to an encoded printable output, along with its name, if possible (put this into a file called uni.py):

from __future__ import print_function
import io
from unicodedata import name, category
from curses.ascii import controlnames
from collections import Counter

try: # use these if Python 2
    unicode_chr, range = unichr, xrange
except NameError: # Python 3
    unicode_chr = chr

exclude_categories = set(('Co', 'Cn'))
counts = Counter()
control_names = dict(enumerate(controlnames))
with io.open('unidata', 'w', encoding='utf-8') as f:
    for x in range((2**8)**3): 
        try:
            char = unicode_chr(x)
        except ValueError:
            continue # can't map to unicode, try next x
        cat = category(char)
        counts.update((cat,))
        if cat in exclude_categories:
            continue # get rid of noise & greatly shorten result file
        try:
            uname = name(char)
        except ValueError: # probably control character, don't use actual
            uname = control_names.get(x, '')
            f.write(u'{0:>6x} {1}    {2}\n'.format(x, cat, uname))
        else:
            f.write(u'{0:>6x} {1}  {2}  {3}\n'.format(x, cat, char, uname))
# may as well describe the types we logged.
for cat, count in counts.items():
    print('{0} chars of category, {1}'.format(count, cat))

This should run in the order of about a minute, and you can view the data file, and if your file viewer can display unicode, you’ll see it. Information about the categories can be found here. Based on the counts, we can probably improve our results by excluding the Cn and Co categories, which have no symbols associated with them.

$ python uni.py

It will display the hexadecimal mapping, the category, the symbol (unless the name can’t be retrieved, in which case it is probably a control character), and the name of the symbol.

I recommend less on Unix or Cygwin (don’t print/cat the entire file to your output):

$ less unidata

e.g. will display similar to the following lines which I sampled from it using Python 2 (unicode 5.2):

     0 Cc NUL
    20 Zs     SPACE
    21 Po  !  EXCLAMATION MARK
    b6 So  ¶  PILCROW SIGN
    d0 Lu  Ð  LATIN CAPITAL LETTER ETH
   e59 Nd  ๙  THAI DIGIT NINE
  2887 So  ⢇  BRAILLE PATTERN DOTS-1238
  bc13 Lo  밓  HANGUL SYLLABLE MIH
  ffeb Sm  →  HALFWIDTH RIGHTWARDS ARROW

My Python 3.5 from Anaconda has unicode 8.0, I would presume most 3’s would.


Answer 5

How to print unicode characters into a file:

Save this to file: foo.py:

#!/usr/bin/python -tt
# -*- coding: utf-8 -*-
import codecs
import sys 
UTF8Writer = codecs.getwriter('utf8')
sys.stdout = UTF8Writer(sys.stdout)
print(u'e with obfuscation: é')

Run it and pipe output to file:

python foo.py > tmp.txt

Open tmp.txt and look inside, you see this:

el@apollo:~$ cat tmp.txt 
e with obfuscation: é

Thus you have saved unicode e with a obfuscation mark on it to a file.


Answer 6

That error arises when you try to encode a non-unicode string: it tries to decode it, assuming it’s in plain ASCII. There are two possibilities:

  1. You’re encoding it to a bytestring, but because you’ve used codecs.open, the write method expects a unicode object. So you encode it, and it tries to decode it again. Try: f.write(all_html) instead.
  2. all_html is not, in fact, a unicode object. When you do .encode(...), it first tries to decode it.

回答 7

如果用python3编写

>>> a = u'bats\u00E0'
>>> print a
batsà
>>> f = open("/tmp/test", "w")
>>> f.write(a)
>>> f.close()
>>> data = open("/tmp/test").read()
>>> data
'batsà'

如果使用python2编写:

>>> a = u'bats\u00E0'
>>> f = open("/tmp/test", "w")
>>> f.write(a)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)

为避免此错误,您将必须使用编解码器“ utf-8”将其编码为字节,如下所示:

>>> f.write(a.encode("utf-8"))
>>> f.close()

并在使用编解码器“ utf-8”读取时解码数据:

>>> data = open("/tmp/test").read()
>>> data.decode("utf-8")
u'bats\xe0'

而且,如果您尝试在此字符串上执行打印,它将使用“ utf-8”编解码器自动解码,如下所示

>>> print a
batsà

When writing in Python 3:

>>> a = u'bats\u00E0'
>>> print(a)
batsà
>>> f = open("/tmp/test", "w")
>>> f.write(a)
5
>>> f.close()
>>> data = open("/tmp/test").read()
>>> data
'batsà'

When writing in Python 2:

>>> a = u'bats\u00E0'
>>> f = open("/tmp/test", "w")
>>> f.write(a)

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xe0' in position 4: ordinal not in range(128)

To avoid this error, encode it to bytes using the “utf-8” codec, like this:

>>> f.write(a.encode("utf-8"))
>>> f.close()

and decode the data with the “utf-8” codec when reading:

>>> data = open("/tmp/test").read()
>>> data.decode("utf-8")
u'bats\xe0'

Also, if you print this unicode string, it is automatically encoded using your terminal’s encoding (here “utf-8”), like this:

>>> print a
batsà
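
The explicit encode and decode steps above can also be left to the file object. The following is a minimal sketch, assuming io.open (available since Python 2.6 and the same function as the built-in open in Python 3); the scratch path is made up for the illustration and is not part of the original answer:

```python
import io
import os
import tempfile

text = u'bats\u00E0'

# Hypothetical scratch file, used here only for the demonstration.
path = os.path.join(tempfile.mkdtemp(), "test.txt")

# io.open encodes on write and decodes on read with the given encoding,
# so the caller only ever handles unicode text.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

with io.open(path, "r", encoding="utf-8") as f:
    data = f.read()

# Reading in binary mode shows what actually landed on disk.
with io.open(path, "rb") as f:
    raw = f.read()

print(repr(data), repr(raw))
```

Passing the encoding explicitly avoids relying on the platform default, which is what produced the 'ascii' codec error in the Python 2 session.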

在Python中将float转换为整数的最安全方法?

问题:在Python中将float转换为整数的最安全方法?

Python的math模块包含诸如floor&的便捷函数ceil。这些函数采用浮点数,并在其下或上返回最接近的整数。但是,这些函数将答案作为浮点数返回。例如:

import math
f=math.floor(2.3)

现在f返回:

2.0

从该浮点数中获取整数而不冒取舍入错误风险的最安全方法是什么(例如,如果浮点数等于1.99999),或者我应该完全使用另一个函数?

Python’s math module contain handy functions like floor & ceil. These functions take a floating point number and return the nearest integer below or above it. However these functions return the answer as a floating point number. For example:

import math
f=math.floor(2.3)

Now f returns:

2.0

What is the safest way to get an integer out of this float, without running the risk of rounding errors (for example if the float is the equivalent of 1.99999) or perhaps I should use another function altogether?


回答 0

可以用浮点数表示的所有整数均具有精确的表示形式。这样您就可以安全地使用int结果了。仅当您尝试使用非2的幂的分母来表示有理数时,才会出现不精确的表示。

这项工作一点都不小!IEEE浮点表示的一个属性是int∘floor=⌊⋅⌋,如果所讨论的数字的大小足够小,但是int(floor(2.3))可能为1的情况下,可能会有不同的表示形式。

要引用维基百科

绝对值小于或等于2^24的任何整数都可以用单精度格式准确表示,绝对值小于或等于2^53的任何整数都可以用双精度格式准确表示。

All integers that can be represented by floating point numbers have an exact representation. So you can safely use int on the result. Inexact representations occur only if you are trying to represent a rational number with a denominator that is not a power of two.

That this works is not trivial at all! It’s a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.

To quote from Wikipedia,

Any integer with absolute value less than or equal to 2^24 can be exactly represented in the single precision format, and any integer with absolute value less than or equal to 2^53 can be exactly represented in the double precision format.
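
The double precision bound in that quote can be checked directly. This is a small sketch, not part of the original answer; the helper name roundtrips is made up for the illustration:

```python
LIMIT = 2 ** 53  # Python floats are IEEE-754 doubles with a 53-bit significand

def roundtrips(n):
    """Return True if the integer n survives a trip through float unchanged."""
    return int(float(n)) == n

exact_at_limit = roundtrips(LIMIT)       # 2**53 itself is representable
exact_below = roundtrips(LIMIT - 1)      # so is the integer just below it
exact_above = roundtrips(LIMIT + 1)      # 2**53 + 1 is not: it rounds to 2**53

print(exact_at_limit, exact_below, exact_above)
```

Above the limit only some integers are representable (2**53 + 2 is, for example), which is why the guarantee is stated as a bound on the absolute value.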


回答 1

使用int(your non integer number)将打钉。

print int(2.3) # "2"
print int(math.sqrt(5)) # "2"

Using int(your_non_integer_number) will nail it. Note that int truncates toward zero.

import math
print int(2.3) # "2"
print int(math.sqrt(5)) # "2"
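
One detail worth seeing in code: int discards the fractional part, so it agrees with math.floor for positive numbers but not for negative ones. A short sketch with made-up sample values:

```python
import math

values = [2.3, 2.9, -2.3, -2.9]

truncated = [int(v) for v in values]             # drops the fractional part
floored = [int(math.floor(v)) for v in values]   # rounds toward negative infinity
ceiled = [int(math.ceil(v)) for v in values]     # rounds toward positive infinity

print(truncated)
print(floored)
print(ceiled)
```

Which of the three is "safest" depends on which direction the application expects for negative inputs.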

回答 2

您可以使用舍入功能。如果您不使用第二个参数(有效数字位数),那么我认为您将获得想要的行为。

空闲输出。

>>> round(2.99999999999)
3
>>> round(2.6)
3
>>> round(2.5)
3
>>> round(2.4)
2

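You could use the round function. If you use no second parameter (the number of decimal digits to keep), then I think you will get the behavior you want. Note that the exact results depend on the Python version: Python 2 returns floats (3.0, 2.0) and rounds ties away from zero, while Python 3 returns ints and rounds ties to the nearest even integer, so round(2.5) gives 2 there.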

IDLE output.

>>> round(2.99999999999)
3
>>> round(2.6)
3
>>> round(2.5)
3
>>> round(2.4)
2
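
In Python 3 the built-in round sends ties to the nearest even integer, so round(2.5) is 2 there. If "halves always round away from zero" is the behavior wanted, one possible sketch uses the decimal module; the function name round_half_up is an assumption for this example, and very large magnitudes (beyond the default 28-digit decimal context) would need a wider context:

```python
from decimal import Decimal, ROUND_HALF_UP

def round_half_up(x):
    """Round a float to the nearest int, with ties going away from zero."""
    # Decimal(x) converts the float exactly, so no extra error is introduced
    # before the rounding rule is applied.
    return int(Decimal(x).quantize(Decimal("1"), rounding=ROUND_HALF_UP))

print(round_half_up(2.5), round_half_up(2.4), round_half_up(-2.5))
print(round(2.5), round(3.5))  # Python 3 built-in: ties go to the even neighbour
```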

回答 3

结合之前的两个结果,我们得到:

int(round(some_float))

这可以相当可靠地将浮点数转换为整数。

Combining two of the previous results, we have:

int(round(some_float))

This converts a float to an integer fairly dependably.


回答 4

这项工作一点都不小!IEEE浮点表示的一个属性是int∘floor=⌊⋅⌋,如果所讨论的数字的大小足够小,但是int(floor(2.3))可能为1的情况下,可能会有不同的表示形式。

这篇文章解释了为什么它可以在这个范围内工作

在double中,您可以毫无问题地表示32位整数。不会有任何舍入问题。更精确地说,双精度数可以表示-2^53到2^53之间(含两端)的所有整数。

简短说明:一个double最多可以存储53个二进制数字。当您需要更多时,该数字将在右边填充零。

由此可见,53个数字是无需填充即可存储的最大数字。自然,所有需要较少数字的(整数)数字都可以准确存储。

111加1(省略)111(53个)将产生100 … 000,(53个零)。众所周知,我们可以存储53位数字,即最右边的零填充。

这是2^53的来源。


详细信息:我们需要考虑IEEE-754浮点如何工作。

  1 bit    11 / 8     52 / 23      # bits double/single precision
[ sign |  exponent | mantissa ]

然后,该数字的计算方式如下(不包括此处无关的特殊情况):

(-1)^符号 × 1.尾数 × 2^(指数 − 偏差)

其中偏差 = 2^(指数位数 − 1) − 1,即双精度/单精度分别为1023和127。

已知乘以2^X只是将所有位向左移动X位,很容易看出,任何整数的尾数中最终位于小数点右侧的所有位都必须为零。

除零以外的任何整数都具有以下二进制形式:

1x … x,其中x -es表示MSB右侧的位(最高有效位)。

因为我们排除了零,所以总会有一个为1的MSB,这就是不存储它的原因。要存储整数,我们必须将其转换为上述形式:(-1)^符号 × 1.尾数 × 2^(指数 − 偏差)。

就是说,将这些位移到小数点后直到只有MSB朝MSB的左侧移动。然后,小数点右边的所有位都存储在尾数中。

由此可见,除MSB外,我们最多可以存储52个二进制数字。

因此,显式存储所有位的最高编号为

111(omitted)111.   that's 53 ones (52 + implicit 1) in the case of doubles.

为此,我们需要设置指数,以使小数点后移52位。如果我们将指数增加一,我们将无法知道小数点后左边的数字。

111(omitted)111x.

按照惯例,它是0。将整个尾数设置为零,我们收到以下数字:

100(omitted)00x. = 100(omitted)000.

这是一个1,后跟53个零,已存储52个,并且由于指数而加了1。

它代表2^53,它标志着我们可以准确表示所有整数的边界(负向和正向)。如果要将1加到2^53,则必须将隐式零(由x表示)设置为1,但这是不可能的。

That this works is not trivial at all! It’s a property of the IEEE floating point representation that int∘floor = ⌊⋅⌋ if the magnitude of the numbers in question is small enough, but different representations are possible where int(floor(2.3)) might be 1.

This post explains why it works in that range.

In a double, you can represent 32bit integers without any problems. There cannot be any rounding issues. More precisely, doubles can represent all integers between and including 2^53 and -2^53.

Short explanation: A double can store up to 53 binary digits. When you require more, the number is padded with zeroes on the right.

It follows that 53 ones is the largest number that can be stored without padding. Naturally, all (integer) numbers requiring less digits can be stored accurately.

Adding one to 111(omitted)111 (53 ones) yields 100…000, (53 zeroes). As we know, we can store 53 digits, that makes the rightmost zero padding.

This is where 2^53 comes from.


More detail: We need to consider how IEEE-754 floating point works.

  1 bit    11 / 8     52 / 23      # bits double/single precision
[ sign |  exponent | mantissa ]

The number is then calculated as follows (excluding special cases that are irrelevant here):

(-1)^sign × 1.mantissa × 2^(exponent − bias)

where bias = 2^(k − 1) − 1 and k is the number of exponent bits, i.e. 1023 and 127 for double/single precision respectively.

Knowing that multiplying by 2^X simply shifts all bits X places to the left, it’s easy to see that any integer must have all bits in the mantissa that end up right of the decimal point set to zero.

Any integer except zero has the following form in binary:

1x…x where the x-es represent the bits to the right of the MSB (most significant bit).

Because we excluded zero, there will always be an MSB that is one, which is why it’s not stored. To store the integer, we must bring it into the aforementioned form: (-1)^sign × 1.mantissa × 2^(exponent − bias).

That’s the same as shifting the bits over the decimal point until only the MSB remains to the left of the decimal point. All the bits right of the decimal point are then stored in the mantissa.

From this, we can see that we can store at most 52 binary digits apart from the MSB.

It follows that the highest number where all bits are explicitly stored is

111(omitted)111.   that's 53 ones (52 + implicit 1) in the case of doubles.

For this, we need to set the exponent such that the decimal point will be shifted 52 places. If we were to increase the exponent by one, we could not know the digit immediately to the left of the decimal point.

111(omitted)111x.

By convention, it’s 0. Setting the entire mantissa to zero, we receive the following number:

100(omitted)00x. = 100(omitted)000.

That’s a 1 followed by 53 zeroes, 52 stored and 1 added due to the exponent.

It represents 2^53, which marks the boundary (both negative and positive) between which we can accurately represent all integers. If we wanted to add one to 2^53, we would have to set the implicit zero (denoted by the x) to one, but that’s impossible.
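
The layout described above can be inspected from Python with the struct module. This is only a sketch for normal (non-zero, finite, non-subnormal) doubles; the function names decompose and reconstruct are made up for the illustration:

```python
import struct

def decompose(x):
    """Split a Python float (IEEE-754 double) into sign, biased exponent and 52-bit mantissa."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63
    exponent = (bits >> 52) & 0x7FF          # 11 exponent bits
    mantissa = bits & ((1 << 52) - 1)        # 52 stored mantissa bits
    return sign, exponent, mantissa

def reconstruct(sign, exponent, mantissa, bias=1023):
    """Apply (-1)^sign * 1.mantissa * 2^(exponent - bias) for a normal double."""
    return (-1) ** sign * (1 + mantissa / 2.0 ** 52) * 2.0 ** (exponent - bias)

limit = 2.0 ** 53
parts = decompose(limit)                     # mantissa is all zeros at the boundary
absorbed = (limit + 1 == limit)              # adding one cannot set the implicit bit

print(parts, reconstruct(*parts), absorbed)
```

At 2^53 the stored mantissa is zero and the unbiased exponent is 53, so the lowest representable step is already 2 and the added 1 is lost.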


回答 5

math.floor将始终返回整数,因此int(math.floor(some_float))永远不会引入舍入错误。

但是,舍入错误可能已经引入了math.floor(some_large_float),或者甚至当首先将大量存储在float中时也已引入。(存储在浮点数中的大数字可能会失去精度。)

math.floor will always return an integer number and thus int(math.floor(some_float)) will never introduce rounding errors.

The rounding error might already be introduced in math.floor(some_large_float), though, or even when storing a large number in a float in the first place. (Large numbers may lose precision when stored in floats.)


回答 6

如果需要将字符串float转换为int,则可以使用此方法。

例如:'38.0'38

为了将其转换为int,可以将其转换为float,然后转换为int。这也适用于浮点字符串或整数字符串。

>>> int(float('38.0'))
38
>>> int(float('38'))
38

注意:这将删除小数点后的所有数字。

>>> int(float('38.2'))
38

If you need to convert a string float to an int you can use this method.

Example: '38.0' to 38

In order to convert this to an int you can cast it as a float then an int. This will also work for float strings or integer strings.

>>> int(float('38.0'))
38
>>> int(float('38'))
38

Note: This will strip any numbers after the decimal.

>>> int(float('38.2'))
38
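
If silently dropping the fractional part is not acceptable, the conversion can refuse such strings instead. A minimal sketch, with a hypothetical helper name; note that it still goes through float, so strings with more digits than a double can hold will lose precision before the check:

```python
def str_to_int_strict(text):
    """Convert a numeric string to int, refusing values with a fractional part."""
    value = float(text)
    if not value.is_integer():
        # Also reached for 'inf' and 'nan', which are not whole numbers.
        raise ValueError("not a whole number: %r" % text)
    return int(value)

print(str_to_int_strict('38.0'), str_to_int_strict('38'))
```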

回答 7

另一个代码示例使用变量将实数/浮点数转换为整数。“ vel”是一个实数/浮点数,并转换为第二高的整数“ newvel”。

import arcpy.math, os, sys, arcpy.da
.
.
with arcpy.da.SearchCursor(densifybkp,[floseg,vel,Length]) as cursor:
 for row in cursor:
    curvel = float(row[1])
    newvel = int(math.ceil(curvel))

Another code sample to convert a real/float to an integer using variables. “vel” is a real/float number and converted to the next highest INTEGER, “newvel”.

import math, os, sys, arcpy.da
.
.
with arcpy.da.SearchCursor(densifybkp,[floseg,vel,Length]) as cursor:
 for row in cursor:
    curvel = float(row[1])
    newvel = int(math.ceil(curvel))

回答 8

由于您要求的是“最安全”的方式,因此我将提供除最佳答案之外的另一个答案。

确保您不损失任何精度的一种简单方法是检查转换后的值是否相等。

if int(some_value) == some_value:
     some_value = int(some_value)

例如,如果float为1.0,则1.0等于1。因此将执行向int的转换。如果float为1.1,则int(1.1)等于1,并且1.1!=1。因此,该值将保持为float值,并且不会损失任何精度。

Since you’re asking for the ‘safest’ way, I’ll provide another answer other than the top answer.

An easy way to make sure you don’t lose any precision is to check if the values would be equal after you convert them.

if int(some_value) == some_value:
     some_value = int(some_value)

If the float is 1.0 for example, 1.0 is equal to 1. So the conversion to int will execute. And if the float is 1.1, int(1.1) equates to 1, and 1.1 != 1. So the value will remain a float and you won’t lose any precision.
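
The same check can be wrapped in a function. The sketch below uses float.is_integer() rather than comparing against int(some_value), because int raises OverflowError for infinity and ValueError for NaN; the function name is an assumption made for this example:

```python
def to_int_if_exact(value):
    """Return int(value) when nothing is lost, otherwise return value unchanged."""
    if isinstance(value, float) and value.is_integer():
        return int(value)
    return value

converted = to_int_if_exact(1.0)     # whole number: becomes the int 1
kept = to_int_if_exact(1.1)          # has a fraction: stays a float
infinite = to_int_if_exact(float('inf'))  # not an integer: returned as is

print(converted, kept, infinite)
```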


回答 9

df['Column_Name'] = df['Column_Name'].astype(int)

df['Column_Name'] = df['Column_Name'].astype(int)


如何从生成器中仅选择一项(在python中)?

问题:如何从生成器中仅选择一项(在python中)?

我有一个类似下面的生成器函数:

def myfunct():
  ...
  yield result

调用此函数的常用方法是:

for r in myfunct():
  dostuff(r)

我的问题是,有什么方法可以随时从生成器中获取一个元素吗?例如,我想做类似的事情:

while True:
  ...
  if something:
      my_element = pick_just_one_element(myfunct())
      dostuff(my_element)
  ...

I have a generator function like the following:

def myfunct():
  ...
  yield result

The usual way to call this function would be:

for r in myfunct():
  dostuff(r)

My question, is there a way to get just one element from the generator whenever I like? For example, I’d like to do something like:

while True:
  ...
  if something:
      my_element = pick_just_one_element(myfunct())
      dostuff(my_element)
  ...

回答 0

使用创建一个生成器

g = myfunct()

每当您想要一个项目时,请使用

next(g)

(或g.next()在Python 2.5或更低版本中)。

如果生成器退出,它将升高StopIteration。您可以根据需要捕获此异常,也可以将default参数用于next()

next(g, default_value)

Create a generator using

g = myfunct()

Every time you would like an item, use

next(g)

(or g.next() in Python 2.5 or below).

If the generator exits, it will raise StopIteration. You can either catch this exception if necessary, or use the default argument to next():

next(g, default_value)
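
A short sketch of both ways of handling exhaustion. The generator countdown is a made-up stand-in for myfunct():

```python
def countdown(n):
    """Hypothetical generator standing in for myfunct()."""
    while n > 0:
        yield n
        n -= 1

g = countdown(2)
first = next(g)
second = next(g)
third = next(g, None)      # generator is exhausted, so the default is returned

try:
    next(g)                # no default given: StopIteration propagates
    exhausted = False
except StopIteration:
    exhausted = True

print(first, second, third, exhausted)
```

The default form is convenient as long as the default value cannot be confused with a real item from the generator.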

回答 1

要仅选择生成器的一个元素,请breakfor语句中使用,或list(itertools.islice(gen, 1))

根据您的示例(从字面上看),您可以执行以下操作:

while True:
  ...
  if something:
      for my_element in myfunct():
          dostuff(my_element)
          break
      else:
          do_generator_empty()

如果您想“ 每当我喜欢的时候就从 [生成的] 生成器中仅获取一个元素 ”(我想是最初意图的50%,也是最常见的意图),那么:

gen = myfunct()
while True:
  ...
  if something:
      for my_element in gen:
          dostuff(my_element)
          break
      else:
          do_generator_empty()

这样generator.next()可以避免显式使用,并且输入结束处理不需要(神秘的)StopIteration异常处理或额外的默认值比较。

else:for,如果你想要做一些特别的结束产生的case语句段时,才需要。

注意上next()/ .next()

在Python3中,该.next()方法被重命名.__next__()为有充分的理由:它被认为是低级的(PEP 3114)。在Python 2.6之前,内置函数next()不存在。甚至讨论过迁移next()到该operator模块(这本来是明智的做法),因为它很少需要,并且内置名称的可疑膨胀。

next()没有默认值的情况下使用仍然是非常低级的实践- StopIteration在普通的应用程序代码中公开地将神秘的东西扔掉。而且使用next()默认的哨兵-最好是next()直接输入的唯一选择builtins-受限制,并且通常会给出奇怪的非Python逻辑/可读性的原因。

底线:很少使用next()-就像使用operator模块的功能一样。使用for x in iteratorislicelist(iterator)等功能接受一个迭代器无缝地使用是在应用层上的迭代器的自然方式-而且相当总是可能的。next()是低级的,一个额外的概念,很明显-正如该线程的问题所示。虽然例如,使用breakfor是常规的。

For picking just one element of a generator use break in a for statement, or list(itertools.islice(gen, 1))

According to your example (literally) you can do something like:

while True:
  ...
  if something:
      for my_element in myfunct():
          dostuff(my_element)
          break
      else:
          do_generator_empty()

If you want to “get just one element from the [once generated] generator whenever I like” (I suppose there is a 50% chance that’s the original intention, and it is the most common one), then:

gen = myfunct()
while True:
  ...
  if something:
      for my_element in gen:
          dostuff(my_element)
          break
      else:
          do_generator_empty()

This way explicit use of generator.next() can be avoided, and end-of-input handling doesn’t require (cryptic) StopIteration exception handling or extra default value comparisons.

The else: section of the for statement is only needed if you want to do something special when the generator is exhausted.

Note on next() / .next():

In Python 3 the .next() method was renamed to .__next__() for good reason: it’s considered low-level (PEP 3114). Before Python 2.6 the builtin function next() did not exist. It was even discussed whether to move next() to the operator module (which would have been wise), because of its rare need and the questionable inflation of builtin names.

Using next() without a default is still very low-level practice: it throws the cryptic StopIteration like a bolt out of the blue into normal application code. And using next() with a default sentinel, which ideally should be the only option for a next() placed directly in builtins, is limited and often leads to odd, non-Pythonic logic and poor readability.

Bottom line: using next() should be very rare, like using functions of the operator module. Using for x in iterator, islice, list(iterator) and other functions that accept an iterator seamlessly is the natural way of using iterators at the application level, and it is almost always possible. next() is low-level, an extra concept, and unobvious, as the question of this thread shows, whereas e.g. using break in a for loop is conventional.
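
The two alternatives named at the top of this answer can be sketched side by side. The generator squares is a made-up stand-in for myfunct():

```python
from itertools import islice

def squares():
    """Hypothetical generator standing in for myfunct()."""
    n = 0
    while True:
        n += 1
        yield n * n

gen = squares()

# for/break: take one item and leave the generator positioned at the next one.
for picked in gen:
    break

# islice: take the next item as a one-element list (empty if exhausted).
taken = list(islice(gen, 1))

# The generator keeps its position between the calls.
following = list(islice(gen, 2))

print(picked, taken, following)
```

With islice an exhausted generator simply yields an empty list, so no exception handling or sentinel comparison is needed.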


回答 2

我不认为有一种便捷的方法可以从生成器中检索任意值。生成器将提供next()方法来遍历自身,但是不会立即生成完整序列以节省内存。那就是生成器和列表之间的功能差异。

I don’t believe there’s a convenient way to retrieve an arbitrary value from a generator. The generator will provide a next() method to traverse itself, but the full sequence is not produced immediately to save memory. That’s the functional difference between a generator and a list.


回答 3

对于那些浏览这些答案的人来说,它们是Python3的完整工作示例…在这里,您可以继续:

def numgen():
    x = 1000
    while True:
        x += 1
        yield x

nums = numgen() # because it must be the _same_ generator

for n in range(3):
    numnext = next(nums)
    print(numnext)

输出:

1001
1002
1003

For those of you scanning through these answers for a complete working example for Python3… well here ya go:

def numgen():
    x = 1000
    while True:
        x += 1
        yield x

nums = numgen() # because it must be the _same_ generator

for n in range(3):
    numnext = next(nums)
    print(numnext)

This outputs:

1001
1002
1003

回答 4

Generator是产生迭代器的函数。因此,一旦有了迭代器实例,就可以使用next()从迭代器中获取下一项。例如,使用next()函数来获取第一个项目,然后for in用于处理剩余的项目:

# create new instance of iterator by calling a generator function
items = generator_function()

# fetch and print first item
first = next(items)
print('first item:', first)

# process remaining items:
for item in items:
    print('next item:', item)

A generator function is a function that produces an iterator. Therefore, once you have an iterator instance, use next() to fetch the next item from it. As an example, use the next() function to fetch the first item, and later use for in to process the remaining items:

# create new instance of iterator by calling a generator function
items = generator_function()

# fetch and print first item
first = next(items)
print('first item:', first)

# process remaining items:
for item in items:
    print('next item:', item)

回答 5

generator = myfunct()
while True:
   my_element = generator.next()

确保捕获采用最后一个元素后引发的异常

generator = myfunct()
while True:
    try:
        my_element = next(generator)  # generator.next() in Python 2.5 or below
    except StopIteration:
        break

Make sure to catch the exception (StopIteration) raised after the last element is taken.


回答 6

我相信唯一的方法是从迭代器中获取一个列表,然后从该列表中获取所需的元素。

l = list(myfunct())
l[4]

I believe the only way is to get a list from the iterator then get the element you want from that list.

l = list(myfunct())
l[4]
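
Building the whole list is not the only option, and it never finishes for an infinite generator. A sketch of the "nth" recipe from the itertools documentation, with a made-up generator letters standing in for myfunct():

```python
from itertools import islice

def nth(iterable, n, default=None):
    """Return the item at index n, or default if the iterable is too short."""
    return next(islice(iterable, n, None), default)

def letters():
    """Hypothetical generator standing in for myfunct()."""
    for ch in "abcdefg":
        yield ch

fifth = nth(letters(), 4)        # same item as list(letters())[4]
missing = nth(letters(), 100)    # past the end: the default is returned

print(fifth, missing)
```

The items before index n are still consumed, but they are not kept in memory.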

什么是“ 1 ..__ truediv__”?Python是否具有..(“点点”)表示法语法?

问题:什么是“ 1 ..__ truediv__”?Python是否具有..(“点点”)表示法语法?

最近,我遇到了一种语法,这种语法在我学习python时从未见过,在大多数教程中,这种..表示法看起来像这样:

f = 1..__truediv__ # or 1..__div__ for python 2

print(f(8)) # prints 0.125 

我发现它和(当然,它更长)完全一样:

f = lambda x: (1).__truediv__(x)
print(f(8)) # prints 0.125 or 1//8

但是我的问题是:

  • 它怎么做呢?
  • 这两个点实际上意味着什么?
  • 如何在更复杂的语句中使用它(如果可能)?

将来可能会为我节省很多代码行… :)

I recently came across a syntax I had never seen before, neither when I learned Python nor in most tutorials: the .. notation. It looks something like this:

f = 1..__truediv__ # or 1..__div__ for python 2

print(f(8)) # prints 0.125 

I figured it was exactly the same as (except it’s longer, of course):

f = lambda x: (1).__truediv__(x)
print(f(8)) # prints 0.125 or 1/8

But my questions are:

  • How can it do that?
  • What does it actually mean with the two dots?
  • How can you use it in a more complex statement (if possible)?

This will probably save me many lines of code in the future…:)


回答 0

您所拥有的是一个float不带尾随零的文字,然后您可以访问的__truediv__方法。它本身不是运算符;第一个点是float值的一部分,第二个点是用于访问对象属性和方法的点运算符。

您可以通过执行以下操作达到相同的目的。

>>> f = 1.
>>> f
1.0
>>> f.__floordiv__
<method-wrapper '__floordiv__' of float object at 0x7f9fb4dc1a20>

另一个例子

>>> 1..__add__(2.)
3.0

在这里,我们将1.0加到2.0,显然得出3.0。

What you have is a float literal without the trailing zero, whose __truediv__ method you then access. It’s not an operator in itself; the first dot is part of the float value, and the second is the dot operator used to access the object’s properties and methods.

You can reach the same point by doing the following.

>>> f = 1.
>>> f
1.0
>>> f.__floordiv__
<method-wrapper '__floordiv__' of float object at 0x7f9fb4dc1a20>

Another example

>>> 1..__add__(2.)
3.0

Here we add 1.0 to 2.0, which obviously yields 3.0.


回答 1

该问题已经得到足够的答案(即@Paul Rooney的答案),但也可以验证这些答案的正确性。

让我回顾一下现有的答案:这..不是一个语法元素!

您可以检查源代码如何“标记化”。这些标记表示代码的解释方式:

>>> from tokenize import tokenize
>>> from io import BytesIO

>>> s = "1..__truediv__"
>>> list(tokenize(BytesIO(s.encode('utf-8')).readline))
[...
 TokenInfo(type=2 (NUMBER), string='1.', start=(1, 0), end=(1, 2), line='1..__truediv__'),
 TokenInfo(type=53 (OP), string='.', start=(1, 2), end=(1, 3), line='1..__truediv__'),
 TokenInfo(type=1 (NAME), string='__truediv__', start=(1, 3), end=(1, 14), line='1..__truediv__'),
 ...]

因此,字符串1.被解释为数字,第二个.是OP(运算符,在这种情况下为“ get attribute”运算符),而则__truediv__是方法名称。因此,这只是访问__truediv__float 的方法1.0

查看生成的字节码的另一种方法是对其进行汇编。这实际上显示了执行某些代码时执行的指令: dis

>>> import dis

>>> def f():
...     return 1..__truediv__

>>> dis.dis(f)
  4           0 LOAD_CONST               1 (1.0)
              3 LOAD_ATTR                0 (__truediv__)
              6 RETURN_VALUE

基本上说的一样。它加载__truediv__常量的属性1.0


关于你的问题

以及如何在更复杂的语句中使用它(如果可能)?

即使您可能永远也不要这样写代码,只是因为不清楚代码在做什么。因此,请不要在更复杂的语句中使用它。我什至会走得更远,以至于您不应该在如此“简单”的语句中使用它,至少您应该使用括号将指令分开:

f = (1.).__truediv__

这肯定会更具可读性-但类似于:

from functools import partial
from operator import truediv
f = partial(truediv, 1.0)

会更好!

使用的方法partial还保留了python的数据模型(该1..__truediv__方法没有!),可以通过以下小片段进行演示:

>>> f1 = 1..__truediv__
>>> f2 = partial(truediv, 1.)

>>> f2(1+2j)  # reciprocal of complex number - works
(0.2-0.4j)
>>> f2('a')   # reciprocal of string should raise an exception
TypeError: unsupported operand type(s) for /: 'float' and 'str'

>>> f1(1+2j)  # reciprocal of complex number - works but gives an unexpected result
NotImplemented
>>> f1('a')   # reciprocal of string should raise an exception but it doesn't
NotImplemented

这是因为1. / (1+2j)不是由- float.__truediv__而是通过complex.__rtruediv__operator.truediv进行评估的,请确保在正常操作返回时调用了反向操作,NotImplemented__truediv__直接操作时没有这些后备。这种“预期行为”的丧失是您(通常)不应直接使用魔术方法的主要原因。

The question is already sufficiently answered (see @Paul Rooney’s answer), but it’s also possible to verify the correctness of these answers.

Let me recap the existing answers: The .. is not a single syntax element!

You can check how the source code is “tokenized”. These tokens represent how the code is interpreted:

>>> from tokenize import tokenize
>>> from io import BytesIO

>>> s = "1..__truediv__"
>>> list(tokenize(BytesIO(s.encode('utf-8')).readline))
[...
 TokenInfo(type=2 (NUMBER), string='1.', start=(1, 0), end=(1, 2), line='1..__truediv__'),
 TokenInfo(type=53 (OP), string='.', start=(1, 2), end=(1, 3), line='1..__truediv__'),
 TokenInfo(type=1 (NAME), string='__truediv__', start=(1, 3), end=(1, 14), line='1..__truediv__'),
 ...]

So the string 1. is interpreted as number, the second . is an OP (an operator, in this case the “get attribute” operator) and the __truediv__ is the method name. So this is just accessing the __truediv__ method of the float 1.0.

Another way is to view the generated bytecode by disassembling it. This actually shows the instructions that are performed when some code is executed:

>>> import dis

>>> def f():
...     return 1..__truediv__

>>> dis.dis(f)
  4           0 LOAD_CONST               1 (1.0)
              3 LOAD_ATTR                0 (__truediv__)
              6 RETURN_VALUE

Which basically says the same. It loads the attribute __truediv__ of the constant 1.0.


Regarding your question

And how can you use it in a more complex statement (if possible)?

Even though it’s possible, you should never write code like that, simply because it’s unclear what the code is doing. So please don’t use it in more complex statements. I would even go so far as to say that you shouldn’t use it in such “simple” statements either; at the very least you should use parentheses to separate the instructions:

f = (1.).__truediv__

This would definitely be more readable, but something along the lines of:

from functools import partial
from operator import truediv
f = partial(truediv, 1.0)

would be even better!

The approach using partial also preserves python’s data model (the 1..__truediv__ approach does not!) which can be demonstrated by this little snippet:

>>> f1 = 1..__truediv__
>>> f2 = partial(truediv, 1.)

>>> f2(1+2j)  # reciprocal of complex number - works
(0.2-0.4j)
>>> f2('a')   # reciprocal of string should raise an exception
TypeError: unsupported operand type(s) for /: 'float' and 'str'

>>> f1(1+2j)  # reciprocal of complex number - works but gives an unexpected result
NotImplemented
>>> f1('a')   # reciprocal of string should raise an exception but it doesn't
NotImplemented

This is because 1. / (1+2j) is not evaluated by float.__truediv__ but by complex.__rtruediv__. operator.truediv makes sure the reverse operation is called when the normal operation returns NotImplemented, but you don’t have these fallbacks when you call __truediv__ directly. This loss of “expected behaviour” is the main reason why you (normally) shouldn’t use magic methods directly.
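
The fallback can be made visible with a small user-defined type. This is a sketch for Python 3 (where / maps to __truediv__); the class Half is made up for the illustration:

```python
from operator import truediv

class Half(object):
    """Hypothetical type that only knows how to act as a divisor."""
    def __rtruediv__(self, other):
        # x / Half() is defined here as doubling x.
        return other * 2

bound = (1.0).__truediv__

direct = bound(Half())                 # float does not know Half: NotImplemented
via_operator = truediv(1.0, Half())    # falls back to Half.__rtruediv__
via_slash = 1.0 / Half()               # the / operator performs the same fallback

print(direct, via_operator, via_slash)
```

Calling the bound method skips the second half of the protocol, so the caller receives the NotImplemented marker instead of a result or a TypeError.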


回答 2

首先,两个点可能有点尴尬:

f = 1..__truediv__ # or 1..__div__ for python 2

但这与写作相同:

f = 1.0.__truediv__ # or 1.0.__div__ for python 2

因为float文字可以用三种形式编写:

normal_float = 1.0
short_float = 1.  # == 1.0
prefixed_float = .1  # == 0.1

Two dots together may be a little awkward at first:

f = 1..__truediv__ # or 1..__div__ for python 2

But it is the same as writing:

f = 1.0.__truediv__ # or 1.0.__div__ for python 2

Because float literals can be written in three forms:

normal_float = 1.0
short_float = 1.  # == 1.0
prefixed_float = .1  # == 0.1
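
This also explains why the same trick fails with a single dot on an integer literal: the tokenizer reads "1." as a float and is then left with a stray name. A sketch that checks which spellings compile; the helper name parses is an assumption for this example:

```python
def parses(source):
    """Return True if source compiles as a Python expression."""
    try:
        compile(source, "<example>", "eval")
        return True
    except SyntaxError:
        return False

int_then_dot = parses("1.__add__")       # "1." is consumed as a float literal
float_then_dot = parses("1..__add__")    # float literal, then attribute access
spaced = parses("1 .__add__")            # the space ends the int literal first
parenthesised = parses("(1).__add__")

int_result = (1).__add__(2)       # method of the int 1
float_result = 1..__add__(2.)     # method of the float 1.0

print(int_then_dot, float_then_dot, spaced, parenthesised)
print(int_result, float_result)
```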

回答 3

什么f = 1..__truediv__

f是在值为1的float上绑定的特殊方法。特别,

1.0 / x

在Python 3中,调用:

(1.0).__truediv__(x)

证据:

class Float(float):
    def __truediv__(self, other):
        print('__truediv__ called')
        return super(Float, self).__truediv__(other)

和:

>>> one = Float(1)
>>> one/2
__truediv__ called
0.5

如果这样做:

f = one.__truediv__

我们保留绑定到该绑定方法的名称

>>> f(2)
__truediv__ called
0.5
>>> f(3)
__truediv__ called
0.3333333333333333

如果我们在一个紧密的循环中执行该点分查找,则可以节省一些时间。

解析抽象语法树(AST)

我们可以看到,解析表达式的AST可以告诉我们,我们__truediv__在浮点数上获取属性1.0

>>> import ast
>>> ast.dump(ast.parse('1..__truediv__').body[0])
"Expr(value=Attribute(value=Num(n=1.0), attr='__truediv__', ctx=Load()))"

您可以从以下获得相同的结果函数:

f = float(1).__truediv__

要么

f = (1.0).__truediv__

扣除

我们也可以通过扣除到达那里。

让我们建立它。

1本身是一个int

>>> 1
1
>>> type(1)
<type 'int'>

1,之后是句点:

>>> 1.
1.0
>>> type(1.)
<type 'float'>

下一个点本身就是SyntaxError,但它会在float实例上开始点分查找:

>>> 1..__truediv__
<method-wrapper '__truediv__' of float object at 0x0D1C7BF0>

没有人提到这一点 -这现在是浮动的“绑定方法”1.0

>>> f = 1..__truediv__
>>> f
<method-wrapper '__truediv__' of float object at 0x127F3CD8>
>>> f(2)
0.5
>>> f(3)
0.33333333333333331

我们可以更容易地完成相同的功能:

>>> def divide_one_by(x):
...     return 1.0/x
...     
>>> divide_one_by(2)
0.5
>>> divide_one_by(3)
0.33333333333333331

性能

divide_one_by函数的缺点是它需要另一个Python堆栈框架,这使其比绑定方法要慢一些:

>>> def f_1():
...     for x in range(1, 11):
...         f(x)
...         
>>> def f_2():
...     for x in range(1, 11):
...         divide_one_by(x)
...         
>>> timeit.repeat(f_1)
[2.5495760687176485, 2.5585621018805469, 2.5411816588331888]
>>> timeit.repeat(f_2)
[3.479687248616699, 3.46196088706062, 3.473726342237768]

当然,如果您仅可以使用普通文字,那就更快了:

>>> def f_3():
...     for x in range(1, 11):
...         1.0/x
...         
>>> timeit.repeat(f_3)
[2.1224895628296281, 2.1219930218637728, 2.1280188256941983]

What is f = 1..__truediv__?

f is a bound special method on a float with a value of one. Specifically,

1.0 / x

in Python 3, invokes:

(1.0).__truediv__(x)

Evidence:

class Float(float):
    def __truediv__(self, other):
        print('__truediv__ called')
        return super(Float, self).__truediv__(other)

and:

>>> one = Float(1)
>>> one/2
__truediv__ called
0.5

If we do:

f = one.__truediv__

We retain a name bound to that bound method

>>> f(2)
__truediv__ called
0.5
>>> f(3)
__truediv__ called
0.3333333333333333

If we were doing that dotted lookup in a tight loop, this could save a little time.

Parsing the Abstract Syntax Tree (AST)

We can see that parsing the AST for the expression tells us that we are getting the __truediv__ attribute on the floating point number, 1.0:

>>> import ast
>>> ast.dump(ast.parse('1..__truediv__').body[0])
"Expr(value=Attribute(value=Num(n=1.0), attr='__truediv__', ctx=Load()))"

You could get the same resulting function from:

f = float(1).__truediv__

Or

f = (1.0).__truediv__

Deduction

We can also get there by deduction.

Let’s build it up.

1 by itself is an int:

>>> 1
1
>>> type(1)
<type 'int'>

1 with a period after it is a float:

>>> 1.
1.0
>>> type(1.)
<type 'float'>

The next dot by itself would be a SyntaxError, but it begins a dotted lookup on the instance of the float:

>>> 1..__truediv__
<method-wrapper '__truediv__' of float object at 0x0D1C7BF0>

No one else has mentioned this – This is now a “bound method” on the float, 1.0:

>>> f = 1..__truediv__
>>> f
<method-wrapper '__truediv__' of float object at 0x127F3CD8>
>>> f(2)
0.5
>>> f(3)
0.33333333333333331

We could accomplish the same function much more readably:

>>> def divide_one_by(x):
...     return 1.0/x
...     
>>> divide_one_by(2)
0.5
>>> divide_one_by(3)
0.33333333333333331

Performance

The downside of the divide_one_by function is that it requires another Python stack frame, making it somewhat slower than the bound method:

>>> def f_1():
...     for x in range(1, 11):
...         f(x)
...         
>>> def f_2():
...     for x in range(1, 11):
...         divide_one_by(x)
...         
>>> timeit.repeat(f_1)
[2.5495760687176485, 2.5585621018805469, 2.5411816588331888]
>>> timeit.repeat(f_2)
[3.479687248616699, 3.46196088706062, 3.473726342237768]

Of course, if you can just use plain literals, that’s even faster:

>>> def f_3():
...     for x in range(1, 11):
...         1.0/x
...         
>>> timeit.repeat(f_3)
[2.1224895628296281, 2.1219930218637728, 2.1280188256941983]

如何在python中打印百分比值?

问题:如何在python中打印百分比值?

这是我的代码:

print str(float(1/3))+'%'

它显示:

0.0%

但我想得到 33%

我能做什么?

this is my code:

print str(float(1/3))+'%'

and it shows:

0.0%

but I want to get 33%

What can I do?


回答 0

format支持百分比浮点精度类型

>>> print "{0:.0%}".format(1./3)
33%

如果您不希望整数除法,则可以从导入Python3的除法__future__

>>> from __future__ import division
>>> 1 / 3
0.3333333333333333

# The above 33% example would could now be written without the explicit
# float conversion:
>>> print "{0:.0f}%".format(1/3 * 100)
33%

# Or even shorter using the format mini language:
>>> print "{:.0%}".format(1/3)
33%

format supports a percentage floating point precision type:

>>> print "{0:.0%}".format(1./3)
33%

If you don’t want integer division, you can import Python3’s division from __future__:

>>> from __future__ import division
>>> 1 / 3
0.3333333333333333

# The above 33% example could now be written without the explicit
# float conversion:
>>> print "{0:.0f}%".format(1/3 * 100)
33%

# Or even shorter using the format mini language:
>>> print "{:.0%}".format(1/3)
33%

回答 1

格式化方法有一种更方便的“百分比”格式化选项.format()

>>> '{:.1%}'.format(1/3.0)
'33.3%'

There is a much more convenient ‘percent’-formatting option for the .format() method:

>>> '{:.1%}'.format(1/3.0)
'33.3%'

回答 2

只是为了完整起见,因为我注意到没有人建议这种简单的方法:

>>> print("%.0f%%" % (100 * 1.0/3))
33%

细节:

  • %.0f代表“ 打印带有0个小数位的浮点数 ”,因此%.2f将打印33.33
  • %%打印文字%。比原来的要干净一点+'%'
  • 1.0而不是1强迫分部浮动,所以不再0.0

Just for the sake of completeness, since I noticed no one suggested this simple approach:

>>> print("%.0f%%" % (100 * 1.0/3))
33%

Details:

  • %.0f stands for “print a float with 0 decimal places”, so %.2f would print 33.33
  • %% prints a literal %. A bit cleaner than your original +'%'
  • 1.0 instead of 1 takes care of coercing the division to float, so no more 0.0
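
For comparison, here is a short sketch that puts the printf-style form next to the format-spec form; the variable names are made up for the example:

```python
ratio = 1.0 / 3

old_style = "%.0f%%" % (100 * ratio)       # %% is a literal percent sign
old_style_2dp = "%.2f%%" % (100 * ratio)
new_style = "{:.0%}".format(ratio)         # the % type multiplies by 100 itself
new_style_1dp = "{:.1%}".format(ratio)

print(old_style, old_style_2dp, new_style, new_style_1dp)
```

With the printf-style form the multiplication by 100 is the caller's job; with the % presentation type it is part of the format.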

回答 3

您将整数相除,然后转换为浮点数。除以浮点数代替。

另外,请使用此处描述的很棒的字符串格式化方法:http : //docs.python.org/library/string.html#format-specification-mini-language

指定转换百分比和精度。

>>> float(1) / float(3)
[Out] 0.33333333333333331

>>> 1.0/3.0
[Out] 0.33333333333333331

>>> '{0:.0%}'.format(1.0/3.0) # use string formatting to specify precision
[Out] '33%'

>>> '{percent:.2%}'.format(percent=1.0/3.0)
[Out] '33.33%'

一个伟大的宝石!

You are dividing integers then converting to float. Divide by floats instead.

As a bonus, use the awesome string formatting methods described here to specify a percent conversion and precision: http://docs.python.org/library/string.html#format-specification-mini-language

>>> float(1) / float(3)
[Out] 0.33333333333333331

>>> 1.0/3.0
[Out] 0.33333333333333331

>>> '{0:.0%}'.format(1.0/3.0) # use string formatting to specify precision
[Out] '33%'

>>> '{percent:.2%}'.format(percent=1.0/3.0)
[Out] '33.33%'

A great gem!


回答 4

只是添加Python 3 f字符串解决方案

prob = 1.0/3.0
print(f"{prob:.0%}")

Just to add a Python 3 f-string solution (f-strings require Python 3.6 or later):

prob = 1.0/3.0
print(f"{prob:.0%}")

回答 5

然后,您想这样做:

print str(int(1.0/3.0*100))+'%'

.0表示他们的花车和int()事后再发他们的整数。

Then you’d want to do this instead:

print str(int(1.0/3.0*100))+'%'

The .0 denotes them as floats, and int() then truncates the result to an integer (it drops the fractional part rather than rounding).
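
The difference between truncating and rounding does not show for 1/3, but it does for a ratio such as 2/3. A small sketch with made-up variable names:

```python
ratio = 2.0 / 3.0

truncated = str(int(ratio * 100)) + '%'          # int() drops the fraction
rounded = str(int(round(ratio * 100))) + '%'     # round first, then convert
formatted = "{:.0%}".format(ratio)               # the format spec rounds as well

print(truncated, rounded, formatted)
```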