Tag archive: sys

Usage of the sys.stdout.flush() method

Question: Usage of the sys.stdout.flush() method

What does sys.stdout.flush() do?


Answer 0

Python’s standard out is buffered (meaning that it collects some of the data “written” to standard out before it writes it to the terminal). Calling sys.stdout.flush() forces it to “flush” the buffer, meaning that it will write everything in the buffer to the terminal, even if normally it would wait before doing so.

Here’s some good information about (un)buffered I/O and why it’s useful:
http://en.wikipedia.org/wiki/Data_buffer
Buffered vs unbuffered IO


Answer 1

Consider the following simple Python script:

import time
import sys

for i in range(5):
    print(i),
    #sys.stdout.flush()
    time.sleep(1)

This is designed to print one number every second for five seconds, but if you run it as it is now (depending on your default system buffering) you may not see any output until the script completes, and then all at once you will see 0 1 2 3 4 printed to the screen.

This is because the output is being buffered, and unless you flush sys.stdout after each print you won’t see the output immediately. Remove the comment from the sys.stdout.flush() line to see the difference.
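On Python 3 the same experiment can be written with the flush keyword of print(). This is only a minimal sketch and not part of the original answer; the function name count_slowly and its stream parameter are made up for illustration.

```python
import sys
import time


def count_slowly(n, delay=1.0, stream=None):
    """Write 0..n-1 on one line, pausing `delay` seconds between numbers."""
    if stream is None:
        stream = sys.stdout
    for i in range(n):
        # end=" " keeps all numbers on one line; flush=True pushes each one
        # out immediately instead of leaving it in the buffer.
        print(i, end=" ", file=stream, flush=True)
        time.sleep(delay)
    print(file=stream, flush=True)  # finish the line
```

Calling count_slowly(5) in a terminal should show one number per second. With flush=False the numbers may instead appear all at once, depending on how stdout is buffered on your system.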


Answer 2

As I understand it, whenever we execute a print statement the output is written to a buffer, and we see it on screen only when the buffer is flushed (cleared). By default the buffer is flushed when the program exits, but we can also flush it manually by calling sys.stdout.flush() in the program. In the code below, the buffer is flushed when the value of i reaches 5.

You can see this by running the code below.

chiru@online:~$ cat flush.py
import time
import sys

for i in range(10):
    print i
    if i == 5:
        print "Flushing buffer"
        sys.stdout.flush()
    time.sleep(1)

for i in range(10):
    print i,
    if i == 5:
        print "Flushing buffer"
        sys.stdout.flush()
chiru@online:~$ python flush.py 
0 1 2 3 4 5 Flushing buffer
6 7 8 9 0 1 2 3 4 5 Flushing buffer
6 7 8 9

Answer 3

import sys
for x in range(10000):
    print "HAPPY >> %s <<\r" % str(x),
    sys.stdout.flush()

Answer 4

As I understand it, sys.stdout.flush() pushes out all the data that has been buffered up to that point to the file object. When you use stdout, data is held in buffer memory (for some time, or until the buffer fills up) before it is written to the terminal. Calling flush() forces the buffer to be emptied and written to the terminal even before it is full.
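The behaviour described here can be observed without a terminal. The sketch below is an illustration only: it uses an in-memory io.BytesIO as a stand-in for the terminal and wraps it in an io.BufferedWriter, which is an assumption about how to model stdout rather than a description of how sys.stdout is actually built.

```python
import io

raw = io.BytesIO()                  # stands in for the terminal or file
buffered = io.BufferedWriter(raw)   # default buffer is several kilobytes

buffered.write(b"hello")
before_flush = raw.getvalue()       # nothing has reached `raw` yet

buffered.flush()
after_flush = raw.getvalue()        # the buffered bytes have been written out

print(before_flush, after_flush)
```

The five bytes sit in the writer's buffer until flush() is called, which is the same thing sys.stdout.flush() does for standard output.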


Where is Python’s sys.path initialized from?

Question: Where is Python’s sys.path initialized from?

Where is Python’s sys.path initialized from?

UPD: Python is adding some paths before referring to PYTHONPATH:

    >>> import sys
    >>> from pprint import pprint as p
    >>> p(sys.path)
    ['',
     'C:\\Python25\\lib\\site-packages\\setuptools-0.6c9-py2.5.egg',
     'C:\\Python25\\lib\\site-packages\\orbited-0.7.8-py2.5.egg',
     'C:\\Python25\\lib\\site-packages\\morbid-0.8.6.1-py2.5.egg',
     'C:\\Python25\\lib\\site-packages\\demjson-1.4-py2.5.egg',
     'C:\\Python25\\lib\\site-packages\\stomper-0.2.2-py2.5.egg',
     'C:\\Python25\\lib\\site-packages\\uuid-1.30-py2.5.egg',
     'C:\\Python25\\lib\\site-packages\\stompservice-0.1.0-py2.5.egg',
     'C:\\Python25\\lib\\site-packages\\cherrypy-3.0.1-py2.5.egg',
     'C:\\Python25\\lib\\site-packages\\pyorbited-0.2.2-py2.5.egg',
     'C:\\Python25\\lib\\site-packages\\flup-1.0.1-py2.5.egg',
     'C:\\Python25\\lib\\site-packages\\wsgilog-0.1-py2.5.egg',
     'c:\\testdir',
     'C:\\Windows\\system32\\python25.zip',
     'C:\\Python25\\DLLs',
     'C:\\Python25\\lib',
     'C:\\Python25\\lib\\plat-win',
     'C:\\Python25\\lib\\lib-tk',
     'C:\\Python25',
     'C:\\Python25\\lib\\site-packages',
     'C:\\Python25\\lib\\site-packages\\PIL',
     'C:\\Python25\\lib\\site-packages\\win32',
     'C:\\Python25\\lib\\site-packages\\win32\\lib',
     'C:\\Python25\\lib\\site-packages\\Pythonwin']

My PYTHONPATH is:

    PYTHONPATH=c:\testdir

Where do the paths listed before the PYTHONPATH entry come from?


Answer 0

“Initialized from the environment variable PYTHONPATH, plus an installation-dependent default”

http://docs.python.org/library/sys.html#sys.path
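To see which entries of sys.path come from PYTHONPATH and which come from somewhere else, something like the following can help. It is a rough sketch; the helper name split_by_pythonpath is invented here, and it only compares normalized paths, so it cannot tell you where the remaining entries originate.

```python
import os
import sys


def split_by_pythonpath(path_entries, pythonpath):
    """Return (from_pythonpath, others) for the given sys.path-style list."""
    wanted = {os.path.normcase(os.path.abspath(p))
              for p in pythonpath.split(os.pathsep) if p}
    from_env, others = [], []
    for entry in path_entries:
        # The empty string (current/script directory) is kept as-is.
        key = os.path.normcase(os.path.abspath(entry)) if entry else entry
        (from_env if key in wanted else others).append(entry)
    return from_env, others


from_env, others = split_by_pythonpath(sys.path, os.environ.get("PYTHONPATH", ""))
print("from PYTHONPATH:", from_env)
print("installation-dependent or added later:", others)
```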


Answer 1

Python really tries hard to intelligently set sys.path. How it is set can get really complicated. The following guide is a watered-down, somewhat-incomplete, somewhat-wrong, but hopefully-useful guide for the rank-and-file python programmer of what happens when python figures out what to use as the initial values of sys.path, sys.executable, sys.exec_prefix, and sys.prefix on a normal python installation.

First, python does its level best to figure out its actual physical location on the filesystem based on what the operating system tells it. If the OS just says “python” is running, it finds itself in $PATH. It resolves any symbolic links. Once it has done this, the path of the executable that it finds is used as the value for sys.executable, no ifs, ands, or buts.

Next, it determines the initial values for sys.exec_prefix and sys.prefix.

If there is a file called pyvenv.cfg in the same directory as sys.executable or one directory up, python looks at it. Different OSes do different things with this file.

One of the values in this config file that python looks for is the configuration option home = <DIRECTORY>. Python will use this directory instead of the directory containing sys.executable when it dynamically sets the initial value of sys.prefix later. If the applocal = true setting appears in the pyvenv.cfg file on Windows, but not the home = <DIRECTORY> setting, then sys.prefix will be set to the directory containing sys.executable.
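The lookup described above (same directory as the executable, then one directory up, then the home setting) can be approximated in a few lines. This is a simplified sketch under my own assumptions, not the interpreter's actual start-up code; the helper names find_pyvenv_cfg and read_home are hypothetical.

```python
import os
import sys


def find_pyvenv_cfg(executable):
    """Look for pyvenv.cfg beside the executable, then one directory up."""
    exe_dir = os.path.dirname(os.path.abspath(executable))
    for directory in (exe_dir, os.path.dirname(exe_dir)):
        candidate = os.path.join(directory, "pyvenv.cfg")
        if os.path.isfile(candidate):
            return candidate
    return None


def read_home(cfg_path):
    """Return the value of the `home = ...` line, or None if absent."""
    with open(cfg_path, encoding="utf-8") as handle:
        for line in handle:
            key, sep, value = line.partition("=")
            if sep and key.strip().lower() == "home":
                return value.strip()
    return None


cfg = find_pyvenv_cfg(sys.executable) if sys.executable else None
print("pyvenv.cfg:", cfg)
print("home:", read_home(cfg) if cfg else None)
```

Run inside a virtual environment this will usually print the path of that environment's pyvenv.cfg; outside one it will usually print None for both values.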

Next, the PYTHONHOME environment variable is examined. On Linux and Mac, sys.prefix and sys.exec_prefix are set to the PYTHONHOME environment variable, if it exists, superseding any home = <DIRECTORY> setting in pyvenv.cfg. On Windows, sys.prefix and sys.exec_prefix are set to the PYTHONHOME environment variable, if it exists, unless a home = <DIRECTORY> setting is present in pyvenv.cfg, which is used instead.

Otherwise, sys.prefix and sys.exec_prefix are found by walking backwards from the location of sys.executable, or from the home directory given by pyvenv.cfg, if any.

If lib/python<version>/lib-dynload is found in that directory or any of its parent directories, that directory is set to be sys.exec_prefix on Linux or Mac. If the file lib/python<version>/os.py is found in the directory or any of its subdirectories, that directory is set to be sys.prefix on Linux, Mac, and Windows, with sys.exec_prefix set to the same value as sys.prefix on Windows. This entire step is skipped on Windows if applocal = true is set. Either the directory of sys.executable is used or, if home is set in pyvenv.cfg, that is used instead for the initial value of sys.prefix.

If it can’t find these “landmark” files or sys.prefix hasn’t been found yet, then python sets sys.prefix to a “fallback” value. Linux and Mac, for example, use pre-compiled defaults as the values of sys.prefix and sys.exec_prefix. Windows waits until sys.path is fully figured out to set a fallback value for sys.prefix.

Then, (what you’ve all been waiting for,) python determines the initial values that are to be contained in sys.path.

  1. The directory of the script which python is executing is added to sys.path. On Windows, this is always the empty string, which tells python to use the full path where the script is located instead.
  2. The contents of the PYTHONPATH environment variable, if set, are added to sys.path, unless you’re on Windows and applocal is set to true in pyvenv.cfg.
  3. The zip file path, which is <prefix>/lib/python35.zip on Linux/Mac and os.path.join(os.path.dirname(sys.executable), "python.zip") on Windows, is added to sys.path.
  4. If on Windows and no applocal = true was set in pyvenv.cfg, then the contents of the subkeys of the registry key HKEY_CURRENT_USER\Software\Python\PythonCore\<DLLVersion>\PythonPath\ are added, if any.
  5. If on Windows and no applocal = true was set in pyvenv.cfg, and sys.prefix could not be found, then the core contents of the registry key HKEY_CURRENT_USER\Software\Python\PythonCore\<DLLVersion>\PythonPath\ are added, if it exists.
  6. If on Windows and no applocal = true was set in pyvenv.cfg, then the contents of the subkeys of the registry key HKEY_LOCAL_MACHINE\Software\Python\PythonCore\<DLLVersion>\PythonPath\ are added, if any.
  7. If on Windows and no applocal = true was set in pyvenv.cfg, and sys.prefix could not be found, then the core contents of the registry key HKEY_LOCAL_MACHINE\Software\Python\PythonCore\<DLLVersion>\PythonPath\ are added, if it exists.
  8. If on Windows, and PYTHONPATH was not set, the prefix was not found, and no registry keys were present, then the relative compile-time value of PYTHONPATH is added; otherwise, this step is ignored.
  9. Paths in the compile-time macro PYTHONPATH are added relative to the dynamically-found sys.prefix.
  10. On Mac and Linux, the value of sys.exec_prefix is added. On Windows, the directory which was used (or would have been used) to search dynamically for sys.prefix is added.

At this stage on Windows, if no prefix was found, then python will try to determine it by searching all the directories in sys.path for the landmark files, as it tried to do with the directory of sys.executable previously, until it finds something. If it doesn’t, sys.prefix is left blank.

Finally, after all this, Python loads the site module, which adds stuff yet further to sys.path:

It starts by constructing up to four directories from a head and a tail part. For the head part, it uses sys.prefix and sys.exec_prefix; empty heads are skipped. For the tail part, it uses the empty string and then lib/site-packages (on Windows) or lib/pythonX.Y/site-packages and then lib/site-python (on Unix and Macintosh). For each of the distinct head-tail combinations, it sees if it refers to an existing directory, and if so, adds it to sys.path and also inspects the newly added path for configuration files.
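The head/tail combination in that quoted paragraph can be sketched as below. This follows my reading of the paragraph (two tails per platform, so at most four directories) and is not the real site module, which has changed since that text was written; candidate_site_dirs is a made-up name.

```python
import os
import sys


def candidate_site_dirs(heads, windows, version):
    """Combine head and tail parts as the quoted paragraph describes."""
    if windows:
        tails = ["", os.path.join("lib", "site-packages")]
    else:
        tails = [os.path.join("lib", "python" + version, "site-packages"),
                 os.path.join("lib", "site-python")]
    seen, result = set(), []
    for head in heads:
        if not head:               # empty heads are skipped
            continue
        for tail in tails:
            path = os.path.join(head, tail) if tail else head
            if path not in seen:   # keep only distinct combinations
                seen.add(path)
                result.append(path)
    return result


version = "%d.%d" % sys.version_info[:2]
for path in candidate_site_dirs([sys.prefix, sys.exec_prefix],
                                windows=(os.name == "nt"), version=version):
    print(path, "(exists)" if os.path.isdir(path) else "(missing)")
```

Only the directories reported as existing would end up on sys.path under the described procedure.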


Why should we NOT use sys.setdefaultencoding("utf-8") in a py script?

Question: Why should we NOT use sys.setdefaultencoding("utf-8") in a py script?

I have seen a few py scripts which use this at the top of the script. In what cases should one use it?

import sys
reload(sys)
sys.setdefaultencoding("utf-8")

Answer 0

As per the documentation: This allows you to switch from the default ASCII to other encodings such as UTF-8, which the Python runtime will use whenever it has to decode a string buffer to unicode.

This function is only available at Python start-up time, when Python scans the environment. It has to be called in a system-wide module, sitecustomize.py. After this module has been evaluated, the setdefaultencoding() function is removed from the sys module.

The only way to actually use it is with a reload hack that brings the attribute back.

Also, the use of sys.setdefaultencoding() has always been discouraged, and it has become a no-op in py3k. The encoding of py3k is hard-wired to “utf-8” and changing it raises an error.
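A quick way to check the Python 3 situation on your own interpreter is shown below. It is a small sketch, and it reflects the Python 3 releases I am aware of, where the setter is simply absent from the sys module.

```python
import sys

# On Python 3 the default encoding is reported as UTF-8 ...
default_encoding = sys.getdefaultencoding()

# ... and the setter is not an attribute of the sys module.
has_setter = hasattr(sys, "setdefaultencoding")

print(default_encoding, has_setter)
```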

I suggest some pointers for reading:


Answer 1

tl;dr

The answer is NEVER! (unless you really know what you’re doing)

Nine times out of ten, the problem can be resolved with a proper understanding of encoding/decoding.

The remaining one in ten people have an incorrectly defined locale or environment and need to set:

PYTHONIOENCODING="UTF-8"  

in their environment to fix console printing problems.

What does it do?

sys.setdefaultencoding("utf-8") (struck through to avoid re-use) changes the default encoding/decoding used whenever Python 2.x needs to convert a unicode() to a str() (and vice versa) and the encoding is not given, i.e.:

str(u"\u20AC")
unicode("€")
"{}".format(u"\u20AC") 

In Python 2.x, the default encoding is set to ASCII and the above examples will fail with:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 0: ordinal not in range(128)

(My console is configured as UTF-8, so "€" = '\xe2\x82\xac', hence exception on \xe2)

or

UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)

sys.setdefaultencoding("utf-8") will allow these to work for me, but won’t necessarily work for people who don’t use UTF-8. The default of ASCII ensures that assumptions about encoding are not baked into code.
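The alternative to relying on a default is to name the encoding at every conversion. The sketch below uses Python 3 syntax (where text and bytes are separate types), which is my choice for illustration and differs from the Python 2 examples in this answer.

```python
euro = "\u20ac"                      # the euro sign as text

encoded = euro.encode("utf-8")       # text -> bytes, encoding stated explicitly
decoded = encoded.decode("utf-8")    # bytes -> text, encoding stated explicitly

try:
    encoded.decode("ascii")          # the wrong codec fails loudly
    ascii_error = None
except UnicodeDecodeError as exc:
    ascii_error = exc

print(encoded, decoded == euro, ascii_error)
```

Because the codec is spelled out, the code behaves the same regardless of any process-wide default.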

Console

sys.setdefaultencoding("utf-8") also has a side effect of appearing to fix sys.stdout.encoding, used when printing characters to the console. Python uses the user’s locale (Linux/OS X/Un*x) or codepage (Windows) to set this. Occasionally, a user’s locale is broken and just requires PYTHONIOENCODING to fix the console encoding.

Example:

$ export LANG=en_GB.gibberish
$ python
>>> import sys
>>> sys.stdout.encoding
'ANSI_X3.4-1968'
>>> print u"\u20AC"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\u20ac' in position 0: ordinal not in range(128)
>>> exit()

$ PYTHONIOENCODING=UTF-8 python
>>> import sys
>>> sys.stdout.encoding
'UTF-8'
>>> print u"\u20AC"
€
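The effect of PYTHONIOENCODING in the session above can also be checked from a script by starting a child interpreter with the variable set. This is a sketch for Python 3; the helper name child_stdout_encoding is invented, and the exact spelling of the reported name (for example 'utf-8' versus 'UTF-8') may differ between versions.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Start a child interpreter and report its sys.stdout.encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env, stdout=subprocess.PIPE, check=True,
    )
    return result.stdout.decode("ascii").strip()


print(child_stdout_encoding("UTF-8"))
print(child_stdout_encoding("latin-1"))
```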

What’s so bad with sys.setdefaultencoding(“utf-8”)?

People have been developing against Python 2.x for 16 years on the understanding that the default encoding is ASCII. UnicodeError exception handling methods have been written to handle string to Unicode conversions on strings that are found to contain non-ASCII.

From https://anonbadger.wordpress.com/2015/06/16/why-sys-setdefaultencoding-will-break-code/

def welcome_message(byte_string):
    try:
        return u"%s runs your business" % byte_string
    except UnicodeError:
        return u"%s runs your business" % unicode(byte_string,
            encoding=detect_encoding(byte_string))

print(welcome_message(u"Angstrom (Å®)".encode("latin-1")))

Previous to setting defaultencoding this code would be unable to decode the “Å” in the ascii encoding and then would enter the exception handler to guess the encoding and properly turn it into unicode. Printing: Angstrom (Å®) runs your business. Once you’ve set the defaultencoding to utf-8 the code will find that the byte_string can be interpreted as utf-8 and so it will mangle the data and return this instead: Angstrom (Ů) runs your business.
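The mangling can be reproduced with explicit decode calls. The sketch below is in Python 3 and replaces the implicit Python 2 conversion with explicit ones, so it illustrates the byte-level effect rather than the original code path.

```python
# "Angstrom (Å®)" written with escapes, then encoded as latin-1 bytes.
raw = "Angstrom (\u00c5\u00ae)".encode("latin-1")

# With an ASCII default the conversion fails, so a fallback handler can run.
try:
    raw.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# With a UTF-8 default the same bytes happen to be valid UTF-8, so no error
# is raised and the two latin-1 characters collapse into one wrong character.
mangled = raw.decode("utf-8")
intended = raw.decode("latin-1")

print(ascii_failed, ascii(mangled), ascii(intended))
```

The bytes 0xC5 0xAE form a valid two-byte UTF-8 sequence for U+016E (Ů), which is why no exception is raised and the fallback never gets a chance to run.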

Changing what should be a constant will have dramatic effects on modules you depend upon. It’s better to just fix the data coming in and out of your code.

Example problem

While the setting of defaultencoding to UTF-8 isn’t the root cause in the following example, it shows how problems are masked and how, when the input encoding changes, the code breaks in an unobvious way: UnicodeDecodeError: 'utf8' codec can't decode byte 0x80 in position 3131: invalid start byte


Answer 2

#!/usr/bin/env python
#-*- coding: utf-8 -*-
u = u'moçambique'
print u.encode("utf-8")
print u

chmod +x test.py
./test.py
moçambique
moçambique

./test.py > output.txt
Traceback (most recent call last):
  File "./test.py", line 5, in <module>
    print u
UnicodeEncodeError: 'ascii' codec can't encode character 
u'\xe7' in position 2: ordinal not in range(128)

This works when printing to the shell but not when stdout is redirected to a file, so encoding explicitly, as in u.encode("utf-8"), is one workaround for writing to stdout.

I took another approach: the script refuses to run if sys.stdout.encoding is not defined. In other words, you need to export PYTHONIOENCODING=UTF-8 first in order to write to stdout.

import sys
if (sys.stdout.encoding is None):            
    print >> sys.stderr, "please set python env PYTHONIOENCODING=UTF-8, example: export PYTHONIOENCODING=UTF-8, when write to stdout." 
    exit(1)


So, using the same example:

export PYTHONIOENCODING=UTF-8
./test.py > output.txt

will work.


Answer 3

  • The first danger lies in reload(sys).

    When you reload a module, you actually get two copies of the module in your runtime. The old module is a Python object like everything else, and stays alive as long as there are references to it. So, half of the objects will be pointing to the old module, and half to the new one. When you make some change, you will never see it coming when some random object doesn’t see the change:

    (This is IPython shell)
    
    In [1]: import sys
    
    In [2]: sys.stdout
    Out[2]: <colorama.ansitowin32.StreamWrapper at 0x3a2aac8>
    
    In [3]: reload(sys)
    <module 'sys' (built-in)>
    
    In [4]: sys.stdout
    Out[4]: <open file '<stdout>', mode 'w' at 0x00000000022E20C0>
    
    In [11]: import IPython.terminal
    
    In [14]: IPython.terminal.interactiveshell.sys.stdout
    Out[14]: <colorama.ansitowin32.StreamWrapper at 0x3a9aac8>
    
  • Now, sys.setdefaultencoding() proper

    All that it affects is implicit conversion str<->unicode. Now, utf-8 is the sanest encoding on the planet (backward-compatible with ASCII and all), the conversion now “just works”, what could possibly go wrong?

    Well, anything. And that is the danger.

    • There may be some code that relies on the UnicodeError being thrown for non-ASCII input, or does the transcoding with an error handler, which now produces an unexpected result. And since all code is tested with the default setting, you’re strictly on “unsupported” territory here, and no-one gives you guarantees about how their code will behave.
    • The transcoding may produce unexpected or unusable results if not everything on the system uses UTF-8 because Python 2 actually has multiple independent “default string encodings”. (Remember, a program must work for the customer, on the customer’s equipment.)
      • Again, the worst thing is you will never know that because the conversion is implicit — you don’t really know when and where it happens. (Python Zen, koan 2 ahoy!) You will never know why (and if) your code works on one system and breaks on another. (Or better yet, works in IDE and breaks in console.)

Python memory usage of numpy arrays

Question: Python memory usage of numpy arrays

I’m using python to analyse some large files and I’m running into memory issues, so I’ve been using sys.getsizeof() to try and keep track of the usage, but its behaviour with numpy arrays is bizarre. Here’s an example involving a map of albedos that I’m having to open:

>>> import numpy as np
>>> import struct
>>> from sys import getsizeof
>>> f = open('Albedo_map.assoc', 'rb')
>>> getsizeof(f)
144
>>> albedo = struct.unpack('%df' % (7200*3600), f.read(7200*3600*4))
>>> getsizeof(albedo)
207360056
>>> albedo = np.array(albedo).reshape(3600,7200)
>>> getsizeof(albedo)
80

Well, the data’s still there, but the size of the object, a 3600×7200 pixel map, has gone from ~200 MB to 80 bytes. I’d like to hope that my memory issues are over and just convert everything to numpy arrays, but I feel that this behaviour, if true, would in some way violate some law of information theory or thermodynamics, or something, so I’m inclined to believe that getsizeof() doesn’t work with numpy arrays. Any ideas?
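The numbers in the session can be checked with plain arithmetic. The reported 207360056 bytes equals 8 bytes for each of the 25,920,000 tuple slots plus a 56-byte header, which suggests getsizeof() counted only the tuple's pointers. The 80 bytes for the array presumably cover only the array object itself and not its data buffer. The sketch below uses only the standard library; the statement that np.array() stores Python floats as 8-byte float64 values is noted as an assumption in the comments.

```python
import struct
import sys

n_values = 7200 * 3600                      # number of pixels in the map

raw_float32_bytes = n_values * 4            # what f.read() returned
float64_bytes = n_values * 8                # assumes np.array() defaults to float64

# sys.getsizeof() on a tuple is shallow: header plus one pointer per item,
# not the float objects the pointers refer to.
pointer_size = struct.calcsize("P")
sample = tuple(float(i) for i in range(1000))
per_item = (sys.getsizeof(sample) - sys.getsizeof(())) // len(sample)

print(raw_float32_bytes, float64_bytes, per_item == pointer_size)
```

So the array data would be expected to occupy roughly 207 MB as float64, about the same as the tuple's pointer table alone, and getsizeof() is not a reliable guide to either total.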


Answer 0

You can use array.nbytes for numpy arrays, for example:

>>> import numpy as np
>>> from sys import getsizeof
>>> a = [0] * 1024
>>> b = np.array(a)
>>> getsizeof(a)
8264
>>> b.nbytes
8192

Answer 1

The field nbytes will give you the size in bytes of all the elements of the array in a numpy.array:

size_in_bytes = my_numpy_array.nbytes

Note that this does not measure “non-element attributes of the array object”, so the actual size in bytes can be a few bytes larger than this.


Answer 2

In Python notebooks I often want to filter out ‘dangling’ numpy.ndarray objects, in particular the ones stored in _1, _2, etc. that were never really meant to stay alive.

I use this code to get a listing of all of them and their size.

Not sure if locals() or globals() is better here.

import sys
import numpy
from humanize import naturalsize

for size, name in sorted(
    (value.nbytes, name)
    for name, value in locals().items()
    if isinstance(value, numpy.ndarray)):
  print("{:>30}: {:>8}".format(name, naturalsize(size)))