Tag archive: Python

Question: How to fix “UnicodeDecodeError: 'ascii' codec can't decode byte”

as3:~/ngokevin-site# nano content/blog/20140114_test-chinese.mkd
as3:~/ngokevin-site# wok
Traceback (most recent call last):
  File "/usr/local/bin/wok", line 4, in <module>
    Engine()
  File "/usr/local/lib/python2.7/site-packages/wok/engine.py", line 104, in __init__
    self.load_pages()
  File "/usr/local/lib/python2.7/site-packages/wok/engine.py", line 238, in load_pages
    p = Page.from_file(os.path.join(root, f), self.options, self, renderer)
  File "/usr/local/lib/python2.7/site-packages/wok/page.py", line 111, in from_file
    page.meta['content'] = page.renderer.render(page.original)
  File "/usr/local/lib/python2.7/site-packages/wok/renderers.py", line 46, in render
    return markdown(plain, Markdown.plugins)
  File "/usr/local/lib/python2.7/site-packages/markdown/__init__.py", line 419, in markdown
    return md.convert(text)
  File "/usr/local/lib/python2.7/site-packages/markdown/__init__.py", line 281, in convert
    source = unicode(source)
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe8 in position 1: ordinal not in range(128). -- Note: Markdown only accepts unicode input!

How to fix it?

In some other Python-based static blog apps, Chinese posts can be published successfully, for example http://github.com/vrypan/bucket3. On my site http://bc3.brite.biz/, Chinese posts are published without problems.

Answer 0

tl;dr / quick fix

Don’t decode/encode willy nilly
Don’t assume your strings are UTF-8 encoded
Try to convert strings to Unicode strings as soon as possible in your code
Fix your locale: How to solve UnicodeDecodeError in Python 3.6?
Don’t be tempted to use quick reload hacks

Unicode Zen in Python 2.x – The Long Version

Without seeing the source it’s difficult to know the root cause, so I’ll have to speak generally.

UnicodeDecodeError: 'ascii' codec can't decode byte generally happens when you try to convert a Python 2.x str that contains non-ASCII to a Unicode string without specifying the encoding of the original string.

In brief, Unicode strings are an entirely separate type of Python string that does not contain any encoding. They only hold Unicode code points and can therefore hold any code point from across the entire range. Strings (str) contain encoded text, be it UTF-8, UTF-16, ISO-8859-1, GBK, Big5, etc. Strings are decoded to Unicode, and Unicode strings are encoded to strings. Files and text data are always transferred as encoded strings.

The Markdown module authors probably use unicode() (where the exception is thrown) as a quality gate to the rest of the code: it will convert ASCII or re-wrap existing Unicode strings into a new Unicode string. The Markdown authors can’t know the encoding of the incoming string, so they rely on you to decode strings to Unicode strings before passing them to Markdown.

Unicode strings can be declared in your code using the u prefix to strings. E.g.

>>> my_u = u'my ünicôdé strįng'
>>> type(my_u)
<type 'unicode'>

Unicode strings may also come from files, databases and network modules. When this happens, you don’t need to worry about the encoding.

Gotchas

Conversion from str to Unicode can happen even when you don’t explicitly call unicode().

The following scenarios cause UnicodeDecodeError exceptions:

# Explicit conversion without encoding
unicode('€')

# New style format string into Unicode string
# Python will try to convert value string to Unicode first
u"The currency is: {}".format('€')

# Old style format string into Unicode string
# Python will try to convert value string to Unicode first
u'The currency is: %s' % '€'

# Append string to Unicode
# Python will try to convert string to Unicode first
u'The currency is: ' + '€'

Examples

In the following diagram, you can see how the word café has been encoded in either “UTF-8” or “Cp1252” encoding depending on the terminal type. In both examples, caf is just regular ASCII. In UTF-8, é is encoded using two bytes. In “Cp1252”, é is 0xE9 (which also happens to be the Unicode code point value; it’s no coincidence). The correct decode() is invoked and conversion to a Python Unicode string is successful:

In this diagram, decode() is called with ascii (which is the same as calling unicode() without an encoding given). As ASCII can’t contain bytes greater than 0x7F, this will throw a UnicodeDecodeError exception:
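The two cases described above can be reproduced in a few lines. This is a minimal sketch written for Python 3, where bytes plays the role of the Python 2 str; the variable names are illustrative and not part of the original answer.

```python
# Python 3 sketch: a bytes object here stands in for a Python 2 str.
utf8_bytes = b"caf\xc3\xa9"    # 'café' encoded as UTF-8 (é takes two bytes)
cp1252_bytes = b"caf\xe9"      # 'café' encoded as Cp1252 (é is the single byte 0xE9)

# Decoding with the matching codec yields the same Unicode text in both cases.
decoded_utf8 = utf8_bytes.decode("utf-8")
decoded_cp1252 = cp1252_bytes.decode("cp1252")

# Decoding with the ascii codec fails, because 0xC3 is greater than 0x7F.
try:
    utf8_bytes.decode("ascii")
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True
```

In Python 2 the same failure is what unicode(some_str) triggers implicitly when no encoding is given.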

The Unicode Sandwich

It’s good practice to form a Unicode sandwich in your code, where you decode all incoming data to Unicode strings, work with Unicodes, then encode to strs on the way out. This saves you from worrying about the encoding of strings in the middle of your code.
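As a rough illustration of the sandwich, the sketch below decodes on the way in, works on text in the middle, and encodes on the way out. It is written for Python 3 (bytes in, bytes out), and the function name shout is hypothetical.

```python
def shout(raw_bytes, encoding="utf-8"):
    """Upper-case encoded text using a decode / work / encode sandwich."""
    text = raw_bytes.decode(encoding)   # bottom slice: decode incoming bytes
    text = text.upper()                 # the meat: work only with Unicode text
    return text.encode(encoding)        # top slice: encode on the way out


result = shout(b"z\xc3\xbcrich")        # UTF-8 bytes for 'zürich'
```

The point of the design is that only the two edges of the program need to know about encodings; everything in between handles text.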

Input / Decode

Source code

If you need to bake non-ASCII into your source code, just create Unicode strings by prefixing the string with a u. E.g.

u'Zürich'

To allow Python to decode your source code, you will need to add an encoding header to match the actual encoding of your file. For example, if your file was encoded as ‘UTF-8’, you would use:

# encoding: utf-8

This is only necessary when you have non-ASCII in your source code.

Files

Usually non-ASCII data is received from a file. The io module provides a TextIOWrapper that decodes your file on the fly, using a given encoding. You must use the correct encoding for the file, as it can’t be easily guessed. For example, for a UTF-8 file:

import io
with io.open("my_utf8_file.txt", "r", encoding="utf-8") as my_file:
    my_unicode_string = my_file.read()

my_unicode_string would then be suitable for passing to Markdown. If you get a UnicodeDecodeError from the read() line, then you’ve probably used the wrong encoding value.
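To see both directions at once, here is a small round-trip sketch. It assumes a temporary directory and a made-up file name, writes Unicode text with io.open, and reads it back both as text and as raw bytes.

```python
import io
import os
import tempfile

# Hypothetical file in a temporary directory, so the sketch is self-contained.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "my_utf8_file.txt")

# Writing: io.open encodes the Unicode text to UTF-8 bytes on the way out.
with io.open(path, "w", encoding="utf-8") as out_file:
    out_file.write(u"Z\u00fcrich caf\u00e9")

# Reading: io.open decodes the bytes back to a Unicode string on the way in.
with io.open(path, "r", encoding="utf-8") as in_file:
    my_unicode_string = in_file.read()

# Binary mode shows what is actually stored on disk.
with io.open(path, "rb") as raw_file:
    raw_bytes = raw_file.read()
```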

CSV Files

The Python 2.7 CSV module does not support non-ASCII characters 😩. Help is at hand, however, with https://pypi.python.org/pypi/backports.csv.

Use it like above but pass the opened file to it:

from backports import csv
import io

def read_rows(path):
    # yield is only valid inside a function, so the loop is wrapped in a generator
    with io.open(path, "r", encoding="utf-8") as my_file:
        for row in csv.reader(my_file):
            yield row

Databases

Most Python database drivers can return data in Unicode, but usually require a little configuration. Always use Unicode strings for SQL queries.

MySQL

In the connection string add:

charset='utf8',
use_unicode=True

E.g.

>>> db = MySQLdb.connect(host="localhost", user='root', passwd='passwd', db='sandbox', use_unicode=True, charset="utf8")

PostgreSQL

Add:

psycopg2.extensions.register_type(psycopg2.extensions.UNICODE)
psycopg2.extensions.register_type(psycopg2.extensions.UNICODEARRAY)

HTTP

Web pages can be encoded in just about any encoding. The Content-Type header should contain a charset field to hint at the encoding. The content can then be decoded manually using this value. Alternatively, Python-Requests returns Unicode strings in response.text.
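Decoding manually against the header could look roughly like the sketch below. It uses the standard library email.message.Message class to parse the header value; the helper names and the UTF-8 fallback are assumptions made for the illustration, not something the answer prescribes.

```python
from email.message import Message


def charset_from_content_type(header_value, default="utf-8"):
    """Return the charset parameter of a Content-Type value (lower-cased)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # Falls back to `default` when the header carries no charset parameter.
    return msg.get_content_charset(default)


def decode_body(body_bytes, content_type):
    """Decode a response body using the charset hinted at by its header."""
    return body_bytes.decode(charset_from_content_type(content_type))


page_text = decode_body(b"caf\xe9", "text/html; charset=ISO-8859-1")
```

The header is only a hint: a server can declare one charset and send another, in which case the decode step raises UnicodeDecodeError or produces garbled text.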

Manually

If you must decode strings manually, you can simply do my_string.decode(encoding), where encoding is the appropriate encoding. Python 2.x supported codecs are given here: Standard Encodings. Again, if you get UnicodeDecodeError then you’ve probably got the wrong encoding.

The meat of the sandwich

Work with Unicodes as you would normal strs.

Output

stdout / printing

print writes through the stdout stream. Python tries to configure an encoder on stdout so that Unicode strings are encoded to the console’s encoding. For example, if a Linux shell’s locale is en_GB.UTF-8, the output will be encoded to UTF-8. On Windows, you will be limited to an 8-bit code page.

An incorrectly configured console, such as a corrupt locale, can lead to unexpected print errors. The PYTHONIOENCODING environment variable can force the encoding for stdout.
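One way to observe the effect of PYTHONIOENCODING is to start a child interpreter with the variable set and ask it which encoding its stdout uses. This is a sketch for Python 3; the helper name is hypothetical.

```python
import os
import subprocess
import sys


def child_stdout_encoding(io_encoding):
    """Run a child Python with PYTHONIOENCODING set and report its stdout encoding."""
    env = dict(os.environ, PYTHONIOENCODING=io_encoding)
    output = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
        env=env,
    )
    return output.decode("ascii").strip()


reported = child_stdout_encoding("utf-8")
```

In a shell the equivalent is to export the variable before launching the script.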

Files

Just like input, io.open can be used to transparently convert Unicodes to encoded byte strings.

Database

The same configuration for reading will allow Unicodes to be written directly.

Python 3

Python 3 is no more Unicode capable than Python 2.x is; however, it is slightly less confused on the topic. E.g., the regular str is now a Unicode string and the old str is now bytes.

The default encoding is UTF-8, so if you .decode() a byte string without giving an encoding, Python 3 uses UTF-8 encoding. This probably fixes 50% of people’s Unicode problems.

Further, open() operates in text mode by default, so returns decoded str (Unicode ones). The encoding is derived from your locale, which tends to be UTF-8 on Un*x systems or an 8-bit code page, such as windows-1251, on Windows boxes.
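A short Python 3 sketch of these defaults follows. The variable names are illustrative; the encoding reported by locale.getpreferredencoding(False) depends on the machine, so its value is not asserted here.

```python
import locale

raw = b"Z\xc3\xbcrich"             # UTF-8 bytes for 'Zürich'

text = raw.decode()                # no argument: UTF-8 is assumed
same_text = raw.decode("utf-8")    # the explicit spelling of the same call
round_trip = text.encode()         # str.encode() also defaults to UTF-8

# The locale-derived encoding that open() traditionally falls back to in
# text mode when no encoding argument is given (machine dependent).
open_default = locale.getpreferredencoding(False)
```

Because that last value varies between systems, passing encoding= to open() explicitly keeps a script portable.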

Why you shouldn’t use `sys.setdefaultencoding('utf8')`

It’s a nasty hack (there’s a reason you have to use reload) that will only mask problems and hinder your migration to Python 3.x. Understand the problem, fix the root cause and enjoy Unicode zen. See Why should we NOT use sys.setdefaultencoding(“utf-8”) in a py script? for further details.

Answer 1

Finally I got it:

as3:/usr/local/lib/python2.7/site-packages# cat sitecustomize.py
# encoding=utf8  
import sys  

reload(sys)  
sys.setdefaultencoding('utf8')

Let me check:

as3:~/ngokevin-site# python
Python 2.7.6 (default, Dec  6 2013, 14:49:02)
[GCC 4.4.5] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.getdefaultencoding()
'utf8'
>>>

The above shows that Python’s default encoding is now utf8, and the error no longer occurs.

Answer 2

This is the classic “unicode issue”. I believe a complete explanation of what is happening is beyond the scope of a StackOverflow answer.

It is well explained here.

In very brief summary, you have passed something that is being interpreted as a string of bytes to something that needs to decode it into Unicode characters, but the default codec (ascii) is failing.

The presentation I pointed you to provides advice for avoiding this. Make your code a “unicode sandwich”. In Python 2, the use of from __future__ import unicode_literals helps.

Update: how can the code be fixed:

OK – in your variable “source” you have some bytes. It is not clear from your question how they got in there – maybe you read them from a web form? In any case, they are not encoded with ascii, but python is trying to convert them to unicode assuming that they are. You need to explicitly tell it what the encoding is. This means that you need to know what the encoding is! That is not always easy, and it depends entirely on where this string came from. You could experiment with some common encodings – for example UTF-8. You tell unicode() the encoding as a second parameter:

source = unicode(source, 'utf-8')
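Experimenting with a few common encodings can be sketched as a loop over candidates. This is a Python 3 illustration; the function name and the candidate list are assumptions, and a decode that succeeds is not proof that the guess was right.

```python
def decode_with_candidates(raw, candidates=("utf-8", "gbk")):
    """Try each candidate encoding in order; return (text, encoding_used)."""
    for encoding in candidates:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue  # this guess was wrong, try the next one
    raise ValueError("none of the candidate encodings could decode the data")


utf8_result = decode_with_candidates("\u4e2d\u6587".encode("utf-8"))
gbk_result = decode_with_candidates("\u4e2d\u6587".encode("gbk"))
```

Order matters: strict codecs such as UTF-8 should come first, because single-byte codecs like latin-1 accept any byte sequence and would hide a wrong guess.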

Answer 3

In some cases, when you check your default encoding (print sys.getdefaultencoding()), it returns that you are using ASCII. If you change to UTF-8, it doesn’t work, depending on the content of your variable. I found another way:

import sys
reload(sys)  
sys.setdefaultencoding('Cp1252')

Answer 4

I was searching to solve the following error message:

unicodedecodeerror: ‘ascii’ codec can’t decode byte 0xe2 in position 5454: ordinal not in range(128)

I finally got it fixed by specifying ‘encoding’:

f = open('../glove/glove.6B.100d.txt', encoding="utf-8")

I hope it helps you too.

Answer 5

"UnicodeDecodeError: 'ascii' codec can't decode byte"

Cause of this error: input_string must be unicode but str was given

"TypeError: Decoding Unicode is not supported"

Cause of this error: trying to convert unicode input_string into unicode

So first check that your input_string is str and convert to unicode if necessary:

if isinstance(input_string, str):
    input_string = unicode(input_string, 'utf-8')

Secondly, the above just changes the type but does not remove non ascii characters. If you want to remove non-ascii characters:

if isinstance(input_string, str):
    # note: this removes the character and encodes back to string.
    input_string = input_string.decode('ascii', 'ignore').encode('ascii')
elif isinstance(input_string, unicode):
    input_string = input_string.encode('ascii', 'ignore')
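For comparison, the same two steps might look like this in Python 3, where the type check is against bytes rather than str. The helper names are hypothetical and UTF-8 is an assumed default.

```python
def to_text(value, encoding="utf-8"):
    """Return text: decode bytes, pass existing text through unchanged."""
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


def strip_non_ascii(value, encoding="utf-8"):
    """Drop every character that cannot be represented in ASCII."""
    text = to_text(value, encoding)
    return text.encode("ascii", "ignore").decode("ascii")


from_bytes = strip_non_ascii(b"Z\xc3\xbcrich")
from_text = strip_non_ascii("Z\u00fcrich")
```

Note that stripping is lossy; it is only appropriate when the non-ASCII characters really are unwanted.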

Answer 6

I find the best is to always convert to unicode – but this is difficult to achieve because in practice you’d have to check and convert every argument to every function and method you ever write that includes some form of string processing.

So I came up with the following approach to either guarantee unicodes or byte strings, from either input. In short, include and use the following lambdas:

# guarantee unicode string
_u = lambda t: t.decode('UTF-8', 'replace') if isinstance(t, str) else t
_uu = lambda *tt: tuple(_u(t) for t in tt) 
# guarantee byte string in UTF8 encoding
_u8 = lambda t: t.encode('UTF-8', 'replace') if isinstance(t, unicode) else t
_uu8 = lambda *tt: tuple(_u8(t) for t in tt)

Examples:

text='Some string with codes > 127, like Zürich'
utext=u'Some string with codes > 127, like Zürich'
print "==> with _u, _uu"
print _u(text), type(_u(text))
print _u(utext), type(_u(utext))
print _uu(text, utext), type(_uu(text, utext))
print "==> with u8, uu8"
print _u8(text), type(_u8(text))
print _u8(utext), type(_u8(utext))
print _uu8(text, utext), type(_uu8(text, utext))
# with % formatting, always use _u() and _uu()
print "Some unknown input %s" % _u(text)
print "Multiple inputs %s, %s" % _uu(text, text)
# but with string.format be sure to always work with unicode strings
print u"Also works with formats: {}".format(_u(text))
print u"Also works with formats: {},{}".format(*_uu(text, text))
# ... or use _u8 and _uu8, because string.format expects byte strings
print "Also works with formats: {}".format(_u8(text))
print "Also works with formats: {},{}".format(*_uu8(text, text))

Here’s some more reasoning about this.

Answer 7

To resolve this at the operating system level on an Ubuntu installation, check the following:

$ locale charmap

If you get

locale: Cannot set LC_CTYPE to default locale: No such file or directory

instead of

UTF-8

then set LC_CTYPE and LC_ALL like this:

$ export LC_ALL="en_US.UTF-8"
$ export LC_CTYPE="en_US.UTF-8"
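After changing the locale, it can help to check what Python itself picks up. The sketch below only reads standard library values; the function name is hypothetical and the reported encodings depend on the environment.

```python
import locale
import sys


def describe_text_encodings():
    """Collect the encodings Python derived from the current environment."""
    return {
        "preferred": locale.getpreferredencoding(False),
        "stdout": getattr(sys.stdout, "encoding", None),
        "filesystem": sys.getfilesystemencoding(),
    }


info = describe_text_encodings()
```

On a correctly configured UTF-8 locale, the values would typically all name UTF-8.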

Answer 8

Encode converts a unicode object into a string object. I think you are trying to encode a string object. First convert your result into a unicode object and then encode that unicode object into ‘utf-8’. For example:

    result = yourFunction()
    result.decode().encode('utf-8')

Answer 9

I had the same problem but it didn’t work for Python 3. I followed this and it solved my problem:

import sys

enc = sys.getdefaultencoding()
file = open(menu, "r", encoding=enc)  # menu holds the path of the file to read

You have to set the encoding when you are reading/writing the file.

Answer 10

I got the same error and this solved it. Thanks! Python 2 and Python 3 handle Unicode differently, which makes pickled files incompatible to load across versions, so use the encoding argument of Python’s pickle. The link below helped me solve a similar problem when I was trying to open pickled data from Python 3.7, while my file was originally saved with a Python 2.x version. https://blog.modest-destiny.com/posts/python-2-and-3-compatible-pickle-save-and-load/ I copied the load_pickle function into my script and called load_pickle(pickle_file) while loading my input_data like this:

input_data = load_pickle("my_dataset.pkl")

The load_pickle function is here:

import pickle

def load_pickle(pickle_file):
    try:
        with open(pickle_file, 'rb') as f:
            pickle_data = pickle.load(f)
    except UnicodeDecodeError as e:
        with open(pickle_file, 'rb') as f:
            pickle_data = pickle.load(f, encoding='latin1')
    except Exception as e:
        print('Unable to load data ', pickle_file, ':', e)
        raise
    return pickle_data
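The failure and the fix can be reproduced in Python 3 without a Python 2 interpreter. The sketch below uses a hand-written protocol 0 pickle that stands in for what Python 2 would produce for a byte string with a non-ASCII byte; that hand-built payload is an assumption made for the illustration.

```python
import pickle

# Hand-written protocol 0 pickle of the Python 2 byte string 'caf\xe9'
# (an illustrative stand-in for a file saved by Python 2).
PY2_PICKLE = b"S'caf\\xe9'\np0\n."

# Default loading decodes Python 2 str objects as ASCII, which fails here.
try:
    pickle.loads(PY2_PICKLE)
    default_failed = False
except UnicodeDecodeError:
    default_failed = True

# latin1 maps every byte to a code point, so the load always succeeds.
as_text = pickle.loads(PY2_PICKLE, encoding="latin1")

# encoding="bytes" keeps the original bytes instead of decoding them.
as_bytes = pickle.loads(PY2_PICKLE, encoding="bytes")
```

Using latin1 avoids the exception, but the resulting text is only correct if the data really was Latin-1; otherwise encoding="bytes" followed by an explicit decode is the safer route.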

Answer 11

This worked for me:

    file = open('docs/my_messy_doc.pdf', 'rb')

Answer 12

In short, to ensure proper unicode handling in Python 2:

use io.open for reading/writing files
use from __future__ import unicode_literals
configure other data inputs/outputs (e.g., databases, network) to use unicode
if you cannot configure outputs to utf-8, convert your output for them: print(text.encode('ascii', 'replace').decode())

For explanations, see @Alastair McCormack’s detailed answer.

Answer 13

I had the same error with URLs containing non-ASCII chars (bytes with values > 127). My solution:

url = url.decode('utf8').encode('utf-8')

Note: utf-8 and utf8 are simply aliases. Using only ‘utf8’ or ‘utf-8’ should work in the same way.

This worked for me in Python 2.7. I suppose this assignment changed ‘something’ in the str internal representation, i.e., it forces the right decoding of the underlying byte sequence in url and finally puts the string into a utf-8 str with all the magic in the right place. Unicode in Python is black magic to me. Hope this is useful.
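To take some of the magic out of it: on bytes that are already valid UTF-8, a decode followed by an encode with the same codec returns identical bytes, so the round trip acts as a validation step rather than a conversion. The Python 3 sketch below illustrates this with a made-up URL.

```python
# Hypothetical URL, stored as UTF-8 bytes.
raw_url = "http://example.com/caf\u00e9".encode("utf-8")

# Round trip through text with the same codec: the bytes come back unchanged.
round_tripped = raw_url.decode("utf8").encode("utf-8")

# Bytes that are not valid UTF-8 make the decode step fail instead.
try:
    b"caf\xe9".decode("utf8")
    invalid_rejected = False
except UnicodeDecodeError:
    invalid_rejected = True
```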

Answer 14

I got the same problem with the string “PastelerÃa Mallorca” and I solved it with:

unicode("PastelerÃa Mallorca", 'latin-1')

Answer 15

In a Django (1.9.10)/Python 2.7.5 project I have frequent UnicodeDecodeError exceptions, mainly when I try to feed unicode strings to logging. I made a helper function that formats arbitrary objects to 8-bit ASCII strings, replacing any characters not in the table with ‘?’. I think it’s not the best solution, but since the default encoding is ascii (and I don’t want to change it) it will do:

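from collections import Iterable  # Python 2 location; the original snippet omitted this import

def encode_for_logging(c, encoding='ascii'):
    if isinstance(c, basestring):
        return c.encode(encoding, 'replace')
    elif isinstance(c, Iterable):
        c_ = []
        for v in c:
            c_.append(encode_for_logging(v, encoding))
        return c_
    else:
        # pass the encoding through so non-string objects honour it too
        return encode_for_logging(unicode(c), encoding)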

Answer 16

This error occurs when there are some non-ASCII characters in our string and we perform any operation on that string without proper decoding. This helped me solve my problem. I am reading a CSV file with the columns ID and Text, and decoding the characters in it as below:

train_df = pd.read_csv("Example.csv")
train_data = train_df.values
for i in train_data:
    print("ID :" + i[0])
    text = i[1].decode("utf-8",errors="ignore").strip().lower()
    print("Text: " + text)

Answer 17

Here is my solution: just add the encoding. with open(file, encoding='utf8') as f

And because reading the GloVe file takes a long time, I recommend converting the GloVe file to a numpy file. The next time you read the embedding weights, it will save you time.

import numpy as np
from tqdm import tqdm


def load_glove(file):
    """Loads GloVe vectors in numpy array.
    Args:
        file (str): a path to a glove file.
    Return:
        dict: a dict of numpy arrays.
    """
    embeddings_index = {}
    with open(file, encoding='utf8') as f:
        for i, line in tqdm(enumerate(f)):
            values = line.split()
            word = ''.join(values[:-300])
            coefs = np.asarray(values[-300:], dtype='float32')
            embeddings_index[word] = coefs

    return embeddings_index

# EMBEDDING_PATH = '../embedding_weights/glove.840B.300d.txt'
EMBEDDING_PATH = 'glove.840B.300d.txt'
embeddings = load_glove(EMBEDDING_PATH)

np.save('glove_embeddings.npy', embeddings)

Gist link: https://gist.github.com/BrambleXu/634a844cdd3cd04bb2e3ba3c83aef227

Answer 18

Specify # encoding=utf-8 at the top of your Python file. It should fix the issue.

Knowledge Q&A

How to count the NaN values in a column in pandas DataFrame

July 25, 2021 · Python实用宝典

Question: How to count the NaN values in a column in pandas DataFrame

I have data in which I want to find the number of NaN values, so that if it is less than some threshold, I will drop the column. I looked but wasn’t able to find any function for this. There is value_counts, but it would be slow for me, because most of the values are distinct and I only want the count of NaN.

Answer 0

You can use the isna() method (or its alias isnull(), which is also compatible with older pandas versions < 0.21.0) and then sum to count the NaN values. For one column:

In [1]: s = pd.Series([1,2,3, np.nan, np.nan])

In [4]: s.isna().sum()   # or s.isnull().sum() for older pandas versions
Out[4]: 2

For several columns, it also works:

In [5]: df = pd.DataFrame({'a':[1,2,np.nan], 'b':[np.nan,1,np.nan]})

In [6]: df.isna().sum()
Out[6]:
a    1
b    2
dtype: int64
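
To connect this back to the goal in the question, the per-column counts can drive the decision about which columns to drop. The sketch below assumes you want to drop columns whose NaN count exceeds a threshold; the DataFrame contents and the max_nan value are made up for illustration, and the comparison can be flipped if the opposite rule is wanted.

```python
import numpy as np
import pandas as pd

# Illustrative data: column "b" is mostly NaN
df = pd.DataFrame({
    "a": [1, 2, np.nan, 4],
    "b": [np.nan, np.nan, np.nan, 1],
    "c": [1, 2, 3, 4],
})

nan_counts = df.isna().sum()   # NaN count per column
max_nan = 2                    # hypothetical threshold

# Columns with more NaN values than the threshold allows
cols_to_drop = nan_counts[nan_counts > max_nan].index
cleaned = df.drop(columns=cols_to_drop)
print(cleaned.columns.tolist())
```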

Answer 1

You could subtract the count of non-NaN values from the total length:

count_nan = len(df) - df.count()

You should time it on your data. For a small Series I got a 3x speed-up compared with the isnull solution.
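
As a quick sanity check, this subtraction gives the same per-column result as isna().sum(). A small sketch with made-up data:

```python
import numpy as np
import pandas as pd

# Illustrative data mixing numeric and object columns
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, np.nan],
    "b": ["x", None, "z", "w"],
})

# count() returns the number of non-NA entries per column
count_nan = len(df) - df.count()
via_isna = df.isna().sum()
print(count_nan.to_dict())
```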

Answer 2

Let's assume df is a pandas DataFrame.

Then,

df.isnull().sum(axis = 0)

This gives the number of NaN values in every column.

If you need the number of NaN values in every row,

df.isnull().sum(axis = 1)

Answer 3

Based on the most voted answer we can easily define a function that gives us a dataframe to preview the missing values and the % of missing values in each column:

def missing_values_table(df):
    # Count and percentage of missing values per column
    mis_val = df.isnull().sum()
    mis_val_percent = 100 * mis_val / len(df)
    mis_val_table = pd.concat([mis_val, mis_val_percent], axis=1)
    mis_val_table = mis_val_table.rename(
        columns={0: 'Missing Values', 1: '% of Total Values'})
    # Keep only the columns that have missing values, largest share first
    mis_val_table = mis_val_table[
        mis_val_table.iloc[:, 1] != 0].sort_values(
        '% of Total Values', ascending=False).round(1)
    print("Your selected dataframe has " + str(df.shape[1]) + " columns.\n"
          "There are " + str(mis_val_table.shape[0]) +
          " columns that have missing values.")
    return mis_val_table

Answer 4

Since pandas 0.14.1 my suggestion here to have a keyword argument in the value_counts method has been implemented:

import numpy as np
import pandas as pd
df = pd.DataFrame({'a':[1,2,np.nan], 'b':[np.nan,1,np.nan]})
for col in df:
    print(df[col].value_counts(dropna=False))

2     1
 1     1
NaN    1
dtype: int64
NaN    2
 1     1
dtype: int64

Answer 5

If you just need to count the NaN values in a pandas column, here is a quick way:

import pandas as pd
## df1 as an example data frame 
## col1 name of column for which you want to calculate the nan values
sum(pd.isnull(df1['col1']))

Answer 6

If you are using Jupyter Notebook, how about:

 %%timeit
 df.isnull().any().any()

or

 %%timeit
 df.isnull().values.sum()

Or, to find out whether there are any NaNs in the data and, if so, in which columns:

 df.isnull().any()

Answer 7

The following prints the NaN count of every column in descending order.

df.isnull().sum().sort_values(ascending = False)

The following prints the 15 columns with the most NaN values, in descending order.

df.isnull().sum().sort_values(ascending = False).head(15)

Answer 8

import numpy as np
import pandas as pd

raw_data = {'first_name': ['Jason', np.nan, 'Tina', 'Jake', 'Amy'], 
        'last_name': ['Miller', np.nan, np.nan, 'Milner', 'Cooze'], 
        'age': [22, np.nan, 23, 24, 25], 
        'sex': ['m', np.nan, 'f', 'm', 'f'], 
        'Test1_Score': [4, np.nan, 0, 0, 0],
        'Test2_Score': [25, np.nan, np.nan, 0, 0]}
results = pd.DataFrame(raw_data, columns = ['first_name', 'last_name', 'age', 'sex', 'Test1_Score', 'Test2_Score'])

results 
'''
  first_name last_name   age  sex  Test1_Score  Test2_Score
0      Jason    Miller  22.0    m          4.0         25.0
1        NaN       NaN   NaN  NaN          NaN          NaN
2       Tina       NaN  23.0    f          0.0          NaN
3       Jake    Milner  24.0    m          0.0          0.0
4        Amy     Cooze  25.0    f          0.0          0.0
'''

You can use the following function, which returns its output as a DataFrame with these columns:

Zero Values
Missing Values
% of Total Values
Total Zero Missing Values
% Total Zero Missing Values
Data Type

Just copy and paste the following function and call it by passing your pandas DataFrame.

def missing_zero_values_table(df):
        zero_val = (df == 0.00).astype(int).sum(axis=0)
        mis_val = df.isnull().sum()
        mis_val_percent = 100 * df.isnull().sum() / len(df)
        mz_table = pd.concat([zero_val, mis_val, mis_val_percent], axis=1)
        mz_table = mz_table.rename(
        columns = {0 : 'Zero Values', 1 : 'Missing Values', 2 : '% of Total Values'})
        mz_table['Total Zero Missing Values'] = mz_table['Zero Values'] + mz_table['Missing Values']
        mz_table['% Total Zero Missing Values'] = 100 * mz_table['Total Zero Missing Values'] / len(df)
        mz_table['Data Type'] = df.dtypes
        mz_table = mz_table[
            mz_table.iloc[:,1] != 0].sort_values(
        '% of Total Values', ascending=False).round(1)
        print ("Your selected dataframe has " + str(df.shape[1]) + " columns and " + str(df.shape[0]) + " Rows.\n"      
            "There are " + str(mz_table.shape[0]) +
              " columns that have missing values.")
#         mz_table.to_excel('D:/sampledata/missing_and_zero_values.xlsx', freeze_panes=(1,0), index = False)
        return mz_table

missing_zero_values_table(results)

Output

Your selected dataframe has 6 columns and 5 Rows.
There are 6 columns that have missing values.

             Zero Values  Missing Values  % of Total Values  Total Zero Missing Values  % Total Zero Missing Values Data Type
last_name              0               2               40.0                          2                         40.0    object
Test2_Score            2               2               40.0                          4                         80.0   float64
first_name             0               1               20.0                          1                         20.0    object
age                    0               1               20.0                          1                         20.0   float64
sex                    0               1               20.0                          1                         20.0    object
Test1_Score            3               1               20.0                          4                         80.0   float64

If you want to keep it simple, you can use the following function to get the percentage of missing values per column:

def missing(dff):
    print (round((dff.isnull().sum() * 100/ len(dff)),2).sort_values(ascending=False))


missing(results)
'''
Test2_Score    40.0
last_name      40.0
Test1_Score    20.0
sex            20.0
age            20.0
first_name     20.0
dtype: float64
'''

Answer 9

To count zeroes:

df[df == 0].count(axis=0)

To count NaN:

df.isnull().sum()

df.isna().sum()

Answer 10

You can use the value_counts method and look up the count for np.nan:

s.value_counts(dropna = False)[np.nan]

Answer 11

Please use below for particular column count

dataframe.columnName.isnull().sum()

Answer 12

df1.isnull().sum()

This will do the trick.

Answer 13

Here is the code for counting Null values column wise :

df.isna().sum()

Answer 14

There is a nice Dzone article from July 2017 which details various ways of summarising NaN values. Check it out here.

The article I have cited provides additional value by: (1) Showing a way to count and display NaN counts for every column so that one can easily decide whether or not to discard those columns and (2) Demonstrating a way to select those rows in specific which have NaNs so that they may be selectively discarded or imputed.

Here’s a quick example to demonstrate the utility of the approach – with only a few columns perhaps its usefulness is not obvious but I found it to be of help for larger data-frames.

import pandas as pd
import numpy as np

# example DataFrame
df = pd.DataFrame({'a':[1,2,np.nan], 'b':[np.nan,1,np.nan]})

# Check whether there are null values in columns
null_columns = df.columns[df.isnull().any()]
print(df[null_columns].isnull().sum())

# One can follow along further per the cited article

Answer 15

Another simple option not suggested yet, to just count NaNs, is to select the rows where the column is NaN and read the row count from the shape:

df[df['col_name'].isnull()]['col_name'].shape[0]

Answer 16

df.isnull().sum() will give the column-wise sum of missing values.

If you want to know the sum of missing values in a particular column then following code will work df.column.isnull().sum()

Answer 17

Based on the answers given, with some improvements, this is my approach:

def PercentageMissin(Dataset):
    """Return the percentage of missing values in each column of a DataFrame."""
    if isinstance(Dataset,pd.DataFrame):
        adict={} # maps each column name to the percentage of missing values in that column
        for col in Dataset.columns:
            adict[col]=(np.count_nonzero(Dataset[col].isnull())*100)/len(Dataset[col])
        return pd.DataFrame(adict,index=['% of missing'],columns=adict.keys())
    else:
        raise TypeError("can only be used with a pandas DataFrame")

Answer 18

In case you need to get the non-NA (non-None) and NA (None) counts across different groups pulled out by groupby:

gdf = df.groupby(['ColumnToGroupBy'])

def countna(x):
    return (x.isna()).sum()

gdf.agg(['count', countna, 'size'])

This returns the counts of non-NA, NA and total number of entries per group.
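
A small worked sketch of the same idea may help. The DataFrame, the column names "group" and "value", and the selection of a single value column are assumptions made for illustration; the aggregation list follows the answer above.

```python
import numpy as np
import pandas as pd

# Illustrative data: two groups with some missing values
df = pd.DataFrame({
    "group": ["x", "x", "y", "y", "y"],
    "value": [1.0, np.nan, np.nan, np.nan, 3.0],
})

def countna(x):
    # Number of NA entries in the group's Series
    return x.isna().sum()

# 'count' = non-NA entries, countna = NA entries, 'size' = all entries
summary = df.groupby("group")["value"].agg(["count", countna, "size"])
print(summary)
```

For each group, the count and countna columns add up to size.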

Answer 19

I used the solution proposed by @sushmit in my code.

A possible variation of it is:

colNullCnt = []
for col in df1.columns:
    colNullCnt.append([col, sum(pd.isnull(df1[col]))])

The advantage is that it returns the result for every column in the DataFrame.

Answer 20

import pandas as pd
import numpy as np

# example DataFrame
df = pd.DataFrame({'a':[1,2,np.nan], 'b':[np.nan,1,np.nan]})

# count the NaNs in a column
num_nan_a = df.loc[ (pd.isna(df['a'])) , 'a' ].shape[0]
num_nan_b = df.loc[ (pd.isna(df['b'])) , 'b' ].shape[0]

# summarize the num_nan_b
print(df)
print(' ')
print(f"There are {num_nan_a} NaNs in column a")
print(f"There are {num_nan_b} NaNs in column b")

Gives as output:

     a    b
0  1.0  NaN
1  2.0  1.0
2  NaN  NaN

There are 1 NaNs in column a
There are 2 NaNs in column b

Answer 21

Suppose you want to get the number of missing values(NaN) in a column(series) known as price in a dataframe called reviews

#import the dataframe
import pandas as pd

reviews = pd.read_csv("../input/wine-reviews/winemag-data-130k-v2.csv", index_col=0)

To get the number of missing values, with n_missing_prices as the variable, simply do

n_missing_prices = sum(reviews.price.isnull())
print(n_missing_prices)

sum is the key here; I was trying to use count before I realized that sum is the right function to use in this context.

Answer 22

https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.count.html#pandas.Series.count

pandas.Series.count
Series.count(level=None)[source]

Return number of non-NA/null observations in the Series

Answer 23

For your task you can use pandas.DataFrame.dropna (https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.dropna.html):

import pandas as pd
import numpy as np

df = pd.DataFrame({'a': [1, 2, 3, 4, np.nan],
                   'b': [1, 2, np.nan, 4, np.nan],
                   'c': [np.nan, 2, np.nan, 4, np.nan]})
df = df.dropna(axis='columns', thresh=3)

print(df)

With the thresh parameter you set the minimum number of non-NaN values a column must have to be kept; here, columns with fewer than 3 non-NaN values are dropped.

Code outputs:

     a    b
0  1.0  1.0
1  2.0  2.0
2  3.0  NaN
3  4.0  4.0
4  NaN  NaN
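
If you would rather think in terms of the maximum number of NaN values you are willing to tolerate, that number can be converted into a thresh value by subtracting it from the row count. A minimal sketch using the same data; the name max_nan is a hypothetical helper variable, not a pandas parameter:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, np.nan],
                   'b': [1, 2, np.nan, 4, np.nan],
                   'c': [np.nan, 2, np.nan, 4, np.nan]})

max_nan = 2  # hypothetical limit: keep columns with at most 2 NaN values

# thresh counts required non-NaN values, so convert the NaN limit
result = df.dropna(axis='columns', thresh=len(df) - max_nan)
print(result.columns.tolist())
```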

Knowledge Q&A

How to remove a character from a string using Python

July 25, 2021, Python实用宝典

Question: How to remove a character from a string using Python

There is a string, for example, EXAMPLE.

How can I remove the middle character, i.e., M from it? I don’t need the code. I want to know:

Do strings in Python end in any special character?
Which is a better way – shifting everything right to left starting from the middle character OR creation of a new string and not copying the middle character?

Answer 0

In Python, strings are immutable, so you have to create a new string. You have a few options of how to create the new string. If you want to remove the ‘M’ wherever it appears:

newstr = oldstr.replace("M", "")

If you want to remove the central character:

midlen = len(oldstr) // 2   # integer division; in Python 2, len(oldstr)/2 also works
newstr = oldstr[:midlen] + oldstr[midlen+1:]

You asked if strings end with a special character. No, you are thinking like a C programmer. In Python, strings are stored with their length, so any byte value, including \0, can appear in a string.
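
A short Python 3 sketch of both points: removing the central character by building a new string, and a NUL character sitting inside a string without ending it. The function name remove_middle and the sample strings are made up for illustration.

```python
def remove_middle(s):
    """Return a new string without the character at index len(s) // 2."""
    mid = len(s) // 2
    return s[:mid] + s[mid + 1:]

print(remove_middle("EXAMPLE"))

# A NUL character is ordinary data; it does not terminate the string
with_nul = "EXA\0PLE"
print(len(with_nul))
```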

Answer 1

This is probably the best way:

original = "EXAMPLE"
removed = original.replace("M", "")

Don’t worry about shifting characters and such. Most Python code takes place on a much higher level of abstraction.

Answer 2

To remove the character at a specific position:

s = s[:pos] + s[(pos+1):]

To remove a specific character:

s = s.replace('M','')

Answer 3

Strings are immutable. But you can convert them to a list, which is mutable, and then convert the list back to a string after you’ve changed it.

s = "this is a string"

l = list(s)  # convert to list

l[1] = ""    # "delete" letter h (the item actually still exists but is empty)
l[1:2] = []  # really delete letter h (the item is actually removed from the list)
del(l[1])    # another way to delete it

p = l.index("a")  # find position of the letter "a"
del(l[p])         # delete it

s = "".join(l)  # convert back to string

You can also create a new string, as others have shown, by taking everything except the character you want from the existing string.

Answer 4

How can I remove the middle character, i.e., M from it?

You can’t, because strings in Python are immutable.

Do strings in Python end in any special character?

No. They are similar to lists of characters; the length of the list defines the length of the string, and no character acts as a terminator.

Which is a better way – shifting everything right to left starting from the middle character OR creation of a new string and not copying the middle character?

You cannot modify the existing string, so you must create a new one containing everything except the middle character.
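
The immutability described here is easy to observe directly. In this small sketch, an attempt to assign to one position of the string raises TypeError, while slicing builds a new string and leaves the original untouched:

```python
s = "EXAMPLE"

# In-place modification is rejected because str is immutable
try:
    s[3] = ""
except TypeError as exc:
    error = str(exc)

# Build a new string from the parts before and after index 3
new_s = s[:3] + s[4:]
print(new_s)
```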

Answer 5

Use the translate() method (the two-argument form below is Python 2 syntax):

>>> s = 'EXAMPLE'
>>> s.translate(None, 'M')
'EXAPLE'
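
In Python 3, str.translate takes a single mapping argument, so the call looks a little different. A minimal sketch of two equivalent ways to build that mapping:

```python
s = "EXAMPLE"

# The third argument of str.maketrans lists characters to delete
table = str.maketrans("", "", "M")
result = s.translate(table)

# Equivalent: map the code point of 'M' to None by hand
also = s.translate({ord("M"): None})
print(result, also)
```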

Answer 6

UserString.MutableString

Mutable way (Python 2 only; UserString.MutableString was removed in Python 3):

import UserString

s = UserString.MutableString("EXAMPLE")

>>> type(s)
<type 'str'>

# Delete 'M'
del s[3]

# Turn it for immutable:
s = str(s)

Answer 7

card = random.choice(cards)
cardsLeft = cards.replace(card, '', 1)

How to remove one character from a string: Here is an example where there is a stack of cards represented as characters in a string. One of them is drawn (import random module for the random.choice() function, that picks a random character in the string). A new string, cardsLeft, is created to hold the remaining cards given by the string function replace() where the last parameter indicates that only one “card” is to be replaced by the empty string…

Answer 8

def kill_char(string, n): # n = position of which character you want to remove
    begin = string[:n]    # from beginning to n (n not included)
    end = string[n+1:]    # n+1 through end of string
    return begin + end
print(kill_char("EXAMPLE", 3))  # "M" removed

I have seen this somewhere here.

Answer 9

Here’s what I did to slice out the “M”:

s = 'EXAMPLE'
s1 = s[:s.index('M')] + s[s.index('M')+1:]

Answer 10

If you want to delete/ignore characters in a string, and, for instance, you have this string,

“[11:L:0]”

from a web API response or something like that, like a CSV file, let’s say you are using requests

import requests
udid = 123456
url = 'http://webservices.yourserver.com/action/id-' + str(udid)
s = requests.Session()
s.verify = False  # disables TLS certificate verification, as in the original answer
resp = s.get(url, stream=True)
content = resp.content

Loop and get rid of the unwanted characters:

for line in resp.iter_lines():
  line = line.decode("utf-8")  # iter_lines() yields bytes; assumes a UTF-8 response
  line = line.replace("[", "")
  line = line.replace("]", "")
  line = line.replace('"', "")

Optionally split the line so that you can read the values individually:

listofvalues = line.split(':')

Now accessing each value is easier:

print(listofvalues[0])
print(listofvalues[1])
print(listofvalues[2])

This will print

11

L

0

Answer 11

To delete a char or a sub-string once (only the first occurrence):

main_string = main_string.replace(sub_str, replace_with, 1)

NOTE: Here 1 can be replaced with any int for the number of occurrence you want to replace.
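
A quick illustration of how the count argument changes the result; the sample word is made up:

```python
text = "banana"

first_only = text.replace("a", "", 1)   # remove only the first 'a'
first_two = text.replace("a", "", 2)    # remove the first two
all_removed = text.replace("a", "")     # no count: remove every 'a'

print(first_only, first_two, all_removed)
```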

Answer 12

You can simply use list comprehension.

Assume that you have the string "my name is" and you want to remove the character m. Use the following code:

"".join([x for x in "my name is" if x != 'm'])

Answer 13

from random import randint


def shuffle_word(word):
    newWord=""
    for i in range(0,len(word)):
        pos=randint(0,len(word)-1)
        newWord += word[pos]
        word = word[:pos]+word[pos+1:]
    return newWord

word = "Sarajevo"
print(shuffle_word(word))

Answer 14

Another way is with a function,

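Below is a way to remove all vowels from a string just by calling the function (Python 3 syntax; in Python 2 the equivalent call is s.translate(None, "aeiouAEIOU")):

def disemvowel(s):
    return s.translate(str.maketrans("", "", "aeiouAEIOU"))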

Answer 15

Strings are immutable in Python so both your options mean the same thing basically.

Knowledge Q&A

How to replace all NaN values with zeros in a column of a pandas DataFrame

July 25, 2021, Python实用宝典

Question: How to replace all NaN values with zeros in a column of a pandas DataFrame

I have a dataframe as below

      itm Date                  Amount 
67    420 2012-09-30 00:00:00   65211
68    421 2012-09-09 00:00:00   29424
69    421 2012-09-16 00:00:00   29877
70    421 2012-09-23 00:00:00   30990
71    421 2012-09-30 00:00:00   61303
72    485 2012-09-09 00:00:00   71781
73    485 2012-09-16 00:00:00     NaN
74    485 2012-09-23 00:00:00   11072
75    485 2012-09-30 00:00:00  113702
76    489 2012-09-09 00:00:00   64731
77    489 2012-09-16 00:00:00     NaN

when I try to .apply a function to the Amount column I get the following error.

ValueError: cannot convert float NaN to integer

I have tried applying a function using .isnan from the math module, the pandas .replace attribute, and the .sparse data attribute from pandas 0.9. I have also tried an if NaN == NaN statement in a function. I have also looked at the article "How do I replace NA values with zeros in an R dataframe?" along with some other articles. None of the methods I have tried worked or recognised NaN. Any hints or solutions would be appreciated.

Answer 0

I believe DataFrame.fillna() will do this for you.

Link to Docs for a dataframe and for a Series.

Example:

In [7]: df
Out[7]: 
          0         1
0       NaN       NaN
1 -0.494375  0.570994
2       NaN       NaN
3  1.876360 -0.229738
4       NaN       NaN

In [8]: df.fillna(0)
Out[8]: 
          0         1
0  0.000000  0.000000
1 -0.494375  0.570994
2  0.000000  0.000000
3  1.876360 -0.229738
4  0.000000  0.000000

To fill the NaNs in only one column, select just that column. In this case I'm using inplace=True to actually change the contents of df.

In [12]: df[1].fillna(0, inplace=True)
Out[12]: 
0    0.000000
1    0.570994
2    0.000000
3   -0.229738
4    0.000000
Name: 1

In [13]: df
Out[13]: 
          0         1
0       NaN  0.000000
1 -0.494375  0.570994
2       NaN  0.000000
3  1.876360 -0.229738
4       NaN  0.000000

EDIT:

To avoid a SettingWithCopyWarning, use the built-in column-specific functionality:

df.fillna({1:0}, inplace=True)

Answer 1

It is not guaranteed that the slicing returns a view or a copy. You can do

df['column'] = df['column'].fillna(value)

Answer 2

You could use replace to change NaN to 0:

import pandas as pd
import numpy as np

# for column
df['column'] = df['column'].replace(np.nan, 0)

# for whole dataframe
df = df.replace(np.nan, 0)

# inplace
df.replace(np.nan, 0, inplace=True)

Answer 3

I just wanted to provide a bit of an update/special case since it looks like people still come here. If you’re using a multi-index or otherwise using an index-slicer the inplace=True option may not be enough to update the slice you’ve chosen. For example in a 2×2 level multi-index this will not change any values (as of pandas 0.15):

idx = pd.IndexSlice
df.loc[idx[:,mask_1],idx[mask_2,:]].fillna(value=0,inplace=True)

The “problem” is that the chaining breaks the fillna ability to update the original dataframe. I put “problem” in quotes because there are good reasons for the design decisions that led to not interpreting through these chains in certain situations. Also, this is a complex example (though I really ran into it), but the same may apply to fewer levels of indexes depending on how you slice.

The solution is DataFrame.update:

df.update(df.loc[idx[:,mask_1],idx[[mask_2],:]].fillna(value=0))

It’s one line, reads reasonably well (sort of) and eliminates any unnecessary messing with intermediate variables or loops while allowing you to apply fillna to any multi-level slice you like!

If anybody can find places this doesn’t work please post in the comments, I’ve been messing with it and looking at the source and it seems to solve at least my multi-index slice problems.

Answer 4

The below code worked for me.

import pandas

df = pandas.read_csv('somefile.txt')

df = df.fillna(0)

Answer 5

An easy way to fill the missing values:

Filling string columns: when a string column has missing (NaN) values, fill them with the most frequent value (the mode).

df['string column name'].fillna(df['string column name'].mode().values[0], inplace = True)

Filling numeric columns: when a numeric column has missing (NaN) values, fill them with the mean.

df['numeric column name'].fillna(df['numeric column name'].mean(), inplace = True)

Filling NaN with zero:

df['column name'].fillna(0, inplace = True)

Answer 6

You can also use a dictionary to fill the NaN values of specific columns in the DataFrame, rather than filling the whole DataFrame with a single value.

import pandas as pd

df = pd.read_excel('example.xlsx')
df.fillna( {
        'column1': 'Write your values here',
        'column2': 'Write your values here',
        'column3': 'Write your values here',
        'column4': 'Write your values here',
        # ... one entry per column ...
        'column-n': 'Write your values here'} , inplace=True)

Answer 7

Assuming the Amount column in the table above is meant to hold integers, the following would be a solution:

df['Amount'] = df.Amount.fillna(0).astype(int)

Similarly, you can fill it with various data types like float, str and so on.

In particular, I would consider datatype to compare various values of the same column.
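
To connect this answer back to the error in the question, here is a small sketch of why the cast fails and how filling first avoids it. It assumes pandas and NumPy are installed; the three sample values are invented, and only the column name Amount comes from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical data; only the column name "Amount" is taken from the question.
df = pd.DataFrame({"Amount": [10.0, np.nan, 25.0]})

# Casting to int while NaN is still present raises a ValueError
# (recent pandas versions raise a ValueError subclass).
try:
    df["Amount"].astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True

# Filling the gaps first makes the cast well defined.
filled = df["Amount"].fillna(0).astype(int)

print(cast_failed)       # True
print(filled.tolist())   # [10, 0, 25]
```

Whether 0 is a sensible replacement depends on what a missing amount means in the data; the same pattern works with any other fill value.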

Answer 8

To replace NA values in pandas:

df['column_name'].fillna(value_to_be_replaced,inplace=True)

If inplace=False, fillna does not update df (the DataFrame) but returns the modified values instead.

Answer 9

If you were to convert it to a pandas dataframe, you can also accomplish this by using fillna.

import numpy as np
df=np.array([[1,2,3, np.nan]])

import pandas as pd
df=pd.DataFrame(df)
df.fillna(0)

This will return the following:

     0    1    2   3
0  1.0  2.0  3.0 NaN
>>> df.fillna(0)
     0    1    2    3
0  1.0  2.0  3.0  0.0

Answer 10

There are primarily two options. For imputing or filling missing values (NaN / np.nan) with numerical replacements only, across one or more columns, the following is sufficient:

df['Amount'].fillna(value=None, method= ,axis=1,)

From the Documentation:

value : scalar, dict, Series, or DataFrame Value to use to fill holes (e.g. 0), alternately a dict/Series/DataFrame of values specifying which value to use for each index (for a Series) or column (for a DataFrame). (values not in the dict/Series/DataFrame will not be filled). This value cannot be a list.

Which means ‘strings’ or ‘constants’ are no longer permissible to be imputed.

For more specialized imputations use SimpleImputer():

import numpy as np
from sklearn.impute import SimpleImputer
si = SimpleImputer(strategy='constant', missing_values=np.nan, fill_value='Replacement_Value')
df[['Col-1', 'Col-2']] = si.fit_transform(X=df[['Col-1', 'Col-2']])

Answer 11

To replace NaN in different columns with different values:

   replacement= {'column_A': 0, 'column_B': -999, 'column_C': -99999}
   df.fillna(value=replacement)

Knowledge Q&A

Which version of Python do I have installed?

July 25, 2021 Python实用宝典

Question: Which version of Python do I have installed?

I have to run a Python script on a Windows server. How can I know which version of Python I have, and does it even really matter?

I was thinking of updating to the latest version of Python.

Answer 0

python -V

http://docs.python.org/using/cmdline.html#generic-options

--version may also work (introduced in version 2.5)

Answer 1

Python 2.5+:

python --version

Python 2.4-:

python -c 'import sys; print(sys.version)'

Answer 2

In a Python IDE, just copy and paste in the following code and run it (the version will come up in the output area):

import sys
print(sys.version)

Answer 3

At a command prompt type:

python -V

Or if you have pyenv:

pyenv versions

Answer 4

When I open Python (command line) the first thing it tells me is the version.

Answer 5

Although the question is “which version am I using?”, this may not actually be everything you need to know. You may have other versions installed and this can cause problems, particularly when installing additional modules. This is my rough-and-ready approach to finding out what versions are installed:

updatedb                  # Be in root for this
locate site.py            # All installations I've ever seen have this

The output for a single Python installation should look something like this:

/usr/lib64/python2.7/site.py
/usr/lib64/python2.7/site.pyc
/usr/lib64/python2.7/site.pyo

Multiple installations will have output something like this:

/root/Python-2.7.6/Lib/site.py
/root/Python-2.7.6/Lib/site.pyc
/root/Python-2.7.6/Lib/site.pyo
/root/Python-2.7.6/Lib/test/test_site.py
/usr/lib/python2.6/site-packages/site.py
/usr/lib/python2.6/site-packages/site.pyc
/usr/lib/python2.6/site-packages/site.pyo
/usr/lib64/python2.6/site.py
/usr/lib64/python2.6/site.pyc
/usr/lib64/python2.6/site.pyo
/usr/local/lib/python2.7/site.py
/usr/local/lib/python2.7/site.pyc
/usr/local/lib/python2.7/site.pyo
/usr/local/lib/python2.7/test/test_site.py
/usr/local/lib/python2.7/test/test_site.pyc
/usr/local/lib/python2.7/test/test_site.pyo

Answer 6

In [1]: import sys

In [2]: sys.version
2.7.11 |Anaconda 2.5.0 (64-bit)| (default, Dec  6 2015, 18:08:32) 
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)]

In [3]: sys.version_info
sys.version_info(major=2, minor=7, micro=11, releaselevel='final', serial=0)

In [4]: sys.version_info >= (2,7)
Out[4]: True

In [5]: sys.version_info >= (3,)
Out[5]: False
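
The session above relies on sys.version_info comparing like a tuple. A minimal sketch of using that inside a script follows; the helper name require and the chosen minimum versions are illustrative assumptions, not part of the answer.

```python
import sys

# version_info behaves like a tuple, so it compares element by element.
major, minor = sys.version_info[:2]
version_label = "{}.{}".format(major, minor)

def require(minimum):
    """Return True if the running interpreter is at least `minimum` (hypothetical helper)."""
    return sys.version_info >= minimum

print(version_label)
print(require((2, 7)))
```

Comparing against a tuple avoids parsing the free-form sys.version string, whose layout differs between builds.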

Answer 7

In short:

Type python in a command prompt

Simply open the Run dialog (Win + R), type cmd, and then type python at the command prompt; the interpreter's startup banner gives all the necessary version information.

Answer 8

>>> import sys; print('{0[0]}.{0[1]}'.format(sys.version_info))
3.5

So from the command line:

python -c "import sys; print('{0[0]}.{0[1]}'.format(sys.version_info))"

Answer 9

Use

python -V

or

python --version

NOTE: Please note that the “V” in the python -V command is capital V. python -v (small “v”) will launch Python in verbose mode.
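
If the version is needed from inside a program rather than at a prompt, the same -V flag can be passed to a child interpreter. This is only a sketch, assuming Python 3.5 or newer (for subprocess.run) and using sys.executable so that the interpreter queried is the one running the script.

```python
import subprocess
import sys

# sys.executable is the path of the interpreter running this script.
result = subprocess.run(
    [sys.executable, "-V"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,   # very old interpreters printed the version to stderr
    universal_newlines=True,
)
banner = result.stdout.strip()
print(banner)
```

To query a different installation, the first list element could be replaced with that interpreter's path.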

Answer 10

You can get the version of Python by using the following command

python --version

You can even get the version of any package installed in venv using pip freeze as:

pip freeze | grep "package name"

Or using the Python interpreter as:

In [1]: import django
In [2]: django.VERSION
Out[2]: (1, 6, 1, 'final', 0)

Answer 11

I have Python 3.7.0 on Windows 10.

This is what worked for me in the command prompt and Git Bash:

To run Python and check the version:

py

To only check which version you have:

py --version

or

py -V    # Make sure it is a capital V

Note: python, python --version, python -V, Python, Python --version, and Python -V did not work for me.

Answer 12

If you are already in a REPL window and don’t see the welcome message with the version number, you can use help() to see the major and minor version:

>>>help()
Welcome to Python 3.6's help utility!
...

Answer 13

To check the Python version in a Jupyter notebook, you can use:

from platform import python_version
print(python_version())

to get version number, as:

3.7.3

or:

import sys
print(sys.version)

to get more information, as

3.7.3 (default, Apr 24 2019, 13:20:13) [MSC v.1915 32 bit (Intel)]

or:

sys.version_info

to get major, minor and micro versions, as

sys.version_info(major=3, minor=7, micro=3, releaselevel='final', serial=0)

Answer 14

Just create a file ending with .py, paste the code below into it, and run it.

#!/usr/bin/python3.6

import platform
import sys

def legacy(name):
  # platform.dist() and platform.linux_distribution() were removed in
  # Python 3.8, so fall back to "N/A" when they are missing.
  try:
    return getattr(platform, name)()
  except AttributeError:
    return "N/A"

print("""Python version: %s
dist: %s
linux_distribution: %s
system: %s
machine: %s
platform: %s
uname: %s
version: %s
""" % (
sys.version.split('\n'),
legacy('dist'),
legacy('linux_distribution'),
platform.system(),
platform.machine(),
platform.platform(),
platform.uname(),
platform.version(),
))

If several Python interpreter versions are installed on a system, run the following commands.

On Linux, run in a terminal:

ll /usr/bin/python*

On Windows, run in a command prompt:

dir %LOCALAPPDATA%\Programs\Python

Answer 15

To verify the Python version for commands on Windows, run the following commands in a command prompt and verify the output:

c:\> python -V
Python 2.7.16

c:\> py -2 -V
Python 2.7.16

c:\> py -3 -V
Python 3.7.3

Also, to see the folder configuration for each Python version, run the following commands:

For Python 2, 'py -2 -m site'
For Python 3, 'py -3 -m site'

Answer 16

On Windows 10 with Python 3.6

    python

Python 3.6.0a4 (v3.6.0a4:017cf260936b, Aug 16 2016, 00:59:16) [MSC v.1900 64 bit (AMD64)] on win32


    python -V

Python 3.6.0a4


    python --version

Python 3.6.0a4

Answer 17

If you have Python installed, the easiest way to check the version number is to type “python” in your command prompt. It will show you the version number, whether it is a 32-bit or 64-bit build, and some other information. For some applications you will want the latest version, and for others you will not. It depends on what packages you want to install or use.

Answer 18

For me, opening CMD and running

py

will show something like

Python 3.4.3 (v3.4.3:9b73f1c3e601, Feb 24 2015, 22:43:06) [MSC v.1600 32 bit (Intel)] on win32

Type "help", "copyright", "credits" or "license" for more information.

Answer 19

Open a command prompt window (press Windows + R, type in cmd, and hit Enter).

Type python.exe

Answer 20

Typing where python into a Command Prompt on Windows may tell you where the different versions of Python are installed, assuming they have been added to your PATH.

Typing python -V into the Command Prompt should display the version.

Answer 21

The most commonly used commands:

python --version

or

python -V

Knowledge Q&A

Import a file from a subdirectory?

July 25, 2021 Python实用宝典

Question: Import a file from a subdirectory?

I have a file called tester.py, located in /project.

/project has a subdirectory called lib, with a file called BoxTime.py:

/project/tester.py
/project/lib/BoxTime.py

I want to import BoxTime from tester. I have tried this:

import lib.BoxTime

Which resulted in:

Traceback (most recent call last):
  File "./tester.py", line 3, in <module>
    import lib.BoxTime
ImportError: No module named lib.BoxTime

Any ideas how to import BoxTime from the subdirectory?

EDIT

The __init__.py was the problem, but don’t forget to refer to BoxTime as lib.BoxTime, or use:

import lib.BoxTime as BT
...
BT.bt_function()

Answer 0

Take a look at the Packages documentation (Section 6.4) here: http://docs.python.org/tutorial/modules.html

In short, you need to put a blank file named

__init__.py

in the “lib” directory.

Answer 1

Create a subdirectory named lib.
Create an empty file named lib\__init__.py.
In lib\BoxTime.py, write a function foo() like this:
```
def foo():
    print "foo!"
```
In your client code in the directory above lib, write:
```
from lib import BoxTime
BoxTime.foo()
```
Run your client code. You will get:
```
foo!
```

Much later — in linux, it would look like this:

% cd ~/tmp
% mkdir lib
% touch lib/__init__.py
% cat > lib/BoxTime.py << EOF
heredoc> def foo():
heredoc>     print "foo!"
heredoc> EOF
% tree lib
lib
├── BoxTime.py
└── __init__.py

0 directories, 2 files
% python 
Python 2.7.6 (default, Mar 22 2014, 22:59:56) 
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lib import BoxTime
>>> BoxTime.foo()
foo!

Answer 2

You can try inserting it in sys.path:

import sys

sys.path.insert(0, './lib')
import BoxTime

Answer 3

I am writing this down because everyone seems to suggest that you have to create a lib directory.

You don’t need to name your sub-directory lib. You can name it anything provided you put an __init__.py into it.

You can do that by entering the following command in a linux shell:

$ touch anything/__init__.py

So now you have this structure:

$ ls anything/
__init__.py
mylib.py

$ ls
main.py

Then you can import mylib into main.py like this:

from anything import mylib 

mylib.myfun()

You can also import functions and classes like this:

from anything.mylib import MyClass
from anything.mylib import myfun

instance = MyClass()
result = myfun()

Any variable, function, or class you place inside __init__.py can also be accessed:

import anything

print(anything.myvar)

Or like this:

from anything import myvar

print(myvar)

Answer 4

Does your lib directory contain an __init__.py file?

Python uses __init__.py to determine whether a directory is a package.

Answer 5

Try a relative import such as from .lib import BoxTime (the form import .lib.BoxTime is not valid syntax, and relative imports only work inside a package). For more information read about relative imports in PEP 328.

Answer 6

I do this, which basically covers all cases (make sure you have __init__.py in relative/path/to/your/lib/folder):

import sys, os
sys.path.append(os.path.dirname(os.path.realpath(__file__)) + "/relative/path/to/your/lib/folder")
import someFileNameWhichIsInTheFolder
...
someFileNameWhichIsInTheFolder.foo()

Example:
You have in your project folder:

/root/myproject/app.py

You have in another project folder:

/root/anotherproject/utils.py
/root/anotherproject/__init__.py

You want to use /root/anotherproject/utils.py and call foo function which is in it.

So you write in app.py:

import sys, os
sys.path.append(os.path.dirname(os.path.realpath(__file__)) + "/../anotherproject")
import utils

utils.foo()

Answer 7

Create an empty file __init__.py in the subdirectory /lib, and add at the beginning of the main code

from __future__ import absolute_import

then

import lib.BoxTime as BT
...
BT.bt_function()

or better

from lib.BoxTime import bt_function
...
bt_function()

Answer 8

Just an addition to these answers.

If you want to import all files from all subdirectories, you can add this to the root of your file.

import sys, os
sys.path.extend([f'./{name}' for name in os.listdir(".") if os.path.isdir(name)])

And then you can simply import files from the subdirectories just as if these files are inside the current directory.

Working example

If I have the following directory with subdirectories in my project…

.
├── a.py
├── b.py
├── c.py
├── subdirectory_a
│   ├── d.py
│   └── e.py
├── subdirectory_b
│   └── f.py
├── subdirectory_c
│   └── g.py
└── subdirectory_d
    └── h.py

I can put the following code inside my a.py file

import sys, os
sys.path.extend([f'./{name}' for name in os.listdir(".") if os.path.isdir(name)])

# And then you can import files just as if these files are inside the current directory

import b
import c
import d
import e
import f
import g
import h

In other words, this code abstracts away which directory a file comes from.

Answer 9

/project/tester.py

/project/lib/BoxTime.py

Create a blank __init__.py file in every directory down the line until you reach the file:

/project/lib/somefolder/BoxTime.py

# lib needs two items: __init__.py and a directory named somefolder
# somefolder needs two items: BoxTime.py and __init__.py

Answer 10

Try this:

from lib import BoxTime

Knowledge Q&A

Properly indenting Python multiline strings

July 25, 2021 Python实用宝典

Question: Properly indenting Python multiline strings

What is the proper indentation for Python multiline strings within a function?

    def method():
        string = """line one
line two
line three"""

or

    def method():
        string = """line one
        line two
        line three"""

or something else?

It looks kind of weird to have the string hanging outside the function in the first example.

Answer 0

You probably want to line up with the """

def foo():
    string = """line one
             line two
             line three"""

Since the newlines and spaces are included in the string itself, you will have to postprocess it. If you don’t want to do that and you have a whole lot of text, you might want to store it separately in a text file. If a text file does not work well for your application and you don’t want to postprocess, I’d probably go with

def foo():
    string = ("this is an "
              "implicitly joined "
              "string")

If you want to postprocess a multiline string to trim out the parts you don’t need, you should consider the textwrap module or the technique for postprocessing docstrings presented in PEP 257:

import sys

def trim(docstring):
    if not docstring:
        return ''
    # Convert tabs to spaces (following the normal Python rules)
    # and split into a list of lines:
    lines = docstring.expandtabs().splitlines()
    # Determine minimum indentation (first line doesn't count).
    # sys.maxsize replaces sys.maxint, which does not exist in Python 3:
    indent = sys.maxsize
    for line in lines[1:]:
        stripped = line.lstrip()
        if stripped:
            indent = min(indent, len(line) - len(stripped))
    # Remove indentation (first line is special):
    trimmed = [lines[0].strip()]
    if indent < sys.maxsize:
        for line in lines[1:]:
            trimmed.append(line[indent:].rstrip())
    # Strip off trailing and leading blank lines:
    while trimmed and not trimmed[-1]:
        trimmed.pop()
    while trimmed and not trimmed[0]:
        trimmed.pop(0)
    # Return a single string:
    return '\n'.join(trimmed)

Answer 1

The textwrap.dedent function allows one to start with correct indentation in the source, and then strip it from the text before use.

The trade-off, as noted by some others, is that this is an extra function call on the literal; take this into account when deciding where to place these literals in your code.

import textwrap

def frobnicate(param):
    """ Frobnicate the scrognate param.

        The Weebly-Ruckford algorithm is employed to frobnicate
        the scrognate to within an inch of its life.

        """
    prepare_the_comfy_chair(param)
    log_message = textwrap.dedent("""\
            Prepare to frobnicate:
            Here it comes...
                Any moment now.
            And: Frobnicate!""")
    weebly(param, log_message)
    ruckford(param)

The backslash right after the opening quotes of the log message literal keeps that first line break out of the string; the literal therefore starts with the next full line instead of a blank one.

The return value from textwrap.dedent is the input string with the whitespace common to the start of every line removed. So the above log_message value will be:

Prepare to frobnicate:
Here it comes...
    Any moment now.
And: Frobnicate!
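
The effect of the backslash can be checked directly. The following is a minimal sketch; the variable names with_backslash and without_backslash are made up for this illustration and are not part of the answer above.

```python
import textwrap

# With the backslash: the literal starts directly with the first text line.
with_backslash = textwrap.dedent("""\
    Prepare to frobnicate:
        Any moment now.
    And: Frobnicate!""")

# Without it: the literal starts with a newline, which dedent keeps.
without_backslash = textwrap.dedent("""
    Prepare to frobnicate:
        Any moment now.
    And: Frobnicate!""")

print(repr(with_backslash))
print(repr(without_backslash))
```

The two results should differ only by the leading newline in the second string.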

Answer 2

Use inspect.cleandoc like so:

import inspect

def method():
    string = inspect.cleandoc("""
        line one
        line two
        line three""")

Relative indentation will be maintained as expected. As noted in the comments, if you want to keep leading empty lines, use textwrap.dedent instead. However, that also keeps the first line break.

Note: It's good practice to indent logical blocks of code under their related context to clarify the structure, e.g. the multi-line string belonging to the variable string.
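
To make the difference between the two functions concrete, here is a small sketch that runs both on the same literal. The variable name text and the extra indentation on the second line are assumptions added for illustration.

```python
import inspect
import textwrap

text = """
    line one
      line two
    line three"""

# cleandoc drops the leading blank line along with the common indentation.
print(repr(inspect.cleandoc(text)))
# dedent removes only the common indentation and keeps the leading newline.
print(repr(textwrap.dedent(text)))
```

In both cases the relative indentation of "line two" is preserved.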

Answer 3

One option that seems to be missing from the other answers (it is only mentioned deep down in a comment by naxa) is the following:

def foo():
    string = ("line one\n"          # Add \n in the string
              "line two"  "\n"      # Add "\n" after the string
              "line three\n")

This allows proper alignment, joins the lines implicitly, and still keeps the line breaks, which, for me, is one of the reasons to use multiline strings in the first place.

It doesn't require any postprocessing, but you need to add the \n manually wherever a line should end, either inline or as a separate string afterwards. The latter is easier to copy and paste.
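
As a quick check that implicit concatenation really produces the same string as a triple-quoted literal, consider this sketch. The name triple_quoted is introduced here only for the comparison, and foo returns the string instead of assigning it.

```python
def foo():
    # Adjacent string literals are joined at compile time; no "+" is needed.
    return ("line one\n"
            "line two" "\n"
            "line three\n")

triple_quoted = """\
line one
line two
line three
"""

print(foo() == triple_quoted)
```

Because the joining happens when the code is compiled, there is no runtime cost for splitting the string across lines.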

Answer 4

Some more options. In IPython with pylab enabled, dedent is already in the namespace; I checked, and it comes from matplotlib. It can also be imported with:

from matplotlib.cbook import dedent  # older Matplotlib only: deprecated in 3.1 and removed in later releases

The documentation states that it is faster than the textwrap equivalent, and in my quick tests in IPython it was indeed about 3 times faster on average. It also discards any leading blank lines, which lets you be flexible in how you construct the string:

"""
line 1 of string
line 2 of string
"""

"""\
line 1 of string
line 2 of string
"""

"""line 1 of string
line 2 of string
"""

Using the matplotlib dedent on these three examples gives the same sensible result. The textwrap dedent function leaves a leading blank line with the first example.

The obvious disadvantage is that textwrap is in the standard library, while matplotlib is an external module.

There are some tradeoffs here: the dedent functions make your code more readable where the strings are defined, but require processing later to get the string into a usable format. In docstrings it is obvious that you should use correct indentation, as most uses of the docstring will do the required processing.

When I need a long string in my code, I use the following admittedly ugly approach, where I let the long string drop out of the enclosing indentation. It definitely fails on "Beautiful is better than ugly", but one could argue that it is simpler and more explicit than the dedent alternative.

def example():
    long_string = '''\
Lorem ipsum dolor sit amet, consectetur adipisicing
elit, sed do eiusmod tempor incididunt ut labore et
dolore magna aliqua. Ut enim ad minim veniam, quis
nostrud exercitation ullamco laboris nisi ut aliquip.\
'''
    return long_string

print(example())

Answer 5

If you want a quick and easy solution and want to save yourself from typing newlines, you could opt for a list instead, e.g.:

def func(*args, **kwargs):
    string = '\n'.join([
        'first line of very long string and',
        'second line of the same long thing and',
        'third line of ...',
        'and so on...',
        ])
    print(string)
    return

Answer 6

I prefer

    def method():
        string = \
"""\
line one
line two
line three\
"""

or

    def method():
        string = """\
line one
line two
line three\
"""

Answer 7

My two cents: escape the end of each line to keep the indentation:

def foo():
    return "{}\n"\
           "freq: {}\n"\
           "temp: {}\n".format( time, freq, temp )

Answer 8

I came here looking for a simple one-liner to remove/correct the indentation level of a docstring for printing, without making it look untidy, for example by making it "hang outside the function" within the script.

Here’s what I ended up doing:

def myfunction():

    """
    line 1 of docstring
    line 2 of docstring
    line 3 of docstring"""

# str.replace does the work; the old string.replace() function is gone in Python 3
print(myfunction.__doc__.replace('\n\t', '\n')[1:])

Obviously, if you're indenting with spaces (e.g. 4) rather than tabs, use something like this instead:

print(myfunction.__doc__.replace('\n    ', '\n')[1:])

And you don't need to remove the first character if you like your docstrings to look like this instead:

    """line 1 of docstring
    line 2 of docstring
    line 3 of docstring"""

print(myfunction.__doc__.replace('\n\t', '\n'))

Answer 9

The first option is the good one, with the indentation included. It is in Python style and keeps the code readable.

To display it properly:

print(string.lstrip())

Answer 10

It depends on how you want the text to display. If you want it all to be left-aligned then either format it as in the first snippet or iterate through the lines left-trimming all the space.

Answer 11

For strings, you can simply post-process the string. For docstrings, you need to post-process the function instead. Here is a solution for both that is still readable.

class Lstrip(object):
    def __rsub__(self, other):
        import re
        # raw string for the pattern containing \s avoids an invalid-escape warning
        return re.sub('^\n', '', re.sub('\n$', '', re.sub(r'\n\s+', '\n', other)))

msg = '''
      Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
      tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
      veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
      commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
      velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
      cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
      est laborum.
      ''' - Lstrip()

print(msg)

def lstrip_docstring(func):
    func.__doc__ = func.__doc__ - Lstrip()
    return func

@lstrip_docstring
def foo():
    '''
    Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
    tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim
    veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea
    commodo consequat. Duis aute irure dolor in reprehenderit in voluptate
    velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat
    cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id
    est laborum.
    '''
    pass


print(foo.__doc__)

Answer 12

I had a similar issue: the code got really unreadable with multiline strings, so I came up with something like

print("""aaaa
"""   """bbb
""")

Yes, at first it may look terrible, but the embedded syntax was quite complex, and adding something at the end (like '\n"') was not a solution.

Answer 13

You can use this function trim_indent.

import re


def trim_indent(s: str):
    s = re.sub(r'^\n+', '', s)
    s = re.sub(r'\n+$', '', s)
    spaces = re.findall(r'^ +', s, flags=re.MULTILINE)
    if len(spaces) > 0 and len(re.findall(r'^[^\s]', s, flags=re.MULTILINE)) == 0:
        s = re.sub(r'^%s' % (min(spaces)), '', s, flags=re.MULTILINE)
    return s


print(trim_indent("""


        line one
            line two
                line three
            line two
        line one


"""))

Result:

"""
line one
    line two
        line three
    line two
line one
"""

Knowledge Q&A

Sort a list by multiple attributes?

July 25, 2021 Python实用宝典

Question: Sort a list by multiple attributes?

I have a list of lists:

[[12, 'tall', 'blue', 1],
[2, 'short', 'red', 9],
[4, 'tall', 'blue', 13]]

If I wanted to sort by one element, say the tall/short element, I could do it via s = sorted(s, key = itemgetter(1)).

If I wanted to sort by both tall/short and colour, I could do the sort twice, once for each element, but is there a quicker way?

Answer 0

A key can be a function that returns a tuple:

s = sorted(s, key = lambda x: (x[1], x[2]))

Or you can achieve the same using itemgetter (which is faster and avoids a Python function call):

import operator
s = sorted(s, key = operator.itemgetter(1, 2))

And notice that here you can use sort instead of using sorted and then reassigning:

s.sort(key = operator.itemgetter(1, 2))
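
Applied to the list from the question, the itemgetter approach can be sketched as follows. The result name by_height_then_colour is made up for this example.

```python
from operator import itemgetter

s = [[12, 'tall', 'blue', 1],
     [2, 'short', 'red', 9],
     [4, 'tall', 'blue', 13]]

# itemgetter(1, 2) builds the key tuple (row[1], row[2]) for each row.
by_height_then_colour = sorted(s, key=itemgetter(1, 2))
print(by_height_then_colour)
```

Rows with equal keys (the two 'tall', 'blue' rows) keep their original relative order, because Python's sort is stable.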

Answer 1

I'm not sure if this is the most pythonic method. I had a list of tuples that needed sorting first by descending integer value and second alphabetically. This required reversing the integer sort but not the alphabetical sort. It works because Python's sort is stable, so the inner alphabetical order survives among tuples with equal integers. Here was my solution (written on the fly in an exam, by the way; I was not even aware you could 'nest' sorted calls):

a = [('Al', 2),('Bill', 1),('Carol', 2), ('Abel', 3), ('Zeke', 2), ('Chris', 1)]  
b = sorted(sorted(a, key = lambda x : x[0]), key = lambda x : x[1], reverse = True)  
print(b)  
[('Abel', 3), ('Al', 2), ('Carol', 2), ('Zeke', 2), ('Bill', 1), ('Chris', 1)]
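
When the descending criterion is numeric, a single sort with a negated key is a possible alternative to nesting. This is a sketch of a different technique from the one in the answer, and it only applies when the value can be negated.

```python
a = [('Al', 2), ('Bill', 1), ('Carol', 2), ('Abel', 3), ('Zeke', 2), ('Chris', 1)]

# Negating the integer sorts it in descending order, while the name
# in the second key position still sorts in ascending order.
b = sorted(a, key=lambda x: (-x[1], x[0]))
print(b)
```

For non-numeric descending criteria, the nested (two-pass) sort remains the simpler option.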

Answer 2

It appears you could use a list instead of a tuple. This becomes more important I think when you are grabbing attributes instead of ‘magic indexes’ of a list/tuple.

In my case I wanted to sort by multiple attributes of a class, where the incoming keys were strings. I needed different sorting in different places, and I wanted a common default sort for the parent class that clients were interacting with; only having to override the ‘sorting keys’ when I really ‘needed to’, but also in a way that I could store them as lists that the class could share

So first I defined a helper method

def attr_sort(self, attrs=['someAttributeString']):
  '''helper to sort by the attributes named by strings of attrs in order'''
  return lambda k: [ getattr(k, attr) for attr in attrs ]

then to use it

# would be defined elsewhere, but shown here for conciseness
self.SortListA = ['attrA', 'attrB']
self.SortListB = ['attrC', 'attrA']
records = .... #list of my objects to sort
records.sort(key=self.attr_sort(attrs=self.SortListA))
# perhaps later nearby or in another function
more_records = .... #another list
more_records.sort(key=self.attr_sort(attrs=self.SortListB))

This uses the generated lambda function to sort the list by object.attrA and then object.attrB, assuming object has an attribute corresponding to each string name provided. The second case would sort by object.attrC and then object.attrA.

This also allows you to expose sorting choices outward, to be shared alike by a consumer or a unit test, or to let callers tell you how they want sorting done for some operation in your API by only having to give you a list, without coupling them to your back-end implementation.
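
The standard library offers operator.attrgetter, which builds the same kind of key from attribute names. The sketch below is an alternative to the helper above, not the answer's own code; the Record class and the sort_list_a name are hypothetical.

```python
from operator import attrgetter

class Record:
    """Hypothetical record type, used only for this illustration."""
    def __init__(self, attrA, attrB):
        self.attrA = attrA
        self.attrB = attrB

    def __repr__(self):
        return "Record({!r}, {!r})".format(self.attrA, self.attrB)

records = [Record(2, "b"), Record(1, "z"), Record(2, "a")]
sort_list_a = ["attrA", "attrB"]

# attrgetter with several names returns a tuple of the attribute values.
records.sort(key=attrgetter(*sort_list_a))
print(records)
```

As with the helper method, the list of attribute names can be stored and shared, so callers choose the sort order without touching the sorting code.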

Answer 3

Several years late to the party, but I wanted to both sort on two criteria and use reverse=True. In case someone else wants to know how, you can wrap your criteria (functions) in a tuple:

s = sorted(my_list, key=lambda i: ( criteria_1(i), criteria_2(i) ), reverse=True)

Answer 4

Here's one way: you basically rewrite your sort function to take a list of comparison functions, each of which compares one attribute you want to test. On each comparison, you check whether a cmp function returns a non-zero value; if so, you stop and return that value. You call it by passing a lambda that wraps a function that walks a list of lambdas.

Its advantage is that it does a single pass through the data rather than a sort of a previous sort, as other methods do. Another thing is that it sorts in place, whereas sorted returns a new list.

I used it to write a rank function that ranks a list of objects, where each object is in a group and has a score, but you can add any list of attributes. Note the un-lambda-like, hackish use of a lambda to call a setter. The rank part won't work for a list of lists, but the sort will.

# First, here's a pure list version.
# Python 3 removed cmp() and comparator arguments to sort(), so define cmp
# and wrap the comparator with functools.cmp_to_key.
from functools import cmp_to_key

def cmp(a, b):
    return (a > b) - (a < b)

my_sortLambdaLst = [lambda x,y:cmp(x[0], y[0]), lambda x,y:cmp(x[1], y[1])]
def multi_attribute_sort(x,y):
    r = 0
    for l in my_sortLambdaLst:
        r = l(x,y)
        if r!=0: return r #keep looping till you see a difference
    return r

Lst = [(4, 2.0), (4, 0.01), (4, 0.9), (4, 0.999),(4, 0.2), (1, 2.0), (1, 0.01), (1, 0.9), (1, 0.999), (1, 0.2) ]
Lst.sort(key=cmp_to_key(multi_attribute_sort))
for rec in Lst: print(rec)

Here’s a way to rank a list of objects

from functools import cmp_to_key

def cmp(a, b):
    # Python 3 has no built-in cmp()
    return (a > b) - (a < b)

class probe:
    def __init__(self, group, score):
        self.group = group
        self.score = score
        self.rank =-1
    def set_rank(self, r):
        self.rank = r
    def __str__(self):
        return '\t'.join([str(self.group), str(self.score), str(self.rank)])


def RankLst(inLst, group_lambda= lambda x:x.group, sortLambdaLst = [lambda x,y:cmp(x.group, y.group), lambda x,y:cmp(x.score, y.score)], SetRank_Lambda = lambda x, rank:x.set_rank(rank)):
    #Inner function is the only way (I could think of) to pass the sortLambdaLst into a sort function
    def multi_attribute_sort(x,y):
        r = 0
        for l in sortLambdaLst:
            r = l(x,y)
            if r!=0: return r #keep looping till you see a difference
        return r

    inLst.sort(key=cmp_to_key(multi_attribute_sort))
    #Now Rank your probes
    rank = 0
    last_group = group_lambda(inLst[0])
    for i in range(len(inLst)):
        rec = inLst[i]
        group = group_lambda(rec)
        if last_group == group: 
            rank+=1
        else:
            rank=1
            last_group = group
        SetRank_Lambda(inLst[i], rank) #This is pure evil!! The lambda purists are gnashing their teeth

Lst = [probe(4, 2.0), probe(4, 0.01), probe(4, 0.9), probe(4, 0.999), probe(4, 0.2), probe(1, 2.0), probe(1, 0.01), probe(1, 0.9), probe(1, 0.999), probe(1, 0.2) ]

RankLst(Lst, group_lambda= lambda x:x.group, sortLambdaLst = [lambda x,y:cmp(x.group, y.group), lambda x,y:cmp(x.score, y.score)], SetRank_Lambda = lambda x, rank:x.set_rank(rank))
print('\t'.join(['group', 'score', 'rank']))
for r in Lst: print(r)
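
For comparison, the same ordering and per-group ranking can be sketched with a tuple key and itertools.groupby instead of comparison functions. This is an alternative technique, not the answer's method; the pairs and ranked names are made up, and plain (group, score) tuples stand in for the probe objects.

```python
from itertools import groupby

# Hypothetical (group, score) pairs mirroring the probe data above.
pairs = [(4, 2.0), (4, 0.01), (4, 0.9), (1, 2.0), (1, 0.01), (1, 0.9)]

# One sort on a tuple key orders by group first, then by score.
pairs.sort(key=lambda p: (p[0], p[1]))

# Rank within each group, starting at 1.
ranked = []
for group, members in groupby(pairs, key=lambda p: p[0]):
    for rank, (_, score) in enumerate(members, start=1):
        ranked.append((group, score, rank))

for row in ranked:
    print(row)
```

groupby only merges adjacent items with equal keys, which is why the sort has to come first.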

Knowledge Q&A

How to convert the index of a pandas DataFrame into a column?

July 25, 2021 Python实用宝典

Question: How to convert the index of a pandas DataFrame into a column?

This seems rather obvious, but I can't seem to figure out how to convert the index of a data frame to a column.

For example:

df=
        gi       ptt_loc
 0  384444683      593  
 1  384444684      594 
 2  384444686      596

To,

df=
    index1    gi       ptt_loc
 0  0     384444683      593  
 1  1     384444684      594 
 2  2     384444686      596

Answer 0

Either:

df['index1'] = df.index

or .reset_index:

df.reset_index(level=0, inplace=True)

So, if you have a multi-index frame with 3 levels of index, like:

>>> df
                       val
tick       tag obs        
2016-02-26 C   2    0.0139
2016-02-27 A   2    0.5577
2016-02-28 C   6    0.0303

and you want to convert the 1st (tick) and 3rd (obs) levels in the index into columns, you would do:

>>> df.reset_index(level=['tick', 'obs'])
          tick  obs     val
tag                        
C   2016-02-26    2  0.0139
A   2016-02-27    2  0.5577
C   2016-02-28    6  0.0303

Answer 1

For MultiIndex you can extract its subindex using

df['si_name'] = df.index.get_level_values('si_name')

where si_name is the name of the subindex.
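
A self-contained sketch of this approach follows. The index values, the second level name obs and the val column are hypothetical, chosen only so that the example runs.

```python
import pandas as pd

# Hypothetical two-level index; "si_name" is the level copied into a column.
index = pd.MultiIndex.from_tuples(
    [("a", 1), ("a", 2), ("b", 1)], names=["si_name", "obs"]
)
df = pd.DataFrame({"val": [0.1, 0.2, 0.3]}, index=index)

# The index is left untouched; the level values are copied into a new column.
df["si_name"] = df.index.get_level_values("si_name")
print(df)
```

Unlike reset_index, this keeps the MultiIndex in place, so the level exists both in the index and as a regular column.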

Answer 2

To provide a bit more clarity, let’s look at a DataFrame with two levels in its index (a MultiIndex).

import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product([['TX', 'FL', 'CA'],
                                    ['North', 'South']], 
                                   names=['State', 'Direction'])

df = pd.DataFrame(index=index, 
                  data=np.random.randint(0, 10, (6,4)), 
                  columns=list('abcd'))

The reset_index method, called with the default parameters, converts all index levels to columns and uses a simple RangeIndex as the new index.

df.reset_index()

Use the level parameter to control which index levels are converted into columns. If possible, use the level name, which is more explicit. If there are no level names, you can refer to each level by its integer position, counted from 0 starting at the outermost level. You can pass a scalar value here or a list of all the levels you would like to reset.

df.reset_index(level='State') # same as df.reset_index(level=0)

In the rare event that you want to preserve the index and turn the index into a column, you can do the following:

# for a single level
df.assign(State=df.index.get_level_values('State'))

# for all levels
df.assign(**df.index.to_frame())

Answer 3

`rename_axis` + `reset_index`

You can first rename your index to the desired label, then promote it to a column:

df = df.rename_axis('index1').reset_index()

print(df)

   index1         gi  ptt_loc
0       0  384444683      593
1       1  384444684      594
2       2  384444686      596

This also works for MultiIndex dataframes:

print(df)
#                        val
# tick       tag obs        
# 2016-02-26 C   2    0.0139
# 2016-02-27 A   2    0.5577
# 2016-02-28 C   6    0.0303

df = df.rename_axis(['index1', 'index2', 'index3']).reset_index()

print(df)

       index1 index2  index3     val
0  2016-02-26      C       2  0.0139
1  2016-02-27      A       2  0.5577
2  2016-02-28      C       6  0.0303

Answer 4

If you want to use the reset_index method and also preserve your existing index you should use:

df.reset_index().set_index('index', drop=False)

or to change it in place:

df.reset_index(inplace=True)
df.set_index('index', drop=False, inplace=True)

For example:

print(df)
          gi  ptt_loc
0  384444683      593
4  384444684      594
9  384444686      596

print(df.reset_index())
   index         gi  ptt_loc
0      0  384444683      593
1      4  384444684      594
2      9  384444686      596

print(df.reset_index().set_index('index', drop=False))
       index         gi  ptt_loc
index
0          0  384444683      593
4          4  384444684      594
9          9  384444686      596

And if you want to get rid of the index label you can do:

df2 = df.reset_index().set_index('index', drop=False)
df2.index.name = None
print(df2)
   index         gi  ptt_loc
0      0  384444683      593
4      4  384444684      594
9      9  384444686      596

Answer 5

df1 = pd.DataFrame({"gi":[232,66,34,43],"ptt":[342,56,662,123]})
p = df1.index.values
df1.insert( 0, column="new",value = p)
df1

    new     gi     ptt
0    0      232    342
1    1      66     56 
2    2      34     662
3    3      43     123

Answer 6

A very simple way of doing this is to use the reset_index() method. For a data frame df, use the code below:

df.reset_index(inplace=True)

This way, the index becomes a column, and with inplace=True the change is applied to df itself.

Knowledge Q&A

Most pythonic way to delete a file

July 25, 2021 Python实用宝典

Question: Most pythonic way to delete a file

I want to delete the file filename if it exists. Is it proper to say

if os.path.exists(filename):
    os.remove(filename)

Is there a better way? A one-line way?

Answer 0

A more pythonic way would be:

try:
    os.remove(filename)
except OSError:
    pass

Although this takes even more lines and looks very ugly, it avoids the unnecessary call to os.path.exists() and follows the Python convention of overusing exceptions.

It may be worthwhile to write a function to do this for you:

import os, errno

def silentremove(filename):
    try:
        os.remove(filename)
    except OSError as e: # this would be "except OSError, e:" before Python 2.6
        if e.errno != errno.ENOENT: # errno.ENOENT = no such file or directory
            raise # re-raise exception if a different error occurred
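To make the behaviour of silentremove concrete, here is a small sketch that exercises it against a temporary file and a temporary directory. The temporary paths are created only for illustration and are not part of the original answer.

```python
import errno
import os
import tempfile


def silentremove(filename):
    """Remove filename, ignoring only the 'no such file' error."""
    try:
        os.remove(filename)
    except OSError as e:
        if e.errno != errno.ENOENT:  # anything other than "not found" is re-raised
            raise


# Create a throwaway file, then remove it twice.
fd, path = tempfile.mkstemp()
os.close(fd)
silentremove(path)  # the file exists, so it is removed
silentremove(path)  # the file is already gone; no exception escapes
removed = not os.path.exists(path)

# Removing a directory fails for a different reason, so the error propagates.
tmpdir = tempfile.mkdtemp()
try:
    silentremove(tmpdir)
    directory_error_raised = False
except OSError:
    directory_error_raised = True
finally:
    os.rmdir(tmpdir)
```

The point of checking errno is that only the "file is missing" case is silenced; permission problems or passing a directory still surface as errors.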

Answer 1

I prefer to suppress an exception rather than checking for the file’s existence, to avoid a TOCTTOU bug. Matt’s answer is a good example of this, but we can simplify it slightly under Python 3, using contextlib.suppress():

import contextlib
import os

with contextlib.suppress(FileNotFoundError):
    os.remove(filename)

If filename is a pathlib.Path object instead of a string, we can call its .unlink() method instead of using os.remove(). In my experience, Path objects are more useful than strings for filesystem manipulation.

Since everything in this answer is exclusive to Python 3, it provides yet another reason to upgrade.
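As a rough sketch of the pathlib variant mentioned above, the following combines contextlib.suppress() with Path.unlink(). The temporary directory and the file name example.txt are made up for the demonstration.

```python
import contextlib
import tempfile
from pathlib import Path

# Hypothetical scratch file inside a fresh temporary directory.
tmpdir = Path(tempfile.mkdtemp())
target = tmpdir / "example.txt"
target.write_text("scratch data")

with contextlib.suppress(FileNotFoundError):
    target.unlink()  # the file exists, so it is deleted

with contextlib.suppress(FileNotFoundError):
    target.unlink()  # raises FileNotFoundError, which is suppressed

still_exists = target.exists()
tmpdir.rmdir()  # clean up the now-empty directory
```

Because only FileNotFoundError is suppressed, other failures such as PermissionError are still raised.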

Answer 2

os.path.exists returns True for folders as well as files. Consider using os.path.isfile instead to check whether the file exists.
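The difference is easy to see with a temporary directory and a temporary file, both created here purely for illustration:

```python
import os
import tempfile

tmpdir = tempfile.mkdtemp()
fd, tmpfile = tempfile.mkstemp(dir=tmpdir)
os.close(fd)

dir_exists = os.path.exists(tmpdir)    # True: exists() accepts directories
dir_isfile = os.path.isfile(tmpdir)    # False: a directory is not a regular file
file_isfile = os.path.isfile(tmpfile)  # True: this is a regular file

# Clean up the scratch file and directory.
os.remove(tmpfile)
os.rmdir(tmpdir)
```

Guarding os.remove() with os.path.exists() would therefore let a directory path through, and os.remove() would then fail on it.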

Answer 3

In the spirit of Andy Jones’ answer, how about an authentic ternary operation:

os.remove(fn) if os.path.exists(fn) else None

Answer 4

Another way to find out whether the file (or files) exists, and to remove it, is to use the glob module.

from glob import glob
import os

for filename in glob("*.csv"):
    os.remove(filename)

glob finds all the files that match the pattern, which may contain *nix-style wildcards, and the loop removes each one. If nothing matches, the list is empty and nothing is removed.
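A minimal sketch of that behaviour, run inside a temporary directory so no real files are touched. The file names a.csv, b.csv and notes.txt are invented for the example.

```python
import glob
import os
import tempfile

tmpdir = tempfile.mkdtemp()
for name in ("a.csv", "b.csv", "notes.txt"):
    with open(os.path.join(tmpdir, name), "w") as f:
        f.write("x")

pattern = os.path.join(tmpdir, "*.csv")
for filename in glob.glob(pattern):
    os.remove(filename)  # removes only the files matching the pattern

remaining = sorted(os.listdir(tmpdir))
second_pass = glob.glob(pattern)  # no matches left, so this is an empty list

# Clean up what is left of the scratch directory.
os.remove(os.path.join(tmpdir, "notes.txt"))
os.rmdir(tmpdir)
```

Since an empty match list simply means the loop body never runs, no existence check is needed. Note that a file could still disappear between the glob() call and os.remove(), so this does not remove the race condition entirely.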

Answer 5

As of Python 3.8, pass missing_ok=True to pathlib.Path.unlink:

from pathlib import Path

my_file = Path("./dir1/dir2/file.txt")

# Python 3.8+
my_file.unlink(missing_ok=True)

# Python 3.7 and earlier
if my_file.exists():
    my_file.unlink()

Answer 6

Matt’s answer is the right one for older Pythons and Kevin’s the right answer for newer ones.

If you would rather not copy the silentremove function, the same functionality is exposed in path.py as remove_p:

from path import Path
Path(filename).remove_p()

Answer 7

if os.path.exists(filename): os.remove(filename)

is a one-liner.

Many of you may disagree – possibly for reasons like considering the proposed use of ternaries “ugly” – but this begs the question of whether we should listen to people used to ugly standards when they call something non-standard “ugly”.

Answer 8

In Python 3.4 or later, the pythonic way would be:

import os
from contextlib import suppress

with suppress(OSError):
    os.remove(filename)

Answer 9

Something like this? It takes advantage of short-circuit evaluation: if the file does not exist, the whole condition cannot be true, so Python will not bother evaluating the second part.

os.path.exists("gogogo.php") and os.remove("gogogo.php")
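To see what the expression actually evaluates to in each case, here is a short sketch using a temporary file in place of gogogo.php:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# File exists: the right-hand side runs, and os.remove() returns None.
first = os.path.exists(path) and os.remove(path)

# File is gone: the left-hand side is False, so os.remove() is never called.
second = os.path.exists(path) and os.remove(path)
```

The result of the expression is discarded in normal use; it is shown here only to make the short-circuit visible. Like any check-then-remove approach, this still leaves a window in which the file can vanish between the two calls.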

Answer 10

A KISS offering:

import os

def remove_if_exists(filename):
    if os.path.exists(filename):
        os.remove(filename)

And then:

remove_if_exists("my.file")

Answer 11

This is another solution:

if os.path.isfile(os.path.join(path, filename)):
    os.remove(os.path.join(path, filename))

Answer 12

Another solution, printing your own message when the exception is raised:

import os

try:
    os.remove(filename)
except OSError:  # catch only OS-level failures rather than using a bare except
    print("Not able to delete the file %s" % filename)

Answer 13

I have used rm, which with -f (--force) ignores nonexistent files, together with --preserve-root as an option to rm.

--preserve-root
              do not remove `/' (default)

rm --help | grep "force"
  -f, --force           ignore nonexistent files and arguments, never prompt

We can also use safe-rm (sudo apt-get install safe-rm)

Safe-rm is a safety tool intended to prevent the accidental deletion of important files by replacing /bin/rm with a wrapper, which checks the given arguments against a configurable blacklist of files and directories that should never be removed.

First I check whether the folder/file path exists. This guards against the variable fileToRemove/folderToRemove having been set to the string `-r /`.

import os, subprocess

fileToRemove = '/home/user/fileName'
if os.path.isfile(fileToRemove):
    subprocess.run(['rm', '-f', '--preserve-root', fileToRemove])
    # Alternatively, if safe-rm is installed:
    subprocess.run(['safe-rm', '-f', fileToRemove])

Question: How to fix: "UnicodeDecodeError: 'ascii' codec can't decode byte"

Answer 0

tl;dr / quick fix

Unicode Zen in Python 2.x – The Long Version

Gotchas

Examples

The Unicode Sandwich

Input / Decode

Source code

Files

CSV Files

Databases

HTTP

Manually

The meat of the sandwich

Output

stdout / printing

Files

Database

Python 3

Why you shouldn’t use sys.setdefaultencoding('utf8')

Question: How to count the NaN values in a column in pandas DataFrame

Why you shouldn’t use `sys.setdefaultencoding('utf8')`