How do I log a Python error with debug information?

Question: How do I log a Python error with debug information?

I am printing Python exception messages to a log file with logging.error:

import logging
try:
    1/0
except ZeroDivisionError as e:
    logging.error(e)  # ERROR:root:division by zero

Is it possible to print more detailed information about the exception and the code that generated it than just the exception string? Things like line numbers or stack traces would be great.


Answer 0

logger.exception will output a stack trace alongside the error message.

For example:

import logging
try:
    1/0
except ZeroDivisionError as e:
    logging.exception("message")

Output:

ERROR:root:message
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ZeroDivisionError: integer division or modulo by zero

@Paulo Cheque notes, “be aware that in Python 3 you must call the logging.exception method just inside the except part. If you call this method in an arbitrary place you may get a bizarre exception. The docs alert about that.”
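
To illustrate the caveat, here is a small sketch of what happens when exception() is called while no exception is being handled. It assumes Python 3; the logger name "outside_demo" and the in-memory stream are made up here, only to make the output easy to inspect.

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("outside_demo")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

# No exception is being handled here, so there is no traceback to attach.
logger.exception("called outside except")

output = stream.getvalue()
print(output)
```

In this sketch the record is still written, but the part that would normally hold the traceback carries no useful information.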


Answer 1

One nice thing about logging.exception that SiggyF’s answer doesn’t show is that you can pass in an arbitrary message, and logging will still show the full traceback with all the exception details:

import logging
try:
    1/0
except ZeroDivisionError:
    logging.exception("Deliberate divide by zero traceback")

With the default (in recent versions) logging behaviour of just printing errors to sys.stderr, it looks like this:

>>> import logging
>>> try:
...     1/0
... except ZeroDivisionError:
...     logging.exception("Deliberate divide by zero traceback")
... 
ERROR:root:Deliberate divide by zero traceback
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
ZeroDivisionError: integer division or modulo by zero

Answer 2

Using the exc_info option may be better, since it lets you choose the log level (if you use exception, the record is always logged at the ERROR level):

import logging

try:
    1/0  # do something here
except Exception as e:
    logging.critical(e, exc_info=True)  # log exception info at CRITICAL log level
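
The same idea works with any level, because Logger.log also accepts exc_info. A minimal sketch, assuming a helper named log_exception and a logger named "demo" (both invented for this example), with output sent to an in-memory stream so it can be checked:

```python
import io
import logging

def log_exception(logger, level, message):
    """Log message at the given level with the active exception's traceback.

    Hypothetical helper; it is meant to be called from inside an except block.
    """
    logger.log(level, message, exc_info=True)

# Send records to an in-memory stream so the result can be inspected.
stream = io.StringIO()
demo_logger = logging.getLogger("demo")
demo_logger.setLevel(logging.DEBUG)
demo_logger.addHandler(logging.StreamHandler(stream))
demo_logger.propagate = False

try:
    1 / 0
except ZeroDivisionError:
    log_exception(demo_logger, logging.WARNING, "division failed")

output = stream.getvalue()
print(output)
```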

Answer 3

Quoting:

What if your application does logging some other way, not using the logging module?

In that case, the traceback module can be used:

import sys
import traceback

def log_traceback(ex, ex_traceback=None):
    # Python 3 exceptions carry their own traceback; Python 2 callers pass it in
    if ex_traceback is None:
        ex_traceback = ex.__traceback__
    tb_lines = [line.rstrip('\n') for line in
                traceback.format_exception(ex.__class__, ex, ex_traceback)]
    # exception_logger is the application's own (non-logging) logger
    exception_logger.log(tb_lines)

  • Use it in Python 2:

    try:
        x = get_number()  # your function call is here
    except Exception as ex:
        _, _, ex_traceback = sys.exc_info()
        log_traceback(ex, ex_traceback)

  • Use it in Python 3:

    try:
        x = get_number()
    except Exception as ex:
        log_traceback(ex)

Answer 4

If you use plain logs, all your log records should follow this rule: one record = one line. Following this rule, you can use grep and other tools to process your log files.

But traceback information is multi-line. So my answer is an extended version of the solution proposed by zangw above in this thread. The problem is that traceback lines can contain \n, so we need to do some extra work to get rid of these line endings:

import logging
import traceback


logger = logging.getLogger('your_logger_here')

def log_app_error(e: BaseException, level=logging.ERROR) -> None:
    e_traceback = traceback.format_exception(e.__class__, e, e.__traceback__)
    traceback_lines = []
    # each formatted chunk may span several lines; split them all apart
    for line in [line.rstrip('\n') for line in e_traceback]:
        traceback_lines.extend(line.splitlines())
    # log the list as a single-line string
    logger.log(level, str(traceback_lines))

Later, when analyzing your logs, you can copy and paste the required traceback lines from your log file and do this:

ex_traceback = ['line 1', 'line 2', ...]
for line in ex_traceback:
    print(line)

Profit!
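
The copy/paste step can also be sketched with ast.literal_eval, which turns the logged text back into a list. This assumes the record was written as str() of a list of strings, as above; the names format_one_line and restore_lines are hypothetical.

```python
import ast
import traceback

def format_one_line(e):
    """Return the traceback of e as a single-line string (str() of a list)."""
    lines = []
    for chunk in traceback.format_exception(type(e), e, e.__traceback__):
        lines.extend(chunk.rstrip('\n').splitlines())
    return str(lines)

def restore_lines(record):
    """Parse the single-line record back into the original traceback lines."""
    return ast.literal_eval(record)

try:
    1 / 0
except ZeroDivisionError as exc:
    record = format_one_line(exc)

# The record itself is one line; printing the restored list shows the traceback.
for line in restore_lines(record):
    print(line)
```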


Answer 5

This answer builds on the excellent ones above.

In most applications, you won't be calling logging.exception(e) directly. Most likely you have defined a custom logger specific to your application or module, like this:

import logging
import graypy

# Set the name of the app or module
my_logger = logging.getLogger('NEM Sequencer')
# Set the log level
my_logger.setLevel(logging.INFO)

# Let's say we want to be fancy and log to a graylog2 log server
graylog_handler = graypy.GELFHandler('some_server_ip', 12201)
graylog_handler.setLevel(logging.INFO)
my_logger.addHandler(graylog_handler)

In this case, just call exception(e) on your logger, like this:

try:
    1/0
except ZeroDivisionError as e:
    my_logger.exception(e)

Answer 6

You can log the stack trace without an exception.

https://docs.python.org/3/library/logging.html#logging.Logger.debug

The second optional keyword argument is stack_info, which defaults to False. If true, stack information is added to the logging message, including the actual logging call. Note that this is not the same stack information as that displayed through specifying exc_info: The former is stack frames from the bottom of the stack up to the logging call in the current thread, whereas the latter is information about stack frames which have been unwound, following an exception, while searching for exception handlers.

Example:

>>> import logging
>>> logging.basicConfig(level=logging.DEBUG)
>>> logging.getLogger().info('This prints the stack', stack_info=True)
INFO:root:This prints the stack
Stack (most recent call last):
  File "<stdin>", line 1, in <module>
>>>
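
For a slightly larger picture, here is a sketch that logs from a nested call, so the stack has more than one frame. The function names inner and outer and the logger name "stack_demo" are invented for the example; output goes to an in-memory stream so it can be inspected.

```python
import io
import logging

stream = io.StringIO()
logger = logging.getLogger("stack_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)
logger.addHandler(logging.StreamHandler(stream))
logger.propagate = False

def inner():
    # No exception involved: stack_info records how execution got here.
    logger.info("where am I?", stack_info=True)

def outer():
    inner()

outer()
output = stream.getvalue()
print(output)
```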

Answer 7

A little bit of decorator treatment (very loosely inspired by the Maybe monad and lifting). You can safely remove Python 3.6 type annotations and use an older message formatting style.

fallible.py

from functools import wraps
from typing import Callable, TypeVar, Optional
import logging


A = TypeVar('A')


def fallible(*exceptions, logger=None) \
        -> Callable[[Callable[..., A]], Callable[..., Optional[A]]]:
    """
    :param exceptions: a list of exceptions to catch
    :param logger: pass a custom logger; None means the default logger, 
                   False disables logging altogether.
    """
    def fwrap(f: Callable[..., A]) -> Callable[..., Optional[A]]:

        @wraps(f)
        def wrapped(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except exceptions:
                message = f'called {f} with *args={args} and **kwargs={kwargs}'
                if logger:
                    logger.exception(message)
                if logger is None:
                    logging.exception(message)
                return None

        return wrapped

    return fwrap

Demo:

In [1]: from fallible import fallible

In [2]: @fallible(ArithmeticError)
    ...: def div(a, b):
    ...:     return a / b
    ...: 
    ...: 

In [3]: div(1, 2)
Out[3]: 0.5

In [4]: res = div(1, 0)
ERROR:root:called <function div at 0x10d3c6ae8> with *args=(1, 0) and **kwargs={}
Traceback (most recent call last):
  File "/Users/user/fallible.py", line 17, in wrapped
    return f(*args, **kwargs)
  File "<ipython-input-17-e056bd886b5c>", line 3, in div
    return a / b

In [5]: repr(res)
Out[5]: 'None'

You can also modify this solution to return something a bit more meaningful than None from the except part (or even make the solution generic, by specifying this return value in fallible's arguments).
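
One way that generic variant might look is sketched below. The name fallible_with_default and its default parameter are assumptions made for this example, not part of the original decorator; type annotations are left out to keep it short.

```python
import logging
from functools import wraps

def fallible_with_default(*exceptions, default=None, logger=None):
    """Variant of fallible that returns `default` when a listed exception is caught."""
    def fwrap(f):
        @wraps(f)
        def wrapped(*args, **kwargs):
            try:
                return f(*args, **kwargs)
            except exceptions:
                message = 'called {} with *args={} and **kwargs={}'.format(f, args, kwargs)
                # None means the root logger; False disables logging altogether
                if logger:
                    logger.exception(message)
                if logger is None:
                    logging.exception(message)
                return default
        return wrapped
    return fwrap

@fallible_with_default(ArithmeticError, default=0.0, logger=False)
def div(a, b):
    return a / b

print(div(1, 2))
print(div(1, 0))
```

Here logger=False keeps the demo quiet; leaving it out would log the traceback through the root logger as in the original.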


Answer 8

In your logging module (if it is a custom module), just enable stack_info.

api_logger.exceptionLog("*Input your Custom error message*",stack_info=True)

Answer 9

If you can cope with the extra dependency, use twisted.log: you don't have to log errors explicitly, and it writes the entire traceback and the time to the file or stream.


Answer 10

A clean way to do it is to use format_exc() and then parse the output to get the relevant part:

from traceback import format_exc

try:
    1/0
except Exception:
    # the second-to-last element is the final "ExceptionType: message" line
    print('the relevant part is: ' + format_exc().split('\n')[-2])

Regards
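
If only that final line is wanted, the traceback module can also produce it directly, without parsing. A small sketch using traceback.format_exception_only (the variable name summary is just for illustration):

```python
import traceback

try:
    1 / 0
except Exception as e:
    # format_exception_only returns a list of strings, normally a single line
    summary = ''.join(traceback.format_exception_only(type(e), e)).strip()

print('the relevant part is: ' + summary)
```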


How do I delete the contents of a folder?

Question: How do I delete the contents of a folder?

How can I delete the contents of a local folder in Python?

The current project is for Windows, but I would like to see *nix also.


Answer 0

import os, shutil
folder = '/path/to/folder'
for filename in os.listdir(folder):
    file_path = os.path.join(folder, filename)
    try:
        if os.path.isfile(file_path) or os.path.islink(file_path):
            os.unlink(file_path)
        elif os.path.isdir(file_path):
            shutil.rmtree(file_path)
    except Exception as e:
        print('Failed to delete %s. Reason: %s' % (file_path, e))

Answer 1

You can simply do this:

import os
import glob

files = glob.glob('/YOUR/PATH/*')
for f in files:
    os.remove(f)

You can of course use another filter in your path, for example /YOUR/PATH/*.txt to remove all text files in a directory.


Answer 2

You can delete the folder itself, as well as all its contents, using shutil.rmtree:

import shutil
shutil.rmtree('/path/to/folder')
shutil.rmtree(path, ignore_errors=False, onerror=None)


Delete an entire directory tree; path must point to a directory (but not a symbolic link to a directory). If ignore_errors is true, errors resulting from failed removals will be ignored; if false or omitted, such errors are handled by calling a handler specified by onerror or, if that is omitted, they raise an exception.
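
A small sketch of that behaviour on a throwaway directory, to show that the folder itself disappears and what ignore_errors changes. The directory layout is invented for the demo and created under a temporary location.

```python
import os
import shutil
import tempfile

# Build a throwaway tree: <tmp>/folder/sub/file.txt
base = tempfile.mkdtemp()
folder = os.path.join(base, 'folder')
os.makedirs(os.path.join(folder, 'sub'))
with open(os.path.join(folder, 'sub', 'file.txt'), 'w') as fh:
    fh.write('data')

shutil.rmtree(folder)                  # removes the contents and the folder itself
folder_exists = os.path.exists(folder)
print(folder_exists)

# The folder is already gone, so this call would raise without ignore_errors=True.
shutil.rmtree(folder, ignore_errors=True)

shutil.rmtree(base)                    # clean up the temporary parent
```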


Answer 3

Expanding on mhawke's answer, this is what I've implemented. It removes all the content of a folder but not the folder itself. Tested on Linux with files, folders and symbolic links; it should work on Windows as well.

import os
import shutil

for root, dirs, files in os.walk('/path/to/folder'):
    for f in files:
        os.unlink(os.path.join(root, f))
    for d in dirs:
        shutil.rmtree(os.path.join(root, d))

Answer 4

Using rmtree and recreating the folder could work, but I have run into errors when deleting and immediately recreating folders on network drives.

The proposed solution using walk does not work as it uses rmtree to remove folders and then may attempt to use os.unlink on the files that were previously in those folders. This causes an error.

The posted glob solution will also attempt to delete non-empty folders, causing errors.

I suggest you use:

import os
import shutil

folder_path = '/path/to/folder'
for file_object in os.listdir(folder_path):
    file_object_path = os.path.join(folder_path, file_object)
    if os.path.isfile(file_object_path) or os.path.islink(file_object_path):
        os.unlink(file_object_path)
    else:
        shutil.rmtree(file_object_path)

Answer 5

This:

  • removes all symbolic links
    • dead links
    • links to directories
    • links to files
  • removes subdirectories
  • does not remove the parent directory

Code:

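import os
import shutil

# dirpath is the folder whose contents should be removed
for filename in os.listdir(dirpath):
    filepath = os.path.join(dirpath, filename)
    try:
        shutil.rmtree(filepath)
    except OSError:
        # not a real directory (a file or a symlink), so remove it directly
        os.remove(filepath)

Like many other answers, this does not try to adjust permissions to enable removal of files/directories.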


Answer 6

As a oneliner:

import os

# Python 2.7
map( os.unlink, (os.path.join( mydir,f) for f in os.listdir(mydir)) )

# Python 3+
list( map( os.unlink, (os.path.join( mydir,f) for f in os.listdir(mydir)) ) )

A more robust solution that accounts for both files and directories would be (written for 2.7; on Python 3, wrap the map call in list() as above):

def rm(f):
    if os.path.isdir(f): return os.rmdir(f)
    if os.path.isfile(f): return os.unlink(f)
    raise TypeError('must be either file or directory')

map( rm, (os.path.join( mydir,f) for f in os.listdir(mydir)) )

Answer 7

Notes: in case someone downvoted my answer, here is some explanation.

  1. Everyone likes short ‘n’ simple answers. However, sometimes the reality is not so simple.
  2. Back to my answer. I know shutil.rmtree() could be used to delete a directory tree. I’ve used it many times in my own projects. But you must realize that the directory itself will also be deleted by shutil.rmtree(). While this might be acceptable for some, it’s not a valid answer for deleting the contents of a folder (without side effects).
  3. I’ll show you an example of the side effects. Suppose that you have a directory with customized owner and mode bits, where there are a lot of contents. Then you delete it with shutil.rmtree() and rebuild it with os.mkdir(). And you’ll get an empty directory with default (inherited) owner and mode bits instead. While you might have the privilege to delete the contents and even the directory, you might not be able to set back the original owner and mode bits on the directory (e.g. you’re not a superuser).
  4. Finally, be patient and read the code. It’s long and ugly (in sight), but proven to be reliable and efficient (in use).

Here’s a long and ugly, but reliable and efficient solution.

It resolves a few problems which are not addressed by the other answerers:

  • It correctly handles symbolic links, including not calling shutil.rmtree() on a symbolic link (which will pass the os.path.isdir() test if it links to a directory; even the result of os.walk() contains symbolic linked directories as well).
  • It handles read-only files nicely.
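
The symlink point can be checked with a short sketch: os.path.isdir() follows the link, while os.lstat() looks at the link itself. It assumes a platform where os.symlink works without extra privileges (typically *nix); the directory names are invented for the demo.

```python
import os
import stat
import tempfile

base = tempfile.mkdtemp()
real_dir = os.path.join(base, 'real')
link = os.path.join(base, 'link')
os.mkdir(real_dir)
os.symlink(real_dir, link)  # may need extra privileges on Windows

follows_link = os.path.isdir(link)                   # follows the link to its target
is_real_dir = stat.S_ISDIR(os.lstat(link).st_mode)   # inspects the link itself
print(follows_link, is_real_dir)

os.remove(link)                                      # removes only the link
target_survives = os.path.isdir(real_dir)

# clean up the temporary directories
os.rmdir(real_dir)
os.rmdir(base)
```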

Here’s the code (the only useful function is clear_dir()):

import os
import stat
import shutil


# http://stackoverflow.com/questions/1889597/deleting-directory-in-python
def _remove_readonly(fn, path_, excinfo):
    # Handle read-only files and directories
    if fn is os.rmdir:
        os.chmod(path_, stat.S_IWRITE)
        os.rmdir(path_)
    elif fn is os.remove:
        # note: os.lchmod is not available on every platform (e.g. not on Linux)
        os.lchmod(path_, stat.S_IWRITE)
        os.remove(path_)


def force_remove_file_or_symlink(path_):
    try:
        os.remove(path_)
    except OSError:
        os.lchmod(path_, stat.S_IWRITE)
        os.remove(path_)


# Code from shutil.rmtree()
def is_regular_dir(path_):
    try:
        mode = os.lstat(path_).st_mode
    except os.error:
        mode = 0
    return stat.S_ISDIR(mode)


def clear_dir(path_):
    if is_regular_dir(path_):
        # Given path is a directory, clear its content
        for name in os.listdir(path_):
            fullpath = os.path.join(path_, name)
            if is_regular_dir(fullpath):
                shutil.rmtree(fullpath, onerror=_remove_readonly)
            else:
                force_remove_file_or_symlink(fullpath)
    else:
        # Given path is a file or a symlink.
        # Raise an exception here to avoid accidentally clearing the content
        # of a symbolic linked directory.
        raise OSError("Cannot call clear_dir() on a symbolic link")

Answer 8

I’m surprised nobody has mentioned the awesome pathlib to do this job.

If you only want to remove files in a directory it can be a oneliner

from pathlib import Path

[f.unlink() for f in Path("/path/to/folder").glob("*") if f.is_file()] 

To also recursively remove directories you can write something like this:

from pathlib import Path
from shutil import rmtree

for path in Path("/path/to/folder").glob("**/*"):
    if path.is_file():
        path.unlink()
    elif path.is_dir():
        rmtree(path)
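
Since glob("**/*") may go on to visit paths inside a directory that rmtree has already removed, one possible variant walks only the top level and lets rmtree handle everything below it. This is a sketch, not the original author's code; clear_folder is a hypothetical name, and the self-check runs on a temporary directory.

```python
import tempfile
from pathlib import Path
from shutil import rmtree

def clear_folder(folder):
    """Remove everything inside folder but keep folder itself (hypothetical helper)."""
    for path in Path(folder).iterdir():
        if path.is_dir() and not path.is_symlink():
            rmtree(path)    # real subdirectory: remove it with its contents
        else:
            path.unlink()   # file or symlink: remove just this entry

# Small self-check on a temporary directory.
root = Path(tempfile.mkdtemp())
(root / 'sub' / 'deeper').mkdir(parents=True)
(root / 'a.txt').write_text('x')
(root / 'sub' / 'deeper' / 'b.txt').write_text('y')

clear_folder(root)
remaining = list(root.iterdir())
print(remaining)
root.rmdir()  # the now-empty folder can be removed by the caller if desired
```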

Answer 9

import os
import shutil

# Gather directory contents
contents = [os.path.join(target_dir, i) for i in os.listdir(target_dir)]

# Iterate and remove each item in the appropriate manner
[os.remove(i) if os.path.isfile(i) or os.path.islink(i) else shutil.rmtree(i) for i in contents]

An earlier comment also mentions using os.scandir in Python 3.5+. For example:

import os
import shutil

with os.scandir(target_dir) as entries:
    for entry in entries:
        if entry.is_file() or entry.is_symlink():
            os.remove(entry.path)
        elif entry.is_dir():
            shutil.rmtree(entry.path)

Answer 10

You might be better off using os.walk() for this.

os.listdir() doesn’t distinguish files from directories and you will quickly get into trouble trying to unlink these. There is a good example of using os.walk() to recursively remove a directory here, and hints on how to adapt it to your circumstances.


Answer 11

I used to solve the problem this way:

import shutil
import os

shutil.rmtree(dirpath)
os.mkdir(dirpath)

Answer 12

Yet Another Solution:

import sh
sh.rm(sh.glob('/path/to/folder/*'))

Answer 13

I know it's an old thread, but I found something interesting on the official Python site, so I'm sharing another idea for removing all the contents of a directory. I had some permission problems when using shutil.rmtree(), and I didn't want to remove the directory and recreate it. The original address is http://docs.python.org/2/library/os.html#os.walk. Hope that helps someone.

import os

def emptydir(top):
    if(top == '/' or top == "\\"): return
    else:
        for root, dirs, files in os.walk(top, topdown=False):
            for name in files:
                os.remove(os.path.join(root, name))
            for name in dirs:
                os.rmdir(os.path.join(root, name))

Answer 14

To delete all the files inside the directory as well as its sub-directories, without removing the folders themselves, simply do this:

import os
mypath = "my_folder" #Enter your path here
for root, dirs, files in os.walk(mypath):
    for file in files:
        os.remove(os.path.join(root, file))

Answer 15

If you are using a *nix system, why not leverage the system command?

import os
path = 'folder/to/clean'
os.system('rm -rf %s/*' % path)
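
If shelling out is acceptable, one hedged alternative is to expand the wildcard in Python and pass the paths to rm as a list, so the folder name is never interpreted by a shell. This sketch assumes a *nix system with rm on the PATH and uses a temporary directory in place of a real path; as with the shell's *, names starting with a dot are not matched.

```python
import glob
import os
import subprocess
import tempfile

# Temporary stand-in for 'folder/to/clean', with a file and a subdirectory in it.
path = tempfile.mkdtemp()
open(os.path.join(path, 'a b.txt'), 'w').close()
os.mkdir(os.path.join(path, 'sub'))

# glob.escape protects special characters in the folder name itself.
targets = glob.glob(os.path.join(glob.escape(path), '*'))
if targets:
    # '--' tells rm that everything after it is a path, not an option.
    subprocess.run(['rm', '-rf', '--'] + targets, check=True)

left = os.listdir(path)
print(left)
os.rmdir(path)
```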

Answer 16

Pretty intuitive way of doing it:

import shutil, os


def remove_folder_contents(path):
    shutil.rmtree(path)
    os.makedirs(path)


remove_folder_contents('/path/to/folder')

Answer 17

Well, I think this code works. It will not delete the folder itself, and you can use it to delete only files with a particular extension.

import os
import glob

files = glob.glob(r'path/*')
for items in files:
    os.remove(items)

Answer 18

I had to remove files from 3 separate folders inside a single parent directory:

directory
   folderA
      file1
   folderB
      file2
   folderC
      file3

This simple code did the trick for me: (I’m on Unix)

import os
import glob

folders = glob.glob('./path/to/parentdir/*')
for fo in folders:
  file = glob.glob(f'{fo}/*')
  for f in file:
    os.remove(f)

Hope this helps.


Answer 19

I resolved the issue with rmtree followed by makedirs by adding time.sleep() between them:

import os
import shutil
import time

if os.path.isdir(folder_location):
    shutil.rmtree(folder_location)

time.sleep(.5)

os.makedirs(folder_location, 0o777)

Answer 20

Answer for a limited, specific situation: assuming you want to delete the files while maintaining the subfolder tree, you could use a recursive algorithm:

import os

def recursively_remove_files(f):
    if os.path.isfile(f):
        os.unlink(f)
    elif os.path.isdir(f):
        for fi in os.listdir(f):
            recursively_remove_files(os.path.join(f, fi))

recursively_remove_files(my_directory)

Maybe slightly off-topic, but I think many would find it useful


Answer 21

Assuming temp_dir is the directory to be emptied, a single-line command using os would be:

_ = [os.remove(os.path.join(temp_dir, i)) for i in os.listdir(temp_dir)]

Note: this one-liner only deletes files; it doesn't delete directories.

Hope this helps. Thanks.


回答 22

使用下面的方法删除目录的内容,而不是目录本身:

import os
import shutil

def remove_contents(path):
    for c in os.listdir(path):
        full_path = os.path.join(path, c)
        if os.path.isfile(full_path):
            os.remove(full_path)
        else:
            shutil.rmtree(full_path)

Use the method below to remove the contents of a directory, not the directory itself:

import os
import shutil

def remove_contents(path):
    for c in os.listdir(path):
        full_path = os.path.join(path, c)
        if os.path.isfile(full_path):
            os.remove(full_path)
        else:
            shutil.rmtree(full_path)
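
One edge case worth noting: os.path.isfile returns False for a symlink that points to a directory, and shutil.rmtree refuses to operate on a symlink. The sketch below is a hedged variant that unlinks symlinks instead of following them. It assumes a POSIX system, and the name remove_contents_safe is hypothetical.

```python
import os
import shutil


def remove_contents_safe(path):
    """Empty a directory; symlinks are unlinked rather than followed."""
    # Take a snapshot first so entries are not deleted mid-iteration.
    with os.scandir(path) as it:
        entries = list(it)

    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            shutil.rmtree(entry.path)
        else:
            # Regular files and symlinks (even links to directories).
            os.remove(entry.path)
```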

回答 23

删除文件夹中的所有文件/删除所有文件的最简单方法

import os
files = os.listdir(yourFilePath)
for f in files:
    os.remove(os.path.join(yourFilePath, f))

The easiest way to delete all files in a folder:

import os
files = os.listdir(yourFilePath)
for f in files:
    os.remove(os.path.join(yourFilePath, f))

回答 24

仅使用OS模块列出然后删除,就可以达到目的。

import os
DIR = os.listdir('Folder')
for name in DIR:
    os.remove(os.path.join('Folder', name))

为我工作,任何问题都让我知道!

This should do the trick just using the OS module to list and then remove!

import os
DIR = os.listdir('Folder')
for name in DIR:
    os.remove(os.path.join('Folder', name))

Worked for me, any problems let me know!


升级pip后出错:无法导入名称“ main”

问题:升级pip后出错:无法导入名称“ main”

每当我尝试使用pip安装任何软件包时,都会收到此导入错误:

guru@guru-notebook:~$ pip3 install numpy
Traceback (most recent call last):
  File "/usr/bin/pip3", line 9, in <module>
    from pip import main
ImportError: cannot import name 'main'


guru@guru-notebook:~$ cat `which pip3`
#!/usr/bin/python3
# GENERATED BY DEBIAN

import sys

# Run the main entry point, similarly to how setuptools does it, but because
# we didn't install the actual entry point from setup.py, don't use the
# pkg_resources API.
from pip import main
if __name__ == '__main__':
    sys.exit(main())

之前它运行良好,我不确定为什么会引发此错误。我已经搜索了此错误,但找不到任何可解决的错误。

如果您需要更多详细信息,请告诉我,我将更新我的问题。

Whenever I am trying to install any package using pip, I am getting this import error:

guru@guru-notebook:~$ pip3 install numpy
Traceback (most recent call last):
  File "/usr/bin/pip3", line 9, in <module>
    from pip import main
ImportError: cannot import name 'main'


guru@guru-notebook:~$ cat `which pip3`
#!/usr/bin/python3
# GENERATED BY DEBIAN

import sys

# Run the main entry point, similarly to how setuptools does it, but because
# we didn't install the actual entry point from setup.py, don't use the
# pkg_resources API.
from pip import main
if __name__ == '__main__':
    sys.exit(main())

It was working fine earlier, I am not sure why it is throwing this error. I have searched about this error, but can’t find anything to fix it.

Please let me know if you need any further detail, I will update my question.


回答 0

您必须不经意间升级了系统pip(可能通过sudo pip install pip --upgrade

pip 10.x调整其内部位置。pip3您看到的命令是您的软件包维护者提供的(这里大概是基于debian的?),而不是pip管理的文件。

您可以在pip的问题跟踪器上了解有关此内容的更多信息

你可能会想升级系统PIP和改为使用的virtualenv。

要恢复pip3二进制文件,您需要sudo python3 -m pip uninstall pip && sudo apt install python3-pip --reinstall

如果您想继续在“不受支持的地区”(在系统软件包管理器之外升级系统软件包),则可以选择使用python3 -m pip ...代替pip3

You must have inadvertently upgraded your system pip (probably through something like sudo pip install pip --upgrade)

pip 10.x adjusts where its internals are situated. The pip3 command you’re seeing is one provided by your package maintainer (presumably debian based here?) and is not a file managed by pip.

You can read more about this on pip’s issue tracker

You’ll probably want to not upgrade your system pip and instead use a virtualenv.

To recover the pip3 binary you’ll need to sudo python3 -m pip uninstall pip && sudo apt install python3-pip --reinstall.

If you want to continue in “unsupported territory” (upgrading a system package outside of the system package manager), you can probably get away with python3 -m pip ... instead of pip3.


回答 1

我们可以通过修改pip文件来清除错误。

检查文件的位置:

$ which pip

路径-> / usr / bin / pip

转到该位置(/ usr / bin / pip)并打开终端

输入: $ sudo nano pip

您可以看到:

import sys
from pip import main
if __name__ == '__main__':
     sys.exit(main())

改成:

import sys
from pip import __main__
if __name__ == '__main__':
     sys.exit(__main__._main())

然后按Ctrl + o写入更改并退出

希望能做到!

We can clear the error by modifying the pip file.

Check the location of the file:

$ which pip

path -> /usr/bin/pip

Go to that location(/usr/bin/pip) and open terminal

Enter: $ sudo nano pip

You can see:

import sys
from pip import main
if __name__ == '__main__':
     sys.exit(main())

Change to:

import sys
from pip import __main__
if __name__ == '__main__':
     sys.exit(__main__._main())

Then press Ctrl+O to write the changes, and exit.

Hope this will do!!


回答 2

对于Ubuntu系列,Debian和Linux Mint用户

多亏了Anthony的上述说明,您可以保留原始系统pip(位于/ usr / bin /和dist-packages /中)并删除手动安装的pip(位于〜/ .local /中)以解决冲突:

$ python3 -m pip uninstall pip

来自python3-pipdebian软件包的Ubuntu / Debian pip v8.1.1(16.04)(请参阅参考资料$ pip3 -V)显示的搜索结果与最新的pip v10.0.1相同,并且可以从PyPI安装最新的模块。它具有有效的pip命令(已在$ PATH中),以及--user自2016年以来默认修补的nice 选项。查看pip发行说明,较新的版本主要是针对用例特定的错误修复和某些新功能,因此不是每个人都必须赶紧升级点子。无论如何,新的pip 10可以部署到Python virtualenvs。

但是,无论使用哪种pip,您的操作系统都可以通过APT快速安装常见的Python模块(包括numpy),而无需使用pip,例如:(
$ sudo apt install python3-numpy python3-scipy具有系统依赖性)
$ sudo apt install python3-pip(Debian修补的pip,稍旧,但是没关系)

快速apt语法提醒(请参阅man apt有关详细信息):(
$ sudo apt update以从最新源重新同步Ubuntu软件包索引文件)
$ apt search <python-package-name> (对所有可用软件包进行全文搜索)
$ apt show <python-package-name>(显示详细的软件包说明)
$ sudo apt install <python-package-name>

前缀python-为的软件包名称适用于Python 2;并带有前缀python3-用于Python 3(例如python3-pandas)。有成千上万个,它们在Debian和Ubuntu中进行集成测试。除非您寻求在每个用户级别(pip install --user选件)或在virtualenv / venv中安装,否则可能会需要apt。这些系统程序包也可以从虚拟环境访问,因为如果您的环境没有给定模块的副本,则virtualenv将在使用时优雅地转而使用系统库。您自定义安装的(带有pip --user)每用户模块~/.local/lib也会覆盖它们。

请注意,由于这是系统范围的安装,因此您几乎不需要删除它们(需要注意OS依赖性)。这对于具有许多系统依赖性的软件包(例如,使用scipy或matplotlib)很方便,因为APT会跟踪并提供所有必需的系统库和C扩展名,而使用pip则无法保证

实际上,对于系统范围的Python软件包(与按用户,主目录级别或更低级别的软件包相反),Ubuntu 希望使用APT软件包管理器(而不是sudo pip)来避免破坏OS:sudo pip3/usr/lib/python3/dist-packagesAPT存储OS的同一目录为目标敏感模块。Debian / Ubuntu的最新发行版在很大程度上依赖于Python 3,因此其预装模块由apt且不应该更改。

因此,如果您使用pip3 install命令,请确保它在隔离的虚拟开发环境中运行,例如virtualenvsudo apt install python3-virtualenv)或Python3内置(-m venv)或在每个用户级别运行(--userpip选项,在Ubuntu提供的默认选项中自2016年以来一直是pip),但不是系统范围的(从来没有sudo pip3!),因为pip会干扰 APT软件包管理器的操作,并且在意外更改系统使用的python模块时可能会影响Ubuntu OS 组件。祝好运!


P. S. 以上都是针对“理想”解决方案的(Debian / Ubuntu方式)。

如果您仍然想独占使用新的pip3 v10,则有3种快速解决方法:

  • 只需打开一个新的bash会话(一个新的终端选项卡,或键入bash)-pip3 v10可用(请参阅参考资料pip3 -V)。debian的pip3 v8仍然安装但已损坏;要么
  • $ hash -d pip3 && pip3 -V 用于刷新$ PATH中的pip3路径名的命令。debian的pip3 v8仍然安装但已损坏;要么
  • 该命令$ sudo apt remove python3-pip && hash -d pip3用于完全卸载debian的pip3 v8,以支持新的pip3 v10。

注意:--user除非您处于virtualenv中,否则您将始终需要将标记添加到任何非debian提供的pip中!(~/.local/自2016年起,它将python软件包部署到,默认为debian / ubuntu提供的python3-pip和python-pip)。Ubuntu / Debian并不真正支持您在virtualenv之外使用系统范围内的pip 10。永不sudo pip3

更多详细信息:
https://github.com/pypa/pip/issues/5221#issuecomment-382069604
https://github.com/pypa/pip/issues/5240#issuecomment-381673100

For Ubuntu family, Debian, Linux Mint users

Thanks to Anthony’s explanation above, you can retain your original system pip (in /usr/bin/ and dist-packages/) and remove the manually-installed pip (in ~/.local/) to resolve the conflict:

$ python3 -m pip uninstall pip

Ubuntu/Debian pip v8.1.1 (16.04) from the python3-pip Debian package (see $ pip3 -V) shows the same search results as the latest pip v10.0.1, and installs the latest modules from PyPI just fine. It has a working pip command (already in the $PATH), plus the nice --user option patched in by default since 2016. Looking at pip release notes, the newer versions are mostly about use-case specific bug fixes and certain new features, so not everyone has to rush upgrading pip just yet. And the new pip 10 can be deployed to Python virtualenvs, anyway.

But regardless of which pip you use, your OS lets you quickly install common Python modules (including numpy) with APT, without the need for pip, for example:
$ sudo apt install python3-numpy python3-scipy (with system dependencies)
$ sudo apt install python3-pip (Debian-patched pip, slightly older but it doesn’t matter)

Quick apt syntax reminder (please see man apt for details):
$ sudo apt update (to resync Ubuntu package index files from up-to-date sources)
$ apt search <python-package-name> (full text-search on all available packages)
$ apt show <python-package-name> (displays the detailed package description)
$ sudo apt install <python-package-name>

Package names prefixed with python- are for Python 2; and prefixed with python3- are for Python 3 (e.g. python3-pandas). There are thousands, and they undergo integration testing within Debian and Ubuntu. Unless you seek to install at per-user level (pip install --user option) or within virtualenv/venv, apt could be what you needed. These system packages are accessible from virtual envs too, as virtualenv will gracefully fall back to using system libs on import if your envs don’t have given copies of modules. Your custom-installed (with pip --user) per-user modules in ~/.local/lib will override them too.

Note, since this is a system-wide installation, you’d rarely need to remove them (need to be mindful about OS dependencies). This is convenient for packages with many system dependencies (such as with scipy or matplotlib), as APT will keep track and provide all required system libs and C extensions, while with pip you have no such guarantees.

In fact, for system-wide Python packages (in contrast to per-user, home dir level, or lower), Ubuntu expects using the APT package manager (rather than sudo pip) to avoid breaking OS: sudo pip3 targets the very same /usr/lib/python3/dist-packages directory where APT stores OS-sensitive modules. Recent Debian/Ubuntu releases depend heavily on Python 3, so its pre-installed modules are managed by apt and shouldn’t be changed.

So if you use pip3 install command, please ensure that it runs in an isolated virtual dev environment, such as with virtualenv (sudo apt install python3-virtualenv), or with Python3 built-in (-m venv), or at a per-user level (--user pip option, default in Ubuntu-provided pip since 2016), but not system-wide (never sudo pip3!), because pip interferes with the operation of the APT package manager and may affect Ubuntu OS components when a system-used python module is unexpectedly changed. Good luck!


P. S. All the above is for the ‘ideal’ solution (Debian/Ubuntu way).

If you still want to use the new pip3 v10 exclusively, there are 3 quick workarounds:

  • simply open a new bash session (a new terminal tab, or type bash) – and pip3 v10 becomes available (see pip3 -V). debian’s pip3 v8 remains installed but is broken; or
  • the command $ hash -d pip3 && pip3 -V to refresh pip3 pathname in the $PATH. debian’s pip3 v8 remains installed but is broken; or
  • the command $ sudo apt remove python3-pip && hash -d pip3 to uninstall debian’s pip3 v8 completely, in favor of your new pip3 v10.

Note: You will always need to add --user flag to any non-debian-provided pip, unless you are in a virtualenv! (it deploys python packages to ~/.local/, default in debian/ubuntu-provided python3-pip and python-pip since 2016). Your use of pip 10 system-wide, outside of virtualenv, is not really supported by Ubuntu/Debian. Never sudo pip3!

Further details:
https://github.com/pypa/pip/issues/5221#issuecomment-382069604
https://github.com/pypa/pip/issues/5240#issuecomment-381673100


回答 3

仅一步解决。

我也曾经遇到过这个问题,但是可以简单地通过1条命令解决它,而不会打扰和浪费时间,而且我已经在多个系统上进行了尝试,这是解决此问题的最干净的方法。那就是:

对于python3:- sudo python3 -m pip uninstall pip && sudo apt install python3-pip --reinstall

这样,您可以简单地使用安装软件包pip3。检查使用pip3 --version

对于旧版本,请使用:sudo python -m pip uninstall pip && sudo apt install python-pip --reinstall

这样,您现在可以使用来简单地安装软件包pip。检查使用pip --version

Resolved in one step only.

I faced this issue too, but it can be resolved with a single command. I have tried it on multiple systems, and it’s the cleanest solution for this issue:

For python3: sudo python3 -m pip uninstall pip && sudo apt install python3-pip --reinstall

After this, you can simply install packages using pip3. To check, use pip3 --version.

For older versions, use: sudo python -m pip uninstall pip && sudo apt install python-pip --reinstall

After this, you can simply install packages using pip. To check, use pip --version.


回答 4

使用python -m pip install代替pip install

例:

python -m pip install --user somepackage
python3 -m pip install --user somepackage

pip(相应地pip3)执行是由你的发行版提供的(python-pip在Ubuntu 16.04封装)和位于/usr/bin/pip

因此,pip在升级pip时,它不会与软件包本身保持最新状态,并且可能会损坏。

如果您只是python -m pip直接使用,例如:

python -m pip install --user somepackage
python3 -m pip install --user somepackage

它会通过您的Python路径,找到最新版本的pip并执行该文件。

它依赖于这样的事实,即文件可以通过来执行import,但这是一种非常标准的接口类型,因此比骇客的Debian脚本更不可能被破坏。

然后,我建议将以下别名添加到您的.bashrc

pip() ( python -m pip "$@" )
pip3() ( python3 -m pip "$@" )

Ubuntu 18.04 /usr/bin/pip3文件执行以下操作:

from pip import main

大概mainpip在某个时候被破坏了。

中断的pip提交似乎是:95bcf8c5f6394298035a7332c441868f3b0169f4“将所有内部API移至pip._internal”已进入pip 18.0。

pip39.0.1升级到18.0 后,在Ubuntu 16.04中进行了测试。

pyenv

但是,最终,对于认真的Python开发,我只建议您使用pyenv + virtualenv安装自己的本地Python,这也可以解决以下Ubuntu错误:https ://askubuntu.com/questions/682869/how-do-i- 安装一个不同的python-version-using-apt-get / 1195153#1195153

Use python -m pip install instead of pip install

Example:

python -m pip install --user somepackage
python3 -m pip install --user somepackage

The pip (resp. pip3) executable is provided by your distro (python-pip package on Ubuntu 16.04) and located at /usr/bin/pip.

Therefore, it is not kept up-to date with the pip package itself as you upgrade pip, and may break.

If you just use python -m pip directly, e.g. as in:

python -m pip install --user somepackage
python3 -m pip install --user somepackage

it goes through your Python path, finds the latest version of pip and executes that file.

It relies on the fact that file is executable through import, but that is a very standard type of interface, and therefore less likely to break than the hackier Debian script.

Then I recommend adding the following shell functions (which act like aliases) to your .bashrc:

pip() ( python -m pip "$@" )
pip3() ( python3 -m pip "$@" )

The Ubuntu 18.04 /usr/bin/pip3 file does:

from pip import main

and presumably main was removed from pip at some point which is what broke things.

The breaking pip commit appears to be: 95bcf8c5f6394298035a7332c441868f3b0169f4 “Move all internal APIs to pip._internal” which went into pip 18.0.

Tested in Ubuntu 16.04 after an update from pip3 9.0.1 to 18.0.

pyenv

Ultimately however, for serious Python development I would just recommend that you install your own local Python with pyenv + virtualenv, which would also get around this Ubuntu bug: https://askubuntu.com/questions/682869/how-do-i-install-a-different-python-version-using-apt-get/1195153#1195153
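
The same python -m pip idea applies when pip has to be driven from a Python script: going through the running interpreter avoids the distro-provided /usr/bin/pip wrapper entirely. The sketch below is illustrative only; the helper name pip_command is an assumption.

```python
import subprocess
import sys


def pip_command(*args):
    """Build an argv list that runs pip under the current interpreter."""
    return [sys.executable, "-m", "pip", *args]


if __name__ == "__main__":
    # Equivalent to typing: python -m pip --version
    subprocess.run(pip_command("--version"), check=False)
```

Using sys.executable means the pip that runs belongs to the interpreter executing the script, whichever pip script happens to be first on the PATH.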


回答 5

您可以通过重新安装pip解决此问题。

使用以下命令行命令之一重新安装pip:

Python2:

python -m pip uninstall pip && sudo apt install python-pip --reinstall

Python3:

 python3 -m pip uninstall pip && sudo apt install python3-pip --reinstall

You can resolve this issue by reinstalling pip.

Use one of the following command line commands to reinstall pip:

Python2:

python -m pip uninstall pip && sudo apt install python-pip --reinstall

Python3:

 python3 -m pip uninstall pip && sudo apt install python3-pip --reinstall

回答 6

检查pip是否已缓存在另一路径上,为此,请调用$ which pip并检查该路径是否与错误中提示的路径不同(如果是这种情况):

$ hash -r

清除缓存后,pip将再次起作用。参考:http : //cheng.logdown.com/posts/2015/06/14/-usr-bin-pip-no-such-file-or-directory

Check if pip has been cached on another path, to do so, call $ which pip and check that the path is different from the one prompted in the error, if that’s the case run:

$ hash -r

When the cache is clear, pip will be working again. reference: http://cheng.logdown.com/posts/2015/06/14/-usr-bin-pip-no-such-file-or-directory
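
To compare the two locations without involving the shell's remembered paths, a small Python check can help. This is a rough sketch and the function name locate_pip is hypothetical: shutil.which performs a fresh PATH lookup, and importlib.util.find_spec reports which pip package the current interpreter would import.

```python
import importlib.util
import shutil


def locate_pip():
    """Report where the pip script and the pip package would be found."""
    spec = importlib.util.find_spec("pip")
    return {
        # Fresh PATH lookup; unaffected by bash's remembered command locations.
        "script": shutil.which("pip"),
        # Location of the package this interpreter would import, if any.
        "package": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    print(locate_pip())
```

If the script path printed here differs from the one in the traceback, the shell is probably still using a cached location.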


回答 7

我在有sudo apt但没有sudo pip的系统上运行。(并且没有su访问权限。)我按照pip的建议进入了同样的情况:

您正在使用pip版本8.1.1,但是18.0可用。您应该考虑通过“ pip install –upgrade pip”命令进行升级。

没有其他修复程序对我有用,因为我没有足够的管理员权限。但是,通过阅读以下内容,我有些不安:

  • 我不应该这样做 当然,点子告诉我。撒谎了
  • 使用–user通过专注于仅用户目录解决了许多问题。

因此,我发现此命令行可以将我恢复到原来的状态。如果您使用的版本与8.1.1不同,则显然需要更改该行的该部分。

python -m pip install --force-reinstall pip==8.1.1 --user

那是唯一对我有用的东西,但是效果很好!

I’m running on a system where I have sudo apt but no sudo pip. (And no su access.) I got myself into this same situation by following the advice from pip:

You are using pip version 8.1.1, however 18.0 is available. You should consider upgrading via the ‘pip install --upgrade pip’ command.

None of the other fixes worked for me, because I don’t have enough admin privileges. However, a few things stuck with me from reading up on this:

  • I shouldn’t have done this. Sure, pip told me to. It lied.
  • Using --user solves a lot of issues by focusing on the user-only directory.

So, I found this command line to work to revert me back to where I was. If you were using a different version than 8.1.1, you will obviously want to change that part of the line.

python -m pip install --force-reinstall pip==8.1.1 --user

That’s the only thing that worked for me, but it worked perfectly!


回答 8

使用python3 -m pip install --user pip==9.0.1(或可用的版本)进行恢复

Recover with python3 -m pip install --user pip==9.0.1 (or the version that worked)


回答 9

使用新的LXC(strech)在Pixelbook上发生了同样的事情。此解决方案与公认的解决方案非常相似,只是有一个细微的区别,即我固定了pip3。

sudo python3 -m pip install --upgrade pip

颠覆了版本,现在可以正常工作了。

我在这里找到了它。Python.org:确保pip是最新的

The same thing happened to me on a Pixelbook using the new LXC (stretch). This solution is very similar to the accepted one, with one subtle difference, which fixed pip3 for me.

sudo python3 -m pip install --upgrade pip

That bumped the version, and now it works as expected.

I found it here … Python.org: Ensure pip is up-to-date


回答 10

我在Ubuntu 16.04系统上遇到了相同的问题。我设法通过使用以下命令重新安装pip来修复它:

curl https://bootstrap.pypa.io/get-pip.py | sudo python3

I met the same problem on my Ubuntu 16.04 system. I managed to fix it by re-installing pip with the following command:

curl https://bootstrap.pypa.io/get-pip.py | sudo python3


回答 11

上面的命令对我不起作用,但这些命令非常有用:

sudo apt purge python3-pip
sudo rm -rf '/usr/lib/python3/dist-packages/pip'  
sudo apt install python3-pip
cd
cd .local/lib/python3/site-packages
sudo rm -rf pip*  
cd
cd .local/lib/python3.5/site-packages
sudo rm -rf pip*  
sudo pip3 install jupyter

The commands above didn’t work for me, but these were very helpful:

sudo apt purge python3-pip
sudo rm -rf '/usr/lib/python3/dist-packages/pip'  
sudo apt install python3-pip
cd
cd .local/lib/python3/site-packages
sudo rm -rf pip*  
cd
cd .local/lib/python3.5/site-packages
sudo rm -rf pip*  
sudo pip3 install jupyter

回答 12

在ubuntu 18.04.1 Bionic Beaver中,您需要注销并重新登录(无需重新启动)以获得正确的环境。

$ sudo apt install python-pip

$ pip --version
pip 9.0.1 from /usr/lib/python2.7/dist-packages (python 2.7)

$ pip install --upgrade pip

$ pip --version
Traceback (most recent call last):
  File "/usr/bin/pip", line 9, in <module>
    from pip import main
ImportError: cannot import name main

$ exit
<login>

$ pip --version
pip 18.1 from /home/test/.local/lib/python2.7/site-packages/pip (python 2.7)

In ubuntu 18.04.1 Bionic Beaver, you need to log out and log back in (restart not necessary) to get the proper environment.

$ sudo apt install python-pip

$ pip --version
pip 9.0.1 from /usr/lib/python2.7/dist-packages (python 2.7)

$ pip install --upgrade pip

$ pip --version
Traceback (most recent call last):
  File "/usr/bin/pip", line 9, in <module>
    from pip import main
ImportError: cannot import name main

$ exit
<login>

$ pip --version
pip 18.1 from /home/test/.local/lib/python2.7/site-packages/pip (python 2.7)

回答 13

我用 sudo apt remove python3-pip 然后pip工作。

 ~ sudo pip install pip --upgrade
[sudo] password for sen: 
Traceback (most recent call last):
  File "/usr/bin/pip", line 9, in <module>
    from pip import main
ImportError: cannot import name 'main'
  ~ sudo apt remove python3-pip   
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libexpat1-dev libpython3-dev libpython3.5-dev python-pip-whl python3-dev python3-wheel
  python3.5-dev
Use 'sudo apt autoremove' to remove them.
The following packages will be REMOVED:
  python3-pip
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
After this operation, 569 kB disk space will be freed.
Do you want to continue? [Y/n] y
(Reading database ... 215769 files and directories currently installed.)
Removing python3-pip (8.1.1-2ubuntu0.4) ...
Processing triggers for man-db (2.7.5-1) ...
  ~ pip

Usage:   
  pip <command> [options]

I use sudo apt remove python3-pip then pip works.

 ~ sudo pip install pip --upgrade
[sudo] password for sen: 
Traceback (most recent call last):
  File "/usr/bin/pip", line 9, in <module>
    from pip import main
ImportError: cannot import name 'main'
➜  ~ sudo apt remove python3-pip   
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following packages were automatically installed and are no longer required:
  libexpat1-dev libpython3-dev libpython3.5-dev python-pip-whl python3-dev python3-wheel
  python3.5-dev
Use 'sudo apt autoremove' to remove them.
The following packages will be REMOVED:
  python3-pip
0 upgraded, 0 newly installed, 1 to remove and 0 not upgraded.
After this operation, 569 kB disk space will be freed.
Do you want to continue? [Y/n] y
(Reading database ... 215769 files and directories currently installed.)
Removing python3-pip (8.1.1-2ubuntu0.4) ...
Processing triggers for man-db (2.7.5-1) ...
➜  ~ pip

Usage:   
  pip <command> [options]

回答 14

对于Python 2.7版,@ Anthony解决方案可以完美实现,方法是将python3更改为python,如下所示:

sudo python -m pip uninstall pip && sudo apt install python-pip --reinstall

For Python version 2.7 @Anthony solution works perfect, by changing python3 to python as follows:

sudo python -m pip uninstall pip && sudo apt install python-pip --reinstall

回答 15

对我来说使用修复错误的原因pip3是:

sudo cp -v /usr/local/bin/pip3 /usr/bin/pip3

一切正常:

 demon@UbuntuHP:~$ pip -V
 pip 10.0.1 from /usr/local/lib/python3.5/dist-packages/pip (python 3.5)

 demon@UbuntuHP:~$ pip2 -V
 pip 10.0.1 from /home/demon/.local/lib/python2.7/site-packages/pip (python 2.7)

 demon@UbuntuHP:~$ pip3 -V
 pip 10.0.1 from /usr/local/lib/python3.5/dist-packages/pip (python 3.5)

也许新的10.0.1版本的pip不会更新/ usr / bin中的二进制文件?(似乎没有)

编辑:在Ubuntu 18.04中会发生相同的问题。我发现的最佳解决方案是将pip二进制文件从符号链接/home/<user/.local/bin/usr/local/bin/usr/bin(取决于您的偏好),如下所示:

ln -sv /home/<user>/.local/bin/pip /usr/local/bin/pip
ln -sv /home/<user>/.local/bin/pip2 /usr/local/bin/pip2
ln -sv /home/<user>/.local/bin/pip2.7 /usr/local/bin/pip2.7
ln -sv /home/<user>/.local/bin/pip3 /usr/local/bin/pip3
ln -sv /home/<user>/.local/bin/pip3.6 /usr/local/bin/pip3.6

注意:替换 <user> 为当前运行的用户

关联的版本(最新)位于:

版本3.6:

/home/demon/.local/lib/python3.6/site-packages/pip(python 3.6)

2.7版:

/home/demon/.local/lib/python2.7/site-packages/pip(python 2.7)

What worked for me to fix the error with using pip3 was:

sudo cp -v /usr/local/bin/pip3 /usr/bin/pip3

Everything works:

 demon@UbuntuHP:~$ pip -V
 pip 10.0.1 from /usr/local/lib/python3.5/dist-packages/pip (python 3.5)

 demon@UbuntuHP:~$ pip2 -V
 pip 10.0.1 from /home/demon/.local/lib/python2.7/site-packages/pip (python 2.7)

 demon@UbuntuHP:~$ pip3 -V
 pip 10.0.1 from /usr/local/lib/python3.5/dist-packages/pip (python 3.5)

Maybe the new 10.0.1 version of pip doesn’t update the binary in /usr/bin ? (which seems it does not)

EDIT: the same issue occurs in Ubuntu 18.04. The best solution I’ve found is to symlink the pip binaries from /home/<user>/.local/bin to /usr/local/bin or /usr/bin (depending on your preference), as follows:

ln -sv /home/<user>/.local/bin/pip /usr/local/bin/pip
ln -sv /home/<user>/.local/bin/pip2 /usr/local/bin/pip2
ln -sv /home/<user>/.local/bin/pip2.7 /usr/local/bin/pip2.7
ln -sv /home/<user>/.local/bin/pip3 /usr/local/bin/pip3
ln -sv /home/<user>/.local/bin/pip3.6 /usr/local/bin/pip3.6

NOTE: replace <user> with your current running user

The associated versions (latest) are in:

Version 3.6:

/home/demon/.local/lib/python3.6/site-packages/pip (python 3.6)

Version 2.7:

/home/demon/.local/lib/python2.7/site-packages/pip (python 2.7)


回答 16

绝招

须藤-H pip install lxml

This trick works too:

sudo -H pip install lxml


回答 17

我也遇到了同样的错误,但python -m pip仍在工作,因此我使用了核选项解决了该问题sudo python -m pip install --upgrade pip。它为我做到了。

I had this same error, but python -m pip was still working, so I fixed it with the nuclear option sudo python -m pip install --upgrade pip. It did it for me.


回答 18

对于它的价值,我遇到了pip(不是pip2pip3)问题:

$ pip -V
Traceback (most recent call last):
  File "/usr/bin/pip", line 9, in <module>
    from pip import main
ImportError: cannot import name main

$ pip2 -V
pip 8.1.1 from /usr/lib/python2.7/dist-packages (python 2.7)

$ pip3 -V
pip 8.1.1 from /usr/lib/python3/dist-packages (python 3.5)

不知何故(我不记得如何),我在~/.local目录中安装了python东西。从那里删除pip目录后,pip再次开始工作。

$ rm -rf /home/precor/.local/lib/python2.7/site-packages/pip
$ pip -V
pip 8.1.1 from /usr/lib/python2.7/dist-packages (python 2.7)

For what it’s worth, I had the problem with pip (not pip2 or pip3):

$ pip -V
Traceback (most recent call last):
  File "/usr/bin/pip", line 9, in <module>
    from pip import main
ImportError: cannot import name main

$ pip2 -V
pip 8.1.1 from /usr/lib/python2.7/dist-packages (python 2.7)

$ pip3 -V
pip 8.1.1 from /usr/lib/python3/dist-packages (python 3.5)

Somehow (I can’t remember how) I had python stuff installed in my ~/.local directory. After I removed the pip directory from there, pip started working again.

$ rm -rf /home/precor/.local/lib/python2.7/site-packages/pip
$ pip -V
pip 8.1.1 from /usr/lib/python2.7/dist-packages (python 2.7)
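
To check for such a shadowing copy before deleting anything, the per-user site-packages directory can be inspected from Python. A minimal sketch, assuming the standard site module layout; the helper name user_site_pip is hypothetical.

```python
import os
import site


def user_site_pip():
    """Return the path of a per-user pip package directory, or None."""
    candidate = os.path.join(site.getusersitepackages(), "pip")
    return candidate if os.path.isdir(candidate) else None


if __name__ == "__main__":
    print(user_site_pip())
```

Run it with the same interpreter that the failing pip script uses (for example python2 or python3), since each has its own user site directory.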

回答 19

软件包有问题,当它生成文件/ usr / bin / pip时,必须更改导入:

from pip import main

from pip._internal import main

这就解决了问题,我不确定它为什么产生,但是在以下问题中说得很对:

在pyenv上进行pip 10升级后,“导入错误:无法导入名称’main’”

Something is wrong with the package: in the file it generates, /usr/bin/pip, you have to change the import:

from pip import main

to

from pip._internal import main

That solves the problem. I’m not sure why the file was generated that way, but the following issue says something about it:

After pip 10 upgrade on pyenv “ImportError: cannot import name ‘main'”


回答 20

您可以尝试以下方法:

sudo ln -sf $( type -P pip ) /usr/bin/pip

You can try this:

sudo ln -sf $( type -P pip ) /usr/bin/pip

回答 21

当我想将系统pip pip3从9.0.1 升级到19.2.3 时,我也遇到了这个问题。

运行后pip3 install --upgrade pippip版本变为19.2.3。但main()已移至pip._internal最新版本,但已pip3损坏。

因此,在文件中/usr/bin/pip3,替换line 9from pip import mainfrom pip._internal import main。该问题将得到解决,适用于python2-pip。(在Ubuntu 18.04发行版上测试)

根据@Vincent H.的回答

I also ran into this problem when I wanted to upgrade the system pip (pip3) from 9.0.1 to 19.2.3.

After running pip3 install --upgrade pip, the pip version becomes 19.2.3. But main() has been moved into pip._internal in the latest version, which leaves pip3 broken.

So in the file /usr/bin/pip3, replace line 9, from pip import main, with from pip._internal import main. This fixes the issue, and the same works for python2-pip. (Tested on Ubuntu 18.04.)

Based on @Vincent H.’s answer.


回答 22

请运行以下命令进行修复。运行后python3 -m pip install --upgrade pip,请运行以下命令。

hash -r pip

资料来源:https : //github.com/pypa/pip/issues/5221

Please run the following commands to do the fix. After running python3 -m pip install --upgrade pip, please run the following command.

hash -r pip

Source: https://github.com/pypa/pip/issues/5221


回答 23

您可以简单地使用以下方法修复pip和pip3路径 update-alternatives

您应该检查的第一件事是当前$PATH 运行情况,echo $PATH然后您可以找到/usr/local/binpip3和pip通常在哪里

有一个变化,您的系统正在这里寻找/bin/pip/bin/pip3 所以我会说通过添加到您的~/.bash_profile文件中使其持久存在

export PATH=$PATH:/usr/local/bin 然后用which pip和检查它是否固定which pip3

如果没有update-alternatives,最后用它来修复

update-alternatives --install /bin/pip3 pip3 /usr/local/bin/pip3 30

如果您想将pip指向pip3,则

update-alternatives --install /bin/pip pip /usr/local/bin/pip3 30

You can simply fix the pip and pip3 paths using update-alternatives.

The first thing you should check is your current $PATH: run echo $PATH and see if you can find /usr/local/bin, which is where pip3 and pip usually are.

There is a chance your system is looking in /bin/pip and /bin/pip3, so I would fix the PATH by adding the following to your ~/.bash_profile file so it persists:

export PATH=$PATH:/usr/local/bin, and then check whether it is fixed with which pip and which pip3.

If not, use update-alternatives to fix it:

update-alternatives --install /bin/pip3 pip3 /usr/local/bin/pip3 30

And if you want to point pip to pip3:

update-alternatives --install /bin/pip pip /usr/local/bin/pip3 30

回答 24

这对我有用!

hash -r pip # or hash -d pip

现在,卸载pip安装的版本,然后使用以下命令将其重新安装。

python -m pip uninstall pip  # sudo
sudo apt install --reinstall python-pip

如果pip损坏,请使用:

python -m pip install --force-reinstall pip

希望能帮助到你!

This worked for me!

hash -r pip # or hash -d pip

Now, uninstall the pip installed version and reinstall it using the following commands.

python -m pip uninstall pip  # sudo
sudo apt install --reinstall python-pip

If pip is broken, use:

python -m pip install --force-reinstall pip

Hope it helps!


回答 25

从pip._internal导入main

from pip._internal import main

编辑来自的点子代码

sudo nano /usr/bin/pip3

Import main from pip._internal:

from pip._internal import main

Make this edit in the pip script, which you can open with:

sudo nano /usr/bin/pip3

回答 26

正如@cryptoboy所说的-检查您安装了什么pip / python版本

 demon@UbuntuHP:~$ pip -V
 demon@UbuntuHP:~$ pip2 -V
 demon@UbuntuHP:~$ pip3 -V

然后在.local / lib /文件夹中检查不需要的库。

当我迁移到较新的Kubuntu时,我做了设置的备份,并且在主目录中有.local / lib / python2.7 /文件夹。安装了python 3.6。我刚刚删除了旧文件夹,现在一切正常!

As @cryptoboy said – check what pip/python version you have installed

 demon@UbuntuHP:~$ pip -V
 demon@UbuntuHP:~$ pip2 -V
 demon@UbuntuHP:~$ pip3 -V

and then check for unneeded libraries in your .local/lib/ folder.

I backed up my settings when migrating to a newer Kubuntu, which left a .local/lib/python2.7/ folder in my home directory, while Python 3.6 was installed. I just removed the old folder and now everything works great!


回答 27

在Debian上,您需要先更新apt。

sudo apt-get update -qq
sudo apt-get install python-pip -qq
sudo pip install pip --upgrade --quiet
sudo pip2 install virtualenv --quiet

如果您跳过“ sudo apt-get update -qq”,则您的点会损坏并显示“找不到主要”错误。

On Debian you will need to update apt first:

sudo apt-get update -qq
sudo apt-get install python-pip -qq
sudo pip install pip --upgrade --quiet
sudo pip2 install virtualenv --quiet

If you skip ‘sudo apt-get update -qq’ your pip will become corrupt and display the ‘cannot find main’ error.


回答 28

此错误可能是权限错误。因此,测试使用-H标志执行命令:

sudo -H pip3 install numpy

This error may be a permissions error, so try running the command with the -H flag:

sudo -H pip3 install numpy

回答 29

在执行任何pip命令之前使用以下命令

hash -d pip

会工作的

Use the following command before the execution of any pip command

hash -d pip

It will work


是否在Python 3.6+中订购了字典?

问题:是否在Python 3.6+中订购了字典?

与以前的版本不同,字典在Python 3.6中排序(至少在CPython实现下)。这似乎是一个重大更改,但只是文档中的一小段。它被描述为CPython实现细节而不是语言功能,但这也意味着将来可能会成为标准。

在保留元素顺序的同时,新的字典实现如何比旧的实现更好?

以下是文档中的文字:

dict()现在使用PyPy率先提出的“紧凑”表示形式。与Python 3.5相比,新dict()的内存使用量减少了20%至25%。PEP 468(在函数中保留** kwarg的顺序。)由此实现。此新实现的顺序保留方面被认为是实现细节,因此不应依赖(将来可能会更改,但是希望在更改语言规范之前,先在几个发行版中使用该新dict实现该语言,为所有当前和将来的Python实现强制要求保留顺序的语义;这还有助于保留与仍旧有效的随机迭代顺序的旧版本语言(例如Python 3.5)的向后兼容性。(由INADA Naoki在发行27350最初由Raymond Hettinger提出的想法。)

2017年12月更新:Python 3.7 保证dict保留插入顺序

Dictionaries are ordered in Python 3.6 (under the CPython implementation at least) unlike in previous incarnations. This seems like a substantial change, but it’s only a short paragraph in the documentation. It is described as a CPython implementation detail rather than a language feature, but also implies this may become standard in the future.

How does the new dictionary implementation perform better than the older one while preserving element order?

Here is the text from the documentation:

dict() now uses a “compact” representation pioneered by PyPy. The memory usage of the new dict() is between 20% and 25% smaller compared to Python 3.5. PEP 468 (Preserving the order of **kwargs in a function.) is implemented by this. The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon (this may change in the future, but it is desired to have this new dict implementation in the language for a few releases before changing the language spec to mandate order-preserving semantics for all current and future Python implementations; this also helps preserve backwards-compatibility with older versions of the language where random iteration order is still in effect, e.g. Python 3.5). (Contributed by INADA Naoki in issue 27350. Idea originally suggested by Raymond Hettinger.)

Update December 2017: dicts retaining insertion order is guaranteed for Python 3.7


Answer 0

Are dictionaries ordered in Python 3.6+?

They are insertion ordered[1]. As of Python 3.6, for the CPython implementation of Python, dictionaries remember the order of items inserted. This is considered an implementation detail in Python 3.6; you need to use OrderedDict if you want insertion ordering that’s guaranteed across other implementations of Python (and other ordered behavior[1]).

As of Python 3.7, this is no longer an implementation detail and instead becomes a language feature. From a python-dev message by GvR:

Make it so. “Dict keeps insertion order” is the ruling. Thanks!

This simply means that you can depend on it. Other implementations of Python must also offer an insertion ordered dictionary if they wish to be a conforming implementation of Python 3.7.


How does the Python 3.6 dictionary implementation perform better[2] than the older one while preserving element order?

Essentially, by keeping two arrays.

  • The first array, dk_entries, holds the entries (of type PyDictKeyEntry) for the dictionary in the order that they were inserted. Order is preserved because this is an append-only array: new items are always inserted at the end (insertion order).

  • The second, dk_indices, holds the indices for the dk_entries array (that is, values that indicate the position of the corresponding entry in dk_entries). This array acts as the hash table. When a key is hashed, it leads to one of the indices stored in dk_indices, and the corresponding entry is fetched by indexing dk_entries. Since only indices are kept, the type of this array depends on the overall size of the dictionary (ranging from int8_t (1 byte) to int32_t/int64_t (4/8 bytes) on 32/64-bit builds).

In the previous implementation, a sparse array of type PyDictKeyEntry and size dk_size had to be allocated. Unfortunately, this also resulted in a lot of empty space, since that array was not allowed to be more than 2/3 * dk_size full for performance reasons (and the empty space still had PyDictKeyEntry size!).

This is not the case now, since only the required entries (those that have been inserted) are stored, and a sparse array of type intX_t (X depending on dict size), at most 2/3 * dk_size full, is kept. The empty space changed from type PyDictKeyEntry to intX_t.

So, obviously, creating a sparse array of type PyDictKeyEntry is much more memory demanding than a sparse array for storing ints.

If interested, you can see the full conversation on Python-Dev regarding this feature; it is a good read.


In the original proposal made by Raymond Hettinger, a visualization of the data structures used can be seen which captures the gist of the idea.

For example, the dictionary:

d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}

is currently stored as [keyhash, key, value]:

entries = [['--', '--', '--'],
           [-8522787127447073495, 'barry', 'green'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           ['--', '--', '--'],
           [-9092791511155847987, 'timmy', 'red'],
           ['--', '--', '--'],
           [-6480567542315338377, 'guido', 'blue']]

Instead, the data should be organized as follows:

indices =  [None, 1, None, None, None, 0, None, 2]
entries =  [[-9092791511155847987, 'timmy', 'red'],
            [-8522787127447073495, 'barry', 'green'],
            [-6480567542315338377, 'guido', 'blue']]

As you can now see visually, in the original layout a lot of space is essentially empty, which reduces collisions and makes look-ups faster. With the new approach, you reduce the memory required by moving the sparseness to where it's really required: the indices.
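The two-array layout can be sketched in pure Python. This is only a toy model under stated assumptions, not CPython's actual code: the class name CompactDict, the linear probing, and the doubling resize are illustrative choices, and deletion is not handled.

```python
class CompactDict:
    """Toy model of the compact layout: sparse index table plus dense entries list."""

    def __init__(self, size=8):
        self.indices = [None] * size  # sparse hash table of positions into entries
        self.entries = []             # dense, append-only list of [hash, key, value]

    def _probe(self, key):
        # Linear probing is an assumption made for brevity;
        # CPython uses a different probe sequence.
        mask = len(self.indices) - 1
        i = hash(key) & mask
        while True:
            pos = self.indices[i]
            if pos is None or self.entries[pos][1] == key:
                return i
            i = (i + 1) & mask

    def _resize(self):
        # Only the small index table is rebuilt; entries stay in insertion order.
        self.indices = [None] * (len(self.indices) * 2)
        for pos, (_, key, _) in enumerate(self.entries):
            self.indices[self._probe(key)] = pos

    def __setitem__(self, key, value):
        slot = self._probe(key)
        pos = self.indices[slot]
        if pos is not None:           # existing key: overwrite value in place
            self.entries[pos][2] = value
            return
        # Keep the index table at most 2/3 full, as described above.
        if (len(self.entries) + 1) * 3 > len(self.indices) * 2:
            self._resize()
            slot = self._probe(key)
        self.indices[slot] = len(self.entries)
        self.entries.append([hash(key), key, value])

    def __getitem__(self, key):
        pos = self.indices[self._probe(key)]
        if pos is None:
            raise KeyError(key)
        return self.entries[pos][2]

    def keys(self):
        # Iteration simply walks the dense list, hence insertion order.
        return [key for _, key, _ in self.entries]


d = CompactDict()
for name, color in [('timmy', 'red'), ('barry', 'green'), ('guido', 'blue')]:
    d[name] = color
```

Here d.keys() walks the dense entries list, so the keys come back in insertion order, while the mostly empty indices list holds only small integers (or None in this sketch).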


[1]: I say “insertion ordered” and not “ordered” since, with the existence of OrderedDict, “ordered” suggests further behavior that the dict object doesn’t provide. OrderedDicts are reversible, provide order-sensitive methods and, mainly, provide order-sensitive equality tests (==, !=). dicts currently don’t offer any of those behaviors/methods.
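A short sketch of the difference in equality semantics described in this footnote. The variable names are purely illustrative; move_to_end is used as one example of an order-sensitive method that OrderedDict has and plain dict lacks.

```python
from collections import OrderedDict

a = {'x': 1, 'y': 2}
b = {'y': 2, 'x': 1}

# Plain dicts compare equal regardless of insertion order.
plain_equal = (a == b)

# Two OrderedDicts with the same items in a different order are not equal.
oa = OrderedDict(a)
ob = OrderedDict(b)
ordered_equal = (oa == ob)

# An order-sensitive method that plain dict does not provide.
oa.move_to_end('x')
keys_after_move = list(oa)
```

In this sketch plain_equal ends up True and ordered_equal ends up False, which is the behavior the footnote refers to.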


[2]: The new dictionary implementation performs better memory-wise by being designed more compactly; that’s the main benefit here. Speed-wise, the difference isn’t so drastic: there are places where the new dict might introduce slight regressions (key look-ups, for example), while in others (iteration and resizing come to mind) a performance boost should be present.

Overall, the performance of the dictionary, especially in real-life situations, improves due to the compactness introduced.


Answer 1

Below is answering the original first question:

Should I use dict or OrderedDict in Python 3.6?

I think this sentence from the documentation is actually enough to answer your question

The order-preserving aspect of this new implementation is considered an implementation detail and should not be relied upon

dict is not explicitly meant to be an ordered collection, so if you want to stay consistent and not rely on a side effect of the new implementation you should stick with OrderedDict.

Make your code future proof :)

There’s a debate about that here.

EDIT: Python 3.7 will keep this as a feature see


Answer 2

Update: Guido van Rossum announced on the mailing list that as of Python 3.7 dicts in all Python implementations must preserve insertion order.


Answer 3

I wanted to add to the discussion above but don’t have the reputation to comment.

Python 3.8 is not quite released yet, but it will even include the reversed() function on dictionaries (removing another difference from OrderedDict).

Dict and dictviews are now iterable in reversed insertion order using reversed(). (Contributed by Rémi Lapeyre in bpo-33462.) See what’s new in python 3.8

I don’t see any mention of the equality operator or other features of OrderedDict so they are still not entirely the same.
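As a small illustration of the reversed() support quoted above, here is a sketch that assumes it is run on Python 3.8 or later (on earlier versions reversed() on a dict raises TypeError). The dictionary contents are just sample data.

```python
d = {'timmy': 'red', 'barry': 'green', 'guido': 'blue'}

# reversed() on a dict iterates over its keys in reverse insertion order.
reversed_keys = list(reversed(d))

# The dict views (keys(), values(), items()) are reversible as well.
reversed_items = list(reversed(d.items()))
```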


How can I check if a string represents an int, without using try/except?

Question: How can I check if a string represents an int, without using try/except?

Is there any way to tell whether a string represents an integer (e.g., '3', '-17' but not '3.14' or 'asfasfas') without using a try/except mechanism?

is_int('3.14') = False
is_int('-7')   = True

Answer 0

If you’re really just annoyed at using try/excepts all over the place, please just write a helper function:

def RepresentsInt(s):
    try: 
        int(s)
        return True
    except ValueError:
        return False

>>> print(RepresentsInt("+123"))
True
>>> print(RepresentsInt("10.0"))
False

It’s going to be WAY more code to exactly cover all the strings that Python considers integers. I say just be pythonic on this one.


Answer 1

With positive integers you could use .isdigit:

>>> '16'.isdigit()
True

It doesn’t work with negative integers, though. I suppose you could try the following:

>>> s = '-17'
>>> s.startswith('-') and s[1:].isdigit()
True

It won’t work with the '16.0' format, which is similar to int casting in this sense.

Edit:

def check_int(s):
    # s[:1] is '' for an empty string, so this avoids the IndexError that s[0] would raise
    if s[:1] in ('-', '+'):
        return s[1:].isdigit()
    return s.isdigit()
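One caveat with the isdigit approach: str.isdigit() also accepts some Unicode characters, such as the superscript '²', that int() rejects. The sketch below is one possible stricter variant, not part of the original answer. The name check_int_strict is hypothetical, str.isascii() needs Python 3.7 or later, and restricting input to ASCII digits is an assumption about what you want to accept.

```python
def check_int_strict(s):
    """Hypothetical variant: optional sign followed by ASCII decimal digits only."""
    # Slicing keeps this safe for the empty string.
    body = s[1:] if s[:1] in ('-', '+') else s
    # isdecimal() excludes characters like '²'; isascii() excludes non-ASCII digits.
    return body.isascii() and body.isdecimal()


superscript_is_digit = '²'.isdigit()          # True, although int('²') raises ValueError
superscript_strict = check_int_strict('²')    # False
```

Note that int() itself is more permissive than this check in other ways (for example, it tolerates surrounding whitespace), so the right choice depends on which strings you actually want to accept.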

Answer 2

You know, I’ve found (and I’ve tested this over and over) that try/except does not perform all that well, for whatever reason. I frequently try several ways of doing things, and I don’t think I’ve ever found a method that uses try/except to perform the best of those tested; in fact, it seems to me those methods have usually come out close to the worst, if not the worst. Not in every case, but in many cases. I know a lot of people say it’s the “Pythonic” way, but that’s one area where I part ways with them. To me, it’s neither very performant nor very elegant, so I tend to only use it for error trapping and reporting.

I was going to gripe that PHP, Perl, Ruby, C, and even the freaking shell have simple functions for testing a string for integer-hood, but due diligence in verifying those assumptions tripped me up! Apparently this lack is a common sickness.

Here’s a quick and dirty edit of Bruno’s post:

import sys, time, re

g_intRegex = re.compile(r"^([+-]?[1-9]\d*|0)$")

testvals = [
    # integers
    0, 1, -1, 1.0, -1.0,
    '0', '0.','0.0', '1', '-1', '+1', '1.0', '-1.0', '+1.0', '06',
    # non-integers
    'abc 123',
    1.1, -1.1, '1.1', '-1.1', '+1.1',
    '1.1.1', '1.1.0', '1.0.1', '1.0.0',
    '1.0.', '1..0', '1..',
    '0.0.', '0..0', '0..',
    'one', object(), (1,2,3), [1,2,3], {'one':'two'},
    # with spaces
    ' 0 ', ' 0.', ' .0','.01 '
]

def isInt_try(v):
    try:     i = int(v)
    except:  return False
    return True

def isInt_str(v):
    v = str(v).strip()
    return v=='0' or (v if v.find('..') > -1 else v.lstrip('-+').rstrip('0').rstrip('.')).isdigit()

def isInt_re(v):
    import re
    if not hasattr(isInt_re, 'intRegex'):
        isInt_re.intRegex = re.compile(r"^([+-]?[1-9]\d*|0)$")
    return isInt_re.intRegex.match(str(v).strip()) is not None

def isInt_re2(v):
    return g_intRegex.match(str(v).strip()) is not None

def check_int(s):
    s = str(s)
    if s[0] in ('-', '+'):
        return s[1:].isdigit()
    return s.isdigit()    


def timeFunc(func, times):
    t1 = time.time()
    for n in range(times):
        for v in testvals: 
            r = func(v)
    t2 = time.time()
    return t2 - t1

def testFuncs(funcs):
    for func in funcs:
        sys.stdout.write( "\t%s\t|" % func.__name__)
    print()
    for v in testvals:
        if type(v) == type(''):
            sys.stdout.write("'%s'" % v)
        else:
            sys.stdout.write("%s" % str(v))
        for func in funcs:
            sys.stdout.write( "\t\t%s\t|" % func(v))
        sys.stdout.write("\r\n") 

if __name__ == '__main__':
    print()
    print("tests..")
    testFuncs((isInt_try, isInt_str, isInt_re, isInt_re2, check_int))
    print()

    print("timings..")
    print("isInt_try:   %6.4f" % timeFunc(isInt_try, 10000))
    print("isInt_str:   %6.4f" % timeFunc(isInt_str, 10000)) 
    print("isInt_re:    %6.4f" % timeFunc(isInt_re, 10000))
    print("isInt_re2:   %6.4f" % timeFunc(isInt_re2, 10000))
    print("check_int:   %6.4f" % timeFunc(check_int, 10000))

Here are the performance comparison results:

timings..
isInt_try:   0.6426
isInt_str:   0.7382
isInt_re:    1.1156
isInt_re2:   0.5344
check_int:   0.3452

A C method could scan it Once Through, and be done. A C method that scans the string once through would be the Right Thing to do, I think.

EDIT:

I’ve updated the code above to work in Python 3.5, and to include the check_int function from the currently most voted up answer, and to use the current most popular regex that I can find for testing for integer-hood. This regex rejects strings like ‘abc 123’. I’ve added ‘abc 123’ as a test value.

It is Very Interesting to me to note, at this point, that NONE of the functions tested, including the try method, the popular check_int function, and the most popular regex for testing for integer-hood, return the correct answers for all of the test values (well, depending on what you think the correct answers are; see the test results below).

The built-in int() function silently truncates the fractional part of a floating point number and returns the integer part before the decimal, unless the floating point number is first converted to a string.
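The difference between passing a float and passing its string form to int() can be seen in a few lines. This is only an illustrative sketch; the variable names are made up for the example.

```python
# A float argument is truncated toward zero without any error.
truncated = int(1.9)
negative = int(-1.9)

# The same value as a string is rejected, because '1.9' is not an integer literal.
try:
    int('1.9')
    string_accepted = True
except ValueError:
    string_accepted = False
```

This is why isInt_try reports True for the float 1.1 but False for the string '1.1' in the results below.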

The check_int() function returns false for values like 0.0 and 1.0 (which technically are integers) and returns true for values like '06'.

Here are the current (Python 3.5) test results:

                  isInt_try |       isInt_str       |       isInt_re        |       isInt_re2       |   check_int   |
    0               True    |               True    |               True    |               True    |       True    |
    1               True    |               True    |               True    |               True    |       True    |
    -1              True    |               True    |               True    |               True    |       True    |
    1.0             True    |               True    |               False   |               False   |       False   |
    -1.0            True    |               True    |               False   |               False   |       False   |
    '0'             True    |               True    |               True    |               True    |       True    |
    '0.'            False   |               True    |               False   |               False   |       False   |
    '0.0'           False   |               True    |               False   |               False   |       False   |
    '1'             True    |               True    |               True    |               True    |       True    |
    '-1'            True    |               True    |               True    |               True    |       True    |
    '+1'            True    |               True    |               True    |               True    |       True    |
    '1.0'           False   |               True    |               False   |               False   |       False   |
    '-1.0'          False   |               True    |               False   |               False   |       False   |
    '+1.0'          False   |               True    |               False   |               False   |       False   |
    '06'            True    |               True    |               False   |               False   |       True    |
    'abc 123'       False   |               False   |               False   |               False   |       False   |
    1.1             True    |               False   |               False   |               False   |       False   |
    -1.1            True    |               False   |               False   |               False   |       False   |
    '1.1'           False   |               False   |               False   |               False   |       False   |
    '-1.1'          False   |               False   |               False   |               False   |       False   |
    '+1.1'          False   |               False   |               False   |               False   |       False   |
    '1.1.1'         False   |               False   |               False   |               False   |       False   |
    '1.1.0'         False   |               False   |               False   |               False   |       False   |
    '1.0.1'         False   |               False   |               False   |               False   |       False   |
    '1.0.0'         False   |               False   |               False   |               False   |       False   |
    '1.0.'          False   |               False   |               False   |               False   |       False   |
    '1..0'          False   |               False   |               False   |               False   |       False   |
    '1..'           False   |               False   |               False   |               False   |       False   |
    '0.0.'          False   |               False   |               False   |               False   |       False   |
    '0..0'          False   |               False   |               False   |               False   |       False   |
    '0..'           False   |               False   |               False   |               False   |       False   |
    'one'           False   |               False   |               False   |               False   |       False   |
    <obj..>         False   |               False   |               False   |               False   |       False   |
    (1, 2, 3)       False   |               False   |               False   |               False   |       False   |
    [1, 2, 3]       False   |               False   |               False   |               False   |       False   |
    {'one': 'two'}  False   |               False   |               False   |               False   |       False   |
    ' 0 '           True    |               True    |               True    |               True    |       False   |
    ' 0.'           False   |               True    |               False   |               False   |       False   |
    ' .0'           False   |               False   |               False   |               False   |       False   |
    '.01 '          False   |               False   |               False   |               False   |       False   |

Just now I tried adding this function:

def isInt_float(s):
    try:
        return float(str(s)).is_integer()
    except:
        return False

It performs almost as well as check_int (0.3486) and it returns true for values like 1.0 and 0.0 and +1.0 and 0. and .0 and so on. But it also returns true for '06', so. Pick your poison, I guess.


Answer 3

str.isdigit() should do the trick.

Examples:

str.isdigit("23") ## True
str.isdigit("abc") ## False
str.isdigit("23.4") ## False

EDIT: As @BuzzMoschetti pointed out, this approach fails for negative numbers (e.g., “-23”). If your input_num can be less than 0, use re.sub(regex_search, regex_replace, contents) to strip the sign before applying str.isdigit(). For example:

import re
input_num = "-23"
input_num = re.sub("^-", "", input_num) ## "^" anchors the match at the start, so only a leading "-" is removed
str.isdigit(input_num) ## True

Answer 4

Use a regular expression:

import re
def RepresentsInt(s):
    return re.match(r"[-+]?\d+$", s) is not None

If you must accept decimal fractions also:

def RepresentsInt(s):
    return re.match(r"[-+]?\d+(\.0*)?$", s) is not None

For improved performance if you’re doing this often, compile the regular expression only once using re.compile().
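A minimal sketch of the compile-once variant follows. The names _INT_RE and represents_int are hypothetical, and the use of fullmatch() with the re.ASCII flag is a design choice of this sketch rather than part of the answer above. The reason for fullmatch() is that "$" also matches just before a trailing newline, so the match()-based pattern accepts "12\n".

```python
import re

# Compiled once at import time; re.ASCII limits \d to the digits 0-9.
_INT_RE = re.compile(r"[-+]?\d+", re.ASCII)


def represents_int(s):
    # fullmatch() requires the whole string to match, including any trailing newline.
    return _INT_RE.fullmatch(s) is not None


# For comparison: the "$"-anchored pattern lets a trailing newline through.
dollar_accepts_newline = re.match(r"[-+]?\d+$", "12\n") is not None
```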


Answer 5

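A proper RegEx solution would combine the ideas of Greg Hewgill and Nowell, but not use a global variable. You can accomplish this by attaching an attribute to the method. Also, I know that it is frowned upon to put imports in a method, but what I'm going for is a "lazy module" effect like http://peak.telecommunity.com/DevCenter/Importing#lazy-imports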

Edit: My favorite technique so far is to use exclusively methods of the String object.

#!/usr/bin/env python

# Uses exclusively methods of the String object
def isInteger(i):
    i = str(i)
    return i=='0' or (i if i.find('..') > -1 else i.lstrip('-+').rstrip('0').rstrip('.')).isdigit()

# Uses re module for regex
def isIntegre(i):
    import re
    if not hasattr(isIntegre, '_re'):
        print("I compile only once. Remove this line when you are confident in that.")
        isIntegre._re = re.compile(r"[-+]?\d+(\.0*)?$")
    return isIntegre._re.match(str(i)) is not None

# When executed directly run Unit Tests
if __name__ == '__main__':
    for obj in [
                # integers
                0, 1, -1, 1.0, -1.0,
                '0', '0.','0.0', '1', '-1', '+1', '1.0', '-1.0', '+1.0',
                # non-integers
                1.1, -1.1, '1.1', '-1.1', '+1.1',
                '1.1.1', '1.1.0', '1.0.1', '1.0.0',
                '1.0.', '1..0', '1..',
                '0.0.', '0..0', '0..',
                'one', object(), (1,2,3), [1,2,3], {'one':'two'}
            ]:
        # Notice the integre uses 're' (intended to be humorous)
        integer = ('an integer' if isInteger(obj) else 'NOT an integer')
        integre = ('an integre' if isIntegre(obj) else 'NOT an integre')
        # Make strings look like strings in the output
        if isinstance(obj, str):
            obj = ("'%s'" % (obj,))
        print("%30s is %14s is %14s" % (obj, integer, integre))

And for the less adventurous members of the class, here is the output:

I compile only once. Remove this line when you are confident in that.
                             0 is     an integer is     an integre
                             1 is     an integer is     an integre
                            -1 is     an integer is     an integre
                           1.0 is     an integer is     an integre
                          -1.0 is     an integer is     an integre
                           '0' is     an integer is     an integre
                          '0.' is     an integer is     an integre
                         '0.0' is     an integer is     an integre
                           '1' is     an integer is     an integre
                          '-1' is     an integer is     an integre
                          '+1' is     an integer is     an integre
                         '1.0' is     an integer is     an integre
                        '-1.0' is     an integer is     an integre
                        '+1.0' is     an integer is     an integre
                           1.1 is NOT an integer is NOT an integre
                          -1.1 is NOT an integer is NOT an integre
                         '1.1' is NOT an integer is NOT an integre
                        '-1.1' is NOT an integer is NOT an integre
                        '+1.1' is NOT an integer is NOT an integre
                       '1.1.1' is NOT an integer is NOT an integre
                       '1.1.0' is NOT an integer is NOT an integre
                       '1.0.1' is NOT an integer is NOT an integre
                       '1.0.0' is NOT an integer is NOT an integre
                        '1.0.' is NOT an integer is NOT an integre
                        '1..0' is NOT an integer is NOT an integre
                         '1..' is NOT an integer is NOT an integre
                        '0.0.' is NOT an integer is NOT an integre
                        '0..0' is NOT an integer is NOT an integre
                         '0..' is NOT an integer is NOT an integre
                         'one' is NOT an integer is NOT an integre
<object object at 0x103b7d0a0> is NOT an integer is NOT an integre
                     (1, 2, 3) is NOT an integer is NOT an integre
                     [1, 2, 3] is NOT an integer is NOT an integre
                {'one': 'two'} is NOT an integer is NOT an integre

The proper RegEx solution would combine the ideas of Greg Hewgill and Nowell, but not use a global variable. You can accomplish this by attaching an attribute to the method. Also, I know that it is frowned upon to put imports in a method, but what I’m going for is a “lazy module” effect like http://peak.telecommunity.com/DevCenter/Importing#lazy-imports

edit: My favorite technique so far is to use exclusively methods of the String object.

#!/usr/bin/env python

# Uses exclusively methods of the String object
def isInteger(i):
    i = str(i)
    return i=='0' or (i if i.find('..') > -1 else i.lstrip('-+').rstrip('0').rstrip('.')).isdigit()

# Uses re module for regex
def isIntegre(i):
    import re
    if not hasattr(isIntegre, '_re'):
        print("I compile only once. Remove this line when you are confident in that.")
        isIntegre._re = re.compile(r"[-+]?\d+(\.0*)?$")
    return isIntegre._re.match(str(i)) is not None

# When executed directly run Unit Tests
if __name__ == '__main__':
    for obj in [
                # integers
                0, 1, -1, 1.0, -1.0,
                '0', '0.','0.0', '1', '-1', '+1', '1.0', '-1.0', '+1.0',
                # non-integers
                1.1, -1.1, '1.1', '-1.1', '+1.1',
                '1.1.1', '1.1.0', '1.0.1', '1.0.0',
                '1.0.', '1..0', '1..',
                '0.0.', '0..0', '0..',
                'one', object(), (1,2,3), [1,2,3], {'one':'two'}
            ]:
        # Notice the integre uses 're' (intended to be humorous)
        integer = ('an integer' if isInteger(obj) else 'NOT an integer')
        integre = ('an integre' if isIntegre(obj) else 'NOT an integre')
        # Make strings look like strings in the output
        if isinstance(obj, str):
            obj = ("'%s'" % (obj,))
        print("%30s is %14s is %14s" % (obj, integer, integre))

And for the less adventurous members of the class, here is the output:

I compile only once. Remove this line when you are confident in that.
                             0 is     an integer is     an integre
                             1 is     an integer is     an integre
                            -1 is     an integer is     an integre
                           1.0 is     an integer is     an integre
                          -1.0 is     an integer is     an integre
                           '0' is     an integer is     an integre
                          '0.' is     an integer is     an integre
                         '0.0' is     an integer is     an integre
                           '1' is     an integer is     an integre
                          '-1' is     an integer is     an integre
                          '+1' is     an integer is     an integre
                         '1.0' is     an integer is     an integre
                        '-1.0' is     an integer is     an integre
                        '+1.0' is     an integer is     an integre
                           1.1 is NOT an integer is NOT an integre
                          -1.1 is NOT an integer is NOT an integre
                         '1.1' is NOT an integer is NOT an integre
                        '-1.1' is NOT an integer is NOT an integre
                        '+1.1' is NOT an integer is NOT an integre
                       '1.1.1' is NOT an integer is NOT an integre
                       '1.1.0' is NOT an integer is NOT an integre
                       '1.0.1' is NOT an integer is NOT an integre
                       '1.0.0' is NOT an integer is NOT an integre
                        '1.0.' is NOT an integer is NOT an integre
                        '1..0' is NOT an integer is NOT an integre
                         '1..' is NOT an integer is NOT an integre
                        '0.0.' is NOT an integer is NOT an integre
                        '0..0' is NOT an integer is NOT an integre
                         '0..' is NOT an integer is NOT an integre
                         'one' is NOT an integer is NOT an integre
<object object at 0x103b7d0a0> is NOT an integer is NOT an integre
                     (1, 2, 3) is NOT an integer is NOT an integre
                     [1, 2, 3] is NOT an integer is NOT an integre
                {'one': 'two'} is NOT an integer is NOT an integre

Answer 6

>>> "+7".lstrip("-+").isdigit()
True
>>> "-7".lstrip("-+").isdigit()
True
>>> "7".lstrip("-+").isdigit()
True
>>> "13.4".lstrip("-+").isdigit()
False

So your function would be:

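def is_int(val):
    # Allow at most one leading sign, then require digits only.
    # Slicing (rather than indexing val[1]) is safe for one-character and empty input.
    digits = val[1:] if val[:1] in ("+", "-") else val
    return digits.isdigit()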

Answer 7

Greg Hewgill’s approach was missing a few components: the leading “^” to only match the start of the string, and compiling the re beforehand. But this approach will allow you to avoid a try: except:

import re
INT_RE = re.compile(r"^[-]?\d+$")
def RepresentsInt(s):
    return INT_RE.match(str(s)) is not None

I would be interested to know why you are trying to avoid try/except.
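
One detail worth checking with patterns like this: `$` also matches just before a trailing newline, so `match()` with `^...$` accepts a string such as "12" followed by a newline. A small sketch, assuming Python 3.4+ where `re.fullmatch` is available. `represents_int_strict` is a hypothetical name, not from the answer.

```python
import re

INT_RE = re.compile(r"[-]?\d+")

def represents_int_strict(s):
    # fullmatch() requires the whole string to match, so no ^ or $ is needed.
    return INT_RE.fullmatch(str(s)) is not None

print(represents_int_strict("-12"))    # True
print(represents_int_strict("12\n"))   # False
# With match() and $, the trailing newline is tolerated:
print(re.match(r"^[-]?\d+$", "12\n") is not None)  # True
```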


Answer 8

I have to do this all the time, and I have a mild but admittedly irrational aversion to using the try/except pattern. I use this:

all([xi in '1234567890' for xi in x])

It doesn’t accommodate negative numbers, so you could strip out one minus sign (if any), and then check if the result comprises digits from 0-9:

all([xi in '1234567890' for xi in x.replace('-', '', 1)])

You could also pass x to str() if you’re not sure the input is a string:

all([xi in '1234567890' for xi in str(x).replace('-', '', 1)])

There are at least two (edge?) cases where this falls apart:

  1. It doesn’t work for various scientific and/or exponential notations (e.g. 1.2E3, 10^3, etc.) – both will return False. I don’t think other answers accommodated this either, and even Python 3.8 has inconsistent opinions, since type(1E2) gives <class 'float'> whereas type(10^2) gives <class 'int'>.
  2. An empty string input gives True.

So it won’t work for every possible input, but if you can exclude scientific notation, exponential notation, and empty strings, it’s an OK one-line check that returns False if x is not an integer and True if x is an integer.

I don’t know if it’s pythonic, but it’s one line, and it’s relatively clear what the code does.
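
For reference, here is the same check wrapped in a function, with a guard for the empty-string case mentioned above. This is only a sketch; the name `is_int_chars` is made up for illustration.

```python
def is_int_chars(x):
    # Convert to str and drop at most one minus sign (wherever it appears).
    s = str(x).replace('-', '', 1)
    # bool(s) closes the empty-string edge case; all() alone would return True.
    return bool(s) and all(ch in '1234567890' for ch in s)

print(is_int_chars("-12"))    # True
print(is_int_chars(""))       # False
print(is_int_chars("1.2E3"))  # False
```

Note that, as in the one-liner, the minus sign is removed wherever it occurs, so a string like '1-2' still passes.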


Answer 9

I think

s.startswith('-') and s[1:].isdigit()

would be better to rewrite to:

s.replace('-', '').isdigit()

because s[1:] also creates a new string

But a much better solution is

s.lstrip('+-').isdigit()
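
A quick way to see how the two rewrites differ is to run them over a few awkward inputs. The helper names `via_replace` and `via_lstrip` below are hypothetical wrappers around the expressions above.

```python
def via_replace(s):
    # Removes every '-' in the string, not just a leading one.
    return s.replace('-', '').isdigit()

def via_lstrip(s):
    # Removes any run of leading '+' or '-' characters.
    return s.lstrip('+-').isdigit()

for s in ["-5", "+5", "5", "1-2", "--5", "-"]:
    print(repr(s), via_replace(s), via_lstrip(s))
```

In this sketch '1-2' passes the replace version but not the lstrip version, while both accept '--5', which int() itself rejects.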

Answer 10

I really liked Shavais’ post, but I added one more test case (and the built-in isdigit() function):

def isInt_loop(v):
    v = str(v).strip()
    # swapping '0123456789' for '9876543210' makes nominal difference (might have because '1' is toward the beginning of the string)
    numbers = '0123456789'
    for i in v:
        if i not in numbers:
            return False
    return True

def isInt_Digit(v):
    v = str(v).strip()
    return v.isdigit()

and they consistently and significantly beat the times of the rest:

timings..
isInt_try:   0.4628
isInt_str:   0.3556
isInt_re:    0.4889
isInt_re2:   0.2726
isInt_loop:   0.1842
isInt_Digit:   0.1577

using normal 2.7 python:

$ python --version
Python 2.7.10

Both of the test cases I added (isInt_loop and isInt_Digit) pass exactly the same test cases (they both accept only unsigned integers), but I thought that people could be more clever with modifying the string implementation (isInt_loop) as opposed to the built-in isdigit() function, so I included it, even though there’s a slight difference in execution time. (Both methods beat everything else by a lot, but they don’t handle the extra stuff: “./+/-”.)

Also, I did find it interesting to note that the regex (isInt_re2 method) beat the string comparison in the same test that was performed by Shavais in 2012 (currently 2018). Maybe the regex libraries have been improved?
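
The benchmark harness itself is not shown in this answer, so the following is only a rough sketch of how the two functions could be timed with the standard `timeit` module. The `samples` list and the repeat count are assumptions, not the original test data.

```python
from timeit import timeit

def isInt_loop(v):
    v = str(v).strip()
    numbers = '0123456789'
    for i in v:
        if i not in numbers:
            return False
    return True

def isInt_Digit(v):
    v = str(v).strip()
    return v.isdigit()

# Hypothetical inputs; the original benchmark data is not given in the answer.
samples = ["123", "12a", " 42 ", "-7"]

for fn in (isInt_loop, isInt_Digit):
    seconds = timeit(lambda: [fn(s) for s in samples], number=10000)
    print("%-12s %.4f" % (fn.__name__, seconds))
```

One difference that falls outside inputs like these: for an empty string the loop version returns True (the loop body never runs), while isdigit() returns False.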


Answer 11

This is probably the most straightforward and pythonic way to approach it in my opinion. I didn’t see this solution and it’s basically the same as the regex one, but without the regex.

def is_int(test):
    import string
    return not (set(test) - set(string.digits))
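
One edge case to be aware of: for an empty string the set difference is empty, so the function above returns True. A hedged variant with an explicit guard might look like this (the module-level `_DIGITS` name is an illustrative choice):

```python
import string

_DIGITS = set(string.digits)

def is_int(test):
    # Non-empty, and every character is one of 0-9.
    return bool(test) and set(test) <= _DIGITS

print(is_int("123"))  # True
print(is_int(""))     # False
print(is_int("12a"))  # False
```

Like the original, this accepts unsigned digits only, so '-1' is rejected.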

Answer 12

Here is a function that parses without raising errors. It handles the obvious cases and returns None on failure (handles up to 2000 ‘-/+’ signs by default on CPython!):

#!/usr/bin/env python

def get_int(number):
    splits = number.split('.')
    if len(splits) > 2:
        # too many splits
        return None
    if len(splits) == 2 and splits[1]:
        # handle decimal part recursively :-)
        if get_int(splits[1]) != 0:
            return None

    int_part = splits[0].lstrip("+")
    if int_part.startswith('-'):
        # handle minus sign recursively :-)
        inner = get_int(int_part[1:])
        return None if inner is None else -inner
    # return None (rather than False) when the remaining text is not all digits
    return int(int_part) if int_part.isdigit() else None

Some tests:

tests = ["0", "0.0", "0.1", "1", "1.1", "1.0", "-1", "-1.1", "-1.0", "-0", "--0", "---3", '.3', '--3.', "+13", "+-1.00", "--+123", "-0.000"]

for t in tests:
    print("get_int(%s) = %s" % (t, get_int(str(t))))

Results:

get_int(0) = 0
get_int(0.0) = 0
get_int(0.1) = None
get_int(1) = 1
get_int(1.1) = None
get_int(1.0) = 1
get_int(-1) = -1
get_int(-1.1) = None
get_int(-1.0) = -1
get_int(-0) = 0
get_int(--0) = 0
get_int(---3) = -3
get_int(.3) = None
get_int(--3.) = 3
get_int(+13) = 13
get_int(+-1.00) = -1
get_int(--+123) = 123
get_int(-0.000) = 0

For your needs you can use:

def int_predicate(number):
     return get_int(number) is not None

Answer 13

I suggest the following:

import ast

def is_int(s):
    return isinstance(ast.literal_eval(s), int)

From the docs:

Safely evaluate an expression node or a string containing a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and None.

I should note that this will raise a ValueError (or a SyntaxError, for input that is not valid Python syntax) when called against anything that does not constitute a Python literal. Since the question asked for a solution without try/except, I have a Kobayashi-Maru type solution for that:

from ast import literal_eval
from contextlib import suppress

def is_int(s):
    # literal_eval raises ValueError or SyntaxError for non-literals
    with suppress(ValueError, SyntaxError):
        return isinstance(literal_eval(s), int)
    return False

¯\_(ツ)_/¯


Answer 14

I have one possibility that doesn’t use int at all, and should not raise an exception unless the string does not represent a number

float(number)==float(number)//1

It should work for any kind of string that float accepts, positive, negative, engineering notation…
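
As a sketch, the expression can be wrapped in a function like the one below. The name `is_whole_number` is hypothetical, and as noted above `float()` still raises ValueError for text that is not a number.

```python
def is_whole_number(number):
    # float() raises ValueError for non-numeric strings.
    value = float(number)
    # Floor division by 1 drops any fractional part.
    return value == value // 1

print(is_whole_number("3"))    # True
print(is_whole_number("3.0"))  # True
print(is_whole_number("1e3"))  # True
print(is_whole_number("-2"))   # True
print(is_whole_number("3.5"))  # False
```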


Answer 15

I guess the question is related to speed, since try/except has a time penalty:

 test data

First, I created a list of 200 strings, 100 failing strings and 100 numeric strings.

import numpy as np
from random import shuffle
numbers = [u'+1'] * 100
nonumbers = [u'1abc'] * 100
testlist = numbers + nonumbers
shuffle(testlist)
testlist = np.array(testlist)

 numpy solution (only works with arrays and unicode)

np.core.defchararray.isnumeric also works with unicode strings, e.g. np.core.defchararray.isnumeric(u'+12'), but it returns an array. So it’s a good solution if you have to do thousands of conversions and have missing or non-numeric data.

import numpy as np
%timeit np.core.defchararray.isnumeric(testlist)
10000 loops, best of 3: 27.9 µs per loop # 200 numbers per loop

try/except

def check_num(s):
  try:
    int(s)
    return True
  except (ValueError, TypeError):
    # int() raises ValueError for non-numeric text and TypeError for e.g. None
    return False

def check_list(l):
  return [check_num(e) for e in l]

%timeit check_list(testlist)
1000 loops, best of 3: 217 µs per loop # 200 numbers per loop

It seems that the numpy solution is much faster.


Answer 16

If you want to accept lower-ascii digits only, here are tests to do so:

Python 3.7+: (u.isdecimal() and u.isascii())

Python <= 3.6: (u.isdecimal() and u == str(int(u)))

Other answers suggest using .isdigit() or .isdecimal() but these both include some upper-unicode characters such as '٢' (u'\u0662'):

u = u'\u0662'     # '٢'
u.isdigit()       # True
u.isdecimal()     # True
u.isascii()       # False (Python 3.7+ only)
u == str(int(u))  # False
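
Packaged as a function, the Python 3.7+ test could look like the sketch below. The name `is_ascii_int` is illustrative only.

```python
def is_ascii_int(u):
    # isdecimal() alone accepts non-ASCII digits such as '\u0662';
    # isascii() (Python 3.7+) restricts the match to 0-9.
    return u.isdecimal() and u.isascii()

print(is_ascii_int("123"))     # True
print(is_ascii_int("\u0662"))  # False
print(is_ascii_int(""))        # False
```

Signs are not handled here, so '-1' is rejected as well.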

Answer 17

Uh.. Try this:

def int_check(a):
    if int(a) == a:
        return True
    else:
        return False

This works for numeric input such as 2 or 2.0. A string that is not a number raises ValueError, and a numeric string such as '2' never compares equal to its int value, so it returns False.

Also (I forgot to include the number-check part), there is a method that checks whether a string consists of digits: str.isdigit(). Here’s an example:

a = "2"
a.isdigit()

If you call a.isdigit(), it will return True.


What are the advantages of NumPy over regular Python lists?

Question: What are the advantages of NumPy over regular Python lists?

What are the advantages of NumPy over regular Python lists?

I have approximately 100 financial markets series, and I am going to create a cube array of 100x100x100 = 1 million cells. I will be regressing (3-variable) each x with each y and z, to fill the array with standard errors.

I have heard that for “large matrices” I should use NumPy as opposed to Python lists, for performance and scalability reasons. Thing is, I know Python lists and they seem to work for me.

What will the benefits be if I move to NumPy?

What if I had 1000 series (that is, 1 billion floating point cells in the cube)?


Answer 0

NumPy’s arrays are more compact than Python lists — a list of lists as you describe, in Python, would take at least 20 MB or so, while a NumPy 3D array with single-precision floats in the cells would fit in 4 MB. Access in reading and writing items is also faster with NumPy.

Maybe you don’t care that much for just a million cells, but you definitely would for a billion cells — neither approach would fit in a 32-bit architecture, but with 64-bit builds NumPy would get away with 4 GB or so, Python alone would need at least about 12 GB (lots of pointers which double in size) — a much costlier piece of hardware!

The difference is mostly due to “indirectness”: a Python list is an array of pointers to Python objects, at least 4 bytes per pointer plus 16 bytes for even the smallest Python object (4 for the type pointer, 4 for the reference count, 4 for the value, and the memory allocator rounds up to 16). A NumPy array is an array of uniform values: single-precision numbers take 4 bytes each, double-precision ones 8 bytes. Less flexible, but you pay substantially for the flexibility of standard Python lists!
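
The array side of this comparison is easy to check directly. The sketch below assumes NumPy is installed. The per-object figures printed for the list side depend on the Python build (on a typical 64-bit CPython, pointers are 8 bytes and a float object is larger than the 32-bit sizes quoted above), so no expected values are shown for them.

```python
import sys
import numpy as np

n = 100
cube32 = np.zeros((n, n, n), dtype=np.float32)
cube64 = np.zeros((n, n, n), dtype=np.float64)
print(cube32.nbytes)  # 4000000 bytes of cell data
print(cube64.nbytes)  # 8000000 bytes of cell data

# List side: each innermost list stores one pointer per cell,
# and each distinct float is a separate Python object.
row = [float(i) for i in range(n)]
print(sys.getsizeof(row))   # list header plus the pointer array
print(sys.getsizeof(1.0))   # size of one float object
```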


Answer 1

NumPy is not just more efficient; it is also more convenient. You get a lot of vector and matrix operations for free, which sometimes allow one to avoid unnecessary work. And they are also efficiently implemented.

For example, you could read your cube directly from a file into an array:

import numpy
x = numpy.fromfile("data", dtype=float).reshape((100, 100, 100))

Sum along the second dimension:

s = x.sum(axis=1)

Find which cells are above a threshold:

(x > 0.5).nonzero()

Keep every even-indexed slice along the third dimension (dropping the odd-indexed ones):

x[:, :, ::2]

Also, many useful libraries work with NumPy arrays. For example, statistical analysis and visualization libraries.

Even if you don’t have performance problems, learning NumPy is worth the effort.


Answer 2

Alex mentioned memory efficiency, and Roberto mentions convenience, and these are both good points. For a few more ideas, I’ll mention speed and functionality.

Functionality: You get a lot built in with NumPy: FFTs, convolutions, fast searching, basic statistics, linear algebra, histograms, etc. And really, who can live without FFTs?

Speed: Here’s a test on doing a sum over a list and a NumPy array, showing that the sum on the NumPy array is 10x faster (in this test — mileage may vary).

from numpy import arange
from timeit import Timer

Nelements = 10000
Ntimeits = 10000

x = arange(Nelements)
y = list(range(Nelements))  # an actual list; in Python 3, range() alone is lazy

t_numpy = Timer("x.sum()", "from __main__ import x")
t_list = Timer("sum(y)", "from __main__ import y")
print("numpy: %.3e" % (t_numpy.timeit(Ntimeits)/Ntimeits,))
print("list:  %.3e" % (t_list.timeit(Ntimeits)/Ntimeits,))

which on my system (while I’m running a backup) gives:

numpy: 3.004e-05
list:  5.363e-04

Answer 3

Here’s a nice answer from the FAQ on the scipy.org website:

What advantages do NumPy arrays offer over (nested) Python lists?

Python’s lists are efficient general-purpose containers. They support (fairly) efficient insertion, deletion, appending, and concatenation, and Python’s list comprehensions make them easy to construct and manipulate. However, they have certain limitations: they don’t support “vectorized” operations like elementwise addition and multiplication, and the fact that they can contain objects of differing types mean that Python must store type information for every element, and must execute type dispatching code when operating on each element. This also means that very few list operations can be carried out by efficient C loops – each iteration would require type checks and other Python API bookkeeping.
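
To make the “vectorized” point concrete, here is a small sketch (assuming NumPy is installed) contrasting what `+` does on lists and on arrays. The variable names are illustrative.

```python
import numpy as np

a = [1.0, 2.0, 3.0]
b = [10.0, 20.0, 30.0]

# Lists: + concatenates, so element-wise work needs an explicit Python loop.
print(a + b)                          # [1.0, 2.0, 3.0, 10.0, 20.0, 30.0]
print([x + y for x, y in zip(a, b)])  # [11.0, 22.0, 33.0]

# Arrays: + and * are element-wise and run in compiled loops.
xa, xb = np.array(a), np.array(b)
print(xa + xb)  # [11. 22. 33.]
print(xa * xb)  # [10. 40. 90.]
```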


Answer 4

The other answers have highlighted almost all of the major differences between NumPy arrays and Python lists; I will just summarize them here:

  1. NumPy arrays have a fixed size at creation, unlike Python lists (which can grow dynamically). Changing the size of an ndarray creates a new array and deletes the original.

  2. The elements of a NumPy array must all be of the same data type (heterogeneous types are possible too, but they will not permit mathematical operations), and thus are the same size in memory.

  3. NumPy arrays facilitate advanced mathematical and other types of operations on large amounts of data. Typically, such operations are executed more efficiently and with less code than is possible using Python’s built-in sequences.
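
The first two points can be seen in a few lines. This is a sketch assuming NumPy is installed, with illustrative variable names.

```python
import numpy as np

lst = [1, 2, 3]
lst.append(4)                 # a list grows in place

arr = np.array([1, 2, 3])
bigger = np.append(arr, 4)    # returns a new array; arr is unchanged
print(arr.shape, bigger.shape)   # (3,) (4,)

# Mixed numeric input is coerced to a single dtype.
print(np.array([1, 2.5, 3]).dtype)             # float64
# Truly heterogeneous elements need dtype=object.
print(np.array([1, "a"], dtype=object).dtype)  # object
```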


Difference between map, applymap and apply methods in Pandas

Question: Difference between map, applymap and apply methods in Pandas

Can you tell me when to use these vectorization methods with basic examples?

I see that map is a Series method whereas the rest are DataFrame methods. I got confused about apply and applymap methods though. Why do we have two methods for applying a function to a DataFrame? Again, simple examples which illustrate the usage would be great!


Answer 0

Straight from Wes McKinney’s Python for Data Analysis book, pg. 132 (I highly recommend this book):

Another frequent operation is applying a function on 1D arrays to each column or row. DataFrame’s apply method does exactly this:

In [116]: frame = DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [117]: frame
Out[117]: 
               b         d         e
Utah   -0.029638  1.081563  1.280300
Ohio    0.647747  0.831136 -1.549481
Texas   0.513416 -0.884417  0.195343
Oregon -0.485454 -0.477388 -0.309548

In [118]: f = lambda x: x.max() - x.min()

In [119]: frame.apply(f)
Out[119]: 
b    1.133201
d    1.965980
e    2.829781
dtype: float64

Many of the most common array statistics (like sum and mean) are DataFrame methods, so using apply is not necessary.

Element-wise Python functions can be used, too. Suppose you wanted to compute a formatted string from each floating point value in frame. You can do this with applymap:

In [120]: format = lambda x: '%.2f' % x

In [121]: frame.applymap(format)
Out[121]: 
            b      d      e
Utah    -0.03   1.08   1.28
Ohio     0.65   0.83  -1.55
Texas    0.51  -0.88   0.20
Oregon  -0.49  -0.48  -0.31

The reason for the name applymap is that Series has a map method for applying an element-wise function:

In [122]: frame['e'].map(format)
Out[122]: 
Utah       1.28
Ohio      -1.55
Texas      0.20
Oregon    -0.31
Name: e, dtype: object

Summing up, apply works on a row / column basis of a DataFrame, applymap works element-wise on a DataFrame, and map works element-wise on a Series.
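
The three calls can also be tried side by side on a small, deterministic frame. This is a sketch rather than the book's code: the data is made up, and because `DataFrame.applymap` is deprecated in newer pandas releases (2.1 and later) in favour of `DataFrame.map`, the sketch picks whichever of the two exists.

```python
import pandas as pd

frame = pd.DataFrame({"b": [1.0, 4.0], "d": [2.5, 0.5]}, index=["Utah", "Ohio"])

# apply: the function receives a whole column (a Series) at a time.
spread = frame.apply(lambda col: col.max() - col.min())

# Element-wise over a DataFrame: DataFrame.map where available, else applymap.
elementwise = getattr(frame, "map", None) or frame.applymap
formatted = elementwise(lambda x: "%.2f" % x)

# Series.map: element-wise over a single column.
col_b = frame["b"].map(lambda x: "%.2f" % x)

print(spread)
print(formatted)
print(col_b)
```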


Answer 1

Comparing map, applymap and apply: Context Matters

First major difference: DEFINITION

  • map is defined on Series ONLY
  • applymap is defined on DataFrames ONLY
  • apply is defined on BOTH

Second major difference: INPUT ARGUMENT

  • map accepts dicts, Series, or callable
  • applymap and apply accept callables only

Third major difference: BEHAVIOR

  • map is elementwise for Series
  • applymap is elementwise for DataFrames
  • apply also works elementwise but is suited to more complex operations and aggregation. The behaviour and return value depends on the function.

Fourth major difference (the most important one): USE CASE

  • map is meant for mapping values from one domain to another, so is optimised for performance (e.g., df['A'].map({1:'a', 2:'b', 3:'c'}))
  • applymap is good for elementwise transformations across multiple rows/columns (e.g., df[['A', 'B', 'C']].applymap(str.strip))
  • apply is for applying any function that cannot be vectorised (e.g., df['sentences'].apply(nltk.sent_tokenize))

Summarising

[Image: summary comparison of map, applymap, and apply]

Footnotes

  1. map, when passed a dictionary/Series, maps elements based on the keys in that dictionary/Series. Missing values are recorded as NaN in the output.
  2. applymap has been optimised for some operations in more recent versions. You will find applymap slightly faster than apply in some cases. My suggestion is to test them both and use whatever works better.
  3. map is optimised for elementwise mappings and transformations. Operations that involve dictionaries or Series enable pandas to use faster code paths for better performance.
  4. Series.apply returns a scalar for aggregating operations, and a Series otherwise. The same holds for DataFrame.apply. Note that apply also has fastpaths when called with certain NumPy functions such as mean, sum, etc.
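
Footnote 1 can be checked with a small sketch. The data and the `lookup` name are made up for illustration; the point is only that a value with no matching key comes back as NaN, whether the mapping is a dict or a Series.

```python
import pandas as pd

s = pd.Series([1, 2, 3, 4])
lookup = {1: "a", 2: "b", 3: "c"}   # deliberately no entry for 4

mapped = s.map(lookup)

# A Series works the same way: its index plays the role of the dict keys.
mapped_via_series = s.map(pd.Series(lookup))
```

In both results the last element is missing, because 4 is not a key of the mapping.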

Answer 2

There’s great information in these answers, but I’m adding my own to clearly summarize which methods work array-wise versus element-wise. jeremiahbuddha mostly did this but did not mention Series.apply. I don’t have the rep to comment.

  • DataFrame.apply operates on entire rows or columns at a time.

  • DataFrame.applymap, Series.apply, and Series.map operate on one element at a time.

There is a lot of overlap between the capabilities of Series.apply and Series.map, meaning that either one will work in most cases. They do have some slight differences though, some of which were discussed in osa’s answer.
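
One way to see the array-wise versus element-wise split is to check what the function actually receives. A minimal sketch with made-up data follows; the variable names are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})

# DataFrame.apply hands the function one whole column (a Series) at a time.
got_series = df.apply(lambda col: isinstance(col, pd.Series))

# Series.apply and Series.map hand the function one scalar at a time.
apply_got_series = df["A"].apply(lambda x: isinstance(x, pd.Series))
map_got_series = df["A"].map(lambda x: isinstance(x, pd.Series))
```

`got_series` is True for every column, while the two Series results are False for every element.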


Answer 3

Adding to the other answers: a Series also has map and apply.

Apply can make a DataFrame out of a series; however, map will just put a series in every cell of another series, which is probably not what you want.

In [40]: p=pd.Series([1,2,3])
In [41]: p
Out[41]:
0    1
1    2
2    3
dtype: int64

In [42]: p.apply(lambda x: pd.Series([x, x]))
Out[42]: 
   0  1
0  1  1
1  2  2
2  3  3

In [43]: p.map(lambda x: pd.Series([x, x]))
Out[43]: 
0    0    1
1    1
dtype: int64
1    0    2
1    2
dtype: int64
2    0    3
1    3
dtype: int64
dtype: object

Also if I had a function with side effects, such as “connect to a web server”, I’d probably use apply just for the sake of clarity.

series.apply(download_file_for_every_element) 

Map can use not only a function, but also a dictionary or another series. Let’s say you want to manipulate permutations.

Take

1 2 3 4 5
2 1 4 5 3

The square of this permutation is

1 2 3 4 5
1 2 5 3 4

You can compute it using map. Not sure if self-application is documented, but it works in 0.15.1.

In [39]: p=pd.Series([1,0,3,4,2])

In [40]: p.map(p)
Out[40]: 
0    0
1    1
2    4
3    2
4    3
dtype: int64
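
The self-application above can be cross-checked against plain NumPy indexing. This sketch assumes the 0-based form of the permutation used in the session (value i is sent to p[i]); `by_indexing` is an illustrative name.

```python
import pandas as pd

p = pd.Series([1, 0, 3, 4, 2])   # 0-based permutation: i -> p[i]

# map with a Series looks each value up in that Series' index.
squared = p.map(p)

# The same composition written as NumPy fancy indexing.
by_indexing = p.to_numpy()[p.to_numpy()]
```

Both give the permutation applied twice.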

Answer 4

@jeremiahbuddha mentioned that apply works on rows/columns, while applymap works element-wise. But it seems you can still use apply for element-wise computation:

    frame.apply(np.sqrt)
    Out[102]: 
                   b         d         e
    Utah         NaN  1.435159       NaN
    Ohio    1.098164  0.510594  0.729748
    Texas        NaN  0.456436  0.697337
    Oregon  0.359079       NaN       NaN

    frame.applymap(np.sqrt)
    Out[103]: 
                   b         d         e
    Utah         NaN  1.435159       NaN
    Ohio    1.098164  0.510594  0.729748
    Texas        NaN  0.456436  0.697337
    Oregon  0.359079       NaN       NaN
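
A likely reason the two calls agree is that np.sqrt is vectorised, so passing it a whole column gives the same values as passing it each cell. A function written for a single number behaves differently. The sketch below uses made-up data and a hypothetical `sign_label` helper.

```python
import numpy as np
import pandas as pd

frame = pd.DataFrame({"b": [4.0, 9.0], "d": [16.0, 25.0]})

# np.sqrt is vectorised, so it works on a whole column at once.
col_wise = frame.apply(np.sqrt)

def sign_label(x):
    # Written for a single number; `if` on a whole Series is ambiguous.
    return "pos" if x > 0 else "non-pos"

try:
    frame.apply(sign_label)      # receives a Series, so the `if` raises
    apply_failed = False
except ValueError:
    apply_failed = True

# Element-wise application works; use DataFrame.map where available,
# otherwise the older DataFrame.applymap.
elementwise = frame.map if hasattr(pd.DataFrame, "map") else frame.applymap
labels = elementwise(sign_label)
```

So apply only looks element-wise when the function itself handles whole arrays.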

Answer 5

Just wanted to point this out, as I struggled with it for a bit:

def f(x):
    if x < 0:
        x = 0
    elif x > 100000:
        x = 100000
    return x

df.applymap(f)
df.describe()

This does not modify the dataframe itself; the result has to be reassigned:

df = df.applymap(f)
df.describe()
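
For this particular clamping function, a vectorised alternative is DataFrame.clip, which also returns a new object rather than changing the frame in place. This is a sketch with made-up data, not the answer's own method.

```python
import pandas as pd

df = pd.DataFrame({"x": [-5, 50, 200000], "y": [1, -1, 100001]})

# clip returns a new DataFrame; df itself is left unchanged.
clipped = df.clip(lower=0, upper=100000)
```

As with applymap, the result must be assigned (for example `df = df.clip(...)`) if the clamped values should replace the originals.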

Answer 6

Probably the simplest explanation of the difference between apply and applymap:

apply takes a whole column as its argument and assigns the result back to that column.

applymap takes each individual cell value as its argument and assigns the result back to that cell.

NB: If apply returns a single value, you get that value in place of the column, so the result is a single row (one value per column) rather than a matrix.
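
The note about single return values can be illustrated as follows. The data is made up; the sketch only contrasts a function that returns a column with one that returns a scalar.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# The function returns a column, so the result is a DataFrame of the same shape.
same_shape = df.apply(lambda col: col * 2)

# The function returns one value per column, so the result is a Series.
reduced = df.apply(lambda col: col.max())
```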


Answer 7

My understanding:

From the function point of view:

If the function needs to compare values within a column/row, use apply.

e.g.: lambda x: x.max()-x.mean().

If the function is to be applied to each element:

1> If it targets a single column/row, use apply

2> If it targets the entire dataframe, use applymap

majority = lambda x : x > 17
df2['legal_drinker'] = df2['age'].apply(majority)

def times10(x):
  if type(x) is int:
    x *= 10 
  return x
df2.applymap(times10)

Answer 8

Based on the answer of cs95

  • map is defined on Series ONLY
  • applymap is defined on DataFrames ONLY
  • apply is defined on BOTH

Here are some examples:

In [3]: frame = pd.DataFrame(np.random.randn(4, 3), columns=list('bde'), index=['Utah', 'Ohio', 'Texas', 'Oregon'])

In [4]: frame
Out[4]:
            b         d         e
Utah    0.129885 -0.475957 -0.207679
Ohio   -2.978331 -1.015918  0.784675
Texas  -0.256689 -0.226366  2.262588
Oregon  2.605526  1.139105 -0.927518

In [5]: myformat=lambda x: f'{x:.2f}'

In [6]: frame.d.map(myformat)
Out[6]:
Utah      -0.48
Ohio      -1.02
Texas     -0.23
Oregon     1.14
Name: d, dtype: object

In [7]: frame.d.apply(myformat)
Out[7]:
Utah      -0.48
Ohio      -1.02
Texas     -0.23
Oregon     1.14
Name: d, dtype: object

In [8]: frame.applymap(myformat)
Out[8]:
            b      d      e
Utah     0.13  -0.48  -0.21
Ohio    -2.98  -1.02   0.78
Texas   -0.26  -0.23   2.26
Oregon   2.61   1.14  -0.93

In [9]: frame.apply(lambda x: x.apply(myformat))
Out[9]:
            b      d      e
Utah     0.13  -0.48  -0.21
Ohio    -2.98  -1.02   0.78
Texas   -0.26  -0.23   2.26
Oregon   2.61   1.14  -0.93


In [10]: myfunc=lambda x: x**2

In [11]: frame.applymap(myfunc)
Out[11]:
            b         d         e
Utah    0.016870  0.226535  0.043131
Ohio    8.870453  1.032089  0.615714
Texas   0.065889  0.051242  5.119305
Oregon  6.788766  1.297560  0.860289

In [12]: frame.apply(myfunc)
Out[12]:
            b         d         e
Utah    0.016870  0.226535  0.043131
Ohio    8.870453  1.032089  0.615714
Texas   0.065889  0.051242  5.119305
Oregon  6.788766  1.297560  0.860289

Answer 9

FOMO:

The following example shows apply and applymap applied to a DataFrame.

The map method applies to Series only. You cannot call map on a DataFrame (at least in the pandas versions this answer was written for; newer releases add a DataFrame.map that replaces applymap).

The thing to remember is that apply can do anything applymap can, but apply has eXtra options.

The X factor options are axis and result_type, where result_type only works when axis=1 (for columns).

import numpy as np
from pandas import DataFrame

df = DataFrame(1, columns=list('abc'),
                  index=list('1234'))
print(df)

f = lambda x: np.log(x)
print(df.applymap(f)) # applied element by element to the whole dataframe
print(np.log(df)) # same result, vectorised over the whole dataframe
print(df.applymap(np.sum)) # no reduction: np.sum of a single element is that element

# apply can take different options (vs. applymap cannot)
print(df.apply(f)) # same as applymap
print(df.apply(sum, axis=1))  # reducing example: one sum per row
print(df.apply(np.log, axis=1)) # np.log does not reduce, so the shape is unchanged
print(df.apply(lambda x: [1, 2, 3], axis=1, result_type='expand')) # expand result

As a side note, the Series map method should not be confused with Python’s built-in map function.

The first is applied to a Series to map its values; the second applies a function to every item of an iterable.


Lastly, don’t confuse the DataFrame apply method with the groupby apply method.


Convert pandas dataframe to NumPy array

Question: Convert pandas dataframe to NumPy array

I am interested in knowing how to convert a pandas dataframe into a NumPy array.

dataframe:

import numpy as np
import pandas as pd

index = [1, 2, 3, 4, 5, 6, 7]
a = [np.nan, np.nan, np.nan, 0.1, 0.1, 0.1, 0.1]
b = [0.2, np.nan, 0.2, 0.2, 0.2, np.nan, np.nan]
c = [np.nan, 0.5, 0.5, np.nan, 0.5, 0.5, np.nan]
df = pd.DataFrame({'A': a, 'B': b, 'C': c}, index=index)
df = df.rename_axis('ID')

gives

label   A    B    C
ID                                 
1   NaN  0.2  NaN
2   NaN  NaN  0.5
3   NaN  0.2  0.5
4   0.1  0.2  NaN
5   0.1  0.2  0.5
6   0.1  NaN  0.5
7   0.1  NaN  NaN

I would like to convert this to a NumPy array, as so:

array([[ nan,  0.2,  nan],
       [ nan,  nan,  0.5],
       [ nan,  0.2,  0.5],
       [ 0.1,  0.2,  nan],
       [ 0.1,  0.2,  0.5],
       [ 0.1,  nan,  0.5],
       [ 0.1,  nan,  nan]])

How can I do this?


As a bonus, is it possible to preserve the dtypes, like this?

array([[ 1, nan,  0.2,  nan],
       [ 2, nan,  nan,  0.5],
       [ 3, nan,  0.2,  0.5],
       [ 4, 0.1,  0.2,  nan],
       [ 5, 0.1,  0.2,  0.5],
       [ 6, 0.1,  nan,  0.5],
       [ 7, 0.1,  nan,  nan]],
     dtype=[('ID', '<i4'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

or similar?


Answer 0

To convert a pandas dataframe (df) to a numpy ndarray, use this code:

df.values

array([[nan, 0.2, nan],
       [nan, nan, 0.5],
       [nan, 0.2, 0.5],
       [0.1, 0.2, nan],
       [0.1, 0.2, 0.5],
       [0.1, nan, 0.5],
       [0.1, nan, nan]])

Answer 1

Deprecate your usage of values and as_matrix()!

pandas v0.24.0 introduced two new methods for obtaining NumPy arrays from pandas objects:

  1. to_numpy(), which is defined on Index, Series, and DataFrame objects, and
  2. array, which is defined on Index and Series objects only.

If you visit the v0.24 docs for .values, you will see a big red warning that says:

Warning: We recommend using DataFrame.to_numpy() instead.

See this section of the v0.24.0 release notes, and this answer for more information.


Towards Better Consistency: to_numpy()

In the spirit of better consistency throughout the API, a new method to_numpy has been introduced to extract the underlying NumPy array from DataFrames.

# Setup.
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=['a', 'b', 'c'])

df.to_numpy()
array([[1, 4],
       [2, 5],
       [3, 6]])

As mentioned above, this method is also defined on Index and Series objects (see here).

df.index.to_numpy()
# array(['a', 'b', 'c'], dtype=object)

df['A'].to_numpy()
#  array([1, 2, 3])

By default, a view may be returned (for example, when all columns share one dtype, as here), so modifications to the array can affect the original.

v = df.to_numpy()
v[0, 0] = -1

df
   A  B
a -1  4
b  2  5
c  3  6

If you need a copy instead, use to_numpy(copy=True).
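
A minimal sketch of the copy=True behaviour, using the same small frame as above. The `shares` variable is only there to illustrate the point and is not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

arr = df.to_numpy(copy=True)
arr[0, 0] = -1                      # changes the copy only

# The copy does not share memory with the frame's own data.
shares = np.shares_memory(arr, df.to_numpy())
```

After this, `df` still holds 1 in its first cell.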

pandas >= 1.0 update for ExtensionTypes

If you’re using pandas 1.x, chances are you’ll be dealing with extension types a lot more. You’ll have to be a little more careful that these extension types are correctly converted.

a = pd.array([1, 2, None], dtype="Int64")                                  
a                                                                          

<IntegerArray>
[1, 2, <NA>]
Length: 3, dtype: Int64 

# Wrong
a.to_numpy()                                                               
# array([1, 2, <NA>], dtype=object)  # yuck, objects

# Right
a.to_numpy(dtype='float', na_value=np.nan)                                 
# array([ 1.,  2., nan])

This is called out in the docs.

If you need the dtypes

As shown in another answer, DataFrame.to_records is a good way to do this.

df.to_records()
# rec.array([('a', -1, 4), ('b',  2, 5), ('c',  3, 6)],
#           dtype=[('index', 'O'), ('A', '<i8'), ('B', '<i8')])

This cannot be done with to_numpy, unfortunately. However, as an alternative, you can use np.rec.fromrecords:

v = df.reset_index()
np.rec.fromrecords(v, names=v.columns.tolist())
# rec.array([('a', -1, 4), ('b',  2, 5), ('c',  3, 6)],
#          dtype=[('index', '<U1'), ('A', '<i8'), ('B', '<i8')])

Performance-wise, it’s nearly the same (in this timing, rec.fromrecords is slightly faster).

df2 = pd.concat([df] * 10000)

%timeit df2.to_records()
%%timeit
v = df2.reset_index()
np.rec.fromrecords(v, names=v.columns.tolist())

11.1 ms ± 557 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
9.67 ms ± 126 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Rationale for Adding a New Method

to_numpy() (in addition to array) was added as a result of discussions under two GitHub issues GH19954 and GH23623.

Specifically, the docs mention the rationale:

[…] with .values it was unclear whether the returned value would be the actual array, some transformation of it, or one of pandas custom arrays (like Categorical). For example, with PeriodIndex, .values generates a new ndarray of period objects each time. […]

to_numpy aims to improve the consistency of the API, which is a major step in the right direction. .values will not be deprecated in the current version, but I expect this may happen at some point in the future, so I would urge users to migrate to the newer API as soon as they can.


Critique of Other Solutions

DataFrame.values has inconsistent behaviour, as already noted.

DataFrame.get_values() is simply a wrapper around DataFrame.values, so everything said above applies.

DataFrame.as_matrix() is deprecated now, do NOT use!


Answer 2

Note: The .as_matrix() method used in this answer is deprecated. Pandas 0.23.4 warns:

Method .as_matrix will be removed in a future version. Use .values instead.


Pandas has something built in…

numpy_matrix = df.as_matrix()

gives

array([[nan, 0.2, nan],
       [nan, nan, 0.5],
       [nan, 0.2, 0.5],
       [0.1, 0.2, nan],
       [0.1, 0.2, 0.5],
       [0.1, nan, 0.5],
       [0.1, nan, nan]])

Answer 3

I would just chain the DataFrame.reset_index() and DataFrame.values functions to get the Numpy representation of the dataframe, including the index:

In [8]: df
Out[8]: 
          A         B         C
0 -0.982726  0.150726  0.691625
1  0.617297 -0.471879  0.505547
2  0.417123 -1.356803 -1.013499
3 -0.166363 -0.957758  1.178659
4 -0.164103  0.074516 -0.674325
5 -0.340169 -0.293698  1.231791
6 -1.062825  0.556273  1.508058
7  0.959610  0.247539  0.091333

[8 rows x 3 columns]

In [9]: df.reset_index().values
Out[9]:
array([[ 0.        , -0.98272574,  0.150726  ,  0.69162512],
       [ 1.        ,  0.61729734, -0.47187926,  0.50554728],
       [ 2.        ,  0.4171228 , -1.35680324, -1.01349922],
       [ 3.        , -0.16636303, -0.95775849,  1.17865945],
       [ 4.        , -0.16410334,  0.0745164 , -0.67432474],
       [ 5.        , -0.34016865, -0.29369841,  1.23179064],
       [ 6.        , -1.06282542,  0.55627285,  1.50805754],
       [ 7.        ,  0.95961001,  0.24753911,  0.09133339]])

To get the dtypes we’d need to transform this ndarray into a structured array using view:

In [10]: df.reset_index().values.ravel().view(dtype=[('index', int), ('A', float), ('B', float), ('C', float)])
Out[10]:
array([( 0, -0.98272574,  0.150726  ,  0.69162512),
       ( 1,  0.61729734, -0.47187926,  0.50554728),
       ( 2,  0.4171228 , -1.35680324, -1.01349922),
       ( 3, -0.16636303, -0.95775849,  1.17865945),
       ( 4, -0.16410334,  0.0745164 , -0.67432474),
       ( 5, -0.34016865, -0.29369841,  1.23179064),
       ( 6, -1.06282542,  0.55627285,  1.50805754),
       ( 7,  0.95961001,  0.24753911,  0.09133339)],
       dtype=[('index', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
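
One caveat worth checking before relying on view here: as far as I can tell, reset_index().values upcasts an integer index to float alongside the float columns, and view reinterprets the underlying bytes rather than converting values. The sketch below uses made-up data to show the reinterpretation, plus a hedged alternative that converts column by column with to_records.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [0.5, 1.5], "B": [2.5, 3.5]})

# Reinterpreting float64 bytes as int64 does not convert the value:
bits = np.array([1.0]).view(np.int64)[0]

# Converting column by column keeps each field's own dtype.
rec = df.reset_index().to_records(index=False)
sarr = np.asarray(rec)   # ndarray view with the same named fields
```

Here `bits` is the raw bit pattern of 1.0 rather than the integer 1, whereas the 'index' field of `rec` holds real integers.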

Answer 4

You can use the to_records method, but have to play around a bit with the dtypes if they are not what you want from the get go. In my case, having copied your DF from a string, the index type is string (represented by an object dtype in pandas):

In [102]: df
Out[102]: 
label    A    B    C
ID                  
1      NaN  0.2  NaN
2      NaN  NaN  0.5
3      NaN  0.2  0.5
4      0.1  0.2  NaN
5      0.1  0.2  0.5
6      0.1  NaN  0.5
7      0.1  NaN  NaN

In [103]: df.index.dtype
Out[103]: dtype('object')
In [104]: df.to_records()
Out[104]: 
rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
       (4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
       (7, 0.1, nan, nan)], 
      dtype=[('index', '|O8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
In [106]: df.to_records().dtype
Out[106]: dtype([('index', '|O8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

Converting the recarray dtype does not work for me, but one can do this in Pandas already:

In [109]: df.index = df.index.astype('i8')
In [111]: df.to_records().view([('ID', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])
Out[111]:
rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
       (4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
       (7, 0.1, nan, nan)], 
      dtype=[('ID', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

Note that Pandas does not set the name of the index properly (to ID) in the exported record array (a bug?), so we profit from the type conversion to also correct for that.

At the moment Pandas has only 8-byte integers, i8, and floats, f8 (see this issue).


Answer 5

It seems like df.to_records() will work for you. The exact feature you’re looking for was requested, and to_records was pointed to as an alternative.

I tried this out locally using your example, and that call yields something very similar to the output you were looking for:

rec.array([(1, nan, 0.2, nan), (2, nan, nan, 0.5), (3, nan, 0.2, 0.5),
       (4, 0.1, 0.2, nan), (5, 0.1, 0.2, 0.5), (6, 0.1, nan, 0.5),
       (7, 0.1, nan, nan)],
      dtype=[(u'ID', '<i8'), (u'A', '<f8'), (u'B', '<f8'), (u'C', '<f8')])

Note that this is a recarray rather than an array. You can convert the result to a regular NumPy array by passing it to the array constructor: np.array(df.to_records()).
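
A small sketch of that round trip, using a two-row stand-in for the question's frame. The variable names `rec` and `plain` are illustrative, and the exact integer width of the 'ID' field may depend on the platform.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [np.nan, 0.1], "B": [0.2, np.nan]},
    index=pd.Index([1, 2], name="ID"),
)

rec = df.to_records()        # numpy.recarray; the index becomes the first field
plain = np.array(rec)        # ordinary structured ndarray with the same fields
```

Fields can then be read by name, for example `plain["A"]`.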


Answer 6

Try this:

a = numpy.asarray(df)

Answer 7

Here is my approach to making a structured array from a pandas DataFrame.

Create the data frame

import pandas as pd
import numpy as np
import six

NaN = float('nan')
ID = [1, 2, 3, 4, 5, 6, 7]
A = [NaN, NaN, NaN, 0.1, 0.1, 0.1, 0.1]
B = [0.2, NaN, 0.2, 0.2, 0.2, NaN, NaN]
C = [NaN, 0.5, 0.5, NaN, 0.5, 0.5, NaN]
columns = {'A':A, 'B':B, 'C':C}
df = pd.DataFrame(columns, index=ID)
df.index.name = 'ID'
print(df)

      A    B    C
ID               
1   NaN  0.2  NaN
2   NaN  NaN  0.5
3   NaN  0.2  0.5
4   0.1  0.2  NaN
5   0.1  0.2  0.5
6   0.1  NaN  0.5
7   0.1  NaN  NaN

Define a function to make a numpy structured array (not a record array) from a pandas DataFrame.

def df_to_sarray(df):
    """
    Convert a pandas DataFrame object to a numpy structured array.
    This is functionally equivalent to but more efficient than
    np.array(df.to_array())

    :param df: the data frame to convert
    :return: a numpy structured array representation of df
    """

    v = df.values
    cols = df.columns

    if six.PY2:  # python 2 needs .encode() but 3 does not
        types = [(cols[i].encode(), df[k].dtype.type) for (i, k) in enumerate(cols)]
    else:
        types = [(cols[i], df[k].dtype.type) for (i, k) in enumerate(cols)]
    dtype = np.dtype(types)
    z = np.zeros(v.shape[0], dtype)
    for (i, k) in enumerate(z.dtype.names):
        z[k] = v[:, i]
    return z

Use reset_index to make a new data frame that includes the index as part of its data. Convert that data frame to a structured array.

sa = df_to_sarray(df.reset_index())
sa

array([(1L, nan, 0.2, nan), (2L, nan, nan, 0.5), (3L, nan, 0.2, 0.5),
       (4L, 0.1, 0.2, nan), (5L, 0.1, 0.2, 0.5), (6L, 0.1, nan, 0.5),
       (7L, 0.1, nan, nan)], 
      dtype=[('ID', '<i8'), ('A', '<f8'), ('B', '<f8'), ('C', '<f8')])

EDIT: Updated df_to_sarray to avoid error calling .encode() with python 3. Thanks to Joseph Garvin and halcyon for their comment and solution.
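
For Python 3 only code, the same idea can be sketched without six. This is a minimal sketch under stated assumptions, not the function from the answer: the name df_to_sarray_py3 is hypothetical, column labels are assumed to be unique, and the data is copied column by column so that each field keeps its own dtype.

```python
import numpy as np
import pandas as pd


def df_to_sarray_py3(df):
    """Hypothetical Python 3 variant: build a structured array column by column."""
    # One (name, dtype) pair per column; str() guards against non-string labels.
    dtype = np.dtype([(str(name), df[name].dtype) for name in df.columns])
    out = np.zeros(len(df), dtype=dtype)
    for name in df.columns:
        out[str(name)] = df[name].to_numpy()
    return out


df = pd.DataFrame(
    {"A": [float("nan"), 0.1], "B": [0.2, 0.2]},
    index=pd.Index([1, 2], name="ID"),
)
# reset_index() turns the index into an ordinary column so it becomes a field too.
sa = df_to_sarray_py3(df.reset_index())
print(sa.dtype.names)
```

Each field can then be read by name, for example sa["ID"] or sa["B"].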


回答 8

将数据帧转换为其Numpy数组表示形式的两种方法。

  • mah_np_array = df.as_matrix(columns=None)

  • mah_np_array = df.values

Doc: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.as_matrix.html

Two ways to convert the data-frame to its Numpy-array representation.

  • mah_np_array = df.as_matrix(columns=None)

  • mah_np_array = df.values

Doc: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.as_matrix.html


回答 9

示例DataFrame的更简单方法:

df

         gbm       nnet        reg
0  12.097439  12.047437  12.100953
1  12.109811  12.070209  12.095288
2  11.720734  11.622139  11.740523
3  11.824557  11.926414  11.926527
4  11.800868  11.727730  11.729737
5  12.490984  12.502440  12.530894

采用:

np.array(df.to_records().view(type=np.matrix))

得到:

array([[(0, 12.097439  , 12.047437, 12.10095324),
        (1, 12.10981081, 12.070209, 12.09528824),
        (2, 11.72073428, 11.622139, 11.74052253),
        (3, 11.82455653, 11.926414, 11.92652727),
        (4, 11.80086775, 11.72773 , 11.72973699),
        (5, 12.49098389, 12.50244 , 12.53089367)]],
dtype=(numpy.record, [('index', '<i8'), ('gbm', '<f8'), ('nnet', '<f4'),
       ('reg', '<f8')]))

A Simpler Way for Example DataFrame:

df

         gbm       nnet        reg
0  12.097439  12.047437  12.100953
1  12.109811  12.070209  12.095288
2  11.720734  11.622139  11.740523
3  11.824557  11.926414  11.926527
4  11.800868  11.727730  11.729737
5  12.490984  12.502440  12.530894

USE:

np.array(df.to_records().view(type=np.matrix))

GET:

array([[(0, 12.097439  , 12.047437, 12.10095324),
        (1, 12.10981081, 12.070209, 12.09528824),
        (2, 11.72073428, 11.622139, 11.74052253),
        (3, 11.82455653, 11.926414, 11.92652727),
        (4, 11.80086775, 11.72773 , 11.72973699),
        (5, 12.49098389, 12.50244 , 12.53089367)]],
dtype=(numpy.record, [('index', '<i8'), ('gbm', '<f8'), ('nnet', '<f4'),
       ('reg', '<f8')]))

回答 10

从数据框导出到arcgis表时遇到了类似的问题,偶然发现了来自usgs的解决方案(https://my.usgs.gov/confluence/display/cdi/pandas.DataFrame+to+ArcGIS+Table)。简而言之,您的问题具有类似的解决方案:

df

      A    B    C
ID               
1   NaN  0.2  NaN
2   NaN  NaN  0.5
3   NaN  0.2  0.5
4   0.1  0.2  NaN
5   0.1  0.2  0.5
6   0.1  NaN  0.5
7   0.1  NaN  NaN

np_data = np.array(np.rec.fromrecords(df.values))
np_names = df.dtypes.index.tolist()
np_data.dtype.names = tuple([name.encode('UTF8') for name in np_names])

np_data

array([( nan,  0.2,  nan), ( nan,  nan,  0.5), ( nan,  0.2,  0.5),
       ( 0.1,  0.2,  nan), ( 0.1,  0.2,  0.5), ( 0.1,  nan,  0.5),
       ( 0.1,  nan,  nan)], 
      dtype=(numpy.record, [('A', '<f8'), ('B', '<f8'), ('C', '<f8')]))

Just had a similar problem when exporting from dataframe to arcgis table and stumbled on a solution from usgs (https://my.usgs.gov/confluence/display/cdi/pandas.DataFrame+to+ArcGIS+Table). In short your problem has a similar solution:

df

      A    B    C
ID               
1   NaN  0.2  NaN
2   NaN  NaN  0.5
3   NaN  0.2  0.5
4   0.1  0.2  NaN
5   0.1  0.2  0.5
6   0.1  NaN  0.5
7   0.1  NaN  NaN

np_data = np.array(np.rec.fromrecords(df.values))
np_names = df.dtypes.index.tolist()
np_data.dtype.names = tuple([name.encode('UTF8') for name in np_names])

np_data

array([( nan,  0.2,  nan), ( nan,  nan,  0.5), ( nan,  0.2,  0.5),
       ( 0.1,  0.2,  nan), ( 0.1,  0.2,  0.5), ( 0.1,  nan,  0.5),
       ( 0.1,  nan,  nan)], 
      dtype=(numpy.record, [('A', '<f8'), ('B', '<f8'), ('C', '<f8')]))

回答 11

我经历了以上答案。“ as_matrix() ”方法可以使用,但是现在已经过时了。对我来说,有效的是“ .to_numpy() ”。

这将返回一个多维数组。如果您要从excel工作表中读取数据,并且需要从任何索引访问数据,则我会更喜欢使用此方法。希望这可以帮助 :)

I went through the answers above. The as_matrix() method works, but it is obsolete now. What worked for me was .to_numpy().

This returns a multidimensional array. I prefer this method when reading data from an Excel sheet and accessing the data by index. Hope this helps :)


回答 12

除了陨石的答案,我找到了代码

df.index = df.index.astype('i8')

对我不起作用。因此,我将代码放在这里,以方便陷入此问题的其他人。

city_cluster_df = pd.read_csv(text_filepath, encoding='utf-8')
# the field 'city_en' is a string, when converted to Numpy array, it will be an object
city_cluster_arr = city_cluster_df[['city_en','lat','lon','cluster','cluster_filtered']].to_records()
descr=city_cluster_arr.dtype.descr
# change the field 'city_en' to string type (the index for 'city_en' here is 1 because before the field is the row index of dataframe)
descr[1]=(descr[1][0], "S20")
newArr=city_cluster_arr.astype(np.dtype(descr))

Further to meteore’s answer, I found the code

df.index = df.index.astype('i8')

doesn’t work for me. So I put my code here for the convenience of others stuck with this issue.

city_cluster_df = pd.read_csv(text_filepath, encoding='utf-8')
# the field 'city_en' is a string, when converted to Numpy array, it will be an object
city_cluster_arr = city_cluster_df[['city_en','lat','lon','cluster','cluster_filtered']].to_records()
descr=city_cluster_arr.dtype.descr
# change the field 'city_en' to string type (the index for 'city_en' here is 1 because before the field is the row index of dataframe)
descr[1]=(descr[1][0], "S20")
newArr=city_cluster_arr.astype(np.dtype(descr))

回答 13

将数据帧转换为numpy数组的简单方法:

import pandas as pd
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_to_array = df.to_numpy()
array([[1, 3],
   [2, 4]])

鼓励使用to_numpy来保持一致性。

参考:https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html

A simple way to convert dataframe to numpy array:

import pandas as pd
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
df_to_array = df.to_numpy()
array([[1, 3],
   [2, 4]])

Use of to_numpy is encouraged to preserve consistency.

Reference: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.to_numpy.html
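
A short sketch of what to_numpy does with mixed column types and with the index may help here. The column names and values are made up for illustration; the only API assumed is DataFrame.to_numpy with its dtype parameter and DataFrame.reset_index.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2], "B": [3.5, 4.5]},
    index=pd.Index([10, 20], name="ID"),
)

# The index is not part of the result; int and float columns share one common dtype.
values_only = df.to_numpy()

# reset_index() moves the index into a regular column so it appears in the array.
with_index = df.reset_index().to_numpy()

# A target dtype can be requested explicitly.
as_float32 = df.to_numpy(dtype="float32")

print(values_only.dtype, with_index.shape, as_float32.dtype)
```

Because a plain NumPy array has a single dtype, per-column types are only preserved by a structured or record array, as in the earlier answers.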


回答 14

尝试这个:

np.array(df) 

array([['ID', nan, nan, nan],
   ['1', nan, 0.2, nan],
   ['2', nan, nan, 0.5],
   ['3', nan, 0.2, 0.5],
   ['4', 0.1, 0.2, nan],
   ['5', 0.1, 0.2, 0.5],
   ['6', 0.1, nan, 0.5],
   ['7', 0.1, nan, nan]], dtype=object)

有关更多信息,请访问:[ https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html] 对numpy 1.16.5和pandas 0.25.2有效。

Try this:

np.array(df) 

array([['ID', nan, nan, nan],
   ['1', nan, 0.2, nan],
   ['2', nan, nan, 0.5],
   ['3', nan, 0.2, 0.5],
   ['4', 0.1, 0.2, nan],
   ['5', 0.1, 0.2, 0.5],
   ['6', 0.1, nan, 0.5],
   ['7', 0.1, nan, nan]], dtype=object)

Some more information at: [https://docs.scipy.org/doc/numpy/reference/generated/numpy.array.html] Valid for numpy 1.16.5 and pandas 0.25.2.


Return JSON response from Flask view

Question: Return JSON response from Flask view

我有一个函数,可使用Pandas分析CSV文件并生成带有摘要信息的字典。我想从Flask视图返回结果作为响应。如何返回JSON响应?

@app.route("/summary")
def summary():
    d = make_summary()
    # send it back as json

I have a function that analyzes a CSV file with Pandas and produces a dict with summary information. I want to return the results as a response from a Flask view. How do I return a JSON response?

@app.route("/summary")
def summary():
    d = make_summary()
    # send it back as json

回答 0

将摘要数据传递给该jsonify函数,该函数返回JSON响应。

from flask import jsonify

@app.route('/summary')
def summary():
    d = make_summary()
    return jsonify(d)

从Flask 0.11开始,您可以将任何JSON可序列化的类型(不仅是dict)传递为顶级对象。

Pass the summary data to the jsonify function, which returns a JSON response.

from flask import jsonify

@app.route('/summary')
def summary():
    d = make_summary()
    return jsonify(d)

As of Flask 0.11, you can pass any JSON-serializable type, not just dict, as the top level object.


回答 1

jsonify序列化您传递给JSON的数据。如果您想自己序列化数据,请jsonify使用status=200和构建响应以执行操作mimetype='application/json'

from flask import json

@app.route('/summary')
def summary():
    data = make_summary()
    response = app.response_class(
        response=json.dumps(data),
        status=200,
        mimetype='application/json'
    )
    return response

jsonify serializes the data you pass it to JSON. If you want to serialize the data yourself, do what jsonify does by building a response with status=200 and mimetype='application/json'.

from flask import json

@app.route('/summary')
def summary():
    data = make_summary()
    response = app.response_class(
        response=json.dumps(data),
        status=200,
        mimetype='application/json'
    )
    return response
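
When serializing the data yourself, values that are not native JSON types need a conversion hook. The sketch below uses only the standard library json module; the to_jsonable helper and the sample summary dict are hypothetical and stand in for whatever make_summary() returns.

```python
import json
from datetime import date
from decimal import Decimal


def to_jsonable(value):
    """Hypothetical fallback used by json.dumps for unsupported types."""
    if isinstance(value, date):  # also covers datetime, a subclass of date
        return value.isoformat()
    if isinstance(value, Decimal):
        return float(value)
    raise TypeError(f"Not JSON serializable: {type(value).__name__}")


# Stand-in for the dict produced by make_summary().
summary = {"rows": 3, "mean": Decimal("1.5"), "generated": date(2020, 1, 2)}

body = json.dumps(summary, default=to_jsonable, sort_keys=True)
print(body)
```

The resulting string can be passed as the response argument of app.response_class exactly as in the code above.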

回答 2

将关键字参数传递给flask.jsonify,它们将作为JSON对象输出。

@app.route('/_get_current_user')
def get_current_user():
    return jsonify(
        username=g.user.username,
        email=g.user.email,
        id=g.user.id
    )
{
    "username": "admin",
    "email": "admin@localhost",
    "id": 42
}

如果您已有字典,则可以将其直接传递为jsonify(d)

Pass keyword arguments to flask.jsonify and they will be output as a JSON object.

@app.route('/_get_current_user')
def get_current_user():
    return jsonify(
        username=g.user.username,
        email=g.user.email,
        id=g.user.id
    )
{
    "username": "admin",
    "email": "admin@localhost",
    "id": 42
}

If you already have a dict, you can pass it directly as jsonify(d).


回答 3

如果jsonify由于某种原因不想使用,则可以手动执行。调用flask.json.dumps以创建JSON数据,然后返回application/json内容类型的响应。

from flask import json

@app.route('/summary')
def summary():
    data = make_summary()
    response = app.response_class(
        response=json.dumps(data),
        mimetype='application/json'
    )
    return response

flask.json与内置json模块不同。simplejson如果可用,它将使用速度更快的模块,并与Flask应用进行各种集成。

If you don’t want to use jsonify for some reason, you can do what it does manually. Call flask.json.dumps to create JSON data, then return a response with the application/json content type.

from flask import json

@app.route('/summary')
def summary():
    data = make_summary()
    response = app.response_class(
        response=json.dumps(data),
        mimetype='application/json'
    )
    return response

flask.json is distinct from the built-in json module. It will use the faster simplejson module if available, and enables various integrations with your Flask app.


回答 4

如果您要分析用户上传的文件,则Flask 快速入门会显示如何从用户那里获取文件并进行访问。从中获取文件request.files并将其传递给摘要函数。

from flask import request, jsonify, render_template
from werkzeug.utils import secure_filename

@app.route('/summary', methods=['GET', 'POST'])
def summary():
    if request.method == 'POST':
        csv = request.files['data']
        return jsonify(
            summary=make_summary(csv),
            csv_name=secure_filename(csv.filename)
        )

    return render_template('submit_data.html')

'data'键替换为request.filesHTML表单中输入的文件名。

If you want to analyze a file uploaded by the user, the Flask quickstart shows how to get files from users and access them. Get the file from request.files and pass it to the summary function.

from flask import request, jsonify, render_template
from werkzeug.utils import secure_filename

@app.route('/summary', methods=['GET', 'POST'])
def summary():
    if request.method == 'POST':
        csv = request.files['data']
        return jsonify(
            summary=make_summary(csv),
            csv_name=secure_filename(csv.filename)
        )

    return render_template('submit_data.html')

Replace the 'data' key for request.files with the name of the file input in your HTML form.


回答 5

要返回JSON响应并设置状态代码,您可以使用make_response

from flask import jsonify, make_response

@app.route('/summary')
def summary():
    d = make_summary()
    return make_response(jsonify(d), 200)

Flask问题跟踪器中此注释的启发。

To return a JSON response and set a status code you can use make_response:

from flask import jsonify, make_response

@app.route('/summary')
def summary():
    d = make_summary()
    return make_response(jsonify(d), 200)

Inspiration taken from this comment in the Flask issue tracker.


回答 6

从版本1.1.0 Flask开始,如果视图返回dict,它将被转换为JSON response

@app.route("/users", methods=['GET'])
def get_user():
    return {
        "user": "John Doe",
    }

As of version 1.1.0 Flask, if a view returns a dict it will be turned into a JSON response.

@app.route("/users", methods=['GET'])
def get_user():
    return {
        "user": "John Doe",
    }

回答 7

我使用装饰器返回的结果jsonfiy。我认为当视图具有多个返回值时,它更具可读性。这不支持返回类似的元组content, status,但是我将使用返回app.errorhandler来处理错误状态。

import functools
from flask import jsonify

def return_json(f):
    @functools.wraps(f)
    def inner(**kwargs):
        return jsonify(f(**kwargs))

    return inner

@app.route('/test/<arg>')
@return_json
def test(arg):
    if arg == 'list':
        return [1, 2, 3]
    elif arg == 'dict':
        return {'a': 1, 'b': 2}
    elif arg == 'bool':
        return True
    return 'none of them'

I use a decorator to return the result of jsonify. I think it is more readable when a view has multiple return statements. This does not support returning a tuple like content, status, but I handle returning error statuses with app.errorhandler instead.

import functools
from flask import jsonify

def return_json(f):
    @functools.wraps(f)
    def inner(**kwargs):
        return jsonify(f(**kwargs))

    return inner

@app.route('/test/<arg>')
@return_json
def test(arg):
    if arg == 'list':
        return [1, 2, 3]
    elif arg == 'dict':
        return {'a': 1, 'b': 2}
    elif arg == 'bool':
        return True
    return 'none of them'
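
If tuple returns are wanted after all, the decorator could be extended along these lines. This is a sketch and not the answer's code: the two-element (body, status) convention, the /items route, and the view name are assumptions made for illustration, and the Flask test client is used only to exercise it without starting a server.

```python
import functools

from flask import Flask, jsonify

app = Flask(__name__)


def return_json(f):
    @functools.wraps(f)
    def inner(*args, **kwargs):
        result = f(*args, **kwargs)
        # Assumed convention: a 2-tuple means (body, status code).
        if isinstance(result, tuple) and len(result) == 2:
            body, status = result
            return jsonify(body), status
        return jsonify(result)

    return inner


@app.route('/items/<name>')
@return_json
def items(name):
    if name == 'missing':
        return {'error': 'not found'}, 404
    return [1, 2, 3]


client = app.test_client()
ok = client.get('/items/list')
missing = client.get('/items/missing')
print(ok.status_code, missing.status_code)
```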

回答 8

在Flask 0.11之前,jsonfiy不允许直接返回数组。而是将列表作为关键字参数传递。

@app.route('/get_records')
def get_records():
    results = [
        {
          "rec_create_date": "12 Jun 2016",
          "rec_dietary_info": "nothing",
          "rec_dob": "01 Apr 1988",
          "rec_first_name": "New",
          "rec_last_name": "Guy",
        },
        {
          "rec_create_date": "1 Apr 2016",
          "rec_dietary_info": "Nut allergy",
          "rec_dob": "01 Feb 1988",
          "rec_first_name": "Old",
          "rec_last_name": "Guy",
        },
    ]
    return jsonify(results=results)

Prior to Flask 0.11, jsonify would not allow returning an array directly. Instead, pass the list as a keyword argument.

@app.route('/get_records')
def get_records():
    results = [
        {
          "rec_create_date": "12 Jun 2016",
          "rec_dietary_info": "nothing",
          "rec_dob": "01 Apr 1988",
          "rec_first_name": "New",
          "rec_last_name": "Guy",
        },
        {
          "rec_create_date": "1 Apr 2016",
          "rec_dietary_info": "Nut allergy",
          "rec_dob": "01 Feb 1988",
          "rec_first_name": "Old",
          "rec_last_name": "Guy",
        },
    ]
    return jsonify(results=results)

回答 9

在Flask 1.1中,如果返回字典,它将自动转换为JSON。因此,如果make_summary()返回字典,您可以

from flask import Flask

app = Flask(__name__)

@app.route('/summary')
def summary():
    d = make_summary()
    return d

要求包含状态码SO已作为与此副本的副本被关闭。因此,要回答该问题,您可以通过返回形式的元组来包含状态代码(dict, int)。将dict转换为JSON,int将HTTP状态代码。没有任何输入,状态是默认的200。因此在上面的示例中,代码将是200。在下面的示例中,其代码更改为201。

from flask import Flask

app = Flask(__name__)

@app.route('/summary')
def summary():
    d = make_summary()
    return d, 201  # 200 is the default

您可以使用以下方法检查状态码

curl --request GET "http://127.0.0.1:5000/summary" -w "\ncode: %{http_code}\n\n"

In Flask 1.1, if you return a dictionary, it will automatically be converted into JSON. So if make_summary() returns a dictionary, you can write:

from flask import Flask

app = Flask(__name__)

@app.route('/summary')
def summary():
    d = make_summary()
    return d

The SO question asking how to include the status code was closed as a duplicate of this one. So, to answer that question as well: you can include the status code by returning a tuple of the form (dict, int). The dict is converted to JSON and the int becomes the HTTP status code. If no status is given, the default is 200, so the example above returns 200. In the example below it is changed to 201.

from flask import Flask

app = Flask(__name__)

@app.route('/summary')
def summary():
    d = make_summary()
    return d, 201  # 200 is the default

You can check the status code using

curl --request GET "http://127.0.0.1:5000/summary" -w "\ncode: %{http_code}\n\n"
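
The same check can be done from Python with Flask's built-in test client, without starting a server. In this sketch make_summary is a hypothetical stand-in that returns a fixed dict, and jsonify is used so that it does not depend on the dict-return feature of Flask 1.1.

```python
from flask import Flask, jsonify

app = Flask(__name__)


def make_summary():
    # Hypothetical stand-in for the real summary function.
    return {"rows": 3, "columns": ["a", "b"]}


@app.route("/summary")
def summary():
    return jsonify(make_summary()), 201


client = app.test_client()
response = client.get("/summary")

status = response.status_code
payload = response.get_json()
content_type = response.mimetype
print(status, content_type, payload)
```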

回答 10

如果是字典,则flask可以直接将其返回(版本1.0.2)

def summary():
    d = make_summary()
    return d, 200

If it's a dict, Flask can return it directly (the answer cites version 1.0.2; other answers here date this feature to Flask 1.1).

def summary():
    d = make_summary()
    return d, 200

回答 11

“”“ 使用Flask基于类的视图 ”“”

from flask import Flask, jsonify
from flask.views import MethodView

app = Flask(__name__)


class Summary(MethodView):

    def __init__(self):
        self.response = dict()

    def get(self):
        # make_summary is a function that calculates the summary
        self.response['summary'] = make_summary()
        return jsonify(self.response)


# Register the view only after the class has been defined
app.add_url_rule('/summary/', view_func=Summary.as_view('summary'))

Using Flask class-based views

from flask import Flask, jsonify
from flask.views import MethodView

app = Flask(__name__)


class Summary(MethodView):

    def __init__(self):
        self.response = dict()

    def get(self):
        # make_summary is a function that calculates the summary
        self.response['summary'] = make_summary()
        return jsonify(self.response)


# Register the view only after the class has been defined
app.add_url_rule('/summary/', view_func=Summary.as_view('summary'))

回答 12

烧瓶1.1.x

现在Flask支持请求直接通过json返回,不再需要jsonify

@app.route("/")
def index():
    return {
        "api_stuff": "values",
    }

相当于

@app.route("/")
def index():
    return jsonify({
        "api_stuff": "values",
    })

有关更多信息,请在此处阅读 https://medium.com/octopus-wealth/returning-json-from-flask-cf4ce6fe9aeb 和 https://github.com/pallets/flask/pull/3111

Flask 1.1.x

Flask now supports returning a dict directly from a view as a JSON response; jsonify is no longer required:

@app.route("/")
def index():
    return {
        "api_stuff": "values",
    }

is equivalent to

@app.route("/")
def index():
    return jsonify({
        "api_stuff": "values",
    })

For more information, see https://medium.com/octopus-wealth/returning-json-from-flask-cf4ce6fe9aeb and https://github.com/pallets/flask/pull/3111


Getting the index of the returned max or min item using max()/min() on a list

Question: Getting the index of the returned max or min item using max()/min() on a list

我正在使用列表中的Python maxmin函数来执行minimax算法,并且需要由max()或返回的值的索引min()。换句话说,我需要知道哪个移动产生了最大(第一玩家回合)或最小(第二玩家)值。

for i in range(9):
    newBoard = currentBoard.newBoardWithMove([i / 3, i % 3], player)

    if newBoard:
        temp = minMax(newBoard, depth + 1, not isMinLevel)  
        values.append(temp)

if isMinLevel:
    return min(values)
else:
    return max(values)

我需要能够返回最小值或最大值的实际索引,而不仅仅是返回值。

I’m using Python’s max and min functions on lists for a minimax algorithm, and I need the index of the value returned by max() or min(). In other words, I need to know which move produced the max (at a first player’s turn) or min (second player) value.

for i in range(9):
    newBoard = currentBoard.newBoardWithMove([i / 3, i % 3], player)

    if newBoard:
        temp = minMax(newBoard, depth + 1, not isMinLevel)  
        values.append(temp)

if isMinLevel:
    return min(values)
else:
    return max(values)

I need to be able to return the actual index of the min or max value, not just the value.


回答 0

if isMinLevel:
    return values.index(min(values))
else:
    return values.index(max(values))
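
As a self-contained illustration of this approach: the sketch below wraps it in a hypothetical helper, index_of_extreme, which is not part of the question's code. Note that list.index returns the first matching position and that the list is scanned twice (once for min/max, once for index).

```python
def index_of_extreme(values, is_min_level):
    """Return the index of the first minimum (or maximum) in values."""
    if not values:
        raise ValueError("values must not be empty")
    target = min(values) if is_min_level else max(values)
    return values.index(target)


values = [3, 1, 4, 1, 5]
print(index_of_extreme(values, True))   # index of the first minimum
print(index_of_extreme(values, False))  # index of the first maximum
```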

回答 1

假设您有一个list values = [3,6,1,5],并且需要最小元素的索引,即index_min = 2在这种情况下。

避免itemgetter()使用其他答案中提出的解决方案,而改用

index_min = min(range(len(values)), key=values.__getitem__)

因为它不需要import operator使用enumerate,也总是比使用解决方案更快(下面的基准)itemgetter()

如果您正在处理numpy数组或可以numpy作为依赖提供,请考虑同时使用

import numpy as np
index_min = np.argmin(values)

即使在以下情况下将其应用于纯Python列表,也将比第一个解决方案更快。

  • 它大于几个元素(我的机器上大约2 ** 4个元素)
  • 您可以提供从纯列表到numpy数组的内存副本

正如该基准所指出的: 在此处输入图片说明

我已经在我的机器上使用python 2.7运行了基准测试,用于上述两个解决方案(蓝色:纯python,第一个解决方案)(红色,numpy解决方案)以及基于的标准解决方案itemgetter()(黑色,参考解决方案)。与python 3.5相同的基准测试表明,这些方法与上述python 2.7情况完全相同

Say that you have a list values = [3,6,1,5], and need the index of the smallest element, i.e. index_min = 2 in this case.

Avoid the solution with itemgetter() presented in the other answers, and use instead

index_min = min(range(len(values)), key=values.__getitem__)

because it doesn't require importing operator or using enumerate, and it is always faster (benchmark below) than a solution using itemgetter().

If you are dealing with numpy arrays or can afford numpy as a dependency, consider also using

import numpy as np
index_min = np.argmin(values)

This will be faster than the first solution even if you apply it to a pure Python list if:

  • it is larger than a few elements (about 2**4 elements on my machine)
  • you can afford the memory copy from a pure list to a numpy array

as this benchmark points out: [Figure: benchmark plot]

I ran the benchmark on my machine with Python 2.7 for the two solutions above (blue: pure Python, first solution; red: numpy solution) and for the standard solution based on itemgetter() (black, reference solution). The same benchmark with Python 3.5 showed that the methods compare exactly as in the Python 2.7 case presented above.


回答 2

如果您枚举列表中的项目,则可以同时找到min / max索引和值,但是要对列表的原始值执行min / max。像这样:

import operator
min_index, min_value = min(enumerate(values), key=operator.itemgetter(1))
max_index, max_value = max(enumerate(values), key=operator.itemgetter(1))

这样,列表将只遍历一次最小值(或最大值)。

You can find the min/max index and value at the same time if you enumerate the items in the list, but perform min/max on the original values of the list. Like so:

import operator
min_index, min_value = min(enumerate(values), key=operator.itemgetter(1))
max_index, max_value = max(enumerate(values), key=operator.itemgetter(1))

This way the list will only be traversed once for min (or max).
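
Applied to the minimax setting from the question, this gives both the best score and the move that produced it. The moves and scores lists below are made-up sample data, assumed to be built in parallel inside the loop.

```python
import operator

# Hypothetical parallel lists: one score per candidate move.
moves = [(0, 0), (0, 2), (1, 1)]
scores = [-1, 1, 0]

best_i, best_score = max(enumerate(scores), key=operator.itemgetter(1))
worst_i, worst_score = min(enumerate(scores), key=operator.itemgetter(1))

best_move = moves[best_i]
worst_move = moves[worst_i]
print(best_move, best_score, worst_move, worst_score)
```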


回答 3

如果要在数字列表中查找max的索引(这似乎是您的情况),那么建议您使用numpy:

import numpy as np
ind = np.argmax(mylist)

If you want to find the index of max within a list of numbers (which seems your case), then I suggest you use numpy:

import numpy as np
ind = np.argmax(mylist)

回答 4

可能更简单的解决方案是将值的数组转换为值,索引对的数组,并取其最大值/最小值。这将给出具有最大值/最小值的最大/最小索引(即,通过首先比较第一个元素,然后比较第二个元素(如果第一个元素相同,则比较第二对元素))。注意,实际上不需要创建数组,因为最小/最大允许生成器作为输入。

values = [3,4,5]
(m,i) = max((v,i) for i,v in enumerate(values))
print (m,i) #(5, 2)

Possibly a simpler solution would be to turn the array of values into an array of value,index-pairs, and take the max/min of that. This would give the largest/smallest index that has the max/min (i.e. pairs are compared by first comparing the first element, and then comparing the second element if the first ones are the same). Note that it’s not necessary to actually create the array, because min/max allow generators as input.

values = [3,4,5]
(m,i) = max((v,i) for i,v in enumerate(values))
print (m,i) #(5, 2)
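
The tie-breaking behaviour described above is easy to see with duplicate values. In this sketch the list contents are arbitrary; the last line is shown for contrast, since a key-based max over the indices returns the first maximum instead.

```python
values = [5, 3, 5, 3]

# Tuples compare element by element, so ties on the value are broken by the index.
max_value, max_index = max((v, i) for i, v in enumerate(values))
min_value, min_index = min((v, i) for i, v in enumerate(values))

# For contrast: with a key function, max() returns the first maximal item it meets.
first_max_index = max(range(len(values)), key=values.__getitem__)

print(max_value, max_index)  # largest index holding the maximum
print(min_value, min_index)  # smallest index holding the minimum
print(first_max_index)
```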

回答 5

values = [1.1412, 4.3453, 5.8709, 0.1314]  # avoid shadowing the built-in list
values.index(min(values))

将给您第一个最小值的索引。

values = [1.1412, 4.3453, 5.8709, 0.1314]  # avoid shadowing the built-in list
values.index(min(values))

This gives you the index of the first minimum.


回答 6

我认为最好的办法是将列表转换为a numpy array并使用以下功能:

import numpy as np
a = np.array(my_list)  # my_list is your list of values
idx = np.argmax(a)

I think the best thing to do is convert the list to a numpy array and use this function:

import numpy as np
a = np.array(my_list)  # my_list is your list of values
idx = np.argmax(a)

回答 7

我对此也很感兴趣,并使用perfplot比较了一些建议的解决方案(我的一个宠物项目)。

原来那是numpy的argmin

numpy.argmin(x)

即使足够大的列表(从输入list到a 的隐式转换),它也是最快的方法numpy.array



生成绘图的代码:

import numpy
import operator
import perfplot


def min_enumerate(a):
    return min(enumerate(a), key=lambda x: x[1])[0]


def min_enumerate_itemgetter(a):
    min_index, min_value = min(enumerate(a), key=operator.itemgetter(1))
    return min_index


def getitem(a):
    return min(range(len(a)), key=a.__getitem__)


def np_argmin(a):
    return numpy.argmin(a)


perfplot.show(
    setup=lambda n: numpy.random.rand(n).tolist(),
    kernels=[
        min_enumerate,
        min_enumerate_itemgetter,
        getitem,
        np_argmin,
        ],
    n_range=[2**k for k in range(15)],
    logx=True,
    logy=True,
    )

I was also interested in this and compared some of the suggested solutions using perfplot (a pet project of mine).

Turns out that numpy’s argmin,

numpy.argmin(x)

is the fastest method for large enough lists, even with the implicit conversion from the input list to a numpy.array.



Code for generating the plot:

import numpy
import operator
import perfplot


def min_enumerate(a):
    return min(enumerate(a), key=lambda x: x[1])[0]


def min_enumerate_itemgetter(a):
    min_index, min_value = min(enumerate(a), key=operator.itemgetter(1))
    return min_index


def getitem(a):
    return min(range(len(a)), key=a.__getitem__)


def np_argmin(a):
    return numpy.argmin(a)


perfplot.show(
    setup=lambda n: numpy.random.rand(n).tolist(),
    kernels=[
        min_enumerate,
        min_enumerate_itemgetter,
        getitem,
        np_argmin,
        ],
    n_range=[2**k for k in range(15)],
    logx=True,
    logy=True,
    )
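
If perfplot or numpy is not available, a rough comparison of the pure-Python variants can be sketched with the standard library timeit module. This is only an approximation of the benchmark above: the list size, the repeat count, and the extra index_of_min kernel are assumptions, and the timings will differ from machine to machine.

```python
import operator
import random
import timeit


def min_enumerate_itemgetter(a):
    return min(enumerate(a), key=operator.itemgetter(1))[0]


def getitem(a):
    return min(range(len(a)), key=a.__getitem__)


def index_of_min(a):
    return a.index(min(a))


random.seed(0)
data = [random.random() for _ in range(10_000)]
kernels = (min_enumerate_itemgetter, getitem, index_of_min)

# All kernels should agree on the index before their speed is compared.
results = {f.__name__: f(data) for f in kernels}
timings = {
    f.__name__: timeit.timeit(lambda f=f: f(data), number=20) for f in kernels
}

for name, seconds in sorted(timings.items(), key=operator.itemgetter(1)):
    print(f"{name}: {seconds:.4f} s for 20 calls")
```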

回答 8

使用一个numpy数组和argmax()函数

import numpy as np
a = np.array([1, 2, 3])
b = np.argmax(a)
print(b)  # 2

Use a numpy array and the argmax() function

import numpy as np
a = np.array([1, 2, 3])
b = np.argmax(a)
print(b)  # 2

回答 9

获得最大值后,请尝试以下操作:

max_val = max(my_list)  # my_list is your list; avoid naming it list
index_max = my_list.index(max_val)

比很多选择要简单得多。

After you get the maximum values, try this:

max_val = max(my_list)  # my_list is your list; avoid naming it list
index_max = my_list.index(max_val)

Much simpler than a lot of options.


回答 10

我认为以上答案解决了您的问题,但我想我会分享一种方法,该方法可以为您提供最小值以及所有出现在其中的索引。

minval = min(mylist)
ind = [i for i, v in enumerate(mylist) if v == minval]

这两次通过了列表,但仍然非常快。但是,这比找到最小值的第一次遇到的索引要慢一些。因此,如果您仅需要一个最小值,请使用Matt Anderson的解决方案,如果您需要全部解决方案,请使用此解决方案。

I think the answer above solves your problem but I thought I’d share a method that gives you the minimum and all the indices the minimum appears in.

minval = min(mylist)
ind = [i for i, v in enumerate(mylist) if v == minval]

This passes over the list twice but is still quite fast. It is, however, slightly slower than finding the index of the first occurrence of the minimum. So if you need just one of the minima, use Matt Anderson's solution; if you need them all, use this.


回答 11

使用numpy模块的函数numpy.where

import numpy as n
x = n.array((3,3,4,7,4,56,65,1))

对于最小值索引:

idx = n.where(x==x.min())[0]

对于最大值索引:

idx = n.where(x==x.max())[0]

实际上,此功能更强大。您可以构成各种布尔运算对于3到60之间的值的索引:

idx = n.where((x>3)&(x<60))[0]
idx
array([2, 3, 4, 5])
x[idx]
array([ 4,  7,  4, 56])

Use numpy module’s function numpy.where

import numpy as n
x = n.array((3,3,4,7,4,56,65,1))

For index of minimum value:

idx = n.where(x==x.min())[0]

For index of maximum value:

idx = n.where(x==x.max())[0]

In fact, this function is much more powerful. You can pose all kinds of boolean operations. For the indices of values between 3 and 60:

idx = n.where((x>3)&(x<60))[0]
idx
array([2, 3, 4, 5])
x[idx]
array([ 4,  7,  4, 56])

回答 12

使用内置enumerate()max()函数以及函数的可选key参数max()和简单的lambda表达式,就可以轻松实现:

theList = [1, 5, 10]
maxIndex, maxValue = max(enumerate(theList), key=lambda v: v[1])
# => (2, 10)

在文档中max()说,该key参数需要一个类似于函数中的list.sort()函数。另请参阅“ 排序方法”

的工作原理相同min()。顺便说一句,它返回第一个最大值/最小值。

This is easily done using the built-in enumerate() and max() functions, the optional key argument of max(), and a simple lambda expression:

theList = [1, 5, 10]
maxIndex, maxValue = max(enumerate(theList), key=lambda v: v[1])
# => (2, 10)

In the docs for max() it says that the key argument expects a function like in the list.sort() function. Also see the Sorting How To.

It works the same for min(). Btw it returns the first max/min value.


回答 13

假设您有一个清单,例如:

a = [9,8,7]

以下两种方法是使用最小元素及其索引获取元组的非常紧凑的方法。两者都相似时间来处理。我更喜欢zip方法,但这就是我的口味。

拉链方式

element, index = min(list(zip(a, range(len(a)))))

min(list(zip(a, range(len(a)))))
(7, 2)

timeit min(list(zip(a, range(len(a)))))
1.36 µs ± 107 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

列举方法

index, element = min(list(enumerate(a)), key=lambda x:x[1])

min(list(enumerate(a)), key=lambda x:x[1])
(2, 7)

timeit min(list(enumerate(a)), key=lambda x:x[1])
1.45 µs ± 78.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

Say you have a list such as:

a = [9,8,7]

The following two methods are pretty compact ways to get a tuple with the minimum element and its index. Both take a similar time to process. I prefer the zip method, but that is a matter of taste.

zip method

element, index = min(list(zip(a, range(len(a)))))

min(list(zip(a, range(len(a)))))
(7, 2)

timeit min(list(zip(a, range(len(a)))))
1.36 µs ± 107 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

enumerate method

index, element = min(list(enumerate(a)), key=lambda x:x[1])

min(list(enumerate(a)), key=lambda x:x[1])
(2, 7)

timeit min(list(enumerate(a)), key=lambda x:x[1])
1.45 µs ± 78.1 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

回答 14

只要您知道如何使用lambda和“ key”参数,一个简单的解决方案就是:

max_index = max( range( len(my_list) ), key = lambda index : my_list[ index ] )

As long as you know how to use lambda and the “key” argument, a simple solution is:

max_index = max( range( len(my_list) ), key = lambda index : my_list[ index ] )

回答 15

就那么简单 :

stuff = [2, 4, 8, 15, 11]

index = stuff.index(max(stuff))

Simple as that :

stuff = [2, 4, 8, 15, 11]

index = stuff.index(max(stuff))

回答 16

为什么要先添加索引然后再颠倒索引呢?Enumerate()函数只是zip()函数用法的一种特殊情况。让我们以适当的方式使用它:

my_indexed_list = list(zip(my_list, range(len(my_list))))  # list() so it can be iterated twice in Python 3

min_value, min_index = min(my_indexed_list)
max_value, max_index = max(my_indexed_list)

Why bother to add indices first and then reverse them? The enumerate() function is just a special case of zip() usage. Let's use it in an appropriate way:

my_indexed_list = list(zip(my_list, range(len(my_list))))  # list() so it can be iterated twice in Python 3

min_value, min_index = min(my_indexed_list)
max_value, max_index = max(my_indexed_list)

回答 17

只是已经说过的一小部分。 values.index(min(values))似乎返回的最小值的最小值。以下是最大的索引:

    values.reverse()
    len(values) - 1 - values.index(min(values))
    values.reverse()

如果原地反转的副作用无关紧要,则可以省略最后一行。

遍历所有事件

    indices = []
    i = -1
    for _ in range(values.count(min(values))):
      i = values[i + 1:].index(min(values)) + i + 1
      indices.append(i)

为了简洁起见。将其缓存min(values), values.count(min)在循环外可能是一个更好的主意。

Just a minor addition to what has already been said. values.index(min(values)) seems to return the smallest index of min. The following gets the largest index:

    values.reverse()
    len(values) - 1 - values.index(min(values))
    values.reverse()

The last line can be left out if the side effect of reversing in place does not matter.

To iterate through all occurrences

    indices = []
    i = -1
    for _ in range(values.count(min(values))):
      i = values[i + 1:].index(min(values)) + i + 1
      indices.append(i)

The loop above is written this way for brevity. It is probably better to cache min(values) and values.count(min(values)) outside the loop.
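
A variant that leaves the list untouched can be sketched as follows. It uses a reversed copy (values[::-1]) instead of reversing in place, maps the position back with len(values) - 1 - j, and collects all occurrences with a single enumerate pass; the sample list is arbitrary.

```python
values = [2, 1, 3, 1]

smallest = min(values)  # computed once and reused

first_index = values.index(smallest)
# Position j in the reversed copy corresponds to len(values) - 1 - j in the original.
last_index = len(values) - 1 - values[::-1].index(smallest)
all_indices = [i for i, v in enumerate(values) if v == smallest]

print(first_index, last_index, all_indices)
```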


回答 18

如果您不想导入其他模块,则可以使用一种简单的方法在列表中查找值最小的索引:

min_value = min(values)
indexes_with_min_value = [i for i in range(0,len(values)) if values[i] == min_value]

然后选择第一个:

choosen = indexes_with_min_value[0]

A simple way to find the indexes with the minimal value in a list, if you don't want to import additional modules:

min_value = min(values)
indexes_with_min_value = [i for i in range(0,len(values)) if values[i] == min_value]

Then choose for example the first one:

choosen = indexes_with_min_value[0]
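
A slightly more idiomatic sketch of the same idea uses enumerate() to get index and value together. The example data is an assumption made for illustration.

```python
values = [7, 2, 9, 2, 5]  # example data, assumed for illustration

min_value = min(values)
# enumerate() yields (index, value) pairs, so no explicit range() is needed.
indexes_with_min_value = [i for i, v in enumerate(values) if v == min_value]

print(indexes_with_min_value)  # [1, 3]
```

Note that min() raises ValueError on an empty list, so a guard is needed if values can be empty.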

Answer 19

I don't have enough reputation to comment on the existing answer.

Regarding the answer at https://stackoverflow.com/a/11825864/3920439:

It works for integers, but not for a list of floats (at least in Python 3.6), where it raises TypeError: list indices must be integers or slices, not float
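
The referenced answer is not reproduced here, so the following is only a guess at the kind of code that produces this message: using a list element as a subscript can work by accident when the elements are small integers, and fails as soon as they are floats. The data and variable names are assumptions made for illustration. Indexing by position avoids the problem for any comparable element type.

```python
int_values = [2, 0, 1]
float_values = [2.5, 0.5, 1.5]

# Works by accident: every element happens to be a valid position.
print(int_values[int_values[0]])  # 1

# Fails: a float element cannot be used as a list subscript.
try:
    float_values[float_values[0]]
    message = None
except TypeError as exc:
    message = str(exc)
print(message)

# Looking up by position works regardless of the element type.
min_index = min(range(len(float_values)), key=float_values.__getitem__)
print(min_index)  # 1
```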


Answer 20

https://docs.python.org/3/library/functions.html#max

If multiple items are maximal, the function returns the first one encountered. This is consistent with other sort-stability preserving tools such as sorted(iterable, key=keyfunc, reverse=True)[0]

To get more than just the first one, use sorted():

import itertools
import operator

x = [2, 5, 7, 4, 8, 2, 6, 1, 7, 1, 8, 3, 4, 9, 3, 6, 5, 0, 9, 0]

# (value, index) pairs sorted by value only; the sort is stable, so ties keep their original order.
min_val_index = sorted(zip(x, range(len(x))), key=operator.itemgetter(0))

max_val_index = sorted(zip(x, range(len(x))), key=operator.itemgetter(0), reverse=True)

min_val_index[0]
# (0, 17)

max_val_index[0]
# (9, 13)

max_val = max_val_index[0][0]

# Every (value, index) pair that ties for the maximum.
maxes = list(itertools.takewhile(lambda pair: pair[0] == max_val, max_val_index))
# [(9, 13), (9, 18)]
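
Sorting costs O(n log n). If only the positions of the tied maxima are needed, two linear passes are enough. This is a sketch of that alternative rather than the answer's own method; it reuses x from above, and the names largest, max_positions and first_max_index are hypothetical.

```python
x = [2, 5, 7, 4, 8, 2, 6, 1, 7, 1, 8, 3, 4, 9, 3, 6, 5, 0, 9, 0]

largest = max(x)
# Collect every position that holds the maximum value.
max_positions = [i for i, v in enumerate(x) if v == largest]
print(largest, max_positions)  # 9 [13, 18]

# max() with a key returns the first maximal item, as the documentation states.
first_max_index = max(range(len(x)), key=x.__getitem__)
print(first_max_index)  # 13
```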

Answer 21

What about this:

a = [1, 55, 2, 36, 35, 34, 98, 0]
max_index = dict(zip(a, range(len(a))))[max(a)]

It creates a dictionary with the items of a as keys and their indexes as values, so dict(zip(a,range(len(a))))[max(a)] returns the value stored under the key max(a), which is the index of the maximum in a. I'm a beginner in Python, so I don't know the computational complexity of this solution.
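
On the complexity question: building the dictionary takes one pass over the list, so the approach is linear in time on average, but it also needs linear extra memory and requires hashable items. A small sketch, using example data with a repeated maximum that is assumed for illustration, shows one behavioural difference from a.index(max(a)): later duplicates overwrite earlier keys, so the dictionary yields the last index of the maximum.

```python
a = [1, 55, 98, 36, 98, 0]  # example data with a repeated maximum (assumed)

value_to_index = dict(zip(a, range(len(a))))
# Later duplicates overwrite earlier keys, so this is the LAST index of max(a).
last_max_index = value_to_index[max(a)]

# list.index() stops at the first match instead.
first_max_index = a.index(max(a))

print(last_max_index, first_max_index)  # 4 2
```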