Showing the stack trace from a running Python application

Question: Showing the stack trace from a running Python application

I have this Python application that gets stuck from time to time and I can’t find out where.

Is there any way to signal the Python interpreter to show you the exact code that’s running?

Some kind of on-the-fly stacktrace?


Answer 0

I have a module I use for situations like this, where a process will be running for a long time but gets stuck sometimes for unknown and irreproducible reasons. It’s a bit hacky, and only works on Unix (requires signals):

import code, traceback, signal

def debug(sig, frame):
    """Interrupt running process, and provide a python prompt for
    interactive debugging."""
    d={'_frame':frame}         # Allow access to frame object.
    d.update(frame.f_globals)  # Unless shadowed by global
    d.update(frame.f_locals)

    i = code.InteractiveConsole(d)
    message  = "Signal received : entering python shell.\nTraceback:\n"
    message += ''.join(traceback.format_stack(frame))
    i.interact(message)

def listen():
    signal.signal(signal.SIGUSR1, debug)  # Register handler

To use, just call the listen() function at some point when your program starts up (You could even stick it in site.py to have all python programs use it), and let it run. At any point, send the process a SIGUSR1 signal, using kill, or in python:

    os.kill(pid, signal.SIGUSR1)

This will cause the program to break to a Python console at the point it is currently at, showing you the stack trace and letting you manipulate the variables. Use control-d (EOF) to continue running (though note that you will probably interrupt any I/O etc. at the point you signal, so it isn’t fully non-intrusive).

I’ve another script that does the same thing, except it communicates with the running process through a pipe (to allow for debugging backgrounded processes etc.). It’s a bit large to post here, but I’ve added it as a Python cookbook recipe.


Answer 1

The suggestion to install a signal handler is a good one, and I use it a lot. For example, bzr by default installs a SIGQUIT handler that invokes pdb.set_trace() to immediately drop you into a pdb prompt. (See the bzrlib.breakin module’s source for the exact details.) With pdb you can not only get the current stack trace but also inspect variables, etc.

However, sometimes I need to debug a process that I didn’t have the foresight to install the signal handler in. On Linux, you can attach gdb to the process and get a Python stack trace with some gdb macros. Put http://svn.python.org/projects/python/trunk/Misc/gdbinit in ~/.gdbinit, then:

  • Attach gdb: gdb -p PID
  • Get the python stack trace: pystack

It’s not totally reliable unfortunately, but it works most of the time.

Finally, attaching strace can often give you a good idea what a process is doing.


Answer 2

I am almost always dealing with multiple threads, and the main thread is generally not doing much, so what is most interesting is to dump all the stacks (which is more like Java’s thread dump). Here is an implementation based on this blog:

import signal, sys, threading, traceback

def dumpstacks(signum, frame):
    """Signal handler: print the current stack of every thread."""
    id2name = dict([(th.ident, th.name) for th in threading.enumerate()])
    code = []
    for threadId, stack in sys._current_frames().items():
        code.append("\n# Thread: %s(%d)" % (id2name.get(threadId, ""), threadId))
        for filename, lineno, name, line in traceback.extract_stack(stack):
            code.append('File: "%s", line %d, in %s' % (filename, lineno, name))
            if line:
                code.append("  %s" % (line.strip()))
    print("\n".join(code))

# Dump all stacks on SIGQUIT (Ctrl-\ in a terminal, or kill -QUIT <pid>).
signal.signal(signal.SIGQUIT, dumpstacks)

Answer 3

Getting a stack trace of an unprepared Python program, running in a stock Python without debugging symbols, can be done with pyrasite. It worked like a charm for me on Ubuntu Trusty:

$ sudo pip install pyrasite
$ echo 0 | sudo tee /proc/sys/kernel/yama/ptrace_scope
$ sudo pyrasite 16262 dump_stacks.py # dumps stacks to stdout/stderr of the python program

(Hat tip to @Albert, whose answer contained a pointer to this, among other tools.)


Answer 4

>>> import traceback
>>> def x():
...     print traceback.extract_stack()
...
>>> x()
[('<stdin>', 1, '<module>', None), ('<stdin>', 2, 'x', None)]

You can also nicely format the stack trace, see the docs.

Edit: To simulate Java’s behavior, as suggested by @Douglas Leeder, add this:

import signal
import traceback

signal.signal(signal.SIGUSR1, lambda sig, stack: traceback.print_stack(stack))

to the startup code in your application. Then you can print the stack by sending SIGUSR1 to the running Python process.


Answer 5

The traceback module has some nice functions, among them print_stack:

import traceback

traceback.print_stack()
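
If you would rather capture the trace as text (for a log message, say) than print it, traceback.format_stack() returns the same frames as a list of strings. A minimal sketch; the function names outer and inner are only illustrative:

```python
import traceback

def inner():
    # format_stack() returns one formatted string per frame, oldest call first
    return "".join(traceback.format_stack())

def outer():
    return inner()

trace_text = outer()
print(trace_text)
```

The resulting string can be passed to a logger or written to a file instead of going to stderr.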

Answer 6

You can try the faulthandler module. Install it using pip install faulthandler and add:

import faulthandler, signal
faulthandler.register(signal.SIGUSR1)

at the beginning of your program. Then send SIGUSR1 to your process (e.g. kill -USR1 42) to display the Python traceback of all threads on standard error, which is the default output file. Read the documentation for more options (e.g. logging to a file) and other ways to display the traceback.

The module is now part of Python 3.3. For Python 2, see http://faulthandler.readthedocs.org/
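
As a hedged sketch of the "log into a file" option: both faulthandler.register() and faulthandler.dump_traceback() accept a file argument. The log path below is a made-up temporary location chosen for illustration, not something from the answer above.

```python
import faulthandler
import os
import signal
import tempfile

# Hypothetical log destination. faulthandler writes to the file descriptor
# directly, so it needs a real file (an io.StringIO buffer will not work).
log_path = os.path.join(tempfile.mkdtemp(), "stack-dump.log")
log_file = open(log_path, "w")

# Dump every thread's traceback to log_file whenever SIGUSR1 arrives (Unix only).
if hasattr(signal, "SIGUSR1"):
    faulthandler.register(signal.SIGUSR1, file=log_file, all_threads=True)

# The same dump can also be triggered directly from code.
faulthandler.dump_traceback(file=log_file, all_threads=True)
log_file.flush()
```

Keep the file object open for as long as the handler is registered, since faulthandler holds on to its file descriptor.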


Answer 7

What really helped me here is spiv’s tip (which I would vote up and comment on if I had the reputation points) for getting a stack trace out of an unprepared Python process. Except it didn’t work until I modified the gdbinit script. So:

  • download http://svn.python.org/projects/python/trunk/Misc/gdbinit and put it in ~/.gdbinit

  • edit it, changing PyEval_EvalFrame to PyEval_EvalFrameEx [edit: no longer needed; the linked file already has this change as of 2010-01-14]

  • Attach gdb: gdb -p PID

  • Get the python stack trace: pystack


Answer 8

I would add this as a comment to haridsv’s response, but I lack the reputation to do so:

Some of us are still stuck on a version of Python older than 2.6 (required for Thread.ident), so I got the code working in Python 2.5 (though without the thread name being displayed) as such:

import signal
import sys
import traceback

def dumpstacks(signum, frame):
    code = []
    for threadId, stack in sys._current_frames().items():
        code.append("\n# Thread: %d" % (threadId))
        for filename, lineno, name, line in traceback.extract_stack(stack):
            code.append('File: "%s", line %d, in %s' % (filename, lineno, name))
            if line:
                code.append("  %s" % (line.strip()))
    print("\n".join(code))

signal.signal(signal.SIGQUIT, dumpstacks)

Answer 9

python -dv yourscript.py

That makes the interpreter run in debug mode and gives you a trace of what the interpreter is doing.

If you want to interactively debug the code you should run it like this:

python -m pdb yourscript.py

That tells the Python interpreter to run your script under the pdb module, which is the Python debugger. Run like that, the script executes in an interactive debugging session, much like GDB.


Answer 10

Take a look at the faulthandler module, new in Python 3.3. A faulthandler backport for use in Python 2 is available on PyPI.


Answer 11

On Solaris, you can use pstack(1). No changes to the Python code are necessary. For example:

# pstack 16000 | grep : | head
16000: /usr/bin/python2.6 /usr/lib/pkg.depotd --cfg svc:/application/pkg/serv
[ /usr/lib/python2.6/vendor-packages/cherrypy/process/wspbus.py:282 (_wait) ]
[ /usr/lib/python2.6/vendor-packages/cherrypy/process/wspbus.py:295 (wait) ]
[ /usr/lib/python2.6/vendor-packages/cherrypy/process/wspbus.py:242 (block) ]
[ /usr/lib/python2.6/vendor-packages/cherrypy/_init_.py:249 (quickstart) ]
[ /usr/lib/pkg.depotd:890 (<module>) ]
[ /usr/lib/python2.6/threading.py:256 (wait) ]
[ /usr/lib/python2.6/Queue.py:177 (get) ]
[ /usr/lib/python2.6/vendor-packages/pkg/server/depot.py:2142 (run) ]
[ /usr/lib/python2.6/threading.py:477 (run)
etc.

Answer 12

If you’re on a Linux system, use the awesomeness of gdb with the Python debug extensions (they may be in the python-dbg or python-debuginfo package). It also helps with multithreaded applications, GUI applications and C modules.

Run your program with:

$ gdb -ex r --args python <programname>.py [arguments]

This instructs gdb to prepare python <programname>.py <arguments> and run it.

Now when your program hangs, switch to the gdb console, press Ctrl+C and execute:

(gdb) thread apply all py-list

See example session and more info here and here.


Answer 13

I was looking for a while for a solution to debug my threads and I found it here thanks to haridsv. I use a slightly simplified version employing traceback.print_stack():

import sys, traceback, signal
import threading
import os

def dumpstacks(signal, frame):
  id2name = dict((th.ident, th.name) for th in threading.enumerate())
  for threadId, stack in sys._current_frames().items():
    print(id2name[threadId])
    traceback.print_stack(f=stack)

signal.signal(signal.SIGQUIT, dumpstacks)

os.killpg(os.getpgid(0), signal.SIGQUIT)

For my needs I also filter threads by name.
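
The filtering by thread name is not shown above; one possible way to do it is sketched here. The helper name dump_stacks_matching and the prefix-match rule are assumptions for illustration, not the answerer's code.

```python
import sys
import threading
import traceback

def dump_stacks_matching(name_prefix):
    """Return formatted stacks of threads whose name starts with name_prefix."""
    id2name = {th.ident: th.name for th in threading.enumerate()}
    chunks = []
    for thread_id, frame in sys._current_frames().items():
        # Threads not created through the threading module have no name here.
        name = id2name.get(thread_id, "")
        if name.startswith(name_prefix):
            stack_text = "".join(traceback.format_stack(frame))
            chunks.append("# Thread: %s\n%s" % (name, stack_text))
    return "\n".join(chunks)

print(dump_stacks_matching("MainThread"))
```

Returning a string rather than printing keeps the helper usable both from a signal handler and from ordinary logging code.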


Answer 14

It’s worth looking at Pydb, “an expanded version of the Python debugger loosely based on the gdb command set”. It includes signal managers which can take care of starting the debugger when a specified signal is sent.

A 2006 Summer of Code project looked at adding remote-debugging features to pydb in a module called mpdb.


Answer 15

I hacked together a tool which attaches to a running Python process and injects some code to get a Python shell.

See here: https://github.com/albertz/pydbattach


Answer 16

It can be done with the excellent py-spy. It’s a sampling profiler for Python programs, so its job is to attach to Python processes and sample their call stacks. Hence, py-spy dump --pid $SOME_PID is all you need to dump the call stacks of all threads in the $SOME_PID process. Typically it needs escalated privileges (to read the target process’s memory).

Here’s an example of what it looks like for a threaded Python application.

$ sudo py-spy dump --pid 31080
Process 31080: python3.7 -m chronologer -e production serve -u www-data -m
Python v3.7.1 (/usr/local/bin/python3.7)

Thread 0x7FEF5E410400 (active): "MainThread"
    _wait (cherrypy/process/wspbus.py:370)
    wait (cherrypy/process/wspbus.py:384)
    block (cherrypy/process/wspbus.py:321)
    start (cherrypy/daemon.py:72)
    serve (chronologer/cli.py:27)
    main (chronologer/cli.py:84)
    <module> (chronologer/__main__.py:5)
    _run_code (runpy.py:85)
    _run_module_as_main (runpy.py:193)
Thread 0x7FEF55636700 (active): "_TimeoutMonitor"
    run (cherrypy/process/plugins.py:518)
    _bootstrap_inner (threading.py:917)
    _bootstrap (threading.py:885)
Thread 0x7FEF54B35700 (active): "HTTPServer Thread-2"
    accept (socket.py:212)
    tick (cherrypy/wsgiserver/__init__.py:2075)
    start (cherrypy/wsgiserver/__init__.py:2021)
    _start_http_thread (cherrypy/process/servers.py:217)
    run (threading.py:865)
    _bootstrap_inner (threading.py:917)
    _bootstrap (threading.py:885)
...
Thread 0x7FEF2BFFF700 (idle): "CP Server Thread-10"
    wait (threading.py:296)
    get (queue.py:170)
    run (cherrypy/wsgiserver/__init__.py:1586)
    _bootstrap_inner (threading.py:917)
    _bootstrap (threading.py:885)  

Answer 17

pyringe is a debugger that can interact with running python processes, print stack traces, variables, etc. without any a priori setup.

While I’ve often used the signal handler solution in the past, it can still often be difficult to reproduce the issue in certain environments.


Answer 18

There is no way to hook into a running Python process and get reasonable results. What I do if processes lock up is hook strace in and try to figure out what exactly is happening.

Unfortunately, strace is often the observer that “fixes” race conditions, so its output is useless there too.


Answer 19

You can use PuDB, a Python debugger with a curses interface, to do this. Just add

from pudb import set_interrupt_handler; set_interrupt_handler()

to your code and use Ctrl-C when you want to break. You can continue with c and break again multiple times if you miss it and want to try again.


Answer 20

I am in the GDB camp with the python extensions. Follow https://wiki.python.org/moin/DebuggingWithGdb, which means

  1. dnf install gdb python-debuginfo or sudo apt-get install gdb python2.7-dbg
  2. gdb python <pid of running process>
  3. py-bt

Also consider info threads and thread apply all py-bt.


Answer 21

How to debug any function in the console:

Create a function that calls pdb.set_trace() and then calls the function you want to debug.

>>> import pdb
>>> from my_module import my_function  # my_module is a placeholder for your own module

>>> def f():
...     pdb.set_trace()
...     my_function()
... 

Then call created function:

>>> f()
> <stdin>(3)f()
(Pdb) s
--Call--
> <stdin>(1)my_function()
(Pdb) 

Happy debugging :)


Answer 22

I don’t know of anything similar to Java’s response to SIGQUIT, so you might have to build it into your application. Maybe you could make a server in another thread that can get a stack trace in response to a message of some kind?
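
One way such a server thread might look, as a rough sketch only: a daemon thread listens on a localhost TCP port and answers every connection with the stacks of all threads. The names format_all_stacks, start_stack_server and fetch_stacks are invented for this example, and there is no authentication, so it should stay bound to 127.0.0.1.

```python
import socket
import sys
import threading
import traceback

def format_all_stacks():
    """Return the current stack of every thread as one string."""
    id2name = {th.ident: th.name for th in threading.enumerate()}
    parts = []
    for thread_id, frame in sys._current_frames().items():
        parts.append("# Thread: %s (%d)\n" % (id2name.get(thread_id, ""), thread_id))
        parts.extend(traceback.format_stack(frame))
    return "".join(parts)

def start_stack_server(host="127.0.0.1", port=0):
    """Start a daemon thread that answers each connection with a stack dump.

    Returns the port actually bound (port=0 lets the OS pick a free one).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind((host, port))
    server.listen(1)

    def serve():
        while True:
            conn, _ = server.accept()
            try:
                conn.sendall(format_all_stacks().encode("utf-8"))
            finally:
                conn.close()

    threading.Thread(target=serve, name="stack-server", daemon=True).start()
    return server.getsockname()[1]

def fetch_stacks(port, host="127.0.0.1"):
    """Client side: connect and read the dump until the server closes."""
    chunks = []
    with socket.create_connection((host, port), timeout=5) as conn:
        while True:
            data = conn.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8")
```

Calling start_stack_server() once at startup and later fetch_stacks(port) from another process (or simply connecting with a tool like nc) would give a dump without relying on signals, which also makes the approach usable on platforms without SIGUSR1.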


Answer 23

Use the inspect module.

>>> import inspect
>>> help(inspect.stack)
Help on function stack in module inspect:

stack(context=1)
    Return a list of records for the stack above the caller’s frame.

I find it very helpful indeed.
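
A small sketch of what inspect.stack() gives you in practice, assuming Python 3.5 or later where each record has a function attribute. The function names who_called_me and handler are just for illustration.

```python
import inspect

def who_called_me():
    # Record 0 is this frame; each later record is one caller further out.
    return [record.function for record in inspect.stack()]

def handler():
    return who_called_me()

names = handler()
print(names[:2])  # ['who_called_me', 'handler']
```

Note that this inspects the stack of the calling thread from inside the program, so it helps with logging where you are rather than with a process that is already stuck.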


Answer 24

In Python 3, pdb will automatically install a signal handler the first time you use c(ont(inue)) in the debugger. Pressing Control-C afterwards will drop you right back in there. In Python 2, here’s a one-liner which should work even in relatively old versions (tested in 2.7 but I checked Python source back to 2.4 and it looked okay):

import pdb, signal
signal.signal(signal.SIGINT, lambda sig, frame: pdb.Pdb().set_trace(frame))

pdb is worth learning if you spend any amount of time debugging Python. The interface is a bit obtuse but should be familiar to anyone who has used similar tools, such as gdb.


Answer 25

In case you need to do this with uWSGI, it has a Python tracebacker built in, and it’s just a matter of enabling it in the configuration (a number is appended to the name for each worker):

py-tracebacker=/var/run/uwsgi/pytrace

Once you have done this, you can print the backtrace simply by connecting to the socket:

uwsgi --connect-and-read /var/run/uwsgi/pytrace1

Answer 26

At the point where the code is run, you can insert this small snippet to see a nicely formatted printed stack trace. It assumes that you have a folder called logs at your project’s root directory.

# DEBUG: START DEBUG -->
import traceback

with open('logs/stack-trace.log', 'w') as file:
    traceback.print_stack(file=file)
# DEBUG: END DEBUG --!

Python nonlocal statement

Question: Python nonlocal statement

What does the Python nonlocal statement do (in Python 3.0 and later)?

There’s no documentation on the official Python website and help("nonlocal") does not work, either.


Answer 0

Compare this, without using nonlocal:

x = 0
def outer():
    x = 1
    def inner():
        x = 2
        print("inner:", x)

    inner()
    print("outer:", x)

outer()
print("global:", x)

# inner: 2
# outer: 1
# global: 0

To this, using nonlocal, where inner()’s x is now also outer()’s x:

x = 0
def outer():
    x = 1
    def inner():
        nonlocal x
        x = 2
        print("inner:", x)

    inner()
    print("outer:", x)

outer()
print("global:", x)

# inner: 2
# outer: 2
# global: 0

If we were to use global, it would bind x to the properly “global” value:

x = 0
def outer():
    x = 1
    def inner():
        global x
        x = 2
        print("inner:", x)

    inner()
    print("outer:", x)

outer()
print("global:", x)

# inner: 2
# outer: 1
# global: 2

Answer 1

In short, it lets you assign values to a variable in an outer (but non-global) scope. See PEP 3104 for all the gory details.


Answer 2

A Google search for “python nonlocal” turned up the proposal, PEP 3104, which fully describes the syntax and reasoning behind the statement. In short, it works in exactly the same way as the global statement, except that it is used to refer to variables that are neither global nor local to the function.

Here’s a brief example of what you can do with this. The counter generator can be rewritten to use this so that it looks more like the idioms of languages with closures.

def make_counter():
    count = 0
    def counter():
        nonlocal count
        count += 1
        return count
    return counter

Obviously, you could write this as a generator, like:

def counter_generator():
    count = 0
    while True:
        count += 1
        yield count

But while this is perfectly idiomatic python, it seems that the first version would be a bit more obvious for beginners. Properly using generators, by calling the returned function, is a common point of confusion. The first version explicitly returns a function.


Answer 3

@ooboo:

It takes the one “closest” to the point of reference in the source code. This is called “lexical scoping” and has been standard for more than 40 years now.

Python’s class members are really in a dictionary called __dict__ and will never be reached by lexical scoping.

If you don’t specify nonlocal but do x = 7, it will create a new local variable “x”. If you do specify nonlocal, it will find the “closest” “x” and assign to that. If you specify nonlocal and there is no “x”, it will give you an error message.

The keyword global has always seemed strange to me since it will happily ignore all the other “x” except for the outermost one. Weird.
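
The three cases described above (no nonlocal, nonlocal with an enclosing binding, nonlocal with no binding at all) can be checked with a short sketch. The names make_pair, shadow and rebind are made up for the example; the last case is compiled from a string because the error is raised at compile time.

```python
def make_pair():
    x = 1

    def shadow():
        x = 7          # no nonlocal: creates a new local x, outer x untouched
        return x

    def rebind():
        nonlocal x     # refers to the closest enclosing x (make_pair's)
        x = 7
        return x

    shadow()
    after_shadow = x
    rebind()
    after_rebind = x
    return after_shadow, after_rebind

print(make_pair())  # (1, 7)

# nonlocal with no enclosing binding is rejected when the code is compiled.
bad_source = "def f():\n    def g():\n        nonlocal y\n        y = 1\n"
try:
    compile(bad_source, "<example>", "exec")
except SyntaxError as exc:
    error_message = str(exc)
    print(error_message)
```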


Answer 4

>>> help('nonlocal')
The nonlocal statement


    nonlocal_stmt ::= "nonlocal" identifier ("," identifier)*

The nonlocal statement causes the listed identifiers to refer to previously bound variables in the nearest enclosing scope. This is important because the default behavior for binding is to search the local namespace first. The statement allows encapsulated code to rebind variables outside of the local scope besides the global (module) scope.

Names listed in a nonlocal statement, unlike those listed in a global statement, must refer to pre-existing bindings in an enclosing scope (the scope in which a new binding should be created cannot be determined unambiguously).

Names listed in a nonlocal statement must not collide with pre-existing bindings in the local scope.

See also:

PEP 3104 – Access to Names in Outer Scopes
The specification for the nonlocal statement.

Related help topics: global, NAMESPACES

Source: Python Language Reference
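
The rule about collisions with local bindings can be observed by compiling two small snippets. This is only a sketch: the helper name compiles is invented here, and a function parameter is used as the colliding local binding.

```python
def compiles(source):
    """Return True if source compiles, False if it raises SyntaxError."""
    try:
        compile(source, "<example>", "exec")
        return True
    except SyntaxError:
        return False

# x is bound in outer and only rebound in inner: accepted.
ok = """
def outer():
    x = 0
    def inner():
        nonlocal x
        x += 1
"""

# x is a parameter of inner, so it is already bound in the local scope.
collides = """
def outer():
    x = 0
    def inner(x):
        nonlocal x
"""

print(compiles(ok), compiles(collides))
```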


Answer 5

Quote from the Python 3 Reference:

The nonlocal statement causes the listed identifiers to refer to previously bound variables in the nearest enclosing scope excluding globals.

As the reference says, in the case of several nested functions only the variable in the nearest enclosing function is modified:

def outer():
    def inner():
        def innermost():
            nonlocal x
            x = 3

        x = 2
        innermost()
        if x == 3: print('Inner x has been modified')

    x = 1
    inner()
    if x == 3: print('Outer x has been modified')

x = 0
outer()
if x == 3: print('Global x has been modified')

# Inner x has been modified

The “nearest” variable can be several levels away:

def outer():
    def inner():
        def innermost():
            nonlocal x
            x = 3

        innermost()

    x = 1
    inner()
    if x == 3: print('Outer x has been modified')

x = 0
outer()
if x == 3: print('Global x has been modified')

# Outer x has been modified

But it cannot be a global variable:

def outer():
    def inner():
        def innermost():
            nonlocal x
            x = 3

        innermost()

    inner()

x = 0
outer()
if x == 3: print('Global x has been modified')

# SyntaxError: no binding for nonlocal 'x' found

Answer 6

a = 0    #1. global variable with respect to every function in program

def f():
    a = 0          #2. nonlocal with respect to function g
    def g():
        nonlocal a
        a=a+1
        print("The value of 'a' using nonlocal is ", a)
    def h():
        global a               #3. using global variable
        a=a+5
        print("The value of a using global is ", a)
    def i():
        a = 0              #4. variable separated from all others
        print("The value of 'a' inside a function is ", a)

    g()
    h()
    i()
print("The value of 'a' global before any function", a)
f()
print("The value of 'a' global after using function f ", a)

Answer 7

My personal understanding of the “nonlocal” statement (and do excuse me, as I am new to Python and programming in general) is that “nonlocal” gives nested functions something like the behaviour of global, but scoped to an enclosing function rather than to the body of the module itself. A global statement between functions, if you will.


Answer 8

With ‘nonlocal’, an inner (nested) function gets both read and write access to a specific variable of its enclosing function. nonlocal can only be used inside nested functions, e.g.:

a = 10
def Outer(msg):
    a = 20
    b = 30
    def Inner():
        c = 50
        d = 60
        print("MU LCL =",locals())
        nonlocal a
        a = 100
        ans = a+c
        print("Hello from Inner",ans)       
        print("value of a Inner : ",a)
    Inner()
    print("value of a Outer : ",a)

res = Outer("Hello World")
print(res)
print("value of a Global : ",a)

Python argparse: How to insert newline in the help text?

Question: Python argparse: How to insert newline in the help text?

I’m using argparse in Python 2.7 for parsing input options. One of my options is a multiple choice. I want to make a list in its help text, e.g.

from argparse import ArgumentParser

parser = ArgumentParser(description='test')

parser.add_argument('-g', choices=['a', 'b', 'g', 'd', 'e'], default='a',
    help="Some option, where\n"
         " a = alpha\n"
         " b = beta\n"
         " g = gamma\n"
         " d = delta\n"
         " e = epsilon")

parser.parse_args()

However, argparse strips all newlines and consecutive spaces. The result looks like

~/Downloads:52$ python2.7 x.py -h
usage: x.py [-h] [-g {a,b,g,d,e}]

test

optional arguments:
  -h, --help      show this help message and exit
  -g {a,b,g,d,e}  Some option, where a = alpha b = beta g = gamma d = delta e
                  = epsilon

How to insert newlines in the help text?


Answer 0

Try using RawTextHelpFormatter:

from argparse import RawTextHelpFormatter
parser = ArgumentParser(description='test', formatter_class=RawTextHelpFormatter)
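
For completeness, here is a self-contained sketch that combines the snippet above with the parser from the question. The prog='x.py' argument and the use of format_help() (which returns the text that -h would print) are additions made only so the result can be inspected without running the script from a shell:

```python
from argparse import ArgumentParser, RawTextHelpFormatter

parser = ArgumentParser(prog='x.py', description='test',
                        formatter_class=RawTextHelpFormatter)
parser.add_argument('-g', choices=['a', 'b', 'g', 'd', 'e'], default='a',
                    help="Some option, where\n"
                         " a = alpha\n"
                         " b = beta\n"
                         " g = gamma\n"
                         " d = delta\n"
                         " e = epsilon")

# format_help() returns the same text that `x.py -h` would print
help_text = parser.format_help()
print(help_text)
```

With this formatter every "\n" in the help string starts a new line in the output, at the cost of losing automatic wrapping for all options of this parser.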

Answer 1

If you just want to override the one option, you should not use RawTextHelpFormatter. Instead, subclass HelpFormatter and provide a special prefix for the options that should be handled “raw” (I use "R|rest of help"):

import argparse

class SmartFormatter(argparse.HelpFormatter):

    def _split_lines(self, text, width):
        if text.startswith('R|'):
            # same behaviour as RawTextHelpFormatter._split_lines
            return text[2:].splitlines()
        # anything else is wrapped by the default implementation
        return argparse.HelpFormatter._split_lines(self, text, width)

And use it:

from argparse import ArgumentParser

parser = ArgumentParser(description='test', formatter_class=SmartFormatter)

parser.add_argument('-g', choices=['a', 'b', 'g', 'd', 'e'], default='a',
    help="R|Some option, where\n"
         " a = alpha\n"
         " b = beta\n"
         " g = gamma\n"
         " d = delta\n"
         " e = epsilon")

parser.parse_args()

Any other calls to .add_argument() where the help does not start with R| will be wrapped as normal.
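
To see both behaviours side by side, the sketch below puts one "R|" option and one ordinary option on the same parser. The option names -g and -w, the prog name and the help strings are made up for the illustration; _split_lines is a private argparse method, so this relies on implementation details that could change:

```python
import argparse


class SmartFormatter(argparse.HelpFormatter):
    def _split_lines(self, text, width):
        if text.startswith('R|'):
            # raw mode: keep the line breaks exactly as written
            return text[2:].splitlines()
        # default mode: collapse whitespace and re-wrap
        return super()._split_lines(text, width)


parser = argparse.ArgumentParser(prog='demo', formatter_class=SmartFormatter)
parser.add_argument('-g', help="R|Some option, where\n a = alpha\n b = beta")
parser.add_argument('-w', help="ordinary   help\ntext that is   re-wrapped")

help_text = parser.format_help()
help_lines = [line.strip() for line in help_text.splitlines()]
print(help_text)
```

The -g entries each appear on their own line, while the newline and repeated spaces in the -w help are collapsed as usual.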

This is part of my improvements on argparse. The full SmartFormatter also supports adding the defaults to all options, and raw input of the utility's description. The full version has its own _split_lines method, so that any formatting done to e.g. version strings is preserved:

parser.add_argument('--version', '-v', action="version",
                    version="version...\n   42!")

Answer 2

Another easy way to do it is to include textwrap.

For example,

import argparse, textwrap
parser = argparse.ArgumentParser(description='some information',
        usage='use "python %(prog)s --help" for more information',
        formatter_class=argparse.RawTextHelpFormatter)

parser.add_argument('--argument', default=somedefault, type=sometype,
        help= textwrap.dedent('''\
        First line
        Second line
        More lines ... '''))

In this way, we can avoid the long empty space in front of each output line.

usage: use "python your_python_program.py --help" for more information

some information

optional arguments:
-h, --help            show this help message and exit
--argument ARGUMENT
                      First line
                      Second line
                      More lines ...
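
The snippet above uses the placeholders somedefault and sometype, so it does not run as written. Here is a runnable sketch of the same idea, in which default='x', type=str and prog='demo' are stand-in values chosen only for the illustration:

```python
import argparse
import textwrap

# dedent strips the indentation shared by all lines of the literal
help_text = textwrap.dedent('''\
        First line
        Second line
        More lines ... ''')

parser = argparse.ArgumentParser(prog='demo',
                                 formatter_class=argparse.RawTextHelpFormatter)
parser.add_argument('--argument', default='x', type=str, help=help_text)

dedented_lines = help_text.splitlines()
print(dedented_lines)
print(parser.format_help())
```

Because the common leading whitespace is removed before argparse sees the string, the source can stay indented without that indentation showing up in the help output.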

Answer 3

I faced a similar issue (Python 2.7.6). I tried to break the description section into several lines using RawTextHelpFormatter:

parser = ArgumentParser(description="""First paragraph 

                                       Second paragraph

                                       Third paragraph""",  
                                       usage='%(prog)s [OPTIONS]', 
                                       formatter_class=RawTextHelpFormatter)

options = parser.parse_args()

And got:

usage: play-with-argparse.py [OPTIONS]

First paragraph 

                        Second paragraph

                        Third paragraph

optional arguments:
  -h, --help  show this help message and exit

So RawTextHelpFormatter is not a solution, because it prints the description exactly as it appears in the source code, preserving all whitespace characters. (I want to keep the extra tabs in my source code for readability, but I don't want them all printed. Also, the raw formatter doesn't wrap a line when it is too long, more than 80 characters for example.)

Thanks to @Anton, whose answer above pointed in the right direction. But that solution needs a slight modification in order to format the description section.

Anyway, a custom formatter is needed. I extended the existing HelpFormatter class and overrode the _fill_text method like this:

import argparse
import textwrap as _textwrap

class MultilineFormatter(argparse.HelpFormatter):
    def _fill_text(self, text, width, indent):
        # collapse all runs of whitespace, then split on the '|n' paragraph marker
        text = self._whitespace_matcher.sub(' ', text).strip()
        paragraphs = text.split('|n ')
        multiline_text = ''
        for paragraph in paragraphs:
            # wrap each paragraph on its own and follow it with a blank line
            formatted_paragraph = _textwrap.fill(paragraph, width, initial_indent=indent,
                                                 subsequent_indent=indent) + '\n\n'
            multiline_text = multiline_text + formatted_paragraph
        return multiline_text

Compare with the original source code coming from argparse module:

def _fill_text(self, text, width, indent):
    text = self._whitespace_matcher.sub(' ', text).strip()
    return _textwrap.fill(text, width, initial_indent=indent,
                                       subsequent_indent=indent)

In the original code the whole description is wrapped as one block. In the custom formatter above, the text is split into several chunks, and each of them is formatted independently.

So, with the aid of the custom formatter:

parser = ArgumentParser(description= """First paragraph 
                                        |n                              
                                        Second paragraph
                                        |n
                                        Third paragraph""",  
                usage='%(prog)s [OPTIONS]',
                formatter_class=MultilineFormatter)

options = parser.parse_args()

the output is:

usage: play-with-argparse.py [OPTIONS]

First paragraph

Second paragraph

Third paragraph

optional arguments:
  -h, --help  show this help message and exit

Answer 4

I wanted to have both manual line breaks in the description text, and auto wrapping of it; but none of the suggestions here worked for me – so I ended up modifying the SmartFormatter class given in the answers here; the issues with the argparse method names not being a public API notwithstanding, here is what I have (as a file called test.py):

import argparse
import textwrap  # used directly instead of argparse's private _textwrap alias

# call with: python test.py -h

class SmartDescriptionFormatter(argparse.RawDescriptionHelpFormatter):
  # _fill_text is a private argparse method, so its name might change depending on Python
  def _fill_text(self, text, width, indent):
    if text.startswith('R|'):
      # keep the manual line breaks, but re-wrap every paragraph to the available width
      paragraphs = text[2:].splitlines()
      rebroken = [textwrap.wrap(tpar, width) for tpar in paragraphs]
      rebrokenstr = []
      for tlinearr in rebroken:
        if (len(tlinearr) == 0):
          rebrokenstr.append("")  # an empty paragraph becomes a blank line
        else:
          for tlinepiece in tlinearr:
            rebrokenstr.append(tlinepiece)
      return '\n'.join(rebrokenstr)
    # no 'R|' prefix: defer to RawDescriptionHelpFormatter
    return argparse.RawDescriptionHelpFormatter._fill_text(self, text, width, indent)

parser = argparse.ArgumentParser(formatter_class=SmartDescriptionFormatter, description="""R|Blahbla bla blah blahh/blahbla (bla blah-blabla) a blahblah bl a blaha-blah .blah blah

Blah blah bla blahblah, bla blahblah blah blah bl blblah bl blahb; blah bl blah bl bl a blah, bla blahb bl:

  blah blahblah blah bl blah blahblah""")

options = parser.parse_args()

This is how it works in 2.7 and 3.4:

$ python test.py -h
usage: test.py [-h]

Blahbla bla blah blahh/blahbla (bla blah-blabla) a blahblah bl a blaha-blah
.blah blah

Blah blah bla blahblah, bla blahblah blah blah bl blblah bl blahb; blah bl
blah bl bl a blah, bla blahb bl:

  blah blahblah blah bl blah blahblah

optional arguments:
  -h, --help  show this help message and exit

Answer 5

Starting from the SmartFormatter described above, I ended up with this solution:

import argparse

class SmartFormatter(argparse.HelpFormatter):
    '''
         Custom Help Formatter used to split help text when '\n' was
         inserted in it.
    '''

    def _split_lines(self, text, width):
        r = []
        # wrap each manually separated line on its own
        for t in text.splitlines():
            r.extend(argparse.HelpFormatter._split_lines(self, t, width))
        return r

Note that, strangely, the formatter_class argument passed to the top-level parser is not inherited by sub_parsers; one must pass it again for each sub_parser created.
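
A short sketch of what that looks like in practice: add_parser() forwards its keyword arguments to the sub-parser's constructor, so formatter_class can be repeated there. The command names build and plain and the --mode option are invented for the example:

```python
import argparse


class SmartFormatter(argparse.HelpFormatter):
    """Honour manual line breaks, wrapping each line separately."""

    def _split_lines(self, text, width):
        lines = []
        for part in text.splitlines():
            lines.extend(super()._split_lines(part, width))
        return lines


parser = argparse.ArgumentParser(prog='tool', formatter_class=SmartFormatter)
subparsers = parser.add_subparsers(dest='command')

# formatter_class has to be repeated here; it is not taken from `parser`
build = subparsers.add_parser('build', formatter_class=SmartFormatter)
build.add_argument('--mode', help="build mode:\n fast\n safe")

# without it, the sub-parser falls back to the default HelpFormatter
plain = subparsers.add_parser('plain')
plain.add_argument('--mode', help="build mode:\n fast\n safe")

build_lines = [line.strip() for line in build.format_help().splitlines()]
plain_lines = [line.strip() for line in plain.format_help().splitlines()]
print(build.format_help())
print(plain.format_help())
```

In the build help, fast and safe appear on separate lines; in the plain help the same text is folded into a single wrapped sentence.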


Answer 6

Preface

For this question, argparse.RawTextHelpFormatter is helpful to me.

Now, I want to share how I use argparse.

I know it may not be directly related to the question,

but these issues bothered me for a while.

So I want to share my experience, in the hope that it will be helpful to someone.

Here we go.

3rd Party Modules

colorama: for changing the text color: pip install colorama

Makes ANSI escape character sequences (for producing colored terminal text and cursor positioning) work under MS Windows

Example

import colorama
from colorama import Fore, Back
from pathlib import Path
from os import startfile, system

SCRIPT_DIR = Path(__file__).resolve().parent
TEMPLATE_DIR = SCRIPT_DIR.joinpath('.')


def main(args):
    ...


if __name__ == '__main__':
    colorama.init(autoreset=True)

    from argparse import ArgumentParser, RawTextHelpFormatter

    format_text = FormatText([(20, '<'), (60, '<')])
    yellow_dc = format_text.new_dc(fore_color=Fore.YELLOW)
    green_dc = format_text.new_dc(fore_color=Fore.GREEN)
    red_dc = format_text.new_dc(fore_color=Fore.RED, back_color=Back.LIGHTYELLOW_EX)

    script_description = \
        '\n'.join([desc for desc in
                   [f'\n{green_dc(f"python {Path(__file__).name} [REFERENCE TEMPLATE] [OUTPUT FILE NAME]")} to create template.',
                    f'{green_dc(f"python {Path(__file__).name} -l *")} to get all available template',
                    f'{green_dc(f"python {Path(__file__).name} -o open")} open template directory so that you can put your template file there.',
                    # <- add your own description
                    ]])
    arg_parser = ArgumentParser(description=yellow_dc('CREATE TEMPLATE TOOL'),
                                # conflict_handler='resolve',
                                usage=script_description, formatter_class=RawTextHelpFormatter)

    arg_parser.add_argument("ref", help="reference template", nargs='?')
    arg_parser.add_argument("outfile", help="output file name", nargs='?')
    arg_parser.add_argument("action_number", help="action number", nargs='?', type=int)
    arg_parser.add_argument('--list', "-l", dest='list',
                            help=f"example: {green_dc('-l *')} \n"
                                 "description: list current available template. (accept regex)")

    arg_parser.add_argument('--option', "-o", dest='option',
                            help='\n'.join([format_text(msg_data_list) for msg_data_list in [
                                ['example', 'description'],
                                [green_dc('-o open'), 'open template directory so that you can put your template file there.'],
                                [green_dc('-o run'), '...'],
                                [green_dc('-o ...'), '...'],
                                # <- add your own description
                            ]]))

    g_args = arg_parser.parse_args()
    task_run_list = [[False, lambda: startfile('.')] if g_args.option == 'open' else None,
                     [False, lambda: [print(template_file_path.stem) for template_file_path in TEMPLATE_DIR.glob(f'{g_args.list}.py')]] if g_args.list else None,
                     # <- add your own function
                     ]
    for leave_flag, func in [task_list for task_list in task_run_list if task_list]:
        func()
        if leave_flag:
            exit(0)

    # CHECK POSITIONAL ARGUMENTS
    for attr_name, value in vars(g_args).items():
        if attr_name.startswith('-') or value is not None:
            continue
        system('cls')
        print(f'error required values of {red_dc(attr_name)} is None')
        print(f"if you need help, please use help command to help you: {red_dc(f'python {__file__} -h')}")
        exit(-1)
    main(g_args)


where the FormatText class is the following:

class FormatText:
    __slots__ = ['align_list']

    def __init__(self, align_list: list, autoreset=True):
        """
        USAGE::

            format_text = FormatText([(20, '<'), (60, '<')])
            red_dc = format_text.new_dc(fore_color=Fore.RED)
            print(red_dc(['column 1', 'column 2']))
            print(red_dc('good morning'))
        :param align_list:
        :param autoreset:
        """
        self.align_list = align_list
        colorama.init(autoreset=autoreset)

    def __call__(self, text_list: list):
        # a plain string is passed through unchanged; only lists are laid out in columns
        if isinstance(text_list, str):
            return text_list
        if len(text_list) != len(self.align_list):
            raise AttributeError
        return ' '.join(f'{txt:{flag}{int_align}}' for txt, (int_align, flag) in zip(text_list, self.align_list))

    def new_dc(self, fore_color: Fore = Fore.GREEN, back_color: Back = ""):  # DECORATOR
        """create a device context"""
        def wrap(msgs):
            return back_color + fore_color + self(msgs) + Fore.RESET
        return wrap


Python Pandas Error tokenizing data

Question: Python Pandas Error tokenizing data

I’m trying to use pandas to manipulate a .csv file but I get this error:

pandas.parser.CParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 12

I have tried to read the pandas docs, but found nothing.

My code is simple:

import pandas as pd

path = 'GOOG Key Ratios.csv'
#print(open(path).read())
data = pd.read_csv(path)

How can I resolve this? Should I use the csv module or another language?

File is from Morningstar


Answer 0

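You could also try:

data = pd.read_csv('file1.csv', error_bad_lines=False)

Do note that this will cause the offending lines to be skipped. In pandas 1.3 and later the equivalent argument is on_bad_lines='skip'; error_bad_lines was deprecated in 1.3 and removed in 2.0.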


Answer 1

It might be an issue with

  • the delimiters in your data
  • the first row, as @TomAugspurger noted

To solve it, try specifying the sep and/or header arguments when calling read_csv. For instance,

df = pandas.read_csv(fileName, sep='delimiter', header=None)

In the code above, sep defines your delimiter and header=None tells pandas that your source data has no row for headers / column titles. Thus saith the docs: “If file contains no header row, then you should explicitly pass header=None”. In this instance, pandas automatically creates whole-number indices for each field {0,1,2,…}.

According to the docs, the delimiter thing should not be an issue. The docs say that “if sep is None [not specified], will try to automatically determine this.” I however have not had good luck with this, including instances with obvious delimiters.
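
When automatic detection is unreliable, one option is to detect the delimiter explicitly with the standard library's csv.Sniffer and pass the result to read_csv. The sketch below is only an illustration: the sample text and the list of candidate delimiters are assumptions, and the call to pandas is left as a comment:

```python
import csv

# a small in-memory sample standing in for the first lines of a real file
sample = "name;price;qty\napple;1.5;3\npear;2.0;7\n"

# restrict the candidates so that ordinary punctuation is not mistaken for a separator
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")
detected = dialect.delimiter
print(repr(detected))

# the detected character could then be passed on, e.g. pd.read_csv(path, sep=detected)
```

For a real file, the sample would typically be the first few kilobytes read from it; sniffing can still fail on irregular data, in which case csv.Error is raised.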


Answer 2

The parser is getting confused by the header of the file. It reads the first row and infers the number of columns from that row. But the first two rows aren’t representative of the actual data in the file.

Try it with data = pd.read_csv(path, skiprows=2)
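
If the number of preamble rows is not known in advance, it can be estimated before calling read_csv. The following is a rough heuristic, not something taken from the answer: it assumes that the real table rows outnumber the preamble rows, and count_preamble_rows is a hypothetical helper name:

```python
import csv
import io
from collections import Counter


def count_preamble_rows(text, delimiter=','):
    """Return the index of the first row that has the most common field count."""
    rows = list(csv.reader(io.StringIO(text), delimiter=delimiter))
    widths = [len(row) for row in rows]
    typical_width = Counter(widths).most_common(1)[0][0]
    return widths.index(typical_width)


sample = (
    "Report for Example Inc\n"
    "Generated 2020-01-01\n"
    "year,revenue,margin\n"
    "2018,100,0.2\n"
    "2019,120,0.3\n"
)
skip = count_preamble_rows(sample)
print(skip)

# the result could then be passed on, e.g. pd.read_csv(path, skiprows=skip)
```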


Answer 3

Your CSV file might have a variable number of columns, and read_csv infers the number of columns from the first few rows. There are two ways to solve it in this case:

1) Change the CSV file to have a dummy first line with max number of columns (and specify header=[0])

2) Or use names = list(range(0,N)) where N is the max number of columns.
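
For the second option, N has to be known first. One way to find it, sketched here with the standard csv module, is to scan the file once and take the largest field count; max_field_count and the sample data are illustrative names and values, not part of the answer:

```python
import csv
import io


def max_field_count(text, delimiter=','):
    """Return the largest number of fields found on any row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return max((len(row) for row in reader), default=0)


sample = "a,b\n1,2,3,4\n5,6\n"
n_columns = max_field_count(sample)
column_names = list(range(n_columns))
print(column_names)

# these names could then be passed on, e.g.
# pd.read_csv(path, header=None, names=column_names)
```

Rows shorter than N are then padded with missing values by pandas instead of raising a tokenizing error.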


Answer 4

This is most likely a delimiter issue: many so-called CSV files are actually created with sep='\t', so try read_csv with the tab character (\t) as the separator, using the following line:

data=pd.read_csv("File_path", sep='\t')

Answer 5

I had this problem as well but perhaps for a different reason. I had some trailing commas in my CSV that were adding an additional column that pandas was attempting to read. Using the following works but it simply ignores the bad lines:

data = pd.read_csv('file1.csv', error_bad_lines=False)

If you want to keep the lines, an ugly kind of hack for handling the errors is to do something like the following:

line     = []
expected = []
saw      = []
cont     = True

while cont:
    try:
        data = pd.read_csv('file1.csv', skiprows=line)
        cont = False
    except Exception as e:
        message = str(e)  # Python 3 exceptions have no .message attribute
        errortype = message.split('.')[0].strip()
        if errortype == 'Error tokenizing data':
            cerror = message.split(':')[1].strip().replace(',', '')
            nums = [n for n in cerror.split(' ') if str.isdigit(n)]
            expected.append(int(nums[0]))
            saw.append(int(nums[2]))
            line.append(int(nums[1]) - 1)
        else:
            cerror = 'Unknown'
            print('Unknown Error - 222')
            raise  # re-raise, otherwise the loop would retry forever

if line != []:
    # Handle the errors however you want
    pass

I proceeded to write a script to reinsert the lines into the DataFrame since the bad lines will be given by the variable ‘line’ in the above code. This can all be avoided by simply using the csv reader. Hopefully the pandas developers can make it easier to deal with this situation in the future.
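
The csv-reader route mentioned above can be sketched roughly as follows. This is an illustration, not the script referred to in this answer: split_rows is a hypothetical helper, and it assumes the first row has the expected width.

```python
import csv
import io


def split_rows(lines, delimiter=','):
    """Split parsed rows into (good, bad), using the first row's width as the expected width."""
    good, bad = [], []
    expected = None
    # row_no counts parsed rows, which can differ from physical line numbers
    # when a quoted field spans several lines.
    for row_no, row in enumerate(csv.reader(lines, delimiter=delimiter), start=1):
        if expected is None:
            expected = len(row)
        if len(row) == expected:
            good.append(row)
        else:
            bad.append((row_no, row))
    return good, bad


# The third row has a trailing comma, so it parses into three fields.
sample = io.StringIO("x,y\n1,2\n3,4,\n5,6\n")
good, bad = split_rows(sample)
print(good)
print(bad)
```

The good rows can then go into a frame with pd.DataFrame(good[1:], columns=good[0]), and the bad rows can be repaired and appended separately.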


回答 6

我遇到了这个问题,我试图在不传递列名的情况下读取CSV文件。

df = pd.read_csv(filename, header=None)

我事先在列表中指定了列名称,然后将它们传递给names,它立即解决了它。如果您没有设置列名,则可以创建与数据中最大列数一样多的占位符名称。

col_names = ["col1", "col2", "col3", ...]
df = pd.read_csv(filename, names=col_names)

I had this problem, where I was trying to read in a CSV without passing in column names.

df = pd.read_csv(filename, header=None)

I specified the column names in a list beforehand and then passed them into names, and it solved it immediately. If you don’t have set column names, you could just create as many placeholder names as the maximum number of columns that might be in your data.

col_names = ["col1", "col2", "col3", ...]
df = pd.read_csv(filename, names=col_names)

回答 7

我本人几次遇到这个问题。几乎每次,原因都是我试图打开的文件不是正确保存的CSV开头。“适当地”是指每行具有相同数量的分隔符或列。

通常发生这种情况是因为我在Excel中打开了CSV,然后错误地保存了它。即使文件扩展名仍然是.csv,纯CSV格式也已更改。

用pandas to_csv保存的所有文件都将正确格式化,并且不会出现该问题。但是,如果您使用其他程序打开它,则可能会更改结构。

希望能有所帮助。

I’ve had this problem a few times myself. Almost every time, the reason is that the file I was attempting to open was not a properly saved CSV to begin with. And by “properly”, I mean each row had the same number of separators or columns.

Typically it happened because I had opened the CSV in Excel then improperly saved it. Even though the file extension was still .csv, the pure CSV format had been altered.

Any file saved with pandas to_csv will be properly formatted and shouldn’t have that issue. But if you open it with another program, it may change the structure.

Hope that helps.


回答 8

我遇到了同样的问题。使用pd.read_table()相同的源文件似乎工作。我无法找到原因,但这对于我的情况是一个有用的解决方法。也许某个知识渊博的人可以阐明其工作原理。

编辑:我发现当文件中的某些文本与实际数据的格式不同时,此错误会逐渐蔓延。这通常是页眉或页脚信息(多于一行,因此skip_header无效),不会与实际数据用相同数量的逗号分隔(使用read_csv时)。使用read_table使用制表符作为分隔符,可以避免用户当前的错误,但会引入其他错误。

我通常通过将多余的数据读取到文件中然后使用read_csv()方法来解决此问题。

确切的解决方案可能会有所不同,具体取决于您的实际文件,但是这种方法在某些情况下对我有用

I came across the same issue. Using pd.read_table() on the same source file seemed to work. I could not trace the reason for this but it was a useful workaround for my case. Perhaps someone more knowledgeable can shed more light on why it worked.

Edit: I found that this error crops up when you have some text in your file that does not have the same format as the actual data. This is usually header or footer information (longer than one line, so skip_header doesn’t work), which will not be separated by the same number of commas as your actual data (when using read_csv). read_table uses a tab as the delimiter, which could circumvent the user’s current error but introduce others.

I usually get around this by reading the extra data into a file and then using the read_csv() method.

The exact solution might differ depending on your actual file, but this approach has worked for me in several cases


回答 9

以下代码对我有用(我发布了此答案,因为我特别在Google合作笔记本中遇到了此问题):

df = pd.read_csv("/path/foo.csv", delimiter=';', skiprows=0, low_memory=False)

The following worked for me (I posted this answer, because I specifically had this problem in a Google Colaboratory Notebook):

df = pd.read_csv("/path/foo.csv", delimiter=';', skiprows=0, low_memory=False)

回答 10

尝试读取带有空格,逗号和引号的制表符分隔表时,我遇到了类似的问题:

1115794 4218    "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", ""
1144102 3180    "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", "g__Bacillus", ""
368444  2328    "k__Bacteria", "p__Bacteroidetes", "c__Bacteroidia", "o__Bacteroidales", "f__Bacteroidaceae", "g__Bacteroides", ""



import pandas as pd
# Same error for read_table
counts = pd.read_csv(path_counts, sep='\t', index_col=2, header=None, engine = 'c')

pandas.io.common.CParserError: Error tokenizing data. C error: out of memory

这说明它与C解析引擎(默认引擎)有关。也许更改为python会改变一切

counts = pd.read_table(path_counts, sep='\t', index_col=2, header=None, engine='python')

Segmentation fault (core dumped)

现在,这是一个不同的错误。
如果我们继续尝试从表中删除空格,则python-engine的错误再次更改:

1115794 4218    "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae",""
1144102 3180    "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae","g__Bacillus",""
368444  2328    "k__Bacteria","p__Bacteroidetes","c__Bacteroidia","o__Bacteroidales","f__Bacteroidaceae","g__Bacteroides",""


_csv.Error: '   ' expected after '"'

很明显,熊猫在解析我们的行时遇到问题。要使用python引擎解析表,我需要事先删除表中的所有空格和引号。同时,C引擎即使连续出现逗号也不断崩溃。

为了避免创建带有替换的新文件,我这样做是因为表很小:

from io import StringIO
with open(path_counts) as f:
    input = StringIO(f.read().replace('", ""', '').replace('"', '').replace(', ', ',').replace('\0',''))
    counts = pd.read_table(input, sep='\t', index_col=2, header=None, engine='python')

tl; dr
更改解析引擎,请尝试避免数据中出现任何非限定性的引号/逗号/空格。

I’ve had a similar problem while trying to read a tab-delimited table with spaces, commas and quotes:

1115794 4218    "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", ""
1144102 3180    "k__Bacteria", "p__Firmicutes", "c__Bacilli", "o__Bacillales", "f__Bacillaceae", "g__Bacillus", ""
368444  2328    "k__Bacteria", "p__Bacteroidetes", "c__Bacteroidia", "o__Bacteroidales", "f__Bacteroidaceae", "g__Bacteroides", ""



import pandas as pd
# Same error for read_table
counts = pd.read_csv(path_counts, sep='\t', index_col=2, header=None, engine = 'c')

pandas.io.common.CParserError: Error tokenizing data. C error: out of memory

This says it has something to do with the C parsing engine (which is the default one). Maybe changing to the python engine will change something:

counts = pd.read_table(path_counts, sep='\t', index_col=2, header=None, engine='python')

Segmentation fault (core dumped)

Now that is a different error.
If we go ahead and try to remove spaces from the table, the error from python-engine changes once again:

1115794 4218    "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae",""
1144102 3180    "k__Bacteria","p__Firmicutes","c__Bacilli","o__Bacillales","f__Bacillaceae","g__Bacillus",""
368444  2328    "k__Bacteria","p__Bacteroidetes","c__Bacteroidia","o__Bacteroidales","f__Bacteroidaceae","g__Bacteroides",""


_csv.Error: '   ' expected after '"'

And it gets clear that pandas was having problems parsing our rows. To parse a table with python engine I needed to remove all spaces and quotes from the table beforehand. Meanwhile C-engine kept crashing even with commas in rows.

To avoid creating a new file with replacements I did this, as my tables are small:

from io import StringIO
with open(path_counts) as f:
    input = StringIO(f.read().replace('", ""', '').replace('"', '').replace(', ', ',').replace('\0',''))
    counts = pd.read_table(input, sep='\t', index_col=2, header=None, engine='python')

tl;dr
Change parsing engine, try to avoid any non-delimiting quotes/commas/spaces in your data.


回答 11

我使用的数据集使用了很多引号(“)来进行格式化。通过包含以下参数,我能够解决此错误read_csv()

quoting=3 # 3 corresponds to csv.QUOTE_NONE

The dataset that I used had a lot of quote marks (") that were extraneous to the formatting. I was able to fix the error by including this parameter for read_csv():

quoting=3 # 3 corresponds to csv.QUOTE_NONE
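
To see what that setting does, here is a small sketch using the standard csv module, whose QUOTE_NONE constant is the value passed above. The sample text is made up for illustration.

```python
import csv

# Made-up sample: the second row uses " as an inch mark, not as a quote character.
text = 'id,remark\n1,5" pipe\n'

# With QUOTE_NONE the reader treats quote characters as ordinary text.
rows = list(csv.reader(text.splitlines(), quoting=csv.QUOTE_NONE))
print(csv.QUOTE_NONE)
print(rows)
```

Writing quoting=csv.QUOTE_NONE instead of the bare 3 says the same thing and is easier to read.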

回答 12

在参数中使用定界符

pd.read_csv(filename, delimiter=",", encoding='utf-8')

它会读取。

Use delimiter in parameter

pd.read_csv(filename, delimiter=",", encoding='utf-8')

It will read.


回答 13

尽管此问题并非如此,但压缩数据也可能出现此错误。明确设置该值可以kwarg compression解决我的问题。

result = pandas.read_csv(data_source, compression='gzip')

Although not the case for this question, this error may also appear with compressed data. Explicitly setting the value for kwarg compression resolved my problem.

result = pandas.read_csv(data_source, compression='gzip')
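
If you are not sure whether a source is gzip-compressed (for example, when the file name has no .gz extension to infer from), checking the first two bytes is one way to find out. This is a minimal sketch; looks_gzipped is a hypothetical helper name.

```python
import gzip

GZIP_MAGIC = b'\x1f\x8b'  # every gzip stream starts with these two bytes


def looks_gzipped(raw):
    """Return True if the bytes start with the gzip magic number."""
    return raw[:2] == GZIP_MAGIC


# Build a compressed payload in memory instead of reading a real file.
payload = gzip.compress(b"a,b\n1,2\n")
print(looks_gzipped(payload))
print(looks_gzipped(b"a,b\n1,2\n"))

text = gzip.decompress(payload).decode('utf-8')
print(text)
```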

回答 14

我发现对处理类似的解析错误有用的另一种方法是使用CSV模块将数据重新路由到pandas df中。例如:

import csv
import pandas as pd
path = 'C:/FileLocation/'
file = 'filename.csv'
f = open(path+file,'rt')
reader = csv.reader(f)

#once contents are available, I then put them in a list
csv_list = []
for l in reader:
    csv_list.append(l)
f.close()
#now pandas has no problem getting into a df
df = pd.DataFrame(csv_list)

我发现CSV模块对于格式较差的逗号分隔文件更加健壮,因此在解决此类问题方面,此方法已取得成功。

An alternative that I have found to be useful in dealing with similar parsing errors uses the CSV module to re-route data into a pandas df. For example:

import csv
import pandas as pd
path = 'C:/FileLocation/'
file = 'filename.csv'
f = open(path+file,'rt')
reader = csv.reader(f)

#once contents are available, I then put them in a list
csv_list = []
for l in reader:
    csv_list.append(l)
f.close()
#now pandas has no problem getting into a df
df = pd.DataFrame(csv_list)

I find the CSV module to be a bit more robust to poorly formatted comma separated files and so have had success with this route to address issues like these.


回答 15

以下命令序列有效(我丢失了数据的第一行-no header = None present-,但至少已加载):

df = pd.read_csv(filename, usecols=range(0, 42))
df.columns = ['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND', 'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS', 'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2', 'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6', 'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10', 'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14']

以下操作无效:

df = pd.read_csv(filename, names=['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND', 'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS', 'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2', 'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6', 'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10', 'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14'], usecols=range(0, 42))

CParserError:标记数据时出错。C错误:在1605634行中应有53个字段,看到54个以下内容无效:

df = pd.read_csv(filename, header=None)

CParserError:标记数据时出错。C错误:在1605634行中预期有53个字段,看到了54

因此,在您的问题中,您必须通过 usecols=range(0, 2)

The following sequence of commands works (I lose the first line of the data, since no header=None is present, but at least it loads):

df = pd.read_csv(filename, usecols=range(0, 42))
df.columns = ['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND', 'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS', 'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2', 'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6', 'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10', 'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14']

Following does NOT work:

df = pd.read_csv(filename, names=['YR', 'MO', 'DAY', 'HR', 'MIN', 'SEC', 'HUND', 'ERROR', 'RECTYPE', 'LANE', 'SPEED', 'CLASS', 'LENGTH', 'GVW', 'ESAL', 'W1', 'S1', 'W2', 'S2', 'W3', 'S3', 'W4', 'S4', 'W5', 'S5', 'W6', 'S6', 'W7', 'S7', 'W8', 'S8', 'W9', 'S9', 'W10', 'S10', 'W11', 'S11', 'W12', 'S12', 'W13', 'S13', 'W14'], usecols=range(0, 42))

CParserError: Error tokenizing data. C error: Expected 53 fields in line 1605634, saw 54

Following does NOT work:

df = pd.read_csv(filename, header=None)

CParserError: Error tokenizing data. C error: Expected 53 fields in line 1605634, saw 54

Hence, in your problem you have to pass usecols=range(0, 2)


回答 16

对于那些在Linux OS上使用Python 3遇到类似问题的人。

pandas.errors.ParserError: Error tokenizing data. C error: Calling
read(nbytes) on source failed. Try engine='python'.

尝试:

df = pd.read_csv('file.csv', encoding='utf8', engine='python')

For those who are having similar issue with Python 3 on linux OS.

pandas.errors.ParserError: Error tokenizing data. C error: Calling
read(nbytes) on source failed. Try engine='python'.

Try:

df = pd.read_csv('file.csv', encoding='utf8', engine='python')

回答 17

有时问题不在于如何使用python,而在于原始数据。
我收到此错误消息

Error tokenizing data. C error: Expected 18 fields in line 72, saw 19.

事实证明,在列说明中有时会出现逗号。这意味着需要清理CSV文件或使用其他分隔符。

Sometimes the problem is not how to use python, but with the raw data.
I got this error message

Error tokenizing data. C error: Expected 18 fields in line 72, saw 19.

It turned out that in the column description there were sometimes commas. This means that the CSV file needs to be cleaned up or another separator used.
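
If you control how the file is written, quoting is another way to keep commas inside a field from being counted as separators. The sketch below uses the standard csv module with made-up data; csv.writer quotes any field that contains the delimiter by default.

```python
import csv
import io

# Write a row whose description contains a comma.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(['id', 'description'])
writer.writerow([1, 'red, large'])
text = buffer.getvalue()
print(text)

# Reading it back yields two fields per row, not three.
rows = list(csv.reader(io.StringIO(text)))
print(rows)
```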


回答 18

采用 pandas.read_csv('CSVFILENAME',header=None,sep=', ')

尝试从链接读取csv数据时

http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data

我将数据从站点复制到了csvfile中。它有多余的空格,所以使用sep =’,’并且它起作用:)

use pandas.read_csv('CSVFILENAME',header=None,sep=', ')

when trying to read csv data from the link

http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data

I copied the data from the site into my csvfile. It had extra spaces, so I used sep=', ' and it worked :)


回答 19

我有一个包含行号的数据集,我使用了index_col:

pd.read_csv('train.csv', index_col=0)

I had a dataset with pre-existing row numbers, so I used index_col:

pd.read_csv('train.csv', index_col=0)

回答 20

这就是我所做的。

sep='::' 解决了我的问题:

data=pd.read_csv('C:\\Users\\HP\\Downloads\\NPL ASSINGMENT 2 imdb_labelled\\imdb_labelled.txt',engine='python',header=None,sep='::')

This is what I did.

sep='::' solved my issue:

data=pd.read_csv('C:\\Users\\HP\\Downloads\\NPL ASSINGMENT 2 imdb_labelled\\imdb_labelled.txt',engine='python',header=None,sep='::')

回答 21

我有与此类似的情况

train = pd.read_csv('input.csv' , encoding='latin1',engine='python') 

工作了

I had a similar case as this and setting

train = pd.read_csv('input.csv' , encoding='latin1',engine='python') 

worked


回答 22

当read_csv时,我有同样的问题:ParserError:标记数据时出错。我只是将旧的csv文件保存到新的csv文件中。问题已经解决了!

I had the same problem with read_csv: ParserError: Error tokenizing data. I just saved the old CSV file as a new CSV file, and the problem was solved!


回答 23

对我来说,问题在于,当日 CSV追加了一个新列。接受的答案解决方案将无法正常工作,因为如果我使用的话,以后的每一行都会被丢弃error_bad_lines=False

在这种情况下,解决方案是使用中的usecols参数pd.read_csv()。这样,我可以仅指定需要读入CSV的列,并且只要标头列存在(并且列名不变),我的Python代码就可以对将来的CSV更改保持弹性。

usecols : list-like or callable, optional 

Return a subset of the columns. If list-like, all elements must either
be positional (i.e. integer indices into the document columns) or
strings that correspond to column names provided either by the user in
names or inferred from the document header row(s). For example, a
valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar',
'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1,
0]. To instantiate a DataFrame from data with element order preserved
use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for
columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo',
'bar'])[['bar', 'foo']] for ['bar', 'foo'] order.

my_columns = ['foo', 'bar', 'bob']
df = pd.read_csv(file_path, usecols=my_columns)

这样做的另一个好处是,如果我仅使用3-4列的CSV(具有18-20列),则可以将较少的数据加载到内存中。

The issue for me was that a new column was appended to my CSV intraday. The accepted answer solution would not work as every future row would be discarded if I used error_bad_lines=False.

The solution in this case was to use the usecols parameter in pd.read_csv(). This way I can specify only the columns that I need to read from the CSV, and my Python code will remain resilient to future CSV changes so long as a header column exists (and the column names do not change).

usecols : list-like or callable, optional 

Return a subset of the columns. If list-like, all elements must either
be positional (i.e. integer indices into the document columns) or
strings that correspond to column names provided either by the user in
names or inferred from the document header row(s). For example, a
valid list-like usecols parameter would be [0, 1, 2] or ['foo', 'bar',
'baz']. Element order is ignored, so usecols=[0, 1] is the same as [1,
0]. To instantiate a DataFrame from data with element order preserved
use pd.read_csv(data, usecols=['foo', 'bar'])[['foo', 'bar']] for
columns in ['foo', 'bar'] order or pd.read_csv(data, usecols=['foo',
'bar'])[['bar', 'foo']] for ['bar', 'foo'] order.

Example

my_columns = ['foo', 'bar', 'bob']
df = pd.read_csv(file_path, usecols=my_columns)

Another benefit of this is that I can load way less data into memory if I am only using 3-4 columns of a CSV that has 18-20 columns.


回答 24

简单的解决方法:在excel中打开csv文件,并以csv格式的其他名称文件保存。再次尝试导入spyder,将解决您的问题!

Simple resolution: open the CSV file in Excel and save it under a different name in CSV format. Then try importing it into Spyder again; your problem will be resolved!


回答 25

我遇到了带有引号的错误。我使用映射软件,在导出逗号分隔文件时,该软件会在文本项周围加上引号。当使用引号(例如’=英尺和“ =英寸)时,如果引起定界符冲突,则可能会出现问题。请考虑以下示例,该示例指出5英寸的测井记录质量较差:

UWI_key,Latitude,Longitude,Remark
US42051316890000,30.4386484,-96.4330734,"poor 5""

5"速记的5 inch方式结束了在工程扔扳手。Excel会简单地删除多余的引号,但Pandas会崩溃而没有error_bad_lines=False上面提到的参数。

I have encountered this error with a stray quotation mark. I use mapping software which will put quotation marks around text items when exporting comma-delimited files. Text which uses quote marks (e.g. ' = feet and " = inches) can be problematic when they induce delimiter collisions. Consider this example, which notes that a 5-inch well log print is poor:

UWI_key,Latitude,Longitude,Remark
US42051316890000,30.4386484,-96.4330734,"poor 5""

Using 5" as shorthand for 5 inch ends up throwing a wrench in the works. Excel will simply strip off the extra quote mark, but Pandas breaks down without the error_bad_lines=False argument mentioned above.


回答 26

据我所知,在查看文件后,问题在于您要加载的csv文件具有多个表。有空行或包含表标题的行。尝试看看这个Stackoverflow答案。它显示了如何以编程方式实现这一目标。

做到这一点的另一种动态方法是使用csv模块,一次读取每一行并进行完整性检查/正则表达式,以推断该行是否为(title / header / values / blank)。使用此方法还有一个优势,即可以根据需要在python对象中拆分/追加/收集数据。

最简单的方法是pd.read_clipboard()在手动选择表格并将其复制到剪贴板后使用pandas功能,以防您可以在excel中打开CSV或其他功能。

不相关的

此外,与您的问题无关,但是因为没有人提到此问题:seeds_dataset.txt从UCI 加载某些数据集时,我遇到了同样的问题。在我的情况下,发生此错误是因为某些分隔符比真正的tab具有更多的空格\t。例如,请参见下面的第3行

14.38   14.21   0.8951  5.386   3.312   2.462   4.956   1
14.69   14.49   0.8799  5.563   3.259   3.586   5.219   1
14.11   14.1    0.8911  5.42    3.302   2.7     5       1

因此,请使用\t+分隔符样式代替\t

data = pd.read_csv(path, sep='\t+', header=None)

As far as I can tell, and after taking a look at your file, the problem is that the csv file you’re trying to load has multiple tables. There are empty lines, or lines that contain table titles. Try to have a look at this Stackoverflow answer. It shows how to achieve that programmatically.

Another dynamic approach to do that would be to use the csv module, read every single row at a time and make sanity checks/regular expressions, to infer if the row is (title/header/values/blank). You have one more advantage with this approach, that you can split/append/collect your data in python objects as desired.

The easiest of all would be to use pandas function pd.read_clipboard() after manually selecting and copying the table to the clipboard, in case you can open the csv in excel or something.

Irrelevant:

Additionally, irrelevant to your problem, but because no one made mention of this: I had this same issue when loading some datasets such as seeds_dataset.txt from UCI. In my case, the error was occurring because some separators had more whitespaces than a true tab \t. See line 3 in the following for instance

14.38   14.21   0.8951  5.386   3.312   2.462   4.956   1
14.69   14.49   0.8799  5.563   3.259   3.586   5.219   1
14.11   14.1    0.8911  5.42    3.302   2.7     5       1

Therefore, use \t+ in the separator pattern instead of \t.

data = pd.read_csv(path, sep='\t+', header=None)
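
The effect of the \t+ pattern can be checked without pandas. This sketch uses the standard re module on a made-up line with a doubled tab; note that a separator longer than one character is treated by pandas as a regular expression and forces the Python parsing engine.

```python
import re

# Made-up line: there are two tabs between the second and third values.
line = "14.11\t14.1\t\t0.8911\t5.42"

fields_single = line.split('\t')      # a single tab as separator leaves an empty field
fields_run = re.split(r'\t+', line)   # a run of tabs counts as one separator

print(fields_single)
print(fields_run)
```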

回答 27

就我而言,这是因为csv文件的第一行和最后两行的格式与文件的中间内容不同。

因此,我要做的是将csv文件作为字符串打开,解析字符串的内容,然后用于read_csv获取数据框。

import io
import pandas as pd

file = open(f'{file_path}/{file_name}', 'r')
content = file.read()

# change new line character from '\r\n' to '\n'
lines = content.replace('\r', '').split('\n')

# Remove the first and last 2 lines of the file
# StringIO can be considered as a file stored in memory
df = pd.read_csv(io.StringIO("\n".join(lines[2:-2])), header=None)

In my case, it is because the format of the first and last two lines of the csv file is different from the middle content of the file.

So what I do is open the csv file as a string, parse the content of the string, then use read_csv to get a dataframe.

import io
import pandas as pd

file = open(f'{file_path}/{file_name}', 'r')
content = file.read()

# change new line character from '\r\n' to '\n'
lines = content.replace('\r', '').split('\n')

# Remove the first and last 2 lines of the file
# StringIO can be considered as a file stored in memory
df = pd.read_csv(io.StringIO("\n".join(lines[2:-2])), header=None)

回答 28

在我的情况下,分隔符不是默认的“,”,而是Tab。

pd.read_csv('file_name.csv', sep='\\t',lineterminator='\\r', engine='python', header='infer')

注意:“ \ t”不符合某些来源的建议。需要“ \\ t”。

In my case the separator was not the default “,” but Tab.

pd.read_csv('file_name.csv', sep='\\t',lineterminator='\\r', engine='python', header='infer')

Note: “\t” did not work as suggested by some sources. “\\t” was required.


回答 29

我有一个类似的错误,问题是我的csv文件中有一些转义的引号,并且需要适当地设置escapechar参数。

I had a similar error and the issue was that I had some escaped quotes in my csv file and needed to set the escapechar parameter appropriately.


Python中字典的深层副本

问题:Python中字典的深层副本

我想dict在python中制作一个深层副本。不幸的是,该.deepcopy()方法不存在dict。我怎么做?

>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> my_copy = my_dict.deepcopy()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute 'deepcopy'
>>> my_copy = my_dict.copy()
>>> my_dict['a'][2] = 7
>>> my_copy['a'][2]
7

最后一行应为3

我希望所做的修改my_dict不会影响快照my_copy

我怎么做?该解决方案应与Python 3.x兼容。

I would like to make a deep copy of a dict in python. Unfortunately the .deepcopy() method doesn’t exist for the dict. How do I do that?

>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> my_copy = my_dict.deepcopy()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'dict' object has no attribute 'deepcopy'
>>> my_copy = my_dict.copy()
>>> my_dict['a'][2] = 7
>>> my_copy['a'][2]
7

The last line should be 3.

I would like that modifications in my_dict don’t impact the snapshot my_copy.

How do I do that? The solution should be compatible with Python 3.x.


回答 0

怎么样:

import copy
d = { ... }
d2 = copy.deepcopy(d)

Python 2或3:

Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import copy
>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> my_copy = copy.deepcopy(my_dict)
>>> my_dict['a'][2] = 7
>>> my_copy['a'][2]
3
>>>

How about:

import copy
d = { ... }
d2 = copy.deepcopy(d)

Python 2 or 3:

Python 3.2 (r32:88445, Feb 20 2011, 21:30:00) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import copy
>>> my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
>>> my_copy = copy.deepcopy(my_dict)
>>> my_dict['a'][2] = 7
>>> my_copy['a'][2]
3
>>>

回答 1

dict.copy()是字典
id的浅表复制函数, 是内置函数,可为您提供变量的地址

首先,您需要了解“为什么会发生此特定问题?”

In [1]: my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}

In [2]: my_copy = my_dict.copy()

In [3]: id(my_dict)
Out[3]: 140190444167808

In [4]: id(my_copy)
Out[4]: 140190444170328

In [5]: id(my_copy['a'])
Out[5]: 140190444024104

In [6]: id(my_dict['a'])
Out[6]: 140190444024104

键“ a”的两个字典中都存在的列表地址指向同一位置。
因此,当您在my_dict中更改列表的值时,my_copy中的列表也会更改。


问题中提到的数据结构解决方案:

In [7]: my_copy = {key: value[:] for key, value in my_dict.items()}

In [8]: id(my_copy['a'])
Out[8]: 140190444024176

或者您可以使用上述的Deepcopy。

dict.copy() makes a shallow copy of a dictionary.
id is a built-in function that returns an object's identity (in CPython, its memory address).

First you need to understand why this particular problem is happening.

In [1]: my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}

In [2]: my_copy = my_dict.copy()

In [3]: id(my_dict)
Out[3]: 140190444167808

In [4]: id(my_copy)
Out[4]: 140190444170328

In [5]: id(my_copy['a'])
Out[5]: 140190444024104

In [6]: id(my_dict['a'])
Out[6]: 140190444024104

The address of the list present in both the dicts for key ‘a’ is pointing to same location.
Therefore when you change value of the list in my_dict, the list in my_copy changes as well.


Solution for data structure mentioned in the question:

In [7]: my_copy = {key: value[:] for key, value in my_dict.items()}

In [8]: id(my_copy['a'])
Out[8]: 140190444024176

Or you can use deepcopy as mentioned above.
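
The same point can be checked with the is operator rather than by comparing id values by eye. This is a small sketch using the dictionary from the question; shallow and deep are illustrative names.

```python
import copy

my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
shallow = my_dict.copy()
deep = copy.deepcopy(my_dict)

# A shallow copy reuses the inner lists; a deep copy builds new ones.
shares_shallow = shallow['a'] is my_dict['a']
shares_deep = deep['a'] is my_dict['a']
print(shares_shallow, shares_deep)

# Mutating the original list is visible through the shallow copy only.
my_dict['a'][2] = 7
print(shallow['a'][2], deep['a'][2])
```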


回答 2

Python 3.x

from copy import deepcopy

my_dict = {'one': 1, 'two': 2}
new_dict_deepcopy = deepcopy(my_dict)

如果没有Deepcopy,我将无法从域字典中删除主机名字典。

没有Deepcopy,我会收到以下错误:

"RuntimeError: dictionary changed size during iteration"

…当我尝试从另一本字典中的字典中删除所需的元素时。

import socket
import xml.etree.ElementTree as ET
from copy import deepcopy

域是一个字典对象

def remove_hostname(domain, hostname):
    domain_copy = deepcopy(domain)
    for domains, hosts in domain_copy.items():
        for host, port in hosts.items():
            if host == hostname:
                del domain[domains][host]
    return domain

输出示例:[ orginal ] domains = {‘localdomain’:{‘localhost’:{‘all’:’4000’}}}

[new] domains = {'localdomain': {}}

因此,这里发生的是我正在遍历字典的副本,而不是遍历字典本身。使用此方法,您可以根据需要删除元素。

Python 3.x

from copy import deepcopy

my_dict = {'one': 1, 'two': 2}
new_dict_deepcopy = deepcopy(my_dict)

Without deepcopy, I am unable to remove the hostname dictionary from within my domain dictionary.

Without deepcopy I get the following error:

"RuntimeError: dictionary changed size during iteration"

…when I try to remove the desired element from my dictionary inside of another dictionary.

import socket
import xml.etree.ElementTree as ET
from copy import deepcopy

domain is a dictionary object

def remove_hostname(domain, hostname):
    domain_copy = deepcopy(domain)
    for domains, hosts in domain_copy.items():
        for host, port in hosts.items():
            if host == hostname:
                del domain[domains][host]
    return domain

Example output: [original] domains = {'localdomain': {'localhost': {'all': '4000'}}}

[new] domains = {'localdomain': {}}

So what’s going on here is I am iterating over a copy of a dictionary rather than iterating over the dictionary itself. With this method, you are able to remove elements as needed.
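
A lighter-weight alternative, not the approach used in this answer, is to iterate over a snapshot of the keys instead of a deep copy of the whole structure. A minimal sketch with the same function name and sample data:

```python
def remove_hostname(domain, hostname):
    """Delete hostname from every inner dictionary of domain, in place."""
    for hosts in domain.values():
        # list(hosts) snapshots the keys, so deleting from hosts
        # does not change the sequence being iterated.
        for host in list(hosts):
            if host == hostname:
                del hosts[host]
    return domain


domains = {'localdomain': {'localhost': {'all': '4000'}}}
result = remove_hostname(domains, 'localhost')
print(result)
```

Only the key lists are copied here, which matters when the nested values are large.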


回答 3

我喜欢Lasse V. Karlsen并从中学到了很多。我将其修改为以下示例,该示例很好地突出了浅字典副本和深副本之间的区别:

    import copy

    my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
    my_copy = copy.copy(my_dict)
    my_deepcopy = copy.deepcopy(my_dict)

现在,如果你改变

    my_dict['a'][2] = 7

并做

    print("my_copy a[2]: ",my_copy['a'][2],",whereas my_deepcopy a[2]: ", my_deepcopy['a'][2])

你得到

    >> my_copy a[2]:  7 ,whereas my_deepcopy a[2]:  3

I like Lasse V. Karlsen's answer and learned a lot from it. I modified it into the following example, which highlights pretty well the difference between shallow dictionary copies and deep copies:

    import copy

    my_dict = {'a': [1, 2, 3], 'b': [4, 5, 6]}
    my_copy = copy.copy(my_dict)
    my_deepcopy = copy.deepcopy(my_dict)

Now if you change

    my_dict['a'][2] = 7

and do

    print("my_copy a[2]: ",my_copy['a'][2],",whereas my_deepcopy a[2]: ", my_deepcopy['a'][2])

you get

    >> my_copy a[2]:  7 ,whereas my_deepcopy a[2]:  3

回答 4

一个更简单(在我看来)的解决方案是创建一个新词典并用旧词典的内容进行更新:

my_dict={'a':1}

my_copy = {}

my_copy.update( my_dict )

my_dict['a']=2

my_dict['a']
Out[34]: 2

my_copy['a']
Out[35]: 1

这种方法的问题在于它可能不够“深入”。即没有递归的深度。对于简单的对象已经足够好了,但对于嵌套字典却不够。这是一个示例,可能不够深:

my_dict1={'b':2}

my_dict2={'c':3}

my_dict3={ 'b': my_dict1, 'c':my_dict2 }

my_copy = {}

my_copy.update( my_dict3 )

my_dict1['b']='z'

my_copy
Out[42]: {'b': {'b': 'z'}, 'c': {'c': 3}}

通过使用Deepcopy(),我可以消除半浅行为,但是我认为必须确定哪种方法适合您的应用程序。在大多数情况下,您可能并不在意,但应注意可能存在的陷阱…最后的示例:

import copy

my_copy2 = copy.deepcopy( my_dict3 )

my_dict1['b']='99'

my_copy2
Out[46]: {'b': {'b': 'z'}, 'c': {'c': 3}}

A simpler (in my view) solution is to create a new dictionary and update it with the contents of the old one:

my_dict={'a':1}

my_copy = {}

my_copy.update( my_dict )

my_dict['a']=2

my_dict['a']
Out[34]: 2

my_copy['a']
Out[35]: 1

The problem with this approach is that it may not be ‘deep enough’, i.e. it is not recursively deep. It is good enough for simple objects but not for nested dictionaries. Here is an example where it may not be deep enough:

my_dict1={'b':2}

my_dict2={'c':3}

my_dict3={ 'b': my_dict1, 'c':my_dict2 }

my_copy = {}

my_copy.update( my_dict3 )

my_dict1['b']='z'

my_copy
Out[42]: {'b': {'b': 'z'}, 'c': {'c': 3}}

By using copy.deepcopy() I can eliminate the semi-shallow behavior, but I think you must decide which approach is right for your application. In most cases you may not care, but you should be aware of the possible pitfalls. Final example:

import copy

my_copy2 = copy.deepcopy( my_dict3 )

my_dict1['b']='99'

my_copy2
Out[46]: {'b': {'b': 'z'}, 'c': {'c': 3}}

使用getter和setter的pythonic方法是什么?

问题:使用getter和setter的pythonic方法是什么?

我这样做:

def set_property(property,value):  
def get_property(property):  

要么

object.property = value  
value = object.property

我是Python的新手,因此我仍在探索语法,并且我希望对此提供一些建议。

I’m doing it like:

def set_property(property,value):  
def get_property(property):  

or

object.property = value  
value = object.property

I’m new to Python, so i’m still exploring the syntax, and i’d like some advice on doing this.


回答 0

试试这个:Python属性

示例代码是:

class C(object):
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        print("getter of x called")
        return self._x

    @x.setter
    def x(self, value):
        print("setter of x called")
        self._x = value

    @x.deleter
    def x(self):
        print("deleter of x called")
        del self._x


c = C()
c.x = 'foo'  # setter called
foo = c.x    # getter called
del c.x      # deleter called

Try this: Python Property

The sample code is:

class C(object):
    def __init__(self):
        self._x = None

    @property
    def x(self):
        """I'm the 'x' property."""
        print("getter of x called")
        return self._x

    @x.setter
    def x(self, value):
        print("setter of x called")
        self._x = value

    @x.deleter
    def x(self):
        print("deleter of x called")
        del self._x


c = C()
c.x = 'foo'  # setter called
foo = c.x    # getter called
del c.x      # deleter called
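
A related pattern is a property with no setter, which makes the attribute read-only from the outside. The sketch below is illustrative only; Circle and its attributes are hypothetical names, not part of the example above.

```python
import math


class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def radius(self):
        """Read-only: no setter is defined for this property."""
        return self._radius

    @property
    def area(self):
        """Computed on access from the stored radius."""
        return math.pi * self._radius ** 2


c = Circle(2)
try:
    c.radius = 5   # assigning to a property without a setter raises AttributeError
except AttributeError:
    blocked = True
else:
    blocked = False

print(blocked, c.radius, c.area)
```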

回答 1

使用getter和setter的pythonic方法是什么?

“ Pythonic”方式不是使用“ getters”和“ setters”,而是使用简单的属性(如问题所展示的那样)并del用于删除(但名称被更改以保护无辜的内建函数):

value = 'something'

obj.attribute = value  
value = obj.attribute
del obj.attribute

如果以后要修改设置并获取,则可以通过使用property装饰器来进行,而无需更改用户代码:

class Obj:
    """property demo"""
    #
    @property            # first decorate the getter method
    def attribute(self): # This getter method name is *the* name
        return self._attribute
    #
    @attribute.setter    # the property decorates with `.setter` now
    def attribute(self, value):   # name, e.g. "attribute", is the same
        self._attribute = value   # the "value" name isn't special
    #
    @attribute.deleter     # decorate with `.deleter`
    def attribute(self):   # again, the method name is the same
        del self._attribute

(每个装饰器用法都会复制并更新先前的属性对象,因此请注意,对于每个设置,获取和删除功能/方法,都应使用相同的名称。

定义完上述内容后,原始设置,获取和删除代码都相同:

obj = Obj()
obj.attribute = value  
the_value = obj.attribute
del obj.attribute

您应该避免这种情况:

def set_property(property,value):  
def get_property(property):  

首先,上面的方法不起作用,因为您没有为该属性设置为(通常为self)的实例提供参数,该参数为:

class Obj:

    def set_property(self, property, value): # don't do this
        ...
    def get_property(self, property):        # don't do this either
        ...

其次,这种复制的两个特殊方法的目的,__setattr____getattr__

第三,我们还具有setattrgetattr内置功能。

setattr(object, 'property_name', value)
getattr(object, 'property_name', default_value)  # default is optional

@property装饰是创建getter和setter方法。

例如,我们可以修改设置行为以限制要设置的值:

class Protective(object):

    @property
    def protected_value(self):
        return self._protected_value

    @protected_value.setter
    def protected_value(self, value):
        if acceptable(value): # e.g. type or range check
            self._protected_value = value

通常,我们要避免property使用直接属性,而只使用直接属性。

这是Python用户所期望的。遵循最小惊奇规则,除非您有非常令人信服的相反理由,否则应尝试向用户提供他们期望的结果。

示范

例如,假设我们需要将对象的protected属性设置为0到100之间的整数(包括0和100),并防止其删除,并通过适当的消息通知用户其正确用法:

class Protective(object):
    """protected property demo"""
    #
    def __init__(self, start_protected_value=0):
        self.protected_value = start_protected_value
    # 
    @property
    def protected_value(self):
        return self._protected_value
    #
    @protected_value.setter
    def protected_value(self, value):
        if value != int(value):
            raise TypeError("protected_value must be an integer")
        if 0 <= value <= 100:
            self._protected_value = int(value)
        else:
            raise ValueError("protected_value must be " +
                             "between 0 and 100 inclusive")
    #
    @protected_value.deleter
    def protected_value(self):
        raise AttributeError("do not delete, protected_value can be set to 0")

(请注意,__init__是指self.protected_value但属性方法是指self._protected_value。这是为了__init__通过公共API使用该属性,确保该属性受到“保护”。)

和用法:

>>> p1 = Protective(3)
>>> p1.protected_value
3
>>> p1 = Protective(5.0)
>>> p1.protected_value
5
>>> p2 = Protective(-5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __init__
  File "<stdin>", line 15, in protected_value
ValueError: protected_value must be between 0 and 100 inclusive
>>> p1.protected_value = 7.3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 17, in protected_value
TypeError: protected_value must be an integer
>>> p1.protected_value = 101
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 15, in protected_value
ValueError: protected_value must be between 0 and 100 inclusive
>>> del p1.protected_value
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 18, in protected_value
AttributeError: do not delete, protected_value can be set to 0

名称重要吗?

是的,他们愿意.setter.deleter复制原始财产。这允许子类在不更改父级行为的情况下正确修改行为。

class Obj:
    """property demo"""
    #
    @property
    def get_only(self):
        return self._attribute
    #
    @get_only.setter
    def get_or_set(self, value):
        self._attribute = value
    #
    @get_or_set.deleter
    def get_set_or_delete(self):
        del self._attribute

现在要使它起作用,您必须使用相应的名称:

obj = Obj()
# obj.get_only = 'value' # would error
obj.get_or_set = 'value'  
obj.get_set_or_delete = 'new value'
the_value = obj.get_only
del obj.get_set_or_delete
# del obj.get_or_set # would error

我不确定这在哪里有用,但是用例是您是否需要获取,设置和/或仅删除属性。最好坚持使用具有相同名称的语义上相同的属性。

结论

从简单的属性开始。

如果以后需要围绕设置,获取和删除的功能,则可以使用属性装饰器添加它。

避免将函数命名为set_...get_...-这就是属性的作用。

What’s the pythonic way to use getters and setters?

The “Pythonic” way is not to use “getters” and “setters”, but to use plain attributes, like the question demonstrates, and del for deleting (but the names are changed to protect the innocent… builtins):

value = 'something'

obj.attribute = value  
value = obj.attribute
del obj.attribute

If later, you want to modify the setting and getting, you can do so without having to alter user code, by using the property decorator:

class Obj:
    """property demo"""
    #
    @property            # first decorate the getter method
    def attribute(self): # This getter method name is *the* name
        return self._attribute
    #
    @attribute.setter    # the property decorates with `.setter` now
    def attribute(self, value):   # name, e.g. "attribute", is the same
        self._attribute = value   # the "value" name isn't special
    #
    @attribute.deleter     # decorate with `.deleter`
    def attribute(self):   # again, the method name is the same
        del self._attribute

(Each decorator usage copies and updates the prior property object, so note that you should use the same name for each set, get, and delete function/method.)

After defining the above, the original setting, getting, and deleting code is the same:

obj = Obj()
obj.attribute = value  
the_value = obj.attribute
del obj.attribute

You should avoid this:

def set_property(property,value):  
def get_property(property):  

Firstly, the above doesn’t work, because you don’t provide an argument for the instance that the property would be set to (usually self), which would be:

class Obj:

    def set_property(self, property, value): # don't do this
        ...
    def get_property(self, property):        # don't do this either
        ...

Secondly, this duplicates the purpose of two special methods, __setattr__ and __getattr__.

Thirdly, we also have the setattr and getattr builtin functions.

setattr(object, 'property_name', value)
getattr(object, 'property_name', default_value)  # default is optional
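
As a small illustration of these two builtins, the sketch below uses a hypothetical `Config` class and made-up attribute names. It only shows that `setattr` and `getattr` are the dynamic equivalents of plain attribute assignment and access.

```python
class Config:
    """Hypothetical container used only for this illustration."""
    pass


cfg = Config()
setattr(cfg, 'timeout', 30)            # same effect as cfg.timeout = 30
timeout = getattr(cfg, 'timeout')      # same effect as cfg.timeout
retries = getattr(cfg, 'retries', 3)   # attribute is missing, so the default is returned
```

These are mostly useful when the attribute name is only known at run time; otherwise plain `cfg.timeout` is clearer.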

The @property decorator is for creating getters and setters.

For example, we could modify the setting behavior to place restrictions on the value being set:

class Protective(object):

    @property
    def protected_value(self):
        return self._protected_value

    @protected_value.setter
    def protected_value(self, value):
        if acceptable(value): # e.g. type or range check
            self._protected_value = value

In general, we want to avoid using property and just use direct attributes.

This is what is expected by users of Python. Following the rule of least-surprise, you should try to give your users what they expect unless you have a very compelling reason to the contrary.

Demonstration

For example, say we needed our object’s protected attribute to be an integer between 0 and 100 inclusive, and prevent its deletion, with appropriate messages to inform the user of its proper usage:

class Protective(object):
    """protected property demo"""
    #
    def __init__(self, start_protected_value=0):
        self.protected_value = start_protected_value
    # 
    @property
    def protected_value(self):
        return self._protected_value
    #
    @protected_value.setter
    def protected_value(self, value):
        if value != int(value):
            raise TypeError("protected_value must be an integer")
        if 0 <= value <= 100:
            self._protected_value = int(value)
        else:
            raise ValueError("protected_value must be " +
                             "between 0 and 100 inclusive")
    #
    @protected_value.deleter
    def protected_value(self):
        raise AttributeError("do not delete, protected_value can be set to 0")

(Note that __init__ refers to self.protected_value but the property methods refer to self._protected_value. This is so that __init__ uses the property through the public API, ensuring it is “protected”.)

And usage:

>>> p1 = Protective(3)
>>> p1.protected_value
3
>>> p1 = Protective(5.0)
>>> p1.protected_value
5
>>> p2 = Protective(-5)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __init__
  File "<stdin>", line 15, in protected_value
ValueError: protected_value must be between 0 and 100 inclusive
>>> p1.protected_value = 7.3
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 17, in protected_value
TypeError: protected_value must be an integer
>>> p1.protected_value = 101
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 15, in protected_value
ValueError: protected_value must be between 0 and 100 inclusive
>>> del p1.protected_value
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 18, in protected_value
AttributeError: do not delete, protected_value can be set to 0

Do the names matter?

Yes they do. .setter and .deleter make copies of the original property. This allows subclasses to properly modify behavior without altering the behavior in the parent.

class Obj:
    """property demo"""
    #
    @property
    def get_only(self):
        return self._attribute
    #
    @get_only.setter
    def get_or_set(self, value):
        self._attribute = value
    #
    @get_or_set.deleter
    def get_set_or_delete(self):
        del self._attribute

Now for this to work, you have to use the respective names:

obj = Obj()
# obj.get_only = 'value' # would error
obj.get_or_set = 'value'  
obj.get_set_or_delete = 'new value'
the_value = obj.get_only
del obj.get_set_or_delete
# del obj.get_or_set # would error

I’m not sure where this would be useful, but the use-case is if you want a get-only, set-only, and/or delete-only property. It is probably best to give semantically identical properties the same name.
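
The point about subclasses can be sketched as follows. `Base` and `Clamped` are hypothetical classes invented for this illustration. Because `.setter` returns a copy of the property, the subclass can replace the setter while the parent's property is left untouched.

```python
class Base:
    def __init__(self):
        self._value = 0

    @property
    def value(self):
        return self._value

    @value.setter
    def value(self, new):
        self._value = new


class Clamped(Base):
    # Base.value.setter returns a *copy* of the parent's property with a new
    # setter attached; Base.value itself is not modified.
    @Base.value.setter
    def value(self, new):
        self._value = max(0, min(100, new))


b, c = Base(), Clamped()
b.value = 250   # stored as-is by the parent's setter
c.value = 250   # clamped to 100 by the subclass's setter
print(b.value, c.value)
```

The getter is inherited through the copied property, so only the behavior that differs needs to be written in the subclass.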

Conclusion

Start with simple attributes.

If you later need functionality around the setting, getting, and deleting, you can add it with the property decorator.

Avoid functions named set_... and get_... – that’s what properties are for.


回答 2

In [1]: class test(object):
    def __init__(self):
        self.pants = 'pants'
    @property
    def p(self):
        return self.pants
    @p.setter
    def p(self, value):
        self.pants = value * 2
   ....: 
In [2]: t = test()
In [3]: t.p
Out[3]: 'pants'
In [4]: t.p = 10
In [5]: t.p
Out[5]: 20

回答 3

使用@propertyand @attribute.setter帮助您不仅使用“ pythonic”方式,而且在创建对象和更改对象时都检查属性的有效性。

class Person(object):
    def __init__(self, p_name=None):
        self.name = p_name

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, new_name):
        if type(new_name) == str: #type checking for name property
            self._name = new_name
        else:
            raise Exception("Invalid value for name")

这样,您实际上可以“隐藏” _name客户端开发人员的属性,并且还可以检查名称属性类型。请注意,即使在启动过程中也遵循此方法,将调用设置程序。所以:

p = Person(12)

将导致:

Exception: Invalid value for name

但:

>>> p = Person('Mike')
>>> print(p.name)
Mike
>>> p.name = 'George'
>>> print(p.name)
George
>>> p.name = 2.3  # Causes an exception

Using @property and @attribute.setter helps you to not only use the “pythonic” way but also to check the validity of attributes both while creating the object and when altering it.

class Person(object):
    def __init__(self, p_name=None):
        self.name = p_name

    @property
    def name(self):
        return self._name

    @name.setter
    def name(self, new_name):
        if type(new_name) == str: #type checking for name property
            self._name = new_name
        else:
            raise Exception("Invalid value for name")

This way, you effectively ‘hide’ the _name attribute from client developers and also check the type of the name property. Note that with this approach the setter is called even during initialization. So:

p = Person(12)

Will lead to:

Exception: Invalid value for name

But:

>>> p = Person('Mike')
>>> print(p.name)
Mike
>>> p.name = 'George'
>>> print(p.name)
George
>>> p.name = 2.3  # Causes an exception

回答 4


回答 5

您可以使用存取器/更改器(即@attr.setter@property),但最重要的是要保持一致!

如果您只是@property用来访问属性,例如

class myClass:
    def __init__(self, a):
        self._a = a

    @property
    def a(self):
        return self._a

使用它来访问every *属性!在不使用访问器的情况下使用以下属性访问某些属性@property并使其他属性公开(即名称不带下划线)是不明智的做法,例如,不要这样做

class myClass:
    def __init__(self, a, b):
        self._a = a
        self.b = b

    @property
    def a(self):
        return self._a

请注意,self.b即使它是公共的,这里也没有显式访问器。

二传手(或mutators)类似,可以随意使用,@attribute.setter要保持一致!当你做例如

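class myClass:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    # NOTE: `@a.setter` only works if a property named `a` was already
    # defined with `@property`; without it this line raises NameError.
    @a.setter
    def a(self, value):
        self._a = value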

我很难猜测你的意图。一方面,您是说ab都是公开的(它们的名称中没有下划线),因此从理论上讲,应该允许我访问/更改(获取/设置)这两者。但是然后您只为a它指定一个显式的mutator ,这告诉我也许我不能设置b。由于您提供了一个显式的mutator,所以我不确定是否缺少显式的accessor(@property)意味着我不能访问这些变量之一,或者您在使用时节俭@property

*exceptions情况是,当您明确希望使某些变量可访问或可变,但不能同时使二者可变或者您希望在访问或更改属性时执行一些其他逻辑。这是我个人使用@property和的时候@attribute.setter(否则,没有用于公共属性的显式acessor / mutators)。

最后,PEP8和Google样式指南的建议:

PEP8,继承设计说:

对于简单的公共数据属性,最好仅公开属性名称,而不使用复杂的访问器/更改器方法。请记住,如果您发现简单的数据属性需要增强功能行为,那么Python为将来的增强提供了简便的途径。在那种情况下,使用属性将功能实现隐藏在简单的数据属性访问语法之后。

另一方面,根据Google样式指南Python语言规则/属性,建议:

使用新代码中的属性来访问或设置数据,而通常情况下,您应该使用简单,轻便的访问器或设置器方法。属性应使用@property装饰器创建。

这种方法的优点:

通过消除对简单属性访问的显式get和set方法调用,提高了可读性。允许计算是懒惰的。考虑使用Python方式维护类的接口。在性能方面,当直接变量访问合理时,允许属性绕过需要简单的访问器方法的情况。这也允许将来在不破坏接口的情况下添加访问器方法。

利弊:

必须object在Python 2中继承。可以隐藏副作用,就像运算符重载一样。对于子类可能会造成混淆。

You can use accessors/mutators (i.e. @attr.setter and @property) or not, but the most important thing is to be consistent!

If you’re using @property to simply access an attribute, e.g.

class myClass:
    def __init__(self, a):
        self._a = a

    @property
    def a(self):
        return self._a

use it to access every* attribute! It would be a bad practice to access some attributes using @property and leave some other properties public (i.e. name without an underscore) without an accessor, e.g. do not do

class myClass:
    def __init__(self, a, b):
        self._a = a
        self.b = b

    @property
    def a(self):
        return self._a

Note that self.b does not have an explicit accessor here even though it’s public.

Similarly with setters (or mutators), feel free to use @attribute.setter but be consistent! When you do e.g.

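class myClass:
    def __init__(self, a, b):
        self.a = a
        self.b = b

    # NOTE: `@a.setter` only works if a property named `a` was already
    # defined with `@property`; without it this line raises NameError.
    @a.setter
    def a(self, value):
        self._a = value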

It’s hard for me to guess your intention. On one hand you’re saying that both a and b are public (no leading underscore in their names) so I should theoretically be allowed to access/mutate (get/set) both. But then you specify an explicit mutator only for a, which tells me that maybe I should not be able to set b. Since you’ve provided an explicit mutator I am not sure if the lack of explicit accessor (@property) means I should not be able to access either of those variables or you were simply being frugal in using @property.

*The exception is when you explicitly want to make some variables accessible or mutable but not both, or you want to perform some additional logic when accessing or mutating an attribute. This is when I personally use @property and @attribute.setter (otherwise, no explicit accessors/mutators for public attributes).

Lastly, PEP8 and Google Style Guide suggestions:

PEP8, Designing for Inheritance says:

For simple public data attributes, it is best to expose just the attribute name, without complicated accessor/mutator methods. Keep in mind that Python provides an easy path to future enhancement, should you find that a simple data attribute needs to grow functional behavior. In that case, use properties to hide functional implementation behind simple data attribute access syntax.

On the other hand, according to Google Style Guide Python Language Rules/Properties the recommendation is to:

Use properties in new code to access or set data where you would normally have used simple, lightweight accessor or setter methods. Properties should be created with the @property decorator.

The pros of this approach:

Readability is increased by eliminating explicit get and set method calls for simple attribute access. Allows calculations to be lazy. Considered the Pythonic way to maintain the interface of a class. In terms of performance, allowing properties bypasses needing trivial accessor methods when a direct variable access is reasonable. This also allows accessor methods to be added in the future without breaking the interface.

and cons:

Must inherit from object in Python 2. Can hide side-effects much like operator overloading. Can be confusing for subclasses.
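
The "allows calculations to be lazy" point can be sketched like this. `Report`, `rows` and `total` are hypothetical names chosen for the illustration, and the caching strategy shown is only one possible approach.

```python
class Report:
    """Hypothetical class; `rows` and `total` are illustrative names."""

    def __init__(self, rows):
        self.rows = rows
        self._total = None      # not computed yet
        self.computations = 0   # counts how often the sum is actually run

    @property
    def total(self):
        # Compute on first access only, then reuse the stored result.
        if self._total is None:
            self.computations += 1
            self._total = sum(self.rows)
        return self._total


r = Report([1, 2, 3])
first = r.total    # triggers the computation
second = r.total   # served from the cached value
```

Callers still write `r.total` as if it were a plain attribute, which is the interface-preserving benefit the style guide describes. Note that this sketch never invalidates the cache if `rows` changes later.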


回答 6

您可以使用魔术方法__getattribute____setattr__

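class MyClass:
    def __init__(self, attrvalue):
        self.myattr = attrvalue

    def __getattribute__(self, attr):
        if attr == "myattr":
            pass  # custom getter logic for myattr goes here
        # Delegate to the default lookup to avoid infinite recursion
        return object.__getattribute__(self, attr)

    def __setattr__(self, attr, value):
        if attr == "myattr":
            pass  # custom setter logic for myattr goes here
        object.__setattr__(self, attr, value)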

要知道,__getattr____getattribute__是不一样的。__getattr__仅在找不到属性时调用。

You can use the magic methods __getattribute__ and __setattr__.

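class MyClass:
    def __init__(self, attrvalue):
        self.myattr = attrvalue

    def __getattribute__(self, attr):
        if attr == "myattr":
            pass  # custom getter logic for myattr goes here
        # Delegate to the default lookup to avoid infinite recursion
        return object.__getattribute__(self, attr)

    def __setattr__(self, attr, value):
        if attr == "myattr":
            pass  # custom setter logic for myattr goes here
        object.__setattr__(self, attr, value)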

Be aware that __getattr__ and __getattribute__ are not the same. __getattr__ is only invoked when the attribute is not found.


Python dictionary from an object's fields

Question: Python dictionary from an object's fields

您是否知道是否有内置函数可以从任意对象构建字典?我想做这样的事情:

>>> class Foo:
...     bar = 'hello'
...     baz = 'world'
...
>>> f = Foo()
>>> props(f)
{ 'bar' : 'hello', 'baz' : 'world' }

注意:它不应包含方法。仅字段。

Do you know if there is a built-in function to build a dictionary from an arbitrary object? I’d like to do something like this:

>>> class Foo:
...     bar = 'hello'
...     baz = 'world'
...
>>> f = Foo()
>>> props(f)
{ 'bar' : 'hello', 'baz' : 'world' }

NOTE: It should not include methods. Only fields.


回答 0

请注意,Python 2.7中的最佳实践是使用新型类(Python 3不需要),即

class Foo(object):
   ...

同样,“对象”和“类”之间也存在差异。要从任意对象构建字典,只需使用即可__dict__。通常,您将在类级别声明您的方法,并在实例级别声明您的属性,因此__dict__应该没问题。例如:

>>> class A(object):
...   def __init__(self):
...     self.b = 1
...     self.c = 2
...   def do_nothing(self):
...     pass
...
>>> a = A()
>>> a.__dict__
{'c': 2, 'b': 1}

更好的方法(由robert建议在注释中使用)是内置vars函数:

>>> vars(a)
{'c': 2, 'b': 1}

另外,根据您要执行的操作,最好继承自dict。然后,您的Class已经是字典,并且如果您愿意,可以覆盖getattr和/或setattr调用并设置字典。例如:

class Foo(dict):
    def __init__(self):
        pass
    def __getattr__(self, attr):
        return self[attr]

    # etc...

Note that best practice in Python 2.7 is to use new-style classes (not needed with Python 3), i.e.

class Foo(object):
   ...

Also, there’s a difference between an ‘object’ and a ‘class’. To build a dictionary from an arbitrary object, it’s sufficient to use __dict__. Usually, you’ll declare your methods at class level and your attributes at instance level, so __dict__ should be fine. For example:

>>> class A(object):
...   def __init__(self):
...     self.b = 1
...     self.c = 2
...   def do_nothing(self):
...     pass
...
>>> a = A()
>>> a.__dict__
{'c': 2, 'b': 1}

A better approach (suggested by robert in comments) is the builtin vars function:

>>> vars(a)
{'c': 2, 'b': 1}

Alternatively, depending on what you want to do, it might be nice to inherit from dict. Then your class is already a dictionary, and if you want you can override __getattr__ and/or __setattr__ to call through and set the dict. For example:

class Foo(dict):
    def __init__(self):
        pass
    def __getattr__(self, attr):
        return self[attr]

    # etc...

回答 1

取而代之的是x.__dict__,它实际上更具有Pythonic的用法vars(x)

Instead of x.__dict__, it’s actually more pythonic to use vars(x).


回答 2

dir内置会给你对象的所有属性,包括特殊的方法,如__str____dict__和一大堆人,你可能不希望的。但是您可以执行以下操作:

>>> class Foo(object):
...     bar = 'hello'
...     baz = 'world'
...
>>> f = Foo()
>>> [name for name in dir(f) if not name.startswith('__')]
[ 'bar', 'baz' ]
>>> dict((name, getattr(f, name)) for name in dir(f) if not name.startswith('__')) 
{ 'bar': 'hello', 'baz': 'world' }

因此可以通过定义如下props函数将其扩展为仅返回数据属性而不是方法:

import inspect

def props(obj):
    pr = {}
    for name in dir(obj):
        value = getattr(obj, name)
        if not name.startswith('__') and not inspect.ismethod(value):
            pr[name] = value
    return pr

The dir builtin will give you all the object’s attributes, including special methods like __str__, __dict__ and a whole bunch of others which you probably don’t want. But you can do something like:

>>> class Foo(object):
...     bar = 'hello'
...     baz = 'world'
...
>>> f = Foo()
>>> [name for name in dir(f) if not name.startswith('__')]
['bar', 'baz']
>>> dict((name, getattr(f, name)) for name in dir(f) if not name.startswith('__'))
{'bar': 'hello', 'baz': 'world'}

So you can extend this to return only data attributes and not methods by defining your props function like this:

import inspect

def props(obj):
    pr = {}
    for name in dir(obj):
        value = getattr(obj, name)
        if not name.startswith('__') and not inspect.ismethod(value):
            pr[name] = value
    return pr

回答 3

我已经结合了两个答案:

dict((key, value) for key, value in f.__dict__.iteritems() 
    if not callable(value) and not key.startswith('__'))

I’ve settled on a combination of both answers:

dict((key, value) for key, value in f.__dict__.items()  # .items() works on Python 2 and 3
    if not callable(value) and not key.startswith('__'))

回答 4

我以为我会花些时间向您展示如何通过转换对象来决定字典dict(obj)

class A(object):
    d = '4'
    e = '5'
    f = '6'

    def __init__(self):
        self.a = '1'
        self.b = '2'
        self.c = '3'

    def __iter__(self):
        # first start by grabbing the Class items
        iters = dict((x,y) for x,y in A.__dict__.items() if x[:2] != '__')

        # then update the class items with the instance items
        iters.update(self.__dict__)

        # now 'yield' through the items
        for x,y in iters.items():
            yield x,y

a = A()
print(dict(a)) 
# prints "{'a': '1', 'c': '3', 'b': '2', 'e': '5', 'd': '4', 'f': '6'}"

此代码的关键部分是 __iter__功能。

正如评论所解释的,我们要做的第一件事是获取Class项,并防止以’__’开头的任何东西。

一旦创建了它dict,就可以使用updatedict函数并传入实例__dict__

这些将为您提供完整的成员类+实例字典。现在剩下的就是迭代它们并产生回报。

另外,如果您打算大量使用它,则可以创建一个@iterable类装饰器。

def iterable(cls):
    def iterfn(self):
        iters = dict((x,y) for x,y in cls.__dict__.items() if x[:2] != '__')
        iters.update(self.__dict__)

        for x,y in iters.items():
            yield x,y

    cls.__iter__ = iterfn
    return cls

@iterable
class B(object):
    d = 'd'
    e = 'e'
    f = 'f'

    def __init__(self):
        self.a = 'a'
        self.b = 'b'
        self.c = 'c'

b = B()
print(dict(b))

I thought I’d take some time to show you how you can translate an object to dict via dict(obj).

class A(object):
    d = '4'
    e = '5'
    f = '6'

    def __init__(self):
        self.a = '1'
        self.b = '2'
        self.c = '3'

    def __iter__(self):
        # first start by grabbing the Class items
        iters = dict((x,y) for x,y in A.__dict__.items() if x[:2] != '__')

        # then update the class items with the instance items
        iters.update(self.__dict__)

        # now 'yield' through the items
        for x,y in iters.items():
            yield x,y

a = A()
print(dict(a)) 
# prints "{'a': '1', 'c': '3', 'b': '2', 'e': '5', 'd': '4', 'f': '6'}"

The key section of this code is the __iter__ function.

As the comments explain, the first thing we do is grab the class items and exclude anything that starts with ‘__’.

Once you’ve created that dict, then you can use the update dict function and pass in the instance __dict__.

These will give you a complete class+instance dictionary of members. Now all that’s left is to iterate over them and yield the returns.

Also, if you plan on using this a lot, you can create an @iterable class decorator.

def iterable(cls):
    def iterfn(self):
        iters = dict((x,y) for x,y in cls.__dict__.items() if x[:2] != '__')
        iters.update(self.__dict__)

        for x,y in iters.items():
            yield x,y

    cls.__iter__ = iterfn
    return cls

@iterable
class B(object):
    d = 'd'
    e = 'e'
    f = 'f'

    def __init__(self):
        self.a = 'a'
        self.b = 'b'
        self.c = 'c'

b = B()
print(dict(b))

回答 5

要从任意对象构建字典,只需使用即可__dict__

这会错过对象从其类继承的属性。例如,

class c(object):
    x = 3
a = c()

hasattr(a,’x’)是true,但是’x’不会出现在a .__ dict__

To build a dictionary from an arbitrary object, it’s sufficient to use __dict__.

This misses attributes that the object inherits from its class. For example,

class c(object):
    x = 3
a = c()

hasattr(a, 'x') is true, but 'x' does not appear in a.__dict__
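
One way to include those class-level attributes is to walk the class hierarchy and then overlay the instance's own values. The helper name `all_fields` is hypothetical, and the filtering rule (skip dunder names and callables) is an assumption about what counts as a "field".

```python
class C(object):
    x = 3              # class attribute

    def __init__(self):
        self.y = 4     # instance attribute


def all_fields(obj):
    """Hypothetical helper: merge non-callable, non-dunder class attributes
    from the whole MRO with the instance's own attributes."""
    fields = {}
    for klass in reversed(type(obj).__mro__):     # base classes first
        for name, value in vars(klass).items():
            if not name.startswith('__') and not callable(value):
                fields[name] = value
    fields.update(vars(obj))                      # instance values win
    return fields


a = C()
print(vars(a))        # only the instance attribute
print(all_fields(a))  # class and instance attributes
```

Descriptors such as properties are not callable, so this sketch would return the property object itself rather than its value; handle them separately if your classes use them.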


回答 6

答案较晚,但提供了完整性和对Google员工的好处:

def props(x):
    return dict((key, getattr(x, key)) for key in dir(x) if key not in dir(x.__class__))

这不会显示在类中定义的方法,但仍会显示字段,包括分配给lambda的字段或以双下划线开头的字段。

Late answer but provided for completeness and the benefit of googlers:

def props(x):
    return dict((key, getattr(x, key)) for key in dir(x) if key not in dir(x.__class__))

This will not show methods defined in the class, but it will still show fields including those assigned to lambdas or those which start with a double underscore.


回答 7

我认为最简单的方法是为该类创建一个getitem属性。如果需要写入对象,则可以创建一个自定义setattr。这是getitem的示例:

class A(object):
    def __init__(self):
        self.b = 1
        self.c = 2
    def __getitem__(self, item):
        return self.__dict__[item]

# Usage: 
a = A()
a.__getitem__('b')  # Outputs 1
a.__dict__  # Outputs {'c': 2, 'b': 1}
vars(a)  # Outputs {'c': 2, 'b': 1}

dict将对象属性生成到字典中,并且字典对象可用于获取所需的项目。

I think the easiest way is to define a __getitem__ method on the class. If you need to write to the object, you can define a custom __setattr__. Here is an example for __getitem__:

class A(object):
    def __init__(self):
        self.b = 1
        self.c = 2
    def __getitem__(self, item):
        return self.__dict__[item]

# Usage: 
a = A()
a['b']  # Calls a.__getitem__('b'); outputs 1
a.__dict__  # Outputs {'c': 2, 'b': 1}
vars(a)  # Outputs {'c': 2, 'b': 1}

__dict__ exposes the object's attributes as a dictionary, and that dictionary can be used to get the item you need.


回答 8

使用的缺点 __dict__是它很浅。它不会将任何子类转换为字典。

如果您使用的是Python3.5或更高版本,则可以使用jsons

>>> import jsons
>>> jsons.dump(f)
{'bar': 'hello', 'baz': 'world'}

A downside of using __dict__ is that it is shallow; it won’t convert nested objects to dictionaries.

If you’re using Python 3.5 or higher, you can use the third-party jsons package:

>>> import jsons
>>> jsons.dump(f)
{'bar': 'hello', 'baz': 'world'}

回答 9

如果要列出部分属性,请覆盖__dict__

def __dict__(self):
    d = {
    'attr_1' : self.attr_1,
    ...
    }
    return d

# Call __dict__
d = instance.__dict__()

如果您instance获得了一些大块数据,并且想要d像消息队列一样推送到Redis ,这将很有帮助。

If you want to list part of your attributes, override __dict__:

def __dict__(self):
    d = {
    'attr_1' : self.attr_1,
    ...
    }
    return d

# Call __dict__
d = instance.__dict__()

This helps a lot if your instance holds some large blocks of data and you want to push d to something like a Redis message queue.


回答 10

PYTHON 3:

import decimal
import json
from datetime import datetime
from enum import Enum

class DateTimeDecoder(json.JSONDecoder):

   def __init__(self, *args, **kargs):
        json.JSONDecoder.__init__(self, object_hook=self.dict_to_object,
                         *args, **kargs)

   def dict_to_object(self, d):
       if '__type__' not in d:
          return d

       type = d.pop('__type__')
       try:
          dateobj = datetime(**d)
          return dateobj
       except:
          d['__type__'] = type
          return d

def json_default_format(value):
    try:
        if isinstance(value, datetime):
            return {
                '__type__': 'datetime',
                'year': value.year,
                'month': value.month,
                'day': value.day,
                'hour': value.hour,
                'minute': value.minute,
                'second': value.second,
                'microsecond': value.microsecond,
            }
        if isinstance(value, decimal.Decimal):
            return float(value)
        if isinstance(value, Enum):
            return value.name
        else:
            return vars(value)
    except Exception as e:
        raise ValueError

现在,您可以在自己的类中使用上述代码:

class Foo():
  def toJSON(self):
        return json.loads(
            json.dumps(self, sort_keys=True, indent=4, separators=(',', ': '), default=json_default_format), cls=DateTimeDecoder)


Foo().toJSON() 

PYTHON 3:

import decimal
import json
from datetime import datetime
from enum import Enum

class DateTimeDecoder(json.JSONDecoder):

   def __init__(self, *args, **kargs):
        json.JSONDecoder.__init__(self, object_hook=self.dict_to_object,
                         *args, **kargs)

   def dict_to_object(self, d):
       if '__type__' not in d:
          return d

       type = d.pop('__type__')
       try:
          dateobj = datetime(**d)
          return dateobj
       except:
          d['__type__'] = type
          return d

def json_default_format(value):
    try:
        if isinstance(value, datetime):
            return {
                '__type__': 'datetime',
                'year': value.year,
                'month': value.month,
                'day': value.day,
                'hour': value.hour,
                'minute': value.minute,
                'second': value.second,
                'microsecond': value.microsecond,
            }
        if isinstance(value, decimal.Decimal):
            return float(value)
        if isinstance(value, Enum):
            return value.name
        else:
            return vars(value)
    except Exception as e:
        raise ValueError

Now you can use the above code inside your own class:

class Foo():
  def toJSON(self):
        return json.loads(
            json.dumps(self, sort_keys=True, indent=4, separators=(',', ': '), default=json_default_format), cls=DateTimeDecoder)


Foo().toJSON() 

回答 11

vars() 很棒,但是不适用于对象的嵌套对象

将对象的嵌套对象转换为dict:

def to_dict(self):
    return json.loads(json.dumps(self, default=lambda o: o.__dict__))

vars() is great, but it is shallow: it does not convert objects nested inside the object.

To convert an object and its nested objects to a dict:

def to_dict(self):
    return json.loads(json.dumps(self, default=lambda o: o.__dict__))
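
The JSON round trip above only works when every leaf value is JSON-serializable. As an alternative, here is a hedged sketch of a recursive conversion that does not go through json. The classes `Engine` and `Car` are hypothetical, and this version of `to_dict` is a standalone function written for the illustration, not the method shown above.

```python
def to_dict(obj):
    """Hypothetical helper: recursively convert objects with a __dict__,
    plus lists, tuples and dicts, into plain Python containers."""
    if isinstance(obj, dict):
        return {key: to_dict(value) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        return [to_dict(item) for item in obj]
    if hasattr(obj, '__dict__'):
        return {key: to_dict(value) for key, value in vars(obj).items()}
    return obj  # numbers, strings, None, etc. are returned unchanged


class Engine:
    def __init__(self):
        self.cylinders = 4


class Car:
    def __init__(self):
        self.name = 'demo'
        self.engine = Engine()
        self.wheels = [1, 2]


result = to_dict(Car())
```

This sketch does not detect reference cycles, so an object graph that points back to itself would recurse until Python raises RecursionError.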

How to uninstall Python 2.7 on Mac OS X 10.6.4?

Question: How to uninstall Python 2.7 on Mac OS X 10.6.4?

我想从Mac OS X 10.6.4中完全删除Python 2.7。我设法PATH通过还原删除了变量中的条目.bash_profile。但我也想删除所有由python 2.7安装包安装的目录,文件,符号链接和条目。我从http://www.python.org/获得了安装包。我需要删除哪些目录/文件/配置文件条目?某处有清单吗?

I want to completely remove Python 2.7 from my Mac OS X 10.6.4. I managed to remove the entry from the PATH variable by reverting my .bash_profile. But I also want to remove all directories, files, symlinks, and entries that got installed by the Python 2.7 install package. I’ve got the install package from http://www.python.org/. What directories/files/configuration file entries do I need to remove? Is there a list somewhere?


回答 0

不要试图删除任何苹果公司提供的系统的Python这是在/System/Library/usr/bin,因为这可能会破坏你的整个操作系统。


注意: 以下列出的步骤不会影响Apple提供的系统Python 2.7;请参阅附录A。他们只会删除第三方Python框架,例如python.org安装程序安装的框架。


完整列表在此处记录。基本上,您需要做的是:

  1. 删除第三方Python 2.7框架

    sudo rm -rf /Library/Frameworks/Python.framework/Versions/2.7
  2. 删除Python 2.7应用程序目录

    sudo rm -rf "/Applications/Python 2.7"
  3. 在中删除/usr/local/bin指向此Python版本的符号链接。看到他们使用

    ls -l /usr/local/bin | grep '../Library/Frameworks/Python.framework/Versions/2.7' 

    然后运行以下命令删除所有链接:

    cd /usr/local/bin/
    ls -l /usr/local/bin | grep '../Library/Frameworks/Python.framework/Versions/2.7' | awk '{print $9}' | tr -d @ | xargs rm
  4. 如有必要,请编辑您的外壳配置文件,以删除添加/Library/Frameworks/Python.framework/Versions/2.7到您的PATH环境文件中的操作。根据您所使用的shell,任何下列文件可能已被修改: ~/.bash_login~/.bash_profile~/.cshrc~/.profile~/.tcshrc,和/或~/.zprofile

Do not attempt to remove any Apple-supplied system Python, which lives in /System/Library and /usr/bin, as this may break your whole operating system.


NOTE: The steps listed below do not affect the Apple-supplied system Python 2.7; they only remove a third-party Python framework, like those installed by python.org installers.


The complete list is documented here. Basically, all you need to do is the following:

  1. Remove the third-party Python 2.7 framework

    sudo rm -rf /Library/Frameworks/Python.framework/Versions/2.7
    
  2. Remove the Python 2.7 applications directory

    sudo rm -rf "/Applications/Python 2.7"
    
  3. Remove the symbolic links, in /usr/local/bin, that point to this Python version. See them using

    ls -l /usr/local/bin | grep '../Library/Frameworks/Python.framework/Versions/2.7' 
    

    and then run the following command to remove all the links:

    cd /usr/local/bin/
    ls -l /usr/local/bin | grep '../Library/Frameworks/Python.framework/Versions/2.7' | awk '{print $9}' | tr -d @ | xargs rm
    
  4. If necessary, edit your shell profile file(s) to remove the lines that add /Library/Frameworks/Python.framework/Versions/2.7 to your PATH environment variable. Depending on which shell you use, any of the following files may have been modified: ~/.bash_login, ~/.bash_profile, ~/.cshrc, ~/.profile, ~/.tcshrc, and/or ~/.zprofile.
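
Before running the rm pipeline from step 3, it may be worth checking which links it would touch. The following is only a sketch of a dry run in Python; find_links_into is a hypothetical helper name, not part of any tool, and it only lists links without deleting anything.

```python
import os


def find_links_into(bin_dir, target_fragment):
    """Return names of symlinks in bin_dir whose target contains target_fragment."""
    matches = []
    for entry in os.scandir(bin_dir):
        # os.readlink returns the stored link target, even if it is dangling.
        if entry.is_symlink() and target_fragment in os.readlink(entry.path):
            matches.append(entry.name)
    return sorted(matches)


if __name__ == "__main__":
    # Assumed paths, taken from the steps above.
    for name in find_links_into("/usr/local/bin", "Python.framework/Versions/2.7"):
        print(name)
```

If the printed list matches what you expect, the removal commands from step 3 can then be run as described.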


回答 1

这个作品:

cd /usr/local/bin/
ls -l /usr/local/bin | grep '../Library/Frameworks/Python.framework/Versions/2.7' | awk '{print $9}' | tr -d @ | xargs rm

描述:列出所有链接,删除@字符,然后删除它们。

This one works:

cd /usr/local/bin/
ls -l /usr/local/bin | grep '../Library/Frameworks/Python.framework/Versions/2.7' | awk '{print $9}' | tr -d @ | xargs rm

Description: It lists all the links, strips the @ character from their names, and then removes them.


回答 2

如果使用PKG安装程序安装了它,则可以执行以下操作:

pkgutil --pkgs

或更好:

pkgutil --pkgs | grep org.python.Python

这将输出类似:

org.python.Python.PythonApplications-2.7
org.python.Python.PythonDocumentation-2.7
org.python.Python.PythonFramework-2.7
org.python.Python.PythonProfileChanges-2.7
org.python.Python.PythonUnixTools-2.7

您现在可以选择要取消链接(删除)的软件包。

这是取消链接文档:

 --unlink package-id
             Unlinks (removes) each file referenced by package-id. WARNING: This command makes no attempt to perform reference counting or dependency analy-
             sis. It can easily remove files required by your system. It may include unexpected files due to package tainting. Use the --files command first
             to double check.

在我的示例中,您将输入

pkgutil --unlink org.python.Python.PythonApplications-2.7
pkgutil --unlink org.python.Python.PythonDocumentation-2.7
pkgutil --unlink org.python.Python.PythonFramework-2.7
pkgutil --unlink org.python.Python.PythonProfileChanges-2.7
pkgutil --unlink org.python.Python.PythonUnixTools-2.7

或一行:

pkgutil --pkgs | grep org.python.Python | xargs -L1 pkgutil -f --unlink

重要提示:–unlink从Lion(从2014年第一季度开始,包括Lion,Mountain Lion和Mavericks)不再可用。如果涉及此说明的任何人都尝试将其与狮子一起使用,则应尝试改编本文所讲的内容:https : //wincent.com/wiki/Uninstalling_packages_(.pkg_files)_on_Mac_OS_X

If you installed it using the PKG installer, you can do:

pkgutil --pkgs

or better:

pkgutil --pkgs | grep org.python.Python

which will output something like:

org.python.Python.PythonApplications-2.7
org.python.Python.PythonDocumentation-2.7
org.python.Python.PythonFramework-2.7
org.python.Python.PythonProfileChanges-2.7
org.python.Python.PythonUnixTools-2.7

You can now select which packages to unlink (remove).

This is the unlink documentation:

 --unlink package-id
             Unlinks (removes) each file referenced by package-id. WARNING: This command makes no attempt to perform reference counting or dependency analysis.
             It can easily remove files required by your system. It may include unexpected files due to package tainting. Use the --files command first
             to double check.

In my example you will type

pkgutil --unlink org.python.Python.PythonApplications-2.7
pkgutil --unlink org.python.Python.PythonDocumentation-2.7
pkgutil --unlink org.python.Python.PythonFramework-2.7
pkgutil --unlink org.python.Python.PythonProfileChanges-2.7
pkgutil --unlink org.python.Python.PythonUnixTools-2.7

or in one single line:

pkgutil --pkgs | grep org.python.Python | xargs -L1 pkgutil -f --unlink

Important: --unlink is no longer available starting with Lion (as of Q1 2014 that includes Lion, Mountain Lion, and Mavericks). If you come to these instructions and want to use them on Lion or later, try instead to adapt them to what this post says: https://wincent.com/wiki/Uninstalling_packages_(.pkg_files)_on_Mac_OS_X


回答 3

尝试使用卸载Python

brew uninstall python

不会删除本机安装了Python,而是版本安装brew

Trying to uninstall Python with

brew uninstall python

will not remove the natively installed Python but rather the version installed with brew.


回答 4

关于删除符号链接,我发现这很有用。

find /usr/local/bin -lname '../../../Library/Frameworks/Python.framework/Versions/2.7/*' -delete

In regards to deleting the symbolic links, I found this to be useful.

find /usr/local/bin -lname '../../../Library/Frameworks/Python.framework/Versions/2.7/*' -delete

回答 5

创建符号链接到最新版本

 ln -s -f /usr/local/bin/python3.8 /usr/local/bin/python

关闭并打开一个新终端

并尝试

 python --version

Create a symlink to the latest version

 ln -s -f /usr/local/bin/python3.8 /usr/local/bin/python

Close and open a new terminal

and try

 python --version

回答 6

无需卸载旧的python版本。

只需安装新版本,说python-3.3.2-macosx10.6.dmg并将python的软链接更改为新安装的python3.3

使用以下命令检查默认python和python3.3的路径

“哪个python”和“哪个python3.3”

然后删除python的现有软链接并将其指向python3.3

No need to uninstall old python versions.

Just install the new version, say python-3.3.2-macosx10.6.dmg, and change the soft link of python to the newly installed python3.3.

Check the paths of the default python and of python3.3 with the following commands:

which python
which python3.3

Then delete the existing soft link of python and point it to python3.3.


回答 7

OnurGüzel在他的博客文章“从OS X卸载Python包”中提供了解决方案。

您应该在终端中键入以下命令:

  1. sudo rm -rf /Library/Frameworks/Python.framework
  2. cd /usr/local/bin
  3. ls -l . | grep '../Library/Frameworks/Python.framework' | awk '{print $9}' | xargs sudo rm
  4. sudo rm -rf "/Applications/Python x.y"

    其中命令xy是安装的Python版本。根据您的问题,应该是2.7。

用Onur的话来说:

警告:此命令将删除与软件包一起安装的所有Python版本。系统提供的Python不会受到影响。

如果您从python.org安装了多个Python版本,请再次运行第四个命令,为每个要卸载的Python版本更改“ xy”。

Onur Güzel provides the solution in his blog post, “Uninstall Python Package from OS X.”

You should type the following commands into the terminal:

  1. sudo rm -rf /Library/Frameworks/Python.framework
  2. cd /usr/local/bin
  3. ls -l . | grep '../Library/Frameworks/Python.framework' | awk '{print $9}' | xargs sudo rm
  4. sudo rm -rf "/Applications/Python x.y"

    where x.y in the last command is the version of Python installed. According to your question, it should be 2.7.

In Onur’s words:

WARNING: This commands will remove all Python versions installed with packages. Python provided from the system will not be affected.

If you have more than 1 Python version installed from python.org, then run the fourth command again, changing “x.y” for each version of Python that is to be uninstalled.


回答 8

注意如果使用Homebrew安装了Python,则可以按照以下步骤操作,否则请寻找其他解决方案!


要卸载使用Homebrew安装的Python 2.7.10,可以简单地发出以下命令:

brew uninstall python

同样,如果要卸载Python 3(使用Homebrew安装),请执行以下操作:

brew uninstall --force python3

Note: If you installed Python using Homebrew, then you can follow the following steps, otherwise look for another solution!


To uninstall Python 2.7.10 that you installed using Homebrew, you can simply issue the following command:

brew uninstall python

Similarly, if you want to uninstall Python 3 (which you installed using Homebrew):

brew uninstall --force python3

回答 9

无需卸载它或使用符号链接发疯,只需使用即可alias。升级到python 3.7.1时,我遇到了同样的问题。
只需使用安装新的python版本,brew install python然后在.bash_profile创建的别名中指向新的python版本即可;这样:alias python="/usr/local/bin/python3"然后保存并运行source ~/.bash_profile
做完了

No need to uninstall it or go crazy with symbolic links, just use an alias. I faced the same problem when upgrading to python 3.7.1.
Just install the new python version using brew install python then in your .bash_profile create an alias pointing to the new python version; like this: alias python="/usr/local/bin/python3" then save and run source ~/.bash_profile.
Done.


回答 10

如果您正在考虑手动删除Apple的默认Python 2.7,建议您立即执行以下操作:看起来Apple很快会为您完成此操作:

OSX 10.15 Catalina中不推荐使用Python 2.7

Catalina中不推荐使用Python 2.7-以及Ruby和Perl :(跳至“ 脚本语言运行时 ”>“ 不推荐使用 ”部分)

https://developer.apple.com/documentation/macos_release_notes/macos_catalina_10_15_release_notes

苹果将​​在OSX 10.16中删除Python 2.7

确实,如果您什么都不做,那么根据OS X版本10.16 的Mac Observer所述,Python 2.7将从您的系统中消失:

https://www.macobserver.com/analysis/macos-catalina-deprecates-unix-scripting-languages/

鉴于这一启示,我建议最好的做法是什么也不做,等待苹果为您清除。由于Apple即将为您删除它,因此似乎不值得尝试修改您的Python环境。

注意:我看到这个问题专门与OSX v 10.6.4有关,但是对于所有有兴趣从其系统中删除Python 2.7的OSX人士,无论他们运行的是哪个版本,该问题似乎都已成为一个关键点。

If you’re thinking about manually removing Apple’s default Python 2.7, I’d suggest you hold fire and do nothing: it looks like Apple will very shortly do it for you:

Python 2.7 Deprecated in OSX 10.15 Catalina

Python 2.7- as well as Ruby & Perl- are deprecated in Catalina: (skip to section “Scripting Language Runtimes” > “Deprecations“)

https://developer.apple.com/documentation/macos_release_notes/macos_catalina_10_15_release_notes

Apple To Remove Python 2.7 in OSX 10.16

Indeed, if you do nothing at all, according to The Mac Observer, by OSX version 10.16, Python 2.7 will disappear from your system:

https://www.macobserver.com/analysis/macos-catalina-deprecates-unix-scripting-languages/

Given this revelation, I’d suggest the best course of action is to do nothing and wait for Apple to wipe it for you. As Apple is about to remove it for you, it doesn’t seem worth the risk of tinkering with your Python environment.

NOTE: I see the question relates specifically to OSX v 10.6.4, but it appears this question has become a pivot-point for all OSX folks interested in removing Python 2.7 from their systems, whatever version they’re running.


如何在Python中获取星期数?

问题:如何在Python中获取星期数?

如何使用Python找出6月16日(wk24)当年的星期几?

How can I find out, with Python, which week number of the current year June 16th falls in (wk24)?


回答 0

datetime.date有一个isocalendar()方法,该方法返回包含日历周的元组:

>>> import datetime
>>> datetime.date(2010, 6, 16).isocalendar()[1]
24

datetime.date.isocalendar()是一种实例方法,该方法返回给定日期实例的元组,该元组按各自的顺序包含年,周号和周日。

datetime.date has an isocalendar() method, which returns a tuple containing the calendar week:

>>> import datetime
>>> datetime.date(2010, 6, 16).isocalendar()[1]
24

datetime.date.isocalendar() is an instance-method returning a tuple containing year, weeknumber and weekday in respective order for the given date instance.
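
As a small sketch of that tuple, the three values can be unpacked in one step. The variable names below are my own choice, not part of the datetime API.

```python
from datetime import date

# isocalendar() returns (ISO year, ISO week number, ISO weekday),
# where the weekday runs from 1 (Monday) to 7 (Sunday).
iso_year, iso_week, iso_weekday = date(2010, 6, 16).isocalendar()
print(iso_year, iso_week, iso_weekday)  # prints: 2010 24 3
```

June 16th, 2010 was a Wednesday, which is why the weekday comes out as 3.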


回答 1

您可以直接从日期时间获取星期数作为字符串。

>>> import datetime
>>> datetime.date(2010, 6, 16).strftime("%V")
'24'

您还可以通过更改以下strftime参数来获得一年中星期几的不同“类型” :

%U一年中的周号(星期日为一周的第一天),以零填充的十进制数表示。在第一个星期天之前的新的一年的所有天都被认为是在本周0示例:00,01,…,53

%W-一年中的星期数(星期一为一周的第一天),以十进制数表示。第一个星期一之前的新的一年中的所有天都视为在第0周。例如:00,01,…,53

[…]

在Python 3.6中添加,并反向移植到某些发行版的Python 2.7中。)为方便起见,还包含了C89标准不需要的其他一些指令。这些参数都对应于ISO 8601日期值。当与该strftime()方法一起使用时,可能并非在所有平台上都可用。

[…]

%VISO 8601星期与周一的十进制数作为一周的第一天。周01是包含在1月4例子的一周:01,02,…,53

from:datetime-基本日期和时间类型-Python 3.7.3文档

我从这里找到了。它在Python 2.7.6中为我工作

You can get the week number directly from datetime as a string.

>>> import datetime
>>> datetime.date(2010, 6, 16).strftime("%V")
'24'

You can also get other “types” of week number by changing the strftime directive:

%U – Week number of the year (Sunday as the first day of the week) as a zero padded decimal number. All days in a new year preceding the first Sunday are considered to be in week 0. Examples: 00, 01, …, 53

%W – Week number of the year (Monday as the first day of the week) as a decimal number. All days in a new year preceding the first Monday are considered to be in week 0. Examples: 00, 01, …, 53

[…]

(Added in Python 3.6, backported to some distribution’s Python 2.7’s) Several additional directives not required by the C89 standard are included for convenience. These parameters all correspond to ISO 8601 date values. These may not be available on all platforms when used with the strftime() method.

[…]

%V – ISO 8601 week as a decimal number with Monday as the first day of the week. Week 01 is the week containing Jan 4. Examples: 01, 02, …, 53

from: datetime — Basic date and time types — Python 3.7.3 documentation

I’ve found out about it from here. It worked for me in Python 2.7.6


回答 2

我相信date.isocalendar()这将是答案。本文介绍了ISO 8601日历背后的数学原理。查看Python文档的datetime页面的date.isocalendar()部分。

>>> dt = datetime.date(2010, 6, 16) 
>>> wk = dt.isocalendar()[1]
24

.isocalendar()返回带有(年,周,周,日)的三元组。dt.isocalendar()[0]返回年份,dt.isocalendar()[1]返回星期数,dt.isocalendar()[2]返回星期几。可以很简单。

I believe date.isocalendar() is going to be the answer. This article explains the math behind ISO 8601 Calendar. Check out the date.isocalendar() portion of the datetime page of the Python documentation.

>>> import datetime
>>> dt = datetime.date(2010, 6, 16)
>>> wk = dt.isocalendar()[1]
>>> wk
24

.isocalendar() returns a 3-tuple with (year, wk num, wk day). dt.isocalendar()[0] returns the year, dt.isocalendar()[1] returns the week number, dt.isocalendar()[2] returns the week day. Simple as can be.


回答 3

这是另一个选择:

import time
from time import gmtime, strftime
d = time.strptime("16 Jun 2010", "%d %b %Y")
print(strftime("%U", d))

哪个打印 24

请参阅:http//docs.python.org/library/datetime.html#strftime-and-strptime-behavior

Here’s another option:

from time import strptime, strftime
d = strptime("16 Jun 2010", "%d %b %Y")
print(strftime("%U", d))

which prints 24.

See: http://docs.python.org/library/datetime.html#strftime-and-strptime-behavior


回答 4

其他人建议的ISO周是不错的一周,但它可能不符合您的需求。假设每个星期都从星期一开始,这会导致年初和年底出现一些有趣的异常情况。

如果您想使用第1周始终为1月1日至1月7日的定义,而不管一周中的星期几,请使用类似以下的推导:

>>> testdate=datetime.datetime(2010,6,16)
>>> print(((testdate - datetime.datetime(testdate.year,1,1)).days // 7) + 1)
24

The ISO week suggested by others is a good one, but it might not fit your needs. It assumes each week begins with a Monday, which leads to some interesting anomalies at the beginning and end of the year.

If you’d rather use a definition that says week 1 is always January 1 through January 7, regardless of the day of the week, use a derivation like this:

>>> import datetime
>>> testdate=datetime.datetime(2010,6,16)
>>> print(((testdate - datetime.datetime(testdate.year,1,1)).days // 7) + 1)
24

回答 5

通常要获取当前的星期数(从星期日开始):

from datetime import *
today = datetime.today()
print today.strftime("%U")

Generally to get the current week number (starts from Sunday):

from datetime import datetime
today = datetime.today()
print(today.strftime("%U"))

回答 6

对于一年中瞬时周的整数值,请尝试:

import datetime
datetime.datetime.utcnow().isocalendar()[1]

For the current ISO week of the year as an integer, try:

import datetime
datetime.datetime.utcnow().isocalendar()[1]

回答 7

许多用于周编号的系统。以下是简单地与代码示例一起放置的最常见系统:

  • ISO:第一周从星期一开始,必须包含1月4日。ISO日历已在Python中实现:

    >>> from datetime import date
    >>> date(2014, 12, 29).isocalendar()[:2]
    (2015, 1)
  • 北美:第一周从星期日开始,必须包含1月1日。以下代码是针对北美系统的Python ISO日历实现的修改后的版本:

    from datetime import date
    
    def week_from_date(date_object):
        date_ordinal = date_object.toordinal()
        year = date_object.year
        week = ((date_ordinal - _week1_start_ordinal(year)) // 7) + 1
        if week >= 52:
            if date_ordinal >= _week1_start_ordinal(year + 1):
                year += 1
                week = 1
        return year, week
    
    def _week1_start_ordinal(year):
        jan1 = date(year, 1, 1)
        jan1_ordinal = jan1.toordinal()
        jan1_weekday = jan1.weekday()
        week1_start_ordinal = jan1_ordinal - ((jan1_weekday + 1) % 7)
        return week1_start_ordinal
    >>> from datetime import date
    >>> week_from_date(date(2014, 12, 29))
    (2015, 1)
  • MMWR(CDC):第一周从星期日开始,必须包含1月4日。我专门为此编号系统创建了epiweeks程序包(也支持ISO系统)。这是一个例子:
    >>> from datetime import date
    >>> from epiweeks import Week
    >>> Week.fromdate(date(2014, 12, 29))
    (2014, 53)

There are many systems for week numbering. The following are the most common ones, put simply and with code examples:

  • ISO: The first week starts on Monday and must contain January 4th. The ISO calendar is already implemented in Python:

    >>> from datetime import date
    >>> date(2014, 12, 29).isocalendar()[:2]
    (2015, 1)
    
  • North American: The first week starts on Sunday and must contain January 1st. The following code is my modified version of Python’s ISO calendar implementation for the North American system:

    from datetime import date
    
    def week_from_date(date_object):
        date_ordinal = date_object.toordinal()
        year = date_object.year
        week = ((date_ordinal - _week1_start_ordinal(year)) // 7) + 1
        if week >= 52:
            if date_ordinal >= _week1_start_ordinal(year + 1):
                year += 1
                week = 1
        return year, week
    
    def _week1_start_ordinal(year):
        jan1 = date(year, 1, 1)
        jan1_ordinal = jan1.toordinal()
        jan1_weekday = jan1.weekday()
        week1_start_ordinal = jan1_ordinal - ((jan1_weekday + 1) % 7)
        return week1_start_ordinal
    
    >>> from datetime import date
    >>> week_from_date(date(2014, 12, 29))
    (2015, 1)
    
  • MMWR (CDC): The first week starts on Sunday and must contain January 4th. I created the epiweeks package specifically for this numbering system (it also supports the ISO system). Here is an example:
    >>> from datetime import date
    >>> from epiweeks import Week
    >>> Week.fromdate(date(2014, 12, 29))
    (2014, 53)
    

回答 8

如果您仅使用等周日历号,则以下内容就足够了:

import datetime
week = date(year=2014, month=1, day=1).isocalendar()[1]

这将检索isocalendar返回的元组的第二个成员,作为我们的星期数。

但是,如果您要使用公历中处理的日期函数,则仅等距日历无法正常工作!请看以下示例:

import datetime
date = datetime.datetime.strptime("2014-1-1", "%Y-%W-%w")
week = date.isocalendar()[1]

此处的字符串表示返回2014年第一周的星期一作为我们的日期。当我们使用isocalendar在此处检索星期数时,我们希望可以返回相同的星期数,但事实并非如此。相反,我们得到的周数为2。为什么?

阳历的第一周是包含星期一的第一周。等值线的第1周是包含星期四的第一周。2014年初的不完整一周包含一个星期四,因此这是等距日历的第1周,date第2周。

如果要获得公历周,我们将需要从等距转换为公历。这是个简单的功能,可以解决问题。

import datetime

def gregorian_week(date):
    # The isocalendar week for this date
    iso_week = date.isocalendar()[1]

    # The baseline Gregorian date for the beginning of our date's year
    base_greg = datetime.datetime.strptime('%d-1-1' % date.year, "%Y-%W-%w")

    # If the isocalendar week for this date is not 1, we need to 
    # decrement the iso_week by 1 to get the Gregorian week number
    return iso_week if base_greg.isocalendar()[1] == 1 else iso_week - 1

If you are only using the isocalendar week number across the board, the following should be sufficient:

import datetime
week = datetime.date(year=2014, month=1, day=1).isocalendar()[1]

This retrieves the second member of the tuple returned by isocalendar for our week number.

However, if you are going to be using date functions that deal in the Gregorian calendar, isocalendar alone will not work! Take the following example:

import datetime
date = datetime.datetime.strptime("2014-1-1", "%Y-%W-%w")
week = date.isocalendar()[1]

The string here says to return the Monday of the first week in 2014 as our date. When we use isocalendar to retrieve the week number here, we would expect to get the same week number back, but we don’t. Instead we get a week number of 2. Why?

Week 1 in what this answer calls the Gregorian calendar (the %W convention) is the first week containing a Monday. Week 1 in the isocalendar is the first week containing a Thursday. The partial week at the beginning of 2014 contains a Thursday, so it is week 1 by the isocalendar, which makes date fall in week 2.

If we want to get the Gregorian week, we will need to convert from the isocalendar to the Gregorian. Here is a simple function that does the trick.

import datetime

def gregorian_week(date):
    # The isocalendar week for this date
    iso_week = date.isocalendar()[1]

    # The baseline Gregorian date for the beginning of our date's year
    base_greg = datetime.datetime.strptime('%d-1-1' % date.year, "%Y-%W-%w")

    # If the isocalendar week for this date is not 1, we need to 
    # decrement the iso_week by 1 to get the Gregorian week number
    return iso_week if base_greg.isocalendar()[1] == 1 else iso_week - 1
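
To see the two conventions side by side, here is a minimal sketch. week_numbers is a hypothetical helper name of mine; it only combines strftime("%W") and isocalendar(), both from the standard library.

```python
from datetime import date


def week_numbers(d):
    """Return the %W week and the ISO 8601 week of a date as integers."""
    return int(d.strftime("%W")), d.isocalendar()[1]


# January 1st, 2014 is a Wednesday, before the first Monday of the year.
print(week_numbers(date(2014, 1, 1)))  # prints: (0, 1)
# January 6th, 2014 is the first Monday of the year.
print(week_numbers(date(2014, 1, 6)))  # prints: (1, 2)
```

In 2014 the ISO week is always one ahead of the %W week, which is the offset the function above compensates for.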

回答 9

您可以尝试使用%W指令,如下所示:

d = datetime.datetime.strptime('2016-06-16','%Y-%m-%d')
print(datetime.datetime.strftime(d,'%W'))

‘%W’:一年中的第几周(星期一为一周的第一天),以十进制数表示。第一个星期一之前的新的一年中的所有天都视为在第0周。(00,01,…,53)

You can try the %W directive as below:

import datetime

d = datetime.datetime.strptime('2016-06-16','%Y-%m-%d')
print(datetime.datetime.strftime(d,'%W'))

‘%W’: Week number of the year (Monday as the first day of the week) as a decimal number. All days in a new year preceding the first Monday are considered to be in week 0. (00, 01, …, 53)


回答 10

isocalendar()返回某些日期的不正确的年和周数值:

Python 2.7.3 (default, Feb 27 2014, 19:58:35) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import datetime as dt
>>> myDateTime = dt.datetime.strptime("20141229T000000.000Z",'%Y%m%dT%H%M%S.%fZ')
>>> yr,weekNumber,weekDay = myDateTime.isocalendar()
>>> print "Year is " + str(yr) + ", weekNumber is " + str(weekNumber)
Year is 2015, weekNumber is 1

与Mark Ransom的方法进行比较:

>>> yr = myDateTime.year
>>> weekNumber = ((myDateTime - dt.datetime(yr,1,1)).days/7) + 1
>>> print "Year is " + str(yr) + ", weekNumber is " + str(weekNumber)
Year is 2014, weekNumber is 52

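isocalendar() returns year and week number values that can look incorrect for some dates near the year boundary, because it reports the ISO 8601 week-numbering year rather than the calendar year: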

Python 2.7.3 (default, Feb 27 2014, 19:58:35) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import datetime as dt
>>> myDateTime = dt.datetime.strptime("20141229T000000.000Z",'%Y%m%dT%H%M%S.%fZ')
>>> yr,weekNumber,weekDay = myDateTime.isocalendar()
>>> print "Year is " + str(yr) + ", weekNumber is " + str(weekNumber)
Year is 2015, weekNumber is 1

Compare with Mark Ransom’s approach:

>>> yr = myDateTime.year
>>> weekNumber = ((myDateTime - dt.datetime(yr,1,1)).days/7) + 1
>>> print "Year is " + str(yr) + ", weekNumber is " + str(weekNumber)
Year is 2014, weekNumber is 52
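
For reference, the same comparison can be sketched in Python 3. The two function names are hypothetical; the first wraps isocalendar(), the second is the day-count approach shown above with // so that the division stays an integer.

```python
from datetime import date


def iso_year_week(d):
    """ISO 8601 week-numbering year and week; the year may differ from d.year."""
    iso = d.isocalendar()
    return iso[0], iso[1]


def day_count_year_week(d):
    """Calendar year and week, where week 1 is always January 1st to 7th."""
    return d.year, (d - date(d.year, 1, 1)).days // 7 + 1


d = date(2014, 12, 29)
print(iso_year_week(d))        # prints: (2015, 1)
print(day_count_year_week(d))  # prints: (2014, 52)
```

Which of the two is right depends on the week definition you need; if you use the ISO week, keep the ISO year alongside it so that the pair stays unambiguous.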

回答 11

我将讨论概括为两个步骤:

  1. 将原始格式转换为datetime对象。
  2. 使用datetime对象或date对象的功能来计算星期数。

暖身

from datetime import datetime, date, time
d = date(2005, 7, 14)
t = time(12, 30)
dt = datetime.combine(d, t)
print(dt)

第一步

要手动生成datetime对象,我们可以使用datetime.datetime(2017,5,3)datetime.datetime.now()

但是实际上,我们通常需要解析一个现有的字符串。我们可以使用strptime函数,例如datetime.strptime('2017-5-3','%Y-%m-%d')您必须指定的格式。有关不同格式代码的详细信息,请参见官方文档中

或者,更方便的方法是使用dateparse模块。例子有dateparser.parse('16 Jun 2010')dateparser.parse('12/2/12')dateparser.parse('2017-5-3')

以上两种方法将返回一个datetime对象。

第二步

使用获得的 datetime对象进行调用strptime(format)。例如,

dt = datetime.strptime('2017-01-1','%Y-%m-%d') # return a datetime object. This day is Sunday
print(dt.strftime("%W")) # '00' Monday as the 1st day of the week. All days in a new year preceding the 1st Monday are considered to be in week 0.
print(dt.strftime("%U")) # '01' Sunday as the 1st day of the week. All days in a new year preceding the 1st Sunday are considered to be in week 0.
print(dt.strftime("%V")) # '52' Monday as the 1st day of the week. Week 01 is the week containing Jan 4.

决定使用哪种格式是非常棘手的。更好的方法是获取date对象进行调用isocalendar()。例如,

dt = datetime.strptime('2017-01-1','%Y-%m-%d') # return a datetime object
d = dt.date() # convert to a date object. Equivalent to d = date(2017,1,1); the date class historically has no strptime() parser of its own
year, week, weekday = d.isocalendar() 
print(year, week, weekday) # prints "2016 52 7" in the ISO standard

实际上,您将更有可能用来date.isocalendar()编写每周报告,尤其是在“圣诞节-新年”购物季节。

I summarize the discussion in two steps:

  1. Convert the raw format to a datetime object.
  2. Use the function of a datetime object or a date object to calculate the week number.

Warm up

from datetime import datetime, date, time
d = date(2005, 7, 14)
t = time(12, 30)
dt = datetime.combine(d, t)
print(dt)

1st step

To manually generate a datetime object, we can use datetime.datetime(2017,5,3) or datetime.datetime.now().

But in reality, we usually need to parse an existing string. We can use the strptime function, such as datetime.strptime('2017-5-3','%Y-%m-%d'), in which you have to specify the format. Details of the different format codes can be found in the official documentation.

Alternatively, a more convenient way is to use the dateparser module. Examples are dateparser.parse('16 Jun 2010'), dateparser.parse('12/2/12') or dateparser.parse('2017-5-3').

The above two approaches will return a datetime object.

2nd step

Use the obtained datetime object to call strftime(format). For example,

dt = datetime.strptime('2017-01-1','%Y-%m-%d') # return a datetime object. This day is Sunday
print(dt.strftime("%W")) # '00' Monday as the 1st day of the week. All days in a new year preceding the 1st Monday are considered to be in week 0.
print(dt.strftime("%U")) # '01' Sunday as the 1st day of the week. All days in a new year preceding the 1st Sunday are considered to be in week 0.
print(dt.strftime("%V")) # '52' Monday as the 1st day of the week. Week 01 is the week containing Jan 4.

It’s very tricky to decide which format to use. A better way is to get a date object to call isocalendar(). For example,

dt = datetime.strptime('2017-01-1','%Y-%m-%d') # return a datetime object
d = dt.date() # convert to a date object. Equivalent to d = date(2017,1,1); the date class historically has no strptime() parser of its own
year, week, weekday = d.isocalendar() 
print(year, week, weekday) # prints "2016 52 7" in the ISO standard

In reality, you will be more likely to use date.isocalendar() to prepare a weekly report, especially in the “Christmas-New Year” shopping season.
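
The two steps can be combined into one small helper. This is only a sketch; iso_week_from_string is a hypothetical name, and the default format string is an assumption about how your dates are written.

```python
from datetime import datetime


def iso_week_from_string(text, fmt="%Y-%m-%d"):
    """Parse text with strptime and return (ISO year, ISO week, ISO weekday)."""
    # Step 1: raw string to datetime object.
    parsed = datetime.strptime(text, fmt)
    # Step 2: date object to ISO calendar values.
    return tuple(parsed.date().isocalendar())


print(iso_week_from_string("2017-01-01"))               # prints: (2016, 52, 7)
print(iso_week_from_string("16 Jun 2010", "%d %b %Y"))  # prints: (2010, 24, 3)
```

Note that %b matches month names in the current locale, so the second call assumes an English (or C) locale.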


回答 12

import datetime

# Read the deadline as day/month/year, for example 16/06/2010.
userInput = input("Please enter project deadline date (dd/mm/yyyy): ")

currentDate = datetime.datetime.today()

# %m matches a numeric month, as the dd/mm/yyyy prompt asks for.
testVar = datetime.datetime.strptime(userInput, "%d/%m/%Y").date()

remainDays = testVar - currentDate.date()

remainWeeks = (remainDays.days / 7.0) + 1

print("Please pay attention: the deadline of project X is in", remainDays.days, "days, or about", remainWeeks, "weeks.\nSo hurry up!")

回答 13

已经给出了很多答案,但是id喜欢添加到它们中。

如果您需要将星期显示为年/周样式(例如1953年-2019年第53周,2001年-2020年第1周等),则可以执行以下操作:

import datetime

year = datetime.datetime.now()
week_num = datetime.date(year.year, year.month, year.day).strftime("%V")
long_week_num = str(year.year)[0:2] + str(week_num)

这将需要当前的年和周,而在撰写本文时,long_week_num将是:

>>> 2006

A lot of answers have been given, but I’d like to add to them.

If you need the week to display in a year/week style (e.g. 1953 for week 53 of 2019, 2001 for week 1 of 2020, etc.), you can do this:

import datetime

year = datetime.datetime.now()
week_num = datetime.date(year.year, year.month, year.day).strftime("%V")
# Take the last two digits of the year, then append the two-digit week number.
long_week_num = str(year.year)[2:4] + str(week_num)

It takes the current year and week; on the day of writing this, long_week_num is:

>>> 2006
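
One caveat: around New Year the ISO week can belong to a different year than the calendar date. A variant that takes both parts from isocalendar() avoids mixing the two; this is a sketch, and yyww is a hypothetical name of mine.

```python
from datetime import date


def yyww(d):
    """Two-digit ISO year followed by the two-digit ISO week, e.g. '2006'."""
    iso = d.isocalendar()
    iso_year, iso_week = iso[0], iso[1]
    return "%02d%02d" % (iso_year % 100, iso_week)


print(yyww(date(2020, 2, 5)))    # prints: 2006
print(yyww(date(2019, 12, 30)))  # prints: 2001 (already ISO week 1 of 2020)
print(yyww(date(2021, 1, 1)))    # prints: 2053 (still ISO week 53 of 2020)
```

It also avoids %V, which may not be available in strftime() on every platform.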

Python中的ISO时间(ISO 8601)

问题:Python中的ISO时间(ISO 8601)

我有一个档案。在Python中,我想保留其创建时间,并将其转换为ISO时间(ISO 8601)字符串, 同时保留它是在东部时区(ET)创建的事实

如何获取文件的ctime并将其转换为表示东部时区的ISO时间字符串(并在必要时考虑夏令时)?

I have a file. In Python, I would like to take its creation time, and convert it to an ISO time (ISO 8601) string while preserving the fact that it was created in the Eastern Time Zone (ET).

How do I take the file’s ctime and convert it to an ISO time string that indicates the Eastern Time Zone (and takes into account daylight savings time, if necessary)?


回答 0

ISO 8601的本地:

import datetime
datetime.datetime.now().isoformat()
>>> 2020-03-20T14:28:23.382748

UTC符合ISO 8601:

import datetime
datetime.datetime.utcnow().isoformat()
>>> 2020-03-20T01:30:08.180856

ISO 8601本地,无微秒:

import datetime
datetime.datetime.now().replace(microsecond=0).isoformat()
>>> 2020-03-20T14:30:43

具有TimeZone信息的UTC到ISO 8601(Python 3):

import datetime
datetime.datetime.utcnow().replace(tzinfo=datetime.timezone.utc).isoformat()
>>> 2020-03-20T01:31:12.467113+00:00

UTC到ISO 8601,具有本地时区信息,无微秒(Python 3):

import datetime
datetime.datetime.now().astimezone().replace(microsecond=0).isoformat()
>>> 2020-03-20T14:31:43+13:00

具有TimeZone信息的ISO 8601本地(Python 3):

import datetime
datetime.datetime.now().astimezone().isoformat()
>>> 2020-03-20T14:32:16.458361+13:00

注意astimezone()在utc时间使用时有一个错误。这给出了错误的结果:

datetime.datetime.utcnow().astimezone().isoformat() #Incorrect result

对于Python 2,请参阅并使用pytz

Local to ISO 8601:

import datetime
datetime.datetime.now().isoformat()
>>> 2020-03-20T14:28:23.382748

UTC to ISO 8601:

import datetime
datetime.datetime.utcnow().isoformat()
>>> 2020-03-20T01:30:08.180856

Local to ISO 8601 without microsecond:

import datetime
datetime.datetime.now().replace(microsecond=0).isoformat()
>>> 2020-03-20T14:30:43

UTC to ISO 8601 with TimeZone information (Python 3):

import datetime
datetime.datetime.utcnow().replace(tzinfo=datetime.timezone.utc).isoformat()
>>> 2020-03-20T01:31:12.467113+00:00

Local to ISO 8601 with local TimeZone information without microsecond (Python 3):

import datetime
datetime.datetime.now().astimezone().replace(microsecond=0).isoformat()
>>> 2020-03-20T14:31:43+13:00

Local to ISO 8601 with TimeZone information (Python 3):

import datetime
datetime.datetime.now().astimezone().isoformat()
>>> 2020-03-20T14:32:16.458361+13:00

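Notice that astimezone() gives an incorrect result when used on a utcnow() value: utcnow() returns a naive datetime, and astimezone() assumes naive datetimes are in local time. This gives an incorrect result: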

datetime.datetime.utcnow().astimezone().isoformat() #Incorrect result

For Python 2, see and use pytz.
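
For the file case from the question, a sketch along these lines should work on Python 3.9 or later. The function names are hypothetical, and "America/New_York" is my assumption for what "Eastern Time" means here. Also note that os.path.getctime returns the creation time on Windows but the last metadata change time on Unix.

```python
import os
from datetime import datetime, timedelta, timezone, tzinfo
from zoneinfo import ZoneInfo  # standard library in Python 3.9+


def timestamp_to_iso(ts: float, tz: tzinfo) -> str:
    """Format a POSIX timestamp as an ISO 8601 string in the given zone."""
    return datetime.fromtimestamp(ts, tz=tz).replace(microsecond=0).isoformat()


def file_ctime_iso_eastern(path: str) -> str:
    """ISO 8601 string for a file's ctime, expressed in US Eastern time."""
    # Assumption: "America/New_York" is the zone meant by "Eastern Time".
    eastern = ZoneInfo("America/New_York")
    return timestamp_to_iso(os.path.getctime(path), eastern)


# A fixed UTC-5 offset, used here only to show the output shape.
print(timestamp_to_iso(0, timezone(timedelta(hours=-5))))
# prints: 1969-12-31T19:00:00-05:00
```

Because the zone comes from the time zone database, file_ctime_iso_eastern yields an offset of -04:00 for timestamps during daylight saving time and -05:00 otherwise.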


回答 1

这是我用来转换为XSD日期时间格式的方法:

from datetime import datetime
datetime.now().replace(microsecond=0).isoformat()
# You get your ISO string

我在寻找XSD日期时间格式(xs:dateTime)时遇到了这个问题。我需要从中删除微秒isoformat

Here is what I use to convert to the XSD datetime format:

from datetime import datetime
datetime.now().replace(microsecond=0).isoformat()
# You get your ISO string

I came across this question when looking for the XSD date time format (xs:dateTime). I needed to remove the microseconds from isoformat.


回答 2

ISO 8601 Time Representation

The international standard ISO 8601 describes a string representation for dates and times. Two simple examples of this format are

2010-12-16 17:22:15
20101216T172215

(which both stand for the 16th of December 2010), but the format also allows for sub-second resolution times and to specify time zones. This format is of course not Python-specific, but it is good for storing dates and times in a portable format. Details about this format can be found in the Markus Kuhn entry.

I recommend using this format to store times in files.

One way to get the current time in this representation is to use strftime from the time module in the Python standard library:

>>> from time import strftime
>>> strftime("%Y-%m-%d %H:%M:%S")
'2010-03-03 21:16:45'

To parse such a string back into a datetime, you can use the strptime constructor of the datetime class:

>>> from datetime import datetime
>>> datetime.strptime("2010-06-04 21:08:12", "%Y-%m-%d %H:%M:%S")
datetime.datetime(2010, 6, 4, 21, 8, 12)
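
The same strptime approach also covers the compact form shown earlier. This is a small sketch using only the standard library; the variable names are illustrative.

```python
from datetime import datetime

# Extended form with a space separator, as in the first example string.
extended = datetime.strptime("2010-12-16 17:22:15", "%Y-%m-%d %H:%M:%S")

# Basic (compact) form: no punctuation, "T" between date and time.
basic = datetime.strptime("20101216T172215", "%Y%m%dT%H%M%S")

print(extended == basic)   # both strings describe the same moment
print(basic.isoformat())
```

Both calls return naive datetimes, since neither string carries a UTC offset.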

The most robust is the Egenix mxDateTime module:

>>> from mx.DateTime.ISO import ParseDateTimeUTC
>>> from datetime import datetime
>>> x = ParseDateTimeUTC("2010-06-04 21:08:12")
>>> datetime.fromtimestamp(x)
datetime.datetime(2010, 6, 4, 21, 8, 12)



Answer 3

I found datetime.isoformat in the documentation. It seems to do what you want:

datetime.isoformat([sep])

Return a string representing the date and time in ISO 8601 format, YYYY-MM-DDTHH:MM:SS.mmmmmm or, if microsecond is 0, YYYY-MM-DDTHH:MM:SS

If utcoffset() does not return None, a 6-character string is appended, giving the UTC offset in (signed) hours and minutes: YYYY-MM-DDTHH:MM:SS.mmmmmm+HH:MM or, if microsecond is 0 YYYY-MM-DDTHH:MM:SS+HH:MM

The optional argument sep (default 'T') is a one-character separator, placed between the date and time portions of the result. For example,

>>> from datetime import tzinfo, timedelta, datetime
>>> class TZ(tzinfo):
...     def utcoffset(self, dt): return timedelta(minutes=-399)
...
>>> datetime(2002, 12, 25, tzinfo=TZ()).isoformat(' ')
'2002-12-25 00:00:00-06:39'
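
The cases described in the documentation excerpt can be reproduced without a custom tzinfo subclass. This is a sketch assuming Python 3, where datetime.timezone provides fixed offsets; the variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Fixed offset of -399 minutes, matching the TZ class in the excerpt.
tz = timezone(timedelta(minutes=-399))
dt = datetime(2002, 12, 25, tzinfo=tz)

default_sep = dt.isoformat()                            # 'T' separator, offset appended
space_sep = dt.isoformat(' ')                           # custom one-character separator
with_micro = dt.replace(microsecond=123456).isoformat() # fractional part shown when nonzero
naive = datetime(2002, 12, 25).isoformat()              # no offset: utcoffset() is None

for text in (default_sep, space_sep, with_micro, naive):
    print(text)
```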

Answer 4

ISO 8601 allows a compact representation with no separators except for the T, so I like to use this one-liner to get a quick timestamp string:

>>> datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%S.%fZ")
'20180905T140903.591680Z'

If you don’t need the microseconds, just leave out the .%f part:

>>> datetime.datetime.utcnow().strftime("%Y%m%dT%H%M%SZ")
'20180905T140903Z'

For local time:

>>> datetime.datetime.now().strftime("%Y%m%dT%H%M%S")
'20180905T140903'

Edit:

After reading up on this some more, I recommend you leave the punctuation in. RFC 3339 recommends that style because, if everyone uses punctuation, there is no risk of multiple ISO 8601 strings being sorted into groups by their punctuation. So the one-liner for a compliant string would be:

>>> datetime.datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%SZ")
'2018-09-05T14:09:03Z'
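
Since the trailing Z means UTC, it only makes sense on a time that really is in UTC. The following is a sketch under the assumption that the input is an aware datetime; the function utc_stamp and its compact flag are hypothetical names, not from the answer.

```python
from datetime import datetime, timedelta, timezone


def utc_stamp(aware_dt, compact=False):
    """Format an aware datetime as a UTC timestamp ending in 'Z'."""
    # Convert to UTC first so that the 'Z' suffix is truthful.
    in_utc = aware_dt.astimezone(timezone.utc)
    fmt = "%Y%m%dT%H%M%SZ" if compact else "%Y-%m-%dT%H:%M:%SZ"
    return in_utc.strftime(fmt)


# 10:09:03 at UTC-4 is the same instant as 14:09:03 UTC.
local = datetime(2018, 9, 5, 10, 9, 3, tzinfo=timezone(timedelta(hours=-4)))
print(utc_stamp(local))
print(utc_stamp(local, compact=True))
```

A naive datetime passed to this function would be interpreted as local time by astimezone(), so callers would need to attach a tzinfo first.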

Answer 5

The ISO 8601 time format does not store a time zone name; only the corresponding UTC offset is preserved.

To convert a file ctime to an ISO 8601 time string while preserving the UTC offset in Python 3:

>>> import os
>>> from datetime import datetime, timezone
>>> ts = os.path.getctime(some_file)
>>> dt = datetime.fromtimestamp(ts, timezone.utc)
>>> dt.astimezone().isoformat()
'2015-11-27T00:29:06.839600-05:00'

The code assumes that your local timezone is Eastern Time Zone (ET) and that your system provides a correct UTC offset for the given POSIX timestamp (ts), i.e., Python has access to a historical timezone database on your system or the time zone had the same rules at a given date.

If you need a portable solution, use the pytz module, which provides access to the tz database:

>>> import os
>>> from datetime import datetime
>>> import pytz  # pip install pytz
>>> ts = os.path.getctime(some_file)
>>> dt = datetime.fromtimestamp(ts, pytz.timezone('America/New_York'))
>>> dt.isoformat()
'2015-11-27T00:29:06.839600-05:00'

The result is the same in this case.

If you need the time zone name/abbreviation/zone id, store it separately.

>>> dt.astimezone().strftime('%Y-%m-%d %H:%M:%S%z (%Z)')
'2015-11-27 00:29:06-0500 (EST)'

Note: there is no : in the UTC offset, and the EST timezone abbreviation is not part of the ISO 8601 time format. The abbreviation is not unique.

Different libraries, or different versions of the same library, may use different time zone rules for the same date and timezone. If it is a future date, the rules might not be known yet. In other words, the same UTC time may correspond to a different local time depending on which rules you use. Saving a time in ISO 8601 format preserves the UTC time and the local time that corresponds to the time zone rules currently in use on your platform. You might need to recalculate the local time on a different platform if it has different rules.
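
The conversion itself can be written so that the zone is passed in explicitly. This is a sketch with a hypothetical helper, timestamp_to_iso; it uses a fixed -05:00 offset to stay self-contained. On Python 3.9 or later, a ZoneInfo("America/New_York") object from the standard-library zoneinfo module could be passed instead of a pytz zone, assuming the tz database is available on the system.

```python
from datetime import datetime, timedelta, timezone


def timestamp_to_iso(ts, tz):
    """Format a POSIX timestamp as ISO 8601 in the given tzinfo."""
    return datetime.fromtimestamp(ts, tz).isoformat()


# A fixed offset stands in for a tz database zone here (assumption for the demo).
est = timezone(timedelta(hours=-5), "EST")
print(timestamp_to_iso(1448602146, est))
print(timestamp_to_iso(1448602146, timezone.utc))
```

Both strings describe the same instant; only the offset used to render it differs.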


Answer 6

You’ll need to use os.stat to get the file creation time and a combination of time.strftime and time.timezone for formatting:

>>> import time
>>> import os
>>> t = os.stat('C:/Path/To/File.txt').st_ctime
>>> t = time.localtime(t)
>>> formatted = time.strftime('%Y-%m-%d %H:%M:%S', t)
>>> tz = str.format('{0:+06.2f}', float(time.timezone) / 3600)
>>> final = formatted + tz
>>> 
>>> final
'2008-11-24 14:46:08-02.00'

Answer 7

Correct me if I’m wrong (I’m not), but the offset from UTC changes with daylight saving time. So you should use

tz = str.format('{0:+06.2f}', float(time.altzone) / 3600)

I also believe that the sign should be different:

tz = str.format('{0:+06.2f}', -float(time.altzone) / 3600)

I could be wrong, but I don’t think so.
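
Rather than choosing between time.timezone and time.altzone by hand, one option is to read the offset the platform computed for a specific instant. This is a minimal sketch, assuming Python 3.6 or later (where tm_gmtoff is available on all platforms); the function name is illustrative.

```python
import time


def local_utc_offset_seconds(epoch_seconds=None):
    """UTC offset of local time, in seconds east of UTC, at the given instant."""
    # tm_gmtoff already reflects DST at that instant and is positive east of
    # UTC, which is the opposite sign of time.timezone and time.altzone.
    return time.localtime(epoch_seconds).tm_gmtoff


print(local_utc_offset_seconds())
```

Passing the file's timestamp rather than None gives the offset that applied when the file was created, which may differ from today's offset.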


Answer 8

I agree with Jarek, and I furthermore note that the ISO offset separator character is a colon, so I think the final answer should be:

isodate.datetime_isoformat(datetime.datetime.now()) + str.format('{0:+06.2f}', -float(time.timezone) / 3600).replace('.', ':')
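
One caveat with formatting the offset as a float and swapping the dot for a colon is that it only works for whole-hour offsets: 5.5 hours becomes +05:50 rather than +05:30. A sketch of an integer-based alternative follows, assuming Python 3; format_utc_offset is a hypothetical helper name.

```python
def format_utc_offset(offset_seconds):
    """Format an offset in seconds east of UTC as +hh:mm or -hh:mm."""
    sign = '-' if offset_seconds < 0 else '+'
    hours, remainder = divmod(abs(int(offset_seconds)), 3600)
    return '{}{:02d}:{:02d}'.format(sign, hours, remainder // 60)


print(format_utc_offset(-18000))   # five hours west of UTC
print(format_utc_offset(19800))    # five and a half hours east of UTC
# The float-based approach misrenders the half hour:
print('{0:+06.2f}'.format(19800 / 3600).replace('.', ':'))
```

Since time.timezone and time.altzone count seconds west of UTC, they would need to be negated before being passed in.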

Answer 9

Adding a small variation to estani’s excellent answer:

Local to ISO 8601 with TimeZone and no microsecond info (Python 3):

import datetime, time

utc_offset_sec = time.altzone if time.localtime().tm_isdst else time.timezone
utc_offset = datetime.timedelta(seconds=-utc_offset_sec)
datetime.datetime.now().replace(microsecond=0, tzinfo=datetime.timezone(offset=utc_offset)).isoformat()

Sample Output:

'2019-11-06T12:12:06-08:00'

I tested that this output can be parsed by both JavaScript Date and C# DateTime/DateTimeOffset.
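
The same kind of round-trip check can be done in Python itself. This is a short sketch, assuming Python 3.7 or later, where datetime.fromisoformat() is available.

```python
from datetime import datetime, timedelta

# Parse the sample output back and confirm the offset survived.
parsed = datetime.fromisoformat('2019-11-06T12:12:06-08:00')

print(parsed.utcoffset() == timedelta(hours=-8))
print(parsed.isoformat())
```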


Answer 10

I’ve developed this function:

def iso_8601_format(dt):
    """YYYY-MM-DDThh:mm:ssTZD (1997-07-16T19:20:30-03:00)"""

    if dt is None:
        return ""

    fmt_datetime = dt.strftime('%Y-%m-%dT%H:%M:%S')
    tz = dt.utcoffset()
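    if tz is None:
        fmt_timezone = "+00:00"
    else:
        # Build +hh:mm by hand; a float such as -3.00 is not a valid ISO 8601 offset.
        total_minutes = int(tz.total_seconds() // 60)
        sign = '-' if total_minutes < 0 else '+'
        hours, minutes = divmod(abs(total_minutes), 60)
        fmt_timezone = '{0}{1:02d}:{2:02d}'.format(sign, hours, minutes)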

    return fmt_datetime + fmt_timezone

Answer 11

import calendar, datetime

def convert_enddate_to_seconds(ts):
    """Takes ISO 8601 format(string) and converts into epoch time."""
    # The last six characters hold the UTC offset, e.g. '-08:00'.
    sign = int(ts[-6] + '1')
    offset = sign * datetime.timedelta(hours=int(ts[-5:-3]), minutes=int(ts[-2:]))
    # Subtracting the offset from the local wall time gives UTC.
    dt = datetime.datetime.strptime(ts[:-6], '%Y-%m-%dT%H:%M:%S.%f') - offset
    # calendar.timegm reads the tuple as UTC; time.mktime would read it as local time.
    seconds = calendar.timegm(dt.timetuple()) + dt.microsecond / 1000000.0
    return seconds

>>> ts = '2012-09-30T15:31:50.262-08:00'
>>> convert_enddate_to_seconds(ts)
1349047910.262
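
For comparison, here is a sketch that assumes Python 3.7 or later, where datetime.fromisoformat() parses the UTC offset itself and timestamp() on an aware datetime does not depend on the machine's local zone. The name iso_to_epoch is illustrative.

```python
from datetime import datetime


def iso_to_epoch(ts):
    """Convert an ISO 8601 string with a UTC offset into seconds since the epoch."""
    # The parsed datetime is aware, so timestamp() accounts for its offset.
    return datetime.fromisoformat(ts).timestamp()


# 15:31:50.262 at -08:00 is 23:31:50.262 UTC.
print(iso_to_epoch('2012-09-30T15:31:50.262-08:00'))
```

Manual slicing is avoided entirely, so strings with or without fractional seconds go through the same code path.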