标签归档:Python

如何查看文件中的更改?

问题:如何查看文件中的更改?

我有一个日志文件正在由另一个进程写入,我想监视它的更改。每次发生更改时,我都希望读入新数据以对其进行一些处理。

最好的方法是什么?我希望从PyWin32库中获得某种吸引。我找到了该win32file.FindNextChangeNotification功能,但不知道如何要求它观看特定文件。

如果有人做了这样的事情,我将不胜感激。

[编辑]我应该提到我在寻求不需要轮询的解决方案。

[编辑]诅咒!看来这在映射的网络驱动器上不起作用。我猜想Windows不会像在本地磁盘上那样“听到”文件的任何更新。

I have a log file being written by another process which I want to watch for changes. Each time a change occurs I’d like to read the new data in to do some processing on it.

What’s the best way to do this? I was hoping there’d be some sort of hook from the PyWin32 library. I’ve found the win32file.FindNextChangeNotification function but have no idea how to ask it to watch a specific file.

If anyone’s done anything like this I’d be really grateful to hear how…

[Edit] I should have mentioned that I was after a solution that doesn’t require polling.

[Edit] Curses! It seems this doesn’t work over a mapped network drive. I’m guessing windows doesn’t ‘hear’ any updates to the file the way it does on a local disk.


回答 0

您是否已经看过http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html上的可用文档?如果只需要它在Windows下运行,则第二个示例似乎正是您想要的(如果您将目录的路径与要观看的文件之一交换了)。

否则,轮询将可能是唯一真正与平台无关的选项。

注意:我还没有尝试过任何这些解决方案。

Have you already looked at the documentation available on http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html? If you only need it to work under Windows the 2nd example seems to be exactly what you want (if you exchange the path of the directory with the one of the file you want to watch).

Otherwise, polling will probably be the only really platform-independent option.

Note: I haven’t tried any of these solutions.


回答 1

您尝试使用看门狗了吗?

Python API库和Shell实用程序可监视文件系统事件。

目录监视变得容易

  • 跨平台API。
  • 一种外壳程序工具,用于响应目录更改而运行命令。

通过快速入门中的一个简单示例快速入门

Did you try using Watchdog?

Python API library and shell utilities to monitor file system events.

Directory monitoring made easy with

  • A cross-platform API.
  • A shell tool to run commands in response to directory changes.

Get started quickly with a simple example in Quickstart


回答 2

如果轮询对您来说足够好,那么我只看“修改时间”文件的统计信息是否发生了变化。阅读:

os.stat(filename).st_mtime

(还要注意,Windows本机更改事件解决方案并非在所有情况下(例如在网络驱动器上)都适用。)

import os

class Monkey(object):
    def __init__(self):
        self._cached_stamp = 0
        self.filename = '/path/to/file'

    def ook(self):
        stamp = os.stat(self.filename).st_mtime
        if stamp != self._cached_stamp:
            self._cached_stamp = stamp
            # File has changed, so do something...

If polling is good enough for you, I’d just watch if the “modified time” file stat changes. To read it:

os.stat(filename).st_mtime

(Also note that the Windows native change event solution does not work in all circumstances, e.g. on network drives.)

import os

class Monkey(object):
    def __init__(self):
        self._cached_stamp = 0
        self.filename = '/path/to/file'

    def ook(self):
        stamp = os.stat(self.filename).st_mtime
        if stamp != self._cached_stamp:
            self._cached_stamp = stamp
            # File has changed, so do something...

回答 3

如果您需要多平台解决方案,请检查QFileSystemWatcher。这里是一个示例代码(未清除):

from PyQt4 import QtCore

@QtCore.pyqtSlot(str)
def directory_changed(path):
    print('Directory Changed!!!')

@QtCore.pyqtSlot(str)
def file_changed(path):
    print('File Changed!!!')

fs_watcher = QtCore.QFileSystemWatcher(['/path/to/files_1', '/path/to/files_2', '/path/to/files_3'])

fs_watcher.connect(fs_watcher, QtCore.SIGNAL('directoryChanged(QString)'), directory_changed)
fs_watcher.connect(fs_watcher, QtCore.SIGNAL('fileChanged(QString)'), file_changed)

If you want a multiplatform solution, then check QFileSystemWatcher. Here an example code (not sanitized):

from PyQt4 import QtCore

@QtCore.pyqtSlot(str)
def directory_changed(path):
    print('Directory Changed!!!')

@QtCore.pyqtSlot(str)
def file_changed(path):
    print('File Changed!!!')

fs_watcher = QtCore.QFileSystemWatcher(['/path/to/files_1', '/path/to/files_2', '/path/to/files_3'])

fs_watcher.connect(fs_watcher, QtCore.SIGNAL('directoryChanged(QString)'), directory_changed)
fs_watcher.connect(fs_watcher, QtCore.SIGNAL('fileChanged(QString)'), file_changed)

回答 4

它不能在Windows上运行(也许使用cygwin吗?),但是对于Unix用户,您应该使用“ fcntl”系统调用。这是Python中的示例。如果您需要用C编写(相同的函数名称),则几乎是相同的代码

import time
import fcntl
import os
import signal

FNAME = "/HOME/TOTO/FILETOWATCH"

def handler(signum, frame):
    print "File %s modified" % (FNAME,)

signal.signal(signal.SIGIO, handler)
fd = os.open(FNAME,  os.O_RDONLY)
fcntl.fcntl(fd, fcntl.F_SETSIG, 0)
fcntl.fcntl(fd, fcntl.F_NOTIFY,
            fcntl.DN_MODIFY | fcntl.DN_CREATE | fcntl.DN_MULTISHOT)

while True:
    time.sleep(10000)

It should not work on windows (maybe with cygwin ?), but for unix user, you should use the “fcntl” system call. Here is an example in Python. It’s mostly the same code if you need to write it in C (same function names)

import time
import fcntl
import os
import signal

FNAME = "/HOME/TOTO/FILETOWATCH"

def handler(signum, frame):
    print "File %s modified" % (FNAME,)

signal.signal(signal.SIGIO, handler)
fd = os.open(FNAME,  os.O_RDONLY)
fcntl.fcntl(fd, fcntl.F_SETSIG, 0)
fcntl.fcntl(fd, fcntl.F_NOTIFY,
            fcntl.DN_MODIFY | fcntl.DN_CREATE | fcntl.DN_MULTISHOT)

while True:
    time.sleep(10000)

回答 5

查看pyinotify

inotify在较新的linux中替换了dnotify(来自较早的答案),并允许文件级而不是目录级的监视。

Check out pyinotify.

inotify replaces dnotify (from an earlier answer) in newer linuxes and allows file-level rather than directory-level monitoring.


回答 6

在对Tim Golden的脚本进行了一些修改之后,我发现以下内容似乎运行良好:

import os

import win32file
import win32con

path_to_watch = "." # look at the current directory
file_to_watch = "test.txt" # look for changes to a file called test.txt

def ProcessNewData( newData ):
    print "Text added: %s"%newData

# Set up the bits we'll need for output
ACTIONS = {
  1 : "Created",
  2 : "Deleted",
  3 : "Updated",
  4 : "Renamed from something",
  5 : "Renamed to something"
}
FILE_LIST_DIRECTORY = 0x0001
hDir = win32file.CreateFile (
  path_to_watch,
  FILE_LIST_DIRECTORY,
  win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE,
  None,
  win32con.OPEN_EXISTING,
  win32con.FILE_FLAG_BACKUP_SEMANTICS,
  None
)

# Open the file we're interested in
a = open(file_to_watch, "r")

# Throw away any exising log data
a.read()

# Wait for new data and call ProcessNewData for each new chunk that's written
while 1:
  # Wait for a change to occur
  results = win32file.ReadDirectoryChangesW (
    hDir,
    1024,
    False,
    win32con.FILE_NOTIFY_CHANGE_LAST_WRITE,
    None,
    None
  )

  # For each change, check to see if it's updating the file we're interested in
  for action, file in results:
    full_filename = os.path.join (path_to_watch, file)
    #print file, ACTIONS.get (action, "Unknown")
    if file == file_to_watch:
        newText = a.read()
        if newText != "":
            ProcessNewData( newText )

它可能可以处理更多的错误检查,但是对于简单地查看日志文件并在将其吐到屏幕上之前对其进行一些处理,这很好。

感谢大家的投入-很棒的东西!

Well after a bit of hacking of Tim Golden’s script, I have the following which seems to work quite well:

import os

import win32file
import win32con

path_to_watch = "." # look at the current directory
file_to_watch = "test.txt" # look for changes to a file called test.txt

def ProcessNewData( newData ):
    print "Text added: %s"%newData

# Set up the bits we'll need for output
ACTIONS = {
  1 : "Created",
  2 : "Deleted",
  3 : "Updated",
  4 : "Renamed from something",
  5 : "Renamed to something"
}
FILE_LIST_DIRECTORY = 0x0001
hDir = win32file.CreateFile (
  path_to_watch,
  FILE_LIST_DIRECTORY,
  win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE,
  None,
  win32con.OPEN_EXISTING,
  win32con.FILE_FLAG_BACKUP_SEMANTICS,
  None
)

# Open the file we're interested in
a = open(file_to_watch, "r")

# Throw away any exising log data
a.read()

# Wait for new data and call ProcessNewData for each new chunk that's written
while 1:
  # Wait for a change to occur
  results = win32file.ReadDirectoryChangesW (
    hDir,
    1024,
    False,
    win32con.FILE_NOTIFY_CHANGE_LAST_WRITE,
    None,
    None
  )

  # For each change, check to see if it's updating the file we're interested in
  for action, file in results:
    full_filename = os.path.join (path_to_watch, file)
    #print file, ACTIONS.get (action, "Unknown")
    if file == file_to_watch:
        newText = a.read()
        if newText != "":
            ProcessNewData( newText )

It could probably do with a load more error checking, but for simply watching a log file and doing some processing on it before spitting it out to the screen, this works well.

Thanks everyone for your input – great stuff!


回答 7

为了观看具有轮询和最小依赖性的单个文件,下面是一个基于Deestan(上图)的答案的充实示例:

import os
import sys 
import time

class Watcher(object):
    running = True
    refresh_delay_secs = 1

    # Constructor
    def __init__(self, watch_file, call_func_on_change=None, *args, **kwargs):
        self._cached_stamp = 0
        self.filename = watch_file
        self.call_func_on_change = call_func_on_change
        self.args = args
        self.kwargs = kwargs

    # Look for changes
    def look(self):
        stamp = os.stat(self.filename).st_mtime
        if stamp != self._cached_stamp:
            self._cached_stamp = stamp
            # File has changed, so do something...
            print('File changed')
            if self.call_func_on_change is not None:
                self.call_func_on_change(*self.args, **self.kwargs)

    # Keep watching in a loop        
    def watch(self):
        while self.running: 
            try: 
                # Look for changes
                time.sleep(self.refresh_delay_secs) 
                self.look() 
            except KeyboardInterrupt: 
                print('\nDone') 
                break 
            except FileNotFoundError:
                # Action on file not found
                pass
            except: 
                print('Unhandled error: %s' % sys.exc_info()[0])

# Call this function each time a change happens
def custom_action(text):
    print(text)

watch_file = 'my_file.txt'

# watcher = Watcher(watch_file)  # simple
watcher = Watcher(watch_file, custom_action, text='yes, changed')  # also call custom action function
watcher.watch()  # start the watch going

For watching a single file with polling, and minimal dependencies, here is a fully fleshed-out example, based on answer from Deestan (above):

import os
import sys 
import time

class Watcher(object):
    running = True
    refresh_delay_secs = 1

    # Constructor
    def __init__(self, watch_file, call_func_on_change=None, *args, **kwargs):
        self._cached_stamp = 0
        self.filename = watch_file
        self.call_func_on_change = call_func_on_change
        self.args = args
        self.kwargs = kwargs

    # Look for changes
    def look(self):
        stamp = os.stat(self.filename).st_mtime
        if stamp != self._cached_stamp:
            self._cached_stamp = stamp
            # File has changed, so do something...
            print('File changed')
            if self.call_func_on_change is not None:
                self.call_func_on_change(*self.args, **self.kwargs)

    # Keep watching in a loop        
    def watch(self):
        while self.running: 
            try: 
                # Look for changes
                time.sleep(self.refresh_delay_secs) 
                self.look() 
            except KeyboardInterrupt: 
                print('\nDone') 
                break 
            except FileNotFoundError:
                # Action on file not found
                pass
            except: 
                print('Unhandled error: %s' % sys.exc_info()[0])

# Call this function each time a change happens
def custom_action(text):
    print(text)

watch_file = 'my_file.txt'

# watcher = Watcher(watch_file)  # simple
watcher = Watcher(watch_file, custom_action, text='yes, changed')  # also call custom action function
watcher.watch()  # start the watch going

回答 8

检查类似问题的回答。您可以在Python中尝试相同的循环。该页面建议:

import time

while 1:
    where = file.tell()
    line = file.readline()
    if not line:
        time.sleep(1)
        file.seek(where)
    else:
        print line, # already has newline

另请参见问题tail()使用Python的文件

Check my answer to a similar question. You could try the same loop in Python. This page suggests:

import time

while 1:
    where = file.tell()
    line = file.readline()
    if not line:
        time.sleep(1)
        file.seek(where)
    else:
        print line, # already has newline

Also see the question tail() a file with Python.


回答 9

对我来说,最简单的解决方案是使用看门狗工具watchmedo

现在从https://pypi.python.org/pypi/watchdog获得一个进程,该进程可以查找目录中的sql文件,并在必要时执行它们。

watchmedo shell-command \
--patterns="*.sql" \
--recursive \
--command='~/Desktop/load_files_into_mysql_database.sh' \
.

Simplest solution for me is using watchdog’s tool watchmedo

From https://pypi.python.org/pypi/watchdog I now have a process that looks up the sql files in a directory and executes them if necessary.

watchmedo shell-command \
--patterns="*.sql" \
--recursive \
--command='~/Desktop/load_files_into_mysql_database.sh' \
.

回答 10

好吧,由于您使用的是Python,因此您可以打开一个文件并继续读取其中的行。

f = open('file.log')

如果读取的行不为空,则对其进行处理。

line = f.readline()
if line:
    // Do what you want with the line

您可能会错过继续拨打readlineEOF的权限。在这种情况下,它将仅返回一个空字符串。并且,将某些内容添加到日志文件后,将根据需要从停止的位置继续读取。

如果您正在寻找使用事件或特定库的解决方案,请在问题中进行指定。否则,我认为这种解决方案就可以了。

Well, since you are using Python, you can just open a file and keep reading lines from it.

f = open('file.log')

If the line read is not empty, you process it.

line = f.readline()
if line:
    // Do what you want with the line

You may be missing that it is ok to keep calling readline at the EOF. It will just keep returning an empty string in this case. And when something is appended to the log file, the reading will continue from where it stopped, as you need.

If you are looking for a solution that uses events, or a particular library, please specify this in your question. Otherwise, I think this solution is just fine.


回答 11

这是Kender代码的简化版本,看起来可以起到相同的作用,并且不会导入整个文件:

# Check file for new data.

import time

f = open(r'c:\temp\test.txt', 'r')

while True:

    line = f.readline()
    if not line:
        time.sleep(1)
        print 'Nothing New'
    else:
        print 'Call Function: ', line

Here is a simplified version of Kender’s code that appears to do the same trick and does not import the entire file:

# Check file for new data.

import time

f = open(r'c:\temp\test.txt', 'r')

while True:

    line = f.readline()
    if not line:
        time.sleep(1)
        print 'Nothing New'
    else:
        print 'Call Function: ', line

回答 12

这是Tim Goldan脚本的另一种修改,该脚本可在unix类型上运行,并通过使用dict(file => time)添加了一个简单的文件修改监视程序。

用法:whateverName.py path_to_dir_to_watch

#!/usr/bin/env python

import os, sys, time

def files_to_timestamp(path):
    files = [os.path.join(path, f) for f in os.listdir(path)]
    return dict ([(f, os.path.getmtime(f)) for f in files])

if __name__ == "__main__":

    path_to_watch = sys.argv[1]
    print('Watching {}..'.format(path_to_watch))

    before = files_to_timestamp(path_to_watch)

    while 1:
        time.sleep (2)
        after = files_to_timestamp(path_to_watch)

        added = [f for f in after.keys() if not f in before.keys()]
        removed = [f for f in before.keys() if not f in after.keys()]
        modified = []

        for f in before.keys():
            if not f in removed:
                if os.path.getmtime(f) != before.get(f):
                    modified.append(f)

        if added: print('Added: {}'.format(', '.join(added)))
        if removed: print('Removed: {}'.format(', '.join(removed)))
        if modified: print('Modified: {}'.format(', '.join(modified)))

        before = after

This is another modification of Tim Goldan’s script that runs on unix types and adds a simple watcher for file modification by using a dict (file=>time).

usage: whateverName.py path_to_dir_to_watch

#!/usr/bin/env python

import os, sys, time

def files_to_timestamp(path):
    files = [os.path.join(path, f) for f in os.listdir(path)]
    return dict ([(f, os.path.getmtime(f)) for f in files])

if __name__ == "__main__":

    path_to_watch = sys.argv[1]
    print('Watching {}..'.format(path_to_watch))

    before = files_to_timestamp(path_to_watch)

    while 1:
        time.sleep (2)
        after = files_to_timestamp(path_to_watch)

        added = [f for f in after.keys() if not f in before.keys()]
        removed = [f for f in before.keys() if not f in after.keys()]
        modified = []

        for f in before.keys():
            if not f in removed:
                if os.path.getmtime(f) != before.get(f):
                    modified.append(f)

        if added: print('Added: {}'.format(', '.join(added)))
        if removed: print('Removed: {}'.format(', '.join(removed)))
        if modified: print('Modified: {}'.format(', '.join(modified)))

        before = after

回答 13

正如您在由Horst Gutmann指出的Tim Golden的文章中所看到,WIN32比较复杂,并且监视目录,而不是单个文件。

我想建议您研究IronPython,它是一个.NET python实现。借助IronPython,您可以使用所有.NET功能-包括

System.IO.FileSystemWatcher

使用简单的事件接口处理单个文件。

As you can see in Tim Golden’s article, pointed by Horst Gutmann, WIN32 is relatively complex and watches directories, not a single file.

I’d like to suggest you look into IronPython, which is a .NET python implementation. With IronPython you can use all the .NET functionality – including

System.IO.FileSystemWatcher

Which handles single files with a simple Event interface.


回答 14

这是检查文件更改的示例。这样做可能不是最好的方法,但是肯定是很短的方法。

对源进行更改后重新启动应用程序的便捷工具。我在使用pygame玩游戏时做到了这一点,因此我可以看到文件保存后立即发生了效果。

在pygame中使用时,请确保“ while”循环中的内容已放置在您的游戏循环中,也称为update或其他内容。否则,您的应用程序将陷入无限循环,并且您将看不到游戏更新。

file_size_stored = os.stat('neuron.py').st_size

  while True:
    try:
      file_size_current = os.stat('neuron.py').st_size
      if file_size_stored != file_size_current:
        restart_program()
    except: 
      pass

如果您想要我在网上找到的重启代码。这里是。(与问题无关,尽管可以派上用场)

def restart_program(): #restart application
    python = sys.executable
    os.execl(python, python, * sys.argv)

让电子做您想做的事情变得有趣。

This is an example of checking a file for changes. One that may not be the best way of doing it, but it sure is a short way.

Handy tool for restarting application when changes have been made to the source. I made this when playing with pygame so I can see effects take place immediately after file save.

When used in pygame make sure the stuff in the ‘while’ loop is placed in your game loop aka update or whatever. Otherwise your application will get stuck in an infinite loop and you will not see your game updating.

file_size_stored = os.stat('neuron.py').st_size

  while True:
    try:
      file_size_current = os.stat('neuron.py').st_size
      if file_size_stored != file_size_current:
        restart_program()
    except: 
      pass

In case you wanted the restart code which I found on the web. Here it is. (Not relevant to the question, though it could come in handy)

def restart_program(): #restart application
    python = sys.executable
    os.execl(python, python, * sys.argv)

Have fun making electrons do what you want them to do.


回答 15

ACTIONS = {
  1 : "Created",
  2 : "Deleted",
  3 : "Updated",
  4 : "Renamed from something",
  5 : "Renamed to something"
}
FILE_LIST_DIRECTORY = 0x0001

class myThread (threading.Thread):
    def __init__(self, threadID, fileName, directory, origin):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.fileName = fileName
        self.daemon = True
        self.dir = directory
        self.originalFile = origin
    def run(self):
        startMonitor(self.fileName, self.dir, self.originalFile)

def startMonitor(fileMonitoring,dirPath,originalFile):
    hDir = win32file.CreateFile (
        dirPath,
        FILE_LIST_DIRECTORY,
        win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE,
        None,
        win32con.OPEN_EXISTING,
        win32con.FILE_FLAG_BACKUP_SEMANTICS,
        None
    )
    # Wait for new data and call ProcessNewData for each new chunk that's
    # written
    while 1:
        # Wait for a change to occur
        results = win32file.ReadDirectoryChangesW (
            hDir,
            1024,
            False,
            win32con.FILE_NOTIFY_CHANGE_LAST_WRITE,
            None,
            None
        )
        # For each change, check to see if it's updating the file we're
        # interested in
        for action, file_M in results:
            full_filename = os.path.join (dirPath, file_M)
            #print file, ACTIONS.get (action, "Unknown")
            if len(full_filename) == len(fileMonitoring) and action == 3:
                #copy to main file
                ...
ACTIONS = {
  1 : "Created",
  2 : "Deleted",
  3 : "Updated",
  4 : "Renamed from something",
  5 : "Renamed to something"
}
FILE_LIST_DIRECTORY = 0x0001

class myThread (threading.Thread):
    def __init__(self, threadID, fileName, directory, origin):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.fileName = fileName
        self.daemon = True
        self.dir = directory
        self.originalFile = origin
    def run(self):
        startMonitor(self.fileName, self.dir, self.originalFile)

def startMonitor(fileMonitoring,dirPath,originalFile):
    hDir = win32file.CreateFile (
        dirPath,
        FILE_LIST_DIRECTORY,
        win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE,
        None,
        win32con.OPEN_EXISTING,
        win32con.FILE_FLAG_BACKUP_SEMANTICS,
        None
    )
    # Wait for new data and call ProcessNewData for each new chunk that's
    # written
    while 1:
        # Wait for a change to occur
        results = win32file.ReadDirectoryChangesW (
            hDir,
            1024,
            False,
            win32con.FILE_NOTIFY_CHANGE_LAST_WRITE,
            None,
            None
        )
        # For each change, check to see if it's updating the file we're
        # interested in
        for action, file_M in results:
            full_filename = os.path.join (dirPath, file_M)
            #print file, ACTIONS.get (action, "Unknown")
            if len(full_filename) == len(fileMonitoring) and action == 3:
                #copy to main file
                ...

回答 16

这是一个示例,用于观察每秒写入不超过一行但通常少得多的输入文件。目标是将最后一行(最新写入)追加到指定的输出文件。我已从我的一个项目中复制了此内容,并删除了所有不相关的行。您必须填写或更改缺少的符号。

from PyQt5.QtCore import QFileSystemWatcher, QSettings, QThread
from ui_main_window import Ui_MainWindow   # Qt Creator gen'd 

class MainWindow(QMainWindow, Ui_MainWindow):
    def __init__(self, parent=None):
        QMainWindow.__init__(self, parent)
        Ui_MainWindow.__init__(self)
        self._fileWatcher = QFileSystemWatcher()
        self._fileWatcher.fileChanged.connect(self.fileChanged)

    def fileChanged(self, filepath):
        QThread.msleep(300)    # Reqd on some machines, give chance for write to complete
        # ^^ About to test this, may need more sophisticated solution
        with open(filepath) as file:
            lastLine = list(file)[-1]
        destPath = self._filemap[filepath]['dest file']
        with open(destPath, 'a') as out_file:               # a= append
            out_file.writelines([lastLine])

当然,并不是严格要求包含QMainWindow类。您可以单独使用QFileSystemWatcher。

Here’s an example geared toward watching input files that write no more than one line per second but usually a lot less. The goal is to append the last line (most recent write) to the specified output file. I’ve copied this from one of my projects and just deleted all the irrelevant lines. You’ll have to fill in or change the missing symbols.

from PyQt5.QtCore import QFileSystemWatcher, QSettings, QThread
from ui_main_window import Ui_MainWindow   # Qt Creator gen'd 

class MainWindow(QMainWindow, Ui_MainWindow):
    def __init__(self, parent=None):
        QMainWindow.__init__(self, parent)
        Ui_MainWindow.__init__(self)
        self._fileWatcher = QFileSystemWatcher()
        self._fileWatcher.fileChanged.connect(self.fileChanged)

    def fileChanged(self, filepath):
        QThread.msleep(300)    # Reqd on some machines, give chance for write to complete
        # ^^ About to test this, may need more sophisticated solution
        with open(filepath) as file:
            lastLine = list(file)[-1]
        destPath = self._filemap[filepath]['dest file']
        with open(destPath, 'a') as out_file:               # a= append
            out_file.writelines([lastLine])

Of course, the encompassing QMainWindow class is not strictly required, ie. you can use QFileSystemWatcher alone.


回答 17

您还可以使用一个名为repyt的简单库,这是一个示例:

repyt ./app.py

You can also use a simple library called repyt, here is an example:

repyt ./app.py

回答 18

似乎没有人发布fswatch。它是一个跨平台的文件系统监视程序。只需安装,运行并按照提示进行操作即可。

我已经将它与python和golang程序一起使用,并且可以正常工作。

Seems that no one has posted fswatch. It is a cross-platform file system watcher. Just install it, run it and follow the prompts.

I’ve used it with python and golang programs and it just works.


回答 19

相关的@ 4Oh4解决方案可以平滑地更改要查看的文件列表;

import os
import sys
import time

class Watcher(object):
    running = True
    refresh_delay_secs = 1

    # Constructor
    def __init__(self, watch_files, call_func_on_change=None, *args, **kwargs):
        self._cached_stamp = 0
        self._cached_stamp_files = {}
        self.filenames = watch_files
        self.call_func_on_change = call_func_on_change
        self.args = args
        self.kwargs = kwargs

    # Look for changes
    def look(self):
        for file in self.filenames:
            stamp = os.stat(file).st_mtime
            if not file in self._cached_stamp_files:
                self._cached_stamp_files[file] = 0
            if stamp != self._cached_stamp_files[file]:
                self._cached_stamp_files[file] = stamp
                # File has changed, so do something...
                file_to_read = open(file, 'r')
                value = file_to_read.read()
                print("value from file", value)
                file_to_read.seek(0)
                if self.call_func_on_change is not None:
                    self.call_func_on_change(*self.args, **self.kwargs)

    # Keep watching in a loop
    def watch(self):
        while self.running:
            try:
                # Look for changes
                time.sleep(self.refresh_delay_secs)
                self.look()
            except KeyboardInterrupt:
                print('\nDone')
                break
            except FileNotFoundError:
                # Action on file not found
                pass
            except Exception as e:
                print(e)
                print('Unhandled error: %s' % sys.exc_info()[0])

# Call this function each time a change happens
def custom_action(text):
    print(text)
    # pass

watch_files = ['/Users/mexekanez/my_file.txt', '/Users/mexekanez/my_file1.txt']

# watcher = Watcher(watch_file)  # simple



if __name__ == "__main__":
    watcher = Watcher(watch_files, custom_action, text='yes, changed')  # also call custom action function
    watcher.watch()  # start the watch going

related @4Oh4 solution a smooth change for a list of files to watch;

import os
import sys
import time

class Watcher(object):
    running = True
    refresh_delay_secs = 1

    # Constructor
    def __init__(self, watch_files, call_func_on_change=None, *args, **kwargs):
        self._cached_stamp = 0
        self._cached_stamp_files = {}
        self.filenames = watch_files
        self.call_func_on_change = call_func_on_change
        self.args = args
        self.kwargs = kwargs

    # Look for changes
    def look(self):
        for file in self.filenames:
            stamp = os.stat(file).st_mtime
            if not file in self._cached_stamp_files:
                self._cached_stamp_files[file] = 0
            if stamp != self._cached_stamp_files[file]:
                self._cached_stamp_files[file] = stamp
                # File has changed, so do something...
                file_to_read = open(file, 'r')
                value = file_to_read.read()
                print("value from file", value)
                file_to_read.seek(0)
                if self.call_func_on_change is not None:
                    self.call_func_on_change(*self.args, **self.kwargs)

    # Keep watching in a loop
    def watch(self):
        while self.running:
            try:
                # Look for changes
                time.sleep(self.refresh_delay_secs)
                self.look()
            except KeyboardInterrupt:
                print('\nDone')
                break
            except FileNotFoundError:
                # Action on file not found
                pass
            except Exception as e:
                print(e)
                print('Unhandled error: %s' % sys.exc_info()[0])

# Call this function each time a change happens
def custom_action(text):
    print(text)
    # pass

watch_files = ['/Users/mexekanez/my_file.txt', '/Users/mexekanez/my_file1.txt']

# watcher = Watcher(watch_file)  # simple



if __name__ == "__main__":
    watcher = Watcher(watch_files, custom_action, text='yes, changed')  # also call custom action function
    watcher.watch()  # start the watch going

回答 20

最好和最简单的解决方案是使用pygtail:https ://pypi.python.org/pypi/pygtail

from pygtail import Pygtail
import sys

while True:
    for line in Pygtail("some.log"):
        sys.stdout.write(line)

The best and simplest solution is to use pygtail: https://pypi.python.org/pypi/pygtail

from pygtail import Pygtail
import sys

while True:
    for line in Pygtail("some.log"):
        sys.stdout.write(line)

回答 21

我不知道Windows的任何特定功能。您可以尝试每秒钟/分钟/小时获取文件的MD5哈希值(取决于您需要的速度),并将其与最后一个哈希值进行比较。如果不同,您就知道文件已更改,并读出了最新的行。

I don’t know any Windows specific function. You could try getting the MD5 hash of the file every second/minute/hour (depends on how fast you need it) and compare it to the last hash. When it differs you know the file has been changed and you read out the newest lines.


回答 22

我会尝试这样的事情。

    try:
            f = open(filePath)
    except IOError:
            print "No such file: %s" % filePath
            raw_input("Press Enter to close window")
    try:
            lines = f.readlines()
            while True:
                    line = f.readline()
                    try:
                            if not line:
                                    time.sleep(1)
                            else:
                                    functionThatAnalisesTheLine(line)
                    except Exception, e:
                            # handle the exception somehow (for example, log the trace) and raise the same exception again
                            raw_input("Press Enter to close window")
                            raise e
    finally:
            f.close()

循环检查自上次读取文件以来是否存在新行-如果存在,则将其读取并传递给functionThatAnalisesTheLine函数。如果不是,脚本将等待1秒钟,然后重试该过程。

I’d try something like this.

    try:
            f = open(filePath)
    except IOError:
            print "No such file: %s" % filePath
            raw_input("Press Enter to close window")
    try:
            lines = f.readlines()
            while True:
                    line = f.readline()
                    try:
                            if not line:
                                    time.sleep(1)
                            else:
                                    functionThatAnalisesTheLine(line)
                    except Exception, e:
                            # handle the exception somehow (for example, log the trace) and raise the same exception again
                            raw_input("Press Enter to close window")
                            raise e
    finally:
            f.close()

The loop checks if there is a new line(s) since last time file was read – if there is, it’s read and passed to the functionThatAnalisesTheLine function. If not, script waits 1 second and retries the process.


如何使用matplotlib.pyplot更改图例大小

问题:如何使用matplotlib.pyplot更改图例大小

这里有一个简单的问题:我试图使用matplotlib.pyplot较小的图例(即,文本较小)。我正在使用的代码是这样的:

plot.figure()
plot.scatter(k, sum_cf, color='black', label='Sum of Cause Fractions')
plot.scatter(k, data[:, 0],  color='b', label='Dis 1: cf = .6, var = .2')
plot.scatter(k, data[:, 1],  color='r',  label='Dis 2: cf = .2, var = .1')
plot.scatter(k, data[:, 2],  color='g', label='Dis 3: cf = .1, var = .01')
plot.legend(loc=2)

Simple question here: I’m trying to get the size of my legend using matplotlib.pyplot to be smaller (i.e., the text to be smaller). The code I’m using goes something like this:

plot.figure()
plot.scatter(k, sum_cf, color='black', label='Sum of Cause Fractions')
plot.scatter(k, data[:, 0],  color='b', label='Dis 1: cf = .6, var = .2')
plot.scatter(k, data[:, 1],  color='r',  label='Dis 2: cf = .2, var = .1')
plot.scatter(k, data[:, 2],  color='g', label='Dis 3: cf = .1, var = .01')
plot.legend(loc=2)

回答 0

您可以通过调整prop关键字为图例设置单独的字体大小。

plot.legend(loc=2, prop={'size': 6})

这需要对应于matplotlib.font_manager.FontProperties属性的关键字字典。请参阅说明文件的文档

关键字参数:

prop: [ None | FontProperties | dict ]
    A matplotlib.font_manager.FontProperties instance. If prop is a 
    dictionary, a new instance will be created with prop. If None, use
    rc settings.

1.2.1版开始,也可以使用关键字fontsize

You can set an individual font size for the legend by adjusting the prop keyword.

plot.legend(loc=2, prop={'size': 6})

This takes a dictionary of keywords corresponding to matplotlib.font_manager.FontProperties properties. See the documentation for legend:

Keyword arguments:

prop: [ None | FontProperties | dict ]
    A matplotlib.font_manager.FontProperties instance. If prop is a 
    dictionary, a new instance will be created with prop. If None, use
    rc settings.

It is also possible, as of version 1.2.1, to use the keyword fontsize.


回答 1

这应该做

import pylab as plot
params = {'legend.fontsize': 20,
          'legend.handlelength': 2}
plot.rcParams.update(params)

然后再做图。

还有很多其他rcParam,它们也可以在matplotlibrc文件中设置。

大概还可以通过matplotlib.font_manager.FontProperties实例更改它,但是我不知道该怎么做。->请参阅Yann的答案。

This should do

import pylab as plot
params = {'legend.fontsize': 20,
          'legend.handlelength': 2}
plot.rcParams.update(params)

Then do the plot afterwards.

There are a ton of other rcParams, they can also be set in the matplotlibrc file.

Also presumably you can change it passing a matplotlib.font_manager.FontProperties instance but this I don’t know how to do. –> see Yann’s answer.


回答 2

使用 import matplotlib.pyplot as plt

方法1:调用图例时指定字体大小(重复)

plt.legend(fontsize=20) # using a size in points
plt.legend(fontsize="x-large") # using a named size

使用此方法,您可以在创建时为每个图例设置字体大小(允许您拥有多个具有不同字体大小的图例)。但是,每次创建图例时,都必须手动键入所有内容。

(注意:@ Mathias711在他的答案中列出了可用的命名字体大小)

方法2:在rcParams中指定字体大小(方便)

plt.rc('legend',fontsize=20) # using a size in points
plt.rc('legend',fontsize='medium') # using a named size

使用此方法,您可以设置默认的图例字体大小,除非使用方法1另行指定,否则所有图例将自动使用该字体。这意味着您可以在代码开头设置图例字体大小,而不必担心为每个图例设置它。

如果你使用了一个名为大小例如'medium',那么传说中的文本将与全球规模font.sizercParams。改变font.size用途plt.rc(font.size='medium')

using import matplotlib.pyplot as plt

Method 1: specify the fontsize when calling legend (repetitive)

plt.legend(fontsize=20) # using a size in points
plt.legend(fontsize="x-large") # using a named size

With this method you can set the fontsize for each legend at creation (allowing you to have multiple legends with different fontsizes). However, you will have to type everything manually each time you create a legend.

(Note: @Mathias711 listed the available named fontsizes in his answer)

Method 2: specify the fontsize in rcParams (convenient)

plt.rc('legend',fontsize=20) # using a size in points
plt.rc('legend',fontsize='medium') # using a named size

With this method you set the default legend fontsize, and all legends will automatically use that unless you specify otherwise using method 1. This means you can set your legend fontsize at the beginning of your code, and not worry about setting it for each individual legend.

If you use a named size e.g. 'medium', then the legend text will scale with the global font.size in rcParams. To change font.size use plt.rc(font.size='medium')


回答 3

除了点的大小,还有一些命名的fontsizes

xx-small
x-small
small
medium
large
x-large
xx-large

用法:

pyplot.legend(loc=2, fontsize = 'x-small')

There are also a few named fontsizes, apart from the size in points:

xx-small
x-small
small
medium
large
x-large
xx-large

Usage:

pyplot.legend(loc=2, fontsize = 'x-small')

回答 4

有多种设置可用于调整图例大小。我发现最有用的两个是:

  • labelspacing:以字体大小的倍数设置标签条目之间的间距。例如使用10磅字体,legend(..., labelspacing=0.2)会将条目之间的间距减少到2点。我安装的默认值约为0.5。
  • prop:可以完全控制字体大小等。您可以使用设置8点字体legend(..., prop={'size':8})。我安装的默认值约为14点。

此外,图例的文档列出了许多其他填充的和间隔的参数,包括:borderpadhandlelengthhandletextpadborderaxespad,和columnspacing。这些都遵循相同的格式,与labelspacing和area相同,也是fontsize的倍数。

也可以使用matplotlibrc文件将这些值设置为所有图形的默认值。

There are multiple settings for adjusting the legend size. The two I find most useful are:

  • labelspacing: which sets the spacing between label entries in multiples of the font size. For instance with a 10 point font, legend(..., labelspacing=0.2) will reduce the spacing between entries to 2 points. The default on my install is about 0.5.
  • prop: which allows full control of the font size, etc. You can set an 8 point font using legend(..., prop={'size':8}). The default on my install is about 14 points.

In addition, the legend documentation lists a number of other padding and spacing parameters including: borderpad, handlelength, handletextpad, borderaxespad, and columnspacing. These all follow the same form as labelspacing and area also in multiples of fontsize.

These values can also be set as the defaults for all figures using the matplotlibrc file.


回答 5

在我的安装中,FontProperties仅更改文本大小,但它仍然太大且间隔开。我在pyplot.rcParams:中找到了一个参数legend.labelspacing,我猜它被设置为字体大小的一小部分。我已经改变了

pyplot.rcParams.update({'legend.labelspacing':0.25})

我不确定如何将其指定给pyplot.legend函数-传递

prop={'labelspacing':0.25}

要么

prop={'legend.labelspacing':0.25}

返回错误。

On my install, FontProperties only changes the text size, but it’s still too large and spaced out. I found a parameter in pyplot.rcParams: legend.labelspacing, which I’m guessing is set to a fraction of the font size. I’ve changed it with

pyplot.rcParams.update({'legend.labelspacing':0.25})

I’m not sure how to specify it to the pyplot.legend function – passing

prop={'labelspacing':0.25}

or

prop={'legend.labelspacing':0.25}

comes back with an error.


回答 6

plot.legend(loc =’右下角’,decimal_places = 2,fontsize =’11’,title =’嘿’,title_fontsize =’20’)

plot.legend(loc = ‘lower right’, decimal_places = 2, fontsize = ’11’, title = ‘Hey there’, title_fontsize = ’20’)


将Python字典转换为kwargs?

问题:将Python字典转换为kwargs?

我想使用类继承构建一个针对sunburnt(solr interface)的查询,因此将键-值对加在一起。sunburnt接口带有关键字参数。如何将字典({'type':'Event'})转换为关键字参数(type='Event')

I want to build a query for sunburnt(solr interface) using class inheritance and therefore adding key – value pairs together. The sunburnt interface takes keyword arguments. How can I transform a dict ({'type':'Event'}) into keyword arguments (type='Event')?


回答 0

使用双星运算符(又名double-splat?):

func(**{'type':'Event'})

相当于

func(type='Event')

Use the double-star (aka double-splat?) operator:

func(**{'type':'Event'})

is equivalent to

func(type='Event')

回答 1

** 操作员在这里会有所帮助。

**操作员将解开dict元素的包装,因此**{'type':'Event'}将被视为type='Event'

func(**{'type':'Event'}) 与…相同 func(type='Event') dict元素将转换为相同keyword arguments

费耶

* 将解压缩列表元素,它们将被视为 positional arguments

func(*['one', 'two']) 与…相同 func('one', 'two')

** operator would be helpful here.

** operator will unpack the dict elements and thus **{'type':'Event'} would be treated as type='Event'

func(**{'type':'Event'}) is same as func(type='Event') i.e the dict elements would be converted to the keyword arguments.

FYI

* will unpack the list elements and they would be treated as positional arguments.

func(*['one', 'two']) is same as func('one', 'two')


回答 2

这是一个完整的示例,显示了如何使用**运算符将字典中的值作为关键字参数传递。

>>> def f(x=2):
...     print(x)
... 
>>> new_x = {'x': 4}
>>> f()        #    default value x=2
2
>>> f(x=3)     #   explicit value x=3
3
>>> f(**new_x) # dictionary value x=4 
4

Here is a complete example showing how to use the ** operator to pass values from a dictionary as keyword arguments.

>>> def f(x=2):
...     print(x)
... 
>>> new_x = {'x': 4}
>>> f()        #    default value x=2
2
>>> f(x=3)     #   explicit value x=3
3
>>> f(**new_x) # dictionary value x=4 
4

pip在哪里安装其软件包?

问题:pip在哪里安装其软件包?

我激活了已安装pip的virtualenv。我做了

pip3 install Django==1.8

和Django成功下载。现在,我想打开Django文件夹。文件夹在哪里?通常它会在“下载”中,但是我不确定如果在virtualenv中使用pip安装它会在哪里。

I activated a virtualenv which has pip installed. I did

pip3 install Django==1.8

and Django successfully downloaded. Now, I want to open up the Django folder. Where is the folder located? Normally it would be in “downloads” but I’m not sure where it would be if I installed it using pip in a virtualenv.


回答 0

virtualenv一起使用时,pip通常会在路径中安装软件包<virtualenv_name>/lib/<python_ver>/site-packages

例如,我使用Python 2.7 创建了一个名为venv_test的测试virtualenv ,该文件夹位于中。djangovenv_test/lib/python2.7/site-packages/django

pip when used with virtualenv will generally install packages in the path <virtualenv_name>/lib/<python_ver>/site-packages.

For example, I created a test virtualenv named venv_test with Python 2.7, and the django folder is in venv_test/lib/python2.7/site-packages/django.


回答 1

根据大众需求,通过发布的答案提供了一个选项:

pip show <package name>将提供Windows和macOS的位置,我猜是任何系统。:)

例如:

> pip show cvxopt
Name: cvxopt
Version: 1.2.0
...
Location: /usr/local/lib/python2.7/site-packages

By popular demand, an option provided via posted answer:

pip show <package name> will provide the location for Windows and macOS, and I’m guessing any system. :)

For example:

> pip show cvxopt
Name: cvxopt
Version: 1.2.0
...
Location: /usr/local/lib/python2.7/site-packages

回答 2

pip list -v可用于列出软件包的安装位置,该位置在https://pip.pypa.io/zh/stable/news/#b1-2018-03-31中引入

当列表命令与“ -v”选项一起运行时,显示安装位置。(#979)

>pip list -v
Package                  Version   Location                                                             Installer
------------------------ --------- -------------------------------------------------------------------- ---------
alabaster                0.7.12    c:\users\me\appdata\local\programs\python\python38\lib\site-packages pip
apipkg                   1.5       c:\users\me\appdata\local\programs\python\python38\lib\site-packages pip
argcomplete              1.10.3    c:\users\me\appdata\local\programs\python\python38\lib\site-packages pip
astroid                  2.3.3     c:\users\me\appdata\local\programs\python\python38\lib\site-packages pip
...

更新pip10.0.0b1中引入了此功能。在Ubuntu 18.04上,pippip3安装有sudo apt install python-pip或是sudo apt install python3-pip9.0.1的版本,但没有此功能。检查https://github.com/pypa/pip/issues/5599,了解升级pip或升级的合适方法pip3

pip list -v can be used to list packages’ install locations, introduced in https://pip.pypa.io/en/stable/news/#b1-2018-03-31

Show install locations when list command ran with “-v” option. (#979)

>pip list -v
Package                  Version   Location                                                             Installer
------------------------ --------- -------------------------------------------------------------------- ---------
alabaster                0.7.12    c:\users\me\appdata\local\programs\python\python38\lib\site-packages pip
apipkg                   1.5       c:\users\me\appdata\local\programs\python\python38\lib\site-packages pip
argcomplete              1.10.3    c:\users\me\appdata\local\programs\python\python38\lib\site-packages pip
astroid                  2.3.3     c:\users\me\appdata\local\programs\python\python38\lib\site-packages pip
...

Update: This feature is introduced in pip 10.0.0b1. On Ubuntu 18.04, pip or pip3 installed with sudo apt install python-pip or sudo apt install python3-pip is 9.0.1 which doesn’t have this feature. Check https://github.com/pypa/pip/issues/5599 for suitable ways of upgrading pip or pip3.


回答 3

默认情况下,在Linux上,Pip将软件包安装到/usr/local/lib/python2.7/dist-packages。

在安装过程中使用virtualenv或–user将更改此默认位置。如果使用,请pip show确保使用的用户正确,否则pip可能看不到您所引用的软件包。

By default, on Linux, Pip installs packages to /usr/local/lib/python2.7/dist-packages.

Using virtualenv or –user during install will change this default location. If you use pip show make sure you are using the right user or else pip may not see the packages you are referencing.


回答 4

在Python解释器或脚本中,您可以执行

import site
site.getsitepackages() # list of global package locations

site.getusersitepackages() #string for user-specific package location

位置安装了第三方软件包(不在核心Python发行版中)。

在MacOS上我的Brew安装的Python上,前者输出

['/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages']

pip show如上一个答案所述,它规范化到所输出的相同路径:

$ readlink -f /usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages
/usr/local/lib/python3.7/site-packages

参考:https : //docs.python.org/3/library/site.html#site.getsitepackages

In a Python interpreter or script, you can do

import site
site.getsitepackages() # list of global package locations

and

site.getusersitepackages() #string for user-specific package location

for locations 3rd party packages (those not in the core Python distribution) are installed to.

On my Brew-installed Python on MacOS, the former outputs

['/usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages'],

which canonicalizes to the same path output by pip show, as mentioned in a previous answer:

$ readlink -f /usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages
/usr/local/lib/python3.7/site-packages

Reference: https://docs.python.org/3/library/site.html#site.getsitepackages


如何在Python中将浮点数格式化为固定宽度

问题:如何在Python中将浮点数格式化为固定宽度

如何按照以下要求将浮点数格式化为固定宽度:

  1. 如果n <1,则前导零
  2. 添加尾随的十进制零以填充固定宽度
  3. 截断超出固定宽度的十进制数字
  4. 对齐所有小数点

例如:

% formatter something like '{:06}'
numbers = [23.23, 0.123334987, 1, 4.223, 9887.2]

for number in numbers:
    print formatter.format(number)

输出会像

  23.2300
   0.1233
   1.0000
   4.2230
9887.2000

How do I format a floating number to a fixed width with the following requirements:

  1. Leading zero if n < 1
  2. Add trailing decimal zero(s) to fill up fixed width
  3. Truncate decimal digits past fixed width
  4. Align all decimal points

For example:

% formatter something like '{:06}'
numbers = [23.23, 0.123334987, 1, 4.223, 9887.2]

for number in numbers:
    print formatter.format(number)

The output would be like

  23.2300
   0.1233
   1.0000
   4.2230
9887.2000

回答 0

for x in numbers:
    print "{:10.4f}".format(x)

版画

   23.2300
    0.1233
    1.0000
    4.2230
 9887.2000

花括号内的格式说明符遵循Python格式字符串语法。具体来说,在这种情况下,它由以下部分组成:

  • 空字符串冒号前的手段“采取下一个提供参数format()” -在这种情况下,x作为唯一的参数。
  • 10.4f冒号之后的部分是格式规范
  • f表示定点表示法。
  • 10是该领域的总宽度被印刷,用空格lefted-填充。
  • 4是小数点后的位数。
for x in numbers:
    print "{:10.4f}".format(x)

prints

   23.2300
    0.1233
    1.0000
    4.2230
 9887.2000

The format specifier inside the curly braces follows the Python format string syntax. Specifically, in this case, it consists of the following parts:

  • The empty string before the colon means “take the next provided argument to format()” – in this case the x as the only argument.
  • The 10.4f part after the colon is the format specification.
  • The f denotes fixed-point notation.
  • The 10 is the total width of the field being printed, lefted-padded by spaces.
  • The 4 is the number of digits after the decimal point.

回答 1

自从这个答案问了已经好几年了,但是从Python 3.6(PEP498)开始,您可以使用新的f-strings

numbers = [23.23, 0.123334987, 1, 4.223, 9887.2]

for number in numbers:
    print(f'{number:9.4f}')

印刷品:

  23.2300
   0.1233
   1.0000
   4.2230
9887.2000

It has been a few years since this was answered, but as of Python 3.6 (PEP498) you could use the new f-strings:

numbers = [23.23, 0.123334987, 1, 4.223, 9887.2]

for number in numbers:
    print(f'{number:9.4f}')

Prints:

  23.2300
   0.1233
   1.0000
   4.2230
9887.2000

回答 2

在python3中,以下工作原理:

>>> v=10.4
>>> print('% 6.2f' % v)
  10.40
>>> print('% 12.1f' % v)
        10.4
>>> print('%012.1f' % v)
0000000010.4

In python3 the following works:

>>> v=10.4
>>> print('% 6.2f' % v)
  10.40
>>> print('% 12.1f' % v)
        10.4
>>> print('%012.1f' % v)
0000000010.4

回答 3

请参阅Python 3.x 格式字符串语法

IDLE 3.5.1   
numbers = ['23.23', '.1233', '1', '4.223', '9887.2']

for x in numbers:  
    print('{0: >#016.4f}'. format(float(x)))  

     23.2300
      0.1233
      1.0000
      4.2230
   9887.2000

See Python 3.x format string syntax:

IDLE 3.5.1   
numbers = ['23.23', '.1233', '1', '4.223', '9887.2']

for x in numbers:  
    print('{0: >#016.4f}'. format(float(x)))  

     23.2300
      0.1233
      1.0000
      4.2230
   9887.2000

回答 4

您也可以将零填充为零。例如,如果您number要有9个字符的长度,请用零左填充,请使用:

print('{:09.3f}'.format(number))

因此,如果为number = 4.656,则输出为:00004.656

对于您的示例,输出将如下所示:

numbers  = [23.2300, 0.1233, 1.0000, 4.2230, 9887.2000]
for x in numbers: 
    print('{:010.4f}'.format(x))

印刷品:

00023.2300
00000.1233
00001.0000
00004.2230
09887.2000

一个可能有用的示例是当您要按字母顺序正确列出文件名时。我注意到在某些linux系统中,数字是:1,10,11,.. 2,20,21,…

因此,如果要在文件名中强制执行必要的数字顺序,则需要在键盘上填充适当数量的零。

You can also left pad with zeros. For example if you want number to have 9 characters length, left padded with zeros use:

print('{:09.3f}'.format(number))

Thus, if number = 4.656, the output is: 00004.656

For your example the output will look like this:

numbers  = [23.2300, 0.1233, 1.0000, 4.2230, 9887.2000]
for x in numbers: 
    print('{:010.4f}'.format(x))

prints:

00023.2300
00000.1233
00001.0000
00004.2230
09887.2000

One example where this may be useful is when you want to properly list filenames in alphabetical order. I noticed in some linux systems, the number is: 1,10,11,..2,20,21,…

Thus if you want to enforce the necessary numeric order in filenames, you need to left pad with the appropriate number of zeros.


回答 5

在Python 3中。

GPA = 2.5
print(" %6.1f " % GPA)

6.1f点之后手段1个数字显示,如果你,你应该只点打印后2位%6.2f,使得%6.3f3位点后打印。

In Python 3.

GPA = 2.5
print(" %6.1f " % GPA)

6.1f means after the dots 1 digits show if you print 2 digits after the dots you should only %6.2f such that %6.3f 3 digits print after the point.


Python 3 ImportError:没有名为“ ConfigParser”的模块

问题:Python 3 ImportError:没有名为“ ConfigParser”的模块

我想pip installMySQL-python包,但我得到的ImportError

Jans-MacBook-Pro:~ jan$ /Library/Frameworks/Python.framework/Versions/3.3/bin/pip-3.3 install MySQL-python
Downloading/unpacking MySQL-python
  Running setup.py egg_info for package MySQL-python
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "/var/folders/lf/myf7bjr57_jg7_5c4014bh640000gn/T/pip-build/MySQL-python/setup.py", line 14, in <module>
        from setup_posix import get_config
      File "./setup_posix.py", line 2, in <module>
        from ConfigParser import SafeConfigParser
    ImportError: No module named 'ConfigParser'
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):

  File "<string>", line 16, in <module>

  File "/var/folders/lf/myf7bjr57_jg7_5c4014bh640000gn/T/pip-build/MySQL-python/setup.py", line 14, in <module>

    from setup_posix import get_config

  File "./setup_posix.py", line 2, in <module>

    from ConfigParser import SafeConfigParser

ImportError: No module named 'ConfigParser'

----------------------------------------
Command python setup.py egg_info failed with error code 1 in /var/folders/lf/myf7bjr57_jg7_5c4014bh640000gn/T/pip-build/MySQL-python
Storing complete log in /Users/jan/.pip/pip.log
Jans-MacBook-Pro:~ jan$ 

有任何想法吗?

I am trying to pip install the MySQL-python package, but I get an ImportError.

Jans-MacBook-Pro:~ jan$ /Library/Frameworks/Python.framework/Versions/3.3/bin/pip-3.3 install MySQL-python
Downloading/unpacking MySQL-python
  Running setup.py egg_info for package MySQL-python
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "/var/folders/lf/myf7bjr57_jg7_5c4014bh640000gn/T/pip-build/MySQL-python/setup.py", line 14, in <module>
        from setup_posix import get_config
      File "./setup_posix.py", line 2, in <module>
        from ConfigParser import SafeConfigParser
    ImportError: No module named 'ConfigParser'
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):

  File "<string>", line 16, in <module>

  File "/var/folders/lf/myf7bjr57_jg7_5c4014bh640000gn/T/pip-build/MySQL-python/setup.py", line 14, in <module>

    from setup_posix import get_config

  File "./setup_posix.py", line 2, in <module>

    from ConfigParser import SafeConfigParser

ImportError: No module named 'ConfigParser'

----------------------------------------
Command python setup.py egg_info failed with error code 1 in /var/folders/lf/myf7bjr57_jg7_5c4014bh640000gn/T/pip-build/MySQL-python
Storing complete log in /Users/jan/.pip/pip.log
Jans-MacBook-Pro:~ jan$ 

Any ideas?


回答 0

在Python 3中,ConfigParser已被重命名configparser为PEP 8合规性。您正在安装的软件包似乎不支持Python 3。

In Python 3, ConfigParser has been renamed to configparser for PEP 8 compliance. It looks like the package you are installing does not support Python 3.


回答 1

您可以改为使用该mysqlclient软件包作为MySQL-python的直接替代品。它是MySQL-python对Python 3的新增支持。

我很幸运

pip install mysqlclient

在我的python3.4 virtualenv之后

sudo apt-get install python3-dev libmysqlclient-dev

这显然是针对ubuntu / debian的,但我只是想分享我的成功:)

You can instead use the mysqlclient package as a drop-in replacement for MySQL-python. It is a fork of MySQL-python with added support for Python 3.

I had luck with simply

pip install mysqlclient

in my python3.4 virtualenv after

sudo apt-get install python3-dev libmysqlclient-dev

which is obviously specific to ubuntu/debian, but I just wanted to share my success :)


回答 2

这是一个在Python 2.x和3.x中均应适用的代码

显然,您将需要该six模块,但是几乎不可能编写在两个版本中都没有六个版本的模块。

try:
    import configparser
except:
    from six.moves import configparser

Here is a code that should work in both Python 2.x and 3.x

Obviously you will need the six module, but it’s almost impossible to write modules that work in both versions without six.

try:
    import configparser
except:
    from six.moves import configparser

回答 3

pip install configparser
sudo cp /usr/lib/python3.6/configparser.py /usr/lib/python3.6/ConfigParser.py

然后尝试再次安装MYSQL-python。对我有用

pip install configparser
sudo cp /usr/lib/python3.6/configparser.py /usr/lib/python3.6/ConfigParser.py

Then try to install the MYSQL-python again. That Worked for me


回答 4

python3不支持MySQL-python,您可以使用mysqlclient

如果您正在fedora/centos/Red Hat安装以下软件包

  1. yum install python3-devel
  2. pip install mysqlclient

MySQL-python is not supported on python3 instead of this you can use mysqlclient

If you are on fedora/centos/Red Hat install following package

  1. yum install python3-devel
  2. pip install mysqlclient

回答 5

如果您使用的是CentOS,则需要使用

  1. yum install python34-devel.x86_64
  2. yum groupinstall -y 'development tools'
  3. pip3 install mysql-connector
  4. pip install mysqlclient

If you are using CentOS, then you need to use

  1. yum install python34-devel.x86_64
  2. yum groupinstall -y 'development tools'
  3. pip3 install mysql-connector
  4. pip install mysqlclient

回答 6

configparser可以通过six库简单地解决Python 2/3的兼容性

from six.moves import configparser

Compatibility of Python 2/3 for configparser can be solved simply by six library

from six.moves import configparser

回答 7

pip3 install PyMySQL然后再做pip3 install mysqlclient。为我工作

Do pip3 install PyMySQL and then pip3 install mysqlclient. Worked for me


回答 8

我遇到了同样的问题。原来,我需要在centos上安装python3 devel。首先,您需要搜索与系统兼容的软件包。

yum search python3 | grep devel

然后,将软件包安装为:

yum install -y python3-devel.x86_64

然后,从pip安装mysqlclient

pip install mysqlclient

I was having the same problem. Turns out, I needed to install python3 devel on my centos. First, you need to search for the package that is compatible with your system.

yum search python3 | grep devel

Then, install the package as:

yum install -y python3-devel.x86_64

Then, install mysqlclient from pip

pip install mysqlclient

回答 9

我进一步了解了Valeres的答案:

pip install configparser sudo cp /usr/lib/python3.6/configparser.py /usr/lib/python3.6/ConfigParser.py然后尝试再次安装MYSQL-python。对我有用

我建议链接文件而不是复制它。保存更新。我将文件链接到/usr/lib/python3/目录。

I got further with Valeres answer:

pip install configparser sudo cp /usr/lib/python3.6/configparser.py /usr/lib/python3.6/ConfigParser.py Then try to install the MYSQL-python again. That Worked for me

I would suggest to link the file instead of copy it. It is save to update. I linked the file to /usr/lib/python3/ directory.


回答 10

试试这个对我来说很好的解决方案。

基本上它是重新安装/升级到最新版本的MySQL冲泡,然后安装mysqlclientMySQL-Pythonglobal pip3代替virtualenv pip3

然后访问virtualenv并成功安装mysqlclientMySQL-Python

Try this solution which worked fine for me.

Basically it’s to reinstall/upgrade to latest version of mysql from brew, and then installing mysqlclient or MySQL-Python from global pip3 instead of virtualenv pip3.

Then accessing the virtualenv and successfully install mysqlclient or MySQL-Python.


回答 11

如何检查您首先使用的Python版本。

import six
if six.PY2:
    import ConfigParser as configparser
else:
    import configparser

how about checking the version of Python you are using first.

import six
if six.PY2:
    import ConfigParser as configparser
else:
    import configparser

回答 12

我运行kali linux- Rolling,并在更新到python 3.6.0之后尝试在终端中运行cupp.py时遇到了这个问题。一些研究和试验后,我发现,改变 ConfigParserconfigparser我的工作,但那时,我发现另一个问题就来了。

config = configparser.configparser() AttributeError: module 'configparser' has no attribute 'configparser'

经过更多研究,我意识到python 3 ConfigParser已更改为, configparser但请注意它具有属性 ConfigParser()

I run kali linux- Rolling and I came across this problem ,when I tried running cupp.py in the terminal, after updating to python 3.6.0. After some research and trial I found that changing ConfigParser to configparser worked for me but then I came across another issue.

config = configparser.configparser() AttributeError: module 'configparser' has no attribute 'configparser'

After a bit more research I realised that for python 3 ConfigParser is changed to configparser but note that it has an attribute ConfigParser().


回答 13

我在Mac OS 10,Python 3.7.6和Django 2.2.7上遇到了相同的错误。在尝试了多种解决方案之后,我想借此机会分享对我有用的东西。

脚步

  1. 通过链接为Mac OS安装了Connector / Python 8.0.20

  2. 将当前依赖项复制到requirements.txt文件,停用当前虚拟环境,并使用删除它;

    创建文件(如果尚未创建的话); touch requirements.txt

    将依赖项复制到文件; python -m pip3 freeze > requirements.txt

    停用并删除当前虚拟环境; deactivate && rm -rf <virtual-env-name>

  3. 创建另一个虚拟环境并使用激活它; python -m venv <virtual-env-name> && source <virtual-env-name>/bin/activate

  4. 使用安装以前的依赖项; python -m pip3 install -r requirements.txt

I was getting the same error on Mac OS 10, Python 3.7.6 & Django 2.2.7. I want to use this opportunity to share what worked for me after trying out numerous solutions.

Steps

  1. Installed Connector/Python 8.0.20 for Mac OS from link

  2. Copy current dependencies into requirements.txt file, deactivated the current virtual env, and deleted it using;

    create the file if not already created with; touch requirements.txt

    copy dependency to file; python -m pip3 freeze > requirements.txt

    deactivate and delete current virtual env; deactivate && rm -rf <virtual-env-name>

  3. Created another virtual env and activated it using; python -m venv <virtual-env-name> && source <virtual-env-name>/bin/activate

  4. Install previous dependencies using; python -m pip3 install -r requirements.txt


回答 14

请看看/usr/bin/python指的是什么

如果它指向python3 or higher 更改为python2.7

这应该可以解决问题。

我收到所有python软件包的安装错误。安倍·卡普拉斯(Abe Karplus)的解决方案和讨论向我暗示了可能是什么问题。然后我回想起我已经手动将/usr/bin/pythonfrom从更改python2.7/usr/bin/python3.5,这实际上是导致问题的原因。有一次我reverted一样。解决了。

Kindly to see what is /usr/bin/python pointing to

if it is pointing to python3 or higher change to python2.7

This should solve the issue.

I was getting install error for all the python packages. Abe Karplus’s solution & discussion gave me the hint as to what could be the problem. Then I recalled that I had manually changed the /usr/bin/python from python2.7 to /usr/bin/python3.5, which actually was causing the issue. Once I reverted the same. It got solved.


回答 15

这对我有用

cp /usr/local/lib/python3.5/configparser.py /usr/local/lib/python3.5/ConfigParser.py

This worked for me

cp /usr/local/lib/python3.5/configparser.py /usr/local/lib/python3.5/ConfigParser.py

Python:defaultdict的defaultdict?

问题:Python:defaultdict的defaultdict?

有没有一种方法可以defaultdict(defaultdict(int))使以下代码正常工作?

for x in stuff:
    d[x.a][x.b] += x.c_int

d需要临时构建,具体取决于x.ax.b元素。

我可以使用:

for x in stuff:
    d[x.a,x.b] += x.c_int

但后来我将无法使用:

d.keys()
d[x.a].keys()

Is there a way to have a defaultdict(defaultdict(int)) in order to make the following code work?

for x in stuff:
    d[x.a][x.b] += x.c_int

d needs to be built ad-hoc, depending on x.a and x.b elements.

I could use:

for x in stuff:
    d[x.a,x.b] += x.c_int

but then I wouldn’t be able to use:

d.keys()
d[x.a].keys()

回答 0

是这样的:

defaultdict(lambda: defaultdict(int))

当您尝试访问不存在的键时,将调用的参数defaultdict(在这种情况下为lambda: defaultdict(int))。它的返回值将设置为该密钥的新值,这意味着在我们的情况下,d[Key_doesnt_exist]将为defaultdict(int)

如果尝试从最后一个defaultdict访问密钥,即d[Key_doesnt_exist][Key_doesnt_exist]它将返回0,这是最后一个defaultdict的参数的返回值int()

Yes like this:

defaultdict(lambda: defaultdict(int))

The argument of a defaultdict (in this case is lambda: defaultdict(int)) will be called when you try to access a key that doesn’t exist. The return value of it will be set as the new value of this key, which means in our case the value of d[Key_doesnt_exist] will be defaultdict(int).

If you try to access a key from this last defaultdict i.e. d[Key_doesnt_exist][Key_doesnt_exist] it will return 0, which is the return value of the argument of the last defaultdict i.e. int().


回答 1

defaultdict构造函数的参数是用于构建新元素的函数。因此,让我们使用lambda!

>>> from collections import defaultdict
>>> d = defaultdict(lambda : defaultdict(int))
>>> print d[0]
defaultdict(<type 'int'>, {})
>>> print d[0]["x"]
0

从Python 2.7开始,使用Counter有了一个更好的解决方案

>>> from collections import Counter
>>> c = Counter()
>>> c["goodbye"]+=1
>>> c["and thank you"]=42
>>> c["for the fish"]-=5
>>> c
Counter({'and thank you': 42, 'goodbye': 1, 'for the fish': -5})

一些额外功能

>>> c.most_common()[:2]
[('and thank you', 42), ('goodbye', 1)]

有关更多信息,请参见PyMOTW-集合-容器数据类型Python文档-集合

The parameter to the defaultdict constructor is the function which will be called for building new elements. So let’s use a lambda !

>>> from collections import defaultdict
>>> d = defaultdict(lambda : defaultdict(int))
>>> print d[0]
defaultdict(<type 'int'>, {})
>>> print d[0]["x"]
0

Since Python 2.7, there’s an even better solution using Counter:

>>> from collections import Counter
>>> c = Counter()
>>> c["goodbye"]+=1
>>> c["and thank you"]=42
>>> c["for the fish"]-=5
>>> c
Counter({'and thank you': 42, 'goodbye': 1, 'for the fish': -5})

Some bonus features

>>> c.most_common()[:2]
[('and thank you', 42), ('goodbye', 1)]

For more information see PyMOTW – Collections – Container data types and Python Documentation – collections


回答 2

我发现使用起来稍微更优雅partial

import functools
dd_int = functools.partial(defaultdict, int)
defaultdict(dd_int)

当然,这与lambda相同。

I find it slightly more elegant to use partial:

import functools
dd_int = functools.partial(defaultdict, int)
defaultdict(dd_int)

Of course, this is the same as a lambda.


回答 3

作为参考,可以通过以下方式实现通用的嵌套defaultdict工厂方法:

from collections import defaultdict
from functools import partial
from itertools import repeat


def nested_defaultdict(default_factory, depth=1):
    result = partial(defaultdict, default_factory)
    for _ in repeat(None, depth - 1):
        result = partial(defaultdict, result)
    return result()

深度定义了default_factory使用中定义的类型之前嵌套字典的数量。例如:

my_dict = nested_defaultdict(list, 3)
my_dict['a']['b']['c'].append('e')

For reference, it’s possible to implement a generic nested defaultdict factory method through:

from collections import defaultdict
from functools import partial
from itertools import repeat


def nested_defaultdict(default_factory, depth=1):
    result = partial(defaultdict, default_factory)
    for _ in repeat(None, depth - 1):
        result = partial(defaultdict, result)
    return result()

The depth defines the number of nested dictionary before the type defined in default_factory is used. For example:

my_dict = nested_defaultdict(list, 3)
my_dict['a']['b']['c'].append('e')

回答 4

先前的答案已经解决了如何制作两级或n级defaultdict。在某些情况下,您需要无限个:

def ddict():
    return defaultdict(ddict)

用法:

>>> d = ddict()
>>> d[1]['a'][True] = 0.5
>>> d[1]['b'] = 3
>>> import pprint; pprint.pprint(d)
defaultdict(<function ddict at 0x7fcac68bf048>,
            {1: defaultdict(<function ddict at 0x7fcac68bf048>,
                            {'a': defaultdict(<function ddict at 0x7fcac68bf048>,
                                              {True: 0.5}),
                             'b': 3})})

Previous answers have addressed how to make a two-levels or n-levels defaultdict. In some cases you want an infinite one:

def ddict():
    return defaultdict(ddict)

Usage:

>>> d = ddict()
>>> d[1]['a'][True] = 0.5
>>> d[1]['b'] = 3
>>> import pprint; pprint.pprint(d)
defaultdict(<function ddict at 0x7fcac68bf048>,
            {1: defaultdict(<function ddict at 0x7fcac68bf048>,
                            {'a': defaultdict(<function ddict at 0x7fcac68bf048>,
                                              {True: 0.5}),
                             'b': 3})})

回答 5

其他人已经正确回答了您如何使以下各项正常工作的问题:

for x in stuff:
    d[x.a][x.b] += x.c_int

一种替代方法是使用元组作为键:

d = defaultdict(int)
for x in stuff:
    d[x.a,x.b] += x.c_int
    # ^^^^^^^ tuple key

这种方法的好处是它很简单并且可以轻松扩展。如果您需要三个层次的映射,只需使用一个三项元组作为键。

Others have answered correctly your question of how to get the following to work:

for x in stuff:
    d[x.a][x.b] += x.c_int

An alternative would be to use tuples for keys:

d = defaultdict(int)
for x in stuff:
    d[x.a,x.b] += x.c_int
    # ^^^^^^^ tuple key

The nice thing about this approach is that it is simple and can be easily expanded. If you need a mapping three levels deep, just use a three item tuple for the key.


向现有对象实例添加方法

问题:向现有对象实例添加方法

我读过,可以在Python中向现有对象(即不在类定义中)添加方法。

我了解这样做并不总是一件好事。但是怎么可能呢?

I’ve read that it is possible to add a method to an existing object (i.e., not in the class definition) in Python.

I understand that it’s not always good to do so. But how might one do this?


回答 0

在Python中,函数和绑定方法之间存在差异。

>>> def foo():
...     print "foo"
...
>>> class A:
...     def bar( self ):
...         print "bar"
...
>>> a = A()
>>> foo
<function foo at 0x00A98D70>
>>> a.bar
<bound method A.bar of <__main__.A instance at 0x00A9BC88>>
>>>

绑定方法已“绑定”(具有描述性)到实例,并且只要调用该方法,该实例将作为第一个参数传递。

但是,作为类(而不是实例)的属性的可调用对象仍未绑定,因此您可以在需要时修改类定义:

>>> def fooFighters( self ):
...     print "fooFighters"
...
>>> A.fooFighters = fooFighters
>>> a2 = A()
>>> a2.fooFighters
<bound method A.fooFighters of <__main__.A instance at 0x00A9BEB8>>
>>> a2.fooFighters()
fooFighters

先前定义的实例也会被更新(只要它们本身没有覆盖属性):

>>> a.fooFighters()
fooFighters

当您要将方法附加到单个实例时,就会出现问题:

>>> def barFighters( self ):
...     print "barFighters"
...
>>> a.barFighters = barFighters
>>> a.barFighters()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: barFighters() takes exactly 1 argument (0 given)

该函数直接附加到实例时不会自动绑定:

>>> a.barFighters
<function barFighters at 0x00A98EF0>

要绑定它,我们可以在类型模块中使用MethodType函数

>>> import types
>>> a.barFighters = types.MethodType( barFighters, a )
>>> a.barFighters
<bound method ?.barFighters of <__main__.A instance at 0x00A9BC88>>
>>> a.barFighters()
barFighters

这次,该类的其他实例没有受到影响:

>>> a2.barFighters()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: A instance has no attribute 'barFighters'

通过阅读有关描述符元类 编程的信息,可以找到更多信息。

In Python, there is a difference between functions and bound methods.

>>> def foo():
...     print "foo"
...
>>> class A:
...     def bar( self ):
...         print "bar"
...
>>> a = A()
>>> foo
<function foo at 0x00A98D70>
>>> a.bar
<bound method A.bar of <__main__.A instance at 0x00A9BC88>>
>>>

Bound methods have been “bound” (how descriptive) to an instance, and that instance will be passed as the first argument whenever the method is called.

Callables that are attributes of a class (as opposed to an instance) are still unbound, though, so you can modify the class definition whenever you want:

>>> def fooFighters( self ):
...     print "fooFighters"
...
>>> A.fooFighters = fooFighters
>>> a2 = A()
>>> a2.fooFighters
<bound method A.fooFighters of <__main__.A instance at 0x00A9BEB8>>
>>> a2.fooFighters()
fooFighters

Previously defined instances are updated as well (as long as they haven’t overridden the attribute themselves):

>>> a.fooFighters()
fooFighters

The problem comes when you want to attach a method to a single instance:

>>> def barFighters( self ):
...     print "barFighters"
...
>>> a.barFighters = barFighters
>>> a.barFighters()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: barFighters() takes exactly 1 argument (0 given)

The function is not automatically bound when it’s attached directly to an instance:

>>> a.barFighters
<function barFighters at 0x00A98EF0>

To bind it, we can use the MethodType function in the types module:

>>> import types
>>> a.barFighters = types.MethodType( barFighters, a )
>>> a.barFighters
<bound method ?.barFighters of <__main__.A instance at 0x00A9BC88>>
>>> a.barFighters()
barFighters

This time other instances of the class have not been affected:

>>> a2.barFighters()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: A instance has no attribute 'barFighters'

More information can be found by reading about descriptors and metaclass programming.


回答 1

自python 2.6起不推荐使用new模块,并在3.0版中将其删除,请使用类型

参见http://docs.python.org/library/new.html

在下面的示例中,我故意从patch_me()函数中删除了返回值。我认为提供返回值可能会使人相信patch返回了一个新对象,这是不正确的-它修改了传入的对象。可能这可以促进对Monkey补丁的更严格的使用。

import types

class A(object):#but seems to work for old style objects too
    pass

def patch_me(target):
    def method(target,x):
        print "x=",x
        print "called from", target
    target.method = types.MethodType(method,target)
    #add more if needed

a = A()
print a
#out: <__main__.A object at 0x2b73ac88bfd0>  
patch_me(a)    #patch instance
a.method(5)
#out: x= 5
#out: called from <__main__.A object at 0x2b73ac88bfd0>
patch_me(A)
A.method(6)        #can patch class too
#out: x= 6
#out: called from <class '__main__.A'>

Module new is deprecated since python 2.6 and removed in 3.0, use types

see http://docs.python.org/library/new.html

In the example below I’ve deliberately removed return value from patch_me() function. I think that giving return value may make one believe that patch returns a new object, which is not true – it modifies the incoming one. Probably this can facilitate a more disciplined use of monkeypatching.

import types

class A(object):#but seems to work for old style objects too
    pass

def patch_me(target):
    def method(target,x):
        print "x=",x
        print "called from", target
    target.method = types.MethodType(method,target)
    #add more if needed

a = A()
print a
#out: <__main__.A object at 0x2b73ac88bfd0>  
patch_me(a)    #patch instance
a.method(5)
#out: x= 5
#out: called from <__main__.A object at 0x2b73ac88bfd0>
patch_me(A)
A.method(6)        #can patch class too
#out: x= 6
#out: called from <class '__main__.A'>

回答 2

前言-有关兼容性的说明:其他答案可能仅在Python 2中有效-此答案在Python 2和3中应该可以很好地工作。如果仅编写Python 3,则可能会显式地继承自object,但是代码应保持不变。

向现有对象实例添加方法

我读过,可以在Python中向现有对象(例如不在类定义中)添加方法。

我了解这样做并非总是一个好的决定。但是,怎么可能呢?

是的,有可能-但不建议

我不建议这样做。这是一个坏主意。不要这样

这有两个原因:

  • 您将向执行此操作的每个实例添加一个绑定对象。如果您经常这样做,则可能会浪费大量内存。通常仅在调用的短时间内创建绑定方法,然后在自动垃圾回收时它们不再存在。如果手动执行此操作,则将具有一个引用绑定方法的名称绑定-这将防止使用时对其进行垃圾回收。
  • 给定类型的对象实例通常在该类型的所有对象上都有其方法。如果在其他位置添加方法,则某些实例将具有那些方法,而其他实例则不会。程序员不会期望如此,您可能会违反最不惊奇规则
  • 由于还有其他非常好的理由不这样做,因此,如果这样做,您的声誉也会很差。

因此,我建议您除非有充分的理由,否则不要这样做。这是更好的在类定义来定义的正确方法更少的类直接优选猴的贴剂,是这样的:

Foo.sample_method = sample_method

由于具有指导意义,因此,我将向您展示一些这样做的方法。

怎么做

这是一些设置代码。我们需要一个类定义。可以将其导入,但这并不重要。

class Foo(object):
    '''An empty class to demonstrate adding a method to an instance'''

创建一个实例:

foo = Foo()

创建一个添加方法:

def sample_method(self, bar, baz):
    print(bar + baz)

方法零(0)-使用描述符方法, __get__

在函数上进行的点分查找__get__使用实例调用函数的方法,将对象绑定到该方法,从而创建“绑定方法”。

foo.sample_method = sample_method.__get__(foo)

现在:

>>> foo.sample_method(1,2)
3

方法一-types.MethodType

首先,导入类型,从中我们将获得方法构造函数:

import types

现在我们将方法添加到实例中。为此,我们需要types模块(上面已导入)中的MethodType构造函数。

types.MethodType的参数签名为(function, instance, class)

foo.sample_method = types.MethodType(sample_method, foo, Foo)

和用法:

>>> foo.sample_method(1,2)
3

方法二:词法绑定

首先,我们创建一个包装器函数,将方法绑定到实例:

def bind(instance, method):
    def binding_scope_fn(*args, **kwargs): 
        return method(instance, *args, **kwargs)
    return binding_scope_fn

用法:

>>> foo.sample_method = bind(foo, sample_method)    
>>> foo.sample_method(1,2)
3

方法三:functools.partial

局部函数将第一个参数应用于函数(以及可选的关键字参数),以后可以与其余参数(以及覆盖的关键字参数)一起调用。从而:

>>> from functools import partial
>>> foo.sample_method = partial(sample_method, foo)
>>> foo.sample_method(1,2)
3    

当您认为绑定方法是实例的部分功能时,这很有意义。

未绑定函数作为对象属性-为什么不起作用:

如果尝试以与将其添加到类中相同的方式添加sample_method,则它不受实例约束,并且不会将隐式self作为第一个参数。

>>> foo.sample_method = sample_method
>>> foo.sample_method(1,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sample_method() takes exactly 3 arguments (2 given)

我们可以通过显式传递实例(或其他任何方法,因为此方法实际上不使用self参数变量)来使未绑定函数起作用,但是它将与其他实例的预期签名不一致(如果我们进行了Monkey修补)此实例):

>>> foo.sample_method(foo, 1, 2)
3

结论

现在,您知道可以执行此操作的几种方法,但是,认真地说-不要这样做。

Preface – a note on compatibility: other answers may only work in Python 2 – this answer should work perfectly well in Python 2 and 3. If writing Python 3 only, you might leave out explicitly inheriting from object, but otherwise the code should remain the same.

Adding a Method to an Existing Object Instance

I’ve read that it is possible to add a method to an existing object (e.g. not in the class definition) in Python.

I understand that it’s not always a good decision to do so. But, how might one do this?

Yes, it is possible – But not recommended

I don’t recommend this. This is a bad idea. Don’t do it.

Here’s a couple of reasons:

  • You’ll add a bound object to every instance you do this to. If you do this a lot, you’ll probably waste a lot of memory. Bound methods are typically only created for the short duration of their call, and they then cease to exist when automatically garbage collected. If you do this manually, you’ll have a name binding referencing the bound method – which will prevent its garbage collection on usage.
  • Object instances of a given type generally have its methods on all objects of that type. If you add methods elsewhere, some instances will have those methods and others will not. Programmers will not expect this, and you risk violating the rule of least surprise.
  • Since there are other really good reasons not to do this, you’ll additionally give yourself a poor reputation if you do it.

Thus, I suggest that you not do this unless you have a really good reason. It is far better to define the correct method in the class definition or less preferably to monkey-patch the class directly, like this:

Foo.sample_method = sample_method

Since it’s instructive, however, I’m going to show you some ways of doing this.

How it can be done

Here’s some setup code. We need a class definition. It could be imported, but it really doesn’t matter.

class Foo(object):
    '''An empty class to demonstrate adding a method to an instance'''

Create an instance:

foo = Foo()

Create a method to add to it:

def sample_method(self, bar, baz):
    print(bar + baz)

Method nought (0) – use the descriptor method, __get__

Dotted lookups on functions call the __get__ method of the function with the instance, binding the object to the method and thus creating a “bound method.”

foo.sample_method = sample_method.__get__(foo)

and now:

>>> foo.sample_method(1,2)
3

Method one – types.MethodType

First, import types, from which we’ll get the method constructor:

import types

Now we add the method to the instance. To do this, we require the MethodType constructor from the types module (which we imported above).

The argument signature for types.MethodType is (function, instance, class):

foo.sample_method = types.MethodType(sample_method, foo, Foo)

and usage:

>>> foo.sample_method(1,2)
3

Method two: lexical binding

First, we create a wrapper function that binds the method to the instance:

def bind(instance, method):
    def binding_scope_fn(*args, **kwargs): 
        return method(instance, *args, **kwargs)
    return binding_scope_fn

usage:

>>> foo.sample_method = bind(foo, sample_method)    
>>> foo.sample_method(1,2)
3

Method three: functools.partial

A partial function applies the first argument(s) to a function (and optionally keyword arguments), and can later be called with the remaining arguments (and overriding keyword arguments). Thus:

>>> from functools import partial
>>> foo.sample_method = partial(sample_method, foo)
>>> foo.sample_method(1,2)
3    

This makes sense when you consider that bound methods are partial functions of the instance.

Unbound function as an object attribute – why this doesn’t work:

If we try to add the sample_method in the same way as we might add it to the class, it is unbound from the instance, and doesn’t take the implicit self as the first argument.

>>> foo.sample_method = sample_method
>>> foo.sample_method(1,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sample_method() takes exactly 3 arguments (2 given)

We can make the unbound function work by explicitly passing the instance (or anything, since this method doesn’t actually use the self argument variable), but it would not be consistent with the expected signature of other instances (if we’re monkey-patching this instance):

>>> foo.sample_method(foo, 1, 2)
3

Conclusion

You now know several ways you could do this, but in all seriousness – don’t do this.


回答 3

我认为以上答案错过了关键点。

让我们来一个带有方法的类:

class A(object):
    def m(self):
        pass

现在,让我们在ipython中玩它:

In [2]: A.m
Out[2]: <unbound method A.m>

好的,因此m()以某种方式成为A的未绑定方法。但是真的是那样吗?

In [5]: A.__dict__['m']
Out[5]: <function m at 0xa66b8b4>

事实证明,m()只是一个函数,对它的引用已添加到A类字典中-没有魔术。那为什么要给我们一个不受约束的方法?这是因为该点未转换为简单的字典查找。实际上是A .__ class __.__ getattribute __(A,’m’)的调用:

In [11]: class MetaA(type):
   ....:     def __getattribute__(self, attr_name):
   ....:         print str(self), '-', attr_name

In [12]: class A(object):
   ....:     __metaclass__ = MetaA

In [23]: A.m
<class '__main__.A'> - m
<class '__main__.A'> - m

现在,我不确定为什么最后一行要打印两次,但是仍然很清楚那里发生了什么。

现在,默认的__getattribute__所做的是检查属性是否为所谓的描述符,即,是否实现了特殊的__get__方法。如果实现该方法,则返回该__get__方法的结果。回到我们的A类的第一个版本,这是我们拥有的:

In [28]: A.__dict__['m'].__get__(None, A)
Out[28]: <unbound method A.m>

而且由于Python函数实现了描述符协议,所以如果它们代表一个对象被调用,它们将通过__get__方法将自身绑定到该对象。

好的,如何为现有对象添加方法?假设您不介意修补类,那么它很简单:

B.m = m

然后,借助描述符魔术,Bm “成为”一个不受约束的方法。

而且,如果您只想向单个对象添加方法,则必须使用types.MethodType来自己模拟机制。

b.m = types.MethodType(m, b)

顺便说说:

In [2]: A.m
Out[2]: <unbound method A.m>

In [59]: type(A.m)
Out[59]: <type 'instancemethod'>

In [60]: type(b.m)
Out[60]: <type 'instancemethod'>

In [61]: types.MethodType
Out[61]: <type 'instancemethod'>

I think that the above answers missed the key point.

Let’s have a class with a method:

class A(object):
    def m(self):
        pass

Now, let’s play with it in ipython:

In [2]: A.m
Out[2]: <unbound method A.m>

Ok, so m() somehow becomes an unbound method of A. But is it really like that?

In [5]: A.__dict__['m']
Out[5]: <function m at 0xa66b8b4>

It turns out that m() is just a function, reference to which is added to A class dictionary – there’s no magic. Then why A.m gives us an unbound method? It’s because the dot is not translated to a simple dictionary lookup. It’s de facto a call of A.__class__.__getattribute__(A, ‘m’):

In [11]: class MetaA(type):
   ....:     def __getattribute__(self, attr_name):
   ....:         print str(self), '-', attr_name

In [12]: class A(object):
   ....:     __metaclass__ = MetaA

In [23]: A.m
<class '__main__.A'> - m
<class '__main__.A'> - m

Now, I’m not sure out of the top of my head why the last line is printed twice, but still it’s clear what’s going on there.

Now, what the default __getattribute__ does is that it checks if the attribute is a so-called descriptor or not, i.e. if it implements a special __get__ method. If it implements that method, then what is returned is the result of calling that __get__ method. Going back to the first version of our A class, this is what we have:

In [28]: A.__dict__['m'].__get__(None, A)
Out[28]: <unbound method A.m>

And because Python functions implement the descriptor protocol, if they are called on behalf of an object, they bind themselves to that object in their __get__ method.

Ok, so how to add a method to an existing object? Assuming you don’t mind patching class, it’s as simple as:

B.m = m

Then B.m “becomes” an unbound method, thanks to the descriptor magic.

And if you want to add a method just to a single object, then you have to emulate the machinery yourself, by using types.MethodType:

b.m = types.MethodType(m, b)

By the way:

In [2]: A.m
Out[2]: <unbound method A.m>

In [59]: type(A.m)
Out[59]: <type 'instancemethod'>

In [60]: type(b.m)
Out[60]: <type 'instancemethod'>

In [61]: types.MethodType
Out[61]: <type 'instancemethod'>

回答 4

在Python中,Monkey修补通常通过覆盖您自己的类或函数签名来起作用。以下是Zope Wiki的示例:

from SomeOtherProduct.SomeModule import SomeClass
def speak(self):
   return "ook ook eee eee eee!"
SomeClass.speak = speak

该代码将覆盖/创建一个在类上称为“讲话”的方法。在杰夫·阿特伍德(Jeff Atwood)最近关于Monkey修补的文章中。他显示了C#3.0中的示例,这是我在工作中使用的当前语言。

In Python monkey patching generally works by overwriting a class or functions signature with your own. Below is an example from the Zope Wiki:

from SomeOtherProduct.SomeModule import SomeClass
def speak(self):
   return "ook ook eee eee eee!"
SomeClass.speak = speak

That code will overwrite/create a method called speak on the class. In Jeff Atwood’s recent post on monkey patching. He shows an example in C# 3.0 which is the current language I use for work.


回答 5

您可以使用lambda将方法绑定到实例:

def run(self):
    print self._instanceString

class A(object):
    def __init__(self):
        self._instanceString = "This is instance string"

a = A()
a.run = lambda: run(a)
a.run()

输出:

This is instance string

You can use lambda to bind a method to an instance:

def run(self):
    print self._instanceString

class A(object):
    def __init__(self):
        self._instanceString = "This is instance string"

a = A()
a.run = lambda: run(a)
a.run()

Output:

This is instance string

回答 6

没有至少一种方法可以将方法附加到实例types.MethodType

>>> class A:
...  def m(self):
...   print 'im m, invoked with: ', self

>>> a = A()
>>> a.m()
im m, invoked with:  <__main__.A instance at 0x973ec6c>
>>> a.m
<bound method A.m of <__main__.A instance at 0x973ec6c>>
>>> 
>>> def foo(firstargument):
...  print 'im foo, invoked with: ', firstargument

>>> foo
<function foo at 0x978548c>

1:

>>> a.foo = foo.__get__(a, A) # or foo.__get__(a, type(a))
>>> a.foo()
im foo, invoked with:  <__main__.A instance at 0x973ec6c>
>>> a.foo
<bound method A.foo of <__main__.A instance at 0x973ec6c>>

2:

>>> instancemethod = type(A.m)
>>> instancemethod
<type 'instancemethod'>
>>> a.foo2 = instancemethod(foo, a, type(a))
>>> a.foo2()
im foo, invoked with:  <__main__.A instance at 0x973ec6c>
>>> a.foo2
<bound method instance.foo of <__main__.A instance at 0x973ec6c>>

有用的链接:
数据模型-调用描述符描述
符方法指南-调用描述符

There are at least two ways for attach a method to an instance without types.MethodType:

>>> class A:
...  def m(self):
...   print 'im m, invoked with: ', self

>>> a = A()
>>> a.m()
im m, invoked with:  <__main__.A instance at 0x973ec6c>
>>> a.m
<bound method A.m of <__main__.A instance at 0x973ec6c>>
>>> 
>>> def foo(firstargument):
...  print 'im foo, invoked with: ', firstargument

>>> foo
<function foo at 0x978548c>

1:

>>> a.foo = foo.__get__(a, A) # or foo.__get__(a, type(a))
>>> a.foo()
im foo, invoked with:  <__main__.A instance at 0x973ec6c>
>>> a.foo
<bound method A.foo of <__main__.A instance at 0x973ec6c>>

2:

>>> instancemethod = type(A.m)
>>> instancemethod
<type 'instancemethod'>
>>> a.foo2 = instancemethod(foo, a, type(a))
>>> a.foo2()
im foo, invoked with:  <__main__.A instance at 0x973ec6c>
>>> a.foo2
<bound method instance.foo of <__main__.A instance at 0x973ec6c>>

Useful links:
Data model – invoking descriptors
Descriptor HowTo Guide – invoking descriptors


回答 7

setattr我相信您在寻找什么。使用此设置对象上的属性。

>>> def printme(s): print repr(s)
>>> class A: pass
>>> setattr(A,'printme',printme)
>>> a = A()
>>> a.printme() # s becomes the implicit 'self' variable
< __ main __ . A instance at 0xABCDEFG>

What you’re looking for is setattr I believe. Use this to set an attribute on an object.

>>> def printme(s): print repr(s)
>>> class A: pass
>>> setattr(A,'printme',printme)
>>> a = A()
>>> a.printme() # s becomes the implicit 'self' variable
< __ main __ . A instance at 0xABCDEFG>

回答 8

由于此问题要求使用非Python版本,因此以下是JavaScript:

a.methodname = function () { console.log("Yay, a new method!") }

Since this question asked for non-Python versions, here’s JavaScript:

a.methodname = function () { console.log("Yay, a new method!") }

回答 9

通过查看不同绑定方法的结果,合并Jason Pratt和社区Wiki的答案:

尤其要注意将绑定函数添加为类方法的工作原理,但是引用范围不正确。

#!/usr/bin/python -u
import types
import inspect

## dynamically adding methods to a unique instance of a class


# get a list of a class's method type attributes
def listattr(c):
    for m in [(n, v) for n, v in inspect.getmembers(c, inspect.ismethod) if isinstance(v,types.MethodType)]:
        print m[0], m[1]

# externally bind a function as a method of an instance of a class
def ADDMETHOD(c, method, name):
    c.__dict__[name] = types.MethodType(method, c)

class C():
    r = 10 # class attribute variable to test bound scope

    def __init__(self):
        pass

    #internally bind a function as a method of self's class -- note that this one has issues!
    def addmethod(self, method, name):
        self.__dict__[name] = types.MethodType( method, self.__class__ )

    # predfined function to compare with
    def f0(self, x):
        print 'f0\tx = %d\tr = %d' % ( x, self.r)

a = C() # created before modified instnace
b = C() # modified instnace


def f1(self, x): # bind internally
    print 'f1\tx = %d\tr = %d' % ( x, self.r )
def f2( self, x): # add to class instance's .__dict__ as method type
    print 'f2\tx = %d\tr = %d' % ( x, self.r )
def f3( self, x): # assign to class as method type
    print 'f3\tx = %d\tr = %d' % ( x, self.r )
def f4( self, x): # add to class instance's .__dict__ using a general function
    print 'f4\tx = %d\tr = %d' % ( x, self.r )


b.addmethod(f1, 'f1')
b.__dict__['f2'] = types.MethodType( f2, b)
b.f3 = types.MethodType( f3, b)
ADDMETHOD(b, f4, 'f4')


b.f0(0) # OUT: f0   x = 0   r = 10
b.f1(1) # OUT: f1   x = 1   r = 10
b.f2(2) # OUT: f2   x = 2   r = 10
b.f3(3) # OUT: f3   x = 3   r = 10
b.f4(4) # OUT: f4   x = 4   r = 10


k = 2
print 'changing b.r from {0} to {1}'.format(b.r, k)
b.r = k
print 'new b.r = {0}'.format(b.r)

b.f0(0) # OUT: f0   x = 0   r = 2
b.f1(1) # OUT: f1   x = 1   r = 10  !!!!!!!!!
b.f2(2) # OUT: f2   x = 2   r = 2
b.f3(3) # OUT: f3   x = 3   r = 2
b.f4(4) # OUT: f4   x = 4   r = 2

c = C() # created after modifying instance

# let's have a look at each instance's method type attributes
print '\nattributes of a:'
listattr(a)
# OUT:
# attributes of a:
# __init__ <bound method C.__init__ of <__main__.C instance at 0x000000000230FD88>>
# addmethod <bound method C.addmethod of <__main__.C instance at 0x000000000230FD88>>
# f0 <bound method C.f0 of <__main__.C instance at 0x000000000230FD88>>

print '\nattributes of b:'
listattr(b)
# OUT:
# attributes of b:
# __init__ <bound method C.__init__ of <__main__.C instance at 0x000000000230FE08>>
# addmethod <bound method C.addmethod of <__main__.C instance at 0x000000000230FE08>>
# f0 <bound method C.f0 of <__main__.C instance at 0x000000000230FE08>>
# f1 <bound method ?.f1 of <class __main__.C at 0x000000000237AB28>>
# f2 <bound method ?.f2 of <__main__.C instance at 0x000000000230FE08>>
# f3 <bound method ?.f3 of <__main__.C instance at 0x000000000230FE08>>
# f4 <bound method ?.f4 of <__main__.C instance at 0x000000000230FE08>>

print '\nattributes of c:'
listattr(c)
# OUT:
# attributes of c:
# __init__ <bound method C.__init__ of <__main__.C instance at 0x0000000002313108>>
# addmethod <bound method C.addmethod of <__main__.C instance at 0x0000000002313108>>
# f0 <bound method C.f0 of <__main__.C instance at 0x0000000002313108>>

就个人而言,我更喜欢使用外部ADDMETHOD函数路由,因为它也允许我在迭代器中动态分配新的方法名称。

def y(self, x):
    pass
d = C()
for i in range(1,5):
    ADDMETHOD(d, y, 'f%d' % i)
print '\nattributes of d:'
listattr(d)
# OUT:
# attributes of d:
# __init__ <bound method C.__init__ of <__main__.C instance at 0x0000000002303508>>
# addmethod <bound method C.addmethod of <__main__.C instance at 0x0000000002303508>>
# f0 <bound method C.f0 of <__main__.C instance at 0x0000000002303508>>
# f1 <bound method ?.y of <__main__.C instance at 0x0000000002303508>>
# f2 <bound method ?.y of <__main__.C instance at 0x0000000002303508>>
# f3 <bound method ?.y of <__main__.C instance at 0x0000000002303508>>
# f4 <bound method ?.y of <__main__.C instance at 0x0000000002303508>>

Consolidating Jason Pratt’s and the community wiki answers, with a look at the results of different methods of binding:

Especially note how adding the binding function as a class method works, but the referencing scope is incorrect.

#!/usr/bin/python -u
import types
import inspect

## dynamically adding methods to a unique instance of a class


# get a list of a class's method type attributes
def listattr(c):
    for m in [(n, v) for n, v in inspect.getmembers(c, inspect.ismethod) if isinstance(v,types.MethodType)]:
        print m[0], m[1]

# externally bind a function as a method of an instance of a class
def ADDMETHOD(c, method, name):
    c.__dict__[name] = types.MethodType(method, c)

class C():
    r = 10 # class attribute variable to test bound scope

    def __init__(self):
        pass

    #internally bind a function as a method of self's class -- note that this one has issues!
    def addmethod(self, method, name):
        self.__dict__[name] = types.MethodType( method, self.__class__ )

    # predfined function to compare with
    def f0(self, x):
        print 'f0\tx = %d\tr = %d' % ( x, self.r)

a = C() # created before modified instnace
b = C() # modified instnace


def f1(self, x): # bind internally
    print 'f1\tx = %d\tr = %d' % ( x, self.r )
def f2( self, x): # add to class instance's .__dict__ as method type
    print 'f2\tx = %d\tr = %d' % ( x, self.r )
def f3( self, x): # assign to class as method type
    print 'f3\tx = %d\tr = %d' % ( x, self.r )
def f4( self, x): # add to class instance's .__dict__ using a general function
    print 'f4\tx = %d\tr = %d' % ( x, self.r )


b.addmethod(f1, 'f1')
b.__dict__['f2'] = types.MethodType( f2, b)
b.f3 = types.MethodType( f3, b)
ADDMETHOD(b, f4, 'f4')


b.f0(0) # OUT: f0   x = 0   r = 10
b.f1(1) # OUT: f1   x = 1   r = 10
b.f2(2) # OUT: f2   x = 2   r = 10
b.f3(3) # OUT: f3   x = 3   r = 10
b.f4(4) # OUT: f4   x = 4   r = 10


k = 2
print 'changing b.r from {0} to {1}'.format(b.r, k)
b.r = k
print 'new b.r = {0}'.format(b.r)

b.f0(0) # OUT: f0   x = 0   r = 2
b.f1(1) # OUT: f1   x = 1   r = 10  !!!!!!!!!
b.f2(2) # OUT: f2   x = 2   r = 2
b.f3(3) # OUT: f3   x = 3   r = 2
b.f4(4) # OUT: f4   x = 4   r = 2

c = C() # created after modifying instance

# let's have a look at each instance's method type attributes
print '\nattributes of a:'
listattr(a)
# OUT:
# attributes of a:
# __init__ <bound method C.__init__ of <__main__.C instance at 0x000000000230FD88>>
# addmethod <bound method C.addmethod of <__main__.C instance at 0x000000000230FD88>>
# f0 <bound method C.f0 of <__main__.C instance at 0x000000000230FD88>>

print '\nattributes of b:'
listattr(b)
# OUT:
# attributes of b:
# __init__ <bound method C.__init__ of <__main__.C instance at 0x000000000230FE08>>
# addmethod <bound method C.addmethod of <__main__.C instance at 0x000000000230FE08>>
# f0 <bound method C.f0 of <__main__.C instance at 0x000000000230FE08>>
# f1 <bound method ?.f1 of <class __main__.C at 0x000000000237AB28>>
# f2 <bound method ?.f2 of <__main__.C instance at 0x000000000230FE08>>
# f3 <bound method ?.f3 of <__main__.C instance at 0x000000000230FE08>>
# f4 <bound method ?.f4 of <__main__.C instance at 0x000000000230FE08>>

print '\nattributes of c:'
listattr(c)
# OUT:
# attributes of c:
# __init__ <bound method C.__init__ of <__main__.C instance at 0x0000000002313108>>
# addmethod <bound method C.addmethod of <__main__.C instance at 0x0000000002313108>>
# f0 <bound method C.f0 of <__main__.C instance at 0x0000000002313108>>

Personally, I prefer the external ADDMETHOD function route, as it allows me to dynamically assign new method names within an iterator as well.

def y(self, x):
    pass
d = C()
for i in range(1,5):
    ADDMETHOD(d, y, 'f%d' % i)
print '\nattributes of d:'
listattr(d)
# OUT:
# attributes of d:
# __init__ <bound method C.__init__ of <__main__.C instance at 0x0000000002303508>>
# addmethod <bound method C.addmethod of <__main__.C instance at 0x0000000002303508>>
# f0 <bound method C.f0 of <__main__.C instance at 0x0000000002303508>>
# f1 <bound method ?.y of <__main__.C instance at 0x0000000002303508>>
# f2 <bound method ?.y of <__main__.C instance at 0x0000000002303508>>
# f3 <bound method ?.y of <__main__.C instance at 0x0000000002303508>>
# f4 <bound method ?.y of <__main__.C instance at 0x0000000002303508>>

回答 10

这实际上是“杰森·普拉特”答案的附加内容

尽管杰森斯(Jasons)回答有效,但只有在要向类中添加函数时才起作用。当我尝试从.py源代码文件中重新加载现有方法时,它对我不起作用。

我花了很长时间才找到解决方法,但是这个技巧似乎很简单… 1.st从源代码文件导入代码2.nd强制重新加载3.rd使用types.FunctionType(…)来转换导入并绑定到函数的方法,您还可以传递当前的全局变量,因为重新加载的方法将位于不同的命名空间4.现在,您可以按照类型的“ Jason Pratt”的建议继续使用type.MethodType(… )

例:

# this class resides inside ReloadCodeDemo.py
class A:
    def bar( self ):
        print "bar1"

    def reloadCode(self, methodName):
        ''' use this function to reload any function of class A'''
        import types
        import ReloadCodeDemo as ReloadMod # import the code as module
        reload (ReloadMod) # force a reload of the module
        myM = getattr(ReloadMod.A,methodName) #get reloaded Method
        myTempFunc = types.FunctionType(# convert the method to a simple function
                                myM.im_func.func_code, #the methods code
                                globals(), # globals to use
                                argdefs=myM.im_func.func_defaults # default values for variables if any
                                ) 
        myNewM = types.MethodType(myTempFunc,self,self.__class__) #convert the function to a method
        setattr(self,methodName,myNewM) # add the method to the function

if __name__ == '__main__':
    a = A()
    a.bar()
    # now change your code and save the file
    a.reloadCode('bar') # reloads the file
    a.bar() # now executes the reloaded code

This is actually an addon to the answer of “Jason Pratt”

Although Jasons answer works, it does only work if one wants to add a function to a class. It did not work for me when I tried to reload an already existing method from the .py source code file.

It took me for ages to find a workaround, but the trick seems simple… 1.st import the code from the source code file 2.nd force a reload 3.rd use types.FunctionType(…) to convert the imported and bound method to a function you can also pass on the current global variables, as the reloaded method would be in a different namespace 4.th now you can continue as suggested by “Jason Pratt” using the types.MethodType(…)

Example:

# this class resides inside ReloadCodeDemo.py
class A:
    def bar( self ):
        print "bar1"

    def reloadCode(self, methodName):
        ''' use this function to reload any function of class A'''
        import types
        import ReloadCodeDemo as ReloadMod # import the code as module
        reload (ReloadMod) # force a reload of the module
        myM = getattr(ReloadMod.A,methodName) #get reloaded Method
        myTempFunc = types.FunctionType(# convert the method to a simple function
                                myM.im_func.func_code, #the methods code
                                globals(), # globals to use
                                argdefs=myM.im_func.func_defaults # default values for variables if any
                                ) 
        myNewM = types.MethodType(myTempFunc,self,self.__class__) #convert the function to a method
        setattr(self,methodName,myNewM) # add the method to the function

if __name__ == '__main__':
    a = A()
    a.bar()
    # now change your code and save the file
    a.reloadCode('bar') # reloads the file
    a.bar() # now executes the reloaded code

回答 11

如果有什么帮助,我最近发布了一个名为Gorilla的Python库,以使Monkey修补过程更加方便。

使用函数needle()来修补名为的模块的过程guineapig如下:

import gorilla
import guineapig
@gorilla.patch(guineapig)
def needle():
    print("awesome")

但它也需要照顾的更有趣的使用情况如图所示FAQ文档

该代码可在GitHub上获得

If it can be of any help, I recently released a Python library named Gorilla to make the process of monkey patching more convenient.

Using a function needle() to patch a module named guineapig goes as follows:

import gorilla
import guineapig
@gorilla.patch(guineapig)
def needle():
    print("awesome")

But it also takes care of more interesting use cases as shown in the FAQ from the documentation.

The code is available on GitHub.


回答 12

这个问题是几年前提出的,但是,有一种简单的方法可以使用装饰器来模拟函数与类实例的绑定:

def binder (function, instance):
  copy_of_function = type (function) (function.func_code, {})
  copy_of_function.__bind_to__ = instance
  def bound_function (*args, **kwargs):
    return copy_of_function (copy_of_function.__bind_to__, *args, **kwargs)
  return bound_function


class SupaClass (object):
  def __init__ (self):
    self.supaAttribute = 42


def new_method (self):
  print self.supaAttribute


supaInstance = SupaClass ()
supaInstance.supMethod = binder (new_method, supaInstance)

otherInstance = SupaClass ()
otherInstance.supaAttribute = 72
otherInstance.supMethod = binder (new_method, otherInstance)

otherInstance.supMethod ()
supaInstance.supMethod ()

在那里,当您将函数和实例传递给活页夹装饰器时,它将创建一个新函数,其代码对象与第一个相同。然后,该类的给定实例存储在新创建的函数的属性中。装饰器返回一个(第三个)函数,该函数自动调用复制的函数,并将实例作为第一个参数。

总之,您将获得一个模拟它绑定到类实例的函数。保留原始功能不变。

This question was opened years ago, but hey, there’s an easy way to simulate the binding of a function to a class instance using decorators:

def binder (function, instance):
  copy_of_function = type (function) (function.func_code, {})
  copy_of_function.__bind_to__ = instance
  def bound_function (*args, **kwargs):
    return copy_of_function (copy_of_function.__bind_to__, *args, **kwargs)
  return bound_function


class SupaClass (object):
  def __init__ (self):
    self.supaAttribute = 42


def new_method (self):
  print self.supaAttribute


supaInstance = SupaClass ()
supaInstance.supMethod = binder (new_method, supaInstance)

otherInstance = SupaClass ()
otherInstance.supaAttribute = 72
otherInstance.supMethod = binder (new_method, otherInstance)

otherInstance.supMethod ()
supaInstance.supMethod ()

There, when you pass the function and the instance to the binder decorator, it will create a new function, with the same code object as the first one. Then, the given instance of the class is stored in an attribute of the newly created function. The decorator return a (third) function calling automatically the copied function, giving the instance as the first parameter.

In conclusion you get a function simulating it’s binding to the class instance. Letting the original function unchanged.


回答 13

Jason Pratt发表的内容是正确的。

>>> class Test(object):
...   def a(self):
...     pass
... 
>>> def b(self):
...   pass
... 
>>> Test.b = b
>>> type(b)
<type 'function'>
>>> type(Test.a)
<type 'instancemethod'>
>>> type(Test.b)
<type 'instancemethod'>

如您所见,Python认为b()与a()没有什么不同。在Python中,所有方法只是碰巧是函数的变量。

What Jason Pratt posted is correct.

>>> class Test(object):
...   def a(self):
...     pass
... 
>>> def b(self):
...   pass
... 
>>> Test.b = b
>>> type(b)
<type 'function'>
>>> type(Test.a)
<type 'instancemethod'>
>>> type(Test.b)
<type 'instancemethod'>

As you can see, Python doesn’t consider b() any different than a(). In Python all methods are just variables that happen to be functions.


回答 14

我感到奇怪的是,没有人提到上面列出的所有方法都会在添加的方法和实例之间创建一个循环引用,从而导致对象在垃圾回收之前一直保持不变。有一个古老的技巧通过扩展对象的类来添加描述符:

def addmethod(obj, name, func):
    klass = obj.__class__
    subclass = type(klass.__name__, (klass,), {})
    setattr(subclass, name, func)
    obj.__class__ = subclass

I find it strange that nobody mentioned that all of the methods listed above creates a cycle reference between the added method and the instance, causing the object to be persistent till garbage collection. There was an old trick adding a descriptor by extending the class of the object:

def addmethod(obj, name, func):
    klass = obj.__class__
    subclass = type(klass.__name__, (klass,), {})
    setattr(subclass, name, func)
    obj.__class__ = subclass

回答 15

from types import MethodType

def method(self):
   print 'hi!'


setattr( targetObj, method.__name__, MethodType(method, targetObj, type(method)) )

有了这个,你可以使用self指针

from types import MethodType

def method(self):
   print 'hi!'


setattr( targetObj, method.__name__, MethodType(method, targetObj, type(method)) )

With this, you can use the self pointer


从字符串中删除标点符号的最佳方法

问题:从字符串中删除标点符号的最佳方法

似乎应该有一个比以下方法更简单的方法:

import string
s = "string. With. Punctuation?" # Sample string 
out = s.translate(string.maketrans("",""), string.punctuation)

在那儿?

It seems like there should be a simpler way than:

import string
s = "string. With. Punctuation?" # Sample string 
out = s.translate(string.maketrans("",""), string.punctuation)

Is there?


回答 0

从效率的角度来看,您不会被击败

s.translate(None, string.punctuation)

对于更高版本的Python,请使用以下代码:

s.translate(str.maketrans('', '', string.punctuation))

它使用查找表在C语言中执行原始字符串操作-除了编写自己的C代码之外,没有什么比这更好的了。

如果不担心速度,那么另一个选择是:

exclude = set(string.punctuation)
s = ''.join(ch for ch in s if ch not in exclude)

这比每个char的s.replace更快,但效果不如regexes或string.translate等非纯python方法,如下面的时序所示。对于这种类型的问题,在尽可能低的水平上进行操作会有所回报。

时间码:

import re, string, timeit

s = "string. With. Punctuation"
exclude = set(string.punctuation)
table = string.maketrans("","")
regex = re.compile('[%s]' % re.escape(string.punctuation))

def test_set(s):
    return ''.join(ch for ch in s if ch not in exclude)

def test_re(s):  # From Vinko's solution, with fix.
    return regex.sub('', s)

def test_trans(s):
    return s.translate(table, string.punctuation)

def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s=s.replace(c,"")
    return s

print "sets      :",timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000)
print "regex     :",timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000)
print "translate :",timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000)
print "replace   :",timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000)

得到以下结果:

sets      : 19.8566138744
regex     : 6.86155414581
translate : 2.12455511093
replace   : 28.4436721802

From an efficiency perspective, you’re not going to beat

s.translate(None, string.punctuation)

For higher versions of Python use the following code:

s.translate(str.maketrans('', '', string.punctuation))

It’s performing raw string operations in C with a lookup table – there’s not much that will beat that but writing your own C code.

If speed isn’t a worry, another option though is:

exclude = set(string.punctuation)
s = ''.join(ch for ch in s if ch not in exclude)

This is faster than s.replace with each char, but won’t perform as well as non-pure python approaches such as regexes or string.translate, as you can see from the below timings. For this type of problem, doing it at as low a level as possible pays off.

Timing code:

import re, string, timeit

s = "string. With. Punctuation"
exclude = set(string.punctuation)
table = string.maketrans("","")
regex = re.compile('[%s]' % re.escape(string.punctuation))

def test_set(s):
    return ''.join(ch for ch in s if ch not in exclude)

def test_re(s):  # From Vinko's solution, with fix.
    return regex.sub('', s)

def test_trans(s):
    return s.translate(table, string.punctuation)

def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s=s.replace(c,"")
    return s

print "sets      :",timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000)
print "regex     :",timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000)
print "translate :",timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000)
print "replace   :",timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000)

This gives the following results:

sets      : 19.8566138744
regex     : 6.86155414581
translate : 2.12455511093
replace   : 28.4436721802

回答 1

如果您知道正则表达式,就足够简单了。

import re
s = "string. With. Punctuation?"
s = re.sub(r'[^\w\s]','',s)

Regular expressions are simple enough, if you know them.

import re
s = "string. With. Punctuation?"
s = re.sub(r'[^\w\s]','',s)

回答 2

为了方便使用,我在Python 2和Python 3中总结了从字符串中删除标点符号的注意事项。有关详细说明,请参阅其他答案。


Python 2

import string

s = "string. With. Punctuation?"
table = string.maketrans("","")
new_s = s.translate(table, string.punctuation)      # Output: string without punctuation

Python 3

import string

s = "string. With. Punctuation?"
table = str.maketrans(dict.fromkeys(string.punctuation))  # OR {key: None for key in string.punctuation}
new_s = s.translate(table)                          # Output: string without punctuation

For the convenience of usage, I sum up the note of striping punctuation from a string in both Python 2 and Python 3. Please refer to other answers for the detailed description.


Python 2

import string

s = "string. With. Punctuation?"
table = string.maketrans("","")
new_s = s.translate(table, string.punctuation)      # Output: string without punctuation

Python 3

import string

s = "string. With. Punctuation?"
table = str.maketrans(dict.fromkeys(string.punctuation))  # OR {key: None for key in string.punctuation}
new_s = s.translate(table)                          # Output: string without punctuation

回答 3

myString.translate(None, string.punctuation)
myString.translate(None, string.punctuation)

回答 4

我通常使用这样的东西:

>>> s = "string. With. Punctuation?" # Sample string
>>> import string
>>> for c in string.punctuation:
...     s= s.replace(c,"")
...
>>> s
'string With Punctuation'

I usually use something like this:

>>> s = "string. With. Punctuation?" # Sample string
>>> import string
>>> for c in string.punctuation:
...     s= s.replace(c,"")
...
>>> s
'string With Punctuation'

回答 5

string.punctuation是ASCII !一种更正确(但也慢得多)的方法是使用unicodedata模块:

# -*- coding: utf-8 -*-
from unicodedata import category
s = u'String — with -  «punctation »...'
s = ''.join(ch for ch in s if category(ch)[0] != 'P')
print 'stripped', s

您也可以概括和去除其他类型的字符:

''.join(ch for ch in s if category(ch)[0] not in 'SP')

它还会~*+§$根据个人的视点去掉那些可能为“标点”或不为“标点”的字符。

string.punctuation is ASCII only! A more correct (but also much slower) way is to use the unicodedata module:

# -*- coding: utf-8 -*-
from unicodedata import category
s = u'String — with -  «punctation »...'
s = ''.join(ch for ch in s if category(ch)[0] != 'P')
print 'stripped', s

You can generalize and strip other types of characters as well:

''.join(ch for ch in s if category(ch)[0] not in 'SP')

It will also strip characters like ~*+§$ which may or may not be “punctuation” depending on one’s point of view.


回答 6

如果您对re家族更加熟悉,则不一定会更简单,但会采用另一种方式。

import re, string
s = "string. With. Punctuation?" # Sample string 
out = re.sub('[%s]' % re.escape(string.punctuation), '', s)

Not necessarily simpler, but a different way, if you are more familiar with the re family.

import re, string
s = "string. With. Punctuation?" # Sample string 
out = re.sub('[%s]' % re.escape(string.punctuation), '', s)

回答 7

对于Python 3 str或Python 2 unicode值,str.translate()只需要一个字典;在该映射中查找代码点(整数),并None删除所有映射到的代码点。

然后要删除(某些?)标点符号,请使用:

import string

remove_punct_map = dict.fromkeys(map(ord, string.punctuation))
s.translate(remove_punct_map)

使用dict.fromkeys()class方法可以轻松创建映射,并None根据键序列将所有值设置为。

要删除所有标点符号,而不仅仅是ASCII标点符号,您的表需要更大一些。参见JF Sebastian的答案(Python 3版本):

import unicodedata
import sys

remove_punct_map = dict.fromkeys(i for i in range(sys.maxunicode)
                                 if unicodedata.category(chr(i)).startswith('P'))

For Python 3 str or Python 2 unicode values, str.translate() only takes a dictionary; codepoints (integers) are looked up in that mapping and anything mapped to None is removed.

To remove (some?) punctuation then, use:

import string

remove_punct_map = dict.fromkeys(map(ord, string.punctuation))
s.translate(remove_punct_map)

The dict.fromkeys() class method makes it trivial to create the mapping, setting all values to None based on the sequence of keys.

To remove all punctuation, not just ASCII punctuation, your table needs to be a little bigger; see J.F. Sebastian’s answer (Python 3 version):

import unicodedata
import sys

remove_punct_map = dict.fromkeys(i for i in range(sys.maxunicode)
                                 if unicodedata.category(chr(i)).startswith('P'))

回答 8

string.punctuation错过了现实世界中常用的大量标点符号。一种适用于非ASCII标点的解决方案怎么样?

import regex
s = u"string. With. Some・Really Weird、Non?ASCII。 「(Punctuation)」?"
remove = regex.compile(ur'[\p{C}|\p{M}|\p{P}|\p{S}|\p{Z}]+', regex.UNICODE)
remove.sub(u" ", s).strip()

我个人认为这是从Python中的字符串中删除标点符号的最佳方法,因为:

  • 删除所有Unicode标点符号
  • 它很容易修改,例如,\{S}如果要删除标点符号,则可以将其删除,但要保留诸如$
  • 您可以真正确定要保留的内容和要删除的内容,例如\{Pd}仅删除破折号。
  • 此正则表达式还规范了空格。它将制表符,回车符和其他奇数映射到漂亮的单个空格。

它使用Unicode字符属性,您可以在Wikipedia上了解更多信息

string.punctuation misses loads of punctuation marks that are commonly used in the real world. How about a solution that works for non-ASCII punctuation?

import regex
s = u"string. With. Some・Really Weird、Non?ASCII。 「(Punctuation)」?"
remove = regex.compile(ur'[\p{C}|\p{M}|\p{P}|\p{S}|\p{Z}]+', regex.UNICODE)
remove.sub(u" ", s).strip()

Personally, I believe this is the best way to remove punctuation from a string in Python because:

  • It removes all Unicode punctuation
  • It’s easily modifiable, e.g. you can remove the \{S} if you want to remove punctuation, but keep symbols like $.
  • You can get really specific about what you want to keep and what you want to remove, for example \{Pd} will only remove dashes.
  • This regex also normalizes whitespace. It maps tabs, carriage returns, and other oddities to nice, single spaces.

This uses Unicode character properties, which you can read more about on Wikipedia.


回答 9

我还没有看到这个答案。只需使用正则表达式即可;它会删除单词字符(\w)和数字字符(\d)之外的所有字符,然后删除空格字符(\s):

import re
s = "string. With. Punctuation?" # Sample string 
out = re.sub(ur'[^\w\d\s]+', '', s)

I haven’t seen this answer yet. Just use a regex; it removes all characters besides word characters (\w) and number characters (\d), followed by a whitespace character (\s):

import re
s = "string. With. Punctuation?" # Sample string 
out = re.sub(ur'[^\w\d\s]+', '', s)

回答 10

这是Python 3.5的一线式:

import string
"l*ots! o(f. p@u)n[c}t]u[a'ti\"on#$^?/".translate(str.maketrans({a:None for a in string.punctuation}))

Here’s a one-liner for Python 3.5:

import string
"l*ots! o(f. p@u)n[c}t]u[a'ti\"on#$^?/".translate(str.maketrans({a:None for a in string.punctuation}))

回答 11

这可能不是最佳解决方案,但是这就是我的方法。

import string
f = lambda x: ''.join([i for i in x if i not in string.punctuation])

This might not be the best solution however this is how I did it.

import string
f = lambda x: ''.join([i for i in x if i not in string.punctuation])

回答 12

这是我编写的函数。它不是很有效,但是很简单,您可以添加或删除所需的标点符号:

def stripPunc(wordList):
    """Strips punctuation from list of words"""
    puncList = [".",";",":","!","?","/","\\",",","#","@","$","&",")","(","\""]
    for punc in puncList:
        for word in wordList:
            wordList=[word.replace(punc,'') for word in wordList]
    return wordList

Here is a function I wrote. It’s not very efficient, but it is simple and you can add or remove any punctuation that you desire:

def stripPunc(wordList):
    """Strips punctuation from list of words"""
    puncList = [".",";",":","!","?","/","\\",",","#","@","$","&",")","(","\""]
    for punc in puncList:
        for word in wordList:
            wordList=[word.replace(punc,'') for word in wordList]
    return wordList

回答 13

import re
s = "string. With. Punctuation?" # Sample string 
out = re.sub(r'[^a-zA-Z0-9\s]', '', s)
import re
s = "string. With. Punctuation?" # Sample string 
out = re.sub(r'[^a-zA-Z0-9\s]', '', s)

回答 14

作为更新,我重写了Python 3中的@Brian示例并对其进行了更改,以将regex编译步骤移至函数内部。我的想法是计时使该功能起作用所需的每个步骤。也许您使用的是分布式计算,并且您的工作人员之间无法共享正则表达式对象,因此需要re.compile在每个工作人员中走一步。另外,我很好奇地为Python 3的maketrans的两种不同实现计时了

table = str.maketrans({key: None for key in string.punctuation})

table = str.maketrans('', '', string.punctuation)

另外,我添加了另一种使用set的方法,其中利用了交集函数来减少迭代次数。

这是完整的代码:

import re, string, timeit

s = "string. With. Punctuation"


def test_set(s):
    exclude = set(string.punctuation)
    return ''.join(ch for ch in s if ch not in exclude)


def test_set2(s):
    _punctuation = set(string.punctuation)
    for punct in set(s).intersection(_punctuation):
        s = s.replace(punct, ' ')
    return ' '.join(s.split())


def test_re(s):  # From Vinko's solution, with fix.
    regex = re.compile('[%s]' % re.escape(string.punctuation))
    return regex.sub('', s)


def test_trans(s):
    table = str.maketrans({key: None for key in string.punctuation})
    return s.translate(table)


def test_trans2(s):
    table = str.maketrans('', '', string.punctuation)
    return(s.translate(table))


def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s=s.replace(c,"")
    return s


print("sets      :",timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000))
print("sets2      :",timeit.Timer('f(s)', 'from __main__ import s,test_set2 as f').timeit(1000000))
print("regex     :",timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000))
print("translate :",timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000))
print("translate2 :",timeit.Timer('f(s)', 'from __main__ import s,test_trans2 as f').timeit(1000000))
print("replace   :",timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000))

这是我的结果:

sets      : 3.1830138750374317
sets2      : 2.189873124472797
regex     : 7.142953420989215
translate : 4.243278483860195
translate2 : 2.427158243022859
replace   : 4.579746678471565

Just as an update, I rewrote the @Brian example in Python 3 and made changes to it to move regex compile step inside of the function. My thought here was to time every single step needed to make the function work. Perhaps you are using distributed computing and can’t have regex object shared between your workers and need to have re.compile step at each worker. Also, I was curious to time two different implementations of maketrans for Python 3

table = str.maketrans({key: None for key in string.punctuation})

vs

table = str.maketrans('', '', string.punctuation)

Plus I added another method to use set, where I take advantage of intersection function to reduce number of iterations.

This is the complete code:

import re, string, timeit

s = "string. With. Punctuation"


def test_set(s):
    exclude = set(string.punctuation)
    return ''.join(ch for ch in s if ch not in exclude)


def test_set2(s):
    _punctuation = set(string.punctuation)
    for punct in set(s).intersection(_punctuation):
        s = s.replace(punct, ' ')
    return ' '.join(s.split())


def test_re(s):  # From Vinko's solution, with fix.
    regex = re.compile('[%s]' % re.escape(string.punctuation))
    return regex.sub('', s)


def test_trans(s):
    table = str.maketrans({key: None for key in string.punctuation})
    return s.translate(table)


def test_trans2(s):
    table = str.maketrans('', '', string.punctuation)
    return(s.translate(table))


def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s=s.replace(c,"")
    return s


print("sets      :",timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000))
print("sets2      :",timeit.Timer('f(s)', 'from __main__ import s,test_set2 as f').timeit(1000000))
print("regex     :",timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000))
print("translate :",timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000))
print("translate2 :",timeit.Timer('f(s)', 'from __main__ import s,test_trans2 as f').timeit(1000000))
print("replace   :",timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000))

This is my results:

sets      : 3.1830138750374317
sets2      : 2.189873124472797
regex     : 7.142953420989215
translate : 4.243278483860195
translate2 : 2.427158243022859
replace   : 4.579746678471565

回答 15

>>> s = "string. With. Punctuation?"
>>> s = re.sub(r'[^\w\s]','',s)
>>> re.split(r'\s*', s)


['string', 'With', 'Punctuation']
>>> s = "string. With. Punctuation?"
>>> s = re.sub(r'[^\w\s]','',s)
>>> re.split(r'\s*', s)


['string', 'With', 'Punctuation']

回答 16

这是没有正则表达式的解决方案。

import string

input_text = "!where??and!!or$$then:)"
punctuation_replacer = string.maketrans(string.punctuation, ' '*len(string.punctuation))    
print ' '.join(input_text.translate(punctuation_replacer).split()).strip()

Output>> where and or then
  • 用空格替换标点符号
  • 用单个空格替换单词之间的多个空格
  • 如果有strip(),请删除尾随空格

Here’s a solution without regex.

import string

input_text = "!where??and!!or$$then:)"
punctuation_replacer = string.maketrans(string.punctuation, ' '*len(string.punctuation))    
print ' '.join(input_text.translate(punctuation_replacer).split()).strip()

Output>> where and or then
  • Replaces the punctuations with spaces
  • Replace multiple spaces in between words with a single space
  • Remove the trailing spaces, if any with strip()

回答 17

在不太严格的情况下,单线可能会有所帮助:

''.join([c for c in s if c.isalnum() or c.isspace()])

A one-liner might be helpful in not very strict cases:

''.join([c for c in s if c.isalnum() or c.isspace()])

回答 18

#FIRST METHOD
#Storing all punctuations in a variable    
punctuation='!?,.:;"\')(_-'
newstring='' #Creating empty string
word=raw_input("Enter string: ")
for i in word:
     if(i not in punctuation):
                  newstring+=i
print "The string without punctuation is",newstring

#SECOND METHOD
word=raw_input("Enter string: ")
punctuation='!?,.:;"\')(_-'
newstring=word.translate(None,punctuation)
print "The string without punctuation is",newstring


#Output for both methods
Enter string: hello! welcome -to_python(programming.language)??,
The string without punctuation is: hello welcome topythonprogramminglanguage
#FIRST METHOD
#Storing all punctuations in a variable    
punctuation='!?,.:;"\')(_-'
newstring='' #Creating empty string
word=raw_input("Enter string: ")
for i in word:
     if(i not in punctuation):
                  newstring+=i
print "The string without punctuation is",newstring

#SECOND METHOD
word=raw_input("Enter string: ")
punctuation='!?,.:;"\')(_-'
newstring=word.translate(None,punctuation)
print "The string without punctuation is",newstring


#Output for both methods
Enter string: hello! welcome -to_python(programming.language)??,
The string without punctuation is: hello welcome topythonprogramminglanguage

回答 19

with open('one.txt','r')as myFile:

    str1=myFile.read()

    print(str1)


    punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"] 

for i in punctuation:

        str1 = str1.replace(i," ") 
        myList=[]
        myList.extend(str1.split(" "))
print (str1) 
for i in myList:

    print(i,end='\n')
    print ("____________")
with open('one.txt','r')as myFile:

    str1=myFile.read()

    print(str1)


    punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"] 

for i in punctuation:

        str1 = str1.replace(i," ") 
        myList=[]
        myList.extend(str1.split(" "))
print (str1) 
for i in myList:

    print(i,end='\n')
    print ("____________")

回答 20

为什么你们没人使用这个?

 ''.join(filter(str.isalnum, s)) 

太慢了?

Why none of you use this?

 ''.join(filter(str.isalnum, s)) 

Too slow?


回答 21

考虑unicode。代码在python3中检查。

from unicodedata import category
text = 'hi, how are you?'
text_without_punc = ''.join(ch for ch in text if not category(ch).startswith('P'))

Considering unicode. Code checked in python3.

from unicodedata import category
text = 'hi, how are you?'
text_without_punc = ''.join(ch for ch in text if not category(ch).startswith('P'))

回答 22

使用Python从文本文件中删除停用词

print('====THIS IS HOW TO REMOVE STOP WORS====')

with open('one.txt','r')as myFile:

    str1=myFile.read()

    stop_words ="not", "is", "it", "By","between","This","By","A","when","And","up","Then","was","by","It","If","can","an","he","This","or","And","a","i","it","am","at","on","in","of","to","is","so","too","my","the","and","but","are","very","here","even","from","them","then","than","this","that","though","be","But","these"

    myList=[]

    myList.extend(str1.split(" "))

    for i in myList:

        if i not in stop_words:

            print ("____________")

            print(i,end='\n')

Remove stop words from the text file using Python

print('====THIS IS HOW TO REMOVE STOP WORS====')

with open('one.txt','r')as myFile:

    str1=myFile.read()

    stop_words ="not", "is", "it", "By","between","This","By","A","when","And","up","Then","was","by","It","If","can","an","he","This","or","And","a","i","it","am","at","on","in","of","to","is","so","too","my","the","and","but","are","very","here","even","from","them","then","than","this","that","though","be","But","these"

    myList=[]

    myList.extend(str1.split(" "))

    for i in myList:

        if i not in stop_words:

            print ("____________")

            print(i,end='\n')

回答 23

我喜欢使用这样的功能:

def scrub(abc):
    while abc[-1] is in list(string.punctuation):
        abc=abc[:-1]
    while abc[0] is in list(string.punctuation):
        abc=abc[1:]
    return abc

I like to use a function like this:

def scrub(abc):
    while abc[-1] is in list(string.punctuation):
        abc=abc[:-1]
    while abc[0] is in list(string.punctuation):
        abc=abc[1:]
    return abc

iloc,ix和loc有何不同?

问题:iloc,ix和loc有何不同?

有人可以解释这三种切片方法有何不同吗?
我看过文档,也看过这些 答案,但仍然发现自己无法解释这三者之间的区别。在我看来,它们在很大程度上似乎是可互换的,因为它们处于切片的较低级别。

例如,假设我们要获取的前五行DataFrame。这三者如何运作?

df.loc[:5]
df.ix[:5]
df.iloc[:5]

有人可以提出三种用法之间的区别更清楚的情况吗?

Can someone explain how these three methods of slicing are different?
I’ve seen the docs, and I’ve seen these answers, but I still find myself unable to explain how the three are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.

For example, say we want to get the first five rows of a DataFrame. How is it that all three of these work?

df.loc[:5]
df.ix[:5]
df.iloc[:5]

Can someone present three cases where the distinction in uses are clearer?


回答 0

注意:在熊猫版本0.20.0及更高版本中,ix弃用,建议改为使用lociloc。我留下了ix完整的答案部分,以供早期版本的熊猫用户参考。下面添加了示例,显示了的替代方案 ix


首先,以下是三种方法的概述:

  • loc从索引中获取带有特定标签的行(或列)。
  • iloc在索引中的特定位置获取行(或列)(因此仅获取整数)。
  • ix通常会尝试表现得像,lociloc如果索引中没有标签,则会回落为行为。

重要的是要注意一些细微之处,这些细微之处可能会使ix使用起来有些棘手:

  • 如果索引是整数类型,ix则将仅使用基于标签的索引,而不会使用基于位置的索引。如果标签不在索引中,则会引发错误。

  • 如果指数不包含唯一整数,然后给出一个整数,ix将立即使用基于位置的索引,而不是基于标签的索引。但是,如果ix给定其他类型(例如字符串),则可以使用基于标签的索引。


为了说明这三种方法之间的差异,请考虑以下系列:

>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN

我们将看看用整数值切片3

在这种情况下,向s.iloc[:3]我们返回前3行(因为它将3视为位置),并向s.loc[:3]我们返回前8行(因为将3视为标签):

>>> s.iloc[:3] # slice the first three rows
49   NaN
48   NaN
47   NaN

>>> s.loc[:3] # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

注意s.ix[:3]s.loc[:3]由于它首先查找标签,而不是在位置上工作(因此,其索引为s整数类型),因此Notification 返回相同的Series 。

如果我们尝试使用不在索引中的整数标签(例如6)怎么办?

此处s.iloc[:6]按预期返回Series的前6行。但是,s.loc[:6]由于6不在索引中,所以引发KeyError 。

>>> s.iloc[:6]
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN

>>> s.loc[:6]
KeyError: 6

>>> s.ix[:6]
KeyError: 6

根据上面提到的细微之处,s.ix[:6]现在引发KeyError,因为它试图像在索引中loc找到一个那样工作,但找不到它6。因为我们的索引是整数类型,ix所以不会回落为iloc

但是,如果我们的索引为混合类型,则给定的整数ixiloc立即表现出来,而不是引发KeyError:

>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is mix of different types
True
>>> s2.ix[:6] # now behaves like iloc given integer
a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
1   NaN

请记住,ix它仍然可以接受非整数并表现为loc

>>> s2.ix[:'c'] # behaves like loc given non-integer
a   NaN
b   NaN
c   NaN

作为一般建议,如果您仅使用标签建立索引,或者仅使用整数位置建立索引,请坚持使用lociloc避免出现意外结果-请勿使用ix


结合基于位置和基于标签的索引

有时在给定DataFrame的情况下,您将需要为行和列混合使用标签和位置索引方法。

例如,考虑以下DataFrame。如何最好地将行切成“ c” 包括前四列?

>>> df = pd.DataFrame(np.nan, 
                      index=list('abcde'),
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a NaN NaN NaN NaN NaN
b NaN NaN NaN NaN NaN
c NaN NaN NaN NaN NaN
d NaN NaN NaN NaN NaN
e NaN NaN NaN NaN NaN

在早期版本的pandas(0.20.0之前)中ix,您可以整齐地进行此操作-我们可以按标签对行进行切片,按位置对列进行切片(请注意,对于列,ix由于4不是列名,因此默认为基于位置的切片 ):

>>> df.ix[:'c', :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

在更高版本的熊猫中,我们可以使用iloc并借助另一种方法来获得此结果:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

get_loc()是一种索引方法,意思是“获取标签在此索引中的位置”。请注意,由于切片与iloc不包含其端点,因此如果还要行’c’,则必须在此值上加1。

此处的熊猫文档中还有其他示例。

Note: in pandas version 0.20.0 and above, ix is deprecated and the use of loc and iloc is encouraged instead. I have left the parts of this answer that describe ix intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to ix.


First, here’s a recap of the three methods:

  • loc gets rows (or columns) with particular labels from the index.
  • iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
  • ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index.

It’s important to note some subtleties that can make ix slightly tricky to use:

  • if the index is of integer type, ix will only use label-based indexing and not fall back to position-based indexing. If the label is not in the index, an error is raised.

  • if the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If however ix is given another type (e.g. a string), it can use label-based indexing.


To illustrate the differences between the three methods, consider the following Series:

>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN

We’ll look at slicing with the integer value 3.

In this case, s.iloc[:3] returns us the first 3 rows (since it treats 3 as a position) and s.loc[:3] returns us the first 8 rows (since it treats 3 as a label):

>>> s.iloc[:3] # slice the first three rows
49   NaN
48   NaN
47   NaN

>>> s.loc[:3] # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

Notice s.ix[:3] returns the same Series as s.loc[:3] since it looks for the label first rather than working on the position (and the index for s is of integer type).

What if we try with an integer label that isn’t in the index (say 6)?

Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index.

>>> s.iloc[:6]
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN

>>> s.loc[:6]
KeyError: 6

>>> s.ix[:6]
KeyError: 6

As per the subtleties noted above, s.ix[:6] now raises a KeyError because it tries to work like loc but can’t find a 6 in the index. Because our index is of integer type ix doesn’t fall back to behaving like iloc.

If, however, our index was of mixed type, given an integer ix would behave like iloc immediately instead of raising a KeyError:

>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is mix of different types
True
>>> s2.ix[:6] # now behaves like iloc given integer
a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
1   NaN

Keep in mind that ix can still accept non-integers and behave like loc:

>>> s2.ix[:'c'] # behaves like loc given non-integer
a   NaN
b   NaN
c   NaN

As general advice, if you’re only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results – try not use ix.


Combining position-based and label-based indexing

Sometimes given a DataFrame, you will want to mix label and positional indexing methods for the rows and columns.

For example, consider the following DataFrame. How best to slice the rows up to and including ‘c’ and take the first four columns?

>>> df = pd.DataFrame(np.nan, 
                      index=list('abcde'),
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a NaN NaN NaN NaN NaN
b NaN NaN NaN NaN NaN
c NaN NaN NaN NaN NaN
d NaN NaN NaN NaN NaN
e NaN NaN NaN NaN NaN

In earlier versions of pandas (before 0.20.0) ix lets you do this quite neatly – we can slice the rows by label and the columns by position (note that for the columns, ix will default to position-based slicing since 4 is not a column name):

>>> df.ix[:'c', :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

In later versions of pandas, we can achieve this result using iloc and the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

get_loc() is an index method meaning “get the position of the label in this index”. Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row ‘c’ as well.

There are further examples in pandas’ documentation here.


回答 1

iloc基于整数定位工作。因此,无论您的行标签是什么,您都可以始终执行以下操作:

df.iloc[0]

或最后五行

df.iloc[-5:]

您也可以在列上使用它。这将检索第三列:

df.iloc[:, 2]    # the : in the first position indicates all rows

您可以将它们结合起来以获得行和列的交集:

df.iloc[:3, :3] # The upper-left 3 X 3 entries (assuming df has 3+ rows and columns)

另一方面,.loc使用命名索引。让我们设置一个带有字符串作为行和列标签的数据框:

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])

然后我们可以得到第一行

df.loc['a']     # equivalent to df.iloc[0]

和第二两排的'date'柱通过

df.loc['b':, 'date']   # equivalent to df.iloc[1:, 1]

等等。现在,可能值得指出的是,a的默认行和列索引DataFrame是从0开始的整数,在这种情况下iloc,它们的loc工作方式相同。这就是为什么您的三个示例是等效的。如果您有非数字索引(例如字符串或日期时间), df.loc[:5] 则会引发错误。

另外,您可以仅使用数据框的进行列检索__getitem__

df['time']    # equivalent to df.loc[:, 'time']

现在假设您要混合使用位置索引和命名索引,即使用行上的名称和列上的位置进行索引(为澄清起见,我的意思是从我们的数据框中选择内容,而不是使用行索引中包含字符串和整数的方式创建数据框列索引)。这是.ix进来的地方:

df.ix[:2, 'time']    # the first two rows of the 'time' column

我认为也值得一提的是,您也可以将布尔向量传递给该loc方法。例如:

 b = [True, False, True]
 df.loc[b] 

将返回的第一行和第三行df。这等效df[b]于选择,但也可以用于通过布尔向量进行分配:

df.loc[b, 'name'] = 'Mary', 'John'

iloc works based on integer positioning. So no matter what your row labels are, you can always, e.g., get the first row by doing

df.iloc[0]

or the last five rows by doing

df.iloc[-5:]

You can also use it on the columns. This retrieves the 3rd column:

df.iloc[:, 2]    # the : in the first position indicates all rows

You can combine them to get intersections of rows and columns:

df.iloc[:3, :3] # The upper-left 3 X 3 entries (assuming df has 3+ rows and columns)

On the other hand, .loc use named indices. Let’s set up a data frame with strings as row and column labels:

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])

Then we can get the first row by

df.loc['a']     # equivalent to df.iloc[0]

and the second two rows of the 'date' column by

df.loc['b':, 'date']   # equivalent to df.iloc[1:, 1]

and so on. Now, it’s probably worth pointing out that the default row and column indices for a DataFrame are integers from 0 and in this case iloc and loc would work in the same way. This is why your three examples are equivalent. If you had a non-numeric index such as strings or datetimes, df.loc[:5] would raise an error.

Also, you can do column retrieval just by using the data frame’s __getitem__:

df['time']    # equivalent to df.loc[:, 'time']

Now suppose you want to mix position and named indexing, that is, indexing using names on rows and positions on columns (to clarify, I mean select from our data frame, rather than creating a data frame with strings in the row index and integers in the column index). This is where .ix comes in:

df.ix[:2, 'time']    # the first two rows of the 'time' column

I think it’s also worth mentioning that you can pass boolean vectors to the loc method as well. For example:

 b = [True, False, True]
 df.loc[b] 

Will return the 1st and 3rd rows of df. This is equivalent to df[b] for selection, but it can also be used for assigning via boolean vectors:

df.loc[b, 'name'] = 'Mary', 'John'

回答 2

我认为,可接受的答案令人困惑,因为它使用仅缺少值的DataFrame。我也不喜欢术语基于位置.iloc,相反,喜欢整数位置,因为它是更描述性,正是.iloc代表。关键字是.ilocINTEGER-需要INTEGERS。

请参阅我关于子集选择的非常详细的博客系列,以了解更多信息


.ix已弃用且含糊不清,切勿使用

由于.ix已弃用,因此我们仅关注.loc和之间的差异.iloc

在讨论差异之前,重要的是要了解DataFrames具有标签,这些标签可帮助标识每个列和每个索引。让我们看一个示例DataFrame:

df = pd.DataFrame({'age':[30, 2, 12, 4, 32, 33, 69],
                   'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height':[165, 70, 120, 80, 180, 172, 150],
                   'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

所有粗体字均为标签。标签,agecolorfoodheightscorestate被用于。其他标签,JaneNickAaronPenelopeDeanChristinaCornelia被用于索引


在DataFrame中选择特定行的主要方法是使用.loc.iloc索引器。这些索引器中的每一个也可以用于同时选择列,但是现在只关注行更容易。同样,每个索引器都使用紧跟其名称的一组括号进行选择。

.loc仅通过标签选择数据

我们将首先讨论.loc仅通过索引或列标签选择数据的索引器。在示例DataFrame中,我们提供了有意义的名称作为索引值。许多DataFrame都没有任何有意义的名称,而是默认为0到n-1之间的整数,其中n是DataFrame的长度。

您可以使用三种不同的输入 .loc

  • 一串
  • 字符串列表
  • 使用字符串作为起始值和终止值的切片符号

用带字符串的.loc选择单行

要选择一行数据,请将索引标签放在后面的括号内.loc

df.loc['Penelope']

这将数据行作为系列返回

age           4
color     white
food      Apple
height       80
score       3.3
state        AL
Name: Penelope, dtype: object

使用.loc与字符串列表选择多行

df.loc[['Cornelia', 'Jane', 'Dean']]

这将返回一个DataFrame,其中的数据行按列表中指定的顺序进行:

使用带有切片符号的.loc选择多行

切片符号由开始,停止和步进值定义。按标签切片时,大熊猫在返回值中包含停止值。以下是从亚伦到迪恩(含)的片段。它的步长未明确定义,但默认为1。

df.loc['Aaron':'Dean']

可以采用与Python列表相同的方式获取复杂的切片。

.iloc仅按整数位置选择数据

现在转到.iloc。DataFrame中数据的每一行和每一列都有一个定义它的整数位置。这是在输出中直观显示的标签的补充。整数位置只是从0开始从顶部/左侧开始的行/列数。

您可以使用三种不同的输入 .iloc

  • 一个整数
  • 整数列表
  • 使用整数作为起始值和终止值的切片符号

用带整数的.iloc选择单行

df.iloc[4]

这将返回第5行(整数位置4)为系列

age           32
color       gray
food      Cheese
height       180
score        1.8
state         AK
Name: Dean, dtype: object

用.iloc选择带有整数列表的多行

df.iloc[[2, -2]]

这将返回第三行和倒数第二行的DataFrame:

使用带切片符号的.iloc选择多行

df.iloc[:5:3]


使用.loc和.iloc同时选择行和列

两者的一项出色功能.loc/.iloc是它们可以同时选择行和列。在上面的示例中,所有列都是从每个选择中返回的。我们可以选择输入类型与行相同的列。我们只需要用逗号分隔行和列选择即可。

例如,我们可以选择Jane行和Dean行,它们的高度,得分和状态如下:

df.loc[['Jane', 'Dean'], 'height':]

这对行使用标签列表,对列使用切片符号

我们自然可以.iloc只使用整数来执行类似的操作。

df.iloc[[1,4], 2]
Nick      Lamb
Dean    Cheese
Name: food, dtype: object

带标签和整数位置的同时选择

.ix用来与标签和整数位置同时进行选择,这很有用,但有时会造成混淆和模棱两可,值得庆幸的是,它已被弃用。如果您需要混合使用标签和整数位置进行选择,则必须同时选择标签或整数位置。

例如,如果我们要选择行Nick以及第Cornelia2列和第4列,则可以.loc通过以下方式将整数转换为标签来使用:

col_names = df.columns[[2, 4]]
df.loc[['Nick', 'Cornelia'], col_names] 

或者,可以使用get_locindex方法将索引标签转换为整数。

labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]

布尔选择

.loc索引器还可以进行布尔选择。例如,如果我们有兴趣查找年龄在30岁以上的所有行,并仅返回foodscore列,则可以执行以下操作:

df.loc[df['age'] > 30, ['food', 'score']] 

您可以使用复制它,.iloc但是不能将其传递为布尔系列。您必须将boolean Series转换为numpy数组,如下所示:

df.iloc[(df['age'] > 30).values, [2, 4]] 

选择所有行

可以.loc/.iloc仅用于列选择。您可以使用如下冒号来选择所有行:

df.loc[:, 'color':'score':2]


索引运算符[]可以选择行和列,但不能同时选择。

大多数人都熟悉DataFrame索引运算符的主要目的,即选择列。字符串选择单个列作为系列,而字符串列表选择多个列作为DataFrame。

df['food']

Jane          Steak
Nick           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object

使用列表选择多个列

df[['food', 'score']]

人们所不熟悉的是,当使用切片符号时,选择是通过行标签或整数位置进行的。这非常令人困惑,我几乎从未使用过,但是确实可以使用。

df['Penelope':'Christina'] # slice rows by label

df[2:6:2] # slice rows by integer location

.loc/.iloc选择行的显式性是高度首选的。单独的索引运算符无法同时选择行和列。

df[3:5, 'color']
TypeError: unhashable type: 'slice'

In my opinion, the accepted answer is confusing, since it uses a DataFrame with only missing values. I also do not like the term position-based for .iloc and instead, prefer integer location as it is much more descriptive and exactly what .iloc stands for. The key word is INTEGER – .iloc needs INTEGERS.

See my extremely detailed blog series on subset selection for more


.ix is deprecated and ambiguous and should never be used

Because .ix is deprecated we will only focus on the differences between .loc and .iloc.

Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each index. Let’s take a look at a sample DataFrame:

df = pd.DataFrame({'age':[30, 2, 12, 4, 32, 33, 69],
                   'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height':[165, 70, 120, 80, 180, 172, 150],
                   'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']
                   },
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

All the words in bold are the labels. The labels, age, color, food, height, score and state are used for the columns. The other labels, Jane, Nick, Aaron, Penelope, Dean, Christina, Cornelia are used for the index.


The primary ways to select particular rows in a DataFrame are with the .loc and .iloc indexers. Each of these indexers can also be used to simultaneously select columns but it is easier to just focus on rows for now. Also, each of the indexers use a set of brackets that immediately follow their name to make their selections.

.loc selects data only by labels

We will first talk about the .loc indexer which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead, default to just the integers from 0 to n-1, where n is the length of the DataFrame.

There are three different inputs you can use for .loc

  • A string
  • A list of strings
  • Slice notation using strings as the start and stop values

Selecting a single row with .loc with a string

To select a single row of data, place the index label inside of the brackets following .loc.

df.loc['Penelope']

This returns the row of data as a Series

age           4
color     white
food      Apple
height       80
score       3.3
state        AL
Name: Penelope, dtype: object

Selecting multiple rows with .loc with a list of strings

df.loc[['Cornelia', 'Jane', 'Dean']]

This returns a DataFrame with the rows in the order specified in the list:

Selecting multiple rows with .loc with slice notation

Slice notation is defined by a start, stop and step values. When slicing by label, pandas includes the stop value in the return. The following slices from Aaron to Dean, inclusive. Its step size is not explicitly defined but defaulted to 1.

df.loc['Aaron':'Dean']

Complex slices can be taken in the same manner as Python lists.

.iloc selects data only by integer location

Let’s now turn to .iloc. Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left beginning at 0.

There are three different inputs you can use for .iloc

  • An integer
  • A list of integers
  • Slice notation using integers as the start and stop values

Selecting a single row with .iloc with an integer

df.iloc[4]

This returns the 5th row (integer location 4) as a Series

age           32
color       gray
food      Cheese
height       180
score        1.8
state         AK
Name: Dean, dtype: object

Selecting multiple rows with .iloc with a list of integers

df.iloc[[2, -2]]

This returns a DataFrame of the third and second to last rows:

Selecting multiple rows with .iloc with slice notation

df.iloc[:5:3]


Simultaneous selection of rows and columns with .loc and .iloc

One excellent ability of both .loc/.iloc is their ability to select both rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows. We simply need to separate the row and column selection with a comma.

For example, we can select rows Jane, and Dean with just the columns height, score and state like this:

df.loc[['Jane', 'Dean'], 'height':]

This uses a list of labels for the rows and slice notation for the columns

We can naturally do similar operations with .iloc using only integers.

df.iloc[[1,4], 2]
Nick      Lamb
Dean    Cheese
Name: food, dtype: object

Simultaneous selection with labels and integer location

.ix was used to make selections simultaneously with labels and integer location which was useful but confusing and ambiguous at times and thankfully it has been deprecated. In the event that you need to make a selection with a mix of labels and integer locations, you will have to make both your selections labels or integer locations.

For instance, if we want to select rows Nick and Cornelia along with columns 2 and 4, we could use .loc by converting the integers to labels with the following:

col_names = df.columns[[2, 4]]
df.loc[['Nick', 'Cornelia'], col_names] 

Or alternatively, convert the index labels to integers with the get_loc index method.

labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]

Boolean Selection

The .loc indexer can also do boolean selection. For instance, if we are interested in finding all the rows wher age is above 30 and return just the food and score columns we can do the following:

df.loc[df['age'] > 30, ['food', 'score']] 

You can replicate this with .iloc but you cannot pass it a boolean series. You must convert the boolean Series into a numpy array like this:

df.iloc[(df['age'] > 30).values, [2, 4]] 

Selecting all rows

It is possible to use .loc/.iloc for just column selection. You can select all the rows by using a colon like this:

df.loc[:, 'color':'score':2]


The indexing operator, [], can select rows and columns too but not simultaneously.

Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.

df['food']

Jane          Steak
Nick           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object

Using a list selects multiple columns

df[['food', 'score']]

What people are less familiar with, is that, when slice notation is used, then selection happens by row labels or by integer location. This is very confusing and something that I almost never use but it does work.

df['Penelope':'Christina'] # slice rows by label

df[2:6:2] # slice rows by integer location

The explicitness of .loc/.iloc for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously.

df[3:5, 'color']
TypeError: unhashable type: 'slice'