







I have a log file being written by another process which I want to watch for changes. Each time a change occurs I’d like to read the new data in to do some processing on it.

What’s the best way to do this? I was hoping there’d be some sort of hook from the PyWin32 library. I’ve found the win32file.FindNextChangeNotification function but have no idea how to ask it to watch a specific file.

If anyone’s done anything like this I’d be really grateful to hear how…

[Edit] I should have mentioned that I was after a solution that doesn’t require polling.

[Edit] Curses! It seems this doesn’t work over a mapped network drive. I’m guessing Windows doesn’t ‘hear’ any updates to the file the way it does on a local disk.

Answer 0




Have you already looked at the documentation available on http://timgolden.me.uk/python/win32_how_do_i/watch_directory_for_changes.html? If you only need it to work under Windows the 2nd example seems to be exactly what you want (if you exchange the path of the directory with the one of the file you want to watch).

Otherwise, polling will probably be the only really platform-independent option.

Note: I haven’t tried any of these solutions.

Answer 1


Did you try using Watchdog?

Python API library and shell utilities to monitor file system events.

Directory monitoring made easy with

  • A cross-platform API.
  • A shell tool to run commands in response to directory changes.

Get started quickly with a simple example in Quickstart

Answer 2





If polling is good enough for you, I’d just watch whether the “modified time” file stat changes. To read it: os.stat(filename).st_mtime

(Also note that the Windows native change event solution does not work in all circumstances, e.g. on network drives.)

import os

class Monkey(object):
    def __init__(self):
        self._cached_stamp = 0
        self.filename = '/path/to/file'

    def ook(self):
        stamp = os.stat(self.filename).st_mtime
        if stamp != self._cached_stamp:
            self._cached_stamp = stamp
            # File has changed, so do something...
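
The question asks for the newly written data, not only a notification. A minimal sketch of how the polling idea could be extended to return just the appended bytes, assuming the log only grows (rotation and truncation are not handled). The names LogPoller and poll are illustrative and do not come from the answer above:

```python
import os


class LogPoller(object):
    """Poll a file and return only the data appended since the last poll."""

    def __init__(self, filename):
        self.filename = filename
        self._cached_stamp = None  # (mtime, size) seen on the previous poll
        self._offset = 0           # number of bytes already consumed

    def poll(self):
        st = os.stat(self.filename)
        stamp = (st.st_mtime, st.st_size)
        if stamp == self._cached_stamp:
            return ""  # nothing changed since the last poll
        self._cached_stamp = stamp
        # Read in binary mode so the offset is an exact byte count.
        with open(self.filename, "rb") as f:
            f.seek(self._offset)
            data = f.read()
        self._offset += len(data)
        return data.decode("utf-8", errors="replace")
```

Comparing the size as well as st_mtime guards against two writes landing within the same timestamp tick. Call poll() from a loop with a short time.sleep() between calls.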

Answer 3



If you want a multiplatform solution, then check out QFileSystemWatcher. Here is some example code (not sanitized):

from PyQt4 import QtCore

def directory_changed(path):
    print('Directory Changed!!!')

def file_changed(path):
    print('File Changed!!!')

fs_watcher = QtCore.QFileSystemWatcher(['/path/to/files_1', '/path/to/files_2', '/path/to/files_3'])

fs_watcher.connect(fs_watcher, QtCore.SIGNAL('directoryChanged(QString)'), directory_changed)
fs_watcher.connect(fs_watcher, QtCore.SIGNAL('fileChanged(QString)'), file_changed)

Answer 4


It should not work on Windows (maybe with Cygwin?), but Unix users can use the “fcntl” system call. Here is an example in Python. It’s mostly the same code if you need to write it in C (same function names).

import time
import fcntl
import os
import signal

# Placeholder: dnotify (F_NOTIFY) watches a directory, so FNAME must be a directory path
FNAME = "/path/to/watched/directory"

def handler(signum, frame):
    print("File %s modified" % (FNAME,))

signal.signal(signal.SIGIO, handler)
fd = os.open(FNAME, os.O_RDONLY)
fcntl.fcntl(fd, fcntl.F_SETSIG, 0)
fcntl.fcntl(fd, fcntl.F_NOTIFY,
            fcntl.DN_MODIFY | fcntl.DN_CREATE | fcntl.DN_MULTISHOT)

while True:
    # Idle; the handler runs whenever SIGIO is delivered
    time.sleep(10000)

Answer 5



Check out pyinotify.

inotify replaces dnotify (from an earlier answer) in newer Linux kernels and allows file-level rather than directory-level monitoring.

Answer 6




Well after a bit of hacking of Tim Golden’s script, I have the following which seems to work quite well:

import os

import win32file
import win32con

path_to_watch = "." # look at the current directory
file_to_watch = "test.txt" # look for changes to a file called test.txt

def ProcessNewData( newData ):
    print("Text added: %s" % newData)

# Set up the bits we'll need for output
ACTIONS = {
  1 : "Created",
  2 : "Deleted",
  3 : "Updated",
  4 : "Renamed from something",
  5 : "Renamed to something"
}
# The arguments below follow Tim Golden's directory-watching recipe
FILE_LIST_DIRECTORY = 0x0001
hDir = win32file.CreateFile (
  path_to_watch,
  FILE_LIST_DIRECTORY,
  win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE,
  None,
  win32con.OPEN_EXISTING,
  win32con.FILE_FLAG_BACKUP_SEMANTICS,
  None
)

# Open the file we're interested in
a = open(file_to_watch, "r")

# Throw away any existing log data
a.read()

# Wait for new data and call ProcessNewData for each new chunk that's written
while 1:
  # Wait for a change to occur
  results = win32file.ReadDirectoryChangesW (
    hDir,
    1024,
    False,
    win32con.FILE_NOTIFY_CHANGE_LAST_WRITE,
    None,
    None
  )

  # For each change, check to see if it's updating the file we're interested in
  for action, file in results:
    full_filename = os.path.join (path_to_watch, file)
    #print file, ACTIONS.get (action, "Unknown")
    if file == file_to_watch:
        newText = a.read()
        if newText != "":
            ProcessNewData( newText )

It could probably do with a load more error checking, but for simply watching a log file and doing some processing on it before spitting it out to the screen, this works well.

Thanks everyone for your input – great stuff!

Answer 7



For watching a single file with polling, and minimal dependencies, here is a fully fleshed-out example, based on the answer from Deestan (above):

import os
import sys 
import time

class Watcher(object):
    running = True
    refresh_delay_secs = 1

    # Constructor
    def __init__(self, watch_file, call_func_on_change=None, *args, **kwargs):
        self._cached_stamp = 0
        self.filename = watch_file
        self.call_func_on_change = call_func_on_change
        self.args = args
        self.kwargs = kwargs

    # Look for changes
    def look(self):
        stamp = os.stat(self.filename).st_mtime
        if stamp != self._cached_stamp:
            self._cached_stamp = stamp
            # File has changed, so do something...
            print('File changed')
            if self.call_func_on_change is not None:
                self.call_func_on_change(*self.args, **self.kwargs)

    # Keep watching in a loop
    def watch(self):
        while self.running:
            try:
                # Look for changes
                time.sleep(self.refresh_delay_secs)
                self.look()
            except KeyboardInterrupt:
                print('\nDone')
                break
            except FileNotFoundError:
                # Action on file not found
                pass
            except Exception:
                print('Unhandled error: %s' % sys.exc_info()[0])

# Call this function each time a change happens
def custom_action(text):
    print(text)

watch_file = 'my_file.txt'

# watcher = Watcher(watch_file)  # simple
watcher = Watcher(watch_file, custom_action, text='yes, changed')  # also call custom action function
watcher.watch()  # start the watch going

Answer 8




Check my answer to a similar question. You could try the same loop in Python. This page suggests:

import time

# `file` is assumed to be an already-open file object
while 1:
    where = file.tell()
    line = file.readline()
    if not line:
        time.sleep(1)
        file.seek(where)
    else:
        print(line, end='') # already has newline

Also see the question tail() a file with Python.

Answer 9




The simplest solution for me is using watchdog’s tool watchmedo.

From https://pypi.python.org/pypi/watchdog I now have a process that looks up the SQL files in a directory and executes them if necessary.

watchmedo shell-command \
--patterns="*.sql" \
--recursive \
--command='~/Desktop/load_files_into_mysql_database.sh' \
.

Answer 10





Well, since you are using Python, you can just open a file and keep reading lines from it.

f = open('file.log')

If the line read is not empty, you process it.

line = f.readline()
if line:
    pass  # Do what you want with the line

You may be missing that it is OK to keep calling readline at EOF. It will just keep returning an empty string in this case. And when something is appended to the log file, the reading will continue from where it stopped, as you need.

If you are looking for a solution that uses events, or a particular library, please specify this in your question. Otherwise, I think this solution is just fine.
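
The readline-at-EOF behaviour described above can be wrapped in a generator. This is only a sketch: the name follow and its parameters are illustrative, and max_idle_polls exists so the loop can stop in a demo (pass None to follow forever). It also holds back a partial last line until the writer finishes it.

```python
import time


def follow(f, poll_interval=1.0, max_idle_polls=None):
    """Yield complete lines as they are appended to the open file object f."""
    idle = 0
    pending = ""
    while max_idle_polls is None or idle < max_idle_polls:
        chunk = f.readline()
        if not chunk:
            # At EOF readline() returns an empty string; wait and retry.
            idle += 1
            time.sleep(poll_interval)
            continue
        idle = 0
        pending += chunk
        if pending.endswith("\n"):
            yield pending
            pending = ""
```

Usage would look like `for line in follow(open('file.log')): ...`, with the processing placed in the loop body.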

Answer 11



Here is a simplified version of Kender’s code that appears to do the same trick and does not import the entire file:

# Check file for new data.

import time

f = open(r'c:\temp\test.txt', 'r')

while True:
    line = f.readline()
    if not line:
        time.sleep(1)
        print('Nothing New')
    else:
        print('Call Function: ', line)

Answer 12


This is another modification of Tim Golden’s script that runs on Unix-like systems and adds a simple watcher for file modification by using a dict (file => time).

Usage: whateverName.py path_to_dir_to_watch

#!/usr/bin/env python

import os, sys, time

def files_to_timestamp(path):
    files = [os.path.join(path, f) for f in os.listdir(path)]
    return dict ([(f, os.path.getmtime(f)) for f in files])

if __name__ == "__main__":

    path_to_watch = sys.argv[1]
    print('Watching {}..'.format(path_to_watch))

    before = files_to_timestamp(path_to_watch)

    while 1:
        time.sleep (2)
        after = files_to_timestamp(path_to_watch)

        added = [f for f in after.keys() if not f in before.keys()]
        removed = [f for f in before.keys() if not f in after.keys()]
        modified = []

        for f in before.keys():
            if not f in removed:
                if os.path.getmtime(f) != before.get(f):
                    modified.append(f)

        if added: print('Added: {}'.format(', '.join(added)))
        if removed: print('Removed: {}'.format(', '.join(removed)))
        if modified: print('Modified: {}'.format(', '.join(modified)))

        before = after
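
The comparison step in the script above can also be expressed with set operations on the two snapshots. A small sketch; the helper name diff_snapshots is illustrative, and it compares against the timestamps already stored in `after` instead of calling os.path.getmtime a second time:

```python
def diff_snapshots(before, after):
    """Compare two {path: mtime} dicts and return (added, removed, modified)."""
    before_paths = set(before)
    after_paths = set(after)
    added = sorted(after_paths - before_paths)
    removed = sorted(before_paths - after_paths)
    # A file counts as modified if it is in both snapshots with different mtimes.
    modified = sorted(p for p in before_paths & after_paths
                      if before[p] != after[p])
    return added, removed, modified
```

Inside the loop this would be called as `added, removed, modified = diff_snapshots(before, after)`.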

Answer 13

As you can see in Tim Golden’s article, pointed out by Horst Gutmann, WIN32 is relatively complex and watches directories, not a single file.

I’d like to suggest you look into IronPython, which is a .NET Python implementation. With IronPython you can use all the .NET functionality, including

System.IO.FileSystemWatcher

which handles single files with a simple Event interface.

Answer 14





This is an example of checking a file for changes. One that may not be the best way of doing it, but it sure is a short way.

Handy tool for restarting application when changes have been made to the source. I made this when playing with pygame so I can see effects take place immediately after file save.

When used in pygame make sure the stuff in the ‘while’ loop is placed in your game loop aka update or whatever. Otherwise your application will get stuck in an infinite loop and you will not see your game updating.

file_size_stored = os.stat('neuron.py').st_size

while True:
    file_size_current = os.stat('neuron.py').st_size
    if file_size_stored != file_size_current:
        restart_program()

In case you want the restart code, which I found on the web, here it is. (Not relevant to the question, though it could come in handy.)

import os
import sys

def restart_program(): #restart application
    python = sys.executable
    os.execl(python, python, * sys.argv)

Have fun making electrons do what you want them to do.

Answer 15

import os
import threading

import win32file
import win32con

ACTIONS = {
  1 : "Created",
  2 : "Deleted",
  3 : "Updated",
  4 : "Renamed from something",
  5 : "Renamed to something"
}
FILE_LIST_DIRECTORY = 0x0001

class myThread (threading.Thread):
    def __init__(self, threadID, fileName, directory, origin):
        threading.Thread.__init__(self)
        self.threadID = threadID
        self.fileName = fileName
        self.daemon = True
        self.dir = directory
        self.originalFile = origin
    def run(self):
        startMonitor(self.fileName, self.dir, self.originalFile)

def startMonitor(fileMonitoring,dirPath,originalFile):
    # The arguments below follow Tim Golden's directory-watching recipe
    hDir = win32file.CreateFile (
        dirPath,
        FILE_LIST_DIRECTORY,
        win32con.FILE_SHARE_READ | win32con.FILE_SHARE_WRITE,
        None,
        win32con.OPEN_EXISTING,
        win32con.FILE_FLAG_BACKUP_SEMANTICS,
        None
    )
    # Wait for new data and call ProcessNewData for each new chunk that's
    # written
    while 1:
        # Wait for a change to occur
        results = win32file.ReadDirectoryChangesW (
            hDir,
            1024,
            False,
            win32con.FILE_NOTIFY_CHANGE_LAST_WRITE,
            None,
            None
        )
        # For each change, check to see if it's updating the file we're
        # interested in
        for action, file_M in results:
            full_filename = os.path.join (dirPath, file_M)
            #print file, ACTIONS.get (action, "Unknown")
            if len(full_filename) == len(fileMonitoring) and action == 3:
                #copy to main file
                pass  # the copy step is missing from the source

Answer 16




Here’s an example geared toward watching input files that write no more than one line per second but usually a lot less. The goal is to append the last line (most recent write) to the specified output file. I’ve copied this from one of my projects and just deleted all the irrelevant lines. You’ll have to fill in or change the missing symbols.

from PyQt5.QtCore import QFileSystemWatcher, QSettings, QThread
from PyQt5.QtWidgets import QMainWindow
from ui_main_window import Ui_MainWindow   # Qt Creator gen'd

class MainWindow(QMainWindow, Ui_MainWindow):
    def __init__(self, parent=None):
        QMainWindow.__init__(self, parent)
        self._fileWatcher = QFileSystemWatcher()
        self._fileWatcher.fileChanged.connect(self.fileChanged)

    def fileChanged(self, filepath):
        QThread.msleep(300)    # Reqd on some machines, give chance for write to complete
        # ^^ About to test this, may need more sophisticated solution
        with open(filepath) as file:
            lastLine = list(file)[-1]
        destPath = self._filemap[filepath]['dest file']
        with open(destPath, 'a') as out_file:               # a= append
            out_file.write(lastLine)

Of course, the encompassing QMainWindow class is not strictly required, i.e., you can use QFileSystemWatcher alone.

Answer 17



You can also use a simple library called repyt. Here is an example:

repyt ./app.py

Answer 18



Seems that no one has posted fswatch. It is a cross-platform file system watcher. Just install it, run it and follow the prompts.

I’ve used it with python and golang programs and it just works.

Answer 19

import os
import sys
import time

class Watcher(object):
    running = True
    refresh_delay_secs = 1

    # Constructor
    def __init__(self, watch_files, call_func_on_change=None, *args, **kwargs):
        self._cached_stamp = 0
        self._cached_stamp_files = {}
        self.filenames = watch_files
        self.call_func_on_change = call_func_on_change
        self.args = args
        self.kwargs = kwargs

    # Look for changes
    def look(self):
        for file in self.filenames:
            stamp = os.stat(file).st_mtime
            if not file in self._cached_stamp_files:
                self._cached_stamp_files[file] = 0
            if stamp != self._cached_stamp_files[file]:
                self._cached_stamp_files[file] = stamp
                # File has changed, so do something...
                file_to_read = open(file, 'r')
                value = file_to_read.read()
                print("value from file", value)
                if self.call_func_on_change is not None:
                    self.call_func_on_change(*self.args, **self.kwargs)

    # Keep watching in a loop
    def watch(self):
        while self.running:
            try:
                # Look for changes
                time.sleep(self.refresh_delay_secs)
                self.look()
            except KeyboardInterrupt:
                print('\nDone')
                break
            except FileNotFoundError:
                # Action on file not found
                pass
            except Exception as e:
                print('Unhandled error: %s' % sys.exc_info()[0])

# Call this function each time a change happens
def custom_action(text):
    print(text)

watch_files = ['/Users/mexekanez/my_file.txt', '/Users/mexekanez/my_file1.txt']

# watcher = Watcher(watch_file)  # simple

if __name__ == "__main__":
    watcher = Watcher(watch_files, custom_action, text='yes, changed')  # also call custom action function
    watcher.watch()  # start the watch going

Answer 20


The best and simplest solution is to use pygtail: https://pypi.python.org/pypi/pygtail

from pygtail import Pygtail
import sys

while True:
    for line in Pygtail("some.log"):
        sys.stdout.write(line)

Answer 21

I don’t know of any Windows-specific function. You could try getting the MD5 hash of the file every second/minute/hour (depending on how fast you need it) and comparing it to the last hash. When it differs, you know the file has been changed and you can read out the newest lines.
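
A minimal sketch of the hashing idea using only the standard library. The helper name file_digest and the chunk size are illustrative choices. Note that hashing rereads the whole file on every check, so it costs more than comparing st_mtime or the file size.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the MD5 hex digest of a file, read in chunks to bound memory use."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Store the previous digest, call file_digest again after each sleep interval, and treat any difference as a change.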

Answer 22




I’d try something like this.

import time

try:
    f = open(filePath)
except IOError:
    print("No such file: %s" % filePath)
    input("Press Enter to close window")
    raise
try:
    lines = f.readlines()  # skip the lines already in the file
    while True:
        line = f.readline()
        try:
            if not line:
                time.sleep(1)
            else:
                functionThatAnalisesTheLine(line)
        except Exception as e:
            # handle the exception somehow (for example, log the trace) and raise the same exception again
            input("Press Enter to close window")
            raise e
finally:
    f.close()

The loop checks whether any new lines have arrived since the file was last read. If so, each one is read and passed to the functionThatAnalisesTheLine function. If not, the script waits 1 second and retries.





Simple question here: I’m trying to get the size of my legend using matplotlib.pyplot to be smaller (i.e., the text to be smaller). The code I’m using goes something like this:

plot.scatter(k, sum_cf, color='black', label='Sum of Cause Fractions')
plot.scatter(k, data[:, 0],  color='b', label='Dis 1: cf = .6, var = .2')
plot.scatter(k, data[:, 1],  color='r',  label='Dis 2: cf = .2, var = .1')
plot.scatter(k, data[:, 2],  color='g', label='Dis 3: cf = .1, var = .01')

Answer 0




You can set an individual font size for the legend by adjusting the prop keyword.

plot.legend(loc=2, prop={'size': 6})

This takes a dictionary of keywords corresponding to matplotlib.font_manager.FontProperties properties. See the documentation for legend:

Keyword arguments:

prop: [ None | FontProperties | dict ]
    A matplotlib.font_manager.FontProperties instance. If prop is a 
    dictionary, a new instance will be created with prop. If None, use
    rc settings.

It is also possible, as of version 1.2.1, to use the keyword fontsize.

Answer 1






This should do it:

import pylab as plot
params = {'legend.fontsize': 20,
          'legend.handlelength': 2}
plot.rcParams.update(params)

Then do the plot afterwards.

There are a ton of other rcParams; they can also be set in the matplotlibrc file.

Also, presumably you can change it by passing a matplotlib.font_manager.FontProperties instance, but I don’t know how to do this. (See Yann’s answer.)

Answer 2




using import matplotlib.pyplot as plt

Method 1: specify the fontsize when calling legend (repetitive)

plt.legend(fontsize=20) # using a size in points
plt.legend(fontsize="x-large") # using a named size

With this method you can set the fontsize for each legend at creation (allowing you to have multiple legends with different fontsizes). However, you will have to type everything manually each time you create a legend.

(Note: @Mathias711 listed the available named fontsizes in his answer)

Method 2: specify the fontsize in rcParams (convenient)

plt.rc('legend',fontsize=20) # using a size in points
plt.rc('legend',fontsize='medium') # using a named size

With this method you set the default legend fontsize, and all legends will automatically use that unless you specify otherwise using method 1. This means you can set your legend fontsize at the beginning of your code, and not worry about setting it for each individual legend.

If you use a named size e.g. 'medium', then the legend text will scale with the global font.size in rcParams. To change font.size use plt.rc('font', size='medium')

Answer 3





There are also a few named fontsizes, apart from the size in points:

xx-small
x-small
small
medium
large
x-large
xx-large

Usage:

pyplot.legend(loc=2, fontsize = 'x-small')

Answer 4


There are multiple settings for adjusting the legend size. The two I find most useful are:

  • labelspacing: which sets the spacing between label entries in multiples of the font size. For instance with a 10 point font, legend(..., labelspacing=0.2) will reduce the spacing between entries to 2 points. The default on my install is about 0.5.
  • prop: which allows full control of the font size, etc. You can set an 8 point font using legend(..., prop={'size':8}). The default on my install is about 14 points.

In addition, the legend documentation lists a number of other padding and spacing parameters including: borderpad, handlelength, handletextpad, borderaxespad, and columnspacing. These all follow the same form as labelspacing and are also in multiples of fontsize.

These values can also be set as the defaults for all figures using the matplotlibrc file.

Answer 5








On my install, FontProperties only changes the text size, but it’s still too large and spaced out. I found a parameter in pyplot.rcParams: legend.labelspacing, which I’m guessing is set to a fraction of the font size. I’ve changed it with


I’m not sure how to specify it to the pyplot.legend function – passing




comes back with an error.

Answer 6

plot.legend(loc = 'lower right', decimal_places = 2, fontsize = '11', title = 'Hey there', title_fontsize = '20')



I want to build a query for sunburnt (a Solr interface) using class inheritance, and therefore by adding key-value pairs together. The sunburnt interface takes keyword arguments. How can I transform a dict ({'type':'Event'}) into keyword arguments (type='Event')?

Answer 0

Use the double-star (aka double-splat?) operator:

func(**{'type':'Event'})

is equivalent to

func(type='Event')

Answer 1


The ** operator would be helpful here.

The ** operator unpacks the dict elements, so **{'type':'Event'} is treated as type='Event'.

func(**{'type':'Event'}) is the same as func(type='Event'), i.e., the dict elements are converted to keyword arguments.

* unpacks the list elements, which are then treated as positional arguments.

func(*['one', 'two']) is the same as func('one', 'two')
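
Since the question mentions combining key-value pairs before the call, here is a small sketch of merging dicts and then unpacking the result. build_query is a hypothetical stand-in for a keyword-argument API such as the sunburnt query call; it is not sunburnt's real function.

```python
def build_query(**kwargs):
    """Hypothetical stand-in for an API that only accepts keyword arguments."""
    return sorted(kwargs.items())


# Filters contributed by a base class and a subclass (illustrative values).
base_filters = {'type': 'Event'}
extra_filters = {'status': 'open'}

# Merge the dicts first, then unpack the result with ** at the call site.
merged = dict(base_filters)
merged.update(extra_filters)

result = build_query(**merged)
```

Here the call is the same as writing build_query(type='Event', status='open') by hand.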

Answer 2



Here is a complete example showing how to use the ** operator to pass values from a dictionary as keyword arguments.

>>> def f(x=2):
...     print(x)
>>> new_x = {'x': 4}
>>> f()        #    default value x=2
>>> f(x=3)     #   explicit value x=3
>>> f(**new_x) # dictionary value x=4 
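
For the original use case (collecting key/value pairs in several places and passing them on as keyword arguments), a minimal sketch could look like the following. build_query is a hypothetical stand-in for any interface that only accepts keyword arguments; it is not the sunburnt API.

```python
def build_query(**params):
    """Hypothetical stand-in for an interface that only takes keyword arguments."""
    return sorted(params.items())

base = {'type': 'Event'}
extra = {'city': 'Berlin'}

# Merge the dicts first, then unpack the result as keyword arguments.
merged = dict(base)
merged.update(extra)

result = build_query(**merged)
print(result)
```

Note that all keys must be strings, since they become parameter names.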






I activated a virtualenv which has pip installed. I did

pip3 install Django==1.8

and Django successfully downloaded. Now, I want to open up the Django folder. Where is the folder located? Normally it would be in “downloads” but I’m not sure where it would be if I installed it using pip in a virtualenv.

回答 0



pip when used with virtualenv will generally install packages in the path <virtualenv_name>/lib/<python_ver>/site-packages.

For example, I created a test virtualenv named venv_test with Python 2.7, and the django folder is in venv_test/lib/python2.7/site-packages/django.

回答 1



By popular demand, an option provided via posted answer:

pip show <package name> will provide the location for Windows and macOS, and I’m guessing any system. :)

For example:

> pip show cvxopt
Name: cvxopt
Version: 1.2.0
Location: /usr/local/lib/python2.7/site-packages

回答 2


pip list -v can be used to list packages’ install locations, introduced in https://pip.pypa.io/en/stable/news/#b1-2018-03-31

Show install locations when list command ran with “-v” option. (#979)

>pip list -v
Package                  Version   Location                                                             Installer
------------------------ --------- -------------------------------------------------------------------- ---------
alabaster                0.7.12    c:\users\me\appdata\local\programs\python\python38\lib\site-packages pip
apipkg                   1.5       c:\users\me\appdata\local\programs\python\python38\lib\site-packages pip
argcomplete              1.10.3    c:\users\me\appdata\local\programs\python\python38\lib\site-packages pip
astroid                  2.3.3     c:\users\me\appdata\local\programs\python\python38\lib\site-packages pip

Update: This feature is introduced in pip 10.0.0b1. On Ubuntu 18.04, pip or pip3 installed with sudo apt install python-pip or sudo apt install python3-pip is 9.0.1 which doesn’t have this feature. Check https://github.com/pypa/pip/issues/5599 for suitable ways of upgrading pip or pip3.

回答 3



By default, on Linux, Pip installs packages to /usr/local/lib/python2.7/dist-packages.

Using virtualenv or --user during install will change this default location. If you use pip show, make sure you are running it as the right user, or else pip may not see the packages you are referencing.

回答 4



In a Python interpreter or script, you can do

import site
site.getsitepackages() # list of global package locations


site.getusersitepackages() #string for user-specific package location

to find the locations where third-party packages (those not in the core Python distribution) are installed.

On my Brew-installed Python on MacOS, the former outputs


which canonicalizes to the same path output by pip show, as mentioned in a previous answer:

$ readlink -f /usr/local/Cellar/python/3.7.4/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages

Reference: https://docs.python.org/3/library/site.html#site.getsitepackages
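
If you only care about one package rather than the whole site-packages directory, you can also ask the import system directly. This is a small sketch using the standard library; package_location is a hypothetical helper name, and json is used only because it is always available.

```python
import importlib.util
import site

def package_location(name):
    """Return the file a top-level module or package would be imported from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None  # not importable from this interpreter
    return spec.origin

print(site.getusersitepackages())  # per-user site-packages directory
print(package_location("json"))    # path of the package's __init__.py
```

Run it with the interpreter of the activated virtualenv, otherwise you get the locations seen by a different Python.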







How do I format a floating number to a fixed width with the following requirements:

  1. Leading zero if n < 1
  2. Add trailing decimal zero(s) to fill up fixed width
  3. Truncate decimal digits past fixed width
  4. Align all decimal points

For example:

formatter = '{:06}'  # placeholder: something like this is the format spec being asked for
numbers = [23.23, 0.123334987, 1, 4.223, 9887.2]

for number in numbers:
    print(formatter.format(number))

The output would be like


回答 0

for x in numbers:
    print("{:10.4f}".format(x))



The format specifier inside the curly braces follows the Python format string syntax. Specifically, in this case, it consists of the following parts:

  • The empty string before the colon means “take the next provided argument to format()” – in this case the x as the only argument.
  • The 10.4f part after the colon is the format specification.
  • The f denotes fixed-point notation.
  • The 10 is the total width of the field being printed, left-padded with spaces.
  • The 4 is the number of digits after the decimal point.
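
To see the parts of the specification working together, here is a short sketch that formats the numbers from the question with the same spec and collects the results, so the fixed width can be checked.

```python
numbers = [23.23, 0.123334987, 1, 4.223, 9887.2]

# Width 10, 4 digits after the point, fixed-point notation.
rows = ["{:10.4f}".format(x) for x in numbers]

for row in rows:
    print(row)

# Every row has the same length, so the decimal points line up.
widths = {len(row) for row in rows}
```

Values too wide for the field are not truncated; the field simply grows, so pick a width that fits your largest number.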

回答 1




It has been a few years since this was answered, but as of Python 3.6 (PEP498) you could use the new f-strings:

numbers = [23.23, 0.123334987, 1, 4.223, 9887.2]

for number in numbers:
    print(f'{number:10.4f}')  # assumed spec: width 10, 4 decimals



回答 2



In python3 the following works:

>>> v=10.4
>>> print('% 6.2f' % v)
>>> print('% 12.1f' % v)
>>> print('%012.1f' % v)

回答 3



See Python 3.x format string syntax:

IDLE 3.5.1   
numbers = ['23.23', '.1233', '1', '4.223', '9887.2']

for x in numbers:  
    print('{0: >#016.4f}'. format(float(x)))  


回答 4





You can also left pad with zeros. For example if you want number to have 9 characters length, left padded with zeros use:


Thus, if number = 4.656, the output is: 00004.656
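
The format call itself is not shown above. One spec that matches the stated result (9 characters, zero-padded, 3 decimals) is '{:09.3f}'; treat it as an assumption about what was meant.

```python
number = 4.656

# '0' flag + width 9 + 3 decimals: zero-padded on the left to 9 characters.
padded = '{:09.3f}'.format(number)
print(padded)
```

The width counts the decimal point and the decimals too, which is why only four zeros are added here.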

For your example the output will look like this:

numbers  = [23.2300, 0.1233, 1.0000, 4.2230, 9887.2000]
for x in numbers: 



One example where this may be useful is when you want filenames to sort correctly in alphabetical order. I noticed that on some Linux systems, numbered files are listed in the order 1, 10, 11, ..., 2, 20, 21, ...

Thus if you want to enforce the necessary numeric order in filenames, you need to left pad with the appropriate number of zeros.

回答 5



In Python 3.

GPA = 2.5
print(" %6.1f " % GPA)

%6.1f means a field 6 characters wide with 1 digit after the decimal point. To print 2 digits after the point use %6.2f; likewise, %6.3f prints 3 digits after the point.

Python 3 ImportError: No module named 'ConfigParser'



I am trying to pip install the MySQL-python package, but I get an ImportError.

Jans-MacBook-Pro:~ jan$ /Library/Frameworks/Python.framework/Versions/3.3/bin/pip-3.3 install MySQL-python
Downloading/unpacking MySQL-python
  Running setup.py egg_info for package MySQL-python
    Traceback (most recent call last):
      File "<string>", line 16, in <module>
      File "/var/folders/lf/myf7bjr57_jg7_5c4014bh640000gn/T/pip-build/MySQL-python/setup.py", line 14, in <module>
        from setup_posix import get_config
      File "./setup_posix.py", line 2, in <module>
        from ConfigParser import SafeConfigParser
    ImportError: No module named 'ConfigParser'
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):

  File "<string>", line 16, in <module>

  File "/var/folders/lf/myf7bjr57_jg7_5c4014bh640000gn/T/pip-build/MySQL-python/setup.py", line 14, in <module>

    from setup_posix import get_config

  File "./setup_posix.py", line 2, in <module>

    from ConfigParser import SafeConfigParser

ImportError: No module named 'ConfigParser'

Command python setup.py egg_info failed with error code 1 in /var/folders/lf/myf7bjr57_jg7_5c4014bh640000gn/T/pip-build/MySQL-python
Storing complete log in /Users/jan/.pip/pip.log
Jans-MacBook-Pro:~ jan$ 

Any ideas?

回答 0


In Python 3, ConfigParser has been renamed to configparser for PEP 8 compliance. It looks like the package you are installing does not support Python 3.

回答 1


You can instead use the mysqlclient package as a drop-in replacement for MySQL-python. It is a fork of MySQL-python with added support for Python 3.

I had luck with simply

pip install mysqlclient

in my python3.4 virtualenv after

sudo apt-get install python3-dev libmysqlclient-dev

which is obviously specific to ubuntu/debian, but I just wanted to share my success :)

回答 2


Here is code that should work in both Python 2.x and 3.x.

Obviously you will need the six module, but it’s almost impossible to write modules that work in both versions without six.

    try:
        import configparser
    except ImportError:
        from six.moves import configparser

回答 3



pip install configparser
sudo cp /usr/lib/python3.6/configparser.py /usr/lib/python3.6/ConfigParser.py

Then try to install MySQL-python again. That worked for me.

回答 4



MySQL-python is not supported on Python 3; you can use mysqlclient instead.

If you are on Fedora/CentOS/Red Hat, install the following packages:

  1. yum install python3-devel
  2. pip install mysqlclient

回答 5



If you are using CentOS, then you need to use

  1. yum install python34-devel.x86_64
  2. yum groupinstall -y 'development tools'
  3. pip3 install mysql-connector
  4. pip install mysqlclient

回答 6


Python 2/3 compatibility for configparser can be handled simply with the six library:

from six.moves import configparser

回答 7


Do pip3 install PyMySQL and then pip3 install mysqlclient. Worked for me

回答 8


I was having the same problem. Turns out, I needed to install python3 devel on my centos. First, you need to search for the package that is compatible with your system.

yum search python3 | grep devel

Then, install the package as:

yum install -y python3-devel.x86_64

Then, install mysqlclient from pip

pip install mysqlclient

回答 9




I got further with Valeres' answer:

pip install configparser
sudo cp /usr/lib/python3.6/configparser.py /usr/lib/python3.6/ConfigParser.py

Then try to install MySQL-python again. That worked for me.

I would suggest linking the file instead of copying it, so that it remains safe across updates. I linked the file into the /usr/lib/python3/ directory.

回答 10




Try this solution which worked fine for me.

Basically, reinstall or upgrade mysql to the latest version with brew, then install mysqlclient or MySQL-Python with the global pip3 instead of the virtualenv pip3.

Then enter the virtualenv and install mysqlclient or MySQL-Python there; this time it should succeed.

回答 11



How about checking which version of Python you are using first?

import six
if six.PY2:
    import ConfigParser as configparser
else:
    import configparser

回答 12


I run Kali Linux (rolling) and came across this problem when I tried running cupp.py in the terminal after updating to Python 3.6.0. After some research and trial and error, I found that changing ConfigParser to configparser worked for me, but then I ran into another issue:

config = configparser.configparser()
AttributeError: module 'configparser' has no attribute 'configparser'

After a bit more research I realised that in Python 3 the module ConfigParser was renamed to configparser, but the class inside it is still called ConfigParser, so the call has to be configparser.ConfigParser().
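
To make the module/class naming concrete, here is a small Python 3 sketch. The section and option names are made up for the example.

```python
import configparser  # Python 3 module name (lowercase)

# The class inside the module keeps its CamelCase name.
config = configparser.ConfigParser()
config.read_string("""
[client]
user = example_user
host = localhost
""")

user = config.get("client", "user")
print(user)
```

Only the module was renamed; code that already called ConfigParser() on the module object needs just the import line changed.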

回答 13


I was getting the same error on Mac OS 10, Python 3.7.6 & Django 2.2.7. I want to use this opportunity to share what worked for me after trying out numerous solutions.


  1. Installed Connector/Python 8.0.20 for Mac OS from link

  2. Copied the current dependencies into a requirements.txt file, deactivated the current virtual env, and deleted it:

    create the file if it does not exist yet: touch requirements.txt

    copy the dependencies to the file: python -m pip freeze > requirements.txt

    deactivate and delete the current virtual env: deactivate && rm -rf <virtual-env-name>

  3. Created another virtual env and activated it: python -m venv <virtual-env-name> && source <virtual-env-name>/bin/activate

  4. Installed the previous dependencies: python -m pip install -r requirements.txt

回答 14



Check what /usr/bin/python is pointing to.

If it is pointing to python3 or higher, change it to python2.7.

This should solve the issue.

I was getting install errors for all Python packages. Abe Karplus’s solution and the discussion gave me a hint as to what the problem could be. Then I recalled that I had manually changed /usr/bin/python from python2.7 to /usr/bin/python3.5, which was actually causing the issue. Once I reverted that change, the problem was solved.

回答 15



This worked for me

cp /usr/local/lib/python3.5/configparser.py /usr/local/lib/python3.5/ConfigParser.py







Is there a way to have a defaultdict(defaultdict(int)) in order to make the following code work?

for x in stuff:
    d[x.a][x.b] += x.c_int

d needs to be built ad-hoc, depending on x.a and x.b elements.

I could use:

for x in stuff:
    d[x.a,x.b] += x.c_int

but then I wouldn’t be able to use:


回答 0




Yes like this:

defaultdict(lambda: defaultdict(int))

The argument of a defaultdict (in this case lambda: defaultdict(int)) is called when you try to access a key that doesn’t exist. Its return value is stored as the value for that key, which means that in our case d[Key_doesnt_exist] will be a defaultdict(int).

If you then access a missing key on this inner defaultdict, i.e. d[Key_doesnt_exist][Key_doesnt_exist], it returns 0, which is the return value of the inner defaultdict’s argument, i.e. int().
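
Applied to the loop from the question, a self-contained sketch looks like this. Item and the sample data are hypothetical stand-ins for whatever stuff contains.

```python
from collections import defaultdict, namedtuple

# Hypothetical record type standing in for the items of `stuff`.
Item = namedtuple("Item", ["a", "b", "c_int"])
stuff = [Item("x", "p", 1), Item("x", "p", 2), Item("x", "q", 5), Item("y", "p", 7)]

d = defaultdict(lambda: defaultdict(int))
for x in stuff:
    d[x.a][x.b] += x.c_int   # both levels are created on first access

print(d["x"]["p"])
print(sorted(d["x"].keys()))
```

Keep in mind that merely reading a missing key also creates it, so use `in` or .get() if you only want to test for presence.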

回答 1




The parameter to the defaultdict constructor is the function which will be called for building new elements. So let’s use a lambda !

>>> from collections import defaultdict
>>> d = defaultdict(lambda : defaultdict(int))
>>> print d[0]
defaultdict(<type 'int'>, {})
>>> print d[0]["x"]

Since Python 2.7, there’s an even better solution using Counter:

>>> from collections import Counter
>>> c = Counter()
>>> c["goodbye"]+=1
>>> c["and thank you"]=42
>>> c["for the fish"]-=5
>>> c
Counter({'and thank you': 42, 'goodbye': 1, 'for the fish': -5})

Some bonus features

>>> c.most_common()[:2]
[('and thank you', 42), ('goodbye', 1)]

For more information see PyMOTW – Collections – Container data types and Python Documentation – collections

回答 2




I find it slightly more elegant to use partial:

import functools
from collections import defaultdict

dd_int = functools.partial(defaultdict, int)  # calling dd_int() builds a defaultdict(int)

Of course, this is the same as a lambda.

回答 3



For reference, it’s possible to implement a generic nested defaultdict factory method through:

from collections import defaultdict
from functools import partial
from itertools import repeat

def nested_defaultdict(default_factory, depth=1):
    result = partial(defaultdict, default_factory)
    for _ in repeat(None, depth - 1):
        result = partial(defaultdict, result)
    return result()

The depth defines the number of nested dictionaries before the type defined in default_factory is used. For example:

my_dict = nested_defaultdict(list, 3)
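
To show what depth 3 means in practice, here is the same factory repeated so the snippet runs on its own, followed by a usage sketch with made-up keys.

```python
from collections import defaultdict
from functools import partial
from itertools import repeat

def nested_defaultdict(default_factory, depth=1):
    result = partial(defaultdict, default_factory)
    for _ in repeat(None, depth - 1):
        result = partial(defaultdict, result)
    return result()

my_dict = nested_defaultdict(list, 3)

# Three levels of keys are created on the fly; the fourth access yields a list.
my_dict["a"]["b"]["c"].append(1)
print(my_dict["a"]["b"]["c"])
```

With depth=1 the function simply returns defaultdict(default_factory).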

回答 4



Previous answers have addressed how to make a two-level or n-level defaultdict. In some cases you want an infinitely nested one:

from collections import defaultdict

def ddict():
    return defaultdict(ddict)


>>> d = ddict()
>>> d[1]['a'][True] = 0.5
>>> d[1]['b'] = 3
>>> import pprint; pprint.pprint(d)
defaultdict(<function ddict at 0x7fcac68bf048>,
            {1: defaultdict(<function ddict at 0x7fcac68bf048>,
                            {'a': defaultdict(<function ddict at 0x7fcac68bf048>,
                                              {True: 0.5}),
                             'b': 3})})

回答 5




Others have correctly answered your question of how to get the following to work:

for x in stuff:
    d[x.a][x.b] += x.c_int

An alternative would be to use tuples for keys:

d = defaultdict(int)
for x in stuff:
    d[x.a,x.b] += x.c_int
    # ^^^^^^^ tuple key

The nice thing about this approach is that it is simple and can be easily expanded. If you need a mapping three levels deep, just use a three item tuple for the key.
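
One trade-off of tuple keys is that there is no inner dict to ask for "everything under one first-level key". A sketch of how that can still be recovered by filtering (the keys and values here are made up):

```python
from collections import defaultdict

d = defaultdict(int)
d["x", "p"] += 3
d["x", "q"] += 5
d["y", "p"] += 7

# Rebuild the "inner dict" for one first-level key by scanning the tuple keys.
inner_x = {b: v for (a, b), v in d.items() if a == "x"}
print(inner_x)
```

This scan is linear in the size of the whole dict, so the nested defaultdict is the better fit if you need such per-key views often.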





I’ve read that it is possible to add a method to an existing object (i.e., not in the class definition) in Python.

I understand that it’s not always good to do so. But how might one do this?

回答 0



In Python, there is a difference between functions and bound methods.

>>> def foo():
...     print "foo"
>>> class A:
...     def bar( self ):
...         print "bar"
>>> a = A()
>>> foo
<function foo at 0x00A98D70>
>>> a.bar
<bound method A.bar of <__main__.A instance at 0x00A9BC88>>

Bound methods have been “bound” (how descriptive) to an instance, and that instance will be passed as the first argument whenever the method is called.

Callables that are attributes of a class (as opposed to an instance) are still unbound, though, so you can modify the class definition whenever you want:

>>> def fooFighters( self ):
...     print "fooFighters"
>>> A.fooFighters = fooFighters
>>> a2 = A()
>>> a2.fooFighters
<bound method A.fooFighters of <__main__.A instance at 0x00A9BEB8>>
>>> a2.fooFighters()

Previously defined instances are updated as well (as long as they haven’t overridden the attribute themselves):

>>> a.fooFighters()

The problem comes when you want to attach a method to a single instance:

>>> def barFighters( self ):
...     print "barFighters"
>>> a.barFighters = barFighters
>>> a.barFighters()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: barFighters() takes exactly 1 argument (0 given)

The function is not automatically bound when it’s attached directly to an instance:

>>> a.barFighters
<function barFighters at 0x00A98EF0>

To bind it, we can use the MethodType function in the types module:

>>> import types
>>> a.barFighters = types.MethodType( barFighters, a )
>>> a.barFighters
<bound method ?.barFighters of <__main__.A instance at 0x00A9BC88>>
>>> a.barFighters()

This time other instances of the class have not been affected:

>>> a2.barFighters()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: A instance has no attribute 'barFighters'

More information can be found by reading about descriptors and metaclass programming.
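
The session above is Python 2. As a rough Python 3 equivalent of the last two steps (binding a function to one instance only), assuming nothing beyond the standard library:

```python
import types

class A:
    def bar(self):
        return "bar"

def bar_fighters(self):
    return "barFighters called on " + type(self).__name__

a = A()
a2 = A()

# Bind the plain function to this one instance.
a.bar_fighters = types.MethodType(bar_fighters, a)

result = a.bar_fighters()
has_it = hasattr(a2, "bar_fighters")  # other instances are unaffected
print(result, has_it)
```

In Python 3 there are no unbound methods any more; accessing a function through the class just gives back the function.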

回答 1


The new module has been deprecated since Python 2.6 and was removed in 3.0; use types instead.

see http://docs.python.org/library/new.html

In the example below I’ve deliberately removed return value from patch_me() function. I think that giving return value may make one believe that patch returns a new object, which is not true – it modifies the incoming one. Probably this can facilitate a more disciplined use of monkeypatching.

import types

class A(object):  # but seems to work for old-style objects too
    pass

def patch_me(target):
    def method(target, x):
        print("x=", x)
        print("called from", target)
    target.method = types.MethodType(method, target)
    # add more if needed

a = A()
print(a)
#out: <__main__.A object at 0x2b73ac88bfd0>
patch_me(a)    # patch instance
a.method(5)
#out: x= 5
#out: called from <__main__.A object at 0x2b73ac88bfd0>
patch_me(A)
A.method(6)        # can patch class too
#out: x= 6
#out: called from <class '__main__.A'>

回答 2




Preface – a note on compatibility: other answers may only work in Python 2 – this answer should work perfectly well in Python 2 and 3. If writing Python 3 only, you might leave out explicitly inheriting from object, but otherwise the code should remain the same.

Adding a Method to an Existing Object Instance

I’ve read that it is possible to add a method to an existing object (e.g. not in the class definition) in Python.

I understand that it’s not always a good decision to do so. But, how might one do this?

Yes, it is possible – But not recommended

I don’t recommend this. This is a bad idea. Don’t do it.

Here’s a couple of reasons:

  • You’ll add a bound object to every instance you do this to. If you do this a lot, you’ll probably waste a lot of memory. Bound methods are typically only created for the short duration of their call, and they then cease to exist when automatically garbage collected. If you do this manually, you’ll have a name binding referencing the bound method – which will prevent its garbage collection on usage.
  • Object instances of a given type generally have its methods on all objects of that type. If you add methods elsewhere, some instances will have those methods and others will not. Programmers will not expect this, and you risk violating the rule of least surprise.
  • Since there are other really good reasons not to do this, you’ll additionally give yourself a poor reputation if you do it.

Thus, I suggest that you not do this unless you have a really good reason. It is far better to define the correct method in the class definition or less preferably to monkey-patch the class directly, like this:

Foo.sample_method = sample_method

Since it’s instructive, however, I’m going to show you some ways of doing this.

How it can be done

Here’s some setup code. We need a class definition. It could be imported, but it really doesn’t matter.

class Foo(object):
    '''An empty class to demonstrate adding a method to an instance'''

Create an instance:

foo = Foo()

Create a method to add to it:

def sample_method(self, bar, baz):
    print(bar + baz)

Method nought (0) – use the descriptor method, __get__

Dotted lookups on functions call the __get__ method of the function with the instance, binding the object to the method and thus creating a “bound method.”

foo.sample_method = sample_method.__get__(foo)

and now:

>>> foo.sample_method(1,2)

Method one – types.MethodType

First, import types, from which we’ll get the method constructor:

import types

Now we add the method to the instance. To do this, we require the MethodType constructor from the types module (which we imported above).

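In Python 2 the argument signature for types.MethodType is (function, instance, class); in Python 3 it is only (function, instance). Passing just the function and the instance works in both:

foo.sample_method = types.MethodType(sample_method, foo)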

and usage:

>>> foo.sample_method(1,2)

Method two: lexical binding

First, we create a wrapper function that binds the method to the instance:

def bind(instance, method):
    def binding_scope_fn(*args, **kwargs): 
        return method(instance, *args, **kwargs)
    return binding_scope_fn


>>> foo.sample_method = bind(foo, sample_method)    
>>> foo.sample_method(1,2)

Method three: functools.partial

A partial function applies the first argument(s) to a function (and optionally keyword arguments), and can later be called with the remaining arguments (and overriding keyword arguments). Thus:

>>> from functools import partial
>>> foo.sample_method = partial(sample_method, foo)
>>> foo.sample_method(1,2)

This makes sense when you consider that bound methods are partial functions of the instance.
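
For comparison, here is a compact Python 3 sketch that applies the descriptor, MethodType, and partial approaches side by side. The attribute names via_get, via_methodtype and via_partial are made up for the example, and sample_method returns its result instead of printing so it can be checked.

```python
import types
from functools import partial

class Foo:
    """Empty class used only for the demonstration."""

def sample_method(self, bar, baz):
    return bar + baz

foo = Foo()

foo.via_get = sample_method.__get__(foo)                    # descriptor protocol
foo.via_methodtype = types.MethodType(sample_method, foo)   # explicit bound method
foo.via_partial = partial(sample_method, foo)               # instance pre-applied as first argument

results = [foo.via_get(1, 2), foo.via_methodtype(1, 2), foo.via_partial(1, 2)]
print(results)
```

The first two produce real bound methods (they have a __self__ attribute); the partial only behaves like one when called.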

Unbound function as an object attribute – why this doesn’t work:

If we try to add the sample_method in the same way as we might add it to the class, it is unbound from the instance, and doesn’t take the implicit self as the first argument.

>>> foo.sample_method = sample_method
>>> foo.sample_method(1,2)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: sample_method() takes exactly 3 arguments (2 given)

We can make the unbound function work by explicitly passing the instance (or anything, since this method doesn’t actually use the self argument variable), but it would not be consistent with the expected signature of other instances (if we’re monkey-patching this instance):

>>> foo.sample_method(foo, 1, 2)


You now know several ways you could do this, but in all seriousness – don’t do this.

Answer 3



I think that the above answers missed the key point.

Let’s have a class with a method:

class A(object):
    def m(self):
        pass

Now, let’s play with it in ipython:

In [2]: A.m
Out[2]: <unbound method A.m>

Ok, so m() somehow becomes an unbound method of A. But is it really like that?

In [5]: A.__dict__['m']
Out[5]: <function m at 0xa66b8b4>

It turns out that m() is just a function, a reference to which is added to the class dictionary of A; there’s no magic. Then why does A.m give us an unbound method? Because the dot is not translated into a simple dictionary lookup. It is de facto a call of A.__class__.__getattribute__(A, 'm'):

In [11]: class MetaA(type):
   ....:     def __getattribute__(self, attr_name):
   ....:         print str(self), '-', attr_name

In [12]: class A(object):
   ....:     __metaclass__ = MetaA

In [23]: A.m
<class '__main__.A'> - m
<class '__main__.A'> - m

Now, I’m not sure off the top of my head why the last line is printed twice, but it’s still clear what’s going on there.

Now, what the default __getattribute__ does is that it checks if the attribute is a so-called descriptor or not, i.e. if it implements a special __get__ method. If it implements that method, then what is returned is the result of calling that __get__ method. Going back to the first version of our A class, this is what we have:

In [28]: A.__dict__['m'].__get__(None, A)
Out[28]: <unbound method A.m>

And because Python functions implement the descriptor protocol, if they are called on behalf of an object, they bind themselves to that object in their __get__ method.
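
A small sketch of that lookup, written for Python 3 (where `A.m` is a plain function rather than an unbound method); the class `Demo` is a hypothetical example:

```python
class Demo:
    def m(self):
        return "called on " + type(self).__name__

d = Demo()
raw = Demo.__dict__["m"]          # the plain function stored in the class dict
bound = raw.__get__(d, Demo)      # what the dot lookup d.m does behind the scenes

same_function = bound.__func__ is raw
same_instance = bound.__self__ is d
same_result = bound() == d.m()
```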

Ok, so how do you add a method to an existing object? Assuming you don’t mind patching the class, it’s as simple as:

B.m = m

Then B.m “becomes” an unbound method, thanks to the descriptor magic.

And if you want to add a method just to a single object, then you have to emulate the machinery yourself, by using types.MethodType:

b.m = types.MethodType(m, b)

By the way:

In [2]: A.m
Out[2]: <unbound method A.m>

In [59]: type(A.m)
Out[59]: <type 'instancemethod'>

In [60]: type(b.m)
Out[60]: <type 'instancemethod'>

In [61]: types.MethodType
Out[61]: <type 'instancemethod'>

Answer 4

Answer 5


You can use lambda to bind a method to an instance:

def run(self):
    print self._instanceString

class A(object):
    def __init__(self):
        self._instanceString = "This is instance string"

a = A()
a.run = lambda: run(a)
a.run()

Result:

This is instance string
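
One caveat worth sketching (Python 3 syntax; `Robot` and `describe` are hypothetical names): a lambda looks up its variables when it is called, not when it is created, so binding several instances in a loop needs a default argument to capture each one.

```python
class Robot:
    def __init__(self, name):
        self.name = name

def describe(self):
    return "I am " + self.name

robots = [Robot("a"), Robot("b")]

# Pitfall: every lambda sees the final value of the loop variable `r`.
for r in robots:
    r.late = lambda: describe(r)

# Fix: a default argument freezes the current instance for each lambda.
for r in robots:
    r.bound = lambda r=r: describe(r)

late_results = [x.late() for x in robots]
bound_results = [x.bound() for x in robots]
```

With a single instance, as in the answer above, the plain lambda is fine; the default-argument form only matters when the captured name is rebound later.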

Answer 6


There are at least two ways to attach a method to an instance without types.MethodType:

>>> class A:
...  def m(self):
...   print 'im m, invoked with: ', self

>>> a = A()
>>> a.m()
im m, invoked with:  <__main__.A instance at 0x973ec6c>
>>> a.m
<bound method A.m of <__main__.A instance at 0x973ec6c>>
>>> def foo(firstargument):
...  print 'im foo, invoked with: ', firstargument

>>> foo
<function foo at 0x978548c>


>>> a.foo = foo.__get__(a, A) # or foo.__get__(a, type(a))
>>> a.foo()
im foo, invoked with:  <__main__.A instance at 0x973ec6c>
>>> a.foo
<bound method A.foo of <__main__.A instance at 0x973ec6c>>


>>> instancemethod = type(A.m)
>>> instancemethod
<type 'instancemethod'>
>>> a.foo2 = instancemethod(foo, a, type(a))
>>> a.foo2()
im foo, invoked with:  <__main__.A instance at 0x973ec6c>
>>> a.foo2
<bound method instance.foo of <__main__.A instance at 0x973ec6c>>

Useful links:
Data model – invoking descriptors
Descriptor HowTo Guide – invoking descriptors

Answer 7


What you’re looking for is setattr I believe. Use this to set an attribute on an object.

>>> def printme(s): print repr(s)
>>> class A: pass
>>> setattr(A,'printme',printme)
>>> a = A()
>>> a.printme() # s becomes the implicit 'self' variable
<__main__.A instance at 0xABCDEFG>

Answer 8


Since this question asked for non-Python versions, here’s JavaScript:

a.methodname = function () { console.log("Yay, a new method!") }

Answer 9

Consolidating Jason Pratt’s and the community wiki answers, with a look at the results of different methods of binding:

Especially note how adding the binding function as a class method works, but the referencing scope is incorrect.

#!/usr/bin/python -u
import types
import inspect

## dynamically adding methods to a unique instance of a class

# get a list of a class's method type attributes
def listattr(c):
    for m in [(n, v) for n, v in inspect.getmembers(c, inspect.ismethod) if isinstance(v,types.MethodType)]:
        print m[0], m[1]

# externally bind a function as a method of an instance of a class
def ADDMETHOD(c, method, name):
    c.__dict__[name] = types.MethodType(method, c)

class C():
    r = 10 # class attribute variable to test bound scope

    def __init__(self):
        pass

    #internally bind a function as a method of self's class -- note that this one has issues!
    def addmethod(self, method, name):
        self.__dict__[name] = types.MethodType( method, self.__class__ )

    # predefined function to compare with
    def f0(self, x):
        print 'f0\tx = %d\tr = %d' % ( x, self.r)

a = C() # created before modified instance
b = C() # modified instance

def f1(self, x): # bind internally
    print 'f1\tx = %d\tr = %d' % ( x, self.r )
def f2( self, x): # add to class instance's .__dict__ as method type
    print 'f2\tx = %d\tr = %d' % ( x, self.r )
def f3( self, x): # assign to class as method type
    print 'f3\tx = %d\tr = %d' % ( x, self.r )
def f4( self, x): # add to class instance's .__dict__ using a general function
    print 'f4\tx = %d\tr = %d' % ( x, self.r )

b.addmethod(f1, 'f1')
b.__dict__['f2'] = types.MethodType( f2, b)
b.f3 = types.MethodType( f3, b)
ADDMETHOD(b, f4, 'f4')

b.f0(0) # OUT: f0   x = 0   r = 10
b.f1(1) # OUT: f1   x = 1   r = 10
b.f2(2) # OUT: f2   x = 2   r = 10
b.f3(3) # OUT: f3   x = 3   r = 10
b.f4(4) # OUT: f4   x = 4   r = 10

k = 2
print 'changing b.r from {0} to {1}'.format(b.r, k)
b.r = k
print 'new b.r = {0}'.format(b.r)

b.f0(0) # OUT: f0   x = 0   r = 2
b.f1(1) # OUT: f1   x = 1   r = 10  !!!!!!!!!
b.f2(2) # OUT: f2   x = 2   r = 2
b.f3(3) # OUT: f3   x = 3   r = 2
b.f4(4) # OUT: f4   x = 4   r = 2

c = C() # created after modifying instance

# let's have a look at each instance's method type attributes
print '\nattributes of a:'
listattr(a)
# OUT:
# attributes of a:
# __init__ <bound method C.__init__ of <__main__.C instance at 0x000000000230FD88>>
# addmethod <bound method C.addmethod of <__main__.C instance at 0x000000000230FD88>>
# f0 <bound method C.f0 of <__main__.C instance at 0x000000000230FD88>>

print '\nattributes of b:'
listattr(b)
# OUT:
# attributes of b:
# __init__ <bound method C.__init__ of <__main__.C instance at 0x000000000230FE08>>
# addmethod <bound method C.addmethod of <__main__.C instance at 0x000000000230FE08>>
# f0 <bound method C.f0 of <__main__.C instance at 0x000000000230FE08>>
# f1 <bound method ?.f1 of <class __main__.C at 0x000000000237AB28>>
# f2 <bound method ?.f2 of <__main__.C instance at 0x000000000230FE08>>
# f3 <bound method ?.f3 of <__main__.C instance at 0x000000000230FE08>>
# f4 <bound method ?.f4 of <__main__.C instance at 0x000000000230FE08>>

print '\nattributes of c:'
listattr(c)
# OUT:
# attributes of c:
# __init__ <bound method C.__init__ of <__main__.C instance at 0x0000000002313108>>
# addmethod <bound method C.addmethod of <__main__.C instance at 0x0000000002313108>>
# f0 <bound method C.f0 of <__main__.C instance at 0x0000000002313108>>

Personally, I prefer the external ADDMETHOD function route, as it allows me to dynamically assign new method names within an iterator as well.

def y(self, x):
    pass
d = C()
for i in range(1,5):
    ADDMETHOD(d, y, 'f%d' % i)
print '\nattributes of d:'
listattr(d)
# OUT:
# attributes of d:
# __init__ <bound method C.__init__ of <__main__.C instance at 0x0000000002303508>>
# addmethod <bound method C.addmethod of <__main__.C instance at 0x0000000002303508>>
# f0 <bound method C.f0 of <__main__.C instance at 0x0000000002303508>>
# f1 <bound method ?.y of <__main__.C instance at 0x0000000002303508>>
# f2 <bound method ?.y of <__main__.C instance at 0x0000000002303508>>
# f3 <bound method ?.y of <__main__.C instance at 0x0000000002303508>>
# f4 <bound method ?.y of <__main__.C instance at 0x0000000002303508>>

Answer 10

Answer 11



If it can be of any help, I recently released a Python library named Gorilla to make the process of monkey patching more convenient.

Using a function needle() to patch a module named guineapig goes as follows:

import gorilla
import guineapig
def needle():

But it also takes care of more interesting use cases as shown in the FAQ from the documentation.

The code is available on GitHub.

Answer 12


This question was opened years ago, but hey, there’s an easy way to simulate the binding of a function to a class instance using decorators:

def binder (function, instance):
  copy_of_function = type (function) (function.func_code, {})
  copy_of_function.__bind_to__ = instance
  def bound_function (*args, **kwargs):
    return copy_of_function (copy_of_function.__bind_to__, *args, **kwargs)
  return bound_function

class SupaClass (object):
  def __init__ (self):
    self.supaAttribute = 42

def new_method (self):
  print self.supaAttribute

supaInstance = SupaClass ()
supaInstance.supMethod = binder (new_method, supaInstance)

otherInstance = SupaClass ()
otherInstance.supaAttribute = 72
otherInstance.supMethod = binder (new_method, otherInstance)

otherInstance.supMethod ()
supaInstance.supMethod ()

There, when you pass the function and the instance to the binder decorator, it creates a new function with the same code object as the first one. The given instance of the class is then stored in an attribute of the newly created function. The decorator returns a (third) function that automatically calls the copied function, passing the instance as the first parameter.

In conclusion, you get a function that simulates being bound to the class instance, while the original function is left unchanged.

Answer 13

Answer 14


I find it strange that nobody mentioned that all of the methods listed above create a reference cycle between the added method and the instance, which keeps the object alive until the cyclic garbage collector runs. There is an old trick that adds a descriptor by extending the class of the object:

def addmethod(obj, name, func):
    klass = obj.__class__
    subclass = type(klass.__name__, (klass,), {})
    setattr(subclass, name, func)
    obj.__class__ = subclass
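
A usage sketch of this trick in Python 3 syntax. The helper is repeated so the example is self-contained; `Point` and `norm1` are hypothetical names. The method lives on a per-object subclass, so nothing is stored in the instance dictionary:

```python
def addmethod(obj, name, func):
    klass = obj.__class__
    # Create a one-off subclass that carries the new method.
    subclass = type(klass.__name__, (klass,), {})
    setattr(subclass, name, func)
    obj.__class__ = subclass

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

def norm1(self):
    return abs(self.x) + abs(self.y)

p = Point(3, -4)
q = Point(1, 1)
addmethod(p, "norm1", norm1)

value = p.norm1()
still_a_point = isinstance(p, Point)      # the subclass keeps isinstance working
not_in_instance = "norm1" not in vars(p)  # nothing was added to the instance dict
q_untouched = not hasattr(q, "norm1")
```

One trade-off of this design: `type(p) is Point` is no longer true afterwards, so code that compares exact types will notice the change.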

Answer 15

from types import MethodType

def method(self):
   print 'hi!'

setattr( targetObj, method.__name__, MethodType(method, targetObj, type(method)) )

With this, you can use the self pointer




It seems like there should be a simpler way than:

import string
s = "string. With. Punctuation?" # Sample string 
out = s.translate(string.maketrans("",""), string.punctuation)

Is there?

Answer 0


From an efficiency perspective, you’re not going to beat

s.translate(None, string.punctuation)

For Python 3, use the following code instead:

s.translate(str.maketrans('', '', string.punctuation))

It’s performing raw string operations in C with a lookup table – there’s not much that will beat that but writing your own C code.
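
As a minimal Python 3 sketch of this approach (the function name `strip_punctuation` is an assumption, not something from the answer), the translation table can be built once and reused:

```python
import string

# Build the deletion table once; the third argument lists characters to remove.
_PUNCT_TABLE = str.maketrans('', '', string.punctuation)

def strip_punctuation(text):
    """Return text with all ASCII punctuation removed."""
    return text.translate(_PUNCT_TABLE)

cleaned = strip_punctuation("string. With. Punctuation?")
```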

If speed isn’t a worry, another option though is:

exclude = set(string.punctuation)
s = ''.join(ch for ch in s if ch not in exclude)

This is faster than s.replace with each char, but won’t perform as well as non-pure python approaches such as regexes or string.translate, as you can see from the below timings. For this type of problem, doing it at as low a level as possible pays off.

Timing code:

import re, string, timeit

s = "string. With. Punctuation"
exclude = set(string.punctuation)
table = string.maketrans("","")
regex = re.compile('[%s]' % re.escape(string.punctuation))

def test_set(s):
    return ''.join(ch for ch in s if ch not in exclude)

def test_re(s):  # From Vinko's solution, with fix.
    return regex.sub('', s)

def test_trans(s):
    return s.translate(table, string.punctuation)

def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s = s.replace(c, "")
    return s

print "sets      :",timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000)
print "regex     :",timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000)
print "translate :",timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000)
print "replace   :",timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000)

This gives the following results:

sets      : 19.8566138744
regex     : 6.86155414581
translate : 2.12455511093
replace   : 28.4436721802

Answer 1


Regular expressions are simple enough, if you know them.

import re
s = "string. With. Punctuation?"
s = re.sub(r'[^\w\s]','',s)
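
One detail worth checking with a quick sketch (Python 3; the sample string is made up): `\w` also matches the underscore, so `[^\w\s]` leaves `_` in place. Adding `|_` to the pattern removes it as well.

```python
import re

s = "snake_case, with. Punctuation?"

kept_underscore = re.sub(r'[^\w\s]', '', s)   # underscore counts as a word character
no_underscore = re.sub(r'[^\w\s]|_', '', s)   # also strip underscores
```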

Answer 2

For convenience, here is a summary of how to strip punctuation from a string in both Python 2 and Python 3. Please refer to the other answers for detailed descriptions.

Python 2

import string

s = "string. With. Punctuation?"
table = string.maketrans("","")
new_s = s.translate(table, string.punctuation)      # Output: string without punctuation

Python 3

import string

s = "string. With. Punctuation?"
table = str.maketrans(dict.fromkeys(string.punctuation))  # OR {key: None for key in string.punctuation}
new_s = s.translate(table)                          # Output: string without punctuation

Answer 3

myString.translate(None, string.punctuation)

Answer 4


I usually use something like this:

>>> s = "string. With. Punctuation?" # Sample string
>>> import string
>>> for c in string.punctuation:
...     s= s.replace(c,"")
>>> s
'string With Punctuation'

Answer 5

string.punctuation is ASCII only! A more correct (but also much slower) way is to use the unicodedata module:

# -*- coding: utf-8 -*-
from unicodedata import category
s = u'String — with -  «punctation »...'
s = ''.join(ch for ch in s if category(ch)[0] != 'P')
print 'stripped', s

You can generalize and strip other types of characters as well:

''.join(ch for ch in s if category(ch)[0] not in 'SP')

It will also strip characters like ~*+§$ which may or may not be “punctuation” depending on one’s point of view.

Answer 6


Not necessarily simpler, but a different way, if you are more familiar with the re family.

import re, string
s = "string. With. Punctuation?" # Sample string 
out = re.sub('[%s]' % re.escape(string.punctuation), '', s)

Answer 7

For Python 3 str or Python 2 unicode values, str.translate() only takes a dictionary; codepoints (integers) are looked up in that mapping and anything mapped to None is removed.

To remove (some?) punctuation then, use:

import string

remove_punct_map = dict.fromkeys(map(ord, string.punctuation))

The dict.fromkeys() class method makes it trivial to create the mapping, setting all values to None based on the sequence of keys.
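
A short usage sketch for this ASCII mapping (Python 3; the sample string is made up): the dictionary is passed directly to `str.translate()`.

```python
import string

# Keys are code points; a value of None means "delete this character".
remove_punct_map = dict.fromkeys(map(ord, string.punctuation))

sample = "string. With. Punctuation?"
result = sample.translate(remove_punct_map)
```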

To remove all punctuation, not just ASCII punctuation, your table needs to be a little bigger; see J.F. Sebastian’s answer (Python 3 version):

import unicodedata
import sys

remove_punct_map = dict.fromkeys(i for i in range(sys.maxunicode)
                                 if unicodedata.category(chr(i)).startswith('P'))

Answer 8


string.punctuation misses loads of punctuation marks that are commonly used in the real world. How about a solution that works for non-ASCII punctuation?

import regex
s = u"string. With. Some・Really Weird、Non?ASCII。 「(Punctuation)」?"
remove = regex.compile(ur'[\p{C}|\p{M}|\p{P}|\p{S}|\p{Z}]+', regex.UNICODE)
remove.sub(u" ", s).strip()

Personally, I believe this is the best way to remove punctuation from a string in Python because:

  • It removes all Unicode punctuation
  • It’s easily modifiable, e.g. you can remove the \p{S} if you want to remove punctuation, but keep symbols like $.
  • You can get really specific about what you want to keep and what you want to remove, for example \p{Pd} will only remove dashes.
  • This regex also normalizes whitespace. It maps tabs, carriage returns, and other oddities to nice, single spaces.

This uses Unicode character properties, which you can read more about on Wikipedia.

Answer 9


I haven’t seen this answer yet. Just use a regex; it removes every character that is not a word character (\w), a digit (\d), or a whitespace character (\s):

import re
s = "string. With. Punctuation?" # Sample string 
out = re.sub(ur'[^\w\d\s]+', '', s)

Answer 10

Here’s a one-liner for Python 3.5:

import string
"l*ots! o(f. p@u)n[c}t]u[a'ti\"on#$^?/".translate(str.maketrans({a:None for a in string.punctuation}))

Answer 11


This might not be the best solution however this is how I did it.

import string
f = lambda x: ''.join([i for i in x if i not in string.punctuation])

Answer 12


Here is a function I wrote. It’s not very efficient, but it is simple and you can add or remove any punctuation that you desire:

def stripPunc(wordList):
    """Strips punctuation from list of words"""
    puncList = [".",";",":","!","?","/","\\",",","#","@","$","&",")","(","\""]
    for punc in puncList:
        # strip this punctuation mark from every word in the list
        wordList = [word.replace(punc, '') for word in wordList]
    return wordList

Answer 13

import re
s = "string. With. Punctuation?" # Sample string 
out = re.sub(r'[^a-zA-Z0-9\s]', '', s)

Answer 14

Just as an update, I rewrote @Brian’s example in Python 3 and moved the regex compile step inside the function. My thought here was to time every single step needed to make the function work. Perhaps you are using distributed computing, cannot share a regex object between your workers, and need the re.compile step in each worker. I was also curious to time two different ways of calling maketrans in Python 3:

table = str.maketrans({key: None for key in string.punctuation})

versus

table = str.maketrans('', '', string.punctuation)

Plus I added another method to use set, where I take advantage of intersection function to reduce number of iterations.

This is the complete code:

import re, string, timeit

s = "string. With. Punctuation"

def test_set(s):
    exclude = set(string.punctuation)
    return ''.join(ch for ch in s if ch not in exclude)

def test_set2(s):
    _punctuation = set(string.punctuation)
    for punct in set(s).intersection(_punctuation):
        s = s.replace(punct, ' ')
    return ' '.join(s.split())

def test_re(s):  # From Vinko's solution, with fix.
    regex = re.compile('[%s]' % re.escape(string.punctuation))
    return regex.sub('', s)

def test_trans(s):
    table = str.maketrans({key: None for key in string.punctuation})
    return s.translate(table)

def test_trans2(s):
    table = str.maketrans('', '', string.punctuation)
    return s.translate(table)

def test_repl(s):  # From S.Lott's solution
    for c in string.punctuation:
        s = s.replace(c, "")
    return s

print("sets      :",timeit.Timer('f(s)', 'from __main__ import s,test_set as f').timeit(1000000))
print("sets2      :",timeit.Timer('f(s)', 'from __main__ import s,test_set2 as f').timeit(1000000))
print("regex     :",timeit.Timer('f(s)', 'from __main__ import s,test_re as f').timeit(1000000))
print("translate :",timeit.Timer('f(s)', 'from __main__ import s,test_trans as f').timeit(1000000))
print("translate2 :",timeit.Timer('f(s)', 'from __main__ import s,test_trans2 as f').timeit(1000000))
print("replace   :",timeit.Timer('f(s)', 'from __main__ import s,test_repl as f').timeit(1000000))

These are my results:

sets      : 3.1830138750374317
sets2      : 2.189873124472797
regex     : 7.142953420989215
translate : 4.243278483860195
translate2 : 2.427158243022859
replace   : 4.579746678471565

Answer 15

>>> import re
>>> s = "string. With. Punctuation?"
>>> s = re.sub(r'[^\w\s]','',s)
>>> re.split(r'\s+', s)

['string', 'With', 'Punctuation']

Answer 16


Here’s a solution without regex.

import string

input_text = "!where??and!!or$$then:)"
punctuation_replacer = string.maketrans(string.punctuation, ' '*len(string.punctuation))    
print ' '.join(input_text.translate(punctuation_replacer).split()).strip()

Output>> where and or then
  • Replaces punctuation with spaces
  • Replaces multiple spaces between words with a single space
  • Removes trailing spaces, if any, with strip()
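
`string.maketrans` exists only in Python 2. A hedged Python 3 sketch of the same three steps, using `str.maketrans` (the function name `punct_to_spaces` is hypothetical):

```python
import string

# Map every ASCII punctuation character to a space.
_TO_SPACE = str.maketrans(string.punctuation, ' ' * len(string.punctuation))

def punct_to_spaces(text):
    # translate() swaps punctuation for spaces; split()/join() collapses runs
    # of whitespace and drops leading/trailing spaces in one step.
    return ' '.join(text.translate(_TO_SPACE).split())

result = punct_to_spaces("!where??and!!or$$then:)")
```

Because `split()` with no arguments already discards leading and trailing whitespace, a separate `strip()` call is not needed in this version.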

Answer 17


A one-liner might be helpful in not very strict cases:

''.join([c for c in s if c.isalnum() or c.isspace()])

Answer 18

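import string

# First method: build the result character by character
# Storing all punctuation characters in a variable
# (string.punctuation is used here; it covers the characters in the sample run)
punctuation = string.punctuation
newstring = ''  # Creating empty string
word = raw_input("Enter string: ")
for i in word:
    if i not in punctuation:
        newstring += i
print "The string without punctuation is", newstring

# Second method: delete the punctuation with str.translate (Python 2)
word = raw_input("Enter string: ")
newstring = word.translate(None, punctuation)
print "The string without punctuation is", newstring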

#Output for both methods
Enter string: hello! welcome -to_python(programming.language)??,
The string without punctuation is: hello welcome topythonprogramminglanguage

Answer 19

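with open('one.txt', 'r') as myFile:
    str1 = myFile.read()  # read the whole file into one string

punctuation = ['(', ')', '?', ':', ';', ',', '.', '!', '/', '"', "'"]

# replace each punctuation mark with a space
for i in punctuation:
    str1 = str1.replace(i, " ")

myList = str1.split()  # split on whitespace into words
print(str1)
for i in myList:
    print(i)
    print("____________")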

Answer 20


Why does none of you use this?

 ''.join(filter(str.isalnum, s)) 

Too slow?

Answer 21


Considering unicode. Code checked in python3.

from unicodedata import category
text = 'hi, how are you?'
text_without_punc = ''.join(ch for ch in text if not category(ch).startswith('P'))

Answer 22



Remove stop words from the text file using Python


with open('one.txt','r') as myFile:
    str1 = myFile.read()  # read the whole file into one string

    stop_words ="not", "is", "it", "By","between","This","By","A","when","And","up","Then","was","by","It","If","can","an","he","This","or","And","a","i","it","am","at","on","in","of","to","is","so","too","my","the","and","but","are","very","here","even","from","them","then","than","this","that","though","be","But","these"

    myList = str1.split()  # split the text into words

    for i in myList:
        # print only the words that are not stop words
        if i not in stop_words:
            print(i)
            print ("____________")


Answer 23


I like to use a function like this:

import string

def scrub(abc):
    while abc and abc[-1] in string.punctuation:  # strip trailing punctuation
        abc = abc[:-1]
    while abc and abc[0] in string.punctuation:   # strip leading punctuation
        abc = abc[1:]
    return abc



Can someone explain how these three methods of slicing are different?
I’ve seen the docs, and I’ve seen these answers, but I still find myself unable to explain how the three are different. To me, they seem interchangeable in large part, because they are at the lower levels of slicing.

For example, say we want to get the first five rows of a DataFrame. How is it that all three of these work?

df.loc[:5]
df.ix[:5]
df.iloc[:5]

Can someone present three cases where the distinction in uses is clearer?

Answer 0

Note: in pandas version 0.20.0 and above, ix is deprecated and the use of loc and iloc is encouraged instead. I have left the parts of this answer that describe ix intact as a reference for users of earlier versions of pandas. Examples have been added below showing alternatives to ix.

First, here’s a recap of the three methods:

  • loc gets rows (or columns) with particular labels from the index.
  • iloc gets rows (or columns) at particular positions in the index (so it only takes integers).
  • ix usually tries to behave like loc but falls back to behaving like iloc if a label is not present in the index.

It’s important to note some subtleties that can make ix slightly tricky to use:

  • if the index is of integer type, ix will only use label-based indexing and not fall back to position-based indexing. If the label is not in the index, an error is raised.

  • if the index does not contain only integers, then given an integer, ix will immediately use position-based indexing rather than label-based indexing. If however ix is given another type (e.g. a string), it can use label-based indexing.

To illustrate the differences between the three methods, consider the following Series:

>>> s = pd.Series(np.nan, index=[49,48,47,46,45, 1, 2, 3, 4, 5])
>>> s
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN
4    NaN
5    NaN

We’ll look at slicing with the integer value 3.

In this case, s.iloc[:3] returns us the first 3 rows (since it treats 3 as a position) and s.loc[:3] returns us the first 8 rows (since it treats 3 as a label):

>>> s.iloc[:3] # slice the first three rows
49   NaN
48   NaN
47   NaN

>>> s.loc[:3] # slice up to and including label 3
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

>>> s.ix[:3] # the integer is in the index so s.ix[:3] works like loc
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN
2    NaN
3    NaN

Notice s.ix[:3] returns the same Series as s.loc[:3] since it looks for the label first rather than working on the position (and the index for s is of integer type).

What if we try with an integer label that isn’t in the index (say 6)?

Here s.iloc[:6] returns the first 6 rows of the Series as expected. However, s.loc[:6] raises a KeyError since 6 is not in the index.

>>> s.iloc[:6]
49   NaN
48   NaN
47   NaN
46   NaN
45   NaN
1    NaN

>>> s.loc[:6]
KeyError: 6

>>> s.ix[:6]
KeyError: 6

As per the subtleties noted above, s.ix[:6] now raises a KeyError because it tries to work like loc but can’t find a 6 in the index. Because our index is of integer type, ix doesn’t fall back to behaving like iloc.

If, however, our index were of mixed type, then ix, given an integer, would behave like iloc immediately instead of raising a KeyError:

>>> s2 = pd.Series(np.nan, index=['a','b','c','d','e', 1, 2, 3, 4, 5])
>>> s2.index.is_mixed() # index is mix of different types
True
>>> s2.ix[:6] # now behaves like iloc given integer
a   NaN
b   NaN
c   NaN
d   NaN
e   NaN
1   NaN

Keep in mind that ix can still accept non-integers and behave like loc:

>>> s2.ix[:'c'] # behaves like loc given non-integer
a   NaN
b   NaN
c   NaN

As general advice, if you’re only indexing using labels, or only indexing using integer positions, stick with loc or iloc to avoid unexpected results. Try not to use ix.
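
The loc and iloc results for the integer-indexed Series s can be checked with a short script. This is a minimal sketch that leaves ix out, since it is deprecated. The variable names by_position, by_label and missing_label_failed are illustrative:

```python
import numpy as np
import pandas as pd

s = pd.Series(np.nan, index=[49, 48, 47, 46, 45, 1, 2, 3, 4, 5])

by_position = s.iloc[:3]  # 3 is a position: rows at positions 0, 1 and 2
by_label = s.loc[:3]      # 3 is a label: everything up to and including label 3
print(len(by_position), len(by_label))

# Label 6 is not in the (unsorted) index, so the label-based slice fails.
try:
    s.loc[:6]
    missing_label_failed = False
except KeyError:
    missing_label_failed = True
print(missing_label_failed)
```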

Combining position-based and label-based indexing

Sometimes given a DataFrame, you will want to mix label and positional indexing methods for the rows and columns.

For example, consider the following DataFrame. How best to slice the rows up to and including ‘c’ and take the first four columns?

>>> df = pd.DataFrame(np.nan, 
                      index=list('abcde'),
                      columns=['x','y','z', 8, 9])
>>> df
    x   y   z   8   9
a NaN NaN NaN NaN NaN
b NaN NaN NaN NaN NaN
c NaN NaN NaN NaN NaN
d NaN NaN NaN NaN NaN
e NaN NaN NaN NaN NaN

In earlier versions of pandas (before 0.20.0) ix lets you do this quite neatly – we can slice the rows by label and the columns by position (note that for the columns, ix will default to position-based slicing since 4 is not a column name):

>>> df.ix[:'c', :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

In later versions of pandas, we can achieve this result using iloc and the help of another method:

>>> df.iloc[:df.index.get_loc('c') + 1, :4]
    x   y   z   8
a NaN NaN NaN NaN
b NaN NaN NaN NaN
c NaN NaN NaN NaN

get_loc() is an index method meaning “get the position of the label in this index”. Note that since slicing with iloc is exclusive of its endpoint, we must add 1 to this value if we want row ‘c’ as well.
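
As a runnable sketch of the same selection without ix: the first form converts the label 'c' to a position with get_loc(), and the second goes the other way and converts the first four column positions to labels with df.columns[:4]. The second form is an alternative added here for comparison, and the variable names are illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.nan, index=list('abcde'), columns=['x', 'y', 'z', 8, 9])

# Positions only: iloc excludes the stop, hence the + 1 to keep row 'c'.
stop = df.index.get_loc('c') + 1
by_position = df.iloc[:stop, :4]

# Labels only: loc includes the stop label 'c'.
by_label = df.loc[:'c', df.columns[:4]]

print(by_position.equals(by_label))
```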

There are further examples in the pandas documentation.

Answer 1






iloc works based on integer positioning. So no matter what your row labels are, you can always, e.g., get the first row by doing

df.iloc[0]

or the last five rows by doing

df.iloc[-5:]

You can also use it on the columns. This retrieves the 3rd column:

df.iloc[:, 2]    # the : in the first position indicates all rows

You can combine them to get intersections of rows and columns:

df.iloc[:3, :3] # The upper-left 3 X 3 entries (assuming df has 3+ rows and columns)

On the other hand, .loc uses named indices. Let’s set up a data frame with strings as row and column labels:

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])

Then we can get the first row by

df.loc['a']     # equivalent to df.iloc[0]

and the last two rows of the 'date' column by

df.loc['b':, 'date']   # equivalent to df.iloc[1:, 1]

and so on. Now, it’s probably worth pointing out that the default row and column indices for a DataFrame are integers from 0, and in this case iloc and loc select rows by the same numbers (although a loc slice includes its end label). This is why your three examples all work. If you had a non-numeric index such as strings or datetimes, df.loc[:5] would raise an error.
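
To make the default-index case concrete, here is a minimal sketch. The two frames and the column name 'value' are made up for illustration. It compares slice lengths on a default integer index and then tries the same label slice on a string index:

```python
import pandas as pd

default_index = pd.DataFrame({'value': range(10)})
n_iloc = len(default_index.iloc[:5])  # positions 0..4
n_loc = len(default_index.loc[:5])    # labels 0..5, end label included
print(n_iloc, n_loc)

string_index = pd.DataFrame({'value': range(3)}, index=['a', 'b', 'c'])
try:
    string_index.loc[:5]
    raised = None
except (TypeError, KeyError) as exc:
    # The exact exception class may depend on the pandas version.
    raised = type(exc).__name__
print(raised)
```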

Also, you can do column retrieval just by using the data frame’s __getitem__:

df['time']    # equivalent to df.loc[:, 'time']

Now suppose you want to mix position and named indexing, that is, indexing using names on rows and positions on columns (to clarify, I mean select from our data frame, rather than creating a data frame with strings in the row index and integers in the column index). This is where .ix comes in:

df.ix[:2, 'time']    # the first two rows of the 'time' column
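
Since ix is deprecated from pandas 0.20.0 onwards (as noted in another answer), the same mixed selection can be sketched with loc or iloc alone. The cell values below are made up. df.index[:2] turns positions into labels, and df.columns.get_loc('time') turns a label into a position:

```python
import pandas as pd

df = pd.DataFrame({'time': [10, 20, 30],
                   'date': ['d1', 'd2', 'd3'],
                   'name': ['x', 'y', 'z']},
                  index=['a', 'b', 'c'])

# Labels on both axes: look up the first two row labels by position first.
via_loc = df.loc[df.index[:2], 'time']

# Positions on both axes: look up the column position by label first.
via_iloc = df.iloc[:2, df.columns.get_loc('time')]

print(via_loc.equals(via_iloc))
```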

I think it’s also worth mentioning that you can pass boolean vectors to the loc method as well. For example:

 b = [True, False, True]
 df.loc[b]

This will return the 1st and 3rd rows of df. This is equivalent to df[b] for selection, but it can also be used for assigning via boolean vectors:

df.loc[b, 'name'] = 'Mary', 'John'
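
Put together as a runnable sketch, using the empty three-row frame from earlier in this answer. A list is used for the assigned values here as a cautious choice, with one value per selected row:

```python
import pandas as pd

df = pd.DataFrame(index=['a', 'b', 'c'], columns=['time', 'date', 'name'])
b = [True, False, True]

selected = df.loc[b]                  # rows 'a' and 'c'
df.loc[b, 'name'] = ['Mary', 'John']  # row 'b' is left untouched

print(list(selected.index))
print(df['name'].tolist())
```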

Answer 2






In my opinion, the accepted answer is confusing, since it uses a DataFrame with only missing values. I also do not like the term position-based for .iloc and instead, prefer integer location as it is much more descriptive and exactly what .iloc stands for. The key word is INTEGER – .iloc needs INTEGERS.

See my extremely detailed blog series on subset selection for more

.ix is deprecated and ambiguous and should never be used

Because .ix is deprecated we will only focus on the differences between .loc and .iloc.

Before we talk about the differences, it is important to understand that DataFrames have labels that help identify each column and each index. Let’s take a look at a sample DataFrame:

df = pd.DataFrame({'age':[30, 2, 12, 4, 32, 33, 69],
                   'color':['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food':['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height':[165, 70, 120, 80, 180, 172, 150],
                   'score':[4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state':['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

The labels age, color, food, height, score and state are used for the columns. The other labels, Jane, Nick, Aaron, Penelope, Dean, Christina and Cornelia, are used for the index.

The primary ways to select particular rows in a DataFrame are with the .loc and .iloc indexers. Each of these indexers can also be used to simultaneously select columns, but it is easier to just focus on rows for now. Also, each of the indexers uses a set of brackets that immediately follow its name to make its selections.

.loc selects data only by labels

We will first talk about the .loc indexer which only selects data by the index or column labels. In our sample DataFrame, we have provided meaningful names as values for the index. Many DataFrames will not have any meaningful names and will instead, default to just the integers from 0 to n-1, where n is the length of the DataFrame.

There are three different inputs you can use for .loc

  • A string
  • A list of strings
  • Slice notation using strings as the start and stop values

Selecting a single row with .loc with a string

To select a single row of data, place the index label inside of the brackets following .loc.

df.loc['Penelope']

This returns the row of data as a Series:

age           4
color     white
food      Apple
height       80
score       3.3
state        AL
Name: Penelope, dtype: object

Selecting multiple rows with .loc with a list of strings

df.loc[['Cornelia', 'Jane', 'Dean']]

This returns a DataFrame with the rows in the order specified in the list:

Selecting multiple rows with .loc with slice notation

Slice notation is defined by start, stop and step values. When slicing by label, pandas includes the stop value in the result. The following slices from Aaron to Dean, inclusive. Its step size is not explicitly defined and defaults to 1.

df.loc['Aaron':'Dean']

Complex slices can be taken in the same manner as Python lists.
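
The three kinds of .loc input can be tried together in one short script. This is a sketch on a trimmed copy of the sample DataFrame (fewer columns, same index); the variable names and the stepped slice are illustrative:

```python
import pandas as pd

df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                   'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

one_row = df.loc['Penelope']                      # a single label gives a Series
some_rows = df.loc[['Cornelia', 'Jane', 'Dean']]  # a list gives a DataFrame in list order
label_slice = df.loc['Aaron':'Dean']              # the stop label 'Dean' is included
stepped = df.loc[:'Dean':2]                       # every second row up to 'Dean'

print(one_row['food'])
print(list(label_slice.index))
print(list(stepped.index))
```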

.iloc selects data only by integer location

Let’s now turn to .iloc. Every row and column of data in a DataFrame has an integer location that defines it. This is in addition to the label that is visually displayed in the output. The integer location is simply the number of rows/columns from the top/left beginning at 0.

There are three different inputs you can use for .iloc

  • An integer
  • A list of integers
  • Slice notation using integers as the start and stop values

Selecting a single row with .iloc with an integer

df.iloc[4]

This returns the 5th row (integer location 4) as a Series:

age           32
color       gray
food      Cheese
height       180
score        1.8
state         AK
Name: Dean, dtype: object

Selecting multiple rows with .iloc with a list of integers

df.iloc[[2, -2]]

This returns a DataFrame of the third and second to last rows:

Selecting multiple rows with .iloc with slice notation
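
A minimal sketch of slicing with .iloc, on a trimmed copy of the sample DataFrame (the particular slices are illustrative). Unlike .loc, the stop position is excluded, just as with Python lists:

```python
import pandas as pd

df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

first_three = df.iloc[:3]     # integer locations 0, 1, 2 (3 is excluded)
every_other = df.iloc[1:6:2]  # integer locations 1, 3, 5

print(list(first_three.index))
print(list(every_other.index))
```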


Simultaneous selection of rows and columns with .loc and .iloc

A useful feature of both .loc and .iloc is that they can select rows and columns simultaneously. In the examples above, all the columns were returned from each selection. We can choose columns with the same types of inputs as we do for rows. We simply need to separate the row and column selection with a comma.

For example, we can select rows Jane and Dean with just the columns height, score and state like this:

df.loc[['Jane', 'Dean'], 'height':]

This uses a list of labels for the rows and slice notation for the columns

We can naturally do similar operations with .iloc using only integers.

df.iloc[[1,4], 2]
Nick      Lamb
Dean    Cheese
Name: food, dtype: object

Simultaneous selection with labels and integer location

.ix was used to make selections simultaneously with labels and integer location. This was useful but at times confusing and ambiguous, and thankfully it has been deprecated. If you need to make a selection with a mix of labels and integer locations, you will have to convert one of them so that both selections use labels or both use integer locations.

For instance, if we want to select rows Nick and Cornelia along with columns 2 and 4, we could use .loc by converting the integers to labels with the following:

col_names = df.columns[[2, 4]]
df.loc[['Nick', 'Cornelia'], col_names] 

Or alternatively, convert the index labels to integers with the get_loc index method.

labels = ['Nick', 'Cornelia']
index_ints = [df.index.get_loc(label) for label in labels]
df.iloc[index_ints, [2, 4]]
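
As a quick check that the two conversions select the same cells, here is a self-contained sketch that rebuilds the sample DataFrame and compares the results (the names by_label and by_position are illustrative):

```python
import pandas as pd

df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                   'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height': [165, 70, 120, 80, 180, 172, 150],
                   'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

labels = ['Nick', 'Cornelia']

# Integers to labels, then .loc.
by_label = df.loc[labels, df.columns[[2, 4]]]

# Labels to integers, then .iloc.
index_ints = [df.index.get_loc(label) for label in labels]
by_position = df.iloc[index_ints, [2, 4]]

print(by_label.equals(by_position))
```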

Boolean Selection

The .loc indexer can also do boolean selection. For instance, if we are interested in finding all the rows where age is above 30 and returning just the food and score columns, we can do the following:

df.loc[df['age'] > 30, ['food', 'score']] 

You can replicate this with .iloc but you cannot pass it a boolean series. You must convert the boolean Series into a numpy array like this:

df.iloc[(df['age'] > 30).values, [2, 4]] 
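
The two boolean selections can be compared directly. This is a sketch that rebuilds the sample DataFrame; mask, by_label and by_position are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({'age': [30, 2, 12, 4, 32, 33, 69],
                   'color': ['blue', 'green', 'red', 'white', 'gray', 'black', 'red'],
                   'food': ['Steak', 'Lamb', 'Mango', 'Apple', 'Cheese', 'Melon', 'Beans'],
                   'height': [165, 70, 120, 80, 180, 172, 150],
                   'score': [4.6, 8.3, 9.0, 3.3, 1.8, 9.5, 2.2],
                   'state': ['NY', 'TX', 'FL', 'AL', 'AK', 'TX', 'TX']},
                  index=['Jane', 'Nick', 'Aaron', 'Penelope', 'Dean', 'Christina', 'Cornelia'])

mask = df['age'] > 30  # boolean Series aligned on the index

by_label = df.loc[mask, ['food', 'score']]
by_position = df.iloc[mask.values, [2, 4]]  # .iloc needs the plain boolean array

print(list(by_label.index))
print(by_label.equals(by_position))
```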

Selecting all rows

It is possible to use .loc/.iloc for just column selection. You can select all the rows by using a colon like this:

df.loc[:, 'color':'score':2]

The indexing operator, [], can select rows and columns too but not simultaneously.

Most people are familiar with the primary purpose of the DataFrame indexing operator, which is to select columns. A string selects a single column as a Series and a list of strings selects multiple columns as a DataFrame.

df['food']
Jane          Steak
Nick           Lamb
Aaron         Mango
Penelope      Apple
Dean         Cheese
Christina     Melon
Cornelia      Beans
Name: food, dtype: object

Using a list selects multiple columns

df[['food', 'score']]

What people are less familiar with is that, when slice notation is used, selection happens by row labels or by integer location instead. This is very confusing and something that I almost never use, but it does work.

df['Penelope':'Christina'] # slice rows by label

df[2:6:2] # slice rows by integer location

The explicitness of .loc/.iloc for selecting rows is highly preferred. The indexing operator alone is unable to select rows and columns simultaneously.

df[3:5, 'color']
TypeError: unhashable type: 'slice'