Python 实用宝典

Question 1

I have been playing with Numpy and matplotlib in the last few days. I am having problems trying to make matplotlib plot a function without blocking execution. I know there are already many threads here on SO asking similar questions, and I ‘ve googled quite a lot but haven’t managed to make this work.

I have tried using show(block=False) as some people suggest, but all I get is a frozen window. If I simply call show(), the result is plotted properly but execution is blocked until the window is closed. From other threads I ‘ve read, I suspect that whether show(block=False) works or not depends on the backend. Is this correct? My back end is Qt4Agg. Could you have a look at my code and tell me if you see something wrong? Here is my code. Thanks for any help.

from math import *
from matplotlib import pyplot as plt
print plt.get_backend()



def main():
    x = range(-50, 51, 1)
    for pow in range(1,5):   # plot x^1, x^2, ..., x^4

        y = [Xi**pow for Xi in x]
        print y

        plt.plot(x, y)
        plt.draw()
        #plt.show()             #this plots correctly, but blocks execution.
        plt.show(block=False)   #this creates an empty frozen window.
        _ = raw_input("Press [enter] to continue.")


if __name__ == '__main__':
    main()

PS. I forgot to say that I would like to update the existing window every time I plot something, instead of creating a new one.

Question 2

I spent a long time looking for solutions, and found this answer.

It looks like, in order to get what you (and I) want, you need the combination of plt.ion(), plt.show() (not with block=False) and, most importantly, plt.pause(.001) (or whatever time you want). The pause is needed because the GUI events happen while the main code is sleeping, including drawing. It’s possible that this is implemented by picking up time from a sleeping thread, so maybe IDEs mess with that—I don’t know.

Here’s an implementation that works for me on python 3.5:

import numpy as np
from matplotlib import pyplot as plt

def main():
    plt.axis([-50,50,0,10000])
    plt.ion()
    plt.show()

    x = np.arange(-50, 51)
    for pow in range(1,5):   # plot x^1, x^2, ..., x^4
        y = [Xi**pow for Xi in x]
        plt.plot(x, y)
        plt.draw()
        plt.pause(0.001)
        input("Press [enter] to continue.")

if __name__ == '__main__':
    main()

Question 3

A simple trick that works for me is the following:

Use the block = False argument inside show: plt.show(block = False)
Use another plt.show() at the end of the .py script.

Example:

import matplotlib.pyplot as plt

plt.imshow(add_something)
plt.xlabel("x")
plt.ylabel("y")

plt.show(block=False)

#more code here (e.g. do calculations and use print to see them on the screen

plt.show()

Note: plt.show() is the last line of my script.

Question 4

You can avoid blocking execution by writing the plot to an array, then displaying the array in a different thread. Here is an example of generating and displaying plots simultaneously using pf.screen from pyformulas 0.2.8:

import pyformulas as pf
import matplotlib.pyplot as plt
import numpy as np
import time

fig = plt.figure()

canvas = np.zeros((480,640))
screen = pf.screen(canvas, 'Sinusoid')

start = time.time()
while True:
    now = time.time() - start

    x = np.linspace(now-2, now, 100)
    y = np.sin(2*np.pi*x) + np.sin(3*np.pi*x)
    plt.xlim(now-2,now+1)
    plt.ylim(-3,3)
    plt.plot(x, y, c='black')

    # If we haven't already shown or saved the plot, then we need to draw the figure first...
    fig.canvas.draw()

    image = np.fromstring(fig.canvas.tostring_rgb(), dtype=np.uint8, sep='')
    image = image.reshape(fig.canvas.get_width_height()[::-1] + (3,))

    screen.update(image)

#screen.close()

Result:

Disclaimer: I’m the maintainer for pyformulas.

Reference: Matplotlib: save plot to numpy array

Question 5

A lot of these answers are super inflated and from what I can find, the answer isn’t all that difficult to understand.

You can use plt.ion() if you want, but I found using plt.draw() just as effective

For my specific project I’m plotting images, but you can use plot() or scatter() or whatever instead of figimage(), it doesn’t matter.

plt.figimage(image_to_show)
plt.draw()
plt.pause(0.001)

Or

fig = plt.figure()
...
fig.figimage(image_to_show)
fig.canvas.draw()
plt.pause(0.001)

If you’re using an actual figure.
I used @krs013, and @Default Picture’s answers to figure this out
Hopefully this saves someone from having launch every single figure on a separate thread, or from having to read these novels just to figure this out

Question 6

Live Plotting

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
# plt.axis([x[0], x[-1], -1, 1])      # disable autoscaling
for point in x:
    plt.plot(point, np.sin(2 * point), '.', color='b')
    plt.draw()
    plt.pause(0.01)
# plt.clf()                           # clear the current figure

if the amount of data is too much you can lower the update rate with a simple counter

cnt += 1
if (cnt == 10):       # update plot each 10 points
    plt.draw()
    plt.pause(0.01)
    cnt = 0

Holding Plot after Program Exit

This was my actual problem that couldn’t find satisfactory answer for, I wanted plotting that didn’t close after the script was finished (like MATLAB),

If you think about it, after the script is finished, the program is terminated and there is no logical way to hold the plot this way, so there are two options

block the script from exiting (that’s plt.show() and not what I want)
run the plot on a separate thread (too complicated)

this wasn’t satisfactory for me so I found another solution outside of the box

SaveToFile and View in external viewer

For this the saving and viewing should be both fast and the viewer shouldn’t lock the file and should update the content automatically

Selecting Format for Saving

vector based formats are both small and fast

SVG is good but coudn’t find good viewer for it except the web browser which by default needs manual refresh
PDF can support vector formats and there are lightweight viewers which support live updating

Fast Lightweight Viewer with Live Update

For PDF there are several good options

On Windows I use SumatraPDF which is free, fast and light (only uses 1.8MB RAM for my case)
On Linux there are several options such as Evince (GNOME) and Ocular (KDE)

Sample Code & Results

Sample code for outputing plot to a file

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(2 * x)
plt.plot(x, y)
plt.savefig("fig.pdf")

after first run, open the output file in one of the viewers mentioned above and enjoy.

Here is a screenshot of VSCode alongside SumatraPDF, also the process is fast enough to get semi-live update rate (I can get near 10Hz on my setup just use time.sleep() between intervals)

Question 7

Iggy’s answer was the easiest for me to follow, but I got the following error when doing a subsequent subplot command that was not there when I was just doing show:

MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.

In order to avoid this error, it helps to close (or clear) the plot after the user hits enter.

Here’s the code that worked for me:

def plt_show():
    '''Text-blocking version of plt.show()
    Use this instead of plt.show()'''
    plt.draw()
    plt.pause(0.001)
    input("Press enter to continue...")
    plt.close()

Question 8

The Python package drawnow allows to update a plot in real time in a non blocking way.
It also works with a webcam and OpenCV for example to plot measures for each frame.
See the original post.

Question 9

My data can have multiple events on a given date or NO events on a date. I take these events, get a count by date and plot them. However, when I plot them, my two series don’t always match.

idx = pd.date_range(df['simpleDate'].min(), df['simpleDate'].max())
s = df.groupby(['simpleDate']).size()

In the above code idx becomes a range of say 30 dates. 09-01-2013 to 09-30-2013 However S may only have 25 or 26 days because no events happened for a given date. I then get an AssertionError as the sizes dont match when I try to plot:

fig, ax = plt.subplots()    
ax.bar(idx.to_pydatetime(), s, color='green')

What’s the proper way to tackle this? Do I want to remove dates with no values from IDX or (which I’d rather do) is add to the series the missing date with a count of 0. I’d rather have a full graph of 30 days with 0 values. If this approach is right, any suggestions on how to get started? Do I need some sort of dynamic reindex function?

Here’s a snippet of S ( df.groupby(['simpleDate']).size() ), notice no entries for 04 and 05.

09-02-2013     2
09-03-2013    10
09-06-2013     5
09-07-2013     1

Question 10

You could use Series.reindex:

import pandas as pd

idx = pd.date_range('09-01-2013', '09-30-2013')

s = pd.Series({'09-02-2013': 2,
               '09-03-2013': 10,
               '09-06-2013': 5,
               '09-07-2013': 1})
s.index = pd.DatetimeIndex(s.index)

s = s.reindex(idx, fill_value=0)
print(s)

yields

2013-09-01     0
2013-09-02     2
2013-09-03    10
2013-09-04     0
2013-09-05     0
2013-09-06     5
2013-09-07     1
2013-09-08     0
...

Question 11

A quicker workaround is to use .asfreq(). This doesn’t require creation of a new index to call within .reindex().

# "broken" (staggered) dates
dates = pd.Index([pd.Timestamp('2012-05-01'), 
                  pd.Timestamp('2012-05-04'), 
                  pd.Timestamp('2012-05-06')])
s = pd.Series([1, 2, 3], dates)

print(s.asfreq('D'))
2012-05-01    1.0
2012-05-02    NaN
2012-05-03    NaN
2012-05-04    2.0
2012-05-05    NaN
2012-05-06    3.0
Freq: D, dtype: float64

Question 12

One issue is that reindex will fail if there are duplicate values. Say we’re working with timestamped data, which we want to index by date:

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2016-11-15 1:00','2016-11-16 2:00','2016-11-16 3:00','2016-11-18 4:00']),
    'values':['a','b','c','d']})
df.index = pd.DatetimeIndex(df['timestamps']).floor('D')
df

yields

            timestamps             values
2016-11-15  "2016-11-15 01:00:00"  a
2016-11-16  "2016-11-16 02:00:00"  b
2016-11-16  "2016-11-16 03:00:00"  c
2016-11-18  "2016-11-18 04:00:00"  d

Due to the duplicate 2016-11-16 date, an attempt to reindex:

all_days = pd.date_range(df.index.min(), df.index.max(), freq='D')
df.reindex(all_days)

fails with:

...
ValueError: cannot reindex from a duplicate axis

(by this it means the index has duplicates, not that it is itself a dup)

Instead, we can use .loc to look up entries for all dates in range:

df.loc[all_days]

yields

            timestamps             values
2016-11-15  "2016-11-15 01:00:00"  a
2016-11-16  "2016-11-16 02:00:00"  b
2016-11-16  "2016-11-16 03:00:00"  c
2016-11-17  NaN                    NaN
2016-11-18  "2016-11-18 04:00:00"  d

fillna can be used on the column series to fill blanks if needed.

Question 13

An alternative approach is resample, which can handle duplicate dates in addition to missing dates. For example:

df.resample('D').mean()

resample is a deferred operation like groupby so you need to follow it with another operation. In this case mean works well, but you can also use many other pandas methods like max, sum, etc.

Here is the original data, but with an extra entry for ‘2013-09-03’:

             val
date           
2013-09-02     2
2013-09-03    10
2013-09-03    20    <- duplicate date added to OP's data
2013-09-06     5
2013-09-07     1

And here are the results:

             val
date            
2013-09-02   2.0
2013-09-03  15.0    <- mean of original values for 2013-09-03
2013-09-04   NaN    <- NaN b/c date not present in orig
2013-09-05   NaN    <- NaN b/c date not present in orig
2013-09-06   5.0
2013-09-07   1.0

I left the missing dates as NaNs to make it clear how this works, but you can add fillna(0) to replace NaNs with zeroes as requested by the OP or alternatively use something like interpolate() to fill with non-zero values based on the neighboring rows.

Question 14

Here’s a nice method to fill in missing dates into a dataframe, with your choice of fill_value, days_back to fill in, and sort order (date_order) by which to sort the dataframe:

def fill_in_missing_dates(df, date_col_name = 'date',date_order = 'asc', fill_value = 0, days_back = 30):

    df.set_index(date_col_name,drop=True,inplace=True)
    df.index = pd.DatetimeIndex(df.index)
    d = datetime.now().date()
    d2 = d - timedelta(days = days_back)
    idx = pd.date_range(d2, d, freq = "D")
    df = df.reindex(idx,fill_value=fill_value)
    df[date_col_name] = pd.DatetimeIndex(df.index)

    return df

Question 15

I’m using matplotlib to plot data (using plot and errorbar functions) from Python. I have to plot a set of totally separate and independent plots, and then adjust their ylim values so they can be easily visually compared.

How can I retrieve the ylim values from each plot, so that I can take the min and max of the lower and upper ylim values, respectively, and adjust the plots so they can be visually compared?

Of course, I could just analyze the data and come up with my own custom ylim values… but I’d like to use matplotlib to do that for me. Any suggestions on how to easily (and efficiently) do this?

Here’s my Python function that plots using matplotlib:

import matplotlib.pyplot as plt

def myplotfunction(title, values, errors, plot_file_name):

    # plot errorbars
    indices = range(0, len(values))
    fig = plt.figure()
    plt.errorbar(tuple(indices), tuple(values), tuple(errors), marker='.')

    # axes
    axes = plt.gca()
    axes.set_xlim([-0.5, len(values) - 0.5])
    axes.set_xlabel('My x-axis title')
    axes.set_ylabel('My y-axis title')

    # title
    plt.title(title)

    # save as file
    plt.savefig(plot_file_name)

    # close figure
    plt.close(fig)

Question 16

Just use axes.get_ylim(), it is very similar to set_ylim. From the docs:

get_ylim()

Get the y-axis range [bottom, top]

Question 17

 ymin, ymax = axes.get_ylim()

If you are using the plt api directly, you can avoid calls to the axes altogether:

def myplotfunction(title, values, errors, plot_file_name):

    # plot errorbars
    indices = range(0, len(values))
    fig = plt.figure()
    plt.errorbar(tuple(indices), tuple(values), tuple(errors), marker='.')

    plt.xlim([-0.5, len(values) - 0.5])
    plt.xlabel('My x-axis title')
    plt.ylabel('My y-axis title')

    # title
    plt.title(title)

    # save as file
    plt.savefig(plot_file_name)

   # close figure
    plt.close(fig)

Question 18

Leveraging from the good answers above and assuming you were only using plt as in

import matplotlib.pyplot as plt

then you can get all four plot limits using plt.axis() as in the following example.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7, 8]  # fake data
y = [1, 2, 3, 4, 3, 2, 5, 6]

plt.plot(x, y, 'k')

xmin, xmax, ymin, ymax = plt.axis()

s = 'xmin = ' + str(round(xmin, 2)) + ', ' + \
    'xmax = ' + str(xmax) + '\n' + \
    'ymin = ' + str(ymin) + ', ' + \
    'ymax = ' + str(ymax) + ' '

plt.annotate(s, (1, 5))

plt.show()

The above code should produce the following output plot.

Question 19

I have an existing plot that was created with pandas like this:

df['myvar'].plot(kind='bar')

The y axis is format as float and I want to change the y axis to percentages. All of the solutions I found use ax.xyz syntax and I can only place code below the line above that creates the plot (I cannot add ax=ax to the line above.)

How can I format the y axis as percentages without changing the line above?

Here is the solution I found but requires that I redefine the plot:

import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as mtick

data = [8,12,15,17,18,18.5]
perc = np.linspace(0,100,len(data))

fig = plt.figure(1, (7,4))
ax = fig.add_subplot(1,1,1)

ax.plot(perc, data)

fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'
xticks = mtick.FormatStrFormatter(fmt)
ax.xaxis.set_major_formatter(xticks)

plt.show()

Link to the above solution: Pyplot: using percentage on x axis

Question 20

This is a few months late, but I have created PR#6251 with matplotlib to add a new PercentFormatter class. With this class you just need one line to reformat your axis (two if you count the import of matplotlib.ticker):

import ...
import matplotlib.ticker as mtick

ax = df['myvar'].plot(kind='bar')
ax.yaxis.set_major_formatter(mtick.PercentFormatter())

PercentFormatter() accepts three arguments, xmax, decimals, symbol. xmax allows you to set the value that corresponds to 100% on the axis. This is nice if you have data from 0.0 to 1.0 and you want to display it from 0% to 100%. Just do PercentFormatter(1.0).

The other two parameters allow you to set the number of digits after the decimal point and the symbol. They default to None and '%', respectively. decimals=None will automatically set the number of decimal points based on how much of the axes you are showing.

Update

PercentFormatter was introduced into Matplotlib proper in version 2.1.0.

Question 21

pandas dataframe plot will return the ax for you, And then you can start to manipulate the axes whatever you want.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(100,5))

# you get ax from here
ax = df.plot()
type(ax)  # matplotlib.axes._subplots.AxesSubplot

# manipulate
vals = ax.get_yticks()
ax.set_yticklabels(['{:,.2%}'.format(x) for x in vals])

Question 22

Jianxun‘s solution did the job for me but broke the y value indicator at the bottom left of the window.

I ended up using FuncFormatterinstead (and also stripped the uneccessary trailing zeroes as suggested here):

import pandas as pd
import numpy as np
from matplotlib.ticker import FuncFormatter

df = pd.DataFrame(np.random.randn(100,5))

ax = df.plot()
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y)))

Generally speaking I’d recommend using FuncFormatter for label formatting: it’s reliable, and versatile.

Question 23

For those who are looking for the quick one-liner:

plt.gca().set_yticklabels(['{:.0f}%'.format(x*100) for x in plt.gca().get_yticks()])

Or if you are using Latex as the axis text formatter, you have to add one backslash ‘\’

plt.gca().set_yticklabels(['{:.0f}\%'.format(x*100) for x in plt.gca().get_yticks()])

Question 24

I propose an alternative method using seaborn

Working code:

import pandas as pd
import seaborn as sns
data=np.random.rand(10,2)*100
df = pd.DataFrame(data, columns=['A', 'B'])
ax= sns.lineplot(data=df, markers= True)
ax.set(xlabel='xlabel', ylabel='ylabel', title='title')
#changing ylables ticks
y_value=['{:,.2f}'.format(x) + '%' for x in ax.get_yticks()]
ax.set_yticklabels(y_value)

Question 25

I’m late to the game but I just realize this: ax can be replaced with plt.gca() for those who are not using axes and just subplots.

Echoing @Mad Physicist answer, using the package PercentFormatter it would be:

import matplotlib.ticker as mtick

plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(1))
#if you already have ticks in the 0 to 1 range. Otherwise see their answer

Question 26

How can I put text in the top left (or top right) corner of a matplotlib figure, e.g. where a top left legend would be, or on top of the plot but in the top left corner? E.g. if it’s a plt.scatter(), then something that would be within the square of the scatter, put in the top left most corner.

I’d like to do this without ideally knowing the scale of the scatterplot being plotted for example, since it will change from dataset to data set. I just want it the text to be roughly in the upper left, or roughly in the upper right. With legend type positioning it should not overlap with any scatter plot points anyway.

thanks!

Question 27

You can use text.

text(x, y, s, fontsize=12)

text coordinates can be given relative to the axis, so the position of your text will be independent of the size of the plot:

The default transform specifies that text is in data coords, alternatively, you can specify text in axis coords (0,0 is lower-left and 1,1 is upper-right). The example below places text in the center of the axes::

text(0.5, 0.5,'matplotlib',
     horizontalalignment='center',
     verticalalignment='center',
     transform = ax.transAxes)

To prevent the text to interfere with any point of your scatter is more difficult afaik. The easier method is to set y_axis (ymax in ylim((ymin,ymax))) to a value a bit higher than the max y-coordinate of your points. In this way you will always have this free space for the text.

EDIT: here you have an example:

In [17]: from pylab import figure, text, scatter, show
In [18]: f = figure()
In [19]: ax = f.add_subplot(111)
In [20]: scatter([3,5,2,6,8],[5,3,2,1,5])
Out[20]: <matplotlib.collections.CircleCollection object at 0x0000000007439A90>
In [21]: text(0.1, 0.9,'matplotlib', ha='center', va='center', transform=ax.transAxes)
Out[21]: <matplotlib.text.Text object at 0x0000000007415B38>
In [22]:

The ha and va parameters set the alignment of your text relative to the insertion point. ie. ha=’left’ is a good set to prevent a long text to go out of the left axis when the frame is reduced (made narrower) manually.

Question 28

One solution would be to use the plt.legend function, even if you don’t want an actual legend. You can specify the placement of the legend box by using the loc keyterm. More information can be found at this website but I’ve also included an example showing how to place a legend:

ax.scatter(xa,ya, marker='o', s=20, c="lightgreen", alpha=0.9)
ax.scatter(xb,yb, marker='o', s=20, c="dodgerblue", alpha=0.9)
ax.scatter(xc,yc marker='o', s=20, c="firebrick", alpha=1.0)
ax.scatter(xd,xd,xd, marker='o', s=20, c="goldenrod", alpha=0.9)
line1 = Line2D(range(10), range(10), marker='o', color="goldenrod")
line2 = Line2D(range(10), range(10), marker='o',color="firebrick")
line3 = Line2D(range(10), range(10), marker='o',color="lightgreen")
line4 = Line2D(range(10), range(10), marker='o',color="dodgerblue")
plt.legend((line1,line2,line3, line4),('line1','line2', 'line3', 'line4'),numpoints=1, loc=2)

Note that because loc=2, the legend is in the upper-left corner of the plot. And if the text overlaps with the plot, you can make it smaller by using legend.fontsize, which will then make the legend smaller.

Question 29

I’ve got the following simple script that plots a graph:

import matplotlib.pyplot as plt
import numpy as np

T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])

plt.plot(T,power)
plt.show()

As it is now, the line goes straight from point to point which looks ok, but could be better in my opinion. What I want is to smooth the line between the points. In Gnuplot I would have plotted with smooth cplines.

Is there an easy way to do this in PyPlot? I’ve found some tutorials, but they all seem rather complex.

Question 30

You could use scipy.interpolate.spline to smooth out your data yourself:

from scipy.interpolate import spline

# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)  

power_smooth = spline(T, power, xnew)

plt.plot(xnew,power_smooth)
plt.show()

spline is deprecated in scipy 0.19.0, use BSpline class instead.

Switching from spline to BSpline isn’t a straightforward copy/paste and requires a little tweaking:

from scipy.interpolate import make_interp_spline, BSpline

# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300) 

spl = make_interp_spline(T, power, k=3)  # type: BSpline
power_smooth = spl(xnew)

plt.plot(xnew, power_smooth)
plt.show()

Before:

After:

Question 31

For this example spline works well, but if the function is not smooth inherently and you want to have smoothed version you can also try:

from scipy.ndimage.filters import gaussian_filter1d

ysmoothed = gaussian_filter1d(y, sigma=2)
plt.plot(x, ysmoothed)
plt.show()

if you increase sigma you can get a more smoothed function.

Proceed with caution with this one. It modifies the original values and may not be what you want.

Question 32

I presume you mean curve-fitting and not anti-aliasing from the context of your question. PyPlot doesn’t have any built-in support for this, but you can easily implement some basic curve-fitting yourself, like the code seen here, or if you’re using GuiQwt it has a curve fitting module. (You could probably also steal the code from SciPy to do this as well).

Question 33

See the scipy.interpolate documentation for some examples.

The following example demonstrates its use, for linear and cubic spline interpolation:
>>> from scipy.interpolate import interp1d

>>> x = np.linspace(0, 10, num=11, endpoint=True)
>>> y = np.cos(-x**2/9.0)
>>> f = interp1d(x, y)
>>> f2 = interp1d(x, y, kind='cubic')

>>> xnew = np.linspace(0, 10, num=41, endpoint=True)
>>> import matplotlib.pyplot as plt
>>> plt.plot(x, y, 'o', xnew, f(xnew), '-', xnew, f2(xnew), '--')
>>> plt.legend(['data', 'linear', 'cubic'], loc='best')
>>> plt.show()

Question 34

Based on this question about heatmaps in matplotlib, I wanted to move the x-axis titles to the top of the plot.

import matplotlib.pyplot as plt
import numpy as np
column_labels = list('ABCD')
row_labels = list('WXYZ')
data = np.random.rand(4,4)
fig, ax = plt.subplots()
heatmap = ax.pcolor(data, cmap=plt.cm.Blues)

# put the major ticks at the middle of each cell
ax.set_xticks(np.arange(data.shape[0])+0.5, minor=False)
ax.set_yticks(np.arange(data.shape[1])+0.5, minor=False)

# want a more natural, table-like display
ax.invert_yaxis()
ax.xaxis.set_label_position('top') # <-- This doesn't work!

ax.set_xticklabels(row_labels, minor=False)
ax.set_yticklabels(column_labels, minor=False)
plt.show()

However, calling matplotlib’s set_label_position (as notated above) doesn’t seem to have the desired effect. Here’s my output:

What am I doing wrong?

Question 35

Use

ax.xaxis.tick_top()

to place the tick marks at the top of the image. The command

ax.set_xlabel('X LABEL')    
ax.xaxis.set_label_position('top')

affects the label, not the tick marks.

import matplotlib.pyplot as plt
import numpy as np
column_labels = list('ABCD')
row_labels = list('WXYZ')
data = np.random.rand(4, 4)
fig, ax = plt.subplots()
heatmap = ax.pcolor(data, cmap=plt.cm.Blues)

# put the major ticks at the middle of each cell
ax.set_xticks(np.arange(data.shape[1]) + 0.5, minor=False)
ax.set_yticks(np.arange(data.shape[0]) + 0.5, minor=False)

# want a more natural, table-like display
ax.invert_yaxis()
ax.xaxis.tick_top()

ax.set_xticklabels(column_labels, minor=False)
ax.set_yticklabels(row_labels, minor=False)
plt.show()

Question 36

You want set_ticks_position rather than set_label_position:

ax.xaxis.set_ticks_position('top') # the rest is the same

This gives me:

Question 37

tick_params is very useful for setting tick properties. Labels can be moved to the top with:

    ax.tick_params(labelbottom=False,labeltop=True)

Question 38

You’ve got to do some extra massaging if you want the ticks (not labels) to show up on the top and bottom (not just the top). The only way I could do this is with a minor change to unutbu’s code:

import matplotlib.pyplot as plt
import numpy as np
column_labels = list('ABCD')
row_labels = list('WXYZ')
data = np.random.rand(4, 4)
fig, ax = plt.subplots()
heatmap = ax.pcolor(data, cmap=plt.cm.Blues)

# put the major ticks at the middle of each cell
ax.set_xticks(np.arange(data.shape[1]) + 0.5, minor=False)
ax.set_yticks(np.arange(data.shape[0]) + 0.5, minor=False)

# want a more natural, table-like display
ax.invert_yaxis()
ax.xaxis.tick_top()
ax.xaxis.set_ticks_position('both') # THIS IS THE ONLY CHANGE

ax.set_xticklabels(column_labels, minor=False)
ax.set_yticklabels(row_labels, minor=False)
plt.show()

Output:

Question 39

I have a pandas data frame and would like to plot values from one column versus the values from another column. Fortunately, there is plot method associated with the data-frames that seems to do what I need:

df.plot(x='col_name_1', y='col_name_2')

Unfortunately, it looks like among the plot styles (listed here after the kind parameter) there are not points. I can use lines or bars or even density but not points. Is there a work around that can help to solve this problem.

Question 40

You can specify the style of the plotted line when calling df.plot:

df.plot(x='col_name_1', y='col_name_2', style='o')

The style argument can also be a dict or list, e.g.:

import numpy as np
import pandas as pd

d = {'one' : np.random.rand(10),
     'two' : np.random.rand(10)}

df = pd.DataFrame(d)

df.plot(style=['o','rx'])

All the accepted style formats are listed in the documentation of matplotlib.pyplot.plot.

Question 41

For this (and most plotting) I would not rely on the Pandas wrappers to matplotlib. Instead, just use matplotlib directly:

import matplotlib.pyplot as plt
plt.scatter(df['col_name_1'], df['col_name_2'])
plt.show() # Depending on whether you use IPython or interactive mode, etc.

and remember that you can access a NumPy array of the column’s values with df.col_name_1.values for example.

I ran into trouble using this with Pandas default plotting in the case of a column of Timestamp values with millisecond precision. In trying to convert the objects to datetime64 type, I also discovered a nasty issue: < Pandas gives incorrect result when asking if Timestamp column values have attr astype >.

Question 42

Pandas uses matplotlib as a library for basic plots. The easiest way in your case will using the following:

import pandas as pd
import numpy as np

#creating sample data 
sample_data={'col_name_1':np.random.rand(20),
      'col_name_2': np.random.rand(20)}
df= pd.DataFrame(sample_data)
df.plot(x='col_name_1', y='col_name_2', style='o')

However, I would recommend to use seaborn as an alternative solution if you want have more customized plots while not going into the basic level of matplotlib. In this case you the solution will be following:

import pandas as pd
import seaborn as sns
import numpy as np

#creating sample data 
sample_data={'col_name_1':np.random.rand(20),
      'col_name_2': np.random.rand(20)}
df= pd.DataFrame(sample_data)
sns.scatterplot(x="col_name_1", y="col_name_2", data=df)

Question 43

Now in latest pandas you can directly use df.plot.scatter function

df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
                   [6.4, 3.2, 1], [5.9, 3.0, 2]],
                  columns=['length', 'width', 'species'])
ax1 = df.plot.scatter(x='length',
                      y='width',
                      c='DarkBlue')

https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.plot.scatter.html

Question 44

Ever since upgrading matplotlib I get the following error whenever trying to create a legend:

/usr/lib/pymodules/python2.7/matplotlib/legend.py:610: UserWarning: Legend does not support [<matplotlib.lines.Line2D object at 0x3a30810>]
Use proxy artist instead.

http://matplotlib.sourceforge.net/users/legend_guide.html#using-proxy-artist

  warnings.warn("Legend does not support %s\nUse proxy artist instead.\n\nhttp://matplotlib.sourceforge.net/users/legend_guide.html#using-proxy-artist\n" % (str(orig_handle),))
/usr/lib/pymodules/python2.7/matplotlib/legend.py:610: UserWarning: Legend does not support [<matplotlib.lines.Line2D object at 0x3a30990>]
Use proxy artist instead.

http://matplotlib.sourceforge.net/users/legend_guide.html#using-proxy-artist

  warnings.warn("Legend does not support %s\nUse proxy artist instead.\n\nhttp://matplotlib.sourceforge.net/users/legend_guide.html#using-proxy-artist\n" % (str(orig_handle),))

This even occurs with a trivial script like this:

import matplotlib.pyplot as plt

a = [1,2,3]
b = [4,5,6]
c = [7,8,9]

plot1 = plt.plot(a,b)
plot2 = plt.plot(a,c)

plt.legend([plot1,plot2],["plot 1", "plot 2"])
plt.show()

I’ve found the link that the error points me towards pretty useless in diagnosing the source of the error.

Question 45

You should add commas:

plot1, = plt.plot(a,b)
plot2, = plt.plot(a,c)

The reason you need the commas is because plt.plot() returns a tuple of line objects, no matter how many are actually created from the command. Without the comma, “plot1” and “plot2” are tuples instead of line objects, making the later call to plt.legend() fail.

The comma implicitly unpacks the results so that instead of a tuple, “plot1” and “plot2” automatically become the first objects within the tuple, i.e. the line objects you actually want.

http://matplotlib.sourceforge.net/users/legend_guide.html#adjusting-the-order-of-legend-items

line, = plot(x,sin(x)) what does comma stand for?

Question 46

Use the “label” keyword, like so:

pyplot.plot(x, y, label='x vs. y')

and then add the legend like so:

pyplot.legend()

The legend will retain line properties like thickness, colours, etc.

Question 47

Use handles AKA Proxy artists

import matplotlib.lines as mlines
import matplotlib.pyplot as plt
# defining legend style and data
blue_line = mlines.Line2D([], [], color='blue', label='My Label')
reds_line = mlines.Line2D([], [], color='red', label='My Othes')

plt.legend(handles=[blue_line, reds_line])

plt.show()

Question 48

use label while plotting graph then only u can use legend. x axis name and y axis name is different than legend name.

Question 49

I have a list of pairs (a, b) that I would like to plot with matplotlib in python as actual x-y coordinates. Currently, it is making two plots, where the index of the list gives the x-coordinate, and the first plot’s y values are the as in the pairs and the second plot’s y values are the bs in the pairs.

To clarify, my data looks like this: li = [(a,b), (c,d), ... , (t, u)] I want to do a one-liner that just calls plt.plot() incorrect. If I didn’t require a one-liner I could trivially do:

xs = [x[0] for x in li]
ys = [x[1] for x in li]
plt.plot(xs, ys)

How can I get matplotlib to plot these pairs as x-y coordinates?

Question 50

As per this example:

import numpy as np
import matplotlib.pyplot as plt

N = 50
x = np.random.rand(N)
y = np.random.rand(N)

plt.scatter(x, y)
plt.show()

will produce:

To unpack your data from pairs into lists use zip:

x, y = zip(*li)

So, the one-liner:

plt.scatter(*zip(*li))

Question 51

If you have a numpy array you can do this:

import numpy as np
from matplotlib import pyplot as plt

data = np.array([
    [1, 2],
    [2, 3],
    [3, 6],
])
x, y = data.T
plt.scatter(x,y)
plt.show()

Question 52

If you want to plot a single line connecting all the points in the list

plt.plot(li[:])

plt.show()

This will plot a line connecting all the pairs in the list as points on a Cartesian plane from the starting of the list to the end. I hope that this is what you wanted.

问题：使用Matplotlib以非阻塞方式绘制

回答 0

回答 1

回答 2

回答 3

回答 4

实时绘图

程序退出后的保持图

SaveToFile和在外部查看器中查看

选择保存格式

快速实时更新的轻量级查看器

示例代码和结果

Live Plotting

Holding Plot after Program Exit

SaveToFile and View in external viewer

Selecting Format for Saving

Fast Lightweight Viewer with Live Update

Sample Code & Results

回答 5

回答 6

问题：将缺失的日期添加到熊猫数据框

回答 0

回答 1

回答 2

回答 3

回答 4

问题：matplotlib获取ylim值

回答 0

回答 1

回答 2

问题：将y轴格式化为百分比

回答 0

回答 1

回答 2

回答 3

回答 4

回答 5

问题：将文本放在matplotlib图的左上角

回答 0

回答 1

问题：用PyPlot绘制平滑线

回答 0

回答 1

回答 2

回答 3

问题：在matplotlib中将x轴移动到绘图的顶部

回答 0

回答 1

回答 2

回答 3

问题：如何使用点绘制熊猫数据框的两列？

回答 0

回答 1

回答 2

回答 3

问题：Matplotlib图例不起作用

回答 0

回答 1

回答 2

回答 3

问题：在python matplotlib中绘制（x，y）坐标列表

回答 0

回答 1

回答 2

有趣好用的Python教程