标签归档:plot

使用Matplotlib以非阻塞方式绘制

问题:使用Matplotlib以非阻塞方式绘制

最近几天,我一直在玩Numpy和matplotlib。我在尝试使matplotlib绘制函数而不阻止执行时遇到问题。我知道这里已经有很多线程在问类似的问题,并且我已经在Google上搜索了很多,但是还没有成功完成这项工作。

我曾尝试按照某些人的建议使用show(block = False),但是我得到的只是一个冻结的窗口。如果我简单地调用show(),则将正确绘制结果,但执行将被阻塞,直到关闭窗口为止。从我读过的其他线程中,我怀疑show(block = False)是否起作用取决于后端。这样对吗?我的后端是Qt4Agg。您能否看一下我的代码,并告诉我是否看到错误?这是我的代码。谢谢你的帮助。

from math import *
from matplotlib import pyplot as plt
print plt.get_backend()



def main():
    x = range(-50, 51, 1)
    for pow in range(1,5):   # plot x^1, x^2, ..., x^4

        y = [Xi**pow for Xi in x]
        print y

        plt.plot(x, y)
        plt.draw()
        #plt.show()             #this plots correctly, but blocks execution.
        plt.show(block=False)   #this creates an empty frozen window.
        _ = raw_input("Press [enter] to continue.")


if __name__ == '__main__':
    main()

PS。我忘了说,我想在每次绘制图形时都更新现有窗口,而不是创建一个新窗口。

I have been playing with Numpy and matplotlib in the last few days. I am having problems trying to make matplotlib plot a function without blocking execution. I know there are already many threads here on SO asking similar questions, and I ‘ve googled quite a lot but haven’t managed to make this work.

I have tried using show(block=False) as some people suggest, but all I get is a frozen window. If I simply call show(), the result is plotted properly but execution is blocked until the window is closed. From other threads I ‘ve read, I suspect that whether show(block=False) works or not depends on the backend. Is this correct? My back end is Qt4Agg. Could you have a look at my code and tell me if you see something wrong? Here is my code. Thanks for any help.

from math import *
from matplotlib import pyplot as plt
print plt.get_backend()



def main():
    x = range(-50, 51, 1)
    for pow in range(1,5):   # plot x^1, x^2, ..., x^4

        y = [Xi**pow for Xi in x]
        print y

        plt.plot(x, y)
        plt.draw()
        #plt.show()             #this plots correctly, but blocks execution.
        plt.show(block=False)   #this creates an empty frozen window.
        _ = raw_input("Press [enter] to continue.")


if __name__ == '__main__':
    main()

PS. I forgot to say that I would like to update the existing window every time I plot something, instead of creating a new one.


回答 0

我花了很长时间寻找解决方案,并找到了答案

看起来,为了获得您(和我)想要的东西,您需要将plt.ion()plt.show()(而不是与block=False)结合在一起,最重要的是,plt.pause(.001)(或您想要的任何时间)结合在一起。该暂停是必须的,因为GUI事件,而主代码正在睡觉,包括绘图发生。这很可能是通过从休眠线程中获取时间来实现的,所以IDE可能会为此惹恼我不知道。

这是对我适用于python 3.5的实现:

import numpy as np
from matplotlib import pyplot as plt

def main():
    plt.axis([-50,50,0,10000])
    plt.ion()
    plt.show()

    x = np.arange(-50, 51)
    for pow in range(1,5):   # plot x^1, x^2, ..., x^4
        y = [Xi**pow for Xi in x]
        plt.plot(x, y)
        plt.draw()
        plt.pause(0.001)
        input("Press [enter] to continue.")

if __name__ == '__main__':
    main()

I spent a long time looking for solutions, and found this answer.

It looks like, in order to get what you (and I) want, you need the combination of plt.ion(), plt.show() (not with block=False) and, most importantly, plt.pause(.001) (or whatever time you want). The pause is needed because the GUI events happen while the main code is sleeping, including drawing. It’s possible that this is implemented by picking up time from a sleeping thread, so maybe IDEs mess with that—I don’t know.

Here’s an implementation that works for me on python 3.5:

import numpy as np
from matplotlib import pyplot as plt

def main():
    plt.axis([-50,50,0,10000])
    plt.ion()
    plt.show()

    x = np.arange(-50, 51)
    for pow in range(1,5):   # plot x^1, x^2, ..., x^4
        y = [Xi**pow for Xi in x]
        plt.plot(x, y)
        plt.draw()
        plt.pause(0.001)
        input("Press [enter] to continue.")

if __name__ == '__main__':
    main()

回答 1


一个对我有用的简单技巧如下:

  1. 在show内使用block = False参数:plt.show(block = False)
  2. 在.py脚本的末尾使用另一个plt.show()

范例

import matplotlib.pyplot as plt

plt.imshow(add_something)
plt.xlabel("x")
plt.ylabel("y")

plt.show(block=False)

#more code here (e.g. do calculations and use print to see them on the screen

plt.show()

注意plt.show()是我脚本的最后一行。


A simple trick that works for me is the following:

  1. Use the block = False argument inside show: plt.show(block = False)
  2. Use another plt.show() at the end of the .py script.

Example:

import matplotlib.pyplot as plt

plt.imshow(add_something)
plt.xlabel("x")
plt.ylabel("y")

plt.show(block=False)

#more code here (e.g. do calculations and use print to see them on the screen

plt.show()

Note: plt.show() is the last line of my script.


回答 2

您可以通过将绘图写入数组,然后在另一个线程中显示该数组来避免阻塞执行。这是一个使用pyformulas 0.2.8中的 pf.screen同时生成和显示图的示例:

import pyformulas as pf
import matplotlib.pyplot as plt
import numpy as np
import time

fig = plt.figure()

canvas = np.zeros((480,640))
screen = pf.screen(canvas, 'Sinusoid')

start = time.time()
while True:
    now = time.time() - start

    x = np.linspace(now-2, now, 100)
    y = np.sin(2*np.pi*x) + np.sin(3*np.pi*x)
    plt.xlim(now-2,now+1)
    plt.ylim(-3,3)
    plt.plot(x, y, c='black')

    # If we haven't already shown or saved the plot, then we need to draw the figure first...
    fig.canvas.draw()

    image = np.fromstring(fig.canvas.tostring_rgb(), dtype=np.uint8, sep='')
    image = image.reshape(fig.canvas.get_width_height()[::-1] + (3,))

    screen.update(image)

#screen.close()

结果:

免责声明:我是pypyulas的维护者。

参考:Matplotlib:将图保存到numpy数组

You can avoid blocking execution by writing the plot to an array, then displaying the array in a different thread. Here is an example of generating and displaying plots simultaneously using pf.screen from pyformulas 0.2.8:

import pyformulas as pf
import matplotlib.pyplot as plt
import numpy as np
import time

fig = plt.figure()

canvas = np.zeros((480,640))
screen = pf.screen(canvas, 'Sinusoid')

start = time.time()
while True:
    now = time.time() - start

    x = np.linspace(now-2, now, 100)
    y = np.sin(2*np.pi*x) + np.sin(3*np.pi*x)
    plt.xlim(now-2,now+1)
    plt.ylim(-3,3)
    plt.plot(x, y, c='black')

    # If we haven't already shown or saved the plot, then we need to draw the figure first...
    fig.canvas.draw()

    image = np.fromstring(fig.canvas.tostring_rgb(), dtype=np.uint8, sep='')
    image = image.reshape(fig.canvas.get_width_height()[::-1] + (3,))

    screen.update(image)

#screen.close()

Result:

Disclaimer: I’m the maintainer for pyformulas.

Reference: Matplotlib: save plot to numpy array


回答 3

这些答案中有很多是超级夸张的,从我的发现中,答案并不是那么难理解。

您可以plt.ion()根据需要使用,但我发现使用plt.draw()同样有效

对于我的特定项目,我正在绘制图像,但是您可以使用plot()scatter()或其他任何一种来代替figimage(),这没关系。

plt.figimage(image_to_show)
plt.draw()
plt.pause(0.001)

要么

fig = plt.figure()
...
fig.figimage(image_to_show)
fig.canvas.draw()
plt.pause(0.001)

如果您使用的是实际数字。
我使用了@ krs013和@Default Picture的答案来解决这个问题,
希望这可以使某人不必在一个单独的线程上启动每个单独的角色,或者不必阅读这些小说就可以解决这个问题。

A lot of these answers are super inflated and from what I can find, the answer isn’t all that difficult to understand.

You can use plt.ion() if you want, but I found using plt.draw() just as effective

For my specific project I’m plotting images, but you can use plot() or scatter() or whatever instead of figimage(), it doesn’t matter.

plt.figimage(image_to_show)
plt.draw()
plt.pause(0.001)

Or

fig = plt.figure()
...
fig.figimage(image_to_show)
fig.canvas.draw()
plt.pause(0.001)

If you’re using an actual figure.
I used @krs013, and @Default Picture’s answers to figure this out
Hopefully this saves someone from having launch every single figure on a separate thread, or from having to read these novels just to figure this out


回答 4

实时绘图

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
# plt.axis([x[0], x[-1], -1, 1])      # disable autoscaling
for point in x:
    plt.plot(point, np.sin(2 * point), '.', color='b')
    plt.draw()
    plt.pause(0.01)
# plt.clf()                           # clear the current figure

如果数据量太多,您可以通过一个简单的计数器降低更新率

cnt += 1
if (cnt == 10):       # update plot each 10 points
    plt.draw()
    plt.pause(0.01)
    cnt = 0

程序退出后的保持图

这是我的实际问题,无法找到令人满意的答案,我想在脚本完成后未关闭的绘图(例如MATLAB),

如果您考虑一下,在脚本完成后,程序将终止,并且没有逻辑方式以这种方式保存绘图,因此有两个选择

  1. 阻止脚本退出(这是plt.show()而不是我想要的)
  2. 在单独的线程上运行图(太复杂)

这对我来说并不令人满意,所以我在盒子外面找到了另一个解决方案

SaveToFile和在外部查看器中查看

为此,保存和查看均应快速进行,查看器不应锁定文件,而应自动更新内容

选择保存格式

基于矢量的格式既小又快速

  • SVG不错,但是除了默认情况下需要手动刷新的Web浏览器之外,找不到合适的查看器
  • PDF可支持矢量格式,并且有支持实时更新的轻量级查看器

快速实时更新的轻量级查看器

对于PDF,有几个不错的选择

  • 在Windows上,我使用免费,快速,轻巧的SumatraPDF(我的机箱仅使用1.8MB RAM)

  • 在Linux上,有几种选择,例如Evince(GNOME)和Ocular(KDE)

示例代码和结果

用于将绘图输出到文件的示例代码

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(2 * x)
plt.plot(x, y)
plt.savefig("fig.pdf")

第一次运行后,在上述其中一个查看器中打开输出文件并欣赏。

这是VSCode和SumatraPDF的屏幕截图,该过程也足够快以达到半实时更新率(我的设置可以time.sleep()在间隔之间使用时接近10Hz )

Live Plotting

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
# plt.axis([x[0], x[-1], -1, 1])      # disable autoscaling
for point in x:
    plt.plot(point, np.sin(2 * point), '.', color='b')
    plt.draw()
    plt.pause(0.01)
# plt.clf()                           # clear the current figure

if the amount of data is too much you can lower the update rate with a simple counter

cnt += 1
if (cnt == 10):       # update plot each 10 points
    plt.draw()
    plt.pause(0.01)
    cnt = 0

Holding Plot after Program Exit

This was my actual problem that couldn’t find satisfactory answer for, I wanted plotting that didn’t close after the script was finished (like MATLAB),

If you think about it, after the script is finished, the program is terminated and there is no logical way to hold the plot this way, so there are two options

  1. block the script from exiting (that’s plt.show() and not what I want)
  2. run the plot on a separate thread (too complicated)

this wasn’t satisfactory for me so I found another solution outside of the box

SaveToFile and View in external viewer

For this the saving and viewing should be both fast and the viewer shouldn’t lock the file and should update the content automatically

Selecting Format for Saving

vector based formats are both small and fast

  • SVG is good but coudn’t find good viewer for it except the web browser which by default needs manual refresh
  • PDF can support vector formats and there are lightweight viewers which support live updating

Fast Lightweight Viewer with Live Update

For PDF there are several good options

  • On Windows I use SumatraPDF which is free, fast and light (only uses 1.8MB RAM for my case)

  • On Linux there are several options such as Evince (GNOME) and Ocular (KDE)

Sample Code & Results

Sample code for outputing plot to a file

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 2 * np.pi, 100)
y = np.sin(2 * x)
plt.plot(x, y)
plt.savefig("fig.pdf")

after first run, open the output file in one of the viewers mentioned above and enjoy.

Here is a screenshot of VSCode alongside SumatraPDF, also the process is fast enough to get semi-live update rate (I can get near 10Hz on my setup just use time.sleep() between intervals)


回答 5

Iggy的答案对我来说是最容易遵循的,但是subplot当我执行刚在执行的后续命令时却遇到了以下错误show

MatplotlibDeprecationWarning:当前使用与先前轴相同的参数添加轴将重用较早的实例。在将来的版本中,将始终创建并返回一个新实例。同时,通过向每个轴实例传递唯一的标签,可以抑制此警告,并确保将来的行为。

为了避免此错误,它有助于在用户点击Enter后关闭(或清除)绘图。

这是对我有用的代码:

def plt_show():
    '''Text-blocking version of plt.show()
    Use this instead of plt.show()'''
    plt.draw()
    plt.pause(0.001)
    input("Press enter to continue...")
    plt.close()

Iggy’s answer was the easiest for me to follow, but I got the following error when doing a subsequent subplot command that was not there when I was just doing show:

MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance. In a future version, a new instance will always be created and returned. Meanwhile, this warning can be suppressed, and the future behavior ensured, by passing a unique label to each axes instance.

In order to avoid this error, it helps to close (or clear) the plot after the user hits enter.

Here’s the code that worked for me:

def plt_show():
    '''Text-blocking version of plt.show()
    Use this instead of plt.show()'''
    plt.draw()
    plt.pause(0.001)
    input("Press enter to continue...")
    plt.close()

回答 6

Python包drawow允许以非阻塞方式实时更新绘图。
它还可以与网络摄像头和OpenCV配合使用,例如绘制每个帧的度量。
请参阅原始帖子

The Python package drawnow allows to update a plot in real time in a non blocking way.
It also works with a webcam and OpenCV for example to plot measures for each frame.
See the original post.


将缺失的日期添加到熊猫数据框

问题:将缺失的日期添加到熊猫数据框

我的数据可以在给定日期包含多个事件,也可以在一个日期包含否事件。我接受这些事件,按日期计数并绘制它们。但是,当我绘制它们时,我的两个系列并不总是匹配。

idx = pd.date_range(df['simpleDate'].min(), df['simpleDate'].max())
s = df.groupby(['simpleDate']).size()

在上面的代码中,idx变为30个日期范围。2013/09/01至2013/09/30但是S可能只有25或26天,因为在给定日期没有事件发生。然后,当我尝试绘制时,由于大小不匹配,我得到一个AssertionError:

fig, ax = plt.subplots()    
ax.bar(idx.to_pydatetime(), s, color='green')

解决这个问题的正确方法是什么?我是否要从IDX中删除没有值的日期,或者(我希望这样做)是将序列中缺少的日期添加为0(我希望这样做)?我希望有30天的完整图表(值为0)。如果这种方法正确,那么有关如何开始使用的任何建议?我需要某种动态reindex功能吗?

这是Sdf.groupby(['simpleDate']).size() )的代码段,请注意没有输入04和05。

09-02-2013     2
09-03-2013    10
09-06-2013     5
09-07-2013     1

My data can have multiple events on a given date or NO events on a date. I take these events, get a count by date and plot them. However, when I plot them, my two series don’t always match.

idx = pd.date_range(df['simpleDate'].min(), df['simpleDate'].max())
s = df.groupby(['simpleDate']).size()

In the above code idx becomes a range of say 30 dates. 09-01-2013 to 09-30-2013 However S may only have 25 or 26 days because no events happened for a given date. I then get an AssertionError as the sizes dont match when I try to plot:

fig, ax = plt.subplots()    
ax.bar(idx.to_pydatetime(), s, color='green')

What’s the proper way to tackle this? Do I want to remove dates with no values from IDX or (which I’d rather do) is add to the series the missing date with a count of 0. I’d rather have a full graph of 30 days with 0 values. If this approach is right, any suggestions on how to get started? Do I need some sort of dynamic reindex function?

Here’s a snippet of S ( df.groupby(['simpleDate']).size() ), notice no entries for 04 and 05.

09-02-2013     2
09-03-2013    10
09-06-2013     5
09-07-2013     1

回答 0

您可以使用Series.reindex

import pandas as pd

idx = pd.date_range('09-01-2013', '09-30-2013')

s = pd.Series({'09-02-2013': 2,
               '09-03-2013': 10,
               '09-06-2013': 5,
               '09-07-2013': 1})
s.index = pd.DatetimeIndex(s.index)

s = s.reindex(idx, fill_value=0)
print(s)

Yield

2013-09-01     0
2013-09-02     2
2013-09-03    10
2013-09-04     0
2013-09-05     0
2013-09-06     5
2013-09-07     1
2013-09-08     0
...

You could use Series.reindex:

import pandas as pd

idx = pd.date_range('09-01-2013', '09-30-2013')

s = pd.Series({'09-02-2013': 2,
               '09-03-2013': 10,
               '09-06-2013': 5,
               '09-07-2013': 1})
s.index = pd.DatetimeIndex(s.index)

s = s.reindex(idx, fill_value=0)
print(s)

yields

2013-09-01     0
2013-09-02     2
2013-09-03    10
2013-09-04     0
2013-09-05     0
2013-09-06     5
2013-09-07     1
2013-09-08     0
...

回答 1

使用更快的解决方法.asfreq()。这不需要创建新索引即可在中调用.reindex()

# "broken" (staggered) dates
dates = pd.Index([pd.Timestamp('2012-05-01'), 
                  pd.Timestamp('2012-05-04'), 
                  pd.Timestamp('2012-05-06')])
s = pd.Series([1, 2, 3], dates)

print(s.asfreq('D'))
2012-05-01    1.0
2012-05-02    NaN
2012-05-03    NaN
2012-05-04    2.0
2012-05-05    NaN
2012-05-06    3.0
Freq: D, dtype: float64

A quicker workaround is to use .asfreq(). This doesn’t require creation of a new index to call within .reindex().

# "broken" (staggered) dates
dates = pd.Index([pd.Timestamp('2012-05-01'), 
                  pd.Timestamp('2012-05-04'), 
                  pd.Timestamp('2012-05-06')])
s = pd.Series([1, 2, 3], dates)

print(s.asfreq('D'))
2012-05-01    1.0
2012-05-02    NaN
2012-05-03    NaN
2012-05-04    2.0
2012-05-05    NaN
2012-05-06    3.0
Freq: D, dtype: float64

回答 2

一个问题是,reindex如果存在重复值,该操作将失败。假设我们正在处理带时间戳的数据,我们希望按日期将其编入索引:

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2016-11-15 1:00','2016-11-16 2:00','2016-11-16 3:00','2016-11-18 4:00']),
    'values':['a','b','c','d']})
df.index = pd.DatetimeIndex(df['timestamps']).floor('D')
df

Yield

            timestamps             values
2016-11-15  "2016-11-15 01:00:00"  a
2016-11-16  "2016-11-16 02:00:00"  b
2016-11-16  "2016-11-16 03:00:00"  c
2016-11-18  "2016-11-18 04:00:00"  d

由于2016-11-16日期重复,尝试重新编制索引:

all_days = pd.date_range(df.index.min(), df.index.max(), freq='D')
df.reindex(all_days)

失败与:

...
ValueError: cannot reindex from a duplicate axis

(这表示索引重复,而不是索引本身是重复项)

相反,我们可以使用.loc查找范围内所有日期的条目:

df.loc[all_days]

Yield

            timestamps             values
2016-11-15  "2016-11-15 01:00:00"  a
2016-11-16  "2016-11-16 02:00:00"  b
2016-11-16  "2016-11-16 03:00:00"  c
2016-11-17  NaN                    NaN
2016-11-18  "2016-11-18 04:00:00"  d

fillna 如果需要,可用于色谱柱系列以填充空白。

One issue is that reindex will fail if there are duplicate values. Say we’re working with timestamped data, which we want to index by date:

df = pd.DataFrame({
    'timestamps': pd.to_datetime(
        ['2016-11-15 1:00','2016-11-16 2:00','2016-11-16 3:00','2016-11-18 4:00']),
    'values':['a','b','c','d']})
df.index = pd.DatetimeIndex(df['timestamps']).floor('D')
df

yields

            timestamps             values
2016-11-15  "2016-11-15 01:00:00"  a
2016-11-16  "2016-11-16 02:00:00"  b
2016-11-16  "2016-11-16 03:00:00"  c
2016-11-18  "2016-11-18 04:00:00"  d

Due to the duplicate 2016-11-16 date, an attempt to reindex:

all_days = pd.date_range(df.index.min(), df.index.max(), freq='D')
df.reindex(all_days)

fails with:

...
ValueError: cannot reindex from a duplicate axis

(by this it means the index has duplicates, not that it is itself a dup)

Instead, we can use .loc to look up entries for all dates in range:

df.loc[all_days]

yields

            timestamps             values
2016-11-15  "2016-11-15 01:00:00"  a
2016-11-16  "2016-11-16 02:00:00"  b
2016-11-16  "2016-11-16 03:00:00"  c
2016-11-17  NaN                    NaN
2016-11-18  "2016-11-18 04:00:00"  d

fillna can be used on the column series to fill blanks if needed.


回答 3

另一种方法是resample,除了缺少日期外,还可以处理重复的日期。例如:

df.resample('D').mean()

resample是一个类似的延迟操作,groupby因此您需要执行另一个操作。在这种情况下mean工作得很好,但你也可以使用许多其他的熊猫方法,如maxsum等。

这是原始数据,但带有“ 2013-09-03”的附加条目:

             val
date           
2013-09-02     2
2013-09-03    10
2013-09-03    20    <- duplicate date added to OP's data
2013-09-06     5
2013-09-07     1

结果如下:

             val
date            
2013-09-02   2.0
2013-09-03  15.0    <- mean of original values for 2013-09-03
2013-09-04   NaN    <- NaN b/c date not present in orig
2013-09-05   NaN    <- NaN b/c date not present in orig
2013-09-06   5.0
2013-09-07   1.0

我将遗漏的日期保留为NaN以便清楚地说明其工作原理,但是您可以fillna(0)根据OP的要求添加以零代替NaN的方法,也可以interpolate()根据相邻行使用类似非零值的填充方法。

An alternative approach is resample, which can handle duplicate dates in addition to missing dates. For example:

df.resample('D').mean()

resample is a deferred operation like groupby so you need to follow it with another operation. In this case mean works well, but you can also use many other pandas methods like max, sum, etc.

Here is the original data, but with an extra entry for ‘2013-09-03’:

             val
date           
2013-09-02     2
2013-09-03    10
2013-09-03    20    <- duplicate date added to OP's data
2013-09-06     5
2013-09-07     1

And here are the results:

             val
date            
2013-09-02   2.0
2013-09-03  15.0    <- mean of original values for 2013-09-03
2013-09-04   NaN    <- NaN b/c date not present in orig
2013-09-05   NaN    <- NaN b/c date not present in orig
2013-09-06   5.0
2013-09-07   1.0

I left the missing dates as NaNs to make it clear how this works, but you can add fillna(0) to replace NaNs with zeroes as requested by the OP or alternatively use something like interpolate() to fill with non-zero values based on the neighboring rows.


回答 4

这是一种将缺失的日期填充到数据框中的好方法,您可以选择fill_valuedays_back填充和date_order排序对数据框进行排序的顺序():

def fill_in_missing_dates(df, date_col_name = 'date',date_order = 'asc', fill_value = 0, days_back = 30):

    df.set_index(date_col_name,drop=True,inplace=True)
    df.index = pd.DatetimeIndex(df.index)
    d = datetime.now().date()
    d2 = d - timedelta(days = days_back)
    idx = pd.date_range(d2, d, freq = "D")
    df = df.reindex(idx,fill_value=fill_value)
    df[date_col_name] = pd.DatetimeIndex(df.index)

    return df

Here’s a nice method to fill in missing dates into a dataframe, with your choice of fill_value, days_back to fill in, and sort order (date_order) by which to sort the dataframe:

def fill_in_missing_dates(df, date_col_name = 'date',date_order = 'asc', fill_value = 0, days_back = 30):

    df.set_index(date_col_name,drop=True,inplace=True)
    df.index = pd.DatetimeIndex(df.index)
    d = datetime.now().date()
    d2 = d - timedelta(days = days_back)
    idx = pd.date_range(d2, d, freq = "D")
    df = df.reindex(idx,fill_value=fill_value)
    df[date_col_name] = pd.DatetimeIndex(df.index)

    return df

matplotlib获取ylim值

问题:matplotlib获取ylim值

matplotlib用来从Python 绘制数据(使用ploterrorbar函数)。我必须绘制一组完全独立的图,然后调整它们的ylim值,以便可以轻松地对其进行视觉比较。

如何ylim从每个图检索值,以便分别取下和上ylim值的最小值和最大值,并调整图以便可以对其进行直观比较?

当然,我可以分析数据并提出自己的自定义ylim值…但是我想用它matplotlib来为我做这些。关于如何轻松(高效)执行此操作的任何建议?

这是我的Python函数,使用绘制matplotlib

import matplotlib.pyplot as plt

def myplotfunction(title, values, errors, plot_file_name):

    # plot errorbars
    indices = range(0, len(values))
    fig = plt.figure()
    plt.errorbar(tuple(indices), tuple(values), tuple(errors), marker='.')

    # axes
    axes = plt.gca()
    axes.set_xlim([-0.5, len(values) - 0.5])
    axes.set_xlabel('My x-axis title')
    axes.set_ylabel('My y-axis title')

    # title
    plt.title(title)

    # save as file
    plt.savefig(plot_file_name)

    # close figure
    plt.close(fig)

I’m using matplotlib to plot data (using plot and errorbar functions) from Python. I have to plot a set of totally separate and independent plots, and then adjust their ylim values so they can be easily visually compared.

How can I retrieve the ylim values from each plot, so that I can take the min and max of the lower and upper ylim values, respectively, and adjust the plots so they can be visually compared?

Of course, I could just analyze the data and come up with my own custom ylim values… but I’d like to use matplotlib to do that for me. Any suggestions on how to easily (and efficiently) do this?

Here’s my Python function that plots using matplotlib:

import matplotlib.pyplot as plt

def myplotfunction(title, values, errors, plot_file_name):

    # plot errorbars
    indices = range(0, len(values))
    fig = plt.figure()
    plt.errorbar(tuple(indices), tuple(values), tuple(errors), marker='.')

    # axes
    axes = plt.gca()
    axes.set_xlim([-0.5, len(values) - 0.5])
    axes.set_xlabel('My x-axis title')
    axes.set_ylabel('My y-axis title')

    # title
    plt.title(title)

    # save as file
    plt.savefig(plot_file_name)

    # close figure
    plt.close(fig)

回答 0

只是使用axes.get_ylim(),它非常类似于set_ylim。从文档

get_ylim()

获取y轴范围[底部,顶部]

Just use axes.get_ylim(), it is very similar to set_ylim. From the docs:

get_ylim()

Get the y-axis range [bottom, top]


回答 1

 ymin, ymax = axes.get_ylim()

如果plt直接使用api,则可以避免完全调用轴:

def myplotfunction(title, values, errors, plot_file_name):

    # plot errorbars
    indices = range(0, len(values))
    fig = plt.figure()
    plt.errorbar(tuple(indices), tuple(values), tuple(errors), marker='.')

    plt.xlim([-0.5, len(values) - 0.5])
    plt.xlabel('My x-axis title')
    plt.ylabel('My y-axis title')

    # title
    plt.title(title)

    # save as file
    plt.savefig(plot_file_name)

   # close figure
    plt.close(fig)
 ymin, ymax = axes.get_ylim()

If you are using the plt api directly, you can avoid calls to the axes altogether:

def myplotfunction(title, values, errors, plot_file_name):

    # plot errorbars
    indices = range(0, len(values))
    fig = plt.figure()
    plt.errorbar(tuple(indices), tuple(values), tuple(errors), marker='.')

    plt.xlim([-0.5, len(values) - 0.5])
    plt.xlabel('My x-axis title')
    plt.ylabel('My y-axis title')

    # title
    plt.title(title)

    # save as file
    plt.savefig(plot_file_name)

   # close figure
    plt.close(fig)

回答 2

利用上面的好答案,并假设您仅按如下方式使用plt

import matplotlib.pyplot as plt

那么您可以使用plt.axis()以下示例中的方法获得所有四个绘图限制。

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7, 8]  # fake data
y = [1, 2, 3, 4, 3, 2, 5, 6]

plt.plot(x, y, 'k')

xmin, xmax, ymin, ymax = plt.axis()

s = 'xmin = ' + str(round(xmin, 2)) + ', ' + \
    'xmax = ' + str(xmax) + '\n' + \
    'ymin = ' + str(ymin) + ', ' + \
    'ymax = ' + str(ymax) + ' '

plt.annotate(s, (1, 5))

plt.show()

上面的代码应产生以下输出图。

Leveraging from the good answers above and assuming you were only using plt as in

import matplotlib.pyplot as plt

then you can get all four plot limits using plt.axis() as in the following example.

import matplotlib.pyplot as plt

x = [1, 2, 3, 4, 5, 6, 7, 8]  # fake data
y = [1, 2, 3, 4, 3, 2, 5, 6]

plt.plot(x, y, 'k')

xmin, xmax, ymin, ymax = plt.axis()

s = 'xmin = ' + str(round(xmin, 2)) + ', ' + \
    'xmax = ' + str(xmax) + '\n' + \
    'ymin = ' + str(ymin) + ', ' + \
    'ymax = ' + str(ymax) + ' '

plt.annotate(s, (1, 5))

plt.show()

The above code should produce the following output plot.


将y轴格式化为百分比

问题:将y轴格式化为百分比

我有一个用熊猫创建的现有情节,如下所示:

df['myvar'].plot(kind='bar')

y轴的格式为float,我想将y轴更改为百分比。我发现的所有解决方案都使用ax.xyz语法,并且只能将代码放置在创建绘图的上方行下方(我无法在上面的行中添加ax = ax。)

如何在不更改上面的行的情况下将y轴格式化为百分比?

这是我找到的解决方案,但需要重新定义图

import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as mtick

data = [8,12,15,17,18,18.5]
perc = np.linspace(0,100,len(data))

fig = plt.figure(1, (7,4))
ax = fig.add_subplot(1,1,1)

ax.plot(perc, data)

fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'
xticks = mtick.FormatStrFormatter(fmt)
ax.xaxis.set_major_formatter(xticks)

plt.show()

链接到上述解决方案:Pyplot:在x轴上使用百分比

I have an existing plot that was created with pandas like this:

df['myvar'].plot(kind='bar')

The y axis is format as float and I want to change the y axis to percentages. All of the solutions I found use ax.xyz syntax and I can only place code below the line above that creates the plot (I cannot add ax=ax to the line above.)

How can I format the y axis as percentages without changing the line above?

Here is the solution I found but requires that I redefine the plot:

import matplotlib.pyplot as plt
import numpy as np
import matplotlib.ticker as mtick

data = [8,12,15,17,18,18.5]
perc = np.linspace(0,100,len(data))

fig = plt.figure(1, (7,4))
ax = fig.add_subplot(1,1,1)

ax.plot(perc, data)

fmt = '%.0f%%' # Format you want the ticks, e.g. '40%'
xticks = mtick.FormatStrFormatter(fmt)
ax.xaxis.set_major_formatter(xticks)

plt.show()

Link to the above solution: Pyplot: using percentage on x axis


回答 0

这已经晚了几个月,但是我使用matplotlib 创建了PR#6251以添加一个新PercentFormatter类。使用此类,您只需要一行就可以重新格式化轴(如果算上的导入,则需要两行matplotlib.ticker):

import ...
import matplotlib.ticker as mtick

ax = df['myvar'].plot(kind='bar')
ax.yaxis.set_major_formatter(mtick.PercentFormatter())

PercentFormatter()接受三个参数,xmaxdecimalssymbolxmax允许您设置对应于轴上100%的值。如果数据的范围是0.0到1.0,并且要显示的范围是0%到100%,那么这很好。做吧PercentFormatter(1.0)

另外两个参数允许您设置小数点和符号后的位数。它们分别默认为None'%'decimals=None会根据您显示的轴数自动设置小数点的数量。

更新资料

PercentFormatter 已在2.1.0版的Matplotlib中引入。

This is a few months late, but I have created PR#6251 with matplotlib to add a new PercentFormatter class. With this class you just need one line to reformat your axis (two if you count the import of matplotlib.ticker):

import ...
import matplotlib.ticker as mtick

ax = df['myvar'].plot(kind='bar')
ax.yaxis.set_major_formatter(mtick.PercentFormatter())

PercentFormatter() accepts three arguments, xmax, decimals, symbol. xmax allows you to set the value that corresponds to 100% on the axis. This is nice if you have data from 0.0 to 1.0 and you want to display it from 0% to 100%. Just do PercentFormatter(1.0).

The other two parameters allow you to set the number of digits after the decimal point and the symbol. They default to None and '%', respectively. decimals=None will automatically set the number of decimal points based on how much of the axes you are showing.

Update

PercentFormatter was introduced into Matplotlib proper in version 2.1.0.


回答 1

熊猫数据框图将为ax您返回,然后您就可以开始操纵轴了。

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(100,5))

# you get ax from here
ax = df.plot()
type(ax)  # matplotlib.axes._subplots.AxesSubplot

# manipulate
vals = ax.get_yticks()
ax.set_yticklabels(['{:,.2%}'.format(x) for x in vals])

pandas dataframe plot will return the ax for you, And then you can start to manipulate the axes whatever you want.

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randn(100,5))

# you get ax from here
ax = df.plot()
type(ax)  # matplotlib.axes._subplots.AxesSubplot

# manipulate
vals = ax.get_yticks()
ax.set_yticklabels(['{:,.2%}'.format(x) for x in vals])


回答 2

建勋的解决方案为我完成了工作,但打破了窗口左下方的y值指示器。

我最终FuncFormatter改为使用它(并且还删除了此处建议的不必要的尾随零):

import pandas as pd
import numpy as np
from matplotlib.ticker import FuncFormatter

df = pd.DataFrame(np.random.randn(100,5))

ax = df.plot()
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y))) 

一般来说,我建议使用FuncFormatter标签格式:它可靠且用途广泛。

Jianxun‘s solution did the job for me but broke the y value indicator at the bottom left of the window.

I ended up using FuncFormatterinstead (and also stripped the uneccessary trailing zeroes as suggested here):

import pandas as pd
import numpy as np
from matplotlib.ticker import FuncFormatter

df = pd.DataFrame(np.random.randn(100,5))

ax = df.plot()
ax.yaxis.set_major_formatter(FuncFormatter(lambda y, _: '{:.0%}'.format(y))) 

Generally speaking I’d recommend using FuncFormatter for label formatting: it’s reliable, and versatile.


回答 3

对于那些正在寻找快速一线客的人:

plt.gca().set_yticklabels(['{:.0f}%'.format(x*100) for x in plt.gca().get_yticks()]) 

或者,如果您使用Latex作为轴文本格式程序,则必须添加一个反斜杠“ \”

plt.gca().set_yticklabels(['{:.0f}\%'.format(x*100) for x in plt.gca().get_yticks()]) 

For those who are looking for the quick one-liner:

plt.gca().set_yticklabels(['{:.0f}%'.format(x*100) for x in plt.gca().get_yticks()]) 

Or if you are using Latex as the axis text formatter, you have to add one backslash ‘\’

plt.gca().set_yticklabels(['{:.0f}\%'.format(x*100) for x in plt.gca().get_yticks()]) 

回答 4

我提出了一种替代方法 seaborn

工作代码:

import pandas as pd
import seaborn as sns
data=np.random.rand(10,2)*100
df = pd.DataFrame(data, columns=['A', 'B'])
ax= sns.lineplot(data=df, markers= True)
ax.set(xlabel='xlabel', ylabel='ylabel', title='title')
#changing ylables ticks
y_value=['{:,.2f}'.format(x) + '%' for x in ax.get_yticks()]
ax.set_yticklabels(y_value)

I propose an alternative method using seaborn

Working code:

import pandas as pd
import seaborn as sns
data=np.random.rand(10,2)*100
df = pd.DataFrame(data, columns=['A', 'B'])
ax= sns.lineplot(data=df, markers= True)
ax.set(xlabel='xlabel', ylabel='ylabel', title='title')
#changing ylables ticks
y_value=['{:,.2f}'.format(x) + '%' for x in ax.get_yticks()]
ax.set_yticklabels(y_value)


回答 5

我玩游戏迟到了,但是我才意识到:ax可以替换为plt.gca()对于那些不使用轴而只是使用子图的人来说,为。

回响@Mad Physicist答案,使用该软件包PercentFormatter将是:

import matplotlib.ticker as mtick

plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(1))
#if you already have ticks in the 0 to 1 range. Otherwise see their answer

I’m late to the game but I just realize this: ax can be replaced with plt.gca() for those who are not using axes and just subplots.

Echoing @Mad Physicist answer, using the package PercentFormatter it would be:

import matplotlib.ticker as mtick

plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(1))
#if you already have ticks in the 0 to 1 range. Otherwise see their answer

将文本放在matplotlib图的左上角

问题:将文本放在matplotlib图的左上角

如何在matplotlib图形的左上角(或右上角)放置文本,例如,左上角图例所在的位置,还是绘图的顶部,但在左上角?例如,如果它是一个plt.scatter(),那么将在散点图的平方内放置一些东西,将其放在最左上角。

我想在不理想地知道例如散点图的比例的情况下进行此操作,因为它会随数据集的不同而变化。我只希望它的文字大致在左上方,或大致在右上方。使用图例类型定位时,它无论如何都不应与任何散点图点重叠。

谢谢!

How can I put text in the top left (or top right) corner of a matplotlib figure, e.g. where a top left legend would be, or on top of the plot but in the top left corner? E.g. if it’s a plt.scatter(), then something that would be within the square of the scatter, put in the top left most corner.

I’d like to do this without ideally knowing the scale of the scatterplot being plotted for example, since it will change from dataset to data set. I just want it the text to be roughly in the upper left, or roughly in the upper right. With legend type positioning it should not overlap with any scatter plot points anyway.

thanks!


回答 0

您可以使用text

text(x, y, s, fontsize=12)

text 可以相对于轴指定坐标,因此文本的位置将与绘图的大小无关:

默认转换指定文本在数据坐标中,或者,您也可以在坐标轴中指定文本(0,0是左下角,而1,1是右上角)。下面的示例将文本放置在轴的中心::

text(0.5, 0.5,'matplotlib',
     horizontalalignment='center',
     verticalalignment='center',
     transform = ax.transAxes)

要防止文本干扰散点图的任何点,都是比较困难的。比较简单的方法是将y_axis(的ymax ylim((ymin,ymax))轴)设置为比点的最大y坐标高一点的值。这样,您将始终拥有文本的可用空间。

编辑:这里有一个例子:

In [17]: from pylab import figure, text, scatter, show
In [18]: f = figure()
In [19]: ax = f.add_subplot(111)
In [20]: scatter([3,5,2,6,8],[5,3,2,1,5])
Out[20]: <matplotlib.collections.CircleCollection object at 0x0000000007439A90>
In [21]: text(0.1, 0.9,'matplotlib', ha='center', va='center', transform=ax.transAxes)
Out[21]: <matplotlib.text.Text object at 0x0000000007415B38>
In [22]:

ha和va参数设置文本相对于插入点的对齐方式。即。ha =’left’是一个很好的设置,可以防止在手动缩小(变窄)帧时长文本从左轴移出。

You can use text.

text(x, y, s, fontsize=12)

text coordinates can be given relative to the axis, so the position of your text will be independent of the size of the plot:

The default transform specifies that text is in data coords, alternatively, you can specify text in axis coords (0,0 is lower-left and 1,1 is upper-right). The example below places text in the center of the axes::

text(0.5, 0.5,'matplotlib',
     horizontalalignment='center',
     verticalalignment='center',
     transform = ax.transAxes)

To prevent the text to interfere with any point of your scatter is more difficult afaik. The easier method is to set y_axis (ymax in ylim((ymin,ymax))) to a value a bit higher than the max y-coordinate of your points. In this way you will always have this free space for the text.

EDIT: here you have an example:

In [17]: from pylab import figure, text, scatter, show
In [18]: f = figure()
In [19]: ax = f.add_subplot(111)
In [20]: scatter([3,5,2,6,8],[5,3,2,1,5])
Out[20]: <matplotlib.collections.CircleCollection object at 0x0000000007439A90>
In [21]: text(0.1, 0.9,'matplotlib', ha='center', va='center', transform=ax.transAxes)
Out[21]: <matplotlib.text.Text object at 0x0000000007415B38>
In [22]:

The ha and va parameters set the alignment of your text relative to the insertion point. ie. ha=’left’ is a good set to prevent a long text to go out of the left axis when the frame is reduced (made narrower) manually.


回答 1

一种解决方案是使用该plt.legend功能,即使您不需要实际的图例。您可以使用loc关键字词指定图例框的位置。可以在此网站上找到更多信息但我还提供了一个示例,说明如何放置图例:

ax.scatter(xa,ya, marker='o', s=20, c="lightgreen", alpha=0.9)
ax.scatter(xb,yb, marker='o', s=20, c="dodgerblue", alpha=0.9)
ax.scatter(xc,yc marker='o', s=20, c="firebrick", alpha=1.0)
ax.scatter(xd,xd,xd, marker='o', s=20, c="goldenrod", alpha=0.9)
line1 = Line2D(range(10), range(10), marker='o', color="goldenrod")
line2 = Line2D(range(10), range(10), marker='o',color="firebrick")
line3 = Line2D(range(10), range(10), marker='o',color="lightgreen")
line4 = Line2D(range(10), range(10), marker='o',color="dodgerblue")
plt.legend((line1,line2,line3, line4),('line1','line2', 'line3', 'line4'),numpoints=1, loc=2) 

请注意,因为loc=2,图例位于图的左上角。并且如果文本与图重叠,则可以使用来使其变小legend.fontsize,从而使图例变小。

One solution would be to use the plt.legend function, even if you don’t want an actual legend. You can specify the placement of the legend box by using the loc keyterm. More information can be found at this website but I’ve also included an example showing how to place a legend:

ax.scatter(xa,ya, marker='o', s=20, c="lightgreen", alpha=0.9)
ax.scatter(xb,yb, marker='o', s=20, c="dodgerblue", alpha=0.9)
ax.scatter(xc,yc marker='o', s=20, c="firebrick", alpha=1.0)
ax.scatter(xd,xd,xd, marker='o', s=20, c="goldenrod", alpha=0.9)
line1 = Line2D(range(10), range(10), marker='o', color="goldenrod")
line2 = Line2D(range(10), range(10), marker='o',color="firebrick")
line3 = Line2D(range(10), range(10), marker='o',color="lightgreen")
line4 = Line2D(range(10), range(10), marker='o',color="dodgerblue")
plt.legend((line1,line2,line3, line4),('line1','line2', 'line3', 'line4'),numpoints=1, loc=2) 

Note that because loc=2, the legend is in the upper-left corner of the plot. And if the text overlaps with the plot, you can make it smaller by using legend.fontsize, which will then make the legend smaller.


用PyPlot绘制平滑线

问题:用PyPlot绘制平滑线

我有以下绘制图形的简单脚本:

import matplotlib.pyplot as plt
import numpy as np

T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])

plt.plot(T,power)
plt.show()

就目前而言,这条线从一条直线到另一条直线看起来不错,但在我看来可能会更好。我想要的是使两点之间的线平滑。在Gnuplot中,我会用绘制smooth cplines

在PyPlot中有一种简单的方法吗?我已经找到了一些教程,但是它们似乎都相当复杂。

I’ve got the following simple script that plots a graph:

import matplotlib.pyplot as plt
import numpy as np

T = np.array([6, 7, 8, 9, 10, 11, 12])
power = np.array([1.53E+03, 5.92E+02, 2.04E+02, 7.24E+01, 2.72E+01, 1.10E+01, 4.70E+00])

plt.plot(T,power)
plt.show()

As it is now, the line goes straight from point to point which looks ok, but could be better in my opinion. What I want is to smooth the line between the points. In Gnuplot I would have plotted with smooth cplines.

Is there an easy way to do this in PyPlot? I’ve found some tutorials, but they all seem rather complex.


回答 0

您可以用来scipy.interpolate.spline自己整理数据:

from scipy.interpolate import spline

# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)  

power_smooth = spline(T, power, xnew)

plt.plot(xnew,power_smooth)
plt.show()

scipy 0.19.0中已弃用样条线,请改用BSpline类。

从切换splineBSpline复制并不是简单的复制/粘贴操作,需要进行一些调整:

from scipy.interpolate import make_interp_spline, BSpline

# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300) 

spl = make_interp_spline(T, power, k=3)  # type: BSpline
power_smooth = spl(xnew)

plt.plot(xnew, power_smooth)
plt.show()

之前:

后:

You could use scipy.interpolate.spline to smooth out your data yourself:

from scipy.interpolate import spline

# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300)  

power_smooth = spline(T, power, xnew)

plt.plot(xnew,power_smooth)
plt.show()

spline is deprecated in scipy 0.19.0, use BSpline class instead.

Switching from spline to BSpline isn’t a straightforward copy/paste and requires a little tweaking:

from scipy.interpolate import make_interp_spline, BSpline

# 300 represents number of points to make between T.min and T.max
xnew = np.linspace(T.min(), T.max(), 300) 

spl = make_interp_spline(T, power, k=3)  # type: BSpline
power_smooth = spl(xnew)

plt.plot(xnew, power_smooth)
plt.show()

Before:

After:


回答 1

对于此示例,样条曲线效果很好,但是如果函数固有地不平滑,并且您想要平滑的版本,则还可以尝试:

from scipy.ndimage.filters import gaussian_filter1d

ysmoothed = gaussian_filter1d(y, sigma=2)
plt.plot(x, ysmoothed)
plt.show()

如果增加sigma,可以获得更平滑的功能。

请谨慎处理这一内容。它会修改原始值,可能不是您想要的。

For this example spline works well, but if the function is not smooth inherently and you want to have smoothed version you can also try:

from scipy.ndimage.filters import gaussian_filter1d

ysmoothed = gaussian_filter1d(y, sigma=2)
plt.plot(x, ysmoothed)
plt.show()

if you increase sigma you can get a more smoothed function.

Proceed with caution with this one. It modifies the original values and may not be what you want.


回答 2

我认为您的意思是曲线拟合,而不是从您的问题的上下文来看抗锯齿。PyPlot没有任何内置的这种支持,但你可以很容易地实现一些基本的曲线拟合自己,喜欢看到的代码在这里,或者如果你使用GuiQwt它有一个曲线拟合模块。(您可能也可以从SciPy窃取代码来执行此操作)。

I presume you mean curve-fitting and not anti-aliasing from the context of your question. PyPlot doesn’t have any built-in support for this, but you can easily implement some basic curve-fitting yourself, like the code seen here, or if you’re using GuiQwt it has a curve fitting module. (You could probably also steal the code from SciPy to do this as well).


回答 3

有关scipy.interpolate一些示例,请参见文档。

以下示例演示了其在线性和三次样条插值中的用法:

>>> from scipy.interpolate import interp1d

>>> x = np.linspace(0, 10, num=11, endpoint=True)
>>> y = np.cos(-x**2/9.0)
>>> f = interp1d(x, y)
>>> f2 = interp1d(x, y, kind='cubic')

>>> xnew = np.linspace(0, 10, num=41, endpoint=True)
>>> import matplotlib.pyplot as plt
>>> plt.plot(x, y, 'o', xnew, f(xnew), '-', xnew, f2(xnew), '--')
>>> plt.legend(['data', 'linear', 'cubic'], loc='best')
>>> plt.show()

See the scipy.interpolate documentation for some examples.

The following example demonstrates its use, for linear and cubic spline interpolation:

>>> from scipy.interpolate import interp1d

>>> x = np.linspace(0, 10, num=11, endpoint=True)
>>> y = np.cos(-x**2/9.0)
>>> f = interp1d(x, y)
>>> f2 = interp1d(x, y, kind='cubic')

>>> xnew = np.linspace(0, 10, num=41, endpoint=True)
>>> import matplotlib.pyplot as plt
>>> plt.plot(x, y, 'o', xnew, f(xnew), '-', xnew, f2(xnew), '--')
>>> plt.legend(['data', 'linear', 'cubic'], loc='best')
>>> plt.show()


在matplotlib中将x轴移动到绘图的顶部

问题:在matplotlib中将x轴移动到绘图的顶部

基于关于matplotlib中的热图的问题,我想将x轴标题移动到图的顶部。

import matplotlib.pyplot as plt
import numpy as np
column_labels = list('ABCD')
row_labels = list('WXYZ')
data = np.random.rand(4,4)
fig, ax = plt.subplots()
heatmap = ax.pcolor(data, cmap=plt.cm.Blues)

# put the major ticks at the middle of each cell
ax.set_xticks(np.arange(data.shape[0])+0.5, minor=False)
ax.set_yticks(np.arange(data.shape[1])+0.5, minor=False)

# want a more natural, table-like display
ax.invert_yaxis()
ax.xaxis.set_label_position('top') # <-- This doesn't work!

ax.set_xticklabels(row_labels, minor=False)
ax.set_yticklabels(column_labels, minor=False)
plt.show()

但是,调用matplotlib的set_label_position(如上所述)似乎没有达到预期的效果。这是我的输出:

我究竟做错了什么?

Based on this question about heatmaps in matplotlib, I wanted to move the x-axis titles to the top of the plot.

import matplotlib.pyplot as plt
import numpy as np
column_labels = list('ABCD')
row_labels = list('WXYZ')
data = np.random.rand(4,4)
fig, ax = plt.subplots()
heatmap = ax.pcolor(data, cmap=plt.cm.Blues)

# put the major ticks at the middle of each cell
ax.set_xticks(np.arange(data.shape[0])+0.5, minor=False)
ax.set_yticks(np.arange(data.shape[1])+0.5, minor=False)

# want a more natural, table-like display
ax.invert_yaxis()
ax.xaxis.set_label_position('top') # <-- This doesn't work!

ax.set_xticklabels(row_labels, minor=False)
ax.set_yticklabels(column_labels, minor=False)
plt.show()

However, calling matplotlib’s set_label_position (as notated above) doesn’t seem to have the desired effect. Here’s my output:

What am I doing wrong?


回答 0

ax.xaxis.tick_top()

将刻度线放在图像的顶部。命令

ax.set_xlabel('X LABEL')    
ax.xaxis.set_label_position('top') 

影响标签,而不影响刻度线。

import matplotlib.pyplot as plt
import numpy as np
column_labels = list('ABCD')
row_labels = list('WXYZ')
data = np.random.rand(4, 4)
fig, ax = plt.subplots()
heatmap = ax.pcolor(data, cmap=plt.cm.Blues)

# put the major ticks at the middle of each cell
ax.set_xticks(np.arange(data.shape[1]) + 0.5, minor=False)
ax.set_yticks(np.arange(data.shape[0]) + 0.5, minor=False)

# want a more natural, table-like display
ax.invert_yaxis()
ax.xaxis.tick_top()

ax.set_xticklabels(column_labels, minor=False)
ax.set_yticklabels(row_labels, minor=False)
plt.show()

Use

ax.xaxis.tick_top()

to place the tick marks at the top of the image. The command

ax.set_xlabel('X LABEL')    
ax.xaxis.set_label_position('top') 

affects the label, not the tick marks.

import matplotlib.pyplot as plt
import numpy as np
column_labels = list('ABCD')
row_labels = list('WXYZ')
data = np.random.rand(4, 4)
fig, ax = plt.subplots()
heatmap = ax.pcolor(data, cmap=plt.cm.Blues)

# put the major ticks at the middle of each cell
ax.set_xticks(np.arange(data.shape[1]) + 0.5, minor=False)
ax.set_yticks(np.arange(data.shape[0]) + 0.5, minor=False)

# want a more natural, table-like display
ax.invert_yaxis()
ax.xaxis.tick_top()

ax.set_xticklabels(column_labels, minor=False)
ax.set_yticklabels(row_labels, minor=False)
plt.show()


回答 1

您想要set_ticks_position而不是set_label_position

ax.xaxis.set_ticks_position('top') # the rest is the same

这给了我:

You want set_ticks_position rather than set_label_position:

ax.xaxis.set_ticks_position('top') # the rest is the same

This gives me:


回答 2

tick_params对于设置刻度属性非常有用。可以使用以下命令将标签移到顶部:

    ax.tick_params(labelbottom=False,labeltop=True)

tick_params is very useful for setting tick properties. Labels can be moved to the top with:

    ax.tick_params(labelbottom=False,labeltop=True)

回答 3

如果要让刻度(而不是标签)显示在顶部和底部(而不仅仅是顶部),则必须做一些额外的按摩。我可以做到的唯一方法是对unutbu的代码进行较小的更改:

import matplotlib.pyplot as plt
import numpy as np
column_labels = list('ABCD')
row_labels = list('WXYZ')
data = np.random.rand(4, 4)
fig, ax = plt.subplots()
heatmap = ax.pcolor(data, cmap=plt.cm.Blues)

# put the major ticks at the middle of each cell
ax.set_xticks(np.arange(data.shape[1]) + 0.5, minor=False)
ax.set_yticks(np.arange(data.shape[0]) + 0.5, minor=False)

# want a more natural, table-like display
ax.invert_yaxis()
ax.xaxis.tick_top()
ax.xaxis.set_ticks_position('both') # THIS IS THE ONLY CHANGE

ax.set_xticklabels(column_labels, minor=False)
ax.set_yticklabels(row_labels, minor=False)
plt.show()

输出:

You’ve got to do some extra massaging if you want the ticks (not labels) to show up on the top and bottom (not just the top). The only way I could do this is with a minor change to unutbu’s code:

import matplotlib.pyplot as plt
import numpy as np
column_labels = list('ABCD')
row_labels = list('WXYZ')
data = np.random.rand(4, 4)
fig, ax = plt.subplots()
heatmap = ax.pcolor(data, cmap=plt.cm.Blues)

# put the major ticks at the middle of each cell
ax.set_xticks(np.arange(data.shape[1]) + 0.5, minor=False)
ax.set_yticks(np.arange(data.shape[0]) + 0.5, minor=False)

# want a more natural, table-like display
ax.invert_yaxis()
ax.xaxis.tick_top()
ax.xaxis.set_ticks_position('both') # THIS IS THE ONLY CHANGE

ax.set_xticklabels(column_labels, minor=False)
ax.set_yticklabels(row_labels, minor=False)
plt.show()

Output:


如何使用点绘制熊猫数据框的两列?

问题:如何使用点绘制熊猫数据框的两列?

我有一个pandas数据框,想绘制一列的值与另一列的值。幸运的是,有plot一种与数据帧相关的方法似乎可以满足我的需求:

df.plot(x='col_name_1', y='col_name_2')

不幸的是,它看起来像打印样式(上市中这里kind参数)有没有点。我可以使用线或条,甚至可以使用密度,但不能使用点。是否有解决方法可以帮助解决此问题。

I have a pandas data frame and would like to plot values from one column versus the values from another column. Fortunately, there is plot method associated with the data-frames that seems to do what I need:

df.plot(x='col_name_1', y='col_name_2')

Unfortunately, it looks like among the plot styles (listed here after the kind parameter) there are not points. I can use lines or bars or even density but not points. Is there a work around that can help to solve this problem.


回答 0

您可以style在调用时指定标绘线的df.plot

df.plot(x='col_name_1', y='col_name_2', style='o')

style参数也可以是一个dict或者list,如:

import numpy as np
import pandas as pd

d = {'one' : np.random.rand(10),
     'two' : np.random.rand(10)}

df = pd.DataFrame(d)

df.plot(style=['o','rx'])

的文档中列出了所有可接受的样式格式matplotlib.pyplot.plot

You can specify the style of the plotted line when calling df.plot:

df.plot(x='col_name_1', y='col_name_2', style='o')

The style argument can also be a dict or list, e.g.:

import numpy as np
import pandas as pd

d = {'one' : np.random.rand(10),
     'two' : np.random.rand(10)}

df = pd.DataFrame(d)

df.plot(style=['o','rx'])

All the accepted style formats are listed in the documentation of matplotlib.pyplot.plot.


回答 1

对于这个(以及大多数绘图),我不会依赖Pandas包装器来matplotlib。而是直接使用matplotlib:

import matplotlib.pyplot as plt
plt.scatter(df['col_name_1'], df['col_name_2'])
plt.show() # Depending on whether you use IPython or interactive mode, etc.

并记住,您可以使用例如访问列值的NumPy数组df.col_name_1.values

在一列具有毫秒精度的Timestamp值的情况下,我在使用Pandas默认绘图时遇到了麻烦。在尝试将对象转换为datetime64类型时,我还发现了一个令人讨厌的问题:< Pandas在询问Timestamp列值是否具有attr astype时给出了错误的结果

For this (and most plotting) I would not rely on the Pandas wrappers to matplotlib. Instead, just use matplotlib directly:

import matplotlib.pyplot as plt
plt.scatter(df['col_name_1'], df['col_name_2'])
plt.show() # Depending on whether you use IPython or interactive mode, etc.

and remember that you can access a NumPy array of the column’s values with df.col_name_1.values for example.

I ran into trouble using this with Pandas default plotting in the case of a column of Timestamp values with millisecond precision. In trying to convert the objects to datetime64 type, I also discovered a nasty issue: < Pandas gives incorrect result when asking if Timestamp column values have attr astype >.


回答 2

Pandas使用matplotlib作为基本的绘图库。在您的情况下,最简单的方法是使用以下方法:

import pandas as pd
import numpy as np

#creating sample data 
sample_data={'col_name_1':np.random.rand(20),
      'col_name_2': np.random.rand(20)}
df= pd.DataFrame(sample_data)
df.plot(x='col_name_1', y='col_name_2', style='o')

但是,seaborn如果您想拥有更多的自定义图,而又不涉及基础知识,我建议您使用它作为替代解决方案。matplotlib.在这种情况下,您将遵循以下解决方案:

import pandas as pd
import seaborn as sns
import numpy as np

#creating sample data 
sample_data={'col_name_1':np.random.rand(20),
      'col_name_2': np.random.rand(20)}
df= pd.DataFrame(sample_data)
sns.scatterplot(x="col_name_1", y="col_name_2", data=df)

Pandas uses matplotlib as a library for basic plots. The easiest way in your case will using the following:

import pandas as pd
import numpy as np

#creating sample data 
sample_data={'col_name_1':np.random.rand(20),
      'col_name_2': np.random.rand(20)}
df= pd.DataFrame(sample_data)
df.plot(x='col_name_1', y='col_name_2', style='o')

However, I would recommend to use seaborn as an alternative solution if you want have more customized plots while not going into the basic level of matplotlib. In this case you the solution will be following:

import pandas as pd
import seaborn as sns
import numpy as np

#creating sample data 
sample_data={'col_name_1':np.random.rand(20),
      'col_name_2': np.random.rand(20)}
df= pd.DataFrame(sample_data)
sns.scatterplot(x="col_name_1", y="col_name_2", data=df)


回答 3

现在,在最新的熊猫中,您可以直接使用df.plot.scatter函数

df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
                   [6.4, 3.2, 1], [5.9, 3.0, 2]],
                  columns=['length', 'width', 'species'])
ax1 = df.plot.scatter(x='length',
                      y='width',
                      c='DarkBlue')

https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.plot.scatter.html

Now in latest pandas you can directly use df.plot.scatter function

df = pd.DataFrame([[5.1, 3.5, 0], [4.9, 3.0, 0], [7.0, 3.2, 1],
                   [6.4, 3.2, 1], [5.9, 3.0, 2]],
                  columns=['length', 'width', 'species'])
ax1 = df.plot.scatter(x='length',
                      y='width',
                      c='DarkBlue')

https://pandas.pydata.org/pandas-docs/version/0.23/generated/pandas.DataFrame.plot.scatter.html


Matplotlib图例不起作用

问题:Matplotlib图例不起作用

自从升级matplotlib以来,每当尝试创建图例时都会出现以下错误:

/usr/lib/pymodules/python2.7/matplotlib/legend.py:610: UserWarning: Legend does not support [<matplotlib.lines.Line2D object at 0x3a30810>]
Use proxy artist instead.

http://matplotlib.sourceforge.net/users/legend_guide.html#using-proxy-artist

  warnings.warn("Legend does not support %s\nUse proxy artist instead.\n\nhttp://matplotlib.sourceforge.net/users/legend_guide.html#using-proxy-artist\n" % (str(orig_handle),))
/usr/lib/pymodules/python2.7/matplotlib/legend.py:610: UserWarning: Legend does not support [<matplotlib.lines.Line2D object at 0x3a30990>]
Use proxy artist instead.

http://matplotlib.sourceforge.net/users/legend_guide.html#using-proxy-artist

  warnings.warn("Legend does not support %s\nUse proxy artist instead.\n\nhttp://matplotlib.sourceforge.net/users/legend_guide.html#using-proxy-artist\n" % (str(orig_handle),))

这种情况甚至发生在像这样的琐碎脚本中:

import matplotlib.pyplot as plt

a = [1,2,3]
b = [4,5,6]
c = [7,8,9]

plot1 = plt.plot(a,b)
plot2 = plt.plot(a,c)

plt.legend([plot1,plot2],["plot 1", "plot 2"])
plt.show()

我发现错误的链接使我无法诊断错误的来源。

Ever since upgrading matplotlib I get the following error whenever trying to create a legend:

/usr/lib/pymodules/python2.7/matplotlib/legend.py:610: UserWarning: Legend does not support [<matplotlib.lines.Line2D object at 0x3a30810>]
Use proxy artist instead.

http://matplotlib.sourceforge.net/users/legend_guide.html#using-proxy-artist

  warnings.warn("Legend does not support %s\nUse proxy artist instead.\n\nhttp://matplotlib.sourceforge.net/users/legend_guide.html#using-proxy-artist\n" % (str(orig_handle),))
/usr/lib/pymodules/python2.7/matplotlib/legend.py:610: UserWarning: Legend does not support [<matplotlib.lines.Line2D object at 0x3a30990>]
Use proxy artist instead.

http://matplotlib.sourceforge.net/users/legend_guide.html#using-proxy-artist

  warnings.warn("Legend does not support %s\nUse proxy artist instead.\n\nhttp://matplotlib.sourceforge.net/users/legend_guide.html#using-proxy-artist\n" % (str(orig_handle),))

This even occurs with a trivial script like this:

import matplotlib.pyplot as plt

a = [1,2,3]
b = [4,5,6]
c = [7,8,9]

plot1 = plt.plot(a,b)
plot2 = plt.plot(a,c)

plt.legend([plot1,plot2],["plot 1", "plot 2"])
plt.show()

I’ve found the link that the error points me towards pretty useless in diagnosing the source of the error.


回答 0

您应该添加逗号:

plot1, = plt.plot(a,b)
plot2, = plt.plot(a,c)

需要逗号的原因是,无论实际上从命令中创建了多少行,plt.plot()返回一个行对象的元组。如果没有逗号,则“ plot1”和“ plot2”是元组而不是行对象,从而使以后对plt.legend()的调用失败。

逗号会隐式解压缩结果,以便“ plot1”和“ plot2”自动代替元组,成为元组中的第一个对象,即您实际想要的线对象。

http://matplotlib.sourceforge.net/users/legend_guide.html#adjusting-the-order-of-legend-items

行,= plot(x,sin(x))逗号代表什么?

You should add commas:

plot1, = plt.plot(a,b)
plot2, = plt.plot(a,c)

The reason you need the commas is because plt.plot() returns a tuple of line objects, no matter how many are actually created from the command. Without the comma, “plot1” and “plot2” are tuples instead of line objects, making the later call to plt.legend() fail.

The comma implicitly unpacks the results so that instead of a tuple, “plot1” and “plot2” automatically become the first objects within the tuple, i.e. the line objects you actually want.

http://matplotlib.sourceforge.net/users/legend_guide.html#adjusting-the-order-of-legend-items

line, = plot(x,sin(x)) what does comma stand for?


回答 1

使用“标签”关键字,如下所示:

pyplot.plot(x, y, label='x vs. y')

然后像这样添加图例:

pyplot.legend()

图例将保留线条属性,例如厚度,颜色等。

Use the “label” keyword, like so:

pyplot.plot(x, y, label='x vs. y')

and then add the legend like so:

pyplot.legend()

The legend will retain line properties like thickness, colours, etc.


回答 2

使用handlesAKAProxy artists

import matplotlib.lines as mlines
import matplotlib.pyplot as plt
# defining legend style and data
blue_line = mlines.Line2D([], [], color='blue', label='My Label')
reds_line = mlines.Line2D([], [], color='red', label='My Othes')

plt.legend(handles=[blue_line, reds_line])

plt.show()

Use handles AKA Proxy artists

import matplotlib.lines as mlines
import matplotlib.pyplot as plt
# defining legend style and data
blue_line = mlines.Line2D([], [], color='blue', label='My Label')
reds_line = mlines.Line2D([], [], color='red', label='My Othes')

plt.legend(handles=[blue_line, reds_line])

plt.show()

回答 3

在绘制图形时使用标签,然后只有您可以使用图例。x轴名称和y轴名称与图例名称不同。

use label while plotting graph then only u can use legend. x axis name and y axis name is different than legend name.


在python matplotlib中绘制(x,y)坐标列表

问题:在python matplotlib中绘制(x,y)坐标列表

我有一个列表,(a, b)希望matplotlib在python中绘制为实际的xy坐标。当前,它正在绘制两个图,其中列表的索引给出x坐标,第一个图的y值是对中的as,第二个图的y值是对中的bs。

澄清一下,我的数据看起来像这样:li = [(a,b), (c,d), ... , (t, u)] 我想做一个单行代码,只是调用plt.plot()不正确。如果我不需要单线,我可以轻松地做:

xs = [x[0] for x in li]
ys = [x[1] for x in li]
plt.plot(xs, ys)

如何获取matplotlib将这些对绘制为xy坐标?

I have a list of pairs (a, b) that I would like to plot with matplotlib in python as actual x-y coordinates. Currently, it is making two plots, where the index of the list gives the x-coordinate, and the first plot’s y values are the as in the pairs and the second plot’s y values are the bs in the pairs.

To clarify, my data looks like this: li = [(a,b), (c,d), ... , (t, u)] I want to do a one-liner that just calls plt.plot() incorrect. If I didn’t require a one-liner I could trivially do:

xs = [x[0] for x in li]
ys = [x[1] for x in li]
plt.plot(xs, ys)

How can I get matplotlib to plot these pairs as x-y coordinates?


回答 0

按照这个例子

import numpy as np
import matplotlib.pyplot as plt

N = 50
x = np.random.rand(N)
y = np.random.rand(N)

plt.scatter(x, y)
plt.show()

将生成:

要将数据从成对的数据包解压缩为列表,请使用zip

x, y = zip(*li)

因此,单线:

plt.scatter(*zip(*li))

As per this example:

import numpy as np
import matplotlib.pyplot as plt

N = 50
x = np.random.rand(N)
y = np.random.rand(N)

plt.scatter(x, y)
plt.show()

will produce:

To unpack your data from pairs into lists use zip:

x, y = zip(*li)

So, the one-liner:

plt.scatter(*zip(*li))

回答 1

如果您有一个numpy数组,则可以执行以下操作:

import numpy as np
from matplotlib import pyplot as plt

data = np.array([
    [1, 2],
    [2, 3],
    [3, 6],
])
x, y = data.T
plt.scatter(x,y)
plt.show()

If you have a numpy array you can do this:

import numpy as np
from matplotlib import pyplot as plt

data = np.array([
    [1, 2],
    [2, 3],
    [3, 6],
])
x, y = data.T
plt.scatter(x,y)
plt.show()

回答 2

如果要绘制一条连接列表中所有点的线

plt . plot ( li [ : ] )

plt . show ( )

这将绘制一条连接列表中所有对的直线,作为从列表起点到终点的笛卡尔平面上的点。我希望这就是您想要的。

If you want to plot a single line connecting all the points in the list

plt.plot(li[:])

plt.show()

This will plot a line connecting all the pairs in the list as points on a Cartesian plane from the starting of the list to the end. I hope that this is what you wanted.