Tag archive: matplotlib

Matplotlib: move the x-axis label downwards, but not the x-axis ticks

Question: Matplotlib: move the x-axis label downwards, but not the x-axis ticks

I’m using Matplotlib to plot a histogram. Using tips from my previous question: Matplotlib – label each bin, I’ve more or less got the kinks worked out.

There’s one final issue – previously – the x-axis label (“Time (in milliseconds)”) was being rendered underneath the x-axis tickmarks (0.00, 0.04, 0.08, 0.12 etc.)

Using the advice from Joe Kingston (see question above), I tried using:

ax.tick_params(axis='x', pad=30)

However, this moves both the x-axis tickmarks (0.00, 0.04, 0.08, 0.12 etc.), as well as the x-axis label (“Time (in milliseconds)”):

Is there any way to move only the x-axis label to underneath the three rows of figures?

Nb: You may need to open the PNGs below directly – Right Click on the image, then View Image (in FF), or Open image in new tab (Chrome). The image resize done by SO has rendered them nigh unreadable


Answer 0

Use the labelpad parameter:

pl.xlabel("...", labelpad=20)

or set it afterwards:

ax.xaxis.labelpad = 20
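As a rough end-to-end sketch of the labelpad approach (the histogram data, bin count, and Agg backend here are illustrative assumptions, not taken from the question), something like the following moves only the axis label and leaves the tick labels where they are:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Made-up timing samples standing in for the asker's data
rng = np.random.default_rng(0)
samples = rng.normal(0.1, 0.03, size=500)

fig, ax = plt.subplots()
ax.hist(samples, bins=10)

# labelpad is the gap, in points, between the tick labels and the axis label;
# the tick labels themselves are not moved by it
ax.set_xlabel("Time (in milliseconds)", labelpad=20)

current_pad = ax.xaxis.labelpad  # the same value can be read or set afterwards
```

Because labelpad is measured from the bounding box of the tick labels, extra rows of text placed under the ticks by other means may still need a larger value.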

Answer 1

If ax.xaxis._autolabelpos is True, matplotlib sets the label position in the function _update_label_position in axis.py according to (some excerpts):

    bboxes, bboxes2 = self._get_tick_bboxes(ticks_to_draw, renderer)
    bbox = mtransforms.Bbox.union(bboxes)
    bottom = bbox.y0
    x, y = self.label.get_position()
    self.label.set_position((x, bottom - self.labelpad * self.figure.dpi / 72.0))

You can set the label position independently of the ticks by using:

    ax.xaxis.set_label_coords(x0, y0)

This sets _autolabelpos to False. Alternatively, change the labelpad parameter as mentioned above.
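A minimal sketch of the set_label_coords route, assuming the default axes-fraction coordinates (the data and the chosen position are made up for illustration):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0.0, 0.04, 0.08, 0.12], [3, 7, 5, 2])
ax.set_xlabel("Time (in milliseconds)")

# By default the coordinates are fractions of the axes box:
# x=0.5 centres the label horizontally, y=-0.2 puts it below the axes
ax.xaxis.set_label_coords(0.5, -0.2)

# Drawing no longer repositions the label, because automatic placement is off
fig.canvas.draw()
label_position = ax.xaxis.label.get_position()
```

With a fixed position the label no longer follows the tick labels, so the y value may need adjusting by hand if the tick labels grow.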


Saving images in Python at very high quality

Question: Saving images in Python at very high quality

How can I save Python plots at very high quality?

That is, when I keep zooming in on the object saved in a PDF file, there should not be any blurring.

Also, what would be the best format to save it in?

png, eps? Or something else? I can’t use pdf, because there is a hidden number involved that messes with the Latexmk compilation.


Answer 0

If you are using Matplotlib and are trying to get good figures in a LaTeX document, save as an EPS. Specifically, try something like this after running the commands to plot the image:

plt.savefig('destination_path.eps', format='eps')

I have found that EPS files work best and the dpi parameter is what really makes them look good in a document.

To specify the orientation of the figure before saving, call the following before plt.savefig but after creating the plot (assuming you have plotted on a 3D axes named ax):

ax.view_init(elev=elevation_angle, azim=azimuthal_angle)

Here elevation_angle is a number (in degrees) specifying the elevation of the viewpoint above the x-y plane, and azimuthal_angle specifies the azimuthal angle (around the z axis).

I find that it is easiest to determine these values by first plotting the image and then rotating it and watching the current values of the angles appear towards the bottom of the window just below the actual plot. Keep in mind that the x, y, z, positions appear by default, but they are replaced with the two angles when you start to click+drag+rotate the image.
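For illustration, a small sketch of fixing the view of a 3D plot before saving. The surface, the angles, and the in-memory buffer are assumptions made for this example:

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.mplot3d import Axes3D  # noqa: F401  (registers the 3d projection on older versions)

# Made-up surface to have something to look at
x = np.linspace(-2, 2, 30)
X, Y = np.meshgrid(x, x)
Z = np.exp(-(X**2 + Y**2))

fig = plt.figure()
ax = fig.add_subplot(111, projection="3d")
ax.plot_surface(X, Y, Z)

# Fix the camera before saving: both angles are in degrees
ax.view_init(elev=30, azim=45)

# Save into a buffer here; a file name works the same way
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=150)
```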


Answer 1

Just to add my results, also using Matplotlib.

.eps made all my text bold and removed transparency. .svg gave me high-resolution pictures that actually looked like my graph.

import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# Do the plot code
fig.savefig('myimage.svg', format='svg', dpi=1200)

I used 1200 dpi because a lot of scientific journals require images in 1200 / 600 / 300 dpi, depending on what the image is of. Convert to desired dpi and format in GIMP or Inkscape.

Edit: Apparently the dpi doesn’t matter, since .svg files are vector graphics and have “infinite resolution”.
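To make the vector-versus-raster point concrete, here is a hedged sketch that saves the same figure as SVG and as PNG at two dpi values. The helper png_pixel_size is a hypothetical name introduced for this example, and the figure size is arbitrary:

```python
import io
import struct

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))  # size in inches
ax.plot([0, 1, 2], [0, 1, 4])

# Vector output: the drawing is stored as shapes, so zooming never blurs lines or text
svg_buf = io.BytesIO()
fig.savefig(svg_buf, format="svg")

def png_pixel_size(figure, dpi):
    """Save `figure` as PNG at `dpi` and read (width, height) from the PNG header."""
    buf = io.BytesIO()
    figure.savefig(buf, format="png", dpi=dpi)
    # Bytes 16..24 of a PNG file hold the big-endian width and height (IHDR chunk)
    return struct.unpack(">II", buf.getvalue()[16:24])

# Raster output: pixel dimensions are figure size in inches times dpi
low_res = png_pixel_size(fig, 100)
high_res = png_pixel_size(fig, 300)
```

For raster formats the dpi value directly sets the pixel count, which is why it matters for PNG but not for the lines and text in SVG, EPS, or PDF output.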


Answer 2

spencerlyon2’s answer worked for me. However, in case anybody doesn’t know what to do with that one line, I had to do it this way:

beingsaved = plt.figure()

# Some scatter plots
plt.scatter(X_1_x, X_1_y)
plt.scatter(X_2_x, X_2_y)

beingsaved.savefig('destination_path.eps', format='eps', dpi=1000)

Answer 3

If you are working with seaborn plots rather than plain Matplotlib, you can save a .png image like this:

Suppose you have a matrix object (either pandas or NumPy), and you want to plot a heatmap:

import seaborn as sb

image = sb.heatmap(matrix)   # This gets you the heatmap
image.figure.savefig("C:/Your/Path/ ... /your_image.png")   # This saves it

This code is compatible with the latest version of Seaborn. Other code around Stack Overflow worked only for previous versions.

Another way I like is this. I set the size of the next image as follows:

plt.subplots(figsize=(15,15))

And then later I plot the output in the console, from which I can copy-paste it where I want. (Since Seaborn is built on top of Matplotlib, there will not be any problem.)


Answer 4

You can create a figure that is 1920×1080 pixels (1080p) when the figure dpi is 100, using:

fig = plt.figure(figsize=(19.20,10.80))

You can also go much higher or lower. The above solutions work well for printing, but these days you want the created image to go into a PNG/JPG or appear in a wide screen format.
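The pixel size follows from figure size in inches multiplied by dpi. A small sketch, assuming a dpi of 100 is passed explicitly rather than relying on the configured default:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# 19.2 in x 10.8 in at 100 dots per inch gives a 1920 x 1080 pixel canvas
fig = plt.figure(figsize=(19.2, 10.8), dpi=100)

width_px, height_px = fig.get_size_inches() * fig.dpi
```

Passing a different dpi to savefig rescales the saved image accordingly, so the same figure saved at dpi=200 would come out at twice these pixel dimensions.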


How to export plots from matplotlib with a transparent background?

Question: How to export plots from matplotlib with a transparent background?

I am using matplotlib to make some graphs and unfortunately I cannot export them without the white background.

In other words, when I export a plot like this and position it on top of another image, the white background hides what is behind it rather than allowing it to show through. How can I export plots with a transparent background instead?


Answer 0

Use the matplotlib savefig function with the keyword argument transparent=True to save the image as a png file.

In [30]: x = np.linspace(0,6,31)

In [31]: y = np.exp(-0.5*x) * np.sin(x)

In [32]: plot(x, y, 'bo-')
Out[32]: [<matplotlib.lines.Line2D at 0x3f29750>]            

In [33]: savefig('demo.png', transparent=True)

Result:

Of course, that plot doesn’t demonstrate the transparency. Here’s a screenshot of the PNG file displayed using the ImageMagick display command. The checkerboard pattern is the background that is visible through the transparent parts of the PNG file.
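A script-style sketch of the same idea that also checks the result, assuming the PNG is written to an in-memory buffer and read back with plt.imread (which relies on Pillow being installed alongside Matplotlib):

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 6, 31)
y = np.exp(-0.5 * x) * np.sin(x)

fig, ax = plt.subplots()
ax.plot(x, y, "bo-")

# transparent=True makes the figure and axes backgrounds fully transparent in the output
buf = io.BytesIO()
fig.savefig(buf, format="png", transparent=True)

# Read the image back as an RGBA array to inspect the alpha channel
buf.seek(0)
pixels = plt.imread(buf, format="png")
corner_alpha = pixels[0, 0, 3]      # background pixel in the top-left corner
max_alpha = pixels[..., 3].max()    # drawn elements (lines, text) stay opaque
```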


Answer 1

PNG files can handle transparency, so you could follow the question “Save plot to image file instead of displaying it using Matplotlib” to save your graph as a PNG file.

If you want to turn all white pixels transparent, there is this other question: “Using PIL to make all white pixels transparent?”

If you want to turn an entire area transparent, use the PIL library as in the question “Python PIL: how to make area transparent in PNG?”


How can I set the aspect ratio in matplotlib?

Question: How can I set the aspect ratio in matplotlib?

I’m trying to make a square plot (using imshow), i.e. aspect ratio of 1:1, but I can’t. None of these work:

import matplotlib.pyplot as plt

ax = fig.add_subplot(111,aspect='equal')
ax = fig.add_subplot(111,aspect=1.0)
ax.set_aspect('equal')
plt.axes().set_aspect('equal')

It seems like the calls are just being ignored (a problem I often seem to have with matplotlib).


Answer 0

Third time’s the charm. My guess is that this is a bug, and Zhenya’s answer suggests it’s fixed in the latest version. I have version 0.99.1.1 and I’ve created the following solution:

import matplotlib.pyplot as plt
import numpy as np

def forceAspect(ax,aspect=1):
    im = ax.get_images()
    extent =  im[0].get_extent()
    ax.set_aspect(abs((extent[1]-extent[0])/(extent[3]-extent[2]))/aspect)

data = np.random.rand(10,20)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.imshow(data)
ax.set_xlabel('xlabel')
ax.set_aspect(2)
fig.savefig('equal.png')
ax.set_aspect('auto')
fig.savefig('auto.png')
forceAspect(ax,aspect=1)
fig.savefig('force.png')

This is ‘force.png’:

Below are my unsuccessful, yet hopefully informative attempts.

Second Answer:

My ‘original answer’ below is overkill, as it does something similar to axes.set_aspect(). I think you want to use axes.set_aspect('auto'). I don’t understand why this is the case, but it produces a square image plot for me, for example this script:

import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(10,20)

fig = plt.figure()
ax = fig.add_subplot(111)
ax.imshow(data)
ax.set_aspect('equal')
fig.savefig('equal.png')
ax.set_aspect('auto')
fig.savefig('auto.png')

Produces an image plot with ‘equal’ aspect ratio: and one with ‘auto’ aspect ratio:

The code provided below in the ‘original answer’ provides a starting off point for an explicitly controlled aspect ratio, but it seems to be ignored once an imshow is called.

Original Answer:

Here’s an example of a routine that will adjust the subplot parameters so that you get the desired aspect ratio:

import matplotlib.pyplot as plt

def adjustFigAspect(fig,aspect=1):
    '''
    Adjust the subplot parameters so that the figure has the correct
    aspect ratio.
    '''
    xsize,ysize = fig.get_size_inches()
    minsize = min(xsize,ysize)
    xlim = .4*minsize/xsize
    ylim = .4*minsize/ysize
    if aspect < 1:
        xlim *= aspect
    else:
        ylim /= aspect
    fig.subplots_adjust(left=.5-xlim,
                        right=.5+xlim,
                        bottom=.5-ylim,
                        top=.5+ylim)

fig = plt.figure()
adjustFigAspect(fig,aspect=.5)
ax = fig.add_subplot(111)
ax.plot(range(10),range(10))

fig.savefig('axAspect.png')

This produces a figure like so:

I can imagine that if you have multiple subplots within the figure, you would want to pass the number of y and x subplots as keyword parameters (defaulting to 1 each) to the routine provided. Then, using those numbers and the hspace and wspace keywords, you can make all the subplots have the correct aspect ratio.


Answer 1

What is the matplotlib version you are running? I have recently had to upgrade to 1.1.0, and with it, add_subplot(111,aspect='equal') works for me.


Answer 2

After many years of success with the answers above, I found that they no longer worked for me, but I did find a working solution for subplots at

https://jdhao.github.io/2017/06/03/change-aspect-ratio-in-mpl

With full credit, of course, to the author of that page (who is welcome to post here instead), the relevant lines are:

ratio = 1.0
xleft, xright = ax.get_xlim()
ybottom, ytop = ax.get_ylim()
ax.set_aspect(abs((xright-xleft)/(ybottom-ytop))*ratio)

The link also has a crystal clear explanation of the different coordinate systems used by matplotlib.

Thanks for all the great answers received, especially @Yann’s, which will remain the winner.
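Wrapped into a helper, the limits-based approach might look like the sketch below. The function name set_display_aspect and the random image are assumptions for illustration; ratio here means displayed height divided by displayed width:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def set_display_aspect(ax, ratio=1.0):
    """Scale the axes so that displayed height / displayed width equals `ratio`."""
    xleft, xright = ax.get_xlim()
    ybottom, ytop = ax.get_ylim()
    # set_aspect takes the display size of one y unit relative to one x unit,
    # so the data ranges have to be divided out
    ax.set_aspect(abs((xright - xleft) / (ybottom - ytop)) * ratio)

fig, ax = plt.subplots()
ax.imshow(np.random.rand(10, 20))  # 10 rows x 20 columns: twice as wide as tall in data units

set_display_aspect(ax, ratio=1.0)  # square box despite the 2:1 data shape
applied_aspect = ax.get_aspect()
```

The value is computed from the limits at call time, so it would need to be reapplied if the limits change later. Recent Matplotlib versions also offer ax.set_box_aspect, which may be a simpler way to get a fixed box shape.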


Answer 3

You should try figaspect. It works for me. From the docs:

Create a figure with specified aspect ratio. If arg is a number, use that aspect ratio. If arg is an array, figaspect will determine the width and height for a figure that would fit array preserving aspect ratio. The figure width, height in inches are returned. Be sure to create an axes with equal width and height, e.g.

Example usage:

  # make a figure twice as tall as it is wide
  w, h = figaspect(2.)
  fig = Figure(figsize=(w,h))
  ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
  ax.imshow(A, **kwargs)

  # make a figure with the proper aspect for an array
  A = rand(5,3)
  w, h = figaspect(A)
  fig = Figure(figsize=(w,h))
  ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])
  ax.imshow(A, **kwargs)

Edit: I am not sure what you are looking for. The above code changes the canvas (the plot size). If you want to change the size of the Matplotlib figure window itself, use:

In [68]: f = figure(figsize=(5,1))

This produces a window of 5×1 (w×h).
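A self-contained sketch of figaspect with the imports spelled out. The array contents and the use of plt.figure instead of the Figure class are choices made for this example:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.figure import figaspect

# For an array, figaspect picks a figure size matching rows/columns (height/width)
A = np.random.rand(5, 3)
w, h = figaspect(A)
fig = plt.figure(figsize=(w, h))
ax = fig.add_axes([0.1, 0.1, 0.8, 0.8])  # axes with equal relative width and height
ax.imshow(A)

# For a number, the argument is the desired height/width ratio
w2, h2 = figaspect(2.0)
```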


Answer 4

This answer is based on Yann’s answer. It will set the aspect ratio for linear or log-log plots. I’ve used additional information from https://stackoverflow.com/a/16290035/2966723 to test if the axes are log-scale.

import numpy as np

def forceAspect(ax, aspect=1):
    # aspect is width/height
    scale_str = ax.get_yaxis().get_scale()
    xmin, xmax = ax.get_xlim()
    ymin, ymax = ax.get_ylim()
    if scale_str == 'linear':
        asp = abs((xmax - xmin) / (ymax - ymin)) / aspect
    elif scale_str == 'log':
        asp = abs((np.log(xmax) - np.log(xmin)) / (np.log(ymax) - np.log(ymin))) / aspect
    else:
        raise ValueError('unsupported axis scale: ' + scale_str)
    ax.set_aspect(asp)

Obviously you can use any version of log you want; numpy is used here, but math should be fine too.


matplotlib: colorbars and its text labels

Question: matplotlib: colorbars and its text labels

I’d like to create a colorbar legend for a heatmap, such that the labels are in the center of each discrete color. Example borrowed from here:

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

#discrete color scheme
cMap = ListedColormap(['white', 'green', 'blue','red'])

#data
np.random.seed(42)
data = np.random.rand(4, 4)
fig, ax = plt.subplots()
heatmap = ax.pcolor(data, cmap=cMap)

#legend
cbar = plt.colorbar(heatmap)
cbar.ax.set_yticklabels(['0','1','2','>3'])
cbar.set_label('# of contacts', rotation=270)

# put the major ticks at the middle of each cell
ax.set_xticks(np.arange(data.shape[1]) + 0.5, minor=False)
ax.set_yticks(np.arange(data.shape[0]) + 0.5, minor=False)
ax.invert_yaxis()

#labels
column_labels = list('ABCD')
row_labels = list('WXYZ')
ax.set_xticklabels(column_labels, minor=False)
ax.set_yticklabels(row_labels, minor=False)

plt.show()

This generates the following plot:

Ideally I’d like to generate a legend bar which has the four colors and for each color, a label in its center: 0,1,2,>3. How can this be achieved?


Answer 0

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap

#discrete color scheme
cMap = ListedColormap(['white', 'green', 'blue','red'])

#data
np.random.seed(42)
data = np.random.rand(4, 4)
fig, ax = plt.subplots()
heatmap = ax.pcolor(data, cmap=cMap)

#legend
cbar = plt.colorbar(heatmap)

cbar.ax.get_yaxis().set_ticks([])
for j, lab in enumerate(['$0$','$1$','$2$','$>3$']):
    cbar.ax.text(.5, (2 * j + 1) / 8.0, lab, ha='center', va='center')
cbar.ax.get_yaxis().labelpad = 15
cbar.ax.set_ylabel('# of contacts', rotation=270)


# put the major ticks at the middle of each cell
ax.set_xticks(np.arange(data.shape[1]) + 0.5, minor=False)
ax.set_yticks(np.arange(data.shape[0]) + 0.5, minor=False)
ax.invert_yaxis()

#labels
column_labels = list('ABCD')
row_labels = list('WXYZ')
ax.set_xticklabels(column_labels, minor=False)
ax.set_yticklabels(row_labels, minor=False)

plt.show()

You were very close. Once you have a reference to the color bar axis, you can do whatever you want to it, including putting text labels in the middle. You might want to play with the formatting to make it more visible.
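A different way to get labels centred on each colour, shown here as a hedged alternative rather than the method used above, is to bin the data with BoundaryNorm and place the colorbar ticks at the bin centres. The integer data and the bin edges are assumptions for this sketch:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import BoundaryNorm, ListedColormap

# Discrete colour scheme with one colour per bin
cmap = ListedColormap(["white", "green", "blue", "red"])
bounds = [0, 1, 2, 3, 4]             # bin edges: value k falls in [k, k + 1)
norm = BoundaryNorm(bounds, cmap.N)

# Made-up integer contact counts in 0..3
rng = np.random.default_rng(42)
data = rng.integers(0, 4, size=(4, 4))

fig, ax = plt.subplots()
heatmap = ax.pcolor(data, cmap=cmap, norm=norm)

cbar = fig.colorbar(heatmap)
centers = [0.5, 1.5, 2.5, 3.5]       # midpoint of each bin
cbar.set_ticks(centers)
cbar.set_ticklabels(["0", "1", "2", ">3"])
cbar.set_label("# of contacts", rotation=270, labelpad=15)
```

This keeps the labels as ordinary tick labels beside the bar, so they stay aligned if the colorbar is resized.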


Answer 1

To add to tacaswell’s answer, the colorbar() function has an optional cax input you can use to pass an axis on which the colorbar should be drawn. If you are using that input, you can directly set a label using that axis.

import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.axes_grid1 import make_axes_locatable

data = np.random.rand(4, 4)  # placeholder data for the heatmap

fig, ax = plt.subplots()
heatmap = ax.imshow(data)
divider = make_axes_locatable(ax)
cax = divider.append_axes('bottom', size='10%', pad=0.6)
cb = fig.colorbar(heatmap, cax=cax, orientation='horizontal')

cax.set_xlabel('data label')  # cax == cb.ax

FutureWarning: elementwise comparison failed; returning scalar, but in the future will perform elementwise comparison

Question: FutureWarning: elementwise comparison failed; returning scalar, but in the future will perform elementwise comparison

I am using Pandas 0.19.1 on Python 3. I am getting a warning on these lines of code. I’m trying to get a list that contains all the row numbers where string Peter is present at column Unnamed: 5.

df = pd.read_excel(xls_path)
myRows = df[df['Unnamed: 5'] == 'Peter'].index.tolist()

It produces a Warning:

"\Python36\lib\site-packages\pandas\core\ops.py:792: FutureWarning: elementwise 
comparison failed; returning scalar, but in the future will perform 
elementwise comparison 
result = getattr(x, name)(y)"

What is this FutureWarning, and should I ignore it since the code seems to work?


Answer 0

This FutureWarning isn’t from pandas; it comes from NumPy, and the bug also affects matplotlib and others. Here’s how to reproduce the warning nearer to the source of the trouble:

import numpy as np
print(np.__version__)   # Numpy version '1.12.0'
'x' in np.arange(5)       #Future warning thrown here

FutureWarning: elementwise comparison failed; returning scalar instead, but in the 
future will perform elementwise comparison
False

Another way to reproduce this bug using the double equals operator:

import numpy as np
np.arange(5) == np.arange(5).astype(str)    #FutureWarning thrown here

An example of Matplotlib affected by this FutureWarning under their quiver plot implementation: https://matplotlib.org/examples/pylab_examples/quiver_demo.html

What’s going on here?

NumPy and native Python disagree about what should happen when you compare a string to NumPy’s numeric types. Notice that the left operand is Python’s turf (a primitive string), the operator in the middle is Python’s turf, but the right operand is NumPy’s turf. Should the result be a Python-style scalar or a NumPy-style ndarray of booleans? NumPy says an ndarray of bool; Pythonic developers disagree. Classic standoff.

Should it be an elementwise comparison, or a scalar saying whether the item exists in the array?

If your code or library uses the in or == operators to compare a Python string to NumPy ndarrays, the types aren’t compatible, so the operation returns a scalar, but only for now. The warning indicates that this behavior might change in the future, so your code will puke all over the carpet if Python/NumPy decide to adopt the NumPy style.

Submitted Bug reports:

Numpy and Python are in a standoff, for now the operation returns a scalar, but in the future it may change.

https://github.com/numpy/numpy/issues/6784

https://github.com/pandas-dev/pandas/issues/7830

Two workaround solutions:

Either lock down your versions of Python and NumPy, ignore the warnings and expect the behavior not to change, or convert both the left and right operands of == and in to the same kind of type (a NumPy type or a primitive Python numeric type).

Suppress the warning globally:

import warnings
import numpy as np
warnings.simplefilter(action='ignore', category=FutureWarning)
print('x' in np.arange(5))   #returns False, without Warning

Suppress the warning on a line by line basis.

import warnings
import numpy as np

with warnings.catch_warnings():
    warnings.simplefilter(action='ignore', category=FutureWarning)
    print('x' in np.arange(2))   #returns False, warning is suppressed

print('x' in np.arange(10))   #returns False, Throws FutureWarning

Just suppress the warning by name, then put a loud comment next to it mentioning the current version of python and numpy, saying this code is brittle and requires these versions and put a link to here. Kick the can down the road.
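A sketch of suppressing only this one warning by its message text, rather than every FutureWarning. Because newer NumPy releases may no longer emit the warning, the function noisy_comparison below is a hypothetical stand-in that raises it by hand:

```python
import warnings

def noisy_comparison():
    """Hypothetical stand-in for a library call that emits the NumPy warning."""
    warnings.warn(
        "elementwise comparison failed; returning scalar instead, "
        "but in the future will perform elementwise comparison",
        FutureWarning,
    )
    return False

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")  # record everything that is not filtered out below

    # BRITTLE: hides one specific warning; revisit after upgrading Python or NumPy
    warnings.filterwarnings(
        "ignore",
        message="elementwise comparison failed",  # matched against the start of the message
        category=FutureWarning,
    )

    result = noisy_comparison()                             # suppressed
    warnings.warn("some other deprecation", FutureWarning)  # still reported

remaining = [str(w.message) for w in caught]
```

Filtering on the message keeps unrelated FutureWarnings visible, which the category-wide filters above do not.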

TLDR: pandas are Jedi; numpy are the hutts; and python is the galactic empire. https://youtu.be/OZczsiCfQQk?t=3


Answer 1

I get the same error when I try to set index_col while reading a file into a pandas DataFrame:

df = pd.read_csv('my_file.tsv', sep='\t', header=0, index_col=['0'])  ## or same with the following
df = pd.read_csv('my_file.tsv', sep='\t', header=0, index_col=[0])

I have never encountered such an error before. I am still trying to figure out the reason behind it (using @Eric Leschinski’s explanation and others).

Anyhow, the following approach solves the problem for now until I figure the reason out:

df = pd.read_csv('my_file.tsv', sep='\t', header=0)  ## not setting the index_col
df.set_index(['0'], inplace=True)

I will update this as soon as I figure out the reason for such behavior.


Answer 2

In my case, the same warning message was caused by a TypeError:

TypeError: invalid type comparison

So you may want to check the data type of the Unnamed: 5 column:

for x in df['Unnamed: 5']:
  print(type(x))  # are they 'str' ?

Here is how I can replicate the warning message:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randn(3, 2), columns=['num1', 'num2'])
df['num3'] = 3
df.loc[df['num3'] == '3', 'num3'] = 4  # TypeError and the Warning
df.loc[df['num3'] == 3, 'num3'] = 4  # No Error

Hope it helps.


回答 3

无法击败Eric Leschinski的详细答案,但这是针对我认为尚未提及的原始问题的快速解决方法-将字符串放在列表中并使用.isin而不是==

例如:

import pandas as pd
import numpy as np

df = pd.DataFrame({"Name": ["Peter", "Joe"], "Number": [1, 2]})

# Raises warning using == to compare different types:
df.loc[df["Number"] == "2", "Number"]

# No warning using .isin:
df.loc[df["Number"].isin(["2"]), "Number"]

Can’t beat Eric Leschinski’s awesomely detailed answer, but here’s a quick workaround to the original question that I don’t think has been mentioned yet – put the string in a list and use .isin instead of ==

For example:

import pandas as pd
import numpy as np

df = pd.DataFrame({"Name": ["Peter", "Joe"], "Number": [1, 2]})

# Raises warning using == to compare different types:
df.loc[df["Number"] == "2", "Number"]

# No warning using .isin:
df.loc[df["Number"].isin(["2"]), "Number"]
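
Another option, offered here as a minimal sketch rather than as part of the answer above, is to make both sides of the comparison the same type before using ==. The frame mirrors the example; which of the two conversions is appropriate depends on your data, so treat both as assumptions.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Peter", "Joe"], "Number": [1, 2]})

# Option 1: turn the integer column into strings, then compare with a string.
names_via_str = df.loc[df["Number"].astype(str) == "2", "Name"].tolist()

# Option 2: turn the string into an integer, then compare with the integer column.
names_via_int = df.loc[df["Number"] == int("2"), "Name"].tolist()

print(names_via_str, names_via_int)
```

Unlike .isin(["2"]), which matches nothing here because the column holds integers, both variants select the row whose Number is 2.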

回答 4

一个快速的解决方法是使用numpy.core.defchararray。我也遇到了同样的警告消息,并且能够使用上述模块来解决它。

import numpy.core.defchararray as npd
resultdataset = npd.equal(dataset1, dataset2)

A quick workaround for this is to use numpy.core.defchararray. I also faced the same warning message and was able to resolve it using above module.

import numpy.core.defchararray as npd
resultdataset = npd.equal(dataset1, dataset2)
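
For reference, the same element-wise string functions are also exposed under the public name numpy.char. A small sketch, where dataset1 and dataset2 are hypothetical stand-ins for your own arrays of strings:

```python
import numpy as np

# Hypothetical stand-ins for dataset1 and dataset2: two string arrays of equal shape.
dataset1 = np.array(["Peter", "Joe", "Ann"])
dataset2 = np.array(["Peter", "Jim", "Ann"])

# np.char.equal compares the strings element by element and returns a boolean array.
resultdataset = np.char.equal(dataset1, dataset2)
print(resultdataset)
```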

回答 5

埃里克(Eric)的回答很有帮助,说明了麻烦来自将Pandas系列(包含NumPy数组)与Python字符串进行比较。不幸的是,他的两个解决方法都只是抑制了警告。

要首先编写不会引起警告的代码,请显式地将字符串与Series的每个元素进行比较,并为每个元素获取单独的布尔值。例如,您可以使用map和匿名函数。

myRows = df[df['Unnamed: 5'].map( lambda x: x == 'Peter' )].index.tolist()

Eric’s answer helpfully explains that the trouble comes from comparing a Pandas Series (containing a NumPy array) to a Python string. Unfortunately, his two workarounds both just suppress the warning.

To write code that doesn’t cause the warning in the first place, explicitly compare your string to each element of the Series and get a separate bool for each. For example, you could use map and an anonymous function.

myRows = df[df['Unnamed: 5'].map( lambda x: x == 'Peter' )].index.tolist()
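
A self-contained sketch of the same idea, using a made-up frame (the data is an assumption; only the column name comes from the question):

```python
import numpy as np
import pandas as pd

# Hypothetical data: a column that mixes missing values and strings.
df = pd.DataFrame({"Unnamed: 5": [np.nan, "Peter", "Joe", "Peter"]})

# Each element is compared to the string on its own, so the result is one bool per row.
mask = df["Unnamed: 5"].map(lambda x: x == "Peter")
myRows = df[mask].index.tolist()
print(myRows)
```

Because the comparison happens in plain Python for every element, NaN simply compares unequal to 'Peter' and no array-versus-string comparison is ever attempted.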

回答 6

如果数组不太大或数组不太多,则可以通过将其左侧强制==为字符串来摆脱困境:

myRows = df[str(df['Unnamed: 5']) == 'Peter'].index.tolist()

但这如果df['Unnamed: 5']是字符串则要慢约1.5倍,如果df['Unnamed: 5']是小的numpy数组(长度= 10)则要慢25-30倍,如果是长度为100的numpy数组则要慢150-160倍(时间超过500次试验) 。

import time
from numpy import linspace, zeros, mean

a = linspace(0, 5, 10)
b = linspace(0, 50, 100)
n = 500
string1 = 'Peter'
string2 = 'blargh'
times_a = zeros(n)
times_str_a = zeros(n)
times_s = zeros(n)
times_str_s = zeros(n)
times_b = zeros(n)
times_str_b = zeros(n)
for i in range(n):
    t0 = time.time()
    tmp1 = a == string1
    t1 = time.time()
    tmp2 = str(a) == string1
    t2 = time.time()
    tmp3 = string2 == string1
    t3 = time.time()
    tmp4 = str(string2) == string1
    t4 = time.time()
    tmp5 = b == string1
    t5 = time.time()
    tmp6 = str(b) == string1
    t6 = time.time()
    times_a[i] = t1 - t0
    times_str_a[i] = t2 - t1
    times_s[i] = t3 - t2
    times_str_s[i] = t4 - t3
    times_b[i] = t5 - t4
    times_str_b[i] = t6 - t5
print('Small array:')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_a), mean(times_str_a)))
print('Ratio of time with/without string conversion: {}'.format(mean(times_str_a)/mean(times_a)))

print('\nBig array')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_b), mean(times_str_b)))
print(mean(times_str_b)/mean(times_b))

print('\nString')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_s), mean(times_str_s)))
print('Ratio of time with/without string conversion: {}'.format(mean(times_str_s)/mean(times_s)))

结果:

Small array:
Time to compare without str conversion: 6.58464431763e-06 s. With str conversion: 0.000173756599426 s
Ratio of time with/without string conversion: 26.3881526541

Big array
Time to compare without str conversion: 5.44309616089e-06 s. With str conversion: 0.000870866775513 s
159.99474375821288

String
Time to compare without str conversion: 5.89370727539e-07 s. With str conversion: 8.30173492432e-07 s
Ratio of time with/without string conversion: 1.40857605178

If your arrays aren’t too big or you don’t have too many of them, you might be able to get away with forcing the left hand side of == to be a string:

myRows = df[str(df['Unnamed: 5']) == 'Peter'].index.tolist()

But this is ~1.5 times slower if df['Unnamed: 5'] is a string, 25-30 times slower if df['Unnamed: 5'] is a small numpy array (length = 10), and 150-160 times slower if it’s a numpy array with length 100 (times averaged over 500 trials).

import time
from numpy import linspace, zeros, mean

a = linspace(0, 5, 10)
b = linspace(0, 50, 100)
n = 500
string1 = 'Peter'
string2 = 'blargh'
times_a = zeros(n)
times_str_a = zeros(n)
times_s = zeros(n)
times_str_s = zeros(n)
times_b = zeros(n)
times_str_b = zeros(n)
for i in range(n):
    t0 = time.time()
    tmp1 = a == string1
    t1 = time.time()
    tmp2 = str(a) == string1
    t2 = time.time()
    tmp3 = string2 == string1
    t3 = time.time()
    tmp4 = str(string2) == string1
    t4 = time.time()
    tmp5 = b == string1
    t5 = time.time()
    tmp6 = str(b) == string1
    t6 = time.time()
    times_a[i] = t1 - t0
    times_str_a[i] = t2 - t1
    times_s[i] = t3 - t2
    times_str_s[i] = t4 - t3
    times_b[i] = t5 - t4
    times_str_b[i] = t6 - t5
print('Small array:')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_a), mean(times_str_a)))
print('Ratio of time with/without string conversion: {}'.format(mean(times_str_a)/mean(times_a)))

print('\nBig array')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_b), mean(times_str_b)))
print(mean(times_str_b)/mean(times_b))

print('\nString')
print('Time to compare without str conversion: {} s. With str conversion: {} s'.format(mean(times_s), mean(times_str_s)))
print('Ratio of time with/without string conversion: {}'.format(mean(times_str_s)/mean(times_s)))

Result:

Small array:
Time to compare without str conversion: 6.58464431763e-06 s. With str conversion: 0.000173756599426 s
Ratio of time with/without string conversion: 26.3881526541

Big array
Time to compare without str conversion: 5.44309616089e-06 s. With str conversion: 0.000870866775513 s
159.99474375821288

String
Time to compare without str conversion: 5.89370727539e-07 s. With str conversion: 8.30173492432e-07 s
Ratio of time with/without string conversion: 1.40857605178
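
One thing to keep in mind when reading these timings: str() applied to a whole array builds a single string, so the comparison returns one bool rather than one value per element. A small sketch of the difference, using the same array a as the benchmark (the element-wise variant is my own illustration, not part of the answer):

```python
import numpy as np

a = np.linspace(0, 5, 10)

# str(a) is the printed representation of the whole array, so this is a single bool.
whole = str(a) == "Peter"

# Converting each element separately gives an element-wise boolean mask instead.
as_text = np.array([str(v) for v in a])
mask = as_text == "0.0"

print(type(whole), mask.sum())
```

So the str() trick avoids the warning, but it is only usable where a single yes/no answer is what you want.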

回答 7

就我而言,发出警告的原因仅仅是布尔索引的常规类型-因为该系列只有np.nan。示范(熊猫1.0.3):

>>> import pandas as pd
>>> import numpy as np
>>> pd.Series([np.nan, 'Hi']) == 'Hi'
0    False
1     True
>>> pd.Series([np.nan, np.nan]) == 'Hi'
~/anaconda3/envs/ms3/lib/python3.7/site-packages/pandas/core/ops/array_ops.py:255: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  res_values = method(rvalues)
0    False
1    False

我认为对于pandas 1.0,他们确实希望您使用'string'允许pd.NA值的新数据类型:

>>> pd.Series([pd.NA, pd.NA]) == 'Hi'
0    False
1    False
>>> pd.Series([np.nan, np.nan], dtype='string') == 'Hi'
0    <NA>
1    <NA>
>>> (pd.Series([np.nan, np.nan], dtype='string') == 'Hi').fillna(False)
0    False
1    False

不喜欢他们在何时开始使用布尔索引等日常功能。

In my case, the warning came from an ordinary comparison used for boolean indexing, because the Series contained only np.nan. Demonstration (pandas 1.0.3):

>>> import pandas as pd
>>> import numpy as np
>>> pd.Series([np.nan, 'Hi']) == 'Hi'
0    False
1     True
>>> pd.Series([np.nan, np.nan]) == 'Hi'
~/anaconda3/envs/ms3/lib/python3.7/site-packages/pandas/core/ops/array_ops.py:255: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  res_values = method(rvalues)
0    False
1    False

I think with pandas 1.0 they really want you to use the new 'string' datatype which allows for pd.NA values:

>>> pd.Series([pd.NA, pd.NA]) == 'Hi'
0    False
1    False
>>> pd.Series([np.nan, np.nan], dtype='string') == 'Hi'
0    <NA>
1    <NA>
>>> (pd.Series([np.nan, np.nan], dtype='string') == 'Hi').fillna(False)
0    False
1    False

I’m not fond of them tinkering with everyday functionality such as boolean indexing.


回答 8

我收到此警告是因为我认为我的列包含空字符串,但是在检查时,它包含了np.nan!

if df['column'] == '':

将我的列更改为空字符串有帮助:)

I got this warning because I thought my column contained null strings, but on checking, it contained np.nan!

if df['column'] == '':

Changing my column to empty strings helped :)
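
A minimal sketch of both ways out, assuming a hypothetical column that holds NaN where empty strings were expected:

```python
import numpy as np
import pandas as pd

# Hypothetical column that looks "empty" but actually holds NaN.
df = pd.DataFrame({"column": [np.nan, "a", np.nan]})

# Detect the missing values directly ...
missing = df["column"].isna()

# ... or replace them with empty strings first, then compare with ''.
is_empty = df["column"].fillna("") == ""

print(missing.tolist(), is_empty.tolist())
```

Both masks mark the same rows; isna() states the intent more directly, while fillna('') matches the fix described above.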


回答 9

我已经比较了几种可能的方法,包括熊猫,几种numpy方法和列表理解方法。

首先,让我们从基线开始:

>>> import numpy as np
>>> import operator
>>> import pandas as pd

>>> x = [1, 2, 1, 2]
>>> %time count = np.sum(np.equal(1, x))
>>> print("Count {} using numpy equal with ints".format(count))
CPU times: user 52 µs, sys: 0 ns, total: 52 µs
Wall time: 56 µs
Count 2 using numpy equal with ints

因此,我们的基准是该计数应该正确2,并且我们应该大约50 us

现在,我们尝试使用朴素的方法:

>>> x = ['s', 'b', 's', 'b']
>>> %time count = np.sum(np.equal('s', x))
>>> print("Count {} using numpy equal".format(count))
CPU times: user 145 µs, sys: 24 µs, total: 169 µs
Wall time: 158 µs
Count NotImplemented using numpy equal
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  """Entry point for launching an IPython kernel.

在这里,我们得到了错误的答案(NotImplemented != 2),这花了我们很长时间,并且引发了警告。

因此,我们将尝试另一种幼稚的方法:

>>> %time count = np.sum(x == 's')
>>> print("Count {} using ==".format(count))
CPU times: user 46 µs, sys: 1 µs, total: 47 µs
Wall time: 50.1 µs
Count 0 using ==

同样,错误答案(0 != 2)。这更加隐蔽,因为没有后续警告(0可以像一样传递2)。

现在,让我们尝试一个列表理解:

>>> %time count = np.sum([operator.eq(_x, 's') for _x in x])
>>> print("Count {} using list comprehension".format(count))
CPU times: user 55 µs, sys: 1 µs, total: 56 µs
Wall time: 60.3 µs
Count 2 using list comprehension

我们在这里得到正确的答案,而且速度很快!

另一种可能性pandas

>>> y = pd.Series(x)
>>> %time count = np.sum(y == 's')
>>> print("Count {} using pandas ==".format(count))
CPU times: user 453 µs, sys: 31 µs, total: 484 µs
Wall time: 463 µs
Count 2 using pandas ==

慢,但是正确!

最后,我将使用的选项是:将numpy数组转换为object类型:

>>> x = np.array(['s', 'b', 's', 'b']).astype(object)
>>> %time count = np.sum(np.equal('s', x))
>>> print("Count {} using numpy equal".format(count))
CPU times: user 50 µs, sys: 1 µs, total: 51 µs
Wall time: 55.1 µs
Count 2 using numpy equal

快速正确!

I’ve compared a few of the methods possible for doing this, including pandas, several numpy methods, and a list comprehension method.

First, let’s start with a baseline:

>>> import numpy as np
>>> import operator
>>> import pandas as pd

>>> x = [1, 2, 1, 2]
>>> %time count = np.sum(np.equal(1, x))
>>> print("Count {} using numpy equal with ints".format(count))
CPU times: user 52 µs, sys: 0 ns, total: 52 µs
Wall time: 56 µs
Count 2 using numpy equal with ints

So our baseline is that the count should be 2, and the computation should take about 50 µs.

Now, we try the naive method:

>>> x = ['s', 'b', 's', 'b']
>>> %time count = np.sum(np.equal('s', x))
>>> print("Count {} using numpy equal".format(count))
CPU times: user 145 µs, sys: 24 µs, total: 169 µs
Wall time: 158 µs
Count NotImplemented using numpy equal
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/ipykernel_launcher.py:1: FutureWarning: elementwise comparison failed; returning scalar instead, but in the future will perform elementwise comparison
  """Entry point for launching an IPython kernel.

And here, we get the wrong answer (NotImplemented != 2), it takes us a long time, and it throws the warning.

So we’ll try another naive method:

>>> %time count = np.sum(x == 's')
>>> print("Count {} using ==".format(count))
CPU times: user 46 µs, sys: 1 µs, total: 47 µs
Wall time: 50.1 µs
Count 0 using ==

Again, the wrong answer (0 != 2). This is even more insidious because there are no subsequent warnings (0 can be passed around just like 2).

Now, let’s try a list comprehension:

>>> %time count = np.sum([operator.eq(_x, 's') for _x in x])
>>> print("Count {} using list comprehension".format(count))
CPU times: user 55 µs, sys: 1 µs, total: 56 µs
Wall time: 60.3 µs
Count 2 using list comprehension

We get the right answer here, and it’s pretty fast!

Another possibility, pandas:

>>> y = pd.Series(x)
>>> %time count = np.sum(y == 's')
>>> print("Count {} using pandas ==".format(count))
CPU times: user 453 µs, sys: 31 µs, total: 484 µs
Wall time: 463 µs
Count 2 using pandas ==

Slow, but correct!

And finally, the option I’m going to use: casting the numpy array to the object type:

>>> x = np.array(['s', 'b', 's', 'b']).astype(object)
>>> %time count = np.sum(np.equal('s', x))
>>> print("Count {} using numpy equal".format(count))
CPU times: user 50 µs, sys: 1 µs, total: 51 µs
Wall time: 55.1 µs
Count 2 using numpy equal

Fast and correct!
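
To make the silent failure of x == 's' concrete, here is a short sketch (my own addition, without the %time magic) showing that the plain list is the culprit and that converting it to an array first gives the element-wise result:

```python
import numpy as np

x = ['s', 'b', 's', 'b']

# A plain list compared with a string is a single False, hence the count of 0 above.
list_result = (x == 's')

# Converting to an array first makes == work element by element.
count = int(np.sum(np.array(x) == 's'))
print(list_result, count)
```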


回答 10

我有导致错误的此代码:

for t in dfObj['time']:
  if type(t) == str:
    the_date = dateutil.parser.parse(t)
    loc_dt_int = int(the_date.timestamp())
    dfObj.loc[t == dfObj.time, 'time'] = loc_dt_int

我将其更改为:

for t in dfObj['time']:
  try:
    the_date = dateutil.parser.parse(t)
    loc_dt_int = int(the_date.timestamp())
    dfObj.loc[t == dfObj.time, 'time'] = loc_dt_int
  except Exception as e:
    print(e)
    continue

为了避免比较,它会发出警告-如上所述。我只需要避免这种异常,因为dfObj.loc在for循环中,也许有一种方法可以告诉它不要检查已更改的行。

I had this code which was causing the error:

for t in dfObj['time']:
  if type(t) == str:
    the_date = dateutil.parser.parse(t)
    loc_dt_int = int(the_date.timestamp())
    dfObj.loc[t == dfObj.time, 'time'] = loc_dt_int

I changed it to this:

for t in dfObj['time']:
  try:
    the_date = dateutil.parser.parse(t)
    loc_dt_int = int(the_date.timestamp())
    dfObj.loc[t == dfObj.time, 'time'] = loc_dt_int
  except Exception as e:
    print(e)
    continue

to avoid the comparison, which is throwing the warning – as stated above. I only had to avoid the exception because of dfObj.loc in the for loop, maybe there is a way to tell it not to check the rows it has already changed.
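
One possible way to avoid the repeated t == dfObj.time comparison altogether is to build a mask of the string rows once and convert only those. This is a sketch under assumptions: the data is made up, pd.to_datetime replaces dateutil.parser.parse, and utc=True treats the strings as UTC, which may differ from the local-time behaviour of the loop above.

```python
import pandas as pd

# Hypothetical column mixing date strings and already-converted integer timestamps.
dfObj = pd.DataFrame(
    {"time": ["2020-01-01 00:00:00", "2020-01-02 00:00:00", 1577836800]}
)

# Decide once, per element, which rows still hold strings.
is_text = dfObj["time"].map(lambda t: isinstance(t, str))

# Parse only those rows (assumption: the strings are UTC) and store epoch seconds.
parsed = pd.to_datetime(dfObj.loc[is_text, "time"], utc=True)
dfObj.loc[is_text, "time"] = parsed.map(lambda ts: int(ts.timestamp()))

print(dfObj["time"].tolist())
```

Since no string is ever compared against the whole column, rows that were already converted are never looked at again.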


Histogram Matplotlib

Question: Histogram Matplotlib

所以我有一个小问题。我有一个scipy数据集,该数据集已经是直方图格式,因此我具有了bin的中心以及每个bin的事件数。现在如何绘制直方图。我只是尝试做

bins, n=hist()

但这不是那样。有什么建议吗?

So I have a little problem. I have a data set in scipy that is already in histogram format, so I have the centers of the bins and the number of events per bin. How can I now plot it as a histogram? I tried just doing

bins, n=hist()

but it didn’t like that. Any recommendations?


回答 0

import matplotlib.pyplot as plt
import numpy as np

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)
hist, bins = np.histogram(x, bins=50)
width = 0.7 * (bins[1] - bins[0])
center = (bins[:-1] + bins[1:]) / 2
plt.bar(center, hist, align='center', width=width)
plt.show()

面向对象的界面也很简单:

fig, ax = plt.subplots()
ax.bar(center, hist, align='center', width=width)
fig.savefig("1.png")

如果您使用的是自定义(非恒定)箱,则可以使用传递计算宽度np.diff,将宽度传递到,ax.bar并使用ax.set_xticks来标记箱边缘:

import matplotlib.pyplot as plt
import numpy as np

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)
bins = [0, 40, 60, 75, 90, 110, 125, 140, 160, 200]
hist, bins = np.histogram(x, bins=bins)
width = np.diff(bins)
center = (bins[:-1] + bins[1:]) / 2

fig, ax = plt.subplots(figsize=(8,3))
ax.bar(center, hist, align='center', width=width)
ax.set_xticks(bins)
fig.savefig("/tmp/out.png")

plt.show()

import matplotlib.pyplot as plt
import numpy as np

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)
hist, bins = np.histogram(x, bins=50)
width = 0.7 * (bins[1] - bins[0])
center = (bins[:-1] + bins[1:]) / 2
plt.bar(center, hist, align='center', width=width)
plt.show()

The object-oriented interface is also straightforward:

fig, ax = plt.subplots()
ax.bar(center, hist, align='center', width=width)
fig.savefig("1.png")

If you are using custom (non-constant) bins, you can compute the widths using np.diff, pass the widths to ax.bar and use ax.set_xticks to label the bin edges:

import matplotlib.pyplot as plt
import numpy as np

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)
bins = [0, 40, 60, 75, 90, 110, 125, 140, 160, 200]
hist, bins = np.histogram(x, bins=bins)
width = np.diff(bins)
center = (bins[:-1] + bins[1:]) / 2

fig, ax = plt.subplots(figsize=(8,3))
ax.bar(center, hist, align='center', width=width)
ax.set_xticks(bins)
fig.savefig("/tmp/out.png")

plt.show()
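
For the situation described in the question, where the bin centers and counts already exist, the np.histogram step can be skipped. A minimal sketch with made-up numbers, assuming evenly spaced centers (the Agg backend is chosen only so that it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical pre-binned data: evenly spaced bin centers and the count in each bin.
centers = np.array([0.5, 1.5, 2.5, 3.5, 4.5])
counts = np.array([3, 7, 12, 6, 2])

# With evenly spaced centers, the bin width is the spacing between neighbours.
width = centers[1] - centers[0]

fig, ax = plt.subplots()
bars = ax.bar(centers, counts, align='center', width=width)
```

For unevenly spaced bins you would need the bin edges (or widths) as well, since they cannot be recovered from the centers alone without further assumptions.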


回答 1

如果您不想要条形图,可以这样绘制:

import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

bins, edges = np.histogram(x, 50, density=True)  # normed was removed from NumPy; density replaces it
left,right = edges[:-1],edges[1:]
X = np.array([left,right]).T.flatten()
Y = np.array([bins,bins]).T.flatten()

plt.plot(X,Y)
plt.show()

If you don’t want bars you can plot it like this:

import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

bins, edges = np.histogram(x, 50, density=True)  # normed was removed from NumPy; density replaces it
left,right = edges[:-1],edges[1:]
X = np.array([left,right]).T.flatten()
Y = np.array([bins,bins]).T.flatten()

plt.plot(X,Y)
plt.show()
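
The outline coordinates can also be built with np.repeat, which some may find easier to read than stacking and flattening. This is a sketch of an equivalent construction, not the answer's own code; the seeded generator is only there to make it reproducible.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
x = 100 + 15 * rng.standard_normal(10000)

heights, edges = np.histogram(x, 50, density=True)

# Each edge appears twice (end of one bin, start of the next), except the two outer ones.
X = np.repeat(edges, 2)[1:-1]
# Each bin height appears twice (at the left and right edge of its bin).
Y = np.repeat(heights, 2)

# With density=True the area under the outline is 1.
area = np.sum(heights * np.diff(edges))
print(X.shape, Y.shape, area)
```

Passing X and Y to plt.plot draws the same stepped outline as above.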


回答 2

我知道这不能回答您的问题,但是当我搜索matplotlib直方图解决方案时,我总是最终会在此页面上结束,因为histogram_demo从matplotlib示例库页面中删除了简单方法。

这是一个解决方案,不需要numpy导入。我只导入numpy来生成x要绘制的数据。它依赖于函数hist而不是@unutbu bar答案中的函数。

import numpy as np
mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

import matplotlib.pyplot as plt
plt.hist(x, bins=50)
plt.savefig('hist.png')

还要查看matplotlib画廊matplotlib示例

I know this does not answer your question, but I always end up on this page, when I search for the matplotlib solution to histograms, because the simple histogram_demo was removed from the matplotlib example gallery page.

Here is a solution, which doesn’t require numpy to be imported. I only import numpy to generate the data x to be plotted. It relies on the function hist instead of the function bar as in the answer by @unutbu.

import numpy as np
mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

import matplotlib.pyplot as plt
plt.hist(x, bins=50)
plt.savefig('hist.png')

Also check out the matplotlib gallery and the matplotlib examples.


回答 3

如果您愿意使用pandas

pandas.DataFrame({'x':hist[1][1:],'y':hist[0]}).plot(x='x',kind='bar')

If you’re willing to use pandas:

pandas.DataFrame({'x':hist[1][1:],'y':hist[0]}).plot(x='x',kind='bar')

回答 4

我认为这可能对某人有用。

令我烦恼的是Numpy的直方图函数(尽管我很高兴有这样做的理由),它返回了每个bin的边缘,而不是bin的值。尽管这对于浮点数有意义,浮点数可以位于一个区间内(即,中心值没有太大意义),但在处理离散值或整数(0、1、2等)时,这不是理想的输出。特别是,从np.histogram返回的bin的长度不等于计数/密度的长度。

为了解决这个问题,我使用了np.digitize来量化输入,并返回离散数量的bin,以及每个bin的计数分数。您可以轻松地进行编辑以获得计数的整数。

def compute_PMF(data):
    import numpy as np
    from collections import Counter
    _, bins = np.histogram(data, bins='auto', range=(data.min(), data.max()), density=False)
    h = Counter(np.digitize(data,bins) - 1)
    weights = np.asarray(list(h.values())) 
    weights = weights / weights.sum()
    values = np.asarray(list(h.keys()))
    return weights, values

参考:

[1] https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html

[2] https://docs.scipy.org/doc/numpy/reference/generated/numpy.digitize.html

I think this might be useful for someone.

Numpy’s histogram function, to my annoyance (although, I appreciate there is a good reason for it), returns back the edges of each bin, rather than the value of the bin. While, this makes sense for floating-point numbers, which can lie within an interval (i.e. the center value is not super meaningful), this is not the desired output when dealing with discrete values or integers (0, 1, 2, etc). In particular, the length of bins returned from np.histogram is not equal to the length of the counts / density.

To get around this, I used np.digitize to quantize the input, and return a discrete number of bins, along with fraction of counts for each bin. You could easily edit to get the integer number of counts.

def compute_PMF(data):
    import numpy as np
    from collections import Counter
    _, bins = np.histogram(data, bins='auto', range=(data.min(), data.max()), density=False)
    h = Counter(np.digitize(data,bins) - 1)
    weights = np.asarray(list(h.values())) 
    weights = weights / weights.sum()
    values = np.asarray(list(h.keys()))
    return weights, values

Refs:

[1] https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html

[2] https://docs.scipy.org/doc/numpy/reference/generated/numpy.digitize.html
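
If the data really consists of a small set of discrete values, np.unique with return_counts=True may be a simpler route to the same kind of result. A sketch under that assumption; compute_pmf_discrete is a hypothetical helper name, and note that it returns (values, weights) rather than (weights, values):

```python
import numpy as np

def compute_pmf_discrete(data):
    """Return (values, weights) for discrete data; hypothetical helper name."""
    # np.unique returns the sorted distinct values and how often each occurs.
    values, counts = np.unique(np.asarray(data), return_counts=True)
    weights = counts / counts.sum()
    return values, weights

values, weights = compute_pmf_discrete([0, 1, 1, 2, 2, 2, 2, 5])
print(values, weights)
```

Unlike the binning approach, this treats every distinct value as its own bin, so it only makes sense when the number of distinct values is small.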


matplotlib/seaborn: first and last row cut in half of heatmap plot

Question: matplotlib/seaborn: first and last row cut in half of heatmap plot

用seaborn(和matplotlib关联矩阵)绘制热图时,第一行和最后一行被切成两半。当我运行这个在网上找到的最小代码示例时,也会发生这种情况。

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = pd.read_csv('https://raw.githubusercontent.com/resbaz/r-novice-gapminder-files/master/data/gapminder-FiveYearData.csv')
plt.figure(figsize=(10,5))
sns.heatmap(data.corr())
plt.show()

y轴上的标签在正确的位置,但是行并不完全在此处。

几天前,它按预期工作。从那时起,我安装了texlive-xetex,因此我再次将其删除,但是并不能解决我的问题。

有什么想法我可能会错过吗?

When plotting heatmaps with seaborn (and correlation matrices with matplotlib), the first and the last row are cut in half. This also happens when I run this minimal code example, which I found online.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data = pd.read_csv('https://raw.githubusercontent.com/resbaz/r-novice-gapminder-files/master/data/gapminder-FiveYearData.csv')
plt.figure(figsize=(10,5))
sns.heatmap(data.corr())
plt.show()

The labels at the y axis are on the correct spot, but the rows aren’t completely there.

A few days ago, it worked as intended. Since then, I installed texlive-xetex, so I removed it again, but that didn’t solve my problem.

Any ideas what I could be missing?


回答 0

不幸的是,matplotlib 3.1.1 打破了海洋热图;并通常使用固定刻度的倒轴。
在当前的开发版本中已修复此问题。你可能因此

  • 恢复到matplotlib 3.1.0
  • 使用matplotlib 3.1.2或更高版本
  • 手动设置热图限制(ax.set_ylim(bottom, top) # set the ylim to bottom, top

Unfortunately matplotlib 3.1.1 broke seaborn heatmaps; and in general inverted axes with fixed ticks.
This is fixed in the current development version; you may hence

  • revert to matplotlib 3.1.0
  • use matplotlib 3.1.2 or higher
  • set the heatmap limits manually (ax.set_ylim(bottom, top) # set the ylim to bottom, top)
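
The third option can be sketched as follows. To keep the example free of a seaborn dependency, a plain matplotlib pcolormesh stands in for sns.heatmap, which is an assumption on my part; with a real heatmap you would call set_ylim on the Axes that sns.heatmap returns. The idea is that n rows need limits spanning the full range from n down to 0.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(4, 4)  # hypothetical 4x4 matrix standing in for a correlation matrix
n_rows = data.shape[0]

fig, ax = plt.subplots()
ax.pcolormesh(data)

# Inverted y axis that covers every row completely: bottom = n_rows, top = 0.
ax.set_ylim(n_rows, 0)
print(ax.get_ylim())
```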

回答 1

这是3.1.0和3.1.1之间的matplotlib回归中的错误,您可以通过以下方法更正此错误:

import seaborn as sns
df_corr = someDataFrame.corr()
ax = sns.heatmap(df_corr, annot=True) #notation: "annot" not "annote"
bottom, top = ax.get_ylim()
ax.set_ylim(bottom + 0.5, top - 0.5)

It’s a regression in matplotlib between 3.1.0 and 3.1.1. You can correct it with:

import seaborn as sns
df_corr = someDataFrame.corr()
ax = sns.heatmap(df_corr, annot=True) #notation: "annot" not "annote"
bottom, top = ax.get_ylim()
ax.set_ylim(bottom + 0.5, top - 0.5)

回答 2

已使用上述方法修复并手动设置了热图限制。

第一

ax = sns.heatmap(...

检查当前轴

ax.get_ylim()
(5.5, 0.5)

固定于

ax.set_ylim(6.0, 0)

Fixed using the above and setting the heatmap limits manually.

First

ax = sns.heatmap(...

checked the current axes with

ax.get_ylim()
(5.5, 0.5)

Fixed with

ax.set_ylim(6.0, 0)

回答 3

我通过在代码中添加以下行来解决了这一问题matplotlib==3.1.1

ax.set_ylim(sorted(ax.get_xlim(), reverse=True))

注意 起作用的唯一原因是因为x轴未更改,因此使用未来的mpl版本需要您自担风险

I solved it by adding this line in my code, with matplotlib==3.1.1:

ax.set_ylim(sorted(ax.get_xlim(), reverse=True))

NB. The only reason this works is because the x-axis isn’t changed, so use at your own risk with future mpl versions


回答 4

matplotlib 3.1.2已发布-可通过conda-forge在Anaconda云中找到,但我无法通过conda install进行安装。手动替代方法可行:从github下载matplotlib 3.1.2并通过pip安装

 % curl https://codeload.github.com/matplotlib/matplotlib/tar.gz/v3.1.2 --output matplotlib-3.1.2.tar.gz
 % pip install matplotlib-3.1.2.tar.gz

matplotlib 3.1.2 is out – It is available in the Anaconda cloud via conda-forge but I was not able to install it via conda install. The manual alternative worked: Download matplotlib 3.1.2 from github and install via pip

 % curl https://codeload.github.com/matplotlib/matplotlib/tar.gz/v3.1.2 --output matplotlib-3.1.2.tar.gz
 % pip install matplotlib-3.1.2.tar.gz

回答 5

重要性提示所建议的那样,它发生在matplotlib版本3.1.1中

以下解决了我的问题

pip install matplotlib==3.1.0

It happens with matplotlib version 3.1.1 as suggested by importanceofbeingernest

The following solved my problem:

pip install matplotlib==3.1.0


回答 6

rustyDev关于conda-forge是正确的,但是我不需要从github下载进行手动pip安装。对我来说,在Windows上,它可以直接工作。而且情节都很好。

https://anaconda.org/conda-forge/matplotlib

conda install -c conda-forge matplotlib

可选点,答案不需要:

之后,我尝试了其他步骤,但没有必要:在conda提示符下:conda search matplotlib –info未显示新版本信息,最新信息为3.1.1。因此,我尝试使用pip进行尝试,pip install matplotlib==3.1.2但是pip说“要求已经满足”

然后根据medium.com/@rakshithvasudev/…获取版本,python - import matplotlib - matplotlib.__version__表明3.1.2已成功安装。

顺便说一句,将Spyder更新到v4.0.0后,我直接遇到了此错误。误差在混淆矩阵图中。几个月前已经提到过这一点。stackoverflow.com/questions/57225685/…已经与这个棘手的问题相关联。

rustyDev is right about conda-forge, but I did not need to do a manual pip install from a github download. For me, on Windows, it worked directly. And the plots are all nice again.

https://anaconda.org/conda-forge/matplotlib

conda install -c conda-forge matplotlib

optional points, not needed for the answer:

Afterwards, I tried other steps, but they are not needed. In the conda prompt, conda search matplotlib --info showed no new version info; the most recent info was for 3.1.1. So I tried pip with pip install matplotlib==3.1.2, but pip said “Requirement already satisfied”.

Then, checking the version as described at medium.com/@rakshithvasudev/… (start python, import matplotlib, then evaluate matplotlib.__version__) showed that 3.1.2 had been installed successfully.
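
The version check can also be run as a short script. The sketch below only compares against 3.1.1, the release reported in this thread as cutting off heatmap rows; it makes no claim about other releases.

```python
import matplotlib

version = matplotlib.__version__

# 3.1.1 is the release reported in this thread as cutting off heatmap rows.
affected = (version == "3.1.1")
print(version, "affected" if affected else "not the release reported here")
```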

Btw, I had this error directly after updating Spyder to v4.0.0. The error was in a plot of a confusion matrix. This was mentioned already some months ago. stackoverflow.com/questions/57225685/… which is already linked to this seaborn question.


回答 7

康达安装matplotlib = 3.1.0

这对我有用,并将matplotlib从3.1.1降级到3.1.0,并且热图开始正确运行

conda install matplotlib=3.1.0

This worked for me and downgraded matplotlib from 3.1.1 to 3.1.0 and the heatmaps started to behave correctly


回答 8

我用以下代码解决了这个问题:

I solved this problem with the following code:


plot with custom text for x axis points

Question: plot with custom text for x axis points

我正在使用matplotlib和python绘制图,如下面的示例代码。

x = array([0,1,2,3])
y = array([20,21,22,23])
plot(x,y)
show()

因为它是上面x轴上的代码,所以我将看到绘制的值,0.0, 0.5, 1.0, 1.5即与参考x值相同的值。

无论如何,有没有将x的每个点映射到不同的字符串?因此,例如,我希望x轴显示月份名称(字符串Jun, July,...)或其他字符串,如人员姓名("John", "Arnold", ...)或时钟时间("12:20", "12:21", "12:22", ..)。

您知道我能做什么或要看什么功能吗?
对我而言,这有matplotlib.ticker帮助吗?

I am drawing a plot using matplotlib and python like the sample code below.

x = array([0,1,2,3])
y = array([20,21,22,23])
plot(x,y)
show()

As the code stands, the x axis shows the tick values 0.0, 0.5, 1.0, 1.5, i.e. the numeric values of my reference x data.

Is there any way to map each point of x to a different string? For example, I want the x axis to show month names (strings such as Jun, July, ...), or other strings such as people’s names ("John", "Arnold", ...) or clock times ("12:20", "12:21", "12:22", ...).

Do you know what I can do or what function to have a look at?
Could matplotlib.ticker be of help for my purpose?


回答 0

您可以使用pyplot.xticks手动设置xticks(和yticks):

import matplotlib.pyplot as plt
import numpy as np

x = np.array([0,1,2,3])
y = np.array([20,21,22,23])
my_xticks = ['John','Arnold','Mavis','Matt']
plt.xticks(x, my_xticks)
plt.plot(x, y)
plt.show()

You can manually set xticks (and yticks) using pyplot.xticks:

import matplotlib.pyplot as plt
import numpy as np

x = np.array([0,1,2,3])
y = np.array([20,21,22,23])
my_xticks = ['John','Arnold','Mavis','Matt']
plt.xticks(x, my_xticks)
plt.plot(x, y)
plt.show()


回答 1

这对我有用。X轴上的每个月

str_month_list = ['January','February','March','April','May','June','July','August','September','October','November','December']
ax.set_xticks(range(0,12))
ax.set_xticklabels(str_month_list)

This worked for me. Each month on X axis

str_month_list = ['January','February','March','April','May','June','July','August','September','October','November','December']
ax.set_xticks(range(0,12))
ax.set_xticklabels(str_month_list)
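
The snippet above assumes an existing Axes object named ax. A complete sketch of the same object-oriented approach, using the clock-time labels from the question as example data (the Agg backend is chosen only so that it runs without a display):

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

x = [0, 1, 2, 3]
y = [20, 21, 22, 23]
clock_times = ["12:20", "12:21", "12:22", "12:23"]

fig, ax = plt.subplots()
ax.plot(x, y)

# Fix the tick positions first, then attach one label per position.
ax.set_xticks(x)
labels = ax.set_xticklabels(clock_times)
```

Setting the tick positions before the labels matters: the labels are assigned in order to whatever ticks exist at that moment.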

pip install Matplotlib error with virtualenv

Question: pip install Matplotlib error with virtualenv

我正在尝试在新的virtualenv中安装matplotlib。

当我做:

pip install matplotlib

要么

pip install http://sourceforge.net/projects/matplotlib/files/matplotlib/matplotlib-1.1.0/matplotlib-1.1.0.tar.gz

我收到此错误:

building 'matplotlib._png' extension

gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -fPIC -  DPY_ARRAY_UNIQUE_SYMBOL=MPL_ARRAY_API -DPYCXX_ISO_CPP_LIB=1 -I/usr/local/include -I/usr/include -I. -I/home/sam/django-projects/datazone/local/lib/python2.7/site-packages/numpy/core/include -I. -I/usr/include/python2.7 -c src/_png.cpp -o build/temp.linux-x86_64-2.7/src/_png.o

src/_png.cpp:10:20: fatal error: png.h: No such file or directory

compilation terminated.

error: command 'gcc' failed with exit status 1

有人知道发生了什么吗?

任何帮助,不胜感激。

I am trying to install matplotlib in a new virtualenv.

When I do:

pip install matplotlib

or

pip install http://sourceforge.net/projects/matplotlib/files/matplotlib/matplotlib-1.1.0/matplotlib-1.1.0.tar.gz

I get this error:

building 'matplotlib._png' extension

gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -fPIC -  DPY_ARRAY_UNIQUE_SYMBOL=MPL_ARRAY_API -DPYCXX_ISO_CPP_LIB=1 -I/usr/local/include -I/usr/include -I. -I/home/sam/django-projects/datazone/local/lib/python2.7/site-packages/numpy/core/include -I. -I/usr/include/python2.7 -c src/_png.cpp -o build/temp.linux-x86_64-2.7/src/_png.o

src/_png.cpp:10:20: fatal error: png.h: No such file or directory

compilation terminated.

error: command 'gcc' failed with exit status 1

Anyone have an idea what is going on?

Any help much appreciated.


回答 0

构建Matplotlib需要libpngfreetype以及)不是python库,因此pip无法处理(或freetype)安装它。

您需要按照libpng-develfreetype-devel(或与您的操作系统等效的其他方式)安装一些东西。

请参阅matplotlib 的构建要求/说明

Building Matplotlib requires libpng (and freetype, as well) which isn’t a python library, so pip doesn’t handle installing it (or freetype).

You’ll need to install something along the lines of libpng-devel and freetype-devel (or whatever the equivalent is for your OS).

See the building requirements/instructions for matplotlib.


回答 1

要生成png格式的图形,您需要安装以下依赖包

sudo apt-get install libpng-dev
sudo apt-get install libfreetype6-dev

Ubuntu https://apps.ubuntu.com/cat/applications/libpng12-0/ 或使用以下命令

sudo apt-get install libpng12-0

To generate graphs in PNG format, you need to install the following dependencies:

sudo apt-get install libpng-dev
sudo apt-get install libfreetype6-dev

On Ubuntu, see https://apps.ubuntu.com/cat/applications/libpng12-0/ or use the following command:

sudo apt-get install libpng12-0

回答 2

由于我两次都在这个问题上苦苦挣扎(即使在全新安装kubuntu 15.04之后),而安装freetype也解决不了任何问题,因此我进行了进一步调查。

解决方案:
来自github问题:

仅当未安装pkg-config时,才会发生此错误;
现在,简单的操作
sudo apt-get install pkg-config
将支持包含路径。

安装完成后顺利进行。

As I have struggled with this issue twice (even after fresh kubuntu 15.04 install) and installing freetype did not solve anything, I investigated further.

The solution:
From github issue:

This bug only occurs if pkg-config is not installed;
a simple
sudo apt-get install pkg-config
will shore up the include paths for now.

After this, the installation proceeds smoothly.
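
Before retrying the build, it can help to check whether pkg-config is on the PATH and whether it can see the two libraries. A rough sketch: build_deps_visible is a hypothetical helper, and the package names libpng and freetype2 are assumptions about how the libraries are registered with pkg-config on your system.

```python
import shutil
import subprocess

def build_deps_visible(packages=("libpng", "freetype2")):
    """Return None if pkg-config is missing, else True/False for the given packages."""
    if shutil.which("pkg-config") is None:
        return None
    # 'pkg-config --exists' exits with status 0 only if every listed package is found.
    result = subprocess.run(["pkg-config", "--exists", *packages])
    return result.returncode == 0

status = build_deps_visible()
print(status)
```

A result of None points at the missing pkg-config described above, while False suggests that the development packages themselves are not installed.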


回答 3

作为补充,在Amazon EC2上,我需要做的是:

sudo yum install freetype-devel
sudo yum install libpng-devel
sudo pip install matplotlib

As a supplement: on Amazon EC2, what I needed to do was:

sudo yum install freetype-devel
sudo yum install libpng-devel
sudo pip install matplotlib

回答 4

在OSX上,我可以通过以下方式安装matplotlib:

pip install matplotlib==1.4.0

只有在我跑完之后:

brew install freetype

On OSX I was able to get matplotlib to install via:

pip install matplotlib==1.4.0

only after I ran:

brew install freetype

Answer 5

Under Windows, this worked for me:

python -m pip install -U pip setuptools
python -m pip install matplotlib

(from https://matplotlib.org/users/installing.html)


Answer 6

sudo apt-get install libpng-dev libjpeg8-dev libfreetype6-dev

worked for me on Ubuntu 14.04.


Answer 7

None of the above answers worked for me on Mint, so I ran:

sudo apt-get install build-essential g++

Answer 8

If you are on Mac OS X, try:

xcode-select --install

This lets subprocess32 compile; its failed build was the reason for the failure.


Answer 9

To keep the number of required packages to a minimum, you just need:

apt-get install -y \
    libfreetype6-dev \
    libxft-dev && \
    pip install matplotlib

and you will get the following packages installed locally:

Collecting matplotlib
  Downloading matplotlib-2.2.0-cp35-cp35m-manylinux1_x86_64.whl (12.5MB)
Collecting pytz (from matplotlib)
  Downloading pytz-2018.3-py2.py3-none-any.whl (509kB)
Collecting python-dateutil>=2.1 (from matplotlib)
  Downloading python_dateutil-2.6.1-py2.py3-none-any.whl (194kB)
Collecting pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 (from matplotlib)
  Downloading pyparsing-2.2.0-py2.py3-none-any.whl (56kB)
Requirement already satisfied: six>=1.10 in /opt/conda/envs/pytorch-py35/lib/python3.5/site-packages (from matplotlib)
Collecting cycler>=0.10 (from matplotlib)
  Downloading cycler-0.10.0-py2.py3-none-any.whl
Collecting kiwisolver>=1.0.1 (from matplotlib)
  Downloading kiwisolver-1.0.1-cp35-cp35m-manylinux1_x86_64.whl (949kB)
Requirement already satisfied: numpy>=1.7.1 in /opt/conda/envs/pytorch-py35/lib/python3.5/site-packages (from matplotlib)
Requirement already satisfied: setuptools in /opt/conda/envs/pytorch-py35/lib/python3.5/site-packages/setuptools-27.2.0-py3.5.egg (from kiwisolver>=1.0.1->matplotlib)
Installing collected packages: pytz, python-dateutil, pyparsing, cycler, kiwisolver, matplotlib
Successfully installed cycler-0.10.0 kiwisolver-1.0.1 matplotlib-2.2.0 pyparsing-2.2.0 python-dateutil-2.6.1 pytz-2018.3
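
To compare an environment against the log above after the install finishes, the installed versions can be read back from Python. This is a rough sketch rather than part of the original answer: `installed_versions` and `PACKAGES` are illustrative names, the package list is simply copied from the pip output above, and `importlib.metadata` is assumed to be available (Python 3.8 or newer).

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names taken from the pip log above.
PACKAGES = (
    "matplotlib", "pytz", "python-dateutil", "pyparsing",
    "cycler", "kiwisolver", "numpy", "six",
)


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None  # not installed in this environment
    return found


if __name__ == "__main__":
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Newer matplotlib releases may depend on a different set of packages, so a missing entry here is not necessarily a problem.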

Answer 10

Another option is to install Anaconda, which comes with packages such as Matplotlib, NumPy, and pandas.

https://anaconda.org