Tag archive: histogram

How does numpy.histogram() work?

Question: How does numpy.histogram() work?

While reading up on numpy, I encountered the function numpy.histogram().

What is it for and how does it work? In the docs they mention bins: What are they?

Some googling led me to the definition of Histograms in general. I get that. But unfortunately I can’t link this knowledge to the examples given in the docs.


Answer 0

A bin is a range that represents the width of a single bar of the histogram along the X-axis. You could also call this the interval. (Wikipedia defines them more formally as “disjoint categories”.)

The Numpy histogram function doesn’t draw the histogram, but it computes the occurrences of input data that fall within each bin, which in turn determines the area (not necessarily the height if the bins aren’t of equal width) of each bar.

In this example:

 np.histogram([1, 2, 1], bins=[0, 1, 2, 3])

There are 3 bins, for values ranging from 0 to 1 (excl. 1), 1 to 2 (excl. 2) and 2 to 3 (incl. 3), respectively. Numpy defines these bins by a list of delimiters ([0, 1, 2, 3] in this example), although it also returns the bins in the results, since it can choose them automatically from the input if none are specified. If bins=5, for example, it will use 5 bins of equal width spread between the minimum input value and the maximum input value.

The input values are 1, 2 and 1. Therefore, bin “1 to 2” contains two occurrences (the two 1 values), and bin “2 to 3” contains one occurrence (the 2). These results are in the first item in the returned tuple: array([0, 2, 1]).

Since the bins here are of equal width, you can use the number of occurrences for the height of each bar. When drawn, you would have:

  • a bar of height 0 for range/bin [0,1] on the X-axis,
  • a bar of height 2 for range/bin [1,2],
  • a bar of height 1 for range/bin [2,3].

You can plot this directly with Matplotlib (its hist function also returns the bins and the values):

>>> import matplotlib.pyplot as plt
>>> plt.hist([1, 2, 1], bins=[0, 1, 2, 3])
(array([0, 2, 1]), array([0, 1, 2, 3]), <a list of 3 Patch objects>)
>>> plt.show()
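
The behaviour described above can be checked without plotting anything. The following is a minimal sketch using only np.histogram; the variable names and the second set of edges ([0, 1, 3]) are chosen here for illustration and do not come from the docs example.

```python
import numpy as np

data = [1, 2, 1]

# Explicit edges: three bins [0, 1), [1, 2) and [2, 3].
counts, edges = np.histogram(data, bins=[0, 1, 2, 3])
print(counts)  # [0 2 1]
print(edges)   # [0 1 2 3]

# An integer asks for that many equal-width bins between min and max.
auto_counts, auto_edges = np.histogram(data, bins=5)
print(auto_counts)  # [2 0 0 0 1]

# With unequal widths and density=True, the area of each bar
# (height * width), not its height, reflects the share of the data.
density, wide_edges = np.histogram(data, bins=[0, 1, 3], density=True)
areas = density * np.diff(wide_edges)
```

Here areas sums to 1, while the heights in density do not.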


Answer 1

import numpy as np    
hist, bin_edges = np.histogram([1, 1, 2, 2, 2, 2, 3], bins = range(5))

Below, hist indicates that there are 0 items in bin #0, 2 in bin #1, 4 in bin #2, and 1 in bin #3.

print(hist)
# [0 2 4 1]

bin_edges indicates that bin #0 is the interval [0,1), bin #1 is [1,2), …, bin #3 is [3,4).

print(bin_edges)
# [0 1 2 3 4]

Play with the above code, change the input to np.histogram and see how it works.


But a picture is worth a thousand words:

import matplotlib.pyplot as plt
plt.bar(bin_edges[:-1], hist, width=1, align='edge')  # start each bar at its bin's left edge
plt.xlim(min(bin_edges), max(bin_edges))
plt.show()   
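
One detail worth trying when playing with the code: every bin is half-open except the last one, which also includes its right edge. A small sketch with made-up input values:

```python
import numpy as np

edges = range(5)  # bins [0, 1), [1, 2), [2, 3) and [3, 4]
counts, _ = np.histogram([0, 1, 3.999, 4], bins=edges)
print(counts)  # [1 1 0 2]
```

Both 3.999 and 4 land in the last bin, because the final interval is closed on the right.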


Answer 2

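Another useful thing to do with numpy.histogram is to plot the output as the x and y coordinates on a line graph. For example:

import numpy as np
import matplotlib.pyplot as plt

arr = np.random.randint(1, 51, 500)  # 500 integers in 1..50
# edges 1, 2, ..., 51 give each integer value its own bin
y, x = np.histogram(arr, bins=np.arange(1, 52))
fig, ax = plt.subplots()
ax.plot(x[:-1], y)
plt.show()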

This can be a useful way to visualize histograms where you would like a higher level of granularity without bars everywhere. Very useful in image histograms for identifying extreme pixel values.
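
For non-negative integer data such as pixel values, np.bincount is a possible alternative that returns one count per integer value directly. This is a sketch of that swapped-in technique, not what the answer uses; the seeded generator is only there to make it repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)
arr = rng.integers(1, 51, size=500)  # integers 1..50 (upper bound is exclusive)

# per_value[k] is the number of times the value k occurs
per_value = np.bincount(arr, minlength=51)

# Same counts as a histogram with one bin per integer value
hist_counts, _ = np.histogram(arr, bins=np.arange(1, 52))
```

per_value[1:] can then be plotted against np.arange(1, 51) in the same way as y above.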


Histogram Matplotlib

Question: Histogram Matplotlib

So I have a little problem. I have a data set in scipy that is already in histogram format, so I have the centers of the bins and the number of events per bin. How can I now plot it as a histogram? I tried just doing

bins, n=hist()

but it didn’t like that. Any recommendations?


Answer 0

import matplotlib.pyplot as plt
import numpy as np

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)
hist, bins = np.histogram(x, bins=50)
width = 0.7 * (bins[1] - bins[0])
center = (bins[:-1] + bins[1:]) / 2
plt.bar(center, hist, align='center', width=width)
plt.show()

The object-oriented interface is also straightforward:

fig, ax = plt.subplots()
ax.bar(center, hist, align='center', width=width)
fig.savefig("1.png")

If you are using custom (non-constant) bins, you can compute the widths using np.diff, pass the widths to ax.bar and use ax.set_xticks to label the bin edges:

import matplotlib.pyplot as plt
import numpy as np

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)
bins = [0, 40, 60, 75, 90, 110, 125, 140, 160, 200]
hist, bins = np.histogram(x, bins=bins)
width = np.diff(bins)
center = (bins[:-1] + bins[1:]) / 2

fig, ax = plt.subplots(figsize=(8,3))
ax.bar(center, hist, align='center', width=width)
ax.set_xticks(bins)
fig.savefig("/tmp/out.png")

plt.show()
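
For the situation in the question, where only bin centers and counts are available, the width and edges can be recovered from the centers. A minimal sketch, assuming equally spaced centers; the centers and counts arrays below are made-up stand-ins for the pre-binned data.

```python
import numpy as np

# Hypothetical pre-binned data: bin centers and the count in each bin
centers = np.array([0.5, 1.5, 2.5, 3.5])
counts = np.array([3, 7, 5, 1])

# With equally spaced centers, the bin width is the spacing between them
width = centers[1] - centers[0]

# Left edge of every bin, plus the right edge of the last bin
edges = np.concatenate([centers - width / 2, [centers[-1] + width / 2]])
print(edges)  # [0. 1. 2. 3. 4.]
```

These values can be passed straight to plt.bar(centers, counts, align='center', width=width), with ax.set_xticks(edges) to label the edges as in the examples above.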


Answer 1

If you don’t want bars you can plot it like this:

import numpy as np
import matplotlib.pyplot as plt

mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

bins, edges = np.histogram(x, 50, density=True)  # normed was removed from NumPy; density replaces it
left,right = edges[:-1],edges[1:]
X = np.array([left,right]).T.flatten()
Y = np.array([bins,bins]).T.flatten()

plt.plot(X,Y)
plt.show()
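
The outline coordinates can also be built with np.repeat, which may be easier to read than stacking and flattening. This is a sketch of an equivalent construction; heights stands in for the array called bins above.

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
x = 100 + 15 * rng.standard_normal(10000)
heights, edges = np.histogram(x, 50, density=True)

# Each interior edge appears twice (end of one bin, start of the next),
# and each height appears twice (left and right end of its bin).
X = np.repeat(edges, 2)[1:-1]
Y = np.repeat(heights, 2)
```

plt.plot(X, Y) then draws the same outline. plt.hist(x, 50, histtype='step') is another way to get an unfilled histogram directly from the raw data.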


Answer 2

I know this does not answer your question, but I always end up on this page, when I search for the matplotlib solution to histograms, because the simple histogram_demo was removed from the matplotlib example gallery page.

Here is a solution, which doesn’t require numpy to be imported. I only import numpy to generate the data x to be plotted. It relies on the function hist instead of the function bar as in the answer by @unutbu.

import numpy as np
mu, sigma = 100, 15
x = mu + sigma * np.random.randn(10000)

import matplotlib.pyplot as plt
plt.hist(x, bins=50)
plt.savefig('hist.png')

Also check out the matplotlib gallery and the matplotlib examples.


Answer 3

If you’re willing to use pandas (hist here is the (counts, bin edges) tuple returned by np.histogram):

pandas.DataFrame({'x':hist[1][1:],'y':hist[0]}).plot(x='x',kind='bar')

Answer 4

I think this might be useful for someone.

Numpy’s histogram function, to my annoyance (although I appreciate there is a good reason for it), returns the edges of each bin rather than the value of the bin. While this makes sense for floating-point numbers, which can lie anywhere within an interval (i.e. the center value is not especially meaningful), it is not the desired output when dealing with discrete values or integers (0, 1, 2, etc.). In particular, the array of bin edges returned by np.histogram is one element longer than the array of counts / densities.

To get around this, I used np.digitize to quantize the input, and return a discrete number of bins, along with fraction of counts for each bin. You could easily edit to get the integer number of counts.

import numpy as np
from collections import Counter

def compute_PMF(data):
    # data is expected to be a NumPy array (it uses .min() and .max())
    _, bins = np.histogram(data, bins='auto', range=(data.min(), data.max()), density=False)
    # 0-based bin index of every sample
    h = Counter(np.digitize(data, bins) - 1)
    weights = np.asarray(list(h.values()))
    weights = weights / weights.sum()
    values = np.asarray(list(h.keys()))
    return weights, values
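
When the data really is discrete, np.unique with return_counts=True gives the distinct values and their counts without any binning. The following is a sketch of that alternative, not the method above; the function name discrete_pmf is made up for illustration.

```python
import numpy as np

def discrete_pmf(data):
    """Return (values, probabilities) for discrete data."""
    values, counts = np.unique(np.asarray(data), return_counts=True)
    return values, counts / counts.sum()

values, probs = discrete_pmf([0, 1, 1, 2, 2, 2])
print(values)  # [0 1 2]
```

Unlike np.histogram, both returned arrays have the same length, with one entry per distinct value.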

Refs:

[1] https://docs.scipy.org/doc/numpy/reference/generated/numpy.histogram.html

[2] https://docs.scipy.org/doc/numpy/reference/generated/numpy.digitize.html


Bin size in Matplotlib (Histogram)

Question: Bin size in Matplotlib (Histogram)

I’m using matplotlib to make a histogram.

Is there any way to manually set the size of the bins as opposed to the number of bins?


Answer 0

Actually, it’s quite easy: instead of the number of bins you can give a list with the bin boundaries. They can be unequally distributed, too:

plt.hist(data, bins=[0, 10, 20, 30, 40, 50, 100])

If you just want them equally distributed, you can simply use range:

plt.hist(data, bins=range(min(data), max(data) + binwidth, binwidth))

Added to original answer

The above line works for data filled with integers only. As macrocosme points out, for floats you can use:

import numpy as np
plt.hist(data, bins=np.arange(min(data), max(data) + binwidth, binwidth))
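
The edges produced by the np.arange expression can be inspected on their own. A small sketch with made-up float data and a binwidth of 0.5:

```python
import numpy as np

data = np.array([0.3, 1.1, 2.75])
binwidth = 0.5  # desired bin size, in data units

# Extending the stop value by one binwidth lets the last edge reach max(data)
edges = np.arange(data.min(), data.max() + binwidth, binwidth)
counts, _ = np.histogram(data, bins=edges)
```

Here edges runs from 0.3 to 2.8 in steps of 0.5. With float steps, rounding can in principle leave the last edge marginally short of the maximum, so it is worth checking edges[-1] when that matters.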

Answer 1

For N bins, the bin edges are specified by a list of N+1 values, where the first N give the lower bin edges and the last one gives the upper edge of the final bin.

Code:

import numpy as np

bin_size = 0.1; min_edge = 0; max_edge = 2.5
# np.linspace needs an integer count, so round the float division
N = int(round((max_edge - min_edge) / bin_size)); Nplus1 = N + 1
bin_list = np.linspace(min_edge, max_edge, Nplus1)

Note that linspace produces an array from min_edge to max_edge containing N+1 values, i.e. N bins.


Answer 2

I guess the easy way would be to calculate the minimum and maximum of the data you have, then calculate L = max - min. Then you divide L by the desired bin width (I’m assuming this is what you mean by bin size) and use the ceiling of this value as the number of bins.
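
That recipe can be sketched as follows, with made-up data and a desired width of 2.0. Note that the resulting bins are at most as wide as requested, not necessarily exactly that wide.

```python
import math
import numpy as np

data = np.array([2.0, 3.5, 9.0])
desired_width = 2.0

span = data.max() - data.min()            # L = max - min
n_bins = math.ceil(span / desired_width)  # round up to a whole number of bins
counts, edges = np.histogram(data, bins=n_bins)
actual_width = edges[1] - edges[0]
```

Here span is 7.0, so n_bins is 4 and the actual width is 1.75.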


Answer 3

I like things to happen automatically and for bins to fall on “nice” values. The following seems to work quite well.

import numpy as np
import numpy.random as random
import matplotlib.pyplot as plt
def compute_histogram_bins(data, desired_bin_size):
    min_val = np.min(data)
    max_val = np.max(data)
    min_boundary = -1.0 * (min_val % desired_bin_size - min_val)
    max_boundary = max_val - max_val % desired_bin_size + desired_bin_size
    n_bins = int((max_boundary - min_boundary) / desired_bin_size) + 1
    bins = np.linspace(min_boundary, max_boundary, n_bins)
    return bins

if __name__ == '__main__':
    data = np.random.random_sample(100) * 123.34 - 67.23
    bins = compute_histogram_bins(data, 10.0)
    print(bins)
    plt.hist(data, bins=bins)
    plt.xlabel('Value')
    plt.ylabel('Counts')
    plt.title('Compute Bins Example')
    plt.grid(True)
    plt.show()

The result has bins on nice intervals of bin size.

[-70. -60. -50. -40. -30. -20. -10.   0.  10.  20.  30.  40.  50.  60.]
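
A variant of the same idea rounds the data limits outward with np.floor and np.ceil. This is a sketch, not the function above: nice_bin_edges is a hypothetical name, and it assumes the data spans more than one multiple of the bin size.

```python
import numpy as np

def nice_bin_edges(data, bin_size):
    # Round the limits outward to the nearest multiples of bin_size
    lo = np.floor(np.min(data) / bin_size) * bin_size
    hi = np.ceil(np.max(data) / bin_size) * bin_size
    n_bins = int(round((hi - lo) / bin_size))
    return np.linspace(lo, hi, n_bins + 1)

edges = nice_bin_edges([-67.2, 3.4, 56.1], 10.0)
```

For this input the edges run from -70 to 60 in steps of 10.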


Answer 4

I use quantiles to make the bins uniform (each holding about the same number of samples) and fitted to the sample:

bins=df['Generosity'].quantile([0,.05,0.1,0.15,0.20,0.25,0.3,0.35,0.40,0.45,0.5,0.55,0.6,0.65,0.70,0.75,0.80,0.85,0.90,0.95,1]).to_list()

plt.hist(df['Generosity'], bins=bins, density=True, alpha=0.5, histtype='stepfilled', color='steelblue', edgecolor='none')
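
The same quantile edges can be computed with NumPy alone. A sketch under the assumption that the column is continuous numeric data; sample is a random stand-in for df['Generosity'].

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(size=1000)  # stand-in for df['Generosity']

# Edges at the 0%, 5%, ..., 100% quantiles give 20 bins
edges = np.quantile(sample, np.linspace(0, 1, 21))
counts, _ = np.histogram(sample, bins=edges)
```

Each bin ends up holding roughly the same number of samples, so with density=True the bar heights vary inversely with the bin widths.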


Answer 5

I had the same issue as OP (I think!), but I couldn’t get it to work in the way that Lastalda specified. I don’t know if I have interpreted the question properly, but I have found another solution (it probably is a really bad way of doing it though).

This was the way that I did it:

plt.hist([1,11,21,31,41], bins=[0,10,20,30,40,50], weights=[10,1,40,33,6]);

This creates a chart with one bar per bin, where each bar’s height is the corresponding weight.

So the first parameter basically ‘initialises’ the bins: I’m deliberately supplying one number that falls inside each range set by the bins parameter.

To demonstrate this, look at the array in the first parameter ([1,11,21,31,41]) and the ‘bins’ array in the second parameter ([0,10,20,30,40,50]):

  • The number 1 (from the first array) falls between 0 and 10 (in the ‘bins’ array)
  • The number 11 (from the first array) falls between 10 and 20 (in the ‘bins’ array)
  • The number 21 (from the first array) falls between 20 and 30 (in the ‘bins’ array), etc.

Then I’m using the ‘weights’ parameter to define the height of each bin. This is the array used for the weights parameter: [10,1,40,33,6].

So the 0 to 10 bin is given the value 10, the 10 to 20 bin is given the value 1, the 20 to 30 bin is given the value 40, etc.
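
The effect of the weights argument can be verified with np.histogram, which accepts the same bins and weights parameters. In this sketch the names markers and values are made up for the two arrays described above.

```python
import numpy as np

edges = [0, 10, 20, 30, 40, 50]
values = [10, 1, 40, 33, 6]    # the pre-computed height of each bin
markers = [1, 11, 21, 31, 41]  # one arbitrary point inside each bin

# Each marker contributes its weight, instead of 1, to the bin it falls in
counts, _ = np.histogram(markers, bins=edges, weights=values)
```

counts comes back equal to values, which is why the plotted bars have exactly those heights.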


Answer 6

For a histogram with integer x-values I ended up using

# np.arange and range exclude their stop value, hence +1.5 and +1
plt.hist(data, np.arange(min(data)-0.5, max(data)+1.5))
plt.xticks(range(min(data), max(data)+1))

The offset of 0.5 centers the bins on the x-axis values. The plt.xticks call adds a tick for every integer.
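
A quick check of the half-integer edges, using np.histogram and a small made-up integer data set:

```python
import numpy as np

data = [1, 2, 2, 3, 3, 3]

# Edges at half-integers, so that each integer sits in the middle of its bin.
# The stop value is max + 1.5 because np.arange excludes it.
edges = np.arange(min(data) - 0.5, max(data) + 1.5)
counts, _ = np.histogram(data, bins=edges)
print(counts)  # [1 2 3]
```

The edges here are 0.5, 1.5, 2.5 and 3.5, giving one bin per integer from 1 to 3.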


Plot two histograms on single chart with matplotlib

Question: Plot two histograms on single chart with matplotlib

I created a histogram plot using data from a file and no problem. Now I wanted to superpose data from another file in the same histogram, so I do something like this

n,bins,patchs = ax.hist(mydata1,100)
n,bins,patchs = ax.hist(mydata2,100)

but the problem is that for each interval, only the bar with the highest value appears, and the other is hidden. I wonder how could I plot both histograms at the same time with different colors.


Answer 0

Here you have a working example:

import random
import numpy
from matplotlib import pyplot

x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]

bins = numpy.linspace(-10, 10, 100)

pyplot.hist(x, bins, alpha=0.5, label='x')
pyplot.hist(y, bins, alpha=0.5, label='y')
pyplot.legend(loc='upper right')
pyplot.show()
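
The example works because both calls share the same bins array. When a sensible range is not known in advance, one option is to derive the shared edges from the combined data with np.histogram_bin_edges. A sketch, with random stand-in data and an arbitrary choice of 50 bins:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(3, 1, 400)
y = rng.normal(4, 2, 400)

# One set of edges covering both samples, so the bars of x and y line up
bins = np.histogram_bin_edges(np.concatenate([x, y]), bins=50)
x_counts, _ = np.histogram(x, bins=bins)
y_counts, _ = np.histogram(y, bins=bins)
```

The same bins array can be passed to both pyplot.hist calls in place of the numpy.linspace result.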


Answer 1

The accepted answer gives the code for a histogram with overlapping bars, but in case you want the bars to be side-by-side (as I did), try the variation below:

import numpy as np
import matplotlib.pyplot as plt
# On Matplotlib 3.6+ this style is named 'seaborn-v0_8-deep'
plt.style.use('seaborn-deep')

x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
bins = np.linspace(-10, 10, 30)

plt.hist([x, y], bins, label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()

Reference: http://matplotlib.org/examples/statistics/histogram_demo_multihist.html

EDIT [2018/03/16]: Updated to allow plotting of arrays of different sizes, as suggested by @stochastic_zeitgeist


Answer 2

In the case you have different sample sizes, it may be difficult to compare the distributions with a single y-axis. For example:

import numpy as np
import matplotlib.pyplot as plt

#makes the data
y1 = np.random.normal(-2, 2, 1000)
y2 = np.random.normal(2, 2, 5000)
colors = ['b','g']

#plots the histogram
fig, ax1 = plt.subplots()
ax1.hist([y1,y2],color=colors)
ax1.set_xlim(-10,10)
ax1.set_ylabel("Count")
plt.tight_layout()
plt.show()

In this case, you can plot your two data sets on different axes. To do so, you can get your histogram data using matplotlib, clear the axis, and then re-plot it on two separate axes (shifting the bin edges so that they don’t overlap):

#sets up the axis and gets histogram data
fig, ax1 = plt.subplots()
ax2 = ax1.twinx()
ax1.hist([y1, y2], color=colors)
n, bins, patches = ax1.hist([y1,y2])
ax1.cla() #clear the axis

#plots the histogram data
width = (bins[1] - bins[0]) * 0.4
bins_shifted = bins + width
ax1.bar(bins[:-1], n[0], width, align='edge', color=colors[0])
ax2.bar(bins_shifted[:-1], n[1], width, align='edge', color=colors[1])

#finishes the plot
ax1.set_ylabel("Count", color=colors[0])
ax2.set_ylabel("Count", color=colors[1])
ax1.tick_params('y', colors=colors[0])
ax2.tick_params('y', colors=colors[1])
plt.tight_layout()
plt.show()


Answer 3

As a completion to Gustavo Bezerra’s answer:

If you want each histogram to be normalized (normed for mpl<=2.1 and density for mpl>=3.1) you cannot just use normed/density=True, you need to set the weights for each value instead:

import numpy as np
import matplotlib.pyplot as plt

x = np.random.normal(1, 2, 5000)
y = np.random.normal(-1, 3, 2000)
x_w = np.empty(x.shape)
x_w.fill(1/x.shape[0])
y_w = np.empty(y.shape)
y_w.fill(1/y.shape[0])
bins = np.linspace(-10, 10, 30)

plt.hist([x, y], bins, weights=[x_w, y_w], label=['x', 'y'])
plt.legend(loc='upper right')
plt.show()

As a comparison, the exact same x and y vectors with default weights and density=True:
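
The difference between the two normalizations can also be seen numerically. The following sketch uses np.histogram on one of the samples: with weights of 1/N the bar heights add up to 1, while with density=True it is the bar areas that add up to 1.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(1, 2, 5000)
bins = np.linspace(-10, 10, 30)

# weights = 1/N: heights are fractions of the sample (sum is at most 1)
w = np.full(x.shape, 1 / x.size)
frac, _ = np.histogram(x, bins, weights=w)

# density=True: heights are scaled so that height * bin width sums to 1
dens, _ = np.histogram(x, bins, density=True)
area = (dens * np.diff(bins)).sum()
```

frac.sum() falls below 1 only if some samples lie outside the range covered by bins.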


Answer 4

You should use bins from the values returned by hist:

import numpy as np
import matplotlib.pyplot as plt

foo = np.random.normal(loc=1, size=100) # a normal distribution
bar = np.random.normal(loc=-1, size=10000) # a normal distribution

_, bins, _ = plt.hist(foo, bins=50, range=[-6, 6], density=True)
_ = plt.hist(bar, bins=bins, alpha=0.5, density=True)


Answer 5

Here is a simple method to plot two histograms, with their bars side-by-side, on the same plot when the data has different sizes:

import matplotlib.pyplot as plt

def plotHistogram(p, o):
    """
    p and o are iterables with the values you want to 
    plot the histogram of
    """
    plt.hist([p, o], color=['g','r'], alpha=0.8, bins=50)
    plt.show()

Answer 6


Answer 7

Just in case you have pandas (import pandas as pd) or are ok with using it:

import random
import pandas as pd
import matplotlib.pyplot as plt

test = pd.DataFrame([[random.gauss(3,1) for _ in range(400)], 
                     [random.gauss(4,2) for _ in range(400)]])
plt.hist(test.values.T)
plt.show()

Answer 8

There is one caveat when you want to plot the histogram from a 2-d numpy array. You need to swap the 2 axes.

import numpy as np
import matplotlib.pyplot as plt

data = np.random.normal(size=(2, 300))
# swapped_data.shape == (300, 2)
swapped_data = np.swapaxes(data, axis1=0, axis2=1)
plt.hist(swapped_data, bins=30, label=['x', 'y'])
plt.legend()
plt.show()
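
For a 2-d array, the transpose attribute .T does the same job as swapping the two axes. A short sketch:

```python
import numpy as np

data = np.random.default_rng(0).normal(size=(2, 300))

# Shape (300, 2): one column per data set, which is the layout hist treats
# as two separate data sets
swapped = data.T
```

swapped can be passed to plt.hist in place of swapped_data.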


Answer 9

This question has been answered before, but I wanted to add another quick/easy workaround that might help other visitors to this question.

import seaborn as sns
sns.kdeplot(mydata1)
sns.kdeplot(mydata2)

Some helpful examples are here for kde vs histogram comparison.


Answer 10

Inspired by Solomon’s answer, but to stick with the question, which is related to histogram, a clean solution is:

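# sns.distplot is deprecated since seaborn 0.11; on newer versions,
# sns.histplot(data, kde=True, stat="density") is the closest equivalent
sns.distplot(bar)
sns.distplot(foo)
plt.show()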

Make sure to plot the taller one first, otherwise you would need to set plt.ylim(0,0.45) so that the taller histogram is not chopped off.


Answer 11

Another option, quite similar to joaquin’s answer:

import random
from matplotlib import pyplot

#random data
x = [random.gauss(3,1) for _ in range(400)]
y = [random.gauss(4,2) for _ in range(400)]

#plot both histograms(range from -10 to 10), bins set to 100
pyplot.hist([x,y], bins= 100, range=[-10,10], alpha=0.5, label=['x', 'y'])
#plot legend
pyplot.legend(loc='upper right')
#show it
pyplot.show()
