问题:Matplotlib中的bin大小(直方图)
我正在使用matplotlib制作直方图。
有什么方法可以手动设置垃圾箱的大小,而不是垃圾箱的数量吗?
I’m using matplotlib to make a histogram.
Is there any way to manually set the size of the bins as opposed to the number of bins?
回答 0
实际上,这很简单:您可以提供一个带有bin边界的列表,而不是bin的数量。它们也可能分布不均:
plt . hist ( data , bins =[ 0 , 10 , 20 , 30 , 40 , 50 , 100 ])
如果只希望它们均匀分布,则可以使用range:
plt . hist ( data , bins = range ( min ( data ), max ( data ) + binwidth , binwidth ))
添加到原始答案
上一行data
仅适用于整数填充。正如macrocosme所 指出的,对于浮点数,您可以使用:
import numpy as np
plt . hist ( data , bins = np . arange ( min ( data ), max ( data ) + binwidth , binwidth ))
Actually, it’s quite easy: instead of the number of bins you can give a list with the bin boundaries. They can be unequally distributed, too:
plt.hist(data, bins=[0, 10, 20, 30, 40, 50, 100])
If you just want them equally distributed, you can simply use range:
plt.hist(data, bins=range(min(data), max(data) + binwidth, binwidth))
Added to original answer
The above line works for data
filled with integers only. As macrocosme points out, for floats you can use:
import numpy as np
plt.hist(data, bins=np.arange(min(data), max(data) + binwidth, binwidth))
回答 1
对于N个仓,仓边缘由N + 1个值的列表指定,其中前N个给出较低仓边缘,而+1给出最后一个仓的较高边缘。
码:
from numpy import np ; from pylab import *
bin_size = 0.1 ; min_edge = 0 ; max_edge = 2.5
N = ( max_edge - min_edge )/ bin_size ; Nplus1 = N + 1
bin_list = np . linspace ( min_edge , max_edge , Nplus1 )
请注意,linspace产生从min_edge到max_edge的数组,该数组分为N + 1个值或N个bin
For N bins, the bin edges are specified by list of N+1 values where the first N give the lower bin edges and the +1 gives the upper edge of the last bin.
Code:
from numpy import np; from pylab import *
bin_size = 0.1; min_edge = 0; max_edge = 2.5
N = (max_edge-min_edge)/bin_size; Nplus1 = N + 1
bin_list = np.linspace(min_edge, max_edge, Nplus1)
Note that linspace produces array from min_edge to max_edge broken into N+1 values or N bins
回答 2
我猜最简单的方法是计算您拥有的数据的最小值和最大值,然后计算L = max - min
。然后L
,用所需的箱宽度除(我假设这就是箱大小),然后将该值的上限用作箱数。
I guess the easy way would be to calculate the minimum and maximum of the data you have, then calculate L = max - min
. Then you divide L
by the desired bin width (I’m assuming this is what you mean by bin size) and use the ceiling of this value as the number of bins.
回答 3
我喜欢事情会自动发生,而垃圾箱却落在“不错的”价值上。以下似乎很好用。
import numpy as np
import numpy . random as random
import matplotlib . pyplot as plt
def compute_histogram_bins ( data , desired_bin_size ):
min_val = np . min ( data )
max_val = np . max ( data )
min_boundary = - 1.0 * ( min_val % desired_bin_size - min_val )
max_boundary = max_val - max_val % desired_bin_size + desired_bin_size
n_bins = int (( max_boundary - min_boundary ) / desired_bin_size ) + 1
bins = np . linspace ( min_boundary , max_boundary , n_bins )
return bins
if __name__ == '__main__' :
data = np . random . random_sample ( 100 ) * 123.34 - 67.23
bins = compute_histogram_bins ( data , 10.0 )
print ( bins )
plt . hist ( data , bins = bins )
plt . xlabel ( 'Value' )
plt . ylabel ( 'Counts' )
plt . title ( 'Compute Bins Example' )
plt . grid ( True )
plt . show ()
结果以良好的间隔大小间隔包含了间隔。
[- 70. - 60. - 50. - 40. - 30. - 20. - 10. 0. 10. 20. 30. 40. 50. 60. ]
I like things to happen automatically and for bins to fall on “nice” values. The following seems to work quite well.
import numpy as np
import numpy.random as random
import matplotlib.pyplot as plt
def compute_histogram_bins(data, desired_bin_size):
min_val = np.min(data)
max_val = np.max(data)
min_boundary = -1.0 * (min_val % desired_bin_size - min_val)
max_boundary = max_val - max_val % desired_bin_size + desired_bin_size
n_bins = int((max_boundary - min_boundary) / desired_bin_size) + 1
bins = np.linspace(min_boundary, max_boundary, n_bins)
return bins
if __name__ == '__main__':
data = np.random.random_sample(100) * 123.34 - 67.23
bins = compute_histogram_bins(data, 10.0)
print(bins)
plt.hist(data, bins=bins)
plt.xlabel('Value')
plt.ylabel('Counts')
plt.title('Compute Bins Example')
plt.grid(True)
plt.show()
The result has bins on nice intervals of bin size.
[-70. -60. -50. -40. -30. -20. -10. 0. 10. 20. 30. 40. 50. 60.]
回答 4
我使用分位数来使容器均匀并适合于采样:
bins = df [ 'Generosity' ]. quantile ([ 0 ,. 05 , 0.1 , 0.15 , 0.20 , 0.25 , 0.3 , 0.35 , 0.40 , 0.45 , 0.5 , 0.55 , 0.6 , 0.65 , 0.70 , 0.75 , 0.80 , 0.85 , 0.90 , 0.95 , 1 ]). to_list ()
plt . hist ( df [ 'Generosity' ], bins = bins , normed = True , alpha = 0.5 , histtype = 'stepfilled' , color = 'steelblue' , edgecolor = 'none' )
I use quantiles to do bins uniform and fitted to sample:
bins=df['Generosity'].quantile([0,.05,0.1,0.15,0.20,0.25,0.3,0.35,0.40,0.45,0.5,0.55,0.6,0.65,0.70,0.75,0.80,0.85,0.90,0.95,1]).to_list()
plt.hist(df['Generosity'], bins=bins, normed=True, alpha=0.5, histtype='stepfilled', color='steelblue', edgecolor='none')
回答 5
我遇到了与OP相同的问题(我认为!),但是我无法按照Lastalda指定的方式使其正常工作。我不知道我是否正确解释了这个问题,但是我找到了另一种解决方案(尽管这可能是一种非常糟糕的方法)。
我就是这样的:
plt.hist([1,11,21,31,41], bins=[0,10,20,30,40,50], weights=[10,1,40,33,6]);
这创建了这个:
因此,第一个参数基本上是“初始化”垃圾箱-我专门创建一个数字,该数字介于我在垃圾箱参数中设置的范围之间。
为了说明这一点,请查看第一个参数([1,11,21,31,41])中的数组和第二个参数([0,10,20,30,40,50]中的’bins’数组:
数字1(从第一个数组开始)介于0到10之间(在“ bins”数组中)
数字11(来自第一个数组)介于11和20之间(在“ bins”数组中)
数字21(从第一个数组开始)介于21到30(在“ bins”数组中)之间,依此类推。
然后,我使用’weights’参数定义每个垃圾箱的大小。这是用于weights参数的数组:[10,1,40,33,6]。
因此0到10 bin的值是10,11到20 bin的值是1,21到30 bin的值是40,依此类推。
I had the same issue as OP (I think!), but I couldn’t get it to work in the way that Lastalda specified. I don’t know if I have interpreted the question properly, but I have found another solution (it probably is a really bad way of doing it though).
This was the way that I did it:
plt.hist([1,11,21,31,41], bins=[0,10,20,30,40,50], weights=[10,1,40,33,6]);
Which creates this:
So the first parameter basically ‘initialises’ the bin – I’m specifically creating a number that is in between the range I set in the bins parameter.
To demonstrate this, look at the array in the first parameter ([1,11,21,31,41]) and the ‘bins’ array in the second parameter ([0,10,20,30,40,50]):
The number 1 (from the first array) falls between 0 and 10 (in the ‘bins’ array)
The number 11 (from the first array) falls between 11 and 20 (in the ‘bins’ array)
The number 21 (from the first array) falls between 21 and 30 (in the ‘bins’ array), etc.
Then I’m using the ‘weights’ parameter to define the size of each bin. This is the array used for the weights parameter: [10,1,40,33,6].
So the 0 to 10 bin is given the value 10, the 11 to 20 bin is given the value of 1, the 21 to 30 bin is given the value of 40, etc.
回答 6
对于具有整数x值的直方图,我最终使用
plt . hist ( data , np . arange ( min ( data )- 0.5 , max ( data )+ 0.5 ))
plt . xticks ( range ( min ( data ), max ( data )))
0.5的偏移量使分箱在x轴值上居中。该plt.xticks
调用为每个整数添加一个刻度。
For a histogram with integer x-values I ended up using
plt.hist(data, np.arange(min(data)-0.5, max(data)+0.5))
plt.xticks(range(min(data), max(data)))
The offset of 0.5 centers the bins on the x-axis values. The plt.xticks
call adds a tick for every integer.