Python fit distribution to histogram

Here is an example that uses scipy.optimize to fit a non-linear function such as a Gaussian, even when the data is in a histogram whose range is poorly chosen, so that a simple mean estimate would fail. A constant offset would also cause simple normal statistics to fail (just remove p[3] and c[3] for plain Gaussian data).

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import leastsq

# Gaussian plus constant offset: p = [A, mu, sigma, k]
fitfunc  = lambda p, x: p[0]*np.exp(-0.5*((x-p[1])/p[2])**2)+p[3]
# Residuals between the data and the model; leastsq minimizes their squares
errfunc  = lambda p, x, y: (y - fitfunc(p, x))

filename = "gaussdata.csv"
data     = np.loadtxt(filename,skiprows=1,delimiter=',')
xdata    = data[:,0]
ydata    = data[:,1]

# Initial guess for A, mu, sigma, k
init  = [1.0, 0.5, 0.5, 0.5]

out   = leastsq( errfunc, init, args=(xdata, ydata))
c = out[0]

print("A exp[-0.5((x-mu)/sigma)^2] + k ")
print("Parent Coefficients:")
print("1.000, 0.200, 0.300, 0.625")
print("Fit Coefficients:")
# sigma only enters the model squared, so report its absolute value
print(c[0],c[1],abs(c[2]),c[3])

plt.plot(xdata, fitfunc(c, xdata))
plt.plot(xdata, ydata)

plt.title(r'$A = %.3f\  \mu = %.3f\  \sigma = %.3f\ k = %.3f $' %(c[0],c[1],abs(c[2]),c[3]))

plt.show()

Output:

A exp[-0.5((x-mu)/sigma)^2] + k 
Parent Coefficients:
1.000, 0.200, 0.300, 0.625
Fit Coefficients:
0.961231625289 0.197254597618 0.293989275502 0.65370344131
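
The script above reads gaussdata.csv, which is not included here. As a minimal sketch of the same approach, the block below builds synthetic data from the parent coefficients quoted in the output (1.0, 0.2, 0.3, 0.625) and fits it with leastsq. The x range, noise level, random seed, and the names gauss_offset and residuals are assumptions made for illustration and are not part of the original example.

```python
import numpy as np
from scipy.optimize import leastsq

def gauss_offset(p, x):
    """Gaussian with a constant offset; p = [A, mu, sigma, k]."""
    return p[0] * np.exp(-0.5 * ((x - p[1]) / p[2]) ** 2) + p[3]

def residuals(p, x, y):
    """Difference between data and model; leastsq minimizes its squares."""
    return y - gauss_offset(p, x)

# Synthetic stand-in for gaussdata.csv (assumed range, noise, and seed).
rng = np.random.default_rng(0)
parent = [1.0, 0.2, 0.3, 0.625]
xdata = np.linspace(-1.0, 1.5, 200)
ydata = gauss_offset(parent, xdata) + rng.normal(0.0, 0.05, xdata.size)

init = [1.0, 0.5, 0.5, 0.5]   # same starting guess as the script above
c, ier = leastsq(residuals, init, args=(xdata, ydata))

# sigma only enters the model squared, so its sign is arbitrary.
fit = [c[0], c[1], abs(c[2]), c[3]]
print("Fit coefficients:", np.round(fit, 3))
```

Because the offset k is a free parameter, the fit can separate the baseline from the peak, which a plain sample mean and standard deviation of the y values could not do.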


In this article, we will discuss how to plot a normal distribution over a histogram using Python. First, we will look at the histogram and the normal distribution plot separately, and then we will combine both in one graph.

Histogram

A histogram is a graphical representation of a set of data points grouped into user-defined ranges. Similar to a bar chart, a histogram condenses a series of data into an easy-to-interpret visual by grouping many data points into logical ranges, or bins.

To draw this we will use:

  • The numpy.random.normal() function to draw random samples from a normal distribution. It has three parameters:
    • loc – (mean) where the peak of the bell is located.
    • scale – (standard deviation) how widely the values are spread around the mean.
    • size – shape of the returned array.
  • The hist() function in the pyplot module of the Matplotlib library to draw the histogram. Its parameters include:
    • x: the data sequence to plot.
    • bins: optional; an integer, a sequence of bin edges, or a string.
    • density: optional; a Boolean value. If True, the histogram is normalized so that its total area is 1.
    • alpha: a float between 0 and 1 that sets the transparency of the bars. The smaller the value, the more transparent the histogram.

Python3

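import numpy as np
import matplotlib.pyplot as plt

# Draw 250 samples from a normal distribution with mean 170 and standard deviation 10
data = np.random.normal(170, 10, 250)

# Plot a normalized, semi-transparent histogram with 25 bins
plt.hist(data, bins=25, density=True, alpha=0.6, color='b')

plt.show()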

Output: [figure: normalized histogram of the sampled data]

Normal Distribution

The normal distribution curve is characterized by two parameters:

  • The mean, which marks the location of the peak of the curve. The curve is always symmetric about the mean.
  • The standard deviation, which determines how far the values spread around the mean. A smaller standard deviation gives a narrower, steeper curve, while a larger standard deviation gives a wider, flatter one.
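
The two properties above can be checked numerically. The following is a minimal sketch using scipy.stats.norm; the values of mu and sigma and the grid are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.stats import norm

mu, sigma = 170.0, 10.0                  # illustrative values (assumed)
x = np.linspace(mu - 40, mu + 40, 801)   # grid centered on the mean
pdf = norm.pdf(x, mu, sigma)

peak_x = x[np.argmax(pdf)]               # where the curve is highest
symmetric = np.allclose(pdf, pdf[::-1])  # mirror image about the mean
# The peak height is 1 / (sigma * sqrt(2*pi)), so it drops as sigma grows.
peak_height = 1.0 / (sigma * np.sqrt(2 * np.pi))

print(peak_x, symmetric, np.isclose(pdf.max(), peak_height))
```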

Plotting the Normal Distribution

  • NumPy arange() is used to create and return an ndarray of evenly spaced values.
  • With the help of the mean() and stdev() functions of the statistics module, we calculate the mean and standard deviation and store them in the mean and sd variables.
  • Inside the plot() call, we use the pdf() method of scipy.stats.norm to evaluate the probability density function.

Example:

Python3

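import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm
import statistics

# Evenly spaced x values from -30 to 30 in steps of 0.1
x_axis = np.arange(-30, 30, 0.1)

# Mean and standard deviation of the x values themselves
mean = statistics.mean(x_axis)
sd = statistics.stdev(x_axis)

# Plot the normal probability density with those parameters
plt.plot(x_axis, norm.pdf(x_axis, mean, sd))

plt.show()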

Output: [figure: normal probability density curve]

Normal Distribution over Histogram

We have now discussed the histogram and the normal distribution plot separately, but it would be more useful to see them in one graph on the same scale. This can be achieved by drawing both plots on the same axes before calling plt.show(). Let's now discuss plotting a normal distribution over a histogram using Python.

Suppose we believe that the histogram of some data follows a normal distribution. SciPy provides methods for estimating the parameters of the distribution that best fits the data. For example, for the data in this problem, the mean and standard deviation of the best-fitting normal distribution can be found as follows:

# Fit a normal distribution to the data:
mu, std = norm.fit(data)  # mean and standard deviation
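
As a quick sanity check on what norm.fit() returns, the sketch below compares its result with the sample mean and the standard deviation computed directly with NumPy. For the normal distribution these are the maximum-likelihood estimates, so the values should agree. The random seed is an assumption used only to make the run repeatable.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)   # assumed seed, for repeatability only
data = rng.normal(170, 10, 250)

mu, std = norm.fit(data)

# Maximum-likelihood estimates for a normal distribution: the sample mean
# and the standard deviation with ddof=0 (divide by n, not n - 1).
print(np.isclose(mu, data.mean()), np.isclose(std, data.std(ddof=0)))
```

Note that statistics.stdev() and np.std(ddof=1) divide by n - 1 instead, so they give a slightly larger value than norm.fit() on the same data.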

The xlim() function in the pyplot module of the Matplotlib library is used to get or set the x limits of the current axes.

Syntax: matplotlib.pyplot.xlim(*args, **kwargs)

Parameters: This function accepts the following parameters:

  • left: sets the left x limit.
  • right: sets the right x limit.
  • **kwargs: additional keyword arguments, which are passed on to Axes.set_xlim().

Return value:

  • left, right: a tuple of the new x-axis limits.
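
The calling patterns of xlim() can be sketched as follows. The plotted points are arbitrary, and the "Agg" backend is an assumption made so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")            # non-interactive backend (assumed)
import matplotlib.pyplot as plt

plt.figure()
plt.plot([0, 1, 2, 3], [0, 1, 4, 9])

left, right = plt.xlim()         # no arguments: read the current limits
new_limits = plt.xlim(0, 10)     # two arguments: set left and right
only_right = plt.xlim(right=5)   # keyword: change one side, keep the other

print((left, right), new_limits, only_right)
plt.close()
```

Calling xlim() with no arguments, as in the first call, is how the example in this section obtains the range over which to evaluate the fitted curve.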

Python3

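import numpy as np
from scipy.stats import norm
import matplotlib.pyplot as plt

# Draw 250 samples with mean 170 and standard deviation 10
data = np.random.normal(170, 10, 250)

# Fit a normal distribution to the data
mu, std = norm.fit(data)

# Plot the normalized histogram
plt.hist(data, bins=25, density=True, alpha=0.6, color='b')

# Evaluate the fitted PDF across the current x-axis range and draw it
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, mu, std)
plt.plot(x, p, 'k', linewidth=2)

title = "Fit Values: {:.2f} and {:.2f}".format(mu, std)
plt.title(title)

plt.show()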

Output: [figure: histogram with the fitted normal curve drawn over it]


How do you fit a normal distribution in Python?

How to fit data to a distribution in Python:
data = np.random.normal(0, 0.5, 1000)
mean, var = scipy.stats.distributions.norm.fit(data)  # the second value is the standard deviation
x = np.linspace(-5, 5, 100)
fitted_data = scipy.stats.distributions.norm. ...
plt.hist(data, density=True)
plt.plot(x, fitted_data, 'r-')  # plotting data and fitted_data

How do you normalize a histogram in Python?

To normalize a histogram in Python, we can call the hist() method with density=True. In a normalized histogram, the total area of the bars is 1.
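
The unit-area property can be verified without drawing anything. This is a minimal sketch using np.histogram with density=True; the sample data and seed are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)   # assumed seed, for repeatability only
data = rng.normal(170, 10, 250)

# density=True rescales the counts so that the bar areas sum to 1.
density, edges = np.histogram(data, bins=25, density=True)

# Area of each bar = height * width; the total should be 1.
area = np.sum(density * np.diff(edges))
print(area)
```

The bar heights themselves are not probabilities and need not sum to 1; only the heights multiplied by the bin widths do.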

How do you check data distribution in Python?

To visualize the data set, we can draw a histogram with the data we collected. We will use the Python module Matplotlib to draw the histogram. Reading the bars tells us how many values fall in each interval, for example:

  • 52 values are between 0 and 1.
  • 48 values are between 1 and 2.
  • 49 values are between 2 and 3.
  • 51 values are between 3 and 4.
  • 50 values are between 4 and 5.
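
Counts like these can also be obtained directly, without plotting. The sketch below assumes 250 values drawn uniformly between 0 and 5 (the counts above sum to 250, but how the original data was generated is not stated) and tallies them with np.histogram. The exact counts depend on the random draw and will differ from those listed.

```python
import numpy as np

rng = np.random.default_rng(3)        # assumed seed, for repeatability only
values = rng.uniform(0.0, 5.0, 250)   # assumed data: 250 values in [0, 5)

# Five equal-width bins covering 0 to 5
counts, edges = np.histogram(values, bins=5, range=(0.0, 5.0))

for lo, hi, n in zip(edges[:-1], edges[1:], counts):
    print(f"{n} values are between {lo:g} and {hi:g}")
```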