PDF and CDF in Python


    Prerequisites: Matplotlib 

    Matplotlib is a plotting library for Python that works closely with NumPy arrays. The cumulative distribution function (CDF) of a real-valued random variable X, evaluated at x, is the probability that X takes a value less than or equal to x: F(x) = P(X <= x).

    Properties of CDF:

    • Every CDF F(x) is non-decreasing in x.
    • F(x) approaches 1 as x increases; if X has a largest possible value x_max, then F(x_max) = 1.
    • The CDF takes values between 0 and 1.

    Method 1: Using the histogram

    The CDF can be calculated from the PDF (probability density function) or, for a discrete variable, from the probability mass function. The probabilities of all values up to and including x are added together to give F(x).

    Example : 

    Two balls are drawn, and each is equally likely to be red (R) or blue (B). The four equally likely outcomes are:

    {RR, RB, BR, BB}

    Let X be the number of red balls.

    P(X = t) -> t = 0 : 1 / 4 [BB]

                t = 1 : 2 / 4 [RB, BR]

                t = 2 : 1 / 4 [RR]

    CDF :

    F(t) = P(X <= t)

    t = 0 : P(0)               -> 1 / 4

    t = 1 : P(1) + P(0)        -> 3 / 4

    t = 2 : P(2) + P(1) + P(0) -> 1
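
The same table can be reproduced in a few lines of Python. This is only a minimal sketch using the standard library; the variable names (`outcomes`, `pmf`, `cdf`) are chosen here for illustration, and it assumes, as the example does, that all four outcomes are equally likely.

```python
from collections import Counter
from fractions import Fraction
from itertools import accumulate, product

# Enumerate the four equally likely outcomes: RR, RB, BR, BB
outcomes = ["".join(p) for p in product("RB", repeat=2)]

# Count how many outcomes contain t red balls, for t = 0, 1, 2
red_counts = Counter(outcome.count("R") for outcome in outcomes)

# PMF: P(X = t) as exact fractions
pmf = {t: Fraction(red_counts[t], len(outcomes)) for t in sorted(red_counts)}

# CDF: running sum of the PMF, F(t) = P(X <= t)
cdf = dict(zip(pmf, accumulate(pmf.values())))

for t in pmf:
    print(f"t = {t}: P(X = t) = {pmf[t]}, F(t) = {cdf[t]}")
```

Note that `Fraction` reduces 2 / 4 to 1/2 when printing, so the middle row shows the same probability in lowest terms.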

    Approach

    • Import modules
    • Declare number of data points
    • Generate random values
    • Compute the histogram of the data
    • Normalize the bin counts to get the PDF
    • Take the cumulative sum of the PDF to get the CDF
    • Plot the PDF and the CDF

    Example:

    Python3

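    import numpy as np
    import matplotlib.pyplot as plt

    # Number of data points
    N = 500

    # Draw N samples from the standard normal distribution
    data = np.random.randn(N)

    # Bin counts and the 11 bin edges for 10 bins
    count, bins_count = np.histogram(data, bins=10)

    # Fraction of samples in each bin (sums to 1)
    pdf = count / count.sum()

    # Running total of the bin fractions gives the CDF at each right bin edge
    cdf = np.cumsum(pdf)

    # Plot both curves against the right edge of each bin
    plt.plot(bins_count[1:], pdf, color="red", label="PDF")
    plt.plot(bins_count[1:], cdf, label="CDF")
    plt.legend()
    plt.show()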

    Output:

    [Figure: histogram-based plot of the PDF and CDF]

    [Figure: plotted CDF]
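
One detail worth noting: the `pdf` array in the example above holds the fraction of samples in each bin, not a density in the strict sense. The sketch below shows one way the two might be related, using the `density=True` option of `np.histogram`. The seeded generator and the names `density`, `widths` and `bin_prob` are assumptions made for this illustration only.

```python
import numpy as np

# Seeded generator so the run is repeatable (an assumption for this sketch)
rng = np.random.default_rng(0)
data = rng.standard_normal(500)

# density=True rescales the counts so the histogram integrates to 1
density, edges = np.histogram(data, bins=10, density=True)

# Width of each bin, taken from consecutive bin edges
widths = np.diff(edges)

# Probability mass in each bin = density * bin width
bin_prob = density * widths

# Cumulative probability at the right edge of each bin
cdf = np.cumsum(bin_prob)

print(bin_prob.sum())  # total probability across all bins
print(cdf[-1])         # CDF value at the last bin edge
```

Multiplying the density by the bin width recovers the per-bin fractions, so the cumulative sum ends at 1 (up to floating-point rounding) whichever normalization is used.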

    Method 2: Using sorted data

    The CDF can also be computed directly from the sorted data, without binning. After sorting in ascending order, each value is paired with a cumulative proportion that rises in equal steps across the N points.

    Approach

    • Import module
    • Declare number of data points
    • Create data
    • Sort data in ascending order
    • Get CDF
    • Plot CDF
    • Display plot

    Example:

    Python3

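    import numpy as np
    import matplotlib.pyplot as plt

    # Number of data points
    N = 500

    # Draw N samples from the standard normal distribution
    data = np.random.randn(N)

    # Sort the data in ascending order
    x = np.sort(data)

    # Fraction of samples less than or equal to each sorted value: 1/N, 2/N, ..., 1
    y = np.arange(1, N + 1) / N

    plt.xlabel('x-axis')
    plt.ylabel('y-axis')
    plt.title('CDF from sorted data')
    plt.plot(x, y, marker='o')
    plt.show()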

    Output:

    [Figure: CDF plotted from the sorted data]
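
The sorted-data idea can also be used to evaluate the empirical CDF at arbitrary points rather than only at the sample values. The helper below, `ecdf`, is a hypothetical name introduced for this sketch and is not part of NumPy; it relies on `np.searchsorted` to count how many sorted values are less than or equal to a query point.

```python
import numpy as np

def ecdf(sample, t):
    """Fraction of values in `sample` that are <= t (hypothetical helper)."""
    x = np.sort(np.asarray(sample))
    # side="right" counts every element <= t, including ties
    return np.searchsorted(x, t, side="right") / x.size

sample = [3.0, 1.0, 2.0, 2.0]
print(ecdf(sample, 2.0))         # 3 of the 4 values are <= 2.0
print(ecdf(sample, 0.5))         # no value is <= 0.5
print(ecdf(sample, [1.0, 3.0]))  # also works for several query points at once
```

Using `side="right"` matters when the data contain repeated values: it makes the result match the definition P(X <= t) rather than P(X < t).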


    What is PDF and CDF in Python?

    CDF stands for cumulative distribution function; it is defined for both discrete and continuous random variables. PDF stands for probability density function and applies to continuous variables; the discrete counterpart is the probability mass function (PMF).

    How do you draw CDF and PDF in Python?

    • Set the figure size and adjust the padding between and around the subplots.
    • Initialize a variable N for the number of sample data points.
    • Create random data using NumPy.
    • Compute the histogram of the data with bins=10.
    • Find the probability density function (PDF) from the histogram counts.

    What is PDF and CDF?

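    Probability Density Function (PDF) vs Cumulative Distribution Function (CDF): the CDF gives the probability that a random variable X takes a value less than or equal to x. The PDF of a continuous variable gives the probability density at x, and probabilities are obtained by integrating it over an interval. The probability that X is exactly equal to x is given by the probability mass function of a discrete variable; for a continuous variable it is zero.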
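
To make the relationship between the two functions concrete, the sketch below compares a probability computed from the CDF with the same probability obtained by numerically integrating the PDF. It assumes a standard normal variable and uses only the `math` module; the function names `normal_pdf`, `normal_cdf` and `integrate_pdf` are introduced here for illustration.

```python
import math

def normal_pdf(x):
    """Density of the standard normal distribution at x."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def normal_cdf(x):
    """P(X <= x) for a standard normal X, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def integrate_pdf(a, b, steps=10_000):
    """Midpoint-rule approximation of the integral of normal_pdf over [a, b]."""
    h = (b - a) / steps
    return h * sum(normal_pdf(a + (i + 0.5) * h) for i in range(steps))

a, b = -1.0, 1.0
print(normal_cdf(b) - normal_cdf(a))  # P(a < X <= b) from the CDF
print(integrate_pdf(a, b))            # same probability from the PDF
```

The two printed values should agree to several decimal places, which illustrates that the CDF is the accumulated area under the PDF.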

    What is CDF in Python?

    A cumulative distribution function (CDF) gives the probability that a random variable takes a value less than or equal to some value. In Python it can be estimated from sample data with NumPy and plotted with Matplotlib, as in the two methods above.