Which function is used to create a histogram in Python?
In this tutorial, you’ll be equipped to make production-quality, presentation-ready Python histogram plots with a range of choices and features.
If you have introductory to intermediate knowledge in Python and statistics, then you can use this article as a one-stop shop for building and plotting histograms in Python using libraries from its scientific stack, including NumPy, Matplotlib, Pandas, and Seaborn.
A histogram is a great tool for quickly assessing a probability distribution that is intuitively understood by almost any audience. Python offers a handful of different options for building and plotting histograms. Most people know a histogram by its graphical representation, which is similar to a bar graph.
This article will guide you through creating such plots as well as more complex ones. Here’s what you’ll cover:
Histograms in Pure Python
When you are preparing to plot a histogram, it is simplest to not think in terms of bins but rather to report how many times each value appears (a frequency table). A Python dictionary is well-suited for this task.
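A minimal sketch of such a frequency table, assuming a hypothetical helper named count_elements and a small made-up sample (neither name nor data comes from the text above):

```python
def count_elements(seq):
    """Tally how many times each value appears in seq."""
    hist = {}
    for item in seq:
        # Start from 0 the first time a value is seen, then increment.
        hist[item] = hist.get(item, 0) + 1
    return hist


# Illustrative sample data (an assumption for this sketch).
a = (0, 1, 1, 1, 2, 3, 7, 7, 23)
counted = count_elements(a)
print(counted)
```

Each key is a distinct value from the sequence and each dictionary value is the number of times it occurred.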
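In fact, this is precisely what is done by the collections.Counter class from Python’s standard library, which subclasses a Python dictionary and counts hashable items.
You can confirm that your handmade function does virtually the same thing as collections.Counter by testing for equality between the two results.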
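One way to run that check, sketched under the assumption that the handmade function is a plain dictionary-based counter (count_elements and the sample are hypothetical names used only for illustration):

```python
from collections import Counter


def count_elements(seq):
    """Handmade frequency table: value -> number of occurrences."""
    hist = {}
    for item in seq:
        hist[item] = hist.get(item, 0) + 1
    return hist


a = (0, 1, 1, 1, 2, 3, 7, 7, 23)  # illustrative sample
recounted = Counter(a)

# Convert the Counter to a plain dict so the comparison is dict-to-dict.
same = dict(recounted) == count_elements(a)
print(same)
```

Counter also offers extras such as most_common(), which is one reason to prefer it over a handmade function once the idea is clear.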
It can be helpful to build simplified functions from scratch as a first step to understanding more complex ones. Let’s further reinvent the wheel a bit with an ASCII histogram that takes advantage of Python’s output formatting:
This function creates a sorted frequency plot where counts are represented as tallies of plus (+) symbols.
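A rough sketch of what such an ASCII histogram could look like. The function names ascii_rows and ascii_histogram, the column width, and the sample data are all assumptions made for this illustration:

```python
from collections import Counter


def ascii_rows(seq):
    """Return one text row per distinct value, sorted by value,
    with one '+' per occurrence."""
    counted = Counter(seq)
    return [f"{value:>5} {'+' * counted[value]}" for value in sorted(counted)]


def ascii_histogram(seq):
    """Print the rows built by ascii_rows()."""
    for row in ascii_rows(seq):
        print(row)


ascii_histogram((0, 1, 1, 1, 2, 3, 7, 7, 23))
```

The format specifier right-aligns each value in a five-character column so the tallies line up.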
Here, you’re simulating plucking from
Building Up From the Base: Histogram Calculations in NumPy
Thus far, you have been working with what could best be called “frequency tables.” But mathematically, a histogram is a mapping of bins (intervals) to frequencies. More technically, it can be used to approximate the probability density function (PDF) of the underlying variable.
Moving on from the “frequency table” above, a true histogram first “bins” the range of values and then counts the number of values that fall into each bin. This is what NumPy’s histogram() function does.
Consider a sample of floats drawn from the Laplace distribution. This distribution has fatter tails than a normal distribution and has two descriptive parameters (location and scale).
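Drawing such a sample might look like the following sketch. The seed, the location of 15, the scale of 3, and the sample size of 500 are illustrative assumptions, not values taken from the text:

```python
import numpy as np

# Seeded generator so the sketch is reproducible.
rng = np.random.default_rng(seed=444)

# loc is the location (center), scale controls the spread.
d = rng.laplace(loc=15, scale=3, size=500)

print(d[:5])
print(d.min(), d.max())
```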
In this case, you’re working with a continuous distribution, and it wouldn’t be very helpful to tally each float independently, down to the umpteenth decimal place. Instead, you can bin or “bucket” the data and count the observations that fall into each bin. The histogram is the resulting count of values within each bin.
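A short sketch of that binning step with np.histogram(), reusing an illustrative Laplace sample (the seed and distribution parameters are assumptions):

```python
import numpy as np

rng = np.random.default_rng(seed=444)
d = rng.laplace(loc=15, scale=3, size=500)

# With no bins argument, np.histogram() uses 10 equal-width bins.
hist, bin_edges = np.histogram(d)

print(hist)       # counts per bin
print(bin_edges)  # boundaries of the bins
```

The function returns two arrays: the counts and the edges that delimit the bins.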
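This result may not be immediately intuitive. By default, np.histogram() uses 10 equally sized bins and returns a tuple of the frequency counts and the corresponding bin edges, so there is always one more bin edge than there are counts.
In condensed form, NumPy constructs the bins by taking the minimum and maximum of the data as the outermost edges and spacing the remaining edges evenly between them. For example, 10 equally spaced bins over a peak-to-peak range of 23 means intervals of width 2.3. From there, the function delegates the counting of values per bin to lower-level NumPy routines.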
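The edge construction can be approximated with np.linspace(). This is a simplified sketch of the idea, not NumPy's actual implementation, and the array a is an illustrative sample whose values span 0 to 23:

```python
import numpy as np

a = np.array([0, 1, 1, 1, 2, 3, 7, 7, 23])  # peak-to-peak range of 23

# The leftmost and rightmost bin edges are the data's min and max.
first_edge, last_edge = a.min(), a.max()

n_equal_bins = 10  # NumPy's default number of bins
bin_edges = np.linspace(
    start=first_edge, stop=last_edge, num=n_equal_bins + 1, endpoint=True
)

print(bin_edges)
print(np.diff(bin_edges))  # width of each bin
```

Ten bins need eleven edges, which is why num is n_equal_bins + 1.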
Visualizing Histograms with Matplotlib and Pandas
Now that you’ve seen how to build a histogram in Python from the ground up, let’s see how other Python packages can do the job for you. Matplotlib provides the functionality to visualize Python histograms out of the box with pyplot.hist(), a versatile wrapper around NumPy’s histogram().
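A minimal plotting sketch with Matplotlib. The sample, the color, the labels, and the choice of bins="auto" are assumptions made for illustration; the Agg backend is selected only so the code runs without a display:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(seed=444)
d = rng.laplace(loc=15, scale=3, size=500)

fig, ax = plt.subplots()
# hist() returns the counts, the bin edges, and the drawn bar patches.
n, bins, patches = ax.hist(d, bins="auto", color="#0504aa", alpha=0.7, rwidth=0.85)
ax.grid(axis="y", alpha=0.75)
ax.set_xlabel("Value")
ax.set_ylabel("Frequency")
ax.set_title("Histogram of a Laplace sample")

# fig.savefig("laplace_hist.png")  # uncomment to write the figure to disk
```

The returned n and bins are the same kind of counts and edges that np.histogram() produces.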
As defined earlier, a plot of a histogram uses its bin edges on the x-axis and the corresponding frequencies on the y-axis. In the chart above, passing
Staying in Python’s scientific stack, Pandas’ Series.hist() and DataFrame.hist() methods draw histograms of the input data through Matplotlib.
Plotting a Kernel Density Estimate (KDE)
In this tutorial, you’ve been working with samples, statistically speaking. Whether the data is discrete or continuous, it’s assumed to be derived from a population that has a true, exact distribution described by just a few parameters.
Kernel density estimation (KDE) is a way to estimate the probability density function (PDF) of the random variable that “underlies” our sample. KDE is a means of data smoothing.
Sticking with the Pandas library, you can create and overlay density plots using plot.kde(), which is available for both Series and DataFrame objects.
Now, to plot each histogram on the same Matplotlib axes:
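A sketch of overlaying density curves and histograms on one set of axes with Pandas. The two normal samples, their means and standard deviations, and the column names are illustrative assumptions; plot.kde() also requires SciPy to be installed:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=444)
means = 10, 20
stdevs = 4, 2
# Two columns, each drawn from its own normal distribution.
dist = pd.DataFrame(
    rng.normal(loc=means, scale=stdevs, size=(1000, 2)),
    columns=["a", "b"],
)

fig, ax = plt.subplots()
dist.plot.kde(ax=ax, legend=False, title="Histogram: A vs. B")
# density=True rescales the bars so they are comparable to the KDE curves.
dist.plot.hist(density=True, ax=ax)
ax.set_ylabel("Probability")
ax.grid(axis="y")
```

Passing the same ax to both calls is what places the curves and the bars on a shared plot.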
These methods leverage SciPy’s gaussian_kde(). If you take a closer look at this function, you can see how well it approximates the “true” PDF for a relatively small sample of 1000 data points. Below, you can first build the “analytical” distribution with a distribution object from scipy.stats. Building from there, you can take a random sample of 1000 data points from this distribution, then attempt to back into an estimation of the PDF with scipy.stats.gaussian_kde().
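The comparison could be sketched as follows. The standard normal distribution, the seed, and the evaluation grid are assumptions chosen for illustration, since the text does not pin them down:

```python
import numpy as np
from scipy import stats

# An "analytical" distribution with a known PDF (standard normal, assumed here).
dist = stats.norm()

# A random sample of 1000 data points from that distribution.
samp = dist.rvs(size=1000, random_state=444)

# Grid of x values on which both densities are evaluated.
x = np.linspace(start=-5, stop=5, num=301)

# Estimate the PDF from the sample alone.
gkde = stats.gaussian_kde(dataset=samp)

true_pdf = dist.pdf(x)
est_pdf = gkde.evaluate(x)

# Largest pointwise gap between the estimate and the true density.
max_abs_error = np.abs(true_pdf - est_pdf).max()
print(max_abs_error)
```

Plotting true_pdf and est_pdf against x on the same axes makes the quality of the approximation easy to judge by eye.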
This is a bigger chunk of code, so let’s take a second to touch on a few key lines:
A Fancy Alternative with Seaborn
Let’s bring one more Python package into the mix. Seaborn has a function that plots the histogram and KDE for a univariate distribution in one step.
The call above produces a KDE. There is also an option to fit a specific distribution to the data. This is different from a KDE and consists of parameter estimation for generic data and a specified distribution name.
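To make the idea of parameter estimation concrete, here is a sketch that uses SciPy directly rather than Seaborn. This is a stand-in technique, not the Seaborn call the text refers to, and the normal distribution, its parameters, and the seed are assumptions:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(seed=444)
# Pretend this is empirical data of unknown origin.
data = rng.normal(loc=5.0, scale=2.0, size=2000)

# Fit a named distribution: estimate the normal's location and scale
# by maximum likelihood.
loc_hat, scale_hat = stats.norm.fit(data)
print(loc_hat, scale_hat)

# The fitted distribution can then be evaluated like any other.
fitted = stats.norm(loc=loc_hat, scale=scale_hat)
print(fitted.pdf(5.0))
```

Unlike a KDE, the result here is just two numbers that plug into a known formula for the density.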
Again, note the slight difference. In the first case, you’re estimating some unknown PDF; in the second, you’re taking a known distribution and finding what parameters best describe it given the empirical data.
Other Tools in Pandas
In addition to its plotting tools, Pandas also offers a convenient value_counts() method that tabulates how often each non-null value occurs in a Series.
Elsewhere, pandas.cut() is a convenient way to bin values into arbitrary intervals.
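A brief sketch of two Pandas tools for tabulating and binning, value_counts() and cut(). The sample values, bin boundaries, and labels are made up for illustration:

```python
import pandas as pd

# Frequency table of a discrete sample.
data = pd.Series([0, 1, 1, 1, 2, 3, 7, 7, 23])
counts = data.value_counts()
print(counts)

# Bin values into arbitrary, unequal intervals.
ages = pd.Series([1, 1, 3, 5, 8, 10, 12, 15, 18, 18, 19, 20, 25, 30, 40, 51, 52])
bins = (0, 10, 13, 18, 21, 200)
labels = ("child", "preteen", "teen", "young_adult", "adult")

# By default each interval is closed on the right, for example (0, 10].
groups = pd.cut(ages, bins=bins, labels=labels)
group_counts = groups.value_counts()
print(group_counts)
```

Chaining cut() with value_counts() yields a histogram over custom bins without any plotting.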
What’s nice is that both of these operations ultimately utilize Cython code that makes them competitive on speed while maintaining their flexibility.
Alright, So Which Should I Use?
At this point, you’ve seen more than a handful of functions and methods to choose from for plotting a Python histogram. How do they compare? In short, there is no “one-size-fits-all.” Here’s a recap of the functions and methods you’ve covered thus far, all of which relate to breaking down and representing distributions in Python:
You can also find the code snippets from this article together in one script at the Real Python materials page. With that, good luck creating histograms in the wild. Hopefully one of the tools above will suit your needs. Whatever you do, just don’t use a pie chart.
Which method is used to create a histogram in Python?
The matplotlib.pyplot.hist() function is used to compute and plot a histogram of x.
Which function is used to create a histogram?
In the R programming language, a histogram can be created using the hist() function. This function takes in a vector of values for which the histogram is plotted.
What does hist() do?
hist(x) creates a histogram bar chart of the elements in vector x. The elements in x are sorted into 10 equally spaced bins along the x-axis between the minimum and maximum values of x. hist displays bins as rectangles, such that the height of each rectangle indicates the number of elements in the bin.