How does Python Numpy calculate variance?
Example (Code #1)

Output:
    arr : [20, 2, 7, 1, 34]
    var of arr : 158.16
    var of arr : 158.16
    var of arr : 158.16
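The code that produced this output did not survive in the text, so here is a minimal sketch that computes the same variance for the same list. It assumes the three printed results came from calling np.var on the same input with different dtype settings; the specific dtype choices below are an assumption, not something the text confirms.

```python
import numpy as np

arr = [20, 2, 7, 1, 34]

# np.var accepts array-like input such as a plain Python list.
v_default = np.var(arr)

# Assumed variations: the same call with an explicit dtype for the computation.
v_float32 = np.var(arr, dtype=np.float32)
v_float64 = np.var(arr, dtype=np.float64)

print("arr :", arr)
print("var of arr :", v_default)
print("var of arr :", v_float32)
print("var of arr :", v_float64)
```

All three calls describe the same population variance; only the floating point precision used for the arithmetic differs.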
Output:
    var of arr, axis = None : 236.14000000000004
    var of arr, axis = 0 : [ 57.1875 312.75 345.6875 9.25 0. ]
    var of arr, axis = 1 : [ 0. 77.04 421.84 269.04]

This tutorial will explain how to use the Numpy variance function (AKA, np.var).

In the tutorial, I'll do a few things. I'll give you a quick overview of the Numpy variance function and what it does. I'll explain the syntax. And I'll show you clear, step-by-step examples of how we can use np.var to compute variance with Numpy arrays.

Each of those topics is handled in a separate section.
Having said that, if you're new to Numpy, or need a quick refresher on what "variance" is, you should probably read the whole tutorial.

A Quick Introduction to Numpy Variance

First of all, let's start with what the variance function does.

The Numpy variance function calculates the variance of values in a Numpy array. At a high level, that's really all it does.

To help you understand this though, let's do a quick review of what variance is, as well as a review of Numpy arrays.

Variance Measures the Dispersion of a Set of Numbers

In statistics, variance is a measure of the dispersion of a set of numbers. Another way of saying this is that variance measures how spread out a set of numbers is.

Specifically, the variance of a set of numbers is the average of the squared deviations from the mean. So if we have a dataset with N numbers, we can compute variance as follows:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \tag{1}$$

Where:
$x_i$ = the individual values in the dataset
$N$ = the number of values in the dataset
$\mu$ = the mean of the values

To be honest, computing variance by hand is sort of a pain. This is especially true when we have a large amount of numbers. Thankfully, we can use computers to compute variance for us.

Numpy Variance Computes the Variance on Numpy Arrays

In particular, when you're using the Python programming language, you can use the np.var function to calculate variance.

Let's quickly review Numpy and Numpy arrays.

Numpy is a Python Package for Working with Numeric Data Organized in Arrays

Numpy is a package for working with numeric data. For the most part, Numpy operates on a data structure called a Numpy array. A Numpy array is a row-and-column data structure that contains numeric data.

So obviously, we can use Numpy arrays to store numeric data. But Numpy also has a variety of functions for operating on Numpy arrays. For example, we have tools like Numpy power, which calculates exponents, and Numpy log, which calculates the natural logarithm.
We have tools for reshaping Numpy arrays, like the Numpy reshape method.

And then we have statistical functions that compute statistics on the numbers in an array. For example, we have functions like Numpy mean and Numpy max, which compute the mean and maximum, respectively. We have Numpy standard deviation, which computes the standard deviation. And of course, we have Numpy variance, which, as I've stated, computes the variance.

The point here is that Numpy is a toolkit for working with data that's organized in Numpy arrays, and Numpy variance is one of those tools.

Having said that, the specifics of how these functions work depend on the syntax. So to understand Numpy variance in detail, you need to understand the syntax. Let's take a look at the syntax for np.var.

The syntax of np.var

The syntax of the Numpy variance function is fairly straightforward, but there are a few important details. Let's start with the basics.

A quick note: the exact syntax depends on how you import Numpy

One important thing that you need to know is that the exact syntax depends on how you've imported Numpy. (Remember: before you can use the Numpy package, you need to import Numpy into your code.)

Among Python programmers and data scientists, the common convention is to import Numpy with the alias 'np'. You can do that with the following code:

    import numpy as np

If you import Numpy with this alias, you can call the Numpy variance function as np.var().

Ok, that being said, let's take a closer look at the syntax.

np.var syntax

At a high level, the syntax for np.var looks something like this:

    np.var(a, axis, dtype, out, ddof, keepdims)

We typically call the function as np.var(). Then inside of the parentheses, there are several parameters that control the exact behavior of the function. Let's look at those parameters.

The parameters of np.var

There are a few important parameters you should know for the np.var function:
a
axis
dtype
ddof
out
keepdims
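As a quick orientation before going through the parameters individually, here is a minimal sketch of two calls: one that passes only the input, and one that passes several of the parameters this tutorial discusses as keyword arguments. The array `data` is hypothetical example data, not something from the tutorial.

```python
import numpy as np

# Hypothetical example data: 2 rows, 3 columns.
data = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

# Only the input is required; every other parameter is optional.
overall = np.var(data)

# The same function with several parameters spelled out as keywords.
per_column = np.var(a=data, axis=0, dtype=np.float64, ddof=0, keepdims=False)

print(overall)
print(per_column)
```

With axis=0 the result has one variance per column, so it has 3 entries for this 2-by-3 input.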
Let's take a look at these one at a time.

a (required)

First of all, we have the a parameter. The a parameter enables you to specify the input to the function.

This input can actually take a few possible forms. You can provide a Numpy array as the argument to this parameter, but you can also use "array like" objects. These include Python lists and similar Python sequences.

Keep in mind that you must provide an argument to this parameter (since the argument to this parameter is the input to the function).

Having said that, you don't need to explicitly use this parameter. That is, you can pass the input array as the first positional argument instead of typing a = explicitly. I'll show you examples of this in example 1.

axis (optional)

The axis parameter enables you to specify an axis along which the variance will be computed.

If you don't understand axes, then I recommend that you read our tutorial about Numpy axes. Having said that, here's a quick overview.

Numpy arrays have axes. Axes are like directions (much like the x and y axes in a Cartesian space).

By default, if we don't use the axis parameter, np.var computes the variance of all of the values in the array. However, if you do use the axis parameter, np.var computes the variance along that axis. This enables you to compute things like the row variances and column variances.

As I said earlier, to really understand this, you need to understand how axes work. That said, I'll show you some examples in the examples section, to help you understand.

dtype (optional)

The dtype parameter enables you to specify the data type that's used when computing the variance.

If the values in the input array are floats, then the computation uses the same float type as the input. If the values in the input array are integers, then the computation uses float64. Otherwise, you can manually specify an alternative datatype using the dtype parameter.

ddof (optional)

The ddof parameter enables you to modify the degrees of freedom used in the variance calculation.

To understand this, it helps to look again at exactly how we compute variance.
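$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

Notice that when we calculate variance for a population, the first term in the equation is $\frac{1}{N}$. Here, $N$ would be the total number of values in your Numpy array or dataset.

But in statistics, the computation changes slightly when we calculate the variance for a sample. When we calculate the variance for a sample, we replace $\frac{1}{N}$ with $\frac{1}{n - 1}$, where $n$ is the number of elements in the sample and $\bar{x}$ is the sample mean, such that:

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Ultimately, what you need to understand is that the variance calculation is typically different for a population vs a sample.

Now back to Numpy …

To implement this in Numpy, we can use the ddof parameter. This enables you to specify the "degrees of freedom" for the calculation. When we use ddof, the divisor in the calculation becomes $N - \text{ddof}$, where $N$ is the number of values in the array and $\bar{x}$ is their mean:

$$\frac{1}{N - \text{ddof}} \sum_{i=1}^{N} (x_i - \bar{x})^2 \tag{2}$$

By default, ddof = 0, which gives the population variance. And you can set it to another value, such as ddof = 1 for the sample variance.

If I'm being honest, this is a little complex and difficult to understand. If you're interested in learning more about this, you can check out the Khan Academy material on the topic.

out (optional)

The out parameter enables you to specify an alternative array in which to put the output. This output array should have the same shape as the expected output.

keepdims (optional)

The keepdims parameter enables you to keep the original number of dimensions of the input in the output.

To understand this, think about what happens when we use the Numpy variance function. When we use np.var, we're taking an input array, possibly with multiple dimensions, and we're summarizing that multi-dimensional structure down to a single number as an output.

So if your input array has 2 dimensions, and you use np.var with the default settings, it's going to produce a floating point number (a scalar value) as an output. The input might have multiple dimensions, but the output will be a scalar.

But what if you want to have the same number of dimensions in the output? What if you want that output number to be formatted as a 2-dimensional Numpy array (so that the output has the same number of dimensions as the input)?

You can do that with the keepdims parameter. By default, keepdims is set to False, so the output does not keep the dimensions of the input. But if you set keepdims = True, the output will have the same number of dimensions as the input.

To be clear, this is a little hard to understand without an example. So let's move on to some examples of Numpy variance.

Examples: how to compute the variance with Numpy

Here, we'll work through a few examples of the Numpy variance function. We'll start with a simple example and then increase the complexity from there.

Examples:
Example 1: Calculate variance of a 1 dimensional array
Example 2: Calculate the variance of a 2-dimensional array
Example 3: Compute the variance of the columns
Example 4: Use np.var to compute the variances of the rows
Example 5: Change the degrees of freedom
Example 6: Use the keepdims parameter in np.var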
Run this code first

Before you run any of these examples, you need to import the Numpy module. You can do that with the following code:

    import numpy as np

This will import Numpy with the alias "np".

EXAMPLE 1: Calculate variance of a 1 dimensional array

In this example, we'll simply calculate the variance of a 1 dimensional Numpy array.

Create 1D array

First, we'll create our 1-dimensional array:

    array_1d = np.array([12, 14, 99, 72, 42, 55, 72])

Calculate variance

Next, we'll calculate the variance of the numbers in the array.

    np.var(array_1d)

    OUT: 880.2040816326529

Explanation

Here, Numpy variance computed the variance of the numbers in array_1d.

In this example, we didn't explicitly use the a = parameter. That's because np.var treats the first positional argument as the input for the a parameter.

Having said that, it's also possible to explicitly use the a = parameter:

    np.var(a = array_1d)

    OUT: 880.2040816326529

EXAMPLE 2: Calculate the variance of a 2-dimensional array

Next, in this example, we'll calculate the variance of a 2-dimensional Numpy array.

Create 2-dimensional array

First, we'll create a 2D array of integers with Numpy random randint.

    np.random.seed(22)
    array_2d = np.random.randint(size = (3, 4), low = 0, high = 20)

This Numpy array has 3 rows and 4 columns. Let's also print out the array, so we can see the contents:

    print(array_2d)

    OUT:
    [[ 4 12  0  4]
     [ 6 11  8  4]
     [18 14 13  7]]

As you can see, array_2d is a 2-dimensional array that contains 12 random integers.

Now, let's compute the variance of the values in the array.

    np.var(array_2d)

    OUT: 25.07638888888889

Explanation

When we use numpy.var on a 2-dimensional or multi-dimensional array, then by default, it computes the variance of all of the values. So in this case, np.var is computing the variance of all 12 integers, and the variance is 25.07638888888889.

EXAMPLE 3: Compute the variance of the columns

Now, in this example, we'll compute the variance of the columns.

How? To do this, we need to use the axis parameter.

Remember what I mentioned in the section explaining the axis parameter. Numpy axes are like directions along a Numpy array. And specifically, for a 2D array, axis 0 is the axis that points downwards. So to calculate the column variance, we need to set axis = 0.

Let's take a look.

Create 2-dimensional array

First, we'll create our 2D array. (Note that this is the same array that we created in example 2, so if you already created it there, then you don't need to re-run this code.)

    np.random.seed(22)
    array_2d = np.random.randint(size = (3, 4), low = 0, high = 20)

And let's quickly print it out, so you can see the contents.

    print(array_2d)

    OUT:
    [[ 4 12  0  4]
     [ 6 11  8  4]
     [18 14 13  7]]

Again, this is a 3 by 4 array with 12 random integers.

Use np.var to compute the variance of the columns

Next, we'll use Numpy variance with axis = 0.
    np.var(array_2d, axis = 0)

    OUT: array([38.22222222, 1.55555556, 28.66666667, 2.])

Explanation

Here, we computed the variance in the axis-0 direction. Effectively, this causes Numpy var to compute the column variances.

Next, let's compute the row variances.

EXAMPLE 4: Use np.var to compute the variances of the rows

Here, we're going to use the np.var technique to compute the row variances. That is, we're going to compute the variance along the axis-1 direction.

Again, remember what I said earlier: Numpy axes are like directions. And in a 2D array, axis-1 points horizontally, across the columns. So to compute the variance in this direction, we need to set axis = 1.

Let's take a look.

Create 2-dimensional array

Again, here we'll quickly create a 2D array. (This is the same array that we created in example 2, so if you already created it, you shouldn't need to create it again.)

    np.random.seed(22)
    array_2d = np.random.randint(size = (3, 4), low = 0, high = 20)

And let's print it out:

    print(array_2d)

    OUT:
    [[ 4 12  0  4]
     [ 6 11  8  4]
     [18 14 13  7]]

Again, this is just a 2-dimensional array with 3 rows and 4 columns that contains random integers.

Use np.var to compute variances of the rows

Now that we have our array, we'll compute the row variances. To do this, we'll call np.var with axis = 1.

    np.var(array_2d, axis = 1)

    OUT: array([19. , 6.6875 , 15.5 ])

Explanation

This example is very similar to example 3, except here, we're setting axis = 1. When we use Numpy variance with axis = 1, it computes the variance in the axis-1 direction. Effectively, this computes the row variances.

EXAMPLE 5: Change the degrees of freedom

In this example, we're going to change the degrees of freedom.

Remember what I said in the section about the ddof parameter. When we compute a population variance, we typically set the degrees of freedom to 0 (i.e., ddof = 0). However, when we compute a sample variance we typically need to set the degrees of freedom to 1. To do that with Numpy variance, we need to explicitly set ddof = 1.

I'll show you how in this example. We're going to first create a large array of numbers. Then we'll take a random sample from that array, and compute the variance of that sample.

Create Numpy array

First, let's create our "population" array. We'll create an array of 100 normally distributed numbers with a mean of 0 and standard deviation of 10.

    np.random.seed(22)
    population_array = np.random.normal(size = 100, loc = 0, scale = 10)

Create sample

Next, we'll take a random sample of 10 items from population_array.

    np.random.seed(22)
    sample_array = np.random.choice(population_array, size = 10)

So now, sample_array contains 10 values drawn at random from population_array.

Calculate the variance of the sample

Next, we'll use Numpy variance to calculate the variance of the sample. Again, remember what I said earlier: when we compute a sample variance, we typically need to set the degrees of freedom to 1. So here, we're going to call np.var with ddof = 1.

    np.var(sample_array, ddof = 1)

    OUT: 40.14434256384447

Explanation

Here, we've calculated:

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

So if we set ddof = 1, the divisor in the calculation becomes $n - 1$ instead of $n$, which gives the sample variance.
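To make the effect of ddof concrete, here is a minimal sketch that computes both divisors by hand and compares them with np.var. The array `sample` is hypothetical data chosen so the arithmetic is easy to follow; it is not the sample from this example.

```python
import numpy as np

# Hypothetical sample: mean is 5, squared deviations sum to 32.
sample = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = sample.size

squared_devs = (sample - sample.mean()) ** 2

# Divisor n corresponds to ddof = 0 (population variance).
population_var = squared_devs.sum() / n

# Divisor n - 1 corresponds to ddof = 1 (sample variance).
sample_var = squared_devs.sum() / (n - 1)

print(population_var, np.var(sample))
print(sample_var, np.var(sample, ddof=1))
```

The hand-computed values and the np.var results should agree pairwise, and the ddof = 1 result is always the larger of the two because it divides the same sum by a smaller number.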
I realize that this might be a little confusing, and the reason why we do this is a little technical. To understand this better, I recommend that you watch the lesson on sample variance from Khan Academy.

EXAMPLE 6: Use the keepdims parameter in np.var

Finally, I'll show you how to use the keepdims parameter.

Remember that by default, the keepdims parameter is set to False, so the output does not keep the dimensions of the input. But here in this example, we're going to change that and set keepdims = True.

Create 2-dimensional array

First, we'll create a 2D array of random integers. (This is the same array that we created in example 2, so if you already created it there, you don't need to create it again.)

    np.random.seed(22)
    array_2d = np.random.randint(size = (3, 4), low = 0, high = 20)

And we can print it to look at the contents:

    print(array_2d)

    OUT:
    [[ 4 12  0  4]
     [ 6 11  8  4]
     [18 14 13  7]]

Check the dimensions

Now that we have our array, let's check the dimensions. This is important because the keepdims parameter controls the number of dimensions in the output.

    array_2d.ndim

    OUT: 2

As you can see, this array, array_2d, has 2 dimensions.

Next, let's just compute the variance without using keepdims.

    output = np.var(array_2d)

Just for reference, let's print the output:

    print(output)

    OUT: 25.07638888888889

The variance of the input array is 25.07638888888889.

Now, let's check the dimensions of the output.

    output.ndim

    OUT: 0

Do you see that? The output has 0 dimensions.

Why? When we use np.var with keepdims left at its default, the output is a scalar, which has no dimensions.

But what if we want to force the output to have 2 dimensions? To do that, we can use the keepdims parameter.

Keep the original dimensions when we use np.var

In the following code, we're going to use np.var and set keepdims = True.

    output_2d = np.var(array_2d, keepdims = True)

And let's print the output:

    print(output_2d)

    OUT: [[25.07638889]]

Notice that the value of the output (the variance) is the same. The variance is 25.07638889. But the value is enclosed inside of double brackets.

Why? Let's inspect the output to take a closer look.

    type(output_2d)

    OUT: numpy.ndarray

Ok. So immediately, we can see that output_2d is a Numpy array. Now that we know it's a Numpy array, let's check the dimensions:

    output_2d.ndim

    OUT: 2

Notice then that output_2d has 2 dimensions, just like the input.

So what happened here? We called Numpy variance with keepdims = True, and this forced the output to have the same number of dimensions as the input.

I've kept this simple for the sake of clarity, but it might not be immediately obvious why we would do this. As an exercise, you should try running this code with the axis parameter as well.

Ultimately, if you ever need your output to have the same number of dimensions as the input, you can do that by setting keepdims = True.

Frequently asked questions about Numpy variance

Now that you've learned how to use the Numpy variance function, let's look at a common question.
Question 1: Why isn't np.var calculating the sample variance properly?

As I noted earlier, by default, the Numpy variance function calculates the population variance:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

But to calculate the sample variance, you need to compute:

$$s^2 = \frac{1}{n - 1} \sum_{i=1}^{n} (x_i - \bar{x})^2$$

Notice the difference in the leading fraction. Essentially, you need to compute the variance with the leading term set to $\frac{1}{n - 1}$.

To do this, you need to run np.var with the ddof parameter set to ddof = 1. For a full explanation, check out example 5 in this tutorial.

Is it easy to calculate variance in Python?

It is very easy to calculate variance in Python. With Numpy it is even easier. There is a dedicated function in the Numpy module to calculate variance.
How to calculate the variance of a NumPy array?

One can calculate the variance by using the numpy.var() function in Python.

dtype: Type to use in computing the variance.
out: Alternate output array in which to place the result.
keepdims: If this is set to True, the axes which are reduced are left in the result as dimensions with size one.

Standard deviation is the square root of variance.
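Two of the statements above are easy to check directly. The sketch below reuses the arrays printed in examples 1 and 2 of this tutorial (typed in by hand rather than regenerated) to show that np.std matches the square root of np.var, and that keepdims = True leaves the reduced axis in place with size one.

```python
import numpy as np

# Array from example 1.
values = np.array([12, 14, 99, 72, 42, 55, 72])

variance = np.var(values)
std_dev = np.std(values)

# Standard deviation is the square root of variance.
print(np.isclose(std_dev, np.sqrt(variance)))

# Array printed in example 2, entered by hand.
array_2d = np.array([[4, 12, 0, 4],
                     [6, 11, 8, 4],
                     [18, 14, 13, 7]])

# Reducing along axis 1 normally drops that axis ...
row_vars = np.var(array_2d, axis=1)

# ... but keepdims=True keeps it as a dimension of size one.
row_vars_kept = np.var(array_2d, axis=1, keepdims=True)

print(row_vars.shape)
print(row_vars_kept.shape)
```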
What is NumPy in Python?

It is the fundamental package for scientific computing with Python. Numpy provides very easy methods to calculate the average, variance, and standard deviation.
What is the return type of the NumPy var() function?

The Numpy var() function returns the variance of the data elements of the input array. If out=None, it returns a new array containing the variance; otherwise, a reference to the output array is returned.
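Here is a minimal sketch of both return behaviours. The name `buffer` is a hypothetical preallocated array introduced for illustration, and the input is the array printed in example 2, entered by hand.

```python
import numpy as np

data = np.array([[4, 12, 0, 4],
                 [6, 11, 8, 4],
                 [18, 14, 13, 7]])

# out=None (the default): a new array is returned.
result_new = np.var(data, axis=0)

# With out=..., the result is written into the supplied array.
# Its shape must match the expected output: one value per column here.
buffer = np.empty(4, dtype=np.float64)
result_ref = np.var(data, axis=0, out=buffer)

print(result_ref is buffer)
print(buffer)

# Without an axis, the result is a Numpy floating point scalar.
print(type(np.var(data)))
```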
How do you calculate variance in Python?

Steps to finding variance:
1. Find the mean of the set of data.
2. Subtract the mean from each number.
3. Square each result.
4. Add the results together.
5. Divide the sum by the total number of numbers in the data set.

How does Python NumPy calculate standard deviation?

The standard deviation is the square root of the average of the squared deviations from the mean, i.e., std = sqrt(mean(x)), where x = abs(a - a.mean())**2. The average squared deviation is typically calculated as x.sum() / N, where N = len(x).
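The five steps and the standard deviation formula above can be sketched directly with array operations. This is an illustration of the arithmetic, not a copy of Numpy's internal implementation; the input is the list from the first example on this page.

```python
import numpy as np

a = np.array([20, 2, 7, 1, 34])
N = len(a)

mean = a.sum() / N                 # step 1: find the mean
deviations = a - mean              # step 2: subtract the mean from each number
x = np.abs(deviations) ** 2        # step 3: square each result
total = x.sum()                    # step 4: add the results together
variance = total / N               # step 5: divide by the number of values

# Standard deviation: square root of the average squared deviation.
std = np.sqrt(variance)

print(variance, np.var(a))
print(std, np.std(a))
```

Each printed pair should agree, which shows that np.var and np.std with default settings follow the population formulas described above.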
How is variance calculated?

The variance is a measure of variability. It is calculated by taking the average of squared deviations from the mean. Variance tells you the degree of spread in your data set. The more spread out the data, the larger the variance is in relation to the mean.
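How does NumPy calculate mode?

NumPy itself does not provide a mode function. One way to find the mode of a NumPy array in Python is to use the stats module from SciPy:

    from scipy import stats
    print(array)
    mode_info = stats.mode(array)
    print(mode_info[0])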
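If SciPy is not available, the mode can also be found with Numpy alone. The sketch below is an alternative technique based on np.unique, not the stats.mode call named above, and `array` is hypothetical example data.

```python
import numpy as np

# Hypothetical example data; 3 appears most often.
array = np.array([1, 3, 3, 7, 3, 7, 9])

# np.unique returns the sorted distinct values and how often each occurs.
values, counts = np.unique(array, return_counts=True)

# The mode is the value with the highest count.
# On ties, argmax picks the first maximum, i.e. the smallest tied value.
mode_value = values[np.argmax(counts)]

print(mode_value)
```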