How to generate covariance and correlation table in python

In this short guide, I’ll show you how to create a Correlation Matrix using Pandas. I’ll also review the steps to display the matrix using Seaborn and Matplotlib.

To start, here is a template that you can apply in order to create a correlation matrix using pandas:

df.corr()

Next, I’ll show you an example with the steps to create a correlation matrix for a given dataset.

Step 1: Collect the Data

Firstly, collect the data that will be used for the correlation matrix.

For example, I collected the following data about 3 variables:

A B C
45 38 10
37 31 15
42 26 17
35 28 21
39 33 12

Step 2: Create a DataFrame using Pandas

Next, create a DataFrame in order to capture the above dataset in Python:

import pandas as pd

data = {'A': [45,37,42,35,39],
        'B': [38,31,26,28,33],
        'C': [10,15,17,21,12]
        }

df = pd.DataFrame(data, columns=['A', 'B', 'C'])
print(df)

Once you run the code, you’ll get the following DataFrame:

    A   B   C
0  45  38  10
1  37  31  15
2  42  26  17
3  35  28  21
4  39  33  12

Step 3: Create a Correlation Matrix using Pandas

Now, create a correlation matrix using this template:

df.corr()

This is the complete Python code that you can use to create the correlation matrix for our example:

import pandas as pd

data = {'A': [45,37,42,35,39],
        'B': [38,31,26,28,33],
        'C': [10,15,17,21,12]
        }

df = pd.DataFrame(data, columns=['A', 'B', 'C'])

corrMatrix = df.corr()
print(corrMatrix)

Run the code in Python, and you’ll get the following matrix:

          A         B         C
A  1.000000  0.518457 -0.701886
B  0.518457  1.000000 -0.860941
C -0.701886 -0.860941  1.000000

Step 4 (optional): Get a Visual Representation of the Correlation Matrix using Seaborn and Matplotlib

You can use the seaborn and matplotlib packages in order to get a visual representation of the correlation matrix.

First import the seaborn and matplotlib packages:

import seaborn as sn
import matplotlib.pyplot as plt

Then, add the following syntax at the bottom of the code:

sn.heatmap(corrMatrix, annot=True)
plt.show()

So the complete Python code would look like this:

import pandas as pd
import seaborn as sn
import matplotlib.pyplot as plt

data = {'A': [45,37,42,35,39],
        'B': [38,31,26,28,33],
        'C': [10,15,17,21,12]
        }

df = pd.DataFrame(data, columns=['A', 'B', 'C'])

corrMatrix = df.corr()
sn.heatmap(corrMatrix, annot=True)
plt.show()

Run the code, and you’ll get a heatmap of the correlation matrix, with each cell annotated with its correlation coefficient.

That’s it! You may also want to review the following source that explains the steps to create a Confusion Matrix using Python. Alternatively, you may check this guide about creating a Covariance Matrix in Python.

A correlation matrix is a table containing correlation coefficients between variables. Each cell in the table represents the correlation between two variables. The value lies between -1 and 1. A correlation matrix is used to summarize data, as a diagnostic for advanced analyses and as an input into a more advanced analysis. The two key components of the correlation are:

  • Magnitude: the larger the magnitude, the stronger the correlation.
  • Sign: a positive value means the two variables tend to move in the same direction. A negative value means they tend to move in opposite directions (an inverse correlation).
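To make magnitude and sign concrete, here is a minimal sketch that computes the Pearson coefficient directly from its definition. The helper name pearson_r and the three small lists are illustrative assumptions, not part of the original examples.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation computed directly from its definition (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()   # deviations from the mean of x
    dy = y - y.mean()   # deviations from the mean of y
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Made-up data: "up" rises with a, "down" falls as a rises
a = [1, 2, 3, 4, 5]
up = [2, 4, 6, 8, 10]
down = [10, 8, 6, 4, 2]

r_up = pearson_r(a, up)      # perfectly linear, same direction: close to +1
r_down = pearson_r(a, down)  # perfectly linear, opposite direction: close to -1
print(r_up, r_down)
```

Both values have the largest possible magnitude; only the sign differs, which is what separates a regular correlation from an inverse one.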

A correlation matrix can be created using either of the following two libraries:

  1. Numpy Library
  2. Pandas Library

Method 1: Creating a correlation matrix using Numpy library

NumPy provides the corrcoef() function, which returns a 2×2 matrix when it is given two variables x and y. The matrix consists of the correlations of x with x (0,0), x with y (0,1), y with x (1,0) and y with y (1,1). The diagonal entries are always 1, so we are only concerned with the correlation of x with y, i.e. cell (0,1) or (1,0). See below for an example.

Example 1: Suppose an ice cream shop keeps track of total sales of ice creams versus the temperature on that day.

Python3

import numpy as np

# x: ice cream sales, y: temperature on the same day
x = [215, 325, 185, 332, 406, 522, 412,
     614, 544, 421, 445, 408]
y = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1,
     19.4, 25.1, 23.4, 18.1, 22.6, 17.2]

matrix = np.corrcoef(x, y)
print(matrix)

Output

[[1.         0.95750662]
 [0.95750662 1.        ]]

In the above matrix, cells (0,1) and (1,0) hold the same value, 0.95750662. This is a strong positive correlation, so days with higher temperatures tend to be days with higher sales.
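Since only the off-diagonal cell matters here, a short sketch of pulling it out as a single number may help. It reuses the data from the example above; the variable names sales and temperature are my own labels for x and y.

```python
import numpy as np

# Same data as the example above, with descriptive (assumed) names
sales = [215, 325, 185, 332, 406, 522, 412, 614, 544, 421, 445, 408]
temperature = [14.2, 16.4, 11.9, 15.2, 18.5, 22.1,
               19.4, 25.1, 23.4, 18.1, 22.6, 17.2]

matrix = np.corrcoef(sales, temperature)

# Cell (0, 1) is the correlation of the first input with the second;
# cell (1, 0) holds the same value because the matrix is symmetric.
r = matrix[0, 1]
print(round(r, 4))
```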

Example 2: Suppose we are given the glucose level in the body for people of different ages. Find the correlation between age (x) and glucose level (y).

Python3

import numpy as np

# x: age, y: glucose level
x = [43, 21, 25, 42, 57, 59]
y = [99, 65, 79, 75, 87, 81]

matrix = np.corrcoef(x, y)
print(matrix)

Output

[[1.        0.5298089]
 [0.5298089 1.       ]]

In the above correlation matrix, the coefficient is 0.5298089, which indicates a moderate positive correlation between age and glucose level.

Method 2: Creating correlation matrix using Pandas library 

To create a correlation matrix for a given dataset, call the corr() method on a DataFrame.

Example 1:

Python3

import pandas as pd

data = {
    'x': [45, 37, 42, 35, 39],
    'y': [38, 31, 26, 28, 33],
    'z': [10, 15, 17, 21, 12]
}

dataframe = pd.DataFrame(data, columns=['x', 'y', 'z'])
print("Dataframe is : ")
print(dataframe)

# Pairwise Pearson correlation between all columns
matrix = dataframe.corr()
print("Correlation matrix is : ")
print(matrix)

Output:

Dataframe is : 
    x   y   z
0  45  38  10
1  37  31  15
2  42  26  17
3  35  28  21
4  39  33  12
Correlation matrix is :
          x         y         z
x  1.000000  0.518457 -0.701886
y  0.518457  1.000000 -0.860941
z -0.701886 -0.860941  1.000000
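The corr() method computes the Pearson coefficient by default, and it also accepts a method argument. As a small sketch on the same data, the following compares the default with the rank-based Spearman coefficient, and checks it against ranking the columns first. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'x': [45, 37, 42, 35, 39],
    'y': [38, 31, 26, 28, 33],
    'z': [10, 15, 17, 21, 12],
})

pearson = df.corr()                     # default: method="pearson"
spearman = df.corr(method="spearman")   # correlation of the ranks

# Spearman is Pearson applied to the ranked data, so this should match
spearman_by_hand = df.rank().corr()

print(pearson)
print(spearman)
```

Spearman can be a better fit when the relationship is monotonic but not linear, or when outliers would dominate the Pearson value.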

Example 2:

The CSV file used here, sample.csv, has two numeric columns: AVG temp C and Ice Cream production.

Python3

import pandas as pd

# Load the CSV file into a DataFrame
dataframe = pd.read_csv("C:\\GFG\\sample.csv")
print(dataframe)

matrix = dataframe.corr()
print("Correlation Matrix is : ")
print(matrix)

Output:

Correlation Matrix is : 
                     AVG temp C  Ice Cream production
AVG temp C              1.000000              0.718032
Ice Cream production    0.718032              1.000000
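Real CSV files often contain text columns as well, and depending on the pandas version, calling corr() on such a frame may raise an error. One way to handle this, sketched below, is to keep only the numeric columns first. The CSV content here, including the Month column and every value in it, is made up for illustration and is not the sample.csv used above.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a real file on disk
csv_text = """Month,AVG temp C,Ice Cream production
Jan,4.0,120
Feb,6.5,135
Mar,11.0,160
Apr,16.5,210
May,21.0,260
"""

dataframe = pd.read_csv(io.StringIO(csv_text))

# Drop non-numeric columns (here, Month) before computing correlations
numeric = dataframe.select_dtypes(include="number")
matrix = numeric.corr()
print(matrix)
```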

How do you calculate covariance and correlation in Python?

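Use the cov() function, available as numpy.cov() in NumPy and as DataFrame.cov() in pandas. Covariance measures how two variables vary together: a positive value means they tend to increase together, and a negative value means one tends to decrease as the other increases. Unlike correlation, covariance is not scaled to the range -1 to 1, so its magnitude depends on the units of the variables. In the covariance matrix, element Cij is the covariance of xi and xj.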
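As a rough sketch of how covariance and correlation relate, the following computes a covariance matrix for the A, B, C data used earlier, then rescales it by the standard deviations to recover the correlation matrix. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'A': [45, 37, 42, 35, 39],
    'B': [38, 31, 26, 28, 33],
    'C': [10, 15, 17, 21, 12],
})

# Sample covariance matrix (both calls divide by n - 1 by default)
cov_matrix = df.cov()
np_cov = np.cov(df.to_numpy(), rowvar=False)  # columns are the variables

# The diagonal holds the variances; their square roots are the standard deviations
std = np.sqrt(np.diag(cov_matrix.to_numpy()))

# corr(i, j) = cov(i, j) / (std_i * std_j)
corr_from_cov = cov_matrix / np.outer(std, std)

print(cov_matrix)
print(corr_from_cov)
```

The rescaled matrix should agree with df.corr(), which is why correlation is often described as a normalized covariance.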

How do I make a correlation chart in Python?

Method 1: Creating a correlation matrix using the NumPy library. NumPy provides the corrcoef() function, which returns a 2×2 matrix when it is given two variables x and y. The matrix consists of the correlations of x with x (0,0), x with y (0,1), y with x (1,0) and y with y (1,1).

How do you create a correlation matrix in pandas?

Steps to Create a Correlation Matrix using Pandas:
Step 1: Collect the Data.
Step 2: Create a DataFrame using Pandas.
Step 3: Create a Correlation Matrix using Pandas.
Step 4 (optional): Get a Visual Representation of the Correlation Matrix using Seaborn and Matplotlib.

Which command is used to generate the correlation matrix in Python?

The Pearson correlation coefficient can be computed in Python using the corrcoef() function from NumPy. The input for this function is typically a matrix. By default (rowvar=True), each row represents a variable and each column represents a single observation of all the variables. If your data is laid out the other way, with one variable per column and one sample per row, pass rowvar=False.
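Because np.corrcoef() treats each row as a variable unless told otherwise, the layout of the input matters. The sketch below uses the A, B, C data from earlier arranged with one variable per column, and shows how the result changes with rowvar. The variable names are illustrative.

```python
import numpy as np

# 5 samples (rows) of 3 variables (columns)
data = np.array([
    [45, 38, 10],
    [37, 31, 15],
    [42, 26, 17],
    [35, 28, 21],
    [39, 33, 12],
])

# Treat columns as variables: a 3x3 matrix, matching what pandas' corr() gives
by_columns = np.corrcoef(data, rowvar=False)

# Transposing the data is an equivalent way to get the same result
by_transpose = np.corrcoef(data.T)

# Default behaviour treats each ROW as a variable: a 5x5 matrix,
# which is usually not what you want for this layout
by_rows = np.corrcoef(data)

print(by_columns.shape, by_rows.shape)
```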