Scatter plot for all variables python

I want to write a method that produces scatter plots for all independent variables in my dataset, but I get an error and I don't know why it appears in this case.

class DataAnalysis():
  def __init__(self, X_train, X_test):
    self.X_train = X_train # Train set
    self.X_test = X_test # Test set

  def multi_scatter(self, x_list, y):
    length = np.ceil(len(x_list)/3).astype(int)
    for x in range(0, length):
      fig, axs = plt.subplots(1,3, figsize = (20,10))
      fig.suptitle('Independent variables correlation with target')
      axs[0,0].scatter(self.X_train[x_list[x]], self.X_train[y])
      axs[0,0].set_title(x_list[x])
      axs[0,1].scatter(self.X_train[x_list[x+1]], self.X_train[y])
      axs[0,1].set_title(x_list[x+1])
      axs[0,2].scatter(self.X_train[x_list[x+2]], self.X_train[y])
      axs[0,2].set_title(x_list[x+2])
      x *= 3
      plt.show()

Here is the error I get:

IndexError                                Traceback (most recent call last)
 in ()
----> 1 analyser.multi_scatter(x_list=train_columns,y=target)

 in multi_scatter(self, x_list, y)
      9       fig, axs = plt.subplots(1,3, figsize = (20,10))
     10       fig.suptitle('Independent variables correlation with target')
---> 11       axs[0,0].scatter(ds_train['ExterQual'], ds_train['SalePrice'])
     12       axs[0,0].set_title(x_list[x])
     13       axs[0,1].scatter(self.X_train[x_list[x+1]], self.X_train[y])

IndexError: too many indices for array

Thank You in advance for Your help
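
Judging from the traceback, a likely cause is that plt.subplots(1, 3) returns a one-dimensional array of three Axes, so indexing it with two indices (axs[0,0]) raises "too many indices for array". Separately, reassigning x inside a for loop over range() has no effect on the next iteration, so the same columns would be plotted more than once. Below is a minimal sketch of one way to restructure the method as a plain function. The ncols parameter, the demo DataFrame and its column names are assumptions made up for illustration; they are not from the question.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def multi_scatter(df, x_list, y, ncols=3):
    """Draw one row of up to `ncols` scatter plots per figure; return the figures."""
    figures = []
    # step through the column names in chunks of `ncols`
    for start in range(0, len(x_list), ncols):
        chunk = x_list[start:start + ncols]
        # squeeze=False keeps axs two-dimensional, so axs[0, i] is always valid
        fig, axs = plt.subplots(1, ncols, figsize=(20, 10), squeeze=False)
        fig.suptitle('Independent variables correlation with target')
        for i, col in enumerate(chunk):
            axs[0, i].scatter(df[col], df[y])
            axs[0, i].set_title(col)
        # hide panels that have no column left to plot
        for j in range(len(chunk), ncols):
            axs[0, j].set_visible(False)
        figures.append(fig)
    return figures


# Made-up data standing in for the training set (hypothetical column names)
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(30, 5)),
                    columns=['a', 'b', 'c', 'd', 'target'])
figs = multi_scatter(demo, ['a', 'b', 'c', 'd'], 'target')
```

Call plt.show() afterwards to display the figures. Keeping the default squeeze behaviour and indexing with a single index (axs[0], axs[1], axs[2]) would also avoid the IndexError.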

A scatter plot is a graph in which the values of two variables are plotted along two axes. It is one of the most basic types of plot and helps you visualize the relationship between two variables.

Contents

  1. What is a Scatter plot?
  2. Basic Scatter plot in python
  3. Correlation with Scatter plot
  4. Changing the color of groups of points
  5. Changing the Color and Marker
  6. Scatter plot with Linear fit plot using seaborn
  7. Scatter Plot with Histograms using seaborn
  8. Bubble plot
  9. Exploratory Analysis using mtcars Dataset
    • Multiple lines of best fit
    • Adjusting color and style for different categories
    • Text Annotation in Scatter Plot
    • Bubble Plot with categorical variables
    • Categorical Plot

What is a Scatter plot?

A scatter plot is a graph of two sets of data plotted along two axes. It is used to visualize the relationship between the two variables.

If the values along the Y axis seem to increase (or decrease) as X increases, it could indicate a positive (or negative) linear relationship. If instead the points are randomly distributed with no obvious pattern, it could indicate that the two variables are not related.

In Python's matplotlib, a scatter plot can be created using either pyplot.plot() or pyplot.scatter(). With these functions you can customize the plot further, for example by changing the size, color or shape of the points.

So what is the difference between plt.scatter() and plt.plot()?

With pyplot.plot(), any property you set (color, shape, size of points) is applied to all the points in the chart, whereas pyplot.scatter() gives you more control over each point's appearance.

That is, in plt.scatter() the color and size of each dot (data point) can vary based on another variable, or even on the same variable (y). The marker shape, however, is still set once per plt.scatter() call.
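
The difference can be seen side by side in a minimal sketch. The data, seed and size formula below are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(42)
x = np.arange(20)
y = x + rng.integers(0, 10, size=20)

fig, (ax_plot, ax_scatter) = plt.subplots(1, 2, figsize=(10, 4))

# plot(): one color and one marker size shared by every point
lines = ax_plot.plot(x, y, 'o', color='tab:blue', markersize=6)
ax_plot.set_title('plot()')

# scatter(): color (c) and area (s) can be arrays with one entry per point
points = ax_scatter.scatter(x, y, c=y, s=10 + 5 * y, cmap='viridis')
ax_scatter.set_title('scatter()')
```

Call plt.show() to display the figure. In the right-hand panel both the color and the size of each dot follow its y value.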

First, I am going to import the libraries I will be using.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
plt.rcParams.update({'figure.figsize':(10,8), 'figure.dpi':100})

The plt.rcParams.update() function is used to change the default parameters of the plot's figure.
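
As a quick check of what this call does, the sketch below updates the defaults and then creates a new figure, which should pick up the new size. The values are the ones used above; nothing else is assumed.

```python
import matplotlib.pyplot as plt

# change the defaults used by every figure created from now on
plt.rcParams.update({'figure.figsize': (10, 8), 'figure.dpi': 100})

fig = plt.figure()             # created after the update, so it uses the new defaults
print(fig.get_size_inches())   # width and height in inches
print(plt.rcParams['figure.dpi'])
```

Figures that already exist are not resized, so it makes sense to update the defaults before plotting.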

Basic Scatter plot in python

First, let's create artificial data using np.random.randint(). You need to specify the number of points you require as an argument.

You can also specify the lower and upper limits of the random variable.

Then use the plt.scatter() function to draw a scatter plot using matplotlib. You need to specify the variables x and y as arguments.

plt.title() is used to set the title of your plot.

plt.xlabel() is used to label the x axis.

plt.ylabel() is used to label the y axis.

# Simple Scatterplot
plt.rcParams.update({'figure.figsize':(10,8), 'figure.dpi':100})
x = np.arange(50)
y = x + np.random.randint(0,30,50)  # y is x plus random noise
plt.scatter(x, y)
plt.title('Simple Scatter plot')
plt.xlabel('X - value')
plt.ylabel('Y - value')
plt.show()

You can see that there is a positive linear relation between the points. That is, as X increases, Y increases as well, because Y is just X plus a random number.

If you want the color of the points to vary depending on the value of Y (or another variable of the same size), specify the color each dot should take using the c argument.

Any other variable with the same number of values as X can be passed to c as well.

# Simple Scatterplot with colored points
plt.rcParams.update({'figure.figsize':(10,8), 'figure.dpi':100})
x = np.arange(50)
y = x + np.random.randint(0,30,50)
plt.scatter(x, y, c=y, cmap='Spectral')  # color each point by its y value
plt.colorbar()
plt.title('Simple Scatter plot')
plt.xlabel('X - value')
plt.ylabel('Y - value')
plt.show()

Let's create a dataset with an exponentially increasing relation and visualize it.

# Scatterplot of non-random variables
plt.rcParams.update({'figure.figsize':(10,8), 'figure.dpi':100})
x = np.arange(1,10,0.2)
y = np.exp(x)
plt.scatter(x,y)
plt.title('Exponential Relation dataset')
plt.show()

np.arange(lower_limit, upper_limit, interval) creates evenly spaced values starting at the lower limit and stopping before the upper limit, with a step of 'interval' between consecutive values.
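
A small sketch of how np.arange() behaves, using the same arguments as above plus an integer example added for illustration:

```python
import numpy as np

# integer step: starts at 0, stops before 10
evens = np.arange(0, 10, 2)
print(evens)

# float step, as used for the exponential dataset
values = np.arange(1, 10, 0.2)
print(values[:3], values[-1])
```

The upper limit itself is not included. With a non-integer step the exact number of elements can be affected by floating-point rounding, so np.linspace() is an alternative when a fixed number of points is needed.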

Now you can see that there is an exponential relation between x and y.

Correlation with Scatter plot

1) If the value of y increases with the value of x, then we can say that the variables have a positive correlation.

2) If the value of y decreases as the value of x increases, then we can say that the variables have a negative correlation.

3) If the value of y changes randomly, independent of x, then the variables are said to have zero correlation.

# Scatterplot and Correlations
# Data
x = np.random.randn(100)
y1 = x*5 + 9
y2 = -5*x
y3 = np.random.randn(100)

# Plot
plt.rcParams.update({'figure.figsize':(10,8), 'figure.dpi':100})
plt.scatter(x, y1, label=f'y1 Correlation = {np.round(np.corrcoef(x,y1)[0,1], 2)}')
plt.scatter(x, y2, label=f'y2 Correlation = {np.round(np.corrcoef(x,y2)[0,1], 2)}')
plt.scatter(x, y3, label=f'y3 Correlation = {np.round(np.corrcoef(x,y3)[0,1], 2)}')

# Decorate
plt.title('Scatterplot and Correlations')
plt.legend()
plt.show()

In the above graph, you can see that the blue points (y1) show a positive correlation, the orange points (y2) show a negative correlation, and the green points (y3) show no relation with the x values (they vary randomly, independent of x).

Changing the color of groups of points

Use the color='____' argument to set the color of the points in a scatter plot.

# Scatterplot - Color Change
x = np.random.randn(50)
y1 = np.random.randn(50)
y2 = np.random.randn(50)

# Plot
plt.rcParams.update({'figure.figsize':(10,8), 'figure.dpi':100})
plt.scatter(x,y1,color='blue')
plt.scatter(x,y2,color='red')

# Decorate
plt.title('Color Change')
plt.xlabel('X - value')
plt.ylabel('Y - value')
plt.show()

Changing the Color and Marker

Use the marker='____' argument to change the marker type in a scatter plot.

[‘.’,’o’,’v’,’^’,’>’,'
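
As a minimal sketch of combining the two arguments, the example below gives each group its own color and marker. The marker codes 'o' and '^' are standard matplotlib markers; the data, seed and labels are made up for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
x = rng.standard_normal(50)
y1 = rng.standard_normal(50)
y2 = rng.standard_normal(50)

fig, ax = plt.subplots()
# each scatter() call takes a single marker, so use one call per group
ax.scatter(x, y1, color='blue', marker='o', label='y1')
ax.scatter(x, y2, color='red', marker='^', label='y2')

# Decorate
ax.set_title('Color and Marker Change')
ax.set_xlabel('X - value')
ax.set_ylabel('Y - value')
ax.legend()
```

Call plt.show() to display the figure.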
