Calculate conditional probability in python

Use python to calculate the conditional probability of a student getting an A in math given they missed 10 or more classes.

This article has 2 parts:
1. Theory behind conditional probability
2. Example with python

Part 1: Theory and formula behind conditional probability

For once, wikipedia has an approachable definition,

In probability theory, conditional probability is a measure of the probability of an event occurring given that another event has [by assumption, presumption, assertion or evidence] occurred.

Translation: given B is true, what is the probability that A is also true.

It’s easier to understand something with concrete examples. Below are a few random examples of conditional probabilities we could calculate.

Examples:

What’s the probability of someone sleeping less than 8 hours if they’re a college student.
What’s the probability of a dog living longer than 15 years if they’re a border collie.
What’s the probability of using all your vacation days if you work for the government.

Formula:

The formula for conditional probability is P[A|B] = P[A ∩ B] / P[B].

The parts:
P[A|B] = probability of A occurring, given B occurs
P[A ∩ B] = probability of both A and B occurring
P[B] = probability of B occurring

| means “given”. Meaning “in cases where something else occurs”.

∩ means intersection which you can think of as and, or the overlap in the context of a Venn diagram.

But why do we divide P[A ∩ B] by P[B]in the formula?

Because we want to exclude the probability of non-B cases. We’re scoping our probability to that falling within B.

Dividing byP[B] removes the probability of anything not B . C — B above.

Part 2: Example with python

We’re going to calculate the probability a student gets an A [80%+] in math, given they miss 10 or more classes.

Download the dataset from kaggle and inspect the data.

import pandas as pd
df = pd.read_csv['student-alcohol-consumption/student-mat.csv']
df.head[3]

And check the number of records.

len[df]
#=> 395

We’re only concerned with the columns, absences [number of absences], and G3 [final grade from 0 to 20].

Let’s create a couple new boolean columns based on these columns to make our lives easier.

Add a boolean column called grade_A noting if a student achieved 80% or higher as a final score. Original values are on a 0–20 scale so we multiply by 5.

df['grade_A'] = np.where[df['G3']*5 >= 80, 1, 0]

Make another boolean column called high_absenses with a value of 1 if a student missed 10 or more classes.

df['high_absenses'] = np.where[df['absences'] >= 10, 1, 0]

Add one more column to make building a pivot table easier.

df['count'] = 1

And drop all columns we don’t care about.

df = df[['grade_A','high_absenses','count']]
df.head[]

Nice. Now we’ll create a pivot table from this.

pd.pivot_table[
    df, 
    values='count', 
    index=['grade_A'], 
    columns=['high_absenses'], 
    aggfunc=np.size, 
    fill_value=0
]

We now have all the data we need to do our calculation. Let’s start by calculating each individual part in the formula.

In our case:
P[A] is the probability of a grade of 80% or greater.
P[B] is the probability of missing 10 or more classes.
P[A|B] is the probability of a 80%+ grade, given missing 10 or more classes.

Calculations of parts:
P[A] = [35 + 5] / [35 + 5 + 277 + 78] = 0.10126582278481013
P[B] = [78 + 5] / [35 + 5 + 277 + 78] = 0.21012658227848102
P[A ∩ B] = 5 / [35 + 5 + 277 + 78] = 0.012658227848101266

And per the formula, P[A|B] = P[A ∩ B] / P[B], put it together.

P[A|B] = 0.012658227848101266/ 0.21012658227848102= 0.06

There we have it. The probability of getting at least an 80% final grade, given missing 10 or more classes is 6%.

Conclusion

While the learning from our specific example is clear - go to class if you want good grades, conditional probability can be applied to more serious circumstances.

For example, the probability a person has a particular disease, given test results.

An understanding is also essential before diving into more complicated probability estimations using Bayes’ theorem.

How do you calculate conditional probability?

How Do You Calculate Conditional Probability? Conditional probability is calculated by multiplying the probability of the preceding event by the probability of the succeeding or conditional event.

Can Python calculate probability?

In fact, difficult probability problems can be solved in Python without needing to know a single math equation. Such an equation-free approach to probability requires a baseline understanding of what mathematicians call a sample space.

How do you find conditional probability with three variables?

We can rearrange the formula for conditional probability to get the so-called product rule:.

P[A,B] = p[A|B] p[B].

We can extend this for three variables:.

P[A,B,C] = P[A| B,C] P[B,C] = P[A|B,C] P[B|C] P[C].

and in general to n variables:.

P[A1, A2, ..., An] = P[A1| A2, ..., An] P[A2| A3, ..., An] P[An-1|An] P[An].

Use python to calculate the conditional probability of a student getting an A in math given they missed 10 or more classes.

Part 1: Theory and formula behind conditional probability

Examples:

Formula:

Part 2: Example with python

Conclusion

How do you calculate conditional probability?

Can Python calculate probability?

How do you find conditional probability with three variables?

Bài Viết Liên Quan

Toplist mới

Bài mới nhất

Chủ Đề