Educative Answers Team
Model fitting is a measure of how well a machine learning model generalizes to data similar to the data on which it was trained. A well-fitted model produces accurate outputs when it is given inputs it has not seen before.
Fitting refers to adjusting the parameters of the model to improve accuracy. An algorithm is run on data for which the target variable is known (“labeled” data) to produce a machine learning model. The model’s outputs are then compared with the real, observed values of the target variable to measure its accuracy.
The next step is to adjust the algorithm’s parameters to reduce the error, so that the model more accurately captures the relationship between the features and the target variable. This process is repeated until the model reaches parameters that give sufficiently accurate predictions.
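The fit, compare, and adjust loop described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the method of any particular library: the model is a straight line, the error is mean squared error, the adjustment rule is plain gradient descent, and the function name `fit_linear`, the learning rate, and the synthetic data are all hypothetical choices made for this example.

```python
import numpy as np


def fit_linear(x, y, learning_rate=0.1, n_steps=500):
    """Fit y ~ w * x + b by repeatedly reducing the mean squared error."""
    w, b = 0.0, 0.0          # initial parameters
    history = []             # error recorded before each adjustment
    for _ in range(n_steps):
        predictions = w * x + b
        error = predictions - y          # compare outputs with observed targets
        history.append(float(np.mean(error ** 2)))
        # Adjust the parameters in the direction that lowers the error
        # (gradient of the mean squared error with respect to w and b).
        w -= learning_rate * 2.0 * np.mean(error * x)
        b -= learning_rate * 2.0 * np.mean(error)
    return w, b, history


# Synthetic labeled data (an assumption for this sketch): y = 3x + 0.5 plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 3.0 * x + 0.5 + rng.normal(scale=0.1, size=200)

w, b, history = fit_linear(x, y)
print(f"w={w:.2f}, b={b:.2f}, first error={history[0]:.3f}, last error={history[-1]:.3f}")
```

The recorded error should fall as the loop runs, and the fitted `w` and `b` should end up close to the values used to generate the data.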
Overfitting and Underfitting
Overfitting hurts the performance of the model on new data. It occurs when a model learns the details and noise in the training data too closely. When random fluctuations or noise in the training data are picked up and learned as if they were real patterns, the model “overfits”: it performs well on the training set but poorly on the test set, because it cannot generalize to new data.
Underfitting happens when the machine learning model can neither fit the training data adequately nor generalize to new data. An underfit model is not a suitable model, and this is usually easy to detect because it performs poorly even on the training data.
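One way to see both failure modes is to fit models of increasing flexibility to the same small training set and compare training error with error on held-out data. The sketch below is an illustration under assumptions, not a prescribed procedure: the data are synthetic (a noisy sine curve), the models are polynomials fitted with NumPy's `Polynomial.fit`, and the degrees 1, 3, and 10 are arbitrary choices for this example.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(42)


def make_data(n):
    """Synthetic data (an assumption for this sketch): noisy sine curve."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2.0 * np.pi * x) + rng.normal(scale=0.2, size=n)
    return x, y


x_train, y_train = make_data(15)    # small training set
x_test, y_test = make_data(200)     # unseen data from the same process


def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))


results = {}
for degree in (1, 3, 10):
    model = Polynomial.fit(x_train, y_train, degree)  # least-squares fit
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))

for degree, (train_error, test_error) in results.items():
    print(f"degree {degree:2d}: train MSE = {train_error:.4f}, test MSE = {test_error:.4f}")
```

In a run like this, the straight line (degree 1) typically shows a high error on both sets, which is underfitting, while the degree 10 polynomial typically shows a very low training error and a noticeably higher test error, which is overfitting. The exact numbers depend on the random seed.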
Copyright ©2022 Educative, Inc. All rights reserved
In this course, we will expand our exploration of statistical inference techniques by focusing on the science and art of fitting statistical models to data. We will build on the concepts presented in the Statistical Inference course (Course 2) to emphasize the importance of connecting research questions to our data analysis methods. We will also focus on various modeling objectives, including making inferences about relationships between variables and generating predictions for future observations. This course will introduce and explore various statistical modeling techniques, including linear regression, logistic regression, generalized linear models, hierarchical and mixed effects (or multilevel) models, and Bayesian inference techniques. All techniques will be illustrated using a variety of real data sets, and the course will emphasize different modeling approaches for different types of data sets, depending on the study design underlying the data (referring back to Course 1, Understanding and Visualizing Data with Python). During these lab-based sessions, learners will work through tutorials focusing on specific case studies to help solidify the week’s statistical concepts, which will include further deep dives into Python libraries including Statsmodels, Pandas, and Seaborn. This course utilizes the Jupyter Notebook environment within Coursera.
Skills You'll Learn
Bayesian Statistics, Python Programming, Statistical Model, statistical regression
Reviews
5 stars: 65.33%
4 stars: 20.44%
3 stars: 8.46%
2 stars: 3.35%
1 star: 2.39%
ET
Jul 1, 2020
Awesome overview about what can we do with statistics knowledge! Half theory, half practice with Python is a great format
NA
Dec 20, 2019
Challenging but excellent course, especially how content was organized and examples used to explain concepts
From the lesson
WEEK 1 - OVERVIEW & CONSIDERATIONS FOR STATISTICAL MODELING
We begin this third course of the Statistics with Python specialization with an overview of what is meant by “fitting statistical models to data.” In this first week, we will introduce key model fitting concepts, including the distinction between dependent and independent variables, how to account for study designs when fitting models, assessing the quality of model fit, exploring how different types of variables are handled in statistical modeling, and clearly defining the objectives of fitting models.
Taught By
Brenda Gunderson
Lecturer IV and Research Fellow
Brady T. West
Research Associate Professor