Machine Learning Model Fit

A machine learning model can perform poorly when it fits the training data either too closely or not closely enough.

Overfitting
- The model performs extremely well on training data but poorly on new/unseen data.
- It tries to memorize every data point instead of learning the general pattern.
- Associated with high variance.
Underfitting
- The model performs poorly even on training data.
- Usually happens when the model is too simple or features are inadequate.
- Associated with high bias.
Balanced Model
- The ideal situation.
- The model captures the overall trend without memorizing the data.
- Results in low bias and low variance.
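
The three cases above can be illustrated with a small experiment. The following is a minimal sketch, not a definitive recipe: it assumes synthetic data drawn from y = sin(x) plus noise, and it uses k-nearest-neighbours regression purely as a convenient stand-in model (neither choice comes from the article). With k = 1 the model copies the nearest training point and so memorizes the data, with k equal to the training set size it predicts a single average everywhere, and a moderate k sits in between. The names make_data, knn_predict, and mse are hypothetical helpers.

```python
import math
import random

def make_data(n, rng, noise=0.3):
    """Sample n points from y = sin(x) + Gaussian noise on [0, 2*pi] (assumed toy data)."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def knn_predict(train_x, train_y, x, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def mse(train_x, train_y, xs, ys, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    errors = [(knn_predict(train_x, train_y, x, k) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)

rng = random.Random(0)
train_x, train_y = make_data(200, rng)
test_x, test_y = make_data(200, rng)   # unseen data from the same source

results = {}
for label, k in [("overfit", 1), ("balanced", 15), ("underfit", 200)]:
    train_err = mse(train_x, train_y, train_x, train_y, k)
    test_err = mse(train_x, train_y, test_x, test_y, k)
    results[label] = (train_err, test_err)
    print(f"{label:9s} k={k:3d}  train MSE={train_err:.3f}  test MSE={test_err:.3f}")
```

In this setup the k = 1 model has zero training error but a larger test error, the k = 200 model has high error on both sets, and the moderate k should give the lowest test error of the three.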

Bias is the systematic error a model makes because of its simplifying assumptions: the gap between its average prediction and the actual values.
High bias means the model makes incorrect assumptions and fails to learn patterns properly.
Example: using a straight-line model for nonlinear data.
High bias usually causes underfitting.
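
The straight-line example can be made concrete. This is a small sketch under assumed toy data (noiseless points on y = x^2, chosen only for illustration); fit_line and train_mse are hypothetical helper names. It fits an ordinary least-squares line to the raw input, then fits the same kind of model to the feature x^2, to show that the error comes from the model's assumption rather than from the data.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def train_mse(xs, ys):
    """Fit a line to (xs, ys) and return its mean squared error on the same points."""
    slope, intercept = fit_line(xs, ys)
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]                            # assumed nonlinear target y = x^2

line_mse = train_mse(xs, ys)                         # straight line on the raw input x
squared_mse = train_mse([x ** 2 for x in xs], ys)    # same model on the feature x^2

print(f"line on x:   train MSE = {line_mse:.2f}")    # 12.00
print(f"line on x^2: train MSE = {squared_mse:.2f}") # 0.00
```

The line fitted to x cannot follow the curve even on its own training data, which is the signature of underfitting. Giving the same model a more suitable feature removes the error.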

Ways to reduce bias:

Variance measures how much the model's predictions change when it is trained on different samples of the data.
High variance means the model is too sensitive to small changes in training data.
High variance usually causes overfitting.
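
One way to see this sensitivity is to retrain on many freshly drawn training sets and watch how much the prediction at a single input moves. The sketch below assumes toy data from y = sin(x) plus noise and compares two deliberately extreme models chosen for illustration (they are not from the article): one predicts the average target everywhere, the other copies the nearest training point. All function names are hypothetical.

```python
import math
import random
import statistics

def sample_training_set(rng, n=50, noise=0.3):
    """Draw a fresh training set from y = sin(x) + Gaussian noise (assumed toy data)."""
    xs = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def predict_mean(xs, ys, x0):
    """Very simple model: ignore x0 and predict the average target."""
    return sum(ys) / len(ys)

def predict_nearest(xs, ys, x0):
    """Very flexible model: copy the target of the closest training point."""
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]

rng = random.Random(0)
x0 = 1.5                      # fixed query point
mean_preds, nearest_preds = [], []
for _ in range(300):          # 300 independent training sets
    xs, ys = sample_training_set(rng)
    mean_preds.append(predict_mean(xs, ys, x0))
    nearest_preds.append(predict_nearest(xs, ys, x0))

# Spread of the prediction at x0 across training sets
var_mean = statistics.pvariance(mean_preds)
var_nearest = statistics.pvariance(nearest_preds)
print(f"mean predictor:    variance of prediction = {var_mean:.4f}")
print(f"nearest neighbour: variance of prediction = {var_nearest:.4f}")
```

The nearest-neighbour prediction inherits the noise of whichever point happens to be closest, so it varies far more between training sets than the averaged prediction does.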

Ways to reduce variance:

Condition	Bias	Variance	Result
Underfitting	High	Low	Poor learning
Overfitting	Low	High	Poor generalization
Balanced Model	Low	Low	Best performance

A good machine learning model requires a balance between bias and variance:

Too much bias → model is too simple.
Too much variance → model is too complex.
The goal is a model that generalizes well to unseen data while still learning meaningful patterns from training data.
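
For squared-error loss, this balance can be written down explicitly. The decomposition below is the standard one. The notation is introduced here, not taken from the article: data are assumed to follow y = f(x) + ε with noise of mean zero and variance σ², f̂ is the model trained on a random training set, and the expectation is taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

A model that is too simple makes the first term large, a model that is too complex makes the second term large, and the noise term cannot be reduced by any choice of model.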
