Table of Contents
Summary
1. Model Fit
A machine learning model can perform poorly due to how well it “fits” the data.
- Overfitting
- The model performs extremely well on training data but poorly on new/unseen data.
- It tries to memorize every data point instead of learning the general pattern.
- Leads to high variance.
- Underfitting
- The model performs poorly even on training data.
- Usually happens when the model is too simple or features are inadequate.
- Leads to high bias.
- Balanced Model
- The ideal situation.
- The model captures the overall trend without memorizing the data.
- Results in low bias and low variance.

2. Bias
- Bias is the error between predicted values and actual values.
- High bias means the model makes incorrect assumptions and fails to learn patterns properly.
- Example: using a straight-line model for nonlinear data.
- High bias usually causes underfitting.
Ways to reduce bias:
- Use a more complex model.
- Add better or more relevant features.
3. Variance
- Variance measures how much the model changes when trained on different datasets.
- High variance means the model is too sensitive to small changes in training data.
- High variance usually causes overfitting.
Ways to reduce variance:
- Use fewer or more important features.
- Use techniques like train-test splitting and cross-validation.
Key Relationship
| Condition | Bias | Variance | Result |
|---|---|---|---|
| Underfitting | High | Low | Poor learning |
| Overfitting | Low | High | Poor generalization |
| Balanced Model | Low | Low | Best performance |
Main Takeaway
A good machine learning model requires a balance between bias and variance:
- Too much bias → model is too simple.
- Too much variance → model is too complex.
- The goal is a model that generalizes well to unseen data while still learning meaningful patterns from training data.