Summary

1. Model Fit

A machine learning model's performance depends on how well it “fits” the data. Fitting the training data too closely or too loosely both hurt results on new data.

  • Overfitting
    • The model performs extremely well on training data but poorly on new/unseen data.
    • It tries to memorize every data point instead of learning the general pattern.
    • Leads to high variance.
  • Underfitting
    • The model performs poorly even on training data.
    • Usually happens when the model is too simple or features are inadequate.
    • Leads to high bias.
  • Balanced Model
    • The ideal situation.
    • The model captures the overall trend without memorizing the data.
    • Results in low bias and low variance.
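The three cases above can be seen in a small experiment. This is a minimal sketch, assuming a toy sine-curve dataset and polynomial models of degree 1, 3, and 12 (the data, the degrees, and the helper names `make_data` and `mse` are illustrative choices, not from the original text):

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)

def make_data(n):
    """Noisy samples of a sine curve (an assumed toy data source)."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n)
    return x, y

def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))

x_train, y_train = make_data(20)    # small training set
x_test, y_test = make_data(200)     # unseen data from the same source

results = {}
for degree in (1, 3, 12):
    model = Polynomial.fit(x_train, y_train, degree)
    results[degree] = (mse(model, x_train, y_train), mse(model, x_test, y_test))
    print(f"degree {degree:2d}: train MSE {results[degree][0]:.3f}, "
          f"test MSE {results[degree][1]:.3f}")
```

Training error can only go down as the degree grows, so it cannot reveal overfitting on its own. The gap between training and test error is the signal to watch: the straight line is typically poor on both, while the degree-12 fit typically looks excellent on training data and noticeably worse on the test set.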

2. Bias

  • Bias is the systematic error of a model: the gap between its average prediction (over many possible training sets) and the actual values.
  • High bias means the model makes incorrect assumptions and fails to learn patterns properly.
  • Example: using a straight-line model for nonlinear data.
  • High bias usually causes underfitting.

Ways to reduce bias:

  • Use a more complex model.
  • Add better or more relevant features.
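The straight-line example and the "add a feature" remedy can be sketched together. The snippet below assumes a made-up quadratic ground truth and a hypothetical helper `fit_mse`; it fits ordinary least squares twice, once with only `x` and once with an added `x**2` feature:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(-3.0, 3.0, 50)
y = x ** 2 + rng.normal(0.0, 0.5, x.size)   # assumed nonlinear ground truth

def fit_mse(features, y):
    """Least-squares fit on the given feature matrix; returns training MSE."""
    coef, *_ = np.linalg.lstsq(features, y, rcond=None)
    return float(np.mean((features @ coef - y) ** 2))

ones = np.ones_like(x)
line_mse = fit_mse(np.column_stack([ones, x]), y)           # straight line
quad_mse = fit_mse(np.column_stack([ones, x, x ** 2]), y)   # adds an x^2 feature

print(f"straight line MSE: {line_mse:.2f}")
print(f"with x^2 feature:  {quad_mse:.2f}")
```

The straight line stays far from the data no matter how it is fit, because no line can follow a parabola. Adding the one relevant feature removes most of that error.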

3. Variance

  • Variance measures how much the model's predictions change when it is trained on different datasets drawn from the same source.
  • High variance means the model is too sensitive to small changes in training data.
  • High variance usually causes overfitting.
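One rough way to see this sensitivity is to refit the same kind of model on many datasets and measure how much its predictions move. The sketch below makes simplifying assumptions: the inputs `x` stay fixed, only the noise is redrawn for each dataset, and the sine target and the helper `prediction_variance` are illustrative:

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(2)
x = np.linspace(0.0, 1.0, 20)
true_y = np.sin(2 * np.pi * x)   # assumed underlying pattern

def prediction_variance(degree, n_datasets=200, noise=0.2):
    """Average variance of predictions across models fit to fresh noisy datasets."""
    preds = np.empty((n_datasets, x.size))
    for i in range(n_datasets):
        y = true_y + rng.normal(0.0, noise, x.size)   # a new dataset each time
        preds[i] = Polynomial.fit(x, y, degree)(x)
    # Variance across datasets at each x, averaged over all x.
    return float(preds.var(axis=0).mean())

low = prediction_variance(1)    # simple model
high = prediction_variance(9)   # flexible model

print(f"degree 1 prediction variance: {low:.4f}")
print(f"degree 9 prediction variance: {high:.4f}")
```

The flexible model's predictions shift much more from one dataset to the next, which is what "too sensitive to small changes in training data" means in practice.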

Ways to reduce variance:

  • Use fewer, more informative features.
  • Use techniques like train-test splitting and cross-validation to detect overfitting and to choose a model that generalizes.
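Cross-validation can be sketched in a few lines. This is a minimal hand-rolled k-fold version, assuming polynomial models and a toy sine dataset; the function `k_fold_mse` and its parameters are hypothetical names introduced here:

```python
import numpy as np
from numpy.polynomial import Polynomial

def k_fold_mse(x, y, degree, k=5, seed=0):
    """Mean validation MSE of a polynomial of the given degree over k folds."""
    order = np.random.default_rng(seed).permutation(x.size)
    folds = np.array_split(order, k)
    scores = []
    for i, val_idx in enumerate(folds):
        # Train on every fold except the i-th, validate on the i-th.
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        model = Polynomial.fit(x[train_idx], y[train_idx], degree)
        scores.append(np.mean((model(x[val_idx]) - y[val_idx]) ** 2))
    return float(np.mean(scores))

rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, x.size)   # assumed toy data

cv_scores = {d: k_fold_mse(x, y, d) for d in (1, 3, 12)}
best_degree = min(cv_scores, key=cv_scores.get)
print(cv_scores, "-> chosen degree:", best_degree)
```

Every point is used for validation exactly once, so the score reflects performance on data the model did not see during fitting. Picking the degree with the lowest cross-validated error is one common way to steer away from both the too-simple and the too-flexible model.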

Key Relationship

Condition        Bias   Variance   Result
Underfitting     High   Low        Poor learning
Overfitting      Low    High       Poor generalization
Balanced Model   Low    Low        Best performance
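The table reflects the standard decomposition of expected squared error at a point x. The notation is introduced here and is not from the original text: y = f(x) + ε with zero-mean noise of variance σ², and f̂ is the model fit on a randomly drawn training set, with the expectation taken over training sets and noise.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Underfitting makes the first term large, overfitting makes the second term large, and the last term cannot be reduced by any choice of model.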

Main Takeaway

A good machine learning model requires a balance between bias and variance:

  • Too much bias → model is too simple.
  • Too much variance → model is too complex.
  • The goal is a model that generalizes well to unseen data while still learning meaningful patterns from training data.
