The F1 Score is a model evaluation metric used mainly in classification problems to measure the balance between:
- Precision
- Recall
It is especially useful when:
- Data is imbalanced
- False positives and false negatives are both costly
Simple Definition
The F1 Score is the harmonic mean of Precision and Recall.
Formula:
$$\text{F1} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
Understanding Precision and Recall
Precision
Out of all predicted positive cases, how many were actually correct?
Formula:
$$\text{Precision} = \frac{TP}{TP + FP}$$
Where:
- TP = True Positives
- FP = False Positives
Example
If a fraud detection model flags 100 transactions as fraud, and only 80 are actually fraud:
Precision = 80 / 100 = 80%
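The precision formula can be sketched in a few lines of Python. This is a minimal illustration, not a library implementation: the function name `precision` is made up here, and returning 0.0 when the model predicts no positives at all is an assumed convention for the otherwise undefined 0 / 0 case.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted positives that are truly positive."""
    predicted_positive = tp + fp
    # Assumption: report 0.0 when nothing was predicted positive,
    # because TP / (TP + FP) would otherwise be 0 / 0.
    if predicted_positive == 0:
        return 0.0
    return tp / predicted_positive


# Fraud example from the text: 100 transactions flagged, 80 truly fraud,
# so TP = 80 and FP = 20.
print(precision(tp=80, fp=20))  # 0.8
```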
Recall
Out of all actual positive cases, how many did the model correctly identify?
Formula:
$$\text{Recall} = \frac{TP}{TP + FN}$$
Where:
- FN = False Negatives
Example
If there were actually 100 fraudulent transactions and the model identified 80 of them:
Recall = 80 / 100 = 80%
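In practice the counts TP, FP, and FN usually come from comparing true labels with predicted labels. The sketch below shows one way to do that and then compute recall. The helper names `confusion_counts` and `recall` and the toy labels are assumptions made for illustration; they do not come from any particular library.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, FN, TN for a single positive class."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # correctly flagged
        elif predicted == positive:
            fp += 1  # flagged, but actually negative
        elif actual == positive:
            fn += 1  # missed positive
        else:
            tn += 1  # correctly left alone
    return tp, fp, fn, tn


def recall(tp: int, fn: int) -> float:
    """Fraction of actual positives that the model found."""
    actual_positive = tp + fn
    # Assumption: report 0.0 when there are no actual positives.
    return tp / actual_positive if actual_positive else 0.0


# Toy labels, made up for illustration: 1 = fraud, 0 = not fraud.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(tp, fp, fn, tn)  # 4 1 1 4
print(recall(tp, fn))  # 0.8
```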
Why F1 Score is Important
Sometimes:
- Precision is high but recall is low
- Recall is high but precision is low
The F1 Score balances both: because it is a harmonic mean, it is high only when precision and recall are both high.
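To see why a harmonic mean is used rather than a plain average, the following sketch compares the two on a few precision and recall pairs. The pairs are invented for illustration, and the function names are hypothetical. As the gap between precision and recall widens, the harmonic mean drops toward the smaller value while the arithmetic mean stays comparatively high.

```python
def arithmetic_mean(p: float, r: float) -> float:
    return (p + r) / 2


def harmonic_mean(p: float, r: float) -> float:
    # Assumption: return 0.0 when both inputs are zero (0 / 0 case).
    return 2 * p * r / (p + r) if (p + r) else 0.0


# Made-up (precision, recall) pairs with a growing gap between them.
pairs = [(0.9, 0.9), (0.9, 0.5), (0.9, 0.1)]

for p, r in pairs:
    print(
        f"precision={p}, recall={r}: "
        f"arithmetic={arithmetic_mean(p, r):.3f}, "
        f"harmonic={harmonic_mean(p, r):.3f}"
    )
```

For the last pair the arithmetic mean is 0.5, while the harmonic mean is only about 0.18, which better reflects a model that misses 90% of the positives.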
Example
Suppose:
| Metric | Value |
|---|---|
| Precision | 0.8 |
| Recall | 0.6 |
Then:
$$\text{F1} = 2 \times \frac{0.8 \times 0.6}{0.8 + 0.6}$$
Result:
F1 = 0.6857
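The arithmetic above can be reproduced with a short sketch. It also shows the algebraically equivalent count form, F1 = 2TP / (2TP + FP + FN), obtained by substituting the precision and recall formulas into the harmonic mean. The function names are hypothetical, and the counts TP = 12, FP = 3, FN = 8 are invented so that precision is 0.8 and recall is 0.6; they are not from the original example.

```python
def f1_from_rates(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention for the 0 / 0 case
    return 2 * precision * recall / (precision + recall)


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """Equivalent count form: 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Values from the table: precision = 0.8, recall = 0.6.
print(round(f1_from_rates(0.8, 0.6), 4))  # 0.6857

# Hypothetical counts giving precision = 12/15 = 0.8 and recall = 12/20 = 0.6.
print(round(f1_from_counts(tp=12, fp=3, fn=8), 4))  # 0.6857
```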
Interpretation
| F1 Score | Meaning |
|---|---|
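| 1.0 | Perfect precision and recall |
| 0.8+ | Very good |
| ~0.5 | Moderate |
| 0 | No true positives (worst case) |
These thresholds are rough rules of thumb; what counts as a good score depends on the task and on how imbalanced the classes are.
A higher F1 score means:
- Better balance between precision and recall
- Better classification performance on the positive class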
When to Use F1 Score
Use the F1 Score when the dataset is imbalanced, as is common in:
- Fraud detection
- Medical diagnosis
- Spam detection
- Risk prediction
- Rare event prediction
Real-Life Example
Healthcare Example
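Suppose an AI model predicts whether a patient has cancer.
False Negative
The patient actually has cancer, but the model says no.
This is very dangerous, since the disease goes undetected.
False Positive
The patient doesn’t have cancer, but the model says yes.
This is still problematic, since it leads to unnecessary stress and follow-up tests.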
You need both:
- High precision
- High recall
Hence, the F1 Score becomes important.
Difference Between Accuracy and F1 Score
| Accuracy | F1 Score |
|---|---|
| Overall correctness | Balance of precision & recall |
| Bad for imbalanced data | Better for imbalanced data |
| Can be misleading | More realistic |
Example of Accuracy Problem
Suppose:
- 990 non-fraud transactions
- 10 fraud transactions
Model predicts:
- Everything as non-fraud
Accuracy:
990 / 1000 = 99%
This looks great, but the model found ZERO frauds.
With zero true positives, recall is 0 / 10 = 0, so the F1 Score is 0, which exposes this weakness.
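The scenario above can be checked with a few lines of Python. This is a minimal sketch using plain lists rather than any library. F1 is computed in its count form, 2TP / (2TP + FP + FN), which is algebraically the same as the harmonic mean of precision and recall and avoids dividing by zero when the model predicts no positives.

```python
# Dataset from the text: 990 non-fraud (0) and 10 fraud (1) transactions.
y_true = [0] * 990 + [1] * 10
# A model that labels every transaction as non-fraud.
y_pred = [0] * 1000

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)
fp = sum(1 for t, p in pairs if t == 0 and p == 1)
fn = sum(1 for t, p in pairs if t == 1 and p == 0)
tn = sum(1 for t, p in pairs if t == 0 and p == 0)

accuracy = (tp + tn) / len(y_true)

# Count form of F1; the guard covers the case with no positives at all.
denominator = 2 * tp + fp + fn
f1 = 2 * tp / denominator if denominator else 0.0

print(accuracy)  # 0.99
print(f1)        # 0.0
```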

Friendly Answer
“F1 Score is a machine learning evaluation metric used for classification problems. It is the harmonic mean of precision and recall and helps measure the balance between false positives and false negatives. F1 Score is especially useful when working with imbalanced datasets such as fraud detection, healthcare diagnosis, and anomaly detection.”