Overfitting (Intuition) Examples in Math

Start with the recap, study the fully worked examples, then use the practice problems to check your understanding of Overfitting (Intuition).

This page combines explanation, solved examples, and follow-up practice so you can move from recognition to confident problem-solving in Math.

Concept Recap

Overfitting occurs when a model learns the noise in training data instead of just the underlying pattern, performing well on training data but poorly on new data.

The model memorized the training data instead of learning the underlying pattern.

Read the full concept explanation →

How to Use These Examples

  • Read the first worked example with the solution open so the structure is clear.
  • Try the practice problems before revealing each solution.
  • Use the related concepts and background knowledge badges if you feel stuck.

What to Focus On

Core idea: Overfitting is when a model learns the noise in its training data and then stumbles on new data.

Common stuck point: The procedure for overfitting (intuition) is the easy part; the trap is picking the model with the lowest training error. Asking "Does the model do much better on the data it was trained on than on fresh data?" first is what keeps a correct-looking calculation from being attached to the wrong concept.

Sense of Study hint: Ask: Does the model do much better on the data it was trained on than on fresh data?

Worked Examples

Example 1

medium
A model fits 10 data points with a degree-9 polynomial (perfect fit, R2=1R^2=1). A simpler linear model has R2=0.85R^2=0.85. Explain which model is better for prediction and why.

Answer

The linear model (R²=0.85) is better for prediction; the polynomial overfits noise and won't generalize.

First step

1
Degree-9 polynomial: 10 parameters for 10 points — fits every point exactly (interpolates), but captures noise not just signal

See the full worked solution + why-it-works coaching

SetupKey insightWhy it worksCommon pitfallConnection

Unlock answer keys One Family plan — every worked solution, all subjects

Example 2

hard
A machine learning model is trained on 1000 observations with 50 predictors. Training error is near zero; test error on 200 held-out observations is very high. Diagnose the problem and suggest two remedies.

Example 3

medium
You add features one at a time. Training R2R^2 keeps rising; validation R2R^2 rises, then peaks, then falls. What does the peak mark?

Example 4

medium
A student memorizes 200 sample exam questions verbatim and scores 100%100\% on the sample. On the real exam (new questions) she scores 60%60\%. Which type of model failure does this mimic?

Example 5

medium
A neural network's training loss falls steadily, but validation loss rises after epoch 25. What technique can you use at epoch 25 to prevent overfitting?

Example 6

hard
A study reports an R2=0.99R^2 = 0.99 for a model with 50 predictors fit on 60 observations. The author claims high explanatory power. Critique this claim.

Example 7

hard
Two models are compared by AIC: Model 1 fits very well but has many parameters. AIC penalizes the parameter count. Why does AIC help guard against overfitting?

Example 8

challenge
You build a stock-trading model that achieves 100%100\% accuracy on historical data from 2010–2020. In live trading during 2021 it loses money consistently. Explain what likely happened and what you would change.

Practice Problems

Try these problems on your own first, then open the solution to compare your method.

Example 1

easy
A student memorizes all 500 practice problems but performs poorly on the exam, which has new problems. How does this analogy illustrate overfitting?

Example 2

hard
Explain the bias-variance tradeoff: how does increasing model complexity affect bias and variance, and where is the optimal model?

Example 3

easy
A model gets 100% accuracy on training data but 60% on test data. Overfit or underfit?

Example 4

easy
Overfitting means the model learned the ___ in the training data instead of the underlying pattern.

Example 5

easy
A wiggly curve passes through every single data point exactly. Likely overfit or well-fit?

Example 6

easy
To detect overfitting, you must compare training error with error on ___ data.

Example 7

easy
Adding more and more variables to push training accuracy up, without testing, risks ___.

Example 8

easy
Train error 1%, test error 25%. Is the model generalizing well?

Example 9

easy
Which model is more likely overfit: a straight line, or a degree-15 polynomial through 16 points?

Example 10

easy
Overfitting performs well on ___ data but poorly on new data.

Example 11

medium
Model A: train acc 92%, test acc 90%. Model B: train acc 99%, test acc 70%. Which is overfit and which to deploy?

Example 12

medium
How does adding a regularization penalty (e.g. shrinking coefficients) help with overfitting?

Example 13

medium
Splitting data into train and validation sets, you pick the model with lowest ___ error to avoid overfitting.

Example 14

medium
As model complexity increases, training error keeps falling but test error first falls then rises. The rise indicates ___.

Example 15

medium
More training data tends to reduce overfitting. Why?

Example 16

medium
A model memorizes the training set, scoring perfectly there but failing on a held-out set. What single word names this?

Example 17

medium
If both training and test error are high, is the problem overfitting or underfitting?

Example 18

medium
A spam filter is tuned until it flags every training email perfectly, but it misses new spam. What happened?

Example 19

medium
Cross-validation reports lower error than the model's training error suggests. Why is cross-validation a better guide against overfitting?

Example 20

challenge
A polynomial of degree dd is fit to 8 points. At what degree can it pass through all points exactly (interpolate), risking maximal overfit?

Example 21

challenge
Model train errors by complexity: {c1:10,c2:4,c3:1}\{c1:10, c2:4, c3:1\}; test errors: {c1:11,c2:5,c3:9}\{c1:11, c2:5, c3:9\}. Which complexity should you pick and why?

Example 22

challenge
A model's variance contributes error 0.2c0.2c and bias contributes 8c\frac{8}{c} at complexity cc. Total error =0.2c+8c=0.2c+\frac{8}{c}. Find the cc minimizing total error.

Example 23

easy
Model A: training accuracy 99%99\%, test accuracy 72%72\%. Is this best described as overfit, underfit, or well-fit?

Example 24

easy
Which curve is more likely to overfit 10 data points: a straight line or a degree-9 polynomial?

Example 25

easy
A model's R2R^2 on training data is 1.01.0 but 0.20.2 on test data. What is the diagnosis?

Example 26

easy
True or false: a perfect 0%0\% training error always indicates the model will generalize well.

Example 27

medium
You have 30 observations and 40 candidate predictors. Without regularization, what risk dominates?

Example 28

medium
Name two practical remedies that reduce overfitting.

Example 29

medium
A decision tree is grown until every leaf contains exactly one training example. Training error is 00. Most likely diagnosis?

Example 30

medium
True or false: adding a regularization term like λβj2\lambda \sum \beta_j^2 typically increases training error but reduces test error when the model is overfit.

Example 31

medium
You hold out 20%20\% of your data as a test set. Why is it dangerous to repeatedly tweak the model based on test-set performance?

Example 32

medium
Cross-validation reports a mean R2R^2 of 0.450.45 across folds, but a single train/test split shows R2=0.85R^2 = 0.85 on training and 0.300.30 on test. Which estimate of generalization is more trustworthy?

Example 33

hard
As model complexity rises with sample size nn fixed, sketch what happens to training error vs. test error.

Example 34

hard
A random forest trained on 1000 examples has training error 1%1\% and test error 3%3\%. Is this concerning overfitting?

Example 35

hard
True or false: increasing the training-set size NEVER reduces overfitting, no matter how large.

Example 36

hard
For ridge regression β^=argminyXβ2+λβ2\hat\beta = \arg\min \|y - X\beta\|^2 + \lambda \|\beta\|^2, what happens to overfitting risk as λ0\lambda \to 0?

Example 37

hard
You train 100 models with random feature subsets and pick the one with the best test error. Why is the reported test error optimistic?

Example 38

hard
Why does dropping out random neurons during training help prevent overfitting in neural networks?

Background Knowledge

These ideas may be useful before you work through the harder examples.

model fit intuition