Start with the recap, study the fully worked examples, then use the practice problems to
check your understanding of Overfitting (Intuition).
This page combines explanation, solved examples, and follow-up practice so you can move
from recognition to confident problem-solving in Math.
Concept Recap
Overfitting occurs when a model learns the noise in its training data along with the underlying pattern, so it performs well on training data but poorly on new data.
In effect, the model has memorized the training data instead of learning the pattern behind it.
Read the first worked example with the solution open so the structure is clear.
Try the practice problems before revealing each solution.
What to Focus On
Core idea: Overfitting is when a model learns the noise in its training data and then stumbles on new data.
Common stuck point: Recognizing the definition of overfitting is the easy part; the trap is picking the model with the lowest training error. First ask, "Does the model do much better on the data it was trained on than on fresh data?" so that a strong training score is not mistaken for good predictive performance.
Sense of Study hint: Ask: Does the model do much better on the data it was trained on than on fresh data?
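The question in the hint can be turned into a quick numeric check. The following is a minimal sketch, not a standard diagnostic: the function names, the `max_gap` threshold of 0.05, and the accuracy pairs are all assumptions chosen for illustration.

```python
def generalization_gap(train_score: float, test_score: float) -> float:
    """Return how much better the model scores on training data than on fresh data."""
    return train_score - test_score


def looks_overfit(train_score: float, test_score: float, max_gap: float = 0.05) -> bool:
    """Flag a model whose training score beats its test score by more than max_gap.

    max_gap is an illustrative threshold (an assumption), not a standard value.
    """
    return generalization_gap(train_score, test_score) > max_gap


# Illustrative (train accuracy, test accuracy) pairs; the numbers are made up.
steady_model = (0.91, 0.89)
memorizing_model = (0.99, 0.71)

print(looks_overfit(*steady_model))      # False: the gap is about 0.02
print(looks_overfit(*memorizing_model))  # True: the gap is about 0.28
```

A sensible threshold depends on the problem and the metric, so treat the gap as a prompt to investigate rather than a verdict.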
Worked Examples
Example 1
medium
A model fits 10 data points with a degree-9 polynomial (perfect fit, R²=1). A simpler linear model has R²=0.85. Explain which model is better for prediction and why.
Answer
The linear model (R²=0.85) is better for prediction; the polynomial overfits noise and won't generalize.
First step
1. Degree-9 polynomial: 10 parameters for 10 points, so it fits every point exactly (interpolates), but it captures noise, not just signal.
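To see the interpolation effect numerically, here is a small simulation sketch. It assumes a setup the example does not specify: a linear true signal (2x + 1), Gaussian noise with standard deviation 0.5, and a fixed random seed. Those choices, along with names such as `true_signal`, are illustrative only.

```python
import numpy as np
from numpy.polynomial import Polynomial

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable


def true_signal(x):
    # Assumed underlying pattern; the worked example does not specify one.
    return 2.0 * x + 1.0


# 10 noisy training points, as in the example.
x_train = np.linspace(-1.0, 1.0, 10)
y_train = true_signal(x_train) + rng.normal(0.0, 0.5, size=x_train.size)

# Fresh data drawn from the same process.
x_test = rng.uniform(-1.0, 1.0, size=200)
y_test = true_signal(x_test) + rng.normal(0.0, 0.5, size=x_test.size)


def mse(model, x, y):
    """Mean squared error of a fitted polynomial on (x, y)."""
    return float(np.mean((model(x) - y) ** 2))


linear = Polynomial.fit(x_train, y_train, deg=1)  # 2 parameters
wiggly = Polynomial.fit(x_train, y_train, deg=9)  # 10 parameters for 10 points

train_mse_linear = mse(linear, x_train, y_train)
train_mse_wiggly = mse(wiggly, x_train, y_train)
test_mse_linear = mse(linear, x_test, y_test)
test_mse_wiggly = mse(wiggly, x_test, y_test)

print(f"linear:   train MSE {train_mse_linear:.4f}, test MSE {test_mse_linear:.4f}")
print(f"degree 9: train MSE {train_mse_wiggly:.4f}, test MSE {test_mse_wiggly:.4f}")
```

Under these assumptions the degree-9 fit drives training error to essentially zero, while its error on fresh points is larger than the linear model's.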
Example 2
A machine learning model is trained on 1000 observations with 50 predictors. Training error is near zero; test error on 200 held-out observations is very high. Diagnose the problem and suggest two remedies.
Example 3
medium
You add features one at a time. Training R² keeps rising; validation R² rises, then peaks, then falls. What does the peak mark?
Example 4
medium
A student memorizes 200 sample exam questions verbatim and scores 100% on the sample. On the real exam (new questions) she scores 60%. Which type of model failure does this mimic?
Example 5
medium
A neural network's training loss falls steadily, but validation loss rises after epoch 25. What technique can you use at epoch 25 to prevent overfitting?
Example 6
hard
A study reports R²=0.99 for a model with 50 predictors fit on 60 observations. The author claims high explanatory power. Critique this claim.
Example 7
hard
Two models are compared by AIC: Model 1 fits very well but has many parameters. AIC penalizes the parameter count. Why does AIC help guard against overfitting?
Example 8
challenge
You build a stock-trading model that achieves 100% accuracy on historical data from 2010–2020. In live trading during 2021 it loses money consistently. Explain what likely happened and what you would change.
Practice Problems
Try these problems on your own first, then open the solution to compare your method.
Example 1
easy
A student memorizes all 500 practice problems but performs poorly on the exam, which has new problems. How does this analogy illustrate overfitting?
Example 2
hard
Explain the bias-variance tradeoff: how does increasing model complexity affect bias and variance, and where is the optimal model?
Example 3
easy
A model gets 100% accuracy on training data but 60% on test data. Overfit or underfit?
Example 4
easy
Overfitting means the model learned the ___ in the training data instead of the underlying pattern.
Example 5
easy
A wiggly curve passes through every single data point exactly. Likely overfit or well-fit?
Example 6
easy
To detect overfitting, you must compare training error with error on ___ data.
Example 7
easy
Adding more and more variables to push training accuracy up, without testing, risks ___.
Example 8
easy
Train error 1%, test error 25%. Is the model generalizing well?
Example 9
easy
Which model is more likely overfit: a straight line, or a degree-15 polynomial through 16 points?
Example 10
easy
Overfitting performs well on ___ data but poorly on new data.
Example 11
medium
Model A: train acc 92%, test acc 90%. Model B: train acc 99%, test acc 70%. Which is overfit and which to deploy?
Example 12
medium
How does adding a regularization penalty (e.g. shrinking coefficients) help with overfitting?
Example 13
medium
Splitting data into train and validation sets, you pick the model with lowest ___ error to avoid overfitting.
Example 14
medium
As model complexity increases, training error keeps falling but test error first falls then rises. The rise indicates ___.
Example 15
medium
More training data tends to reduce overfitting. Why?
Example 16
medium
A model memorizes the training set, scoring perfectly there but failing on a held-out set. What single word names this?
Example 17
medium
If both training and test error are high, is the problem overfitting or underfitting?
Example 18
medium
A spam filter is tuned until it flags every training email perfectly, but it misses new spam. What happened?
Example 19
medium
Cross-validation reports a higher error than the model's training error suggests. Why is cross-validation a better guide against overfitting?
Example 20
challenge
A polynomial of degree d is fit to 8 points. At what degree can it pass through all points exactly (interpolate), risking maximal overfit?
Example 21
challenge
Model train errors by complexity: {c1:10,c2:4,c3:1}; test errors: {c1:11,c2:5,c3:9}. Which complexity should you pick and why?
Example 22
challenge
A model's variance contributes error 0.2c and bias contributes 8/c at complexity c. Total error = 0.2c + 8/c. Find the c minimizing total error.
Example 23
easy
Model A: training accuracy 99%, test accuracy 72%. Is this best described as overfit, underfit, or well-fit?
Example 24
easy
Which curve is more likely to overfit 10 data points: a straight line or a degree-9 polynomial?
Example 25
easy
A model's R² on training data is 1.0 but 0.2 on test data. What is the diagnosis?
Example 26
easy
True or false: a perfect 0% training error always indicates the model will generalize well.
Example 27
medium
You have 30 observations and 40 candidate predictors. Without regularization, what risk dominates?
Example 28
medium
Name two practical remedies that reduce overfitting.
Example 29
medium
A decision tree is grown until every leaf contains exactly one training example. Training error is 0. Most likely diagnosis?
Example 30
medium
True or false: adding a regularization term like λ∑βⱼ² typically increases training error but reduces test error when the model is overfit.
Example 31
medium
You hold out 20% of your data as a test set. Why is it dangerous to repeatedly tweak the model based on test-set performance?
Example 32
medium
Cross-validation reports a mean R² of 0.45 across folds, but a single train/test split shows R²=0.85 on training and 0.30 on test. Which estimate of generalization is more trustworthy?
Example 33
hard
As model complexity rises with sample size n fixed, sketch what happens to training error vs. test error.
Example 34
hard
A random forest trained on 1000 examples has training error 1% and test error 3%. Is this concerning overfitting?
Example 35
hard
True or false: increasing the training-set size NEVER reduces overfitting, no matter how large.
Example 36
hard
For ridge regression β̂ = argmin ‖y − Xβ‖² + λ‖β‖², what happens to overfitting risk as λ→0?
Example 37
hard
You train 100 models with random feature subsets and pick the one with the best test error. Why is the reported test error optimistic?
Example 38
hard
Why does dropping out random neurons during training help prevent overfitting in neural networks?