Practice Overfitting (Intuition) in Math

Use these practice problems to test your method after reviewing the concept explanation and worked examples.

Quick Recap

Overfitting occurs when a model learns the noise in training data instead of just the underlying pattern, performing well on training data but poorly on new data.

The model memorized the training data instead of learning the underlying pattern.
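
To make the recap concrete, here is a minimal sketch in plain Python. Everything in it is an illustrative assumption rather than part of the problem set: the pattern y = 2x + 1, the noise level, the sample sizes, and the helper names make_data, fit_line, fit_memorizer, and mse. It compares a least-squares line with a "memorizer" that simply returns the label of the nearest training point.

```python
import random


def make_data(n, rng):
    """Draw n points from the made-up pattern y = 2x + 1 plus Gaussian noise."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares line: a simple model that cannot chase noise."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept


def fit_memorizer(xs, ys):
    """Return the label of the nearest training point: pure memorization."""
    pairs = list(zip(xs, ys))
    return lambda x: min(pairs, key=lambda p: abs(p[0] - x))[1]


def mse(model, xs, ys):
    """Mean squared error of a model on a data set."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(0)                     # fixed seed so the run is repeatable
train_x, train_y = make_data(100, rng)     # data the models are fit on
test_x, test_y = make_data(1000, rng)      # new data from the same pattern

line = fit_line(train_x, train_y)
memorizer = fit_memorizer(train_x, train_y)

line_train = mse(line, train_x, train_y)
line_test = mse(line, test_x, test_y)
memo_train = mse(memorizer, train_x, train_y)
memo_test = mse(memorizer, test_x, test_y)

print(f"line:      train MSE = {line_train:.2f}, test MSE = {line_test:.2f}")
print(f"memorizer: train MSE = {memo_train:.2f}, test MSE = {memo_test:.2f}")
```

The memorizer reproduces every training label, noise included, so its training error is zero. On new data that stored noise no longer matches, and its test error should come out noticeably higher than the line's. That gap between training and test performance is the signature to look for in the problems below.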


Example 1

challenge
You build a stock-trading model that achieves $100\%$ accuracy on historical data from 2010–2020. In live trading during 2021 it loses money consistently. Explain what likely happened and what you would change.

Example 2

hard
For ridge regression $\hat\beta = \arg\min \|y - X\beta\|^2 + \lambda \|\beta\|^2$, what happens to overfitting risk as $\lambda \to 0$?

Example 3

hard
Explain the bias-variance tradeoff: how does increasing model complexity affect bias and variance, and where is the optimal model?

Example 4

medium
More training data tends to reduce overfitting. Why?

Example 5

medium
In the bias-variance decomposition, overfitting corresponds primarily to high ___.

Example 6

medium
A decision tree is grown until every leaf contains exactly one training example. Training error is $0$. Most likely diagnosis?

Example 7

medium
A model memorizes the training set, scoring perfectly there but failing on a held-out set. What single word names this?

Example 8

hard
Why does dropping out random neurons during training help prevent overfitting in neural networks?

Example 9

hard
A study reports $R^2 = 0.99$ for a model with 50 predictors fit on 60 observations. The author claims high explanatory power. Critique this claim.

Example 10

medium
A spam filter is tuned until it flags every training email perfectly, but it misses new spam. What happened?

Example 11

hard
True or false: increasing the training-set size NEVER reduces overfitting, no matter how large.

Example 12

easy
Overfitting means the model has learned the ___ instead of the underlying pattern.

Example 13

challenge
A polynomial of degree $d$ is fit to 8 points. At what degree can it pass through all points exactly (interpolate), risking maximal overfit?

Example 14

easy
True or false: a perfect $0\%$ training error always indicates the model will generalize well.

Example 15

medium
True or false: adding a regularization term like $\lambda \sum \beta_j^2$ typically increases training error but reduces test error when the model is overfit.

Example 16

easy
Which model is more likely overfit: a straight line, or a degree-15 polynomial through 16 points?

Example 17

challenge
Model train errors by complexity: $\{c1:10, c2:4, c3:1\}$; test errors: $\{c1:11, c2:5, c3:9\}$. Which complexity should you pick and why?

Example 18

hard
A random forest trained on 1000 examples has training error $1\%$ and test error $3\%$. Is this concerning overfitting?

Example 19

medium
Cross-validation reports a mean $R^2$ of $0.45$ across folds, but a single train/test split shows $R^2 = 0.85$ on training and $0.30$ on test. Which estimate of generalization is more trustworthy?

Example 20

medium
A model fits 10 data points with a degree-9 polynomial (perfect fit, $R^2=1$). A simpler linear model has $R^2=0.85$. Explain which model is better for prediction and why.