Overfitting (Intuition) Math Example 2

Example 2

Difficulty: hard
A machine learning model is trained on 1000 observations with 50 predictors. Training error is near zero; test error on 200 held-out observations is very high. Diagnose the problem and suggest two remedies.

Solution

  1. Diagnosis: overfitting. With 50 predictors fitted to 1000 observations, the model is complex enough to memorize the training data.
  2. Sign: training error ≈ 0 but test error is high. This gap is the classic overfitting signature.
  3. Remedy 1: regularization (L1/Lasso or L2/Ridge). The penalty discourages large coefficients and shrinks those of less important predictors toward zero.
  4. Remedy 2: reduce the number of predictors. Use feature selection or domain knowledge to eliminate irrelevant ones. Collecting more training data also helps.
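
The diagnosis and both remedies can be reproduced on synthetic data. The following is a minimal sketch, not the model from the problem (which the problem does not specify). It assumes a deliberately flexible model, namely least squares on the 50 predictors plus all of their pairwise products, and synthetic data in which only 5 predictors carry signal. The names `expand`, `mse`, `lam` and the value `lam = 1000.0` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, p = 1000, 200, 50

# Synthetic data (an assumption): only the first 5 predictors matter.
beta = np.zeros(p)
beta[:5] = 0.5
X_train = rng.standard_normal((n_train, p))
X_test = rng.standard_normal((n_test, p))
y_train = X_train @ beta + rng.standard_normal(n_train)  # noise variance 1
y_test = X_test @ beta + rng.standard_normal(n_test)

left, right = np.triu_indices(p, k=1)  # index pairs with left < right


def expand(X):
    """Append all pairwise products: 50 + 1225 = 1275 features."""
    return np.hstack([X, X[:, left] * X[:, right]])


def mse(A, w, y):
    """Mean squared error of the linear fit A @ w against y."""
    return float(np.mean((A @ w - y) ** 2))


# No intercept column: the synthetic data is zero-mean by construction.
F_train, F_test = expand(X_train), expand(X_test)

# Overfit model: unpenalized least squares with more features (1275)
# than observations (1000), so it can fit the training set exactly.
w_overfit = np.linalg.lstsq(F_train, y_train, rcond=None)[0]

# Remedy 1: ridge (L2) penalty on the same features.
lam = 1000.0  # hand-picked for illustration, not tuned
gram = F_train.T @ F_train + lam * np.eye(F_train.shape[1])
w_ridge = np.linalg.solve(gram, F_train.T @ y_train)

# Remedy 2: reduce the feature set back to the 50 original predictors.
w_reduced = np.linalg.lstsq(X_train, y_train, rcond=None)[0]

results = {
    "overfit": (mse(F_train, w_overfit, y_train), mse(F_test, w_overfit, y_test)),
    "ridge": (mse(F_train, w_ridge, y_train), mse(F_test, w_ridge, y_test)),
    "reduced": (mse(X_train, w_reduced, y_train), mse(X_test, w_reduced, y_test)),
}
for name, (train_err, test_err) in results.items():
    print(f"{name:8s} train MSE = {train_err:.4f}   test MSE = {test_err:.4f}")
```

The unpenalized fit should reproduce the training targets almost exactly while missing badly on the held-out set, which is the signature described in step 2. In practice the penalty strength would be chosen by cross-validation on the training data rather than fixed by hand as it is here.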

Answer

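Overfitting: the model is too complex for the amount of training data. Fix it with regularization or by reducing the number of predictors.
The ratio of observations to predictors matters. A common rule of thumb asks for at least 10 to 20 observations per predictor; here the ratio is 1000/50 = 20, which only just meets that guideline, so the gap between training and test error is the decisive evidence. Regularization is the standard remedy in machine learning: it adds a penalty for complexity to the training objective, which discourages the model from memorizing noise.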
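
For a linear model with squared-error loss, the penalty can be written out explicitly. The notation below is an assumption introduced for illustration: n observations (x_i, y_i), p predictors, coefficient vector β, and a penalty strength λ ≥ 0 that the analyst chooses.

```latex
\hat{\beta}_{\text{ridge}}
  = \arg\min_{\beta} \; \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2
    + \lambda \sum_{j=1}^{p} \beta_j^{2}
\qquad
\hat{\beta}_{\text{lasso}}
  = \arg\min_{\beta} \; \sum_{i=1}^{n} \bigl(y_i - x_i^{\top}\beta\bigr)^2
    + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert
```

Setting λ = 0 recovers ordinary least squares; larger values of λ shrink the coefficients more strongly. The L1 penalty can set some coefficients exactly to zero, so Lasso also acts as a form of predictor reduction.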

About Overfitting (Intuition)

Overfitting occurs when a model learns the noise in training data instead of just the underlying pattern, performing well on training data but poorly on new data.
