Prediction Math Example 2


Example 2

Difficulty: hard
A model predicts house prices. Its in-sample $R^2 = 0.92$, but its out-of-sample $R^2 = 0.45$. Explain what this means and identify the problem with the model.
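For reference, the coefficient of determination compares the model's squared prediction errors with the spread of the observed values $y_i$ around their mean $\bar{y}$, computed on whichever data set is being evaluated ($\hat{y}_i$ is the model's prediction):

```latex
R^2 = 1 - \frac{SS_{\text{res}}}{SS_{\text{tot}}}
    = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}
```

Rearranged, $SS_{\text{res}} / SS_{\text{tot}} = 1 - R^2$: the unexplained share of variation is $1 - 0.92 = 0.08$ on the training data but $1 - 0.45 = 0.55$ on new data, almost seven times larger ($0.55 / 0.08 \approx 6.9$).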

Solution

  1. In-sample $R^2 = 0.92$: the model explains 92% of the variation in price on the training data, which looks excellent.
  2. Out-of-sample $R^2 = 0.45$: on new data the model explains only 45% of the variation, which indicates poor generalization.
  3. Problem: overfitting. The model learned quirks and noise specific to the training data rather than the true underlying pattern.
  4. An overfit model performs well on the data it was trained on but poorly on new observations, so its in-sample score says little about how useful it will be for actual prediction.
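The steps above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the house-price model from the example: the data are synthetic (a smooth signal sin(2πx) plus Gaussian noise), the "models" are polynomials of degree 3 and 10 fit with NumPy's `Polynomial.fit`, and the helper names `r_squared` and `simulate` are hypothetical. The flexible degree-10 polynomial tends to score close to 1 in-sample but noticeably lower on freshly drawn data; exact values depend on the random seed.

```python
import numpy as np
from numpy.polynomial import Polynomial


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot on the given data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def simulate(degree, n_train=15, n_new=500, noise=0.3, seed=0):
    """Fit a polynomial on a small training set; score it in-sample and on new data.

    Hypothetical helper; the data-generating process below is an assumption.
    """
    rng = np.random.default_rng(seed)
    # Training inputs on an even grid, so new points never require extrapolation
    x_train = np.linspace(0.0, 1.0, n_train)
    x_new = rng.uniform(0.0, 1.0, n_new)
    # Synthetic target: smooth signal plus Gaussian noise
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, noise, n_train)
    y_new = np.sin(2 * np.pi * x_new) + rng.normal(0.0, noise, n_new)
    # Least-squares polynomial fit using the training data only
    model = Polynomial.fit(x_train, y_train, deg=degree)
    return r_squared(y_train, model(x_train)), r_squared(y_new, model(x_new))


for degree in (3, 10):
    r2_in, r2_out = simulate(degree)
    print(f"degree {degree:2d}: in-sample R^2 = {r2_in:.2f}, "
          f"out-of-sample R^2 = {r2_out:.2f}")
```

With 11 coefficients fit to 15 points, the degree-10 polynomial has enough freedom to chase the noise in the training sample, which is exactly the failure described in step 3.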

Answer

Overfitting: the model memorized its training data ($R^2 = 0.92$) but fails on new data ($R^2 = 0.45$).
Prediction quality must be evaluated on out-of-sample (held-out) data. Excellent in-sample performance with poor out-of-sample performance is the hallmark of overfitting. The true measure of a predictive model is how well it predicts new, unseen observations.
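Evaluating on held-out data can be sketched as a random train/test split: fit only on the training rows, then compute $R^2$ separately on each part. This sketch assumes ordinary least squares as the model and synthetic data in which one feature carries signal and 25 are pure noise; `split_indices` and `holdout_r2` are hypothetical helpers, not a library API. With so many irrelevant features relative to 40 training rows, the in-sample score is expected to land well above the held-out one.

```python
import numpy as np


def split_indices(n, test_fraction=0.25, seed=0):
    """Shuffle row indices and split them into training and held-out sets (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_test = max(1, int(round(n * test_fraction)))
    return order[n_test:], order[:n_test]


def holdout_r2(X, y, test_fraction=0.25, seed=0):
    """Fit OLS on the training rows only; return (in-sample R^2, held-out R^2)."""
    train, test = split_indices(len(y), test_fraction, seed)
    # Design matrix with an intercept column
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A[train], y[train], rcond=None)

    def r2(idx):
        pred = A[idx] @ coef
        ss_res = np.sum((y[idx] - pred) ** 2)
        ss_tot = np.sum((y[idx] - y[idx].mean()) ** 2)
        return 1.0 - ss_res / ss_tot

    return r2(train), r2(test)


# Synthetic example (assumption): one predictive feature plus 25 noise features
rng = np.random.default_rng(1)
n = 80
signal = rng.normal(size=n)
junk = rng.normal(size=(n, 25))
X = np.column_stack([signal, junk])
y = 2.0 * signal + rng.normal(size=n)

r2_in, r2_out = holdout_r2(X, y, test_fraction=0.5)
print(f"in-sample R^2 = {r2_in:.2f}, held-out R^2 = {r2_out:.2f}")
```

A fixed `seed` keeps the split reproducible. In practice, scikit-learn's `train_test_split` and `r2_score` cover the same two steps.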

About Prediction

A prediction is a model-based estimate of an unknown or future value, accompanied by a measure of confidence or uncertainty.

