Residuals Formula

A residual is the difference between an observed value and its predicted value from a regression model: $\text{residual} = y - \hat{y}$ (observed minus predicted).

The Formula

$e_i = y_i - \hat{y}_i$

When to use: A residual is how much the model got wrong for a specific data point. Positive residual means the actual value was higher than predicted; negative means it was lower. If you plot all residuals, the pattern (or lack thereof) tells you whether the model is appropriate.

Quick Example

Regression predicts a student who studies 5 hours will score $\hat{y} = 76$. The actual score is $y = 82$. $\text{Residual} = 82 - 76 = +6$. The model underpredicted by 6 points.

Notation

$e_i$ is the residual for the $i$-th observation. The sum of all residuals from a least-squares regression line (LSRL) is always zero: $\sum e_i = 0$.

What This Formula Means

The difference between an observed value and its predicted value from a regression model: $\text{residual} = y - \hat{y}$ (observed minus predicted).

A residual is how much the model got wrong for a specific data point. Positive residual means the actual value was higher than predicted; negative means it was lower. If you plot all residuals, the pattern (or lack thereof) tells you whether the model is appropriate.

Formal View

$e_i = y_i - \hat{y}_i$, where $\hat{y}_i = a + bx_i$; for an LSRL, $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} x_i e_i = 0$.
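The two LSRL identities above can be checked numerically. The following is a minimal sketch, assuming a hand-rolled least-squares fit (the helper `fit_lsrl` and the data are hypothetical, chosen only for illustration); both sums should come out as zero up to floating-point rounding.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


# Hypothetical data, not taken from the text.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = fit_lsrl(xs, ys)

# Residual for each point: observed minus predicted.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

sum_e = sum(residuals)
sum_xe = sum(x * e for x, e in zip(xs, residuals))

print(f"sum of e_i       = {sum_e:.2e}")
print(f"sum of x_i * e_i = {sum_xe:.2e}")
```

Both identities rely on the line being the least-squares fit with an intercept; an arbitrary line through the data will generally not satisfy them.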

Worked Examples

Example 1

easy
Given $\hat{y} = 2 + 3x$ and the observed point $(4, 15)$: calculate the residual and interpret whether the model over- or under-predicts.

Answer

$e = 15 - 14 = 1$ (positive). The model under-predicts by 1 unit.

First step

  1. Calculate the predicted value: $\hat{y} = 2 + 3(4) = 2 + 12 = 14$

Full solution

  2. Calculate the residual: $e = y - \hat{y} = 15 - 14 = 1$
  3. Positive residual: the actual value (15) is above the predicted value (14)
  4. Interpretation: the model under-predicts by 1 unit for this observation
A residual $e = y - \hat{y}$ measures the signed vertical distance between the observed and predicted values. Positive residual = point above the line (model under-predicts); negative residual = point below the line (model over-predicts). For an LSRL the residuals always average to zero, so a zero average does not by itself show that the model is good.
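The same calculation can be sketched in a few lines of Python. The function names `predict` and `interpret` are illustrative, not from the text; the line and the point are the ones given in Example 1.

```python
def predict(x):
    """Predicted value from the line y_hat = 2 + 3x given in Example 1."""
    return 2 + 3 * x


def interpret(e):
    """Describe the sign of a residual (observed minus predicted)."""
    if e > 0:
        return "under-predicts"
    if e < 0:
        return "over-predicts"
    return "predicts exactly"


x, y = 4, 15            # observed point
y_hat = predict(x)      # 2 + 3*4 = 14
e = y - y_hat           # observed minus predicted = 1
verdict = interpret(e)

print(f"y_hat = {y_hat}, residual = {e}: the model {verdict} by {abs(e)}")
```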

Example 2

medium
Five observed and predicted values $(y, \hat{y})$: $(10, 8), (15, 14), (12, 13), (20, 19), (8, 11)$. Calculate all residuals, verify they sum to 0, and compute $\sum e_i^2$.
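One way to check the arithmetic for this exercise is the short sketch below. The variable names are illustrative; the five pairs are the ones listed in Example 2.

```python
# (observed, predicted) pairs from Example 2
pairs = [(10, 8), (15, 14), (12, 13), (20, 19), (8, 11)]

# Residual = observed minus predicted
residuals = [y - y_hat for y, y_hat in pairs]

total = sum(residuals)                 # should be 0
sse = sum(e ** 2 for e in residuals)   # sum of squared residuals

print("residuals:", residuals)         # [2, 1, -1, 1, -3]
print("sum:", total)                   # 0
print("sum of squares:", sse)          # 4 + 1 + 1 + 1 + 9 = 16
```

The residuals cancel to zero, while the squared residuals do not, which is why the sum of squares is the quantity used to measure total error.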

Example 3

medium
Data: $(1, 3), (2, 5), (3, 4), (4, 8)$. Fitted line: $\hat{y} = 1.5x + 1.5$. Compute all residuals.
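A minimal sketch for this exercise follows, using the data and line exactly as stated in Example 3 (the variable names are illustrative).

```python
# Data points (x, y) and fitted line from Example 3
data = [(1, 3), (2, 5), (3, 4), (4, 8)]


def predict(x):
    """Predicted value from the stated line y_hat = 1.5x + 1.5."""
    return 1.5 * x + 1.5


# Residual = observed minus predicted, one per data point
residuals = [y - predict(x) for x, y in data]
total = sum(residuals)

for (x, y), e in zip(data, residuals):
    print(f"x={x}: y={y}, y_hat={predict(x)}, residual={e}")
print("sum of residuals:", total)
```

The residuals come out as 0, 0.5, -2, and 0.5, which sum to -1 rather than 0. That suggests the stated line is an approximate fit rather than the exact LSRL for these four points: working the least-squares formulas by hand gives a slope of 1.4 with the same intercept of 1.5.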

Common Mistakes

  • Computing predicted minus observed - the standard is observed minus predicted, $y - \hat{y}$.
  • Treating the sum of residuals as a measure of total error - for an LSRL the residuals always sum to zero, so use squared residuals to measure total error.
  • Ignoring a curved pattern in the residual plot - a clear curve means a line is the wrong model, even if individual residuals are small.
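The curved-pattern mistake is easier to see with numbers. The sketch below is an illustration under stated assumptions: the data are hypothetical (generated from $y = x^2$, which is deliberately not linear), and the least-squares fit is computed by hand rather than with any particular library.

```python
# Hypothetical curved data: y = x^2 for x = 0..6
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [x ** 2 for x in xs]

# Least-squares line y_hat = a + b*x, computed from the usual formulas
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
a = y_bar - b * x_bar

# Residual = observed minus predicted
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Crude text version of a residual plot: sign and size per x
for x, e in zip(xs, residuals):
    marker = "+" if e > 0 else "-" if e < 0 else "0"
    print(f"x={x}: residual={e:+.1f} {marker * int(abs(e))}")
```

The residuals run positive, then negative, then positive again (a U shape) even though they still sum to zero. That systematic pattern, not the size of any single residual, is the sign that a straight line is the wrong model.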

Why This Formula Matters

Individual residuals tell you where the model fails, and a residual plot is the main diagnostic for whether a straight line was the right choice at all: a curved residual pattern is the tell that you fit the wrong model. Without residuals you'd trust a line that's secretly bending through the data. Recognizing a residual by asking "Am I taking one point's actual value minus the line's predicted value to measure its individual miss?", rather than by familiar numbers, is what lets a student tell it apart from the LSRL, $r^2$, and deviation from the mean in a mixed problem set.

Frequently Asked Questions

What is the Residuals formula?

The difference between an observed value and its predicted value from a regression model: $\text{residual} = y - \hat{y}$ (observed minus predicted).

How do you use the Residuals formula?

A residual is how much the model got wrong for a specific data point. Positive residual means the actual value was higher than predicted; negative means it was lower. If you plot all residuals, the pattern (or lack thereof) tells you whether the model is appropriate.

What do the symbols mean in the Residuals formula?

$e_i$ is the residual for the $i$-th observation. The sum of all residuals from an LSRL is always zero: $\sum e_i = 0$.

Why is the Residuals formula important in Math?

Individual residuals tell you where the model fails, and a residual plot is the main diagnostic for whether a straight line was the right choice at all: a curved residual pattern is the tell that you fit the wrong model. Without residuals you'd trust a line that's secretly bending through the data. Recognizing a residual by asking "Am I taking one point's actual value minus the line's predicted value to measure its individual miss?", rather than by familiar numbers, is what lets a student tell it apart from the LSRL, $r^2$, and deviation from the mean in a mixed problem set.

What do students get wrong about Residuals?

The procedure for residuals is the easy part; the trap is computing predicted minus observed. Asking "Am I taking one point's actual value minus the line's predicted value to measure its individual miss?" first is what keeps a correct-looking calculation from being attached to the wrong concept.

What should I learn before the Residuals formula?

Before studying the Residuals formula, you should understand linear regression and the least-squares regression line (LSRL).
