Residuals Formula

A residual is the difference between an observed value and the value predicted for it by a regression model: $\text{residual} = y - \hat{y}$ (observed minus predicted).

The Formula

$e_i = y_i - \hat{y}_i$

When to use: A residual is how much the model got wrong for a specific data point. Positive residual means the actual value was higher than predicted; negative means it was lower. If you plot all residuals, the pattern (or lack thereof) tells you whether the model is appropriate.

Quick Example

Regression predicts that a student who studies 5 hours will score $\hat{y} = 76$. The actual score is $y = 82$.

$\text{Residual} = 82 - 76 = +6$

The model underpredicted by 6 points.
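The same calculation can be written as a minimal Python sketch. The function name `residual` is an illustrative choice, not something defined by the source.

```python
def residual(observed, predicted):
    """Return the residual: observed value minus predicted value."""
    return observed - predicted


# Quick Example values: predicted score 76, actual score 82.
e = residual(observed=82, predicted=76)
print(e)  # 6 (positive, so the model underpredicted)
```

Swapping the two arguments would flip the sign, which is why the order "observed minus predicted" matters.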

Notation

$e_i$ is the residual for the $i$-th observation. The sum of all residuals from a least squares regression line (LSRL) is always zero: $\sum e_i = 0$.

What This Formula Means

The difference between an observed value and its predicted value from a regression model: $\text{residual} = y - \hat{y}$ (observed minus predicted).

A residual is how much the model got wrong for a specific data point. Positive residual means the actual value was higher than predicted; negative means it was lower. If you plot all residuals, the pattern (or lack thereof) tells you whether the model is appropriate.

Formal View

$e_i = y_i - \hat{y}_i$, where $\hat{y}_i = a + bx_i$; for the LSRL, $\sum_{i=1}^{n} e_i = 0$ and $\sum_{i=1}^{n} x_i e_i = 0$.
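The two LSRL identities can be checked numerically. The sketch below is only an illustration: the five data points are made up for this purpose, and `fit_lsrl` is a hypothetical helper that uses the standard slope and intercept formulas $b = S_{xy}/S_{xx}$ and $a = \bar{y} - b\bar{x}$. Exact fractions are used so the sums come out as exactly zero rather than a tiny rounding error.

```python
from fractions import Fraction


def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Made-up data, stored as exact fractions.
xs = [Fraction(v) for v in (1, 2, 3, 4, 5)]
ys = [Fraction(v) for v in (2, 3, 5, 4, 6)]

a, b = fit_lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

sum_e = sum(residuals)                               # sum of e_i
sum_xe = sum(x * e for x, e in zip(xs, residuals))   # sum of x_i * e_i

print(a, b)           # 13/10 9/10
print(sum_e, sum_xe)  # 0 0
```

Both sums vanish only because the line is the least-squares fit; a line chosen any other way generally will not satisfy them.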

Worked Examples

Example 1

easy

Given $\hat{y} = 2 + 3x$ and the observed point $(4, 15)$, calculate the residual and interpret whether the model over- or under-predicts.

Answer

$e = 15 - 14 = 1$ (positive). The model under-predicts by 1 unit.

First step

Calculate the predicted value: $\hat{y} = 2 + 3(4) = 2 + 12 = 14$

Full solution

1. Calculate the predicted value: $\hat{y} = 2 + 3(4) = 14$
2. Calculate the residual: $e = y - \hat{y} = 15 - 14 = 1$
3. Positive residual: the actual value (15) is above the predicted value (14)
4. Interpretation: the model under-predicts by 1 unit for this observation

A residual $e = y - \hat{y}$ measures the signed vertical distance between the observed and predicted values. Positive residual = point above the line (model under-predicts); negative residual = point below the line (model over-predicts). For an LSRL the residuals always average to zero, so a zero average alone does not show that the model fits well.
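Example 1 can be reproduced with a short Python sketch. The helper names `predict` and `describe` are illustrative assumptions, not part of the source.

```python
def predict(x, intercept, slope):
    """Predicted value from the line y-hat = intercept + slope * x."""
    return intercept + slope * x


def describe(e):
    """Translate the sign of a residual into words."""
    if e > 0:
        return "under-predicts"
    if e < 0:
        return "over-predicts"
    return "predicts exactly"


# Example 1: line y-hat = 2 + 3x, observed point (4, 15).
x, y = 4, 15
y_hat = predict(x, intercept=2, slope=3)
e = y - y_hat

print(y_hat, e, describe(e))  # 14 1 under-predicts
```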

Example 2

medium

Five observed and predicted values $(y, \hat{y})$: $(10, 8)$, $(15, 14)$, $(12, 13)$, $(20, 19)$, $(8, 11)$. Calculate all residuals, verify they sum to 0, and compute $\sum e_i^2$.
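One way to check the arithmetic for Example 2 is the following Python sketch. The variable names are illustrative, and each pair is taken as (observed, predicted) as stated in the problem.

```python
# Each pair is (observed y, predicted y-hat) from Example 2.
pairs = [(10, 8), (15, 14), (12, 13), (20, 19), (8, 11)]

residuals = [y - y_hat for y, y_hat in pairs]
total = sum(residuals)                 # sum of e_i
sse = sum(e ** 2 for e in residuals)   # sum of e_i squared

print(residuals)  # [2, 1, -1, 1, -3]
print(total)      # 0
print(sse)        # 16
```

The positive and negative residuals cancel in the plain sum, while the sum of squares still registers how far off the predictions are.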

Example 3

medium

Data: $(1, 3)$, $(2, 5)$, $(3, 4)$, $(4, 8)$. Fitted line: $\hat{y} = 1.5x + 1.5$. Compute all residuals.
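A short Python sketch for checking Example 3, using the line exactly as given in the problem (the function name `predict` is an illustrative choice):

```python
# Data points (x, y) from Example 3.
points = [(1, 3), (2, 5), (3, 4), (4, 8)]


def predict(x):
    """Line as given in the problem: y-hat = 1.5x + 1.5."""
    return 1.5 * x + 1.5


residuals = [y - predict(x) for x, y in points]

print(residuals)       # [0.0, 0.5, -2.0, 0.5]
print(sum(residuals))  # -1.0
```

These residuals sum to -1 rather than 0. That suggests the line given in the problem is not the exact least-squares line for these four points (the usual slope and intercept formulas give $\hat{y} = 1.4x + 1.5$), so the zero-sum check would not pass here.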

Common Mistakes

Computing predicted minus observed - the standard is observed minus predicted, $y - \hat{y}$.
Using the plain sum of residuals to measure total error - for an LSRL the residuals always sum to zero, so use squared residuals instead.
Ignoring a curved pattern in the residual plot - a clear curve means a line is the wrong model, even if individual residuals are small.
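The last two mistakes can be seen together in a small Python sketch. The data here are made up for illustration (points lying exactly on $y = x^2$), and `fit_lsrl` is a hypothetical helper using the standard least-squares formulas.

```python
def fit_lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b


# Made-up curved data: y = x^2, which a straight line cannot follow.
xs = [0, 1, 2, 3, 4]
ys = [x ** 2 for x in xs]

a, b = fit_lsrl(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
signs = ["+" if e > 0 else "-" if e < 0 else "0" for e in residuals]

print(a, b)            # -2.0 4.0
print(residuals)       # [2.0, -1.0, -2.0, -1.0, 2.0]
print(sum(residuals))  # 0.0
print(signs)           # ['+', '-', '-', '-', '+']
```

The residuals sum to zero even though a line is the wrong model, so the sum says nothing about fit. The sign pattern (positive at both ends, negative in the middle) is the curved shape a residual plot would reveal.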

Why This Formula Matters

Individual residuals tell you where the model fails, and a residual plot is the main diagnostic for whether a straight line was the right choice at all: a curved residual pattern is the sign that you fit the wrong model. Without residuals you'd trust a straight line drawn through data that actually curves. Recognizing the formula by asking "Am I taking one point's actual value minus the line's predicted value to measure its individual miss?", rather than by familiar numbers, is what lets a student tell it apart from the LSRL, $r^2$, and deviation from the mean in a mixed problem set.

Frequently Asked Questions

What is the Residuals formula?

The difference between an observed value and its predicted value from a regression model: $\text{residual} = y - \hat{y}$ (observed minus predicted).

How do you use the Residuals formula?

What do the symbols mean in the Residuals formula?

$e_i$ is the residual for the $i$-th observation. The sum of all residuals from an LSRL is always zero: $\sum e_i = 0$.

Why is the Residuals formula important in Math?

What do students get wrong about Residuals?

The procedure for residuals is the easy part; the trap is computing predicted minus observed. Asking "Am I taking one point's actual value minus the line's predicted value to measure its individual miss?" first is what keeps a correct-looking calculation from being attached to the wrong concept.

What should I learn before the Residuals formula?

Before studying the Residuals formula, you should understand linear regression and the least squares regression line (LSRL).

