Least Squares Regression Line Examples in Math

Start with the recap, study the fully worked examples, then use the practice problems to check your understanding of the Least Squares Regression Line.

This page combines explanation, solved examples, and follow-up practice so you can move from recognition to confident problem-solving in Math.

Concept Recap

The least-squares regression line (LSRL) is the unique straight line \hat{y} = a + bx that minimizes the sum of squared vertical distances (residuals) between the observed data points and the line.

You have a scatter plot with points spread around a general trend. The LSRL is the line that gets as close as possible to all the points simultaneously; it is the 'best' straight line through the cloud, where 'best' means it minimizes the total squared prediction error.
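
To make 'minimizes the total squared prediction error' concrete, here is a minimal Python sketch (the data are the five points used in Example 1 below; the helper name sse is illustrative). The least-squares line has a smaller sum of squared residuals than any other line:

```python
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def sse(a, b):
    """Sum of squared vertical distances from the points to the line y-hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

print(sse(2.2, 0.6))  # the least-squares line for this data -> about 2.4
print(sse(2.0, 0.7))  # a nearby candidate line does worse   -> about 2.55
```

No other choice of intercept and slope gives a smaller sum than the least-squares pair.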

Read the full concept explanation →

How to Use These Examples

  • Read the first worked example with the solution open so the structure is clear.
  • Try the practice problems before revealing each solution.
  • Use the related concepts and background knowledge badges if you feel stuck.

What to Focus On

Core idea: The slope b tells you the predicted change in y for a one-unit increase in x. The LSRL always passes through the point (\bar{x}, \bar{y}). The strength of the linear relationship is measured by r (correlation) and r^2 (coefficient of determination).
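
Why the line always passes through (\bar{x}, \bar{y}): substitute x = \bar{x} into the fitted equation and use the intercept formula a = \bar{y} - b\bar{x}:

\hat{y} = a + b\bar{x} = (\bar{y} - b\bar{x}) + b\bar{x} = \bar{y}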

Common stuck point: the slope is NOT the correlation. The slope has units (change in y per unit of x), while r is unitless and bounded between -1 and 1.
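
One way to keep the two apart, using the slope formula from the hint below:

b = r \cdot \frac{s_y}{s_x}

The slope is the correlation rescaled by the ratio of the standard deviations, so b inherits the units of y per unit of x, always shares the sign of r, and equals r only in the special case s_x = s_y.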

Sense of Study hint: When asked to find the least-squares regression line, first compute the means \bar{x} and \bar{y}, then the slope b = r \cdot (s_y / s_x) using the correlation and standard deviations. Finally, find the intercept a = \bar{y} - b\bar{x} and write the equation \hat{y} = a + bx. Always check that your line passes through (\bar{x}, \bar{y}).
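
The same recipe written as a short Python sketch (standard library only; the function name lsrl is illustrative, not a standard API):

```python
from statistics import mean, stdev

def lsrl(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)
    n = len(xs)
    # Sample correlation: sum of cross-products / ((n - 1) * s_x * s_y)
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x        # slope
    a = y_bar - b * x_bar    # intercept
    return a, b

a, b = lsrl([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(a, b)       # approximately 2.2 and 0.6 (the line from Example 1 below)
print(a + b * 3)  # approximately 4.0 -- the line passes through (x-bar, y-bar) = (3, 4)
```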

Worked Examples

Example 1

medium
Find the least-squares regression line for: (x,y): (1,2), (2,4), (3,5), (4,4), (5,5). Use b = r \frac{s_y}{s_x} and a = \bar{y} - b\bar{x}.

Solution

  1. Means and standard deviations: \bar{x} = 3, \bar{y} = 4; s_x = \sqrt{2.5} \approx 1.58; s_y = \sqrt{1.5} \approx 1.22
  2. Correlation: \sum(x_i-\bar{x})(y_i-\bar{y}) = (-2)(-2)+(-1)(0)+(0)(1)+(1)(0)+(2)(1) = 4+0+0+0+2 = 6; r = \frac{6}{(n-1)\, s_x s_y} = \frac{6}{4\sqrt{2.5}\sqrt{1.5}} = \frac{6}{\sqrt{60}} \approx 0.775
  3. Slope: b = r \frac{s_y}{s_x} \approx 0.775 \times \frac{1.22}{1.58} \approx 0.60
  4. Intercept: a = \bar{y} - b\bar{x} = 4 - 0.60(3) = 4 - 1.8 = 2.2

Answer

\hat{y} = 2.2 + 0.60x
The least-squares regression line minimizes the sum of squared residuals. The slope b represents the expected change in y for a one-unit increase in x. The line always passes through (\bar{x}, \bar{y}).
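
If you want to double-check Example 1 with software, SciPy's linregress returns the slope, intercept, and correlation in one call (assuming scipy is installed):

```python
from scipy.stats import linregress

res = linregress([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(res.slope, res.intercept, res.rvalue)  # 0.6, 2.2, 0.7746 -- matches b, a, and r above
```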

Example 2

hard
The LSRL for predicting weight (y, kg) from height (x, cm) is \hat{y} = -100 + 0.8x. Interpret the slope and intercept, predict weight for height=175 cm, and explain why extrapolating to height=50 cm is problematic.
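
A sketch of one solution (the arithmetic is direct; the interpretations follow the definitions in the recap above):

Solution

  1. Slope: each additional centimeter of height adds a predicted 0.8 kg of weight.
  2. Intercept: a = -100 is the predicted weight at x = 0 cm. It has no practical interpretation here, since a height of 0 cm is far outside any realistic data range; the intercept simply anchors the line.
  3. Prediction: \hat{y} = -100 + 0.8(175) = -100 + 140 = 40 kg.
  4. Extrapolation: at x = 50, \hat{y} = -100 + 0.8(50) = -60 kg, an impossible negative weight. The linear pattern was fit to heights within the observed data range, and nothing guarantees it continues to hold far outside that range.

Answer

Predicted weight at 175 cm is 40 kg. Extrapolating to 50 cm gives -60 kg, which is physically impossible; predictions far outside the observed range of x are unreliable.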

Practice Problems

Try these problems on your own first, then open the solution to compare your method.

Problem 1

easy
Given \hat{y} = 5 + 3x: (a) predict y when x=4, (b) interpret the slope, (c) does the line pass through the origin?

Problem 2

hard
The LSRL has the property of minimizing \sum e_i^2 = \sum (y_i - \hat{y}_i)^2. Explain why minimizing squared residuals (rather than absolute residuals) is preferred, and name two consequences of this choice.

Background Knowledge

These ideas may be useful before you work through the harder examples.

correlation • scatter plot • mean • standard deviation