Line of Best Fit Formula

The Formula

\hat{y} = mx + b

When to use: when a scatter plot shows a roughly linear trend between two variables and you want to summarize that trend or predict y from x.

Quick Example

Suppose you plot study hours against test scores. The line of best fit might be \text{score} = 5(\text{hours}) + 60, meaning each additional hour of study is associated with about 5 more points.

Notation

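\hat{y} = b_0 + b_1 x is the equation of the line, where \hat{y} is the predicted value of y. It is the same line as \hat{y} = mx + b, with slope b_1 = m and intercept b_0 = b. b_1 (slope) is the predicted change in y per unit change in x. b_0 (intercept) is the predicted y when x = 0.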

What This Formula Means

The line of best fit (trend line) is the straight line that best represents the overall trend in a scatter plot by minimizing the sum of squared vertical distances between the line and all data points. Its equation enables predictions for new x-values.

If you stretched a rubber band through a scatter plot to be as close to all points as possible, that's the line of best fit. It captures the overall trend.
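To make the least-squares criterion concrete, here is a minimal Python sketch. The helper names `fit_line` and `sse` are our own, not from any library. It computes b_0 and b_1 with the standard closed-form least-squares formulas, uses the five points from Example 2 as sample data, and checks that nudging either coefficient away from the fitted values increases the sum of squared vertical distances.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def sse(xs, ys, b0, b1):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


xs = [1, 2, 3, 4, 5]
ys = [3, 5, 6, 8, 11]
b0, b1 = fit_line(xs, ys)
print(f"y-hat = {b0:.2f} + {b1:.2f}x")  # y-hat = 0.90 + 1.90x

best = sse(xs, ys, b0, b1)
# Moving either coefficient away from the least-squares values raises the SSE.
for db0, db1 in [(0.5, 0), (-0.5, 0), (0, 0.2), (0, -0.2)]:
    assert sse(xs, ys, b0 + db0, b1 + db1) > best
```

In practice, library routines such as NumPy's `numpy.polyfit(x, y, 1)` return the same least-squares coefficients (slope first, then intercept).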

Formal View

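The line of best fit uses the values of b_0 and b_1 that minimize \sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - b_0 - b_1 x_i)^2, where \hat{y}_i = b_0 + b_1 x_i is the predicted value at x_i. These minimizing values give the line \hat{y} = b_0 + b_1 x.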
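Setting the partial derivatives of the sum of squared residuals to zero gives closed-form values for the coefficients. The sketch below is the standard least-squares derivation, assuming the x_i are not all equal (so the denominator is nonzero):

```latex
S(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2

\frac{\partial S}{\partial b_0} = -2\sum_{i=1}^{n} (y_i - b_0 - b_1 x_i) = 0
  \;\Longrightarrow\; b_0 = \bar{y} - b_1 \bar{x}

\frac{\partial S}{\partial b_1} = -2\sum_{i=1}^{n} x_i (y_i - b_0 - b_1 x_i) = 0
  \;\Longrightarrow\; b_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
```

The first result, b_0 = \bar{y} - b_1 \bar{x}, also shows that the least-squares line always passes through the centroid (\bar{x}, \bar{y}).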

Worked Examples

Example 1

easy
A scatter plot shows the relationship between hours studied (x) and test score (y). The data points generally trend upward. A line of best fit has equation y = 5x + 40. (a) Interpret the slope. (b) Predict the score for a student who studies 8 hours.

Solution

  Step 1: (a) The slope is 5, meaning for each additional hour studied, the predicted test score increases by 5 points.
  Step 2: (b) Substitute x = 8: y = 5(8) + 40 = 40 + 40 = 80.
  Step 3: The predicted score is 80. This is an interpolation if 8 hours is within the range of the data, or an extrapolation if it is outside the data range.

Answer

(a) Each additional hour of study is associated with a 5-point increase in test score. (b) Predicted score for 8 hours: 80.
The line of best fit summarizes the linear relationship between two variables. The slope represents the rate of change, and the y-intercept is the predicted value when x = 0. Predictions are most reliable within the range of observed data (interpolation) and less reliable outside it (extrapolation).
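Steps 2 and 3 of Example 1 can be sketched in Python as below. Example 1 does not list its data points, so `observed_hours` is a hypothetical set of study times chosen only to illustrate the interpolation/extrapolation check; `predict_score` and `prediction_kind` are illustrative names, not part of the original example.

```python
def predict_score(hours, b0=40.0, b1=5.0):
    """Predicted score from the Example 1 line y = 5x + 40."""
    return b0 + b1 * hours


def prediction_kind(x, observed_xs):
    """'interpolation' if x lies within the observed x-range, else 'extrapolation'."""
    if min(observed_xs) <= x <= max(observed_xs):
        return "interpolation"
    return "extrapolation"


# Hypothetical study-hour data: Example 1 does not give the actual points.
observed_hours = [1, 2, 3, 5, 6, 9, 10]

for h in (8, 15):
    print(h, predict_score(h), prediction_kind(h, observed_hours))
# 8 80.0 interpolation
# 15 115.0 extrapolation
```

Both predictions come from the same equation, but the 15-hour value lies outside the assumed data range, so it deserves less trust.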

Example 2

medium
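Given five data points: (1,3), (2,5), (3,6), (4,8), (5,11). Estimate the line of best fit by finding the slope using the first and last points, then adjust the line to pass through the centroid (\bar{x}, \bar{y}).

Solution

  Step 1: Slope from the first and last points: m = (11 - 3)/(5 - 1) = 8/4 = 2.
  Step 2: Centroid: \bar{x} = (1 + 2 + 3 + 4 + 5)/5 = 3 and \bar{y} = (3 + 5 + 6 + 8 + 11)/5 = 33/5 = 6.6.
  Step 3: Line through (3, 6.6) with slope 2: \hat{y} - 6.6 = 2(x - 3), so \hat{y} = 2x + 0.6.

Answer

The estimated line of best fit is \hat{y} = 2x + 0.6.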
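The estimation procedure from Example 2 can be checked in Python. This minimal sketch computes the endpoint-slope/centroid estimate and, for comparison, the least-squares line from the standard closed-form formulas; the comparison with least squares is our addition, not part of the original exercise.

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 6, 8, 11]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n  # centroid (3.0, 6.6)

# Quick estimate: slope from the first and last points, line through the centroid.
m_est = (ys[-1] - ys[0]) / (xs[-1] - xs[0])  # (11 - 3) / (5 - 1) = 2.0
b_est = y_bar - m_est * x_bar                # 6.6 - 6.0, about 0.6

# Least-squares line for comparison (it also passes through the centroid).
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar


def sse(b0, b1):
    """Sum of squared vertical distances from the data points to y-hat = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))


print(f"estimate:      y-hat = {b_est:.1f} + {m_est:.1f}x, SSE = {sse(b_est, m_est):.2f}")
print(f"least squares: y-hat = {b0:.1f} + {b1:.1f}x, SSE = {sse(b0, b1):.2f}")
# estimate:      y-hat = 0.6 + 2.0x, SSE = 1.20
# least squares: y-hat = 0.9 + 1.9x, SSE = 1.10
```

The quick estimate lands close to the least-squares line but has a slightly larger sum of squared errors, as expected, since least squares gives the smallest possible value.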

Common Mistakes

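  • Forcing the line through the origin (b_0 = 0) when the data do not support a zero intercept
  • Fitting a straight line when the scatter plot shows a curved (non-linear) relationship
  • Ignoring outliers: because distances are squared, a single extreme point can pull the line noticeably toward it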
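To see the outlier mistake in action, the minimal Python sketch below refits the Example 2 data after replacing one y-value with an extreme one. The outlier value 25 is invented purely for illustration, and `fit_line` is our own helper using the standard least-squares formulas.

```python
def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for y-hat = b0 + b1*x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


xs = [1, 2, 3, 4, 5]
clean = [3, 5, 6, 8, 11]         # Example 2 data
with_outlier = [3, 5, 6, 8, 25]  # hypothetical: last y-value changed from 11 to 25

_, slope_clean = fit_line(xs, clean)
_, slope_outlier = fit_line(xs, with_outlier)
print(f"slope without outlier: {slope_clean:.2f}")  # slope without outlier: 1.90
print(f"slope with outlier:    {slope_outlier:.2f}")  # slope with outlier:    4.70
```

Changing a single point more than doubles the slope, which is why it is worth checking the scatter plot for outliers before relying on the fitted line.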

Why This Formula Matters

The line of best fit enables prediction and summarizes the relationship between variables with a simple equation.

Frequently Asked Questions

What do students get wrong about Line of Best Fit?

Students draw the line of best fit by eye and try to pass it through as many points as possible, instead of balancing the points above and below the line so that the vertical distances roughly cancel.
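One way to see what "balancing" means: for a least-squares line with an intercept, the vertical residuals always sum to zero. This minimal Python sketch reuses the Example 2 data to show that the fitted line passes through none of the five points, yet has three points above it and two below, with residuals that cancel.

```python
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 6, 8, 11]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Standard least-squares slope and intercept.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
b0 = y_bar - b1 * x_bar

# Residual = observed y minus predicted y; positive means the point lies above the line.
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
above = sum(r > 0 for r in residuals)
below = sum(r < 0 for r in residuals)

print([round(r, 2) for r in residuals])  # [0.2, 0.3, -0.6, -0.5, 0.6]
print(above, below)                      # 3 2
print(abs(sum(residuals)) < 1e-9)        # True
```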

What should I learn before the Line of Best Fit formula?

Before studying the Line of Best Fit formula, you should understand scatter plots.