Line of Best Fit Formula

The line of best fit (trend line) is the straight line that best represents the overall trend in a scatter plot by minimizing the sum of squared vertical.

The Formula

y^=mx+b\hat{y} = mx + b

When to use: If you stretched a rubber band through a scatter plot to be as close to all points as possible, that's the line of best fit. It captures the overall trend.

Quick Example

Plotting study hours vs test scores. The line of best fit might be: score=5(hours)+60\text{score} = 5(\text{hours}) + 60 showing each hour adds ~5 points.

Notation

y^=b0+b1x\hat{y} = b_0 + b_1 x is the equation of the line. b1b_1 (slope) is the change in yy per unit change in xx. b0b_0 (intercept) is the predicted yy when x=0x = 0.

What This Formula Means

The line of best fit (trend line) is the straight line that best represents the overall trend in a scatter plot by minimizing the sum of squared vertical distances between the line and all data points. Its equation enables predictions for new x-values.

If you stretched a rubber band through a scatter plot to be as close to all points as possible, that's the line of best fit. It captures the overall trend.

Formal View

The line of best fit minimizes โˆ‘i=1n(yiโˆ’y^i)2=โˆ‘i=1n(yiโˆ’b0โˆ’b1xi)2\sum_{i=1}^{n}(y_i - \hat{y}_i)^2 = \sum_{i=1}^{n}(y_i - b_0 - b_1 x_i)^2, yielding y^=b0+b1x\hat{y} = b_0 + b_1 x.

Worked Examples

Example 1

medium
Mean point is (xห‰,yห‰)=(4,11)(\bar{x},\bar{y}) = (4, 11) and slope is โˆ’2-2. Write the line of best fit.

Answer

y^=โˆ’2x+19\hat{y} = -2x + 19

First step

1
Use y^=mx+b\hat{y} = mx + b and the mean point.

See the full worked solution + why-it-works coaching

SetupKey insightWhy it worksCommon pitfallConnection

Unlock answer keys One Family plan โ€” every worked solution, all subjects

Example 2

medium
Trend line for study hours xx vs. test score y^\hat{y} is y^=8x+50\hat{y} = 8x + 50. Interpret the slope in context.

Example 3

medium
A scatter plot of weight (lb) vs. age (yr) for puppies gives y^=5x+2\hat{y} = 5x + 2. Predict the weight at age 66 years and explain whether you should trust the prediction.

Common Mistakes

  • Forcing line through origin when inappropriate - The safer move is to ask "Am I studying a relationship between variables, and have I separated association from causation?" and then state the data source, denominator, or variable before interpreting the result.
  • Using when relationship isn't linear - The safer move is to ask "Am I studying a relationship between variables, and have I separated association from causation?" and then state the data source, denominator, or variable before interpreting the result.
  • Ignoring outliers' influence - The safer move is to ask "Am I studying a relationship between variables, and have I separated association from causation?" and then state the data source, denominator, or variable before interpreting the result.
  • Choosing line of best fit from a keyword alone - Keywords like relationship, association, predict are only clues; the data structure must match the concept.

Why This Formula Matters

Line of Best Fit gives students a careful language for comparing variables without jumping to a causal story. It is useful for reading scatter plots, two-way tables, regression models, and real-world claims where patterns are tempting but hidden variables may matter.

Frequently Asked Questions

What is the Line of Best Fit formula?

The line of best fit (trend line) is the straight line that best represents the overall trend in a scatter plot by minimizing the sum of squared vertical distances between the line and all data points. Its equation enables predictions for new x-values.

How do you use the Line of Best Fit formula?

If you stretched a rubber band through a scatter plot to be as close to all points as possible, that's the line of best fit. It captures the overall trend.

What do the symbols mean in the Line of Best Fit formula?

y^=b0+b1x\hat{y} = b_0 + b_1 x is the equation of the line. b1b_1 (slope) is the change in yy per unit change in xx. b0b_0 (intercept) is the predicted yy when x=0x = 0.

Why is the Line of Best Fit formula important in Statistics?

Line of Best Fit gives students a careful language for comparing variables without jumping to a causal story. It is useful for reading scatter plots, two-way tables, regression models, and real-world claims where patterns are tempting but hidden variables may matter.

What do students get wrong about Line of Best Fit?

Students often know a procedure related to line of best fit but skip the recognition step: Am I studying a relationship between variables, and have I separated association from causation? That leads to a calculation or graph that looks reasonable but answers a different question.

What should I learn before the Line of Best Fit formula?

Before studying the Line of Best Fit formula, you should understand: stat scatter plot.