Coefficient of Determination Formula

The Formula

r^2 = 1 - \frac{\text{SS}_{\text{residual}}}{\text{SS}_{\text{total}}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}
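As a minimal sketch of the formula above, the following Python function computes 1 - SS_residual / SS_total from observed and predicted values. The function name `r_squared` and the sample numbers are illustrative assumptions, not part of the original material. The result is guaranteed to lie between 0 and 1 only when the predictions come from a least-squares line fitted to the same data.

```python
def r_squared(y, y_hat):
    """Return 1 - SS_residual / SS_total for observed y and predicted y_hat."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    y_bar = sum(y) / len(y)
    # Residual sum of squares: squared gaps between observed and predicted values.
    ss_residual = sum((yi - yhi) ** 2 for yi, yhi in zip(y, y_hat))
    # Total sum of squares: squared gaps between observed values and their mean.
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    if ss_total == 0:
        raise ValueError("SS_total is zero: all y values are identical")
    return 1 - ss_residual / ss_total


# Hypothetical observed values and predictions (illustrative only).
y_obs = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(round(r_squared(y_obs, y_pred), 2))  # 0.98
```

Here SS_residual is 0.10 and SS_total is 5, so the function returns 1 - 0.10/5 = 0.98.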

When to use: Use r^2 to judge how much of the variation in y a least-squares regression line on x accounts for. If r^2 = 0.85, the regression line accounts for 85\% of the variation in y, and 15\% is unexplained.

Quick Example

Correlation between study hours and test score: r = 0.9, so r^2 = 0.9^2 = 0.81. Interpretation: 81\% of the variation in test scores is explained by the linear relationship with study hours.

Notation

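r^2 = coefficient of determination; for a least-squares regression line it ranges from 0 to 1. y_i = observed value, \hat{y}_i = value predicted by the regression line, \bar{y} = mean of the observed y values. \text{SS}_{\text{total}} = \sum(y_i - \bar{y})^2 = total sum of squares. \text{SS}_{\text{residual}} = \sum(y_i - \hat{y}_i)^2 = residual sum of squares.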

What This Formula Means

The coefficient of determination is the proportion of the total variation in the response variable y that is explained by the linear relationship with the explanatory variable x. In simple linear regression it equals the square of the correlation coefficient r, hence the notation r^2.

Total variation in y has two parts: what the regression line explains and what's left over (residual variation). If r^2 = 0.85, the regression line accounts for 85\% of why y values differ from each other, and 15\% is unexplained. Think of r^2 as a report card for how well x predicts y.
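The two claims in this section (total variation splits into explained plus residual variation, and r^2 equals the squared correlation) can be checked numerically. The sketch below fits a least-squares line by hand on hypothetical study-hours data; the data values and variable names are assumptions chosen for illustration.

```python
from math import sqrt

# Hypothetical data: study hours (x) and test scores (y).
x = [1, 2, 3, 4, 5, 6]
y = [52, 60, 61, 70, 74, 85]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_yy = sum((yi - y_bar) ** 2 for yi in y)  # same quantity as SS_total
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares slope and intercept, then the fitted values.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

ss_total = s_yy
ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
ss_explained = sum((yh - y_bar) ** 2 for yh in y_hat)

r = s_xy / sqrt(s_xx * s_yy)  # correlation coefficient
r2_from_ss = 1 - ss_residual / ss_total

# Both checks allow for floating-point rounding.
print(abs(ss_total - (ss_explained + ss_residual)) < 1e-9)  # True
print(abs(r2_from_ss - r ** 2) < 1e-9)                      # True
```

Both identities rely on the line being the least-squares fit with an intercept; for predictions from some other line they need not hold.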

Formal View

r^2 = 1 - \frac{\text{SS}_{\text{residual}}}{\text{SS}_{\text{total}}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}, where 0 \leq r^2 \leq 1

Worked Examples

Example 1

medium
A regression model has SST = 500 (total variation) and SSE = 125 (unexplained variation). Calculate R^2 and interpret its meaning.

Solution

  1. R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{125}{500} = 1 - 0.25 = 0.75
  2. Alternatively: R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST} = \frac{375}{500} = 0.75
  3. Interpretation: the regression model explains 75% of the variation in y; 25% remains unexplained
  4. For the square root: r = \sqrt{0.75} \approx 0.866 (if positive association)
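The arithmetic in these steps can be reproduced with a few lines of Python. This is only a sketch of the calculation; the variable names are illustrative.

```python
from math import sqrt

sst = 500.0  # total sum of squares (given)
sse = 125.0  # residual sum of squares (given)

r2 = 1 - sse / sst   # step 1: 1 - SSE/SST
ssr = sst - sse      # explained sum of squares, 375.0
r2_alt = ssr / sst   # step 2: SSR/SST gives the same value

# Magnitude of r only; the sign must be taken from the slope of the fitted line.
r_abs = sqrt(r2)

print(r2, r2_alt, round(r_abs, 3))  # 0.75 0.75 0.866
```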

Answer

R^2 = 0.75. The model explains 75% of variation in y.
R^2 = 1 - SSE/SST measures the proportion of total variation in y explained by the model. SST = total variation; SSE = residual (unexplained) variation; SSR = explained variation. R^2 = r^2 for simple linear regression.

Example 2

hard
Two models predict house prices: Model 1 (size only): R^2=0.60. Model 2 (size + neighborhood + age): R^2=0.85. Explain what the increase in R^2 means and what caution should be applied with multi-variable R^2.

Common Mistakes

  • Interpreting r^2 = 0.64 as 'the correlation is 0.64'—actually r = \pm 0.8 (check the sign from the slope).
  • Thinking a high r^2 means the linear model is appropriate. A curved relationship can still produce a high r^2, so check a residual plot for a systematic pattern before trusting the line.
  • Saying 'r^2 = 0.81 means 81\% of the data points fall on the line'—it means 81\% of the variation in y is accounted for by the linear model.
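To illustrate the second mistake, the sketch below fits a straight line to data that is exactly quadratic (y = x^2 for x = 1 to 10, a hypothetical data set chosen for illustration). The r^2 is high, yet the residuals show a clear curved pattern, which is the warning sign that a line is the wrong model.

```python
# Hypothetical curved data: y is exactly x squared.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

# Least-squares line through the curved data.
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
y_hat = [intercept + slope * xi for xi in x]

residuals = [yi - yh for yi, yh in zip(y, y_hat)]
ss_residual = sum(e ** 2 for e in residuals)
ss_total = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1 - ss_residual / ss_total

print(round(r2, 3))  # 0.95
# Residuals are positive at both ends and negative in the middle:
# a systematic U shape rather than random scatter.
print(residuals)
```

An r^2 of about 0.95 looks excellent on its own, but the U-shaped residuals show that the straight line systematically misses the curve.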

Why This Formula Matters

r^2 is one of the most commonly reported measures of how well a regression model fits the data. It translates the abstract correlation into a concrete percentage that's easy to communicate.

Frequently Asked Questions

What is the Coefficient of Determination formula?

The coefficient of determination is the proportion of the total variation in the response variable y that is explained by the linear relationship with the explanatory variable x. In simple linear regression it equals the square of the correlation coefficient r, hence the notation r^2.

How do you use the Coefficient of Determination formula?

Total variation in y has two parts: what the regression line explains and what's left over (residual variation). If r^2 = 0.85, the regression line accounts for 85\% of why y values differ from each other, and 15\% is unexplained. Think of r^2 as a report card for how well x predicts y.

What do the symbols mean in the Coefficient of Determination formula?

r^2 ranges from 0 to 1. \text{SS}_{\text{total}} = total sum of squares. \text{SS}_{\text{residual}} = residual sum of squares.

Why is the Coefficient of Determination formula important in Math?

r^2 is one of the most commonly reported measures of how well a regression model fits the data. It translates the abstract correlation into a concrete percentage that's easy to communicate.

What do students get wrong about Coefficient of Determination?

Students confuse r and r^2. If r = 0.7, the model explains r^2 = 0.49 or only 49\% of variation—much less impressive than r sounds.

What should I learn before the Coefficient of Determination formula?

Before studying the Coefficient of Determination formula, you should understand: correlation, linear regression (the least-squares regression line, LSRL), and residuals.