Coefficient of Determination

Statistics

Also known as: R², R-squared

Grade 9-12

The proportion of the total variation in the response variable y that is explained by the linear relationship with the explanatory variable x. It is the most commonly reported measure of how well a linear regression model fits the data.

Definition

The proportion of the total variation in the response variable y that is explained by the linear relationship with the explanatory variable x. For a least-squares line with one explanatory variable, it equals the square of the correlation coefficient r, hence the notation r^2.

💡 Intuition

Total variation in y has two parts: what the regression line explains and what's left over (residual variation). If r^2 = 0.85, the regression line accounts for 85% of the variation in the y values, and the remaining 15% is unexplained. Think of r^2 as a report card for how well x predicts y.
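
The split into explained and residual variation can be checked numerically. The following is a minimal sketch in plain Python, assuming a least-squares line with an intercept; the data set (hours, scores) and the helper names are hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line for y on x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def variation_parts(xs, ys):
    """Split total variation in y into explained and residual parts."""
    intercept, slope = least_squares_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    fitted = [intercept + slope * x for x in xs]
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    ss_explained = sum((f - y_bar) ** 2 for f in fitted)
    ss_residual = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    return ss_total, ss_explained, ss_residual


# Hypothetical data, not taken from the text above.
hours = [1, 2, 3, 4, 5]
scores = [52, 60, 61, 70, 77]

ss_total, ss_explained, ss_residual = variation_parts(hours, scores)
print(ss_total, ss_explained, ss_residual)   # 374.0 360.0 14.0
print(round(ss_explained / ss_total, 3))     # 0.963
```

For this made-up data set, the explained and residual parts add up to the total (360 + 14 = 374), so about 96% of the variation in the scores is accounted for by the line.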

🎯 Core Idea

r^2 close to 1 means the model explains most of the variation; close to 0 means it explains very little. But a high r^2 does NOT prove the model is correct—always check the residual plot.

Example

Correlation between study hours and test score: r = 0.9, so r^2 = 0.81. Interpretation: 81% of the variation in test scores is explained by the linear relationship with study hours.

Formula

r^2 = 1 - \frac{\text{SS}_{\text{residual}}}{\text{SS}_{\text{total}}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}
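
As a rough check that the residual-based formula agrees with squaring the correlation coefficient, the sketch below computes both for a small data set. It assumes a least-squares line with one explanatory variable; the data and function names are hypothetical.

```python
import math


def r_squared_from_residuals(xs, ys):
    """r^2 = 1 - SS_residual / SS_total for the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_residual = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_residual / ss_total


def correlation(xs, ys):
    """Pearson correlation coefficient r."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data with a downward trend.
xs = [2, 4, 5, 7, 9]
ys = [10, 9, 7, 4, 3]

r = correlation(xs, ys)
r2 = r_squared_from_residuals(xs, ys)
print(r < 0)                                  # True: the trend is negative
print(math.isclose(r ** 2, r2, rel_tol=1e-9))  # True: both routes agree
```

Note that r is negative here while r^2 is positive: squaring discards the direction of the relationship.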

Notation

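r^2 ranges from 0 to 1. \text{SS}_{\text{total}} = total sum of squares. \text{SS}_{\text{residual}} = residual sum of squares. \hat{y}_i = value of y predicted by the regression line for the i-th observation. \bar{y} = mean of the observed y values.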

🌟 Why It Matters

It is the most commonly reported measure of how well a regression model fits the data. It translates the abstract correlation into a concrete percentage that's easy to communicate.

Formal View

r^2 = 1 - \frac{\text{SS}_{\text{residual}}}{\text{SS}_{\text{total}}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}, where 0 \leq r^2 \leq 1

🚧 Common Stuck Point

Students confuse r and r^2. If r = 0.7, the model explains only r^2 = 0.49, or 49%, of the variation in y, which is much less impressive than r = 0.7 sounds.
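
A short sketch of the conversion in both directions may help keep the two quantities apart. The function names and the slope value are hypothetical; going from r^2 back to r requires the sign of the regression slope, since squaring discards it.

```python
import math


def proportion_explained(r):
    """Proportion of variation in y explained by the line, given r."""
    return r ** 2


def r_from_r_squared(r_squared, slope):
    """Recover r from r^2, taking the sign from the regression slope."""
    return math.copysign(math.sqrt(r_squared), slope)


# How quickly r^2 falls away from r for a few correlations.
for r in (0.3, 0.5, 0.7, 0.9):
    print(f"r = {r:.1f}  ->  r^2 = {proportion_explained(r):.2f}")

# A hypothetical negative slope gives a negative correlation.
print(round(r_from_r_squared(0.64, slope=-2.5), 2))   # -0.8
```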

⚠️ Common Mistakes

  • Interpreting r^2 = 0.64 as 'the correlation is 0.64'—actually r = \pm 0.8 (check the sign from the slope).
  • Thinking a high r^2 means the linear model is appropriate—a curved relationship can have high r^2 but the linear model is still wrong.
  • Saying 'r^2 = 0.81 means 81% of the data points fall on the line'. It actually means 81% of the variation in y is accounted for by the linear model.
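
The second mistake above can be illustrated with a small sketch: a straight line fitted to data that is exactly quadratic still produces a high r^2, but the residuals show a clear pattern. The data and the helper name are hypothetical, and the fit is an ordinary least-squares line written out in plain Python.

```python
def linear_fit_report(xs, ys):
    """Fit a least-squares line; return (r_squared, residuals)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    ss_residual = sum(e ** 2 for e in residuals)
    ss_total = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_residual / ss_total, residuals


# Hypothetical data that follows a curve exactly: y = x^2.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

r_squared, residuals = linear_fit_report(xs, ys)
print(round(r_squared, 3))   # 0.95
print(residuals)             # [12.0, 4.0, -2.0, -6.0, -8.0, -8.0, -6.0, -2.0, 4.0, 12.0]
```

Despite r^2 of about 0.95, the residuals are positive at both ends and negative in the middle. That U-shaped pattern is what a residual plot would reveal, and it signals that a straight line is the wrong model for this data.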

Frequently Asked Questions

What should I learn before Coefficient of Determination?

Before studying Coefficient of Determination, you should understand: correlation, linear regression (the least-squares regression line, LSRL), and residuals.

How Coefficient of Determination Connects to Other Ideas

To understand coefficient of determination, you should first be comfortable with correlation, linear regression (the least-squares regression line, LSRL), and residuals. Once you have a solid grasp of coefficient of determination, you can move on to regression inference.