Coefficient of Determination Formula
The Formula
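Written with the symbols this page defines in its FAQ (\text{SS}_{\text{total}} and \text{SS}_{\text{residual}}), and matching the form used in the worked example, the formula can be stated as:

```latex
r^2 = 1 - \frac{\text{SS}_{\text{residual}}}{\text{SS}_{\text{total}}}
```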
Quick Example
Notation
What This Formula Means
The coefficient of determination, r^2, is the proportion of the total variation in the response variable y that is explained by the linear relationship with the explanatory variable x. In simple linear regression it equals the square of the correlation coefficient r.
Total variation in y has two parts: what the regression line explains and what's left over (residual variation). If r^2 = 0.85, the regression line accounts for 85\% of why y values differ from each other, and 15\% is unexplained. Think of r^2 as a report card for how well x predicts y.
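The split into explained and residual variation can be checked numerically. The sketch below is only an illustration: the function names and the five data points are made up for this example, and the line is fitted by ordinary least squares.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(x, y):
    """Compute r^2 as 1 - SS_residual / SS_total."""
    slope, intercept = least_squares_line(x, y)
    y_bar = sum(y) / len(y)
    # Total variation: squared distances of y from its mean.
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    # Residual variation: squared distances of y from the fitted line.
    ss_residual = sum(
        (yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y)
    )
    return 1 - ss_residual / ss_total


# Hypothetical data, chosen only to keep the arithmetic small.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r2 = r_squared(x, y)
print(f"r^2 = {r2:.2f}")  # r^2 = 0.60
```

For this data the line explains 60\% of the variation in y and 40\% is left in the residuals.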
Formal View
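One way to write the definition out in full, assuming n observations (x_i, y_i), with \bar{y} the mean of the observed y values and \hat{y}_i the value the regression line predicts for x_i (these symbols are introduced here for illustration and do not appear elsewhere on the page):

```latex
\text{SS}_{\text{total}} = \sum_{i=1}^{n} (y_i - \bar{y})^2, \qquad
\text{SS}_{\text{residual}} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad
r^2 = \frac{\text{SS}_{\text{total}} - \text{SS}_{\text{residual}}}{\text{SS}_{\text{total}}}
```

The numerator is the part of the total variation that the regression line accounts for, which is why r^2 reads as a proportion.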
Worked Examples
Example 1
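Difficulty: medium
Solution
- Step 1: R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{125}{500} = 1 - 0.25 = 0.75. Here SSE is the residual sum of squares (\text{SS}_{\text{residual}}) and SST is the total sum of squares (\text{SS}_{\text{total}}).
- Step 2: Alternatively, R^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST} = \frac{375}{500} = 0.75
- Step 3: Interpretation: the regression model explains 75\% of the variation in y; 25\% remains unexplained.
- Step 4: The correlation coefficient is the square root of R^2, with the sign of the slope: r = \sqrt{0.75} \approx 0.866 if the association is positive.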
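The arithmetic in this solution can be reproduced in a few lines. This is a minimal check using the SSE = 125 and SST = 500 values from the steps above; the variable names are chosen here for illustration.

```python
import math

sse = 125.0  # residual sum of squares, from the worked example
sst = 500.0  # total sum of squares, from the worked example

r2_from_residual = 1 - sse / sst   # 1 - SSE/SST
ssr = sst - sse                    # explained (regression) sum of squares
r2_from_explained = ssr / sst      # SSR/SST, the alternative route
r = math.sqrt(r2_from_residual)    # positive root, assuming a positive slope

print(r2_from_residual, r2_from_explained, round(r, 3))  # 0.75 0.75 0.866
```

Both routes give the same value because SSR is defined here as SST - SSE.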
Answer
Example 2
Difficulty: hard
Common Mistakes
- Interpreting r^2 = 0.64 as 'the correlation is 0.64'—actually r = \pm 0.8 (check the sign from the slope).
- Thinking a high r^2 means the linear model is appropriate. A curved relationship can still produce a high r^2 even though a line is the wrong model, so check the residuals for a pattern.
- Saying 'r^2 = 0.81 means 81\% of the data points fall on the line'—it means 81\% of the variation in y is accounted for by the linear model.
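The second mistake in this list is easy to demonstrate. The sketch below fits a straight line to data that is exactly quadratic (y = x^2 for x = 1 to 10, a made-up dataset for illustration) and still gets a high r^2; the helper name `fit_line` is hypothetical.

```python
def fit_line(x, y):
    """Return (slope, intercept) of the least-squares regression line."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Deliberately curved data: y is exactly x squared.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]

slope, intercept = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

y_bar = sum(y) / len(y)
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_residual = sum(e ** 2 for e in residuals)
r2 = 1 - ss_residual / ss_total

print(f"r^2 = {r2:.3f}")  # about 0.950
# Residuals are positive at both ends and negative in the middle,
# a U-shaped pattern that signals the line is the wrong model.
print([round(e, 1) for e in residuals])
```

An r^2 near 0.95 looks strong, yet the systematic pattern in the residuals shows that a straight line misses the shape of the relationship.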
Why This Formula Matters
r^2 is the most commonly reported measure of how well a regression model fits the data. It turns the correlation into a percentage of explained variation, which is easy to communicate.
Frequently Asked Questions
What is the Coefficient of Determination formula?
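The coefficient of determination is r^2 = 1 - \frac{\text{SS}_{\text{residual}}}{\text{SS}_{\text{total}}}: the proportion of the total variation in the response variable y that is explained by the linear relationship with the explanatory variable x. In simple linear regression it equals the square of the correlation coefficient r.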
How do you use the Coefficient of Determination formula?
Total variation in y has two parts: what the regression line explains and what's left over (residual variation). If r^2 = 0.85, the regression line accounts for 85\% of why y values differ from each other, and 15\% is unexplained. Think of r^2 as a report card for how well x predicts y.
What do the symbols mean in the Coefficient of Determination formula?
r^2 ranges from 0 to 1. \text{SS}_{\text{total}} = total sum of squares (the variation of y about its mean). \text{SS}_{\text{residual}} = residual sum of squares (the variation of y about the regression line).
Why is the Coefficient of Determination formula important in Math?
r^2 is the most commonly reported measure of how well a regression model fits the data. It turns the correlation into a percentage of explained variation, which is easy to communicate.
What do students get wrong about Coefficient of Determination?
Students confuse r and r^2. If r = 0.7, then r^2 = 0.49, so the model explains only 49\% of the variation in y, which is much less impressive than r = 0.7 sounds.
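Converting between the two values is a one-line calculation in each direction. In the sketch below, `correlation_from_r2` is a hypothetical helper written for this example; it takes the sign of r from the slope of the regression line, since r^2 alone cannot tell you the direction of the association.

```python
import math


def correlation_from_r2(r2, slope):
    """Recover r from r^2, using the slope's sign for the direction."""
    return math.copysign(math.sqrt(r2), slope)


# From r to r^2: squaring shrinks any value strictly between 0 and 1.
r = 0.7
explained = r ** 2
print(f"r = {r} explains {explained:.0%} of the variation")

# From r^2 back to r: the sign comes from the slope (here an assumed negative slope).
r_back = correlation_from_r2(0.64, slope=-1.5)
print(f"r = {r_back:.1f}")
```

With r^2 = 0.64 and a negative slope, the correlation is -0.8, not 0.64.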
What should I learn before the Coefficient of Determination formula?
Before studying the Coefficient of Determination formula, you should understand correlation, linear regression (the least-squares regression line, LSRL), and residuals.