Coefficient of Determination Formula

The coefficient of determination is the proportion of the total variation in the response variable $y$ that is explained by the linear relationship with the explanatory variable $x$.

The Formula

$$r^2 = 1 - \frac{\text{SS}_{\text{residual}}}{\text{SS}_{\text{total}}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}$$
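The formula translates directly into code. This is a minimal sketch in Python using only the standard library. The function name `r_squared` and the sample data are hypothetical and serve only as an illustration. `y_hat` holds the predictions of the least-squares line for $x = 1, \dots, 5$.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    if len(y) != len(y_hat):
        raise ValueError("y and y_hat must have the same length")
    y_bar = sum(y) / len(y)
    # Unexplained variation: squared gaps between observed and predicted values.
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # Total variation: squared gaps between observed values and their mean.
    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    if ss_total == 0:
        raise ValueError("SS_total is zero: all y values are equal")
    return 1 - ss_residual / ss_total

# Hypothetical data: observed y and the least-squares predictions for x = 1..5.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]
print(r_squared(y, y_hat))
```

With this data, $\text{SS}_{\text{residual}} = 2.4$ and $\text{SS}_{\text{total}} = 6$. The result is therefore $1 - 2.4/6 = 0.6$, up to floating-point rounding.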

When to use: Use $r^2$ when you want one number that summarizes how well a fitted line describes the data. Total variation in $y$ has two parts: what the regression line explains and what is left over (the residual variation). If $r^2 = 0.85$, the regression line accounts for $85\%$ of the variation of the $y$ values about their mean, and the remaining $15\%$ is unexplained. Think of $r^2$ as a report card for how well $x$ predicts $y$.
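You can check the split into explained and residual variation numerically. The sketch below uses Python with only the standard library. It fits a least-squares line using the usual closed-form slope and intercept, then computes the three sums of squares. The helper names `least_squares_fit` and `variation_split`, and the data, are hypothetical. The sketch assumes a least-squares line with an intercept, which is the case where $\text{SS}_{\text{total}} = \text{SS}_{\text{explained}} + \text{SS}_{\text{residual}}$ holds.

```python
def least_squares_fit(x, y):
    """Intercept a and slope b of the least-squares line y_hat = a + b*x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b

def variation_split(x, y):
    """Return (SS_total, SS_explained, SS_residual) for the least-squares line."""
    a, b = least_squares_fit(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [a + b * xi for xi in x]
    ss_total = sum((yi - y_bar) ** 2 for yi in y)                    # all variation in y
    ss_explained = sum((fi - y_bar) ** 2 for fi in y_hat)            # captured by the line
    ss_residual = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))    # left over
    return ss_total, ss_explained, ss_residual

# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
sst, ssr, sse = variation_split(x, y)
print(sst, ssr + sse)             # the two parts add back up to the total
print(ssr / sst, 1 - sse / sst)   # both ratios give r^2
```

Here $\text{SS}_{\text{total}} = 6$ splits into $3.6$ explained and $2.4$ residual, so $r^2 = 3.6/6 = 0.6$.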

Quick Example

Correlation between study hours and test score: $r = 0.9$, so $r^2 = 0.9^2 = 0.81$. Interpretation: $81\%$ of the variation in test scores is explained by the linear relationship with study hours.

Notation

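$r^2$ ranges from 0 to 1. $\text{SS}_{\text{total}} = \sum(y_i - \bar{y})^2$ is the total sum of squares. $\text{SS}_{\text{residual}} = \sum(y_i - \hat{y}_i)^2$ is the residual sum of squares. Here $y_i$ is an observed value, $\hat{y}_i$ is the value predicted by the regression line, and $\bar{y}$ is the mean of the observed $y$ values.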

What This Formula Means

The proportion of the total variation in the response variable $y$ that is explained by the linear relationship with the explanatory variable $x$. For simple linear regression (one explanatory variable), it equals the square of the correlation coefficient $r$, which is why it is written $r^2$.
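This minimal sketch shows the link between $r$ and $r^2$. It uses Python with only the standard library, and the function name `r_and_r_squared` and the data are hypothetical. It computes the correlation $r$ and, separately, $1 - \text{SS}_{\text{residual}}/\text{SS}_{\text{total}}$ for the least-squares line. The two agree once $r$ is squared, as expected for simple linear regression.

```python
import math

def r_and_r_squared(x, y):
    """Return (r, 1 - SS_res/SS_tot) for the least-squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)   # this is SS_total
    r = s_xy / math.sqrt(s_xx * s_yy)           # Pearson correlation
    b = s_xy / s_xx                             # least-squares slope
    a = y_bar - b * x_bar                       # least-squares intercept
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return r, 1 - ss_res / s_yy

r, r2 = r_and_r_squared([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(r ** 2, r2)   # the same value, up to floating-point rounding
```

The same check works for a negative correlation. The sign of $r$ disappears when it is squared.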

Formal View

$$r^2 = 1 - \frac{\text{SS}_{\text{res}}}{\text{SS}_{\text{tot}}} = 1 - \frac{\sum(y_i - \hat{y}_i)^2}{\sum(y_i - \bar{y})^2}, \quad \text{where } 0 \le r^2 \le 1$$
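Here is a sketch of why the bounds hold. It assumes a least-squares line fitted with an intercept, which is the usual setting for $r^2$. The residual sum of squares cannot be negative. The least-squares line also fits at least as well as the flat line $\hat{y}_i = \bar{y}$, and the residual sum of squares of that flat line is exactly $\text{SS}_{\text{tot}}$.

```latex
0 \le \text{SS}_{\text{res}} = \sum(y_i - \hat{y}_i)^2 \le \sum(y_i - \bar{y})^2 = \text{SS}_{\text{tot}}
\quad\Longrightarrow\quad
0 \le 1 - \frac{\text{SS}_{\text{res}}}{\text{SS}_{\text{tot}}} \le 1
```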

Worked Examples

Example 1

medium
A regression model has $SST = 500$ (total variation) and $SSE = 125$ (unexplained, or residual, variation). Calculate $R^2$ and interpret its meaning.

Answer

$R^2 = 0.75$. The model explains $75\%$ of the variation in $y$.

First step

$$R^2 = 1 - \frac{SSE}{SST} = 1 - \frac{125}{500} = 1 - 0.25 = 0.75$$
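Here is the arithmetic of Example 1 as a small Python sketch. The function name `r_squared_from_sums` is hypothetical, and `sse` is the residual (unexplained) sum of squares, as in the example.

```python
def r_squared_from_sums(sse, sst):
    """R^2 = 1 - SSE/SST, with SSE the residual (unexplained) sum of squares."""
    if sst <= 0:
        raise ValueError("SST must be positive")
    return 1 - sse / sst

r2 = r_squared_from_sums(sse=125, sst=500)
print(r2)            # 0.75
print(f"{r2:.0%}")   # 75%
```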

Example 2

hard
Two models predict house prices. Model 1 (size only) has $R^2 = 0.60$, and Model 2 (size + neighborhood + age) has $R^2 = 0.85$. Explain what the increase in $R^2$ means and what caution should be applied when interpreting $R^2$ for a model with several variables.

Example 3

medium
A model has $SST = 1200$ and $r^2 = 0.7$. Find the explained (regression) sum of squares $SSR$ and the residual sum of squares $SSE$.
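Here is one way to set up Example 3, sketched in Python. For a least-squares fit, $r^2 = SSR/SST$ and $SST = SSR + SSE$. It follows that $SSR = r^2 \cdot SST$ and $SSE = SST - SSR$. The function name `split_sst` is hypothetical.

```python
def split_sst(sst, r2):
    """Split SST into explained (SSR) and residual (SSE) parts, given r^2."""
    if not 0 <= r2 <= 1:
        raise ValueError("r^2 must lie between 0 and 1")
    ssr = r2 * sst      # explained variation, since r^2 = SSR / SST
    sse = sst - ssr     # what is left over, since SST = SSR + SSE
    return ssr, sse

ssr, sse = split_sst(sst=1200, r2=0.7)
print(round(ssr, 6), round(sse, 6))   # 840.0 360.0
```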

Common Mistakes

  • Reporting $r$ when the question asks for $r^2$ - square the correlation; $r = 0.7$ gives $r^2 = 0.49$, not $0.7$.
  • Reading $r^2$ as causation - it measures explained variation and never shows that $x$ causes $y$.
  • Letting $r^2$ go negative or above 1 - it is a proportion between 0 and 1, so any value outside that range signals a calculation error.
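This small Python check covers the first and third mistakes. It uses a hypothetical helper, `r_squared_from_r`. Squaring turns $r = 0.7$ into $0.49$ and discards the sign of $r$. The result cannot leave the 0 to 1 range as long as $r$ is a valid correlation.

```python
def r_squared_from_r(r):
    """Square a correlation coefficient; the sign of r is lost in the process."""
    if not -1 <= r <= 1:
        raise ValueError("a correlation coefficient must lie between -1 and 1")
    return r * r

for r in (0.7, -0.7, 0.9):
    # Round for display, since floating-point products are not always exact.
    print(f"r = {r:+.1f}  ->  r^2 = {r_squared_from_r(r):.2f}")
```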

Why This Formula Matters

$r^2$ is the standard one-number report card for a regression's predictive usefulness. Squaring $r$ also exposes how much weaker a "decent" correlation really is: $r = 0.7$ explains only $49\%$. Confusing $r^2$ with $r$, or reading it as evidence of causation, leads people to overstate how much a model actually tells them. In a mixed problem set, tell $r^2$ apart from the correlation $r$, the slope $b$, and the residual variation by asking, "Am I reporting the fraction of $y$'s variation explained by the linear model (a number from 0 to 1), rather than the slope or the sign of the correlation?" Don't rely on recognizing familiar-looking numbers.

Frequently Asked Questions

What do students get wrong about Coefficient of Determination?

The procedure for the coefficient of determination is the easy part. The trap is reporting $r$ when the question asks for $r^2$. First ask, "Am I reporting the fraction of $y$'s variation explained by the linear model (a number from 0 to 1), not the slope or the sign of the correlation?" That question keeps a correct-looking calculation from being attached to the wrong concept.

What should I learn before the Coefficient of Determination formula?

Before studying the Coefficient of Determination formula, you should understand correlation, linear regression (the least-squares regression line, LSRL), and residuals.